From YouTube: New User Training: 09 Parallel I/O
A
Hi everyone, my name is Jenny. I'm also from the DAS group. Today I'm going to talk about parallel I/O. I will tell you a little bit about what I/O is and some common I/O issues at NERSC, show you the typical HPC I/O stack, and talk a little bit about performance versus productivity, which is something relatively new. The last few slides are about one case study, Athena's I/O, and how we solved their I/O problem.
A
There are millions of websites that use .io. That's a good question. That "io" means exactly input and output, similar to what we mean here. Interestingly, when I came across the border into this country, I was asked one question: what's the purpose of your visit? I told the officer I'm coming to this country to do my PhD studying I/O, and the officer said, explain that to me. So I pointed to his keyboard and said, this is input, and what do you see on the monitor?
A
That's the output. And what is parallel I/O? Clearly, many more keyboards and many more monitors. So that's how I got past the border. Clearly, I/O isn't top secret, like a nuclear technology that could stop me from coming into this country. It also isn't something that can make you money. But why do we care about I/O? Here are some common I/O questions, or common I/O issues. First, typically a user will ask:
A
You guys claim that your system, Cori, has a peak bandwidth of around 700 or more gigabytes per second. Why could I only get one percent of that? That's a really common question. The second is scalability. If you have a little bit of I/O knowledge, you know that with more I/O processes and more storage servers you can potentially scale your application to a larger scale, so why isn't the performance actually scalable? And third is the metadata issue.
A
With file closing, mostly, at this moment we have a limited number of metadata servers, and that becomes a bottleneck for metadata performance. And last but not least, the pain of productivity. Users came to us and said, I'd like to try Python, Spark, or TensorFlow for my data analytics or other data analysis jobs, but why is the I/O so slow? Those are the typical and common I/O questions, and that's why I think that before I really show you how to optimize I/O, it is important to explain the complexity.
A
There are two major factors that really bring about this I/O issue. First, we have a complex HPC I/O stack. We know that we have some hardware downstairs, and we know that we can run our application on the system. Between those two layers there is a whole stack of I/O software, including the parallel file system.
A
Lisa also mentioned that. There is also I/O middleware like MPI-IO, and high-level I/O libraries like HDF5, NetCDF, and ADIOS. And nowadays people are starting to use more Python, which offers productive interfaces like h5py. All those layers come together to serve your application and run on top of the hardware and the parallel file system. So it is really complex whenever your application issues a data request, in other words an I/O request.
A
The second major factor is the difference between human and machine in how we describe the real-world problem. For scientific data, we typically try to describe the model in a way that is close to what we are familiar with. For example, in climate science we have 3D data, right: latitude, longitude, and height. That's how we describe the weather or climate. But the hardware layer can only understand bytes.
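To make the gap between the scientist's view and the hardware's view concrete, here is a minimal sketch of how a 3D (latitude, longitude, height) index could map to a byte offset in a flat file. It assumes a row-major (C-order) layout and 8-byte values. The grid shape and the function name are hypothetical, chosen only for illustration.

```python
def byte_offset(lat, lon, height, shape, itemsize=8):
    """Byte offset of element (lat, lon, height) in a row-major array.

    `shape` is (n_lat, n_lon, n_height); `itemsize` is bytes per value.
    Both are illustrative assumptions, not values from the talk.
    """
    n_lat, n_lon, n_height = shape
    if not (0 <= lat < n_lat and 0 <= lon < n_lon and 0 <= height < n_height):
        raise IndexError("index out of range")
    linear = (lat * n_lon + lon) * n_height + height
    return linear * itemsize


shape = (180, 360, 50)  # hypothetical grid

# Neighbours along the last axis (height) sit next to each other on disk...
step_height = byte_offset(0, 0, 1, shape) - byte_offset(0, 0, 0, shape)
# ...while neighbours along the first axis (latitude) are far apart.
step_lat = byte_offset(1, 0, 0, shape) - byte_offset(0, 0, 0, shape)

print(step_height, step_lat)
```

Under these assumptions, stepping along the last axis moves 8 bytes, while stepping along the first axis jumps over a whole longitude-by-height slab. Which axis you iterate over therefore decides whether the reads are contiguous.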
A
That means you want to read the image column by column, and that results in a non-contiguous I/O pattern. In this case the disk has to read one block and then jump to another block, and it has to seek a little bit, which causes some latency. In this simple calculation you can see there is quite a dramatic difference between the contiguous I/O cost and the non-contiguous I/O cost.
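The slide's calculation is not captured in the transcript. As a rough sketch of the same idea, the model below charges one seek per contiguous run plus the transfer time. The seek latency, bandwidth, and image size are assumed values for illustration only, not figures from the talk.

```python
def io_time(n_requests, total_bytes, seek_s=0.005, bandwidth_bps=100e6):
    """Estimated seconds to read `total_bytes` split into `n_requests` runs.

    Simple model: each separate run pays one seek, then data streams at
    `bandwidth_bps`. The default seek time and bandwidth are assumptions.
    """
    return n_requests * seek_s + total_bytes / bandwidth_bps


rows, cols, itemsize = 1000, 1000, 8      # hypothetical row-major image
column_bytes = rows * itemsize

# Reading one full row: a single contiguous run.
row_time = io_time(1, cols * itemsize)
# Reading one full column: every element is its own run, so `rows` seeks.
col_time = io_time(rows, column_bytes)

print(f"row: {row_time:.5f} s, column: {col_time:.5f} s")
```

Both reads move the same number of bytes. In this model the column read is slower only because it pays a seek for every element, which is the effect described above.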
A
We did some study, and we think the I/O challenge will become more severe in the next few years, in 2020 or 2025. Here is some simple data. If your application currently produces something like 10 to 200 terabytes of data, then two years from now that application can produce 3 times more data. Many years from now it will be 22 times more data. Given that huge amount of data, how to efficiently load the data into memory to continue your data analysis will be challenging.
A
The parallel file system comes in to really leverage the multiple disks and multiple object storage servers, which can bring some parallelism and performance to your application. At NERSC we use Lustre and GPFS. Those are really popular parallel file systems across HPC facilities, and that layer sits right on top of the I/O hardware.
A
This diagram shows our current architecture. We have Cori Haswell and Cori KNL. Those two partitions both connect to the LNET routers, which are 130 servers. The LNET routers can redirect your I/O requests to the storage through the parallel file system, Lustre, and in total we have 148 object storage servers. As additional information, you can try to spread your data over more object storage servers with a simple command like "stripe_large .", where the dot means your current directory.
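To illustrate what striping does, here is a minimal sketch of a round-robin layout in which consecutive fixed-size stripes of a file go to consecutive storage targets. The stripe size and stripe count below are illustrative assumptions, not Cori's settings, and the function name is hypothetical.

```python
def target_for_offset(offset, stripe_size=1 << 20, stripe_count=4):
    """Index of the storage target holding byte `offset` of a file.

    Round-robin striping: stripe k of the file lives on target
    k % stripe_count. The defaults (1 MiB, 4 targets) are assumptions.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return (offset // stripe_size) % stripe_count


MiB = 1 << 20
# An 8 MiB sequential read touches all four targets, two stripes each,
# so the four servers can work on it at the same time.
touched = [target_for_offset(i * MiB) for i in range(8)]
print(touched)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

A larger stripe count spreads a big file over more servers, which is the kind of effect a striping command is meant to achieve for large files.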
A
On top of the parallel file system is the I/O middleware. The I/O middleware comes in to really bring some optimization, I/O scheduling, and so on, to speed up your I/O. For example, MPI-IO provides collective I/O and non-blocking I/O. By default, an application will typically use just independent I/O, in which all the processes of the application handle the I/O by themselves without any coordination.
A
If we turn on collective I/O, then before they actually access or read the data, all the processes will communicate first. They will share their access information to figure out the optimal way to access the data. Mostly, collective I/O can optimize your non-contiguous I/O. If you access the data non-contiguously, then turning on collective I/O can sometimes bring a performance benefit. But sometimes we will still want to use independent I/O, because collective I/O has a communication phase, and that can cause some synchronization cost.
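The coordination step can be sketched without MPI. In the toy model below, each process holds a list of (offset, length) byte ranges. Independent I/O issues every range as its own request, while a collective-style approach first pools and merges the ranges so that adjacent pieces become one large request. This is a simplified illustration of the idea, not how any MPI-IO implementation is actually written, and all names are hypothetical.

```python
def merge_ranges(ranges):
    """Merge (offset, length) ranges that touch or overlap."""
    merged = []
    for start, length in sorted(ranges):
        end = start + length
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e - s) for s, e in merged]


def interleaved_requests(n_procs, n_blocks, block=4096):
    """Each process owns every n_procs-th block: a non-contiguous pattern."""
    return {
        p: [((b * n_procs + p) * block, block) for b in range(n_blocks)]
        for p in range(n_procs)
    }


per_proc = interleaved_requests(n_procs=4, n_blocks=8)

# Independent I/O: every small range is a separate request.
independent = sum(len(r) for r in per_proc.values())
# Collective-style: pool everyone's ranges, then merge neighbours.
pooled = [r for ranges in per_proc.values() for r in ranges]
collective = merge_ranges(pooled)

print(independent, len(collective), collective[0])
```

In this toy case 32 small requests collapse into a single 128 KiB request. The price is the pooling step, which stands in for the communication phase and its synchronization cost.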
A
On top of the I/O middleware we have the high-level I/O libraries. I would say that layer is closer to how human beings view their data. Take HDF5, for example. Whenever we talk about HDF5, it's not just I/O, it is a data model. That layer can help you describe your problem easily, and it can also manage the I/O for you, so you don't have to learn a lot about MPI.
A
The last layer is the productive interfaces. You may have heard about Spark, which is really a big data framework, and TensorFlow, which you can use for your deep learning applications, and so on. When using that kind of productive software, you have to pay attention to its I/O interface, and we provide some recommendations for that software. For example, h5py is the productive interface at the Python layer. As for TensorFlow I/O, at NERSC we have people working on that, so it will be released soon.
A
Good question. Mostly, text files and other file formats that are popular in the commercial world are well supported in the existing software. But for running that in an HPC environment, we want to leverage the parallel file system. In order to do that, we have to leverage parallel I/O, and typically a text file interface or plugin doesn't support those parallel I/O features. So we typically try to convert those text files into the HDF5 format. We prefer that simply because there we have a nice parallel I/O interface.
B
A
It depends. Some people say they don't have any I/O problem because their data is tiny. They can immediately load it into memory. In that case you really don't want to bother with parallel I/O, turning on MPI-IO, or anything like that, so probably stick with what you have. But in case you will produce 100 gigabytes, or even terabytes, you really want to think about the file format and how the I/O piece will work.
A
Here is an example of using h5py and how to turn on collective I/O with one line of code, "with dset.collective". Then you can really leverage and benefit from collective I/O. Here is a coding comparison: on the left is h5py, and on the right is the same thing written in HDF5 C. You can see that the first few lines map to an entire page on the right, the next few lines map to another page, and the last few lines map to yet another entire page.
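The slide code is not captured in the transcript. As a hedged sketch of what the h5py side might look like, the function below has every MPI rank write its own block of rows inside a "with dset.collective" context. It assumes h5py built against parallel HDF5 plus mpi4py and NumPy. The file name, dataset name, sizes, and the rank_slice helper are hypothetical.

```python
def rank_slice(n_rows, rank, size):
    """Rows [start, stop) owned by `rank` in an even block decomposition."""
    base, extra = divmod(n_rows, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def write_collective(path, n_rows=1000, n_cols=16):
    """Each rank writes its rows of one shared dataset using collective I/O."""
    # Imported here so the helper above stays usable without an MPI stack.
    from mpi4py import MPI
    import h5py
    import numpy as np

    comm = MPI.COMM_WORLD
    start, stop = rank_slice(n_rows, comm.rank, comm.size)

    # The "mpio" driver requires an MPI-enabled h5py build.
    with h5py.File(path, "w", driver="mpio", comm=comm) as f:
        dset = f.create_dataset("data", (n_rows, n_cols), dtype="f8")
        block = np.full((stop - start, n_cols), comm.rank, dtype="f8")
        # The one line that switches this write to collective mode.
        with dset.collective:
            dset[start:stop, :] = block
```

To try it, one would call write_collective("demo.h5") from a script launched with an MPI launcher such as mpirun or srun. Without the "with dset.collective" line, the same assignment is performed as independent I/O.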
A
So you can see that in terms of coding effort, writing in h5py or Python can really be productive. But in terms of performance, the question again is: when you gain some productivity, how much performance can you afford to lose? We did some study, and we found that in most cases it's okay. The Python layer doesn't add too much overhead on top of the C layer. But in some cases, such as metadata-heavy operations or operations that involve communication, there is some performance loss in using the productive interface. Last slide.
A
Okay, so this is about Athena's I/O. Athena is an astrophysical code used in a wide range of problems, like the interstellar medium and star formation. When the user first came to us, he asked: I want to know how much time I/O is taking in my code, and I want to see if there is some I/O bottleneck. So I used Darshan. That's also something we recommend for profiling I/O.
A
I simply profiled his code and showed him this plot, and we found that 40 percent of his code's time is spent doing I/O, which is useless. For years his code has been wasting his time doing that input and output, but really what he cares about is the science, right? Later we also figured out a little more detail about this code's I/O pattern. We found that every few seconds his code produced a tiny HDF5 file.
A
The I/O pattern is what we call non-contiguous, and the number of processes is in the thousands. Then we tried turning on collective I/O, and here is what the user emailed us. The user made that change and found that it actually solved his problem. The I/O used to take 40 percent of the time, and now that is fixed. Okay, thank you. If you have any questions, feel free to email our consultants, or look at this website and you can easily find us. Thank you.