From YouTube: 10. I/O Best Practices
Description
Learn best practices for I/O at NERSC.
Slides for all sessions can be downloaded from here: https://www.nersc.gov/users/training/events/new-user-training-june-21-2019/
A
Hi everybody, I'm [unclear], in the [unclear] group as well, and I'm going to give you a high-level, roughly 20-minute overview of I/O best practices for your applications, on NERSC systems but also in general. We're going to cover the basics of parallel I/O, walk through the stack, talk a little bit about the profiling tools that you have at NERSC and in other places, and talk a little bit about what to do when, okay, everything's going slow [unclear]. But what does that mean? [unclear]
A
So, from a very high-level perspective, your application wants to store data somewhere in files. You're going to say: hey, go read my data (hopefully it's not in CSV). But what actually happens underneath that, on the way out to the file system, is quite a stack of software. Below your application, typically, you've got some productivity interface.
A
Typically, [unclear] is the kind of layer that you're going to call into. That layer typically goes into one of the high-level I/O libraries that are available: HDF5, netCDF, perhaps ADIOS or ROOT, something that provides an abstraction layer over the lower levels of MPI and the parallel file system. It gives you a more object-oriented perspective than the byte-oriented view of the disk.
A
The I/O middleware is where we transition into bytes, effectively. It gives us the leverage of knowing that there are a lot of processes running as a parallel job, and it makes possible what we call collective I/O, which I'll talk about a little later. MPI-IO [unclear], these sorts of packages are what's typically used at this level. They bring everything together [unclear].
A
Below that, you're going into the file system somewhere. I/O forwarding takes the I/O from all of your client compute nodes and aggregates it into larger requests. This helps to reduce lock contention in the file system, and it may cross over outside of the system. Some modern systems are moving storage internally, into actual nodes of the HPC system, but most of the time you're transitioning off into a storage box somewhere outside of your system. Things like [unclear].
A
These are two packages that you'll see. Under that are parallel file systems like Lustre and GPFS, which [unclear] was talking about. They have a POSIX-looking interface to a file system, but they are designed for very large, very concurrent workloads that are coordinated or loosely coupled, perhaps reading or writing randomly, but generally with some pattern of coherence that is imposed by MPI-IO or perhaps even the HDF5 layer. And then below that we get into the I/O hardware, which we won't touch on here, but you know, disks, [unclear], drives.
A
So what you write is typically, we hope, very high-level I/O code, something like this. Perhaps this is using h5py, which is a wrapper over the HDF5 layer: Python code that imports the right modules for MPI and HDF5, does something simple to open the file, and then uses the niceties of Python's array access slices to read data in or write data out, whatever's going on. And there are good ways to set up collective I/O even up at the high level, so you can maximize your performance here.
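As a rough sketch of the slicing idea described here (not the code from the slides), the function below computes which contiguous slice of a one-dimensional dataset each MPI rank might own. The name `rank_slab` and the even-split policy are assumptions made for illustration. The commented-out part shows how the slice could be used with h5py, assuming a parallel build of h5py plus mpi4py is available in your environment.

```python
def rank_slab(n_items, n_ranks, rank):
    """Return the half-open [start, stop) range of items owned by `rank`.

    Items are split as evenly as possible; the first `n_items % n_ranks`
    ranks get one extra item each.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# With a parallel build of h5py and mpi4py (an assumption about your
# environment), each rank could then write its slice collectively:
#
#   from mpi4py import MPI
#   import h5py
#   comm = MPI.COMM_WORLD
#   with h5py.File("out.h5", "w", driver="mpio", comm=comm) as f:
#       dset = f.create_dataset("x", (n_items,), dtype="f8")
#       start, stop = rank_slab(n_items, comm.size, comm.rank)
#       with dset.collective:
#           dset[start:stop] = local_values  # hypothetical per-rank array

if __name__ == "__main__":
    for r in range(4):
        print(r, rank_slab(10, 4, r))
```

Because every rank gets a contiguous, non-overlapping slice, the ranges together cover the whole dataset exactly once.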
A
So when you look at this, there are approximately 35 lines on the left side, and that's an entire Python program. On the right side is C with HDF5 and a similar set of I/O patterns. You can see we've done about 35 lines on this side too, but we're only about halfway through the C code, something like that. This shows you the conciseness of Python and other languages that are a little bit less verbose than C.
A
You can see that the coding effort, and the time spent figuring out how to do your I/O, is much, much lower in the Python world. With C, you get a little bit more expressivity: there might be a feature that isn't covered in the Python layer. But typically h5py and the other wrappers on top of HDF5 do a very good job of expressing the features and still giving you the power.
A
But okay, fine, I improved my productivity. How much performance am I losing? This is the ratio of performance against C code. You can see that typically, for independent and collective I/O, you're really close with Python to the speed of a C program. When you do things that are very granular, you know, very small operations on metadata, it's going to perform somewhat worse.
A
It's got a lot more layers to pass through and interpret in order to get down to the actual "open this file" or "create this file", and walking along individual components of the file is slower, but not terrible. But when you actually do the actual I/O [unclear], writing these dataset elements out, it's very comparable, almost the same.
A
So, even though it's a little more verbose to write the C code, it's a lot less verbose than writing the raw C code to talk to MPI-IO. You have a very nice, well-defined layer, and an entire ecosystem that you get to participate in with HDF5: all the applications that can read and write HDF5 [unclear]. It is about the expressivity.
A
You can see it's not terribly verbose in parallel. You're adding a little bit of extra complexity, but it's not terrible, and hopefully the MPI init and finalize are not part of your I/O routine. So the actual calls here for doing parallel I/O with HDF5 are these: you want to say, hey, I need to use the MPI driver.
A
You set that up as part of your property list when you open the file. Then later on, after you've created the dataset, you want to do collective I/O operations, in many cases, not always, but many. So you have to tell HDF5: hey, go ahead and use collective writing for this dataset. You create a data transfer property list and tell it that any time you use this property list, it's going to be collective I/O. And then you go write your data.
A
So, proceeding down the stack again, what's below HDF5? Some of the other projects I mentioned earlier are largely research packages, and in production systems today you're going to be dealing with MPI-IO. It provides another layer of performance over the raw file system, and hopefully it has done a lot of the optimizations for you that you would otherwise need to be doing to aggregate the I/O together and make it efficient.
A
It's a byte-oriented interface, very much like POSIX, and you specify which offsets you're going to write to. HDF5 builds on top of MPI-IO and provides that object abstraction layer on top of the POSIX-like byte-stream API. Typically, when we talk about MPI-IO, the most important thing that people think about is the collective I/O component, but it also has interesting ways to efficiently describe non-contiguous I/O, through the datatypes that MPI uses.
A
It provides some amount of non-blocking, or asynchronous, I/O. It's not completely asynchronous, in the fact that operations may not execute completely in the background: they may only make progress when MPI is being called. There is support for other languages (we saw one for Python), and there is a very basic way of providing a portable file format.
A
It's basically a swizzle to get everything into big-endian format, which these days turns out to be a mistake, but back in the early 2000s, when Sun was a really big thing, big-endian was the right thing to do.

A little bit about collective and independent I/O. This chaotic arrow diagram here is a good representation of independent I/O. Every process is doing its own thing, nobody's trying to coordinate with anyone else doing writes to the file system, and everybody is reading or writing whatever they want. They don't talk to each other.
A
They don't have to talk to each other, which is good in some ways, but they don't get any benefit from the fact that they could. And you do want to use independent I/O sometimes. If it's very small, if it's metadata that you're writing out to disk, you know, what's the actual size of your array or something, some header that you're appending to the file, independent I/O may be a perfectly good thing to do. And sometimes the synchronization doesn't match your application.
A
We get this sometimes with adaptive mesh codes, where things are unbalanced or variable-sized on different ranks. But typically the best performance is gained when you find the right places to use collective I/O in your application. Basically, it means that every process in your MPI program (and HDF5 works the same way, I showed that earlier), all the processes, call an I/O operation, and that I/O operation knows that it's a collective operation.
A
What that means to the layer below you, say HDF5 or MPI-IO, is that you have agreed to communicate with all the other ranks. It knows they're out there, and then it can look at the I/O pattern underneath. HDF5 or MPI-IO can build up an I/O pattern once it gets all the information from the other ranks, and can optimize for it: oh, we have six ranks in this job and everybody is writing every sixth element, in a strided pattern across the array.
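To make the six-rank strided case concrete, here is a small pure-Python sketch. It is only a model of the idea, not how HDF5 or MPI-IO actually implement collective buffering, and the function names, the 8-byte element size, and the 24-element array are assumptions chosen for illustration. It lists the byte ranges each rank would write on its own, then shows that once all ranks' requests are known together they merge into a single contiguous extent.

```python
def strided_requests(rank, n_ranks, n_elements, elem_size=8):
    """(offset, length) byte requests for a rank writing every n_ranks-th element."""
    return [(i * elem_size, elem_size) for i in range(rank, n_elements, n_ranks)]


def coalesce(requests):
    """Merge requests that touch or overlap into larger contiguous extents."""
    merged = []
    for offset, length in sorted(requests):
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            last_off, last_len = merged[-1]
            merged[-1] = (last_off, max(last_len, offset + length - last_off))
        else:
            merged.append((offset, length))
    return merged


if __name__ == "__main__":
    n_ranks, n_elements = 6, 24
    per_rank = [strided_requests(r, n_ranks, n_elements) for r in range(n_ranks)]

    # Independent view: each rank issues its own small, scattered requests.
    print("independent requests:", sum(len(reqs) for reqs in per_rank))

    # Collective view: with everyone's requests in hand, they can be merged.
    everything = [req for reqs in per_rank for req in reqs]
    print("after coalescing:", coalesce(everything))
```

A single rank's requests cannot be merged on their own, because the gaps between them belong to other ranks. That is the information a collective call makes available to the layer below.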
A
You have a number of questions that you need to ask yourself. Am I reading from every process or just a subset? How many files? How big are the items in the file? Am I writing every time step, or every 100 time steps? What are the buffer sizes? Ask all these questions and think about these aspects of your I/O pattern, especially in a parallel world. Contiguous and non-contiguous I/O patterns into a buffer, and sequential and random patterns, are also very important.
A
The sectors here are blocks of the file on the disk. With contiguous movement, your read time is great. But if it's not contiguous, the disk has to rotate before you're actually reading something for your block, and your total time to do that series of I/Os is going to be orders of magnitude worse. It's terrible. So do what you can to make things contiguous, with nice sequential patterns. HDF5 does that for you when you use collective I/O, and so does MPI-IO, so those are good helpers.
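One simple way to reason about an access pattern is to count how often the next request does not start where the previous one ended. The sketch below does that for a list of block offsets. The function name `count_seeks` and the 4096-byte block size are assumptions for illustration, and the count says nothing about actual timings on a given file system; it only shows which patterns are sequential.

```python
def count_seeks(offsets, block_size):
    """Count requests that do not start where the previous request ended."""
    seeks = 0
    position = None  # where the previous request finished
    for off in offsets:
        if position is not None and off != position:
            seeks += 1
        position = off + block_size
    return seeks


if __name__ == "__main__":
    block = 4096
    sequential = [i * block for i in range(8)]   # contiguous, in order
    strided = [i * 6 * block for i in range(8)]  # every sixth block
    backwards = list(reversed(sequential))       # same blocks, reverse order

    for name, pattern in [("sequential", sequential),
                          ("strided", strided),
                          ("backwards", backwards)]:
        print(name, count_seeks(pattern, block))
```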
A
So you say to yourself: well, that's all fine, but how do I know how much I/O time is actually occurring here? NERSC provides a module called Darshan, which is an I/O profiling tool that profiles MPI-IO, POSIX, and HDF5. The Argonne folks have developed this and they've been maintaining it for a long time. They've done a great job. That module is loaded by default. You can unload it if you want, but you probably don't want to do that. When that module is loaded, it's live in your application.
A
You can find yours using your username and your job ID to get to the actual name of your Darshan file, and you can look in there and say: hey, what's going on, what took my time? You probably want to use some of the Darshan commands, these Perl scripts and shell scripts, to actually produce summaries of the entire job, or per file, and that should give you a much better idea of your I/O. Is it being efficient? Am I doing actual collective I/O? [unclear]
A
These green bars, right: we were taking 40% of the application's time to do I/O. So, running that through Darshan and looking at the summary output [unclear]: it's terrible, what's going on? They got in there and adjusted Athena's I/O behavior, avoided some of the pitfalls [unclear], and were then able to leverage collective I/O correctly. And now it basically goes to zero: everything that was taking I/O time before is completely unmeasurable in the application runs. So this saves them 15 hours per year.
A
So what happens when you access the burst buffer from your compute nodes? When you want to use the burst buffer, you submit your job and you tell Slurm: by the way, I need this much capacity; please stage these files in before the job starts; and then, when I'm done, my output files are named this, this, and this, so stage those out after my compute phase is done. That all happens kind of outside your control. I mean, it's in your Slurm job script, [unclear].
A
With one application, [unclear], which is doing analysis of astronomy images, bringing them into the burst buffer from Lustre gave a 30% improvement in some cases, in some cases [unclear] three times, and in others [unclear] basically four times. So you can see here the burst buffer [unclear].
A
[unclear] for your app's performance, you might even pre-stage your data into the burst buffer as part of your job [unclear]. That's a very important aspect of it. It probably gives you some of the order-of-magnitude benefits of doing collective I/O: if you can just get your data into the burst buffer, then even independent I/O is probably pretty reasonable. So.