From YouTube: 09 - Data Storage & Sharing Best Practices
Description
Part of the NERSC New User Training on September 28, 2022.
Please see https://www.nersc.gov/users/training/events/new-user-training-sept2022/ for the training day agenda and presentation slides.
My name is Lisa Gerhardt, and I'm going to be talking about data storage and sharing practices. I'm just trying to find the right button. Hopefully that looks okay to everybody, okay, so I'm just going to soldier ahead. I'm in the Data and Analytics group, and my specialty is file systems and data storage.
Let me get the video going. Okay, sorry about that. All right, so first I'm going to talk about data storage best practices. I wanted to talk just a little bit about the data storage policy here at NERSC. We have a very, very long set of very detailed policies at this link, so if this is something that is interesting to you, I encourage everyone to take a look at it at least once, because it's important to know.
These resources are intended for users with active allocations. It's strongly recommended that if your allocation at NERSC ends, that is, if the project that you're a member of ends, you transfer all the data associated with that project somewhere else where you have access, because we can't guarantee that you'll still have access if you don't have an active allocation. In terms of how data is managed inside of NERSC, the project PIs and PI proxies can request the modification, deletion, or transfer to another NERSC file system of any data associated with their NERSC award.
So if you are running at NERSC under a particular project, your PI can request that any data you generate under that project be moved to another place, be changed to another user, those kinds of things, and that's because the PI is ultimately responsible for the research products that are made at NERSC, and that data is part of the research project. In terms of how files are protected inside of NERSC, they're protected with basic Unix file permissions based on user and group IDs that are set, and which you can view in Iris.
So it's your responsibility to ensure that the file permissions, and things like the default masks that you set, are set correctly to handle the privacy or exposure of the data that you have inside the system. You can always reach out to us.
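As an illustration, a minimal sketch of checking and tightening permissions from the command line; the path here is hypothetical, and the right settings depend on how private your data needs to be:

```bash
umask 077                                 # new files are readable and writable by you only
ls -l /path/to/my_project_data            # review the current owner, group, and permission bits
chmod -R o-rwx /path/to/my_project_data   # remove all access for "other" users
chmod -R g+rX /path/to/my_project_data    # optionally let your Unix group read files and traverse directories
```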
If you have questions about this, we're happy to help, and we also have a long page about file system permissions, because it is a little bit hard to understand. And then finally, the last main policy is that users have ultimate responsibility for managing their data. So if we tell you that a file system is ephemeral, you should back things up.
So let me give you a quick overview of the file systems we have at NERSC. Basically, like a number of other centers, right now we have a tiered set of file systems, where the highest layer, the most performant layer, is generally the smallest in capacity. That's because it costs a lot of money to get performance, and so you trade performance for capacity, because you can't buy more of both.
Here I've laid out a really simplified version of the file systems that we have available at NERSC on both Cori and Perlmutter. At the very tippy top I have memory, which you can sort of think of as a file system, because you can temporarily write things to memory, and then below that we have the scratch file system.
On Perlmutter we have a 32-petabyte flash scratch system. It's very fast; you can get aggregate speeds on the order of terabytes per second, but it's small, so it's temporary. We purge it, and I'll talk a little bit more about that later. And then the layer below that is the layer we call the community file system.
That's a Spectrum Scale file system. It used to be known as GPFS, which might be more familiar to people. It doesn't have as fast a streaming rate; it's intended for longer-term data holding and for sharing data within your project.
You can go to this documentation to find more details if you want. So, starting at the top of the file system pyramid, we have Perlmutter scratch. Like I said before, that's Lustre, and it's one of the most successful and mature HPC file systems.
This is where you would store data that's being actively read or written by jobs on the compute nodes. So if you're doing a lot of I/O and you need fast rates, you put it on Perlmutter scratch. Then, when you're done, you move it to a more permanent area, because we do purge. What purging means is that if files aren't accessed within a certain time, they're automatically removed by our system. It's just a process that runs; it's not something that we do by hand, it happens automatically.
The directories on Perlmutter scratch are user-level directories, so each user has their own directory, and by default it's only user-readable; there's no group readability for the directories on scratch. I'll talk a little more about how you can set that up if you want to, but it's fairly rare. Most of them are just for your own user to write to while you're running jobs. There's a quota on Perlmutter scratch.
There's a default quota of 20 terabytes. We do allow you to go over that for a very short time, by about 10 terabytes, and that's intended so that if you're in the middle of a job and you accidentally exceed your quota, everything doesn't fall apart. You know, if you have hundreds of nodes writing and you're just a couple of bytes over your quota or something, we have this buffer so that you can write everything out, and then, after your job is done, you can come and clean up. After you exceed the quota, you're not going to be able to write any more data to the file system. You can remove data, but you can't write any more, and you'll get an I/O error when you try to write that says you're out of space.
Lustre has a really nice design; one of the reasons why it's such a successful, mature HPC file system is that it has a whole bunch of servers called OSSes, and each one of them talks to dedicated storage targets called OSTs.
By default, data files go to one OST only, and that's because most of the file sizes that folks write are appropriate to stripe across just one OST; they're on the smaller side, and that's fine for small files. It's also great for the kind of file-per-process I/O that's pretty popular here at NERSC. So we try to set a sensible default that works for most everybody, but if you're doing something more sophisticated, or you have really large I/Os or different kinds of I/O patterns, you may want to think about increasing the number of OSTs the file is striped against. We have some helper scripts that will automatically do the striping for you; there's small, medium, and large.
A
They
should
just
be
available
in
your
path
when
you
log
into
promoter
Quarry-
and
we
have
a
table
here
over
here
to
kind
of
guide
you
a
little
bit
about
when
and
how
you
might
want
to
set
the
striping.
So
if
you're
doing
single
shared
file,
I
o
you
are
going
to
want
to
Leverage
The
striping,
because
you're
going
to
want
to
talk
to
multiple
osts
excuse
me
and
as
the
file
size
increases
you're
going
to
want
to
get
more
and
more
osts
involved.
So
you
can
really
push
the
bandwidth
for
this.
If you're doing file-per-process I/O, the files will automatically get spread across all the OSTs on their own, just by default, because Lustre round-robins them across the OSTs. So by default you'll get the kind of layout you want for optimal file-per-process streaming, and you don't really need to do anything until your files get really large, because while there are a lot of OSTs, each one of them has a limited capacity.
So if you have a really large file, you might want to spread it across more OSTs, so that you can keep from being tied up for a long, long time talking to a single OST. You can use these helper scripts, and the way that we usually recommend you do it is to set up a directory.
Say you're going to write out files: you create a directory, call it output, and you would run stripe_small on output. It puts the striping on the directory, and then all the files that land in there inherit that striping. If you want to look and just make sure that it's actually working, you can manually query with this special command, lfs (it stands for Lustre file system), getstripe, and then the path to the striped directory.
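For example, a minimal sketch of that workflow; the directory name output is just an illustration, and stripe_small is the NERSC helper script mentioned above:

```bash
mkdir output
stripe_small output      # NERSC helper script: applies a striping layout suited to smaller files
lfs getstripe output     # query the Lustre layout that new files in output/ will inherit
```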
We have a lot more guidance on our website about this, and you can open up a ticket if you have questions. Moving on to the next layer in our file system hierarchy: the community file system. This is intended for large data sets that you need for a longer period, say on the order of one to two years.
It's set up for sharing, with group read permissions. That means every project gets a directory on the community file system, and it usually has the name of your project. So if you're m1234, it would be the m1234 directory, and the path is /global/cfs/cdirs/m1234, or you can use the $CFS variable if you don't want to write all that out. This is intended for sharing and long-term storage, so it's not for intensive I/O.
If you're going to do a lot of I/O, you should use scratch instead. Data on the community file system is never purged; we back it up using snapshots, so there's a seven-day record of all the data on CFS that you can access yourself. If you delete something and you want to get it back from yesterday, you can go to this website and see how to get it back yourself out of a snapshot.
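As a rough sketch of what that self-service restore looks like; the .snapshots directory name follows the NERSC documentation, while the project name, date-stamped snapshot name, and file name here are hypothetical:

```bash
ls $CFS/m1234/.snapshots/                                    # see which daily snapshots exist
ls $CFS/m1234/.snapshots/2022-09-27/                         # browse yesterday's copy of the directory
cp $CFS/m1234/.snapshots/2022-09-27/results.dat $CFS/m1234/  # restore the deleted file
```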
So you don't need to open a ticket. And then the usage is managed by quotas. A project can actually ask for multiple directories on CFS, and you can give each directory a separate quota. This is pretty useful for large groups that have, say, a simulation group and an experimental group, and they want to split these things up, and maybe one of them needs a really large quota and the other one doesn't. You can take the whole quota you're given and split it between these subdirectories.
And finally, we have HPSS. This is our tape archive system at NERSC. It's for really important data that you want to keep for the long term. So this is things like data from your finished paper that's been published, or raw data you might need in case of emergency, or really hard-to-generate data: maybe it took you a really long time to produce, or it needed some special setup. Any kind of precious data that you want to keep, and that you think you might use again, but not right now, you would put into HPSS.
The thing to remember that's different about HPSS is that it's a tape archive; it's actually a combination of tape archive and spinning disk. All the other file systems are spinning disk: they're responsive, you can list a thing and see it right away, you can get the bytes out of the middle of a file right away. But if you try to do that with tape, there's actually a huge library of tapes in a big rack, and a robot has to go and pick out the tape that you want, bring it over to the reader, and then read the tape. It's fast, but it's definitely not spinning-disk-file-system fast.
So if you're trying to do something in HPSS where you need to access data spread across a bunch of files, you're going to have a bad time. What you usually need to do is get your files out of HPSS the way that you think you're going to want to use them. So you put them into HPSS bundled the way you think you're going to want to read them back, and then later you pull them back out and do the kind of analysis that you're looking for there.
There are quotas for HPSS, just like for everything else, but those are controlled in Iris. You can go to the Iris page, go to the storage tab for your account, and you'll see your HPSS info.
Most NERSC users are a member of a single project, but if you're a member of multiple projects, you can go into Iris and adjust the percentages of your usage that you want to charge to each of the projects you're in, and you can find more details about HPSS here. We also have global common, and that's sort of our file system for software. Why do we have a specialized file system for software? Because we want to optimize library load performance.
So here's a plot, and it's kind of a messy plot, there's a lot of stuff on here, but basically it's a time range on the bottom and then seconds to load on the y-axis. The black dots are global common, and then for the other two, the red is CFS and the green is scratch. You can see it takes considerably longer, in most cases, to load a pretty large benchmark with a lot of libraries from those other two file systems than from global common.
Global common has a block size that's optimized for small files, which software usually is, and it's backed by flash storage. So it's very quick, but it's very small. It's really just intended for "this is where I'm going to put my software stack": I have a conda install, I'm going to put it there because I'm going to be reading it on 10,000 nodes and I want it to load fast. And it's set up similar to CFS in that there are group-writable directories.
So it's a good place to put the software stack that your whole project is going to use. On Cori it's read-only on compute nodes to further optimize it; that's not true on Perlmutter, where it's read-write, but usually what you want to do is install from a login node and then just read from the compute nodes.
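As a sketch of that pattern, installing a project-wide environment under global common from a login node; /global/common/software is the usual NERSC location for this, but the project name, environment name, and package list here are hypothetical:

```bash
module load python                 # NERSC-provided Python/conda module
conda create --prefix /global/common/software/m1234/myenv python=3.10 numpy scipy
# later, inside a batch job on the compute nodes, just read and activate it
# (assuming conda is initialized in your shell):
conda activate /global/common/software/m1234/myenv
```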
Then we have the home directories. These are what you land in when you first log into the system. They're a good place for bookkeeping: put your setup scripts there, those kinds of things. You may sit in there and compile, and then install somewhere else. Everybody gets 20 gigabytes of quota.
We very rarely give out quota increases, because we have all these other file systems that are intended to take the large-capacity stuff. Home is not intended for intensive I/O: don't read or write large files from your home directory during compute jobs. It will not go fast.
The home directories are backed up every month into HPSS. We also have snapshots, so if you delete something and want to get it back right away, you can just go into the special snapshot directory and get it back.
The community file system is quota-managed, and because those are shared directories, and sometimes projects can have hundreds of people, when you reach a quota it's really hard to say who needs to clean up, or where all your space is going. Iris can help with that.
It shows you, with the little bars, which directories are getting close to quota right away, and then, I don't have it here, but you can toggle the usage detail and see which users in the project are using the most data. So when you get close to the quota, you can say, hey folks, we've got to clean up, and you can show a little plot of who's using what.
Individual users can also come here and see where their biggest files and directories are, so they know where to start if they need to start cleaning up or archiving. And here's an example of what a usage report looks like: it's for a particular directory, and you can see there are a bunch of users.
If you're up against a quota, then maybe you would want to talk to this particular user and see if they could clean up a little; you get the most bang for your buck there. And then you can adjust the quotas for the community file system in Iris. This is for a project, the das project, which I'm in, on the storage tab, and it shows all the CFS directories that we have, and here are their names.
So this would be /global/cfs/cdirs/dasrepo; that's what I would see on the file system. It shows who the owner is, and it shows how much of the storage has been given out and how much is being used. And then up here you can see the total storage for the whole project, so this 200 terabytes is split amongst these directories.
What is this, seven... six, seven, eight directories, and spread by how we use it. You can come in here and adjust these percentages if you're the PI. You can also ask for a new directory just by clicking this new button, and then it'll pop up a little box, you fill it out and give the name and who you want to own it, and then it'll propagate to the file system within a few hours and be ready for you to use.
We have a somewhat new tool called the PI toolbox. This is for PIs to come in and adjust permissions in their CFS directories.
You can even change the ownership in here, so it's a fairly handy tool for managing permission drift in the community file system. So now I'm going to move on to best practices for data sharing, and first I'm going to start with the idea of sharing inside of NERSC. I talked a lot about the community file system; this is the main way that a project has to share data among themselves.
You can also have a similar kind of construct in HPSS: you can have a shared project directory in HPSS that can be shared by the whole group.
Another interesting way of sharing data at NERSC is these things called collaboration accounts. These are accounts that are tied to a project, basically, instead of to an individual user, and the PI can control who in the project can access them. You can run a special command and log in as the collaboration account, and then you're just like a regular user, except you're acting as this collaboration account. Groups use it for doing things like managing shared data sets or running shared workloads.
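That "special command" works roughly like this; collabsu is the NERSC command for switching to a collaboration account, and c_myproj is a hypothetical account name (you authenticate with your own credentials):

```bash
collabsu c_myproj      # switch from your personal login to the collaboration account
whoami                 # should now report c_myproj; files you create are owned by it
```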
It turns out to be a really useful tool, because then you don't have to keep chowning data sets when people leave, for things that need to keep going at NERSC. You can also share data out of scratch. It's not as common, because it's a little bit harder to set up, and we really recommend you only share read access out of your scratch directory.
If you want shared writes in a scratch directory, we suggest you set up a collaboration account instead; it's just too confusing managing the quotas and things like that otherwise. But if you did want to make your scratch directory readable by your project, you could change the group so that it belongs to your project and then propagate group-readable permissions. There's an example below of how you do that.
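The slide's example isn't captured in the transcript, so here is an illustrative reconstruction; m1234 and the directory name are hypothetical, and $SCRATCH points at your scratch directory:

```bash
chgrp -R m1234 $SCRATCH/shared_results     # hand the files to your project's Unix group
chmod -R g+rX $SCRATCH/shared_results      # group members can read files and traverse directories
chmod g+x $SCRATCH                         # let group members reach the directory at all
```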
Finally, if you want to share with external collaborators, we have a whole bunch of different ways to do that too. There's public HTML access: by default, if you create a directory called www inside your CFS directory, its contents are available via our portal.nersc.gov. Anything you put in there and make world-readable will be viewable at this URL, so it's a really handy way to quickly share files over the web.
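For instance, a quick sketch; m1234 and the file name are hypothetical, and the URL shown follows the usual portal.nersc.gov project-directory convention, so check the docs for your project's exact path:

```bash
mkdir $CFS/m1234/www
chmod o+rx $CFS/m1234/www                  # directory must be world-traversable
cp plots.html $CFS/m1234/www/
chmod o+r $CFS/m1234/www/plots.html        # file must be world-readable
# then it shows up at something like https://portal.nersc.gov/cfs/m1234/plots.html
```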
We also have Globus sharing. I'll talk a little more about Globus, but basically this is a read-only endpoint where folks can share data via the Globus protocol, and this is what we recommend if you have large data sets you need to share with the general public, like terabyte-sized or larger; you'd really want to look at using Globus sharing.
We also have a network of data transfer nodes. We have four of them that are open to SSH; you can just go to dtn01.nersc.gov to get to them. These are set up with high-bandwidth network interfaces, they're tuned for efficient data transfers, and we monitor bandwidth between NERSC and the other big facilities over ESnet, like Oak Ridge and Argonne, the other national labs, to make sure that things can move around quickly. These data transfer nodes have direct access to the community file system, the HPSS archive, and Cori scratch.
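As a small example of using a DTN directly; elvis is a hypothetical username and the destination path is an illustration:

```bash
ssh elvis@dtn01.nersc.gov                  # log into a data transfer node
# or push a file from your own machine through a DTN:
scp big_dataset.tar elvis@dtn01.nersc.gov:/global/cfs/cdirs/m1234/
```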
So if you want to move large volumes of data in and out of NERSC, or between NERSC systems, we recommend you use the NERSC DTNs. If you want to move data in and out of Perlmutter, you should use the Perlmutter login nodes or Globus, which I'll talk about next. Okay, so Globus is the recommended tool for moving data in and out of NERSC.
It will retry if things fail, it does a checksum on either side to make sure that data integrity is kept, and then it'll send you an email when everything is done, or if there's a problem. So this is really great if you need to move large volumes of data; it's kind of drag-and-drop and come back. You can check on your file transfers and see how they're going, and it just takes care of all the pain of moving the data for you.
There's a web-based GUI, which is how most folks interact with it, but there's also the Globus CLI, command-line scripts that you can use, and we have a module that you can load so you can interact with it. We have a couple of command-line scripts that you can use to move data. Most institutions pay to set up a Globus endpoint, and you can find them and move things between them.
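As a rough sketch of driving a transfer from the Globus CLI; the endpoint UUIDs and paths below are placeholders, not real NERSC collection IDs:

```bash
globus login                             # one-time browser-based authentication
globus endpoint search "NERSC"           # look up the UUID of the NERSC DTN collection
globus transfer --recursive \
    SRC_ENDPOINT_UUID:/global/cfs/cdirs/m1234/dataset \
    DST_ENDPOINT_UUID:/path/at/destination \
    --label "m1234 dataset copy"
```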
But if you're just a person who wants to transfer a bunch of stuff from the NERSC endpoint, you don't have to deal with keeping your scp going in the background, restarting it, and figuring out all that stuff; you can still use Globus. They have this thing called Globus Connect Personal. It's a little piece of software that runs on Linux, Mac, or Windows. You install it, it's usually plug and play, and then it sets up your own little personal endpoint on your laptop, so you can go from, say, the NERSC DTN to your personal laptop and transfer data that way. It won't be as fast as going between the large computing centers, because you don't have the super fancy network, but it will keep trying and it'll keep going, and the data will get there eventually, even if it's a little slow. So it is possible to use Globus even without leveraging an endpoint at a large institution.
So, some general tips for transferring data. We've already said we really think you should use Globus for large transfers. They don't have to be external; they can be internal or external. If you have hundreds of terabytes you need to move from CFS to scratch, you can use Globus for that; it doesn't need to be external.
When you're moving data, some things to think about in terms of performance: it's often limited by the remote endpoint. We rely on our companions at ESnet; they give us, out of NERSC, a really high-speed, very performant network, and that goes fine as long as you're on that backbone, but usually the problem is the last mile.
So that's usually the issue with most large-scale data transfers: problems in the wide-area network. But sometimes at NERSC there can be file system contention, and if you are seeing that, or if you look at the MOTD and see things are degraded, you may want to try a different file system, or try a different time, if you're seeing really slow rates.
Sometimes you can have problems because you're using the wrong directory. Don't expect to transfer a whole bunch of terabytes into your home directory and have it go well: number one, you'll hit your quota; number two, it'll be really slow. So don't use it for transfers.
If you're taking all these considerations into account and you're not getting the performance you expect, then definitely please open a ticket and we'll help you debug what's going on.
I have just a few minutes left, but I want to talk about transferring with NERSC HPSS. I mentioned before that HPSS is special because it's a tape archive. It's great because it gives us a whole bunch of capacity (it's about 200 petabytes that we have in there right now, which is a lot of data), but that makes it a little bit hard to get large amounts of data in and out of it without doing some special things.
So we have some mechanisms set up at NERSC to help you get data out of HPSS efficiently. For one thing, we have a transfer queue that you can use to transfer data in and out of HPSS. You can run up to 15 jobs at once pulling data from HPSS, and you can use our transfer queue to spread those out across all the login nodes and spread the load.
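As a sketch, a minimal transfer-queue batch script; the qos name xfer is the usual name for the NERSC transfer queue, the HPSS file name is hypothetical, and hsi is the HPSS command-line client:

```bash
#!/bin/bash
#SBATCH --qos=xfer
#SBATCH --time=12:00:00
#SBATCH --job-name=hpss_pull

# pull an archive back out of HPSS into the submit directory
hsi get run42_results.tar
```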
htar is a built-in bundling tool that will take your small files and bundle them up into archives that are optimal for tape. And then, if you want to use Globus for this, we have command-line tools for external Globus transfers. We have a whole lot of detail here at this link, so I encourage you to check it out if you're moving large volumes of data in and out of HPSS.
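A quick sketch of htar usage; the archive and directory names are hypothetical:

```bash
htar -cvf run42_results.tar results/run42              # bundle a directory straight into HPSS
htar -tvf run42_results.tar                            # list what's inside the bundle
htar -xvf run42_results.tar results/run42/summary.dat  # pull a single member back out later
```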
So, just to conclude: NERSC has multiple file systems to fulfill different performance and capacity needs, and we have a whole bunch of different ways to share and transfer data. If you want to read more details, please check out our web documentation. Thank you very much.