From YouTube: 8. Data Transfer Best Practices
Description
Learn about the best practices for transferring data to and from NERSC.
Slides for all sessions can be downloaded from here: https://www.nersc.gov/users/training/events/new-user-training-june-21-2019/
A
Okay, so I'm here to tell you about data transfer and access here at NERSC: how you get data in and out of our systems. If you have a bunch of data sitting on remote machines, you want to bring it in to NERSC. If you have data over here that you want to share with your collaborators, you move it to other machines. You know, science is a big shared enterprise, and so we all want to collectively be able to make sure that data can live on the systems where it's most needed.
A
So I'm going to tell you about all the different ways you can get data in and out of here. We'll start with data transfer nodes. These are some dedicated submission nodes where you can move data in and out of NERSC, and primarily these are configured to use things like Globus and GridFTP, and I'll talk about that in a second.
A
But what's really going on here is that these are servers that are tuned for high-bandwidth network transfers and for doing these very efficiently. So they have things like buffer sizes tuned, and their network interfaces are hooked up directly to ESnet, so you're getting the full bandwidth there. We also monitor the performance of these transfers between us and other sites to make sure that things are actually performing. The other useful thing here is that these nodes have direct access to the big file systems at NERSC, including the Cori scratch file system.
A
So you can directly move things onto the file system, or move them off of the file system, that's running on the Cori system. Unlike some other systems, which may or may not have access to all the file systems, this one really does have access to everything. So these are nodes that are named dtnXX. The XX is, there's 1 through 12, I believe, so you can log into any of them. Is it 1 through 12 or 1 through 8? What do they...
A
OK, 1 through 12. I know there's 1 through 8, but yeah, you log into one of those nodes. Again, if you go through Globus, that hides that for you to some extent, but you can move data at a fairly high rate. All right. The other thing I should note: this is also a useful way to move data between systems at NERSC. So if you want to move things to the HPSS system, again, it's got a pretty heavy-duty pipe between the file systems and HPSS.
A
All right, so let me talk a little bit about Globus. For most of this I have a demo, so it'll be easier to just show you what we're doing here. But this is probably the easiest way to move data in and out of NERSC. It's reliable. It's easy to use. It's basically a web interface where you drag and drop files around, and the nice thing is that it actually is a managed transfer.
A
So it's essentially automated, which means that if you move a folder, it'll move everything in the folder for you and keep track of what you've moved and what you've not. If something fails, it'll retry it. At the end of the transfer, it will tell you: these files were moved successfully, these files were not. So it's a much better way to kind of fire and forget: you set off the transfer and then you go off and do your own thing. You don't have to keep something running on your terminal or anything like that.
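To make the "managed transfer" idea concrete, here is a minimal sketch of the bookkeeping described above: go through every file, retry failures, and report at the end which files made it and which did not. This is not how Globus is implemented. The `copy_one` callback and the `max_attempts` limit are hypothetical names introduced only for illustration, and this sketch retries a failed file immediately rather than on whatever schedule the real service uses.

```python
from typing import Callable, Dict, Iterable, List


def managed_transfer(
    files: Iterable[str],
    copy_one: Callable[[str], None],
    max_attempts: int = 3,
) -> Dict[str, List[str]]:
    """Copy each file, retrying failures, and report the outcome per file."""
    report: Dict[str, List[str]] = {"succeeded": [], "failed": []}
    for name in files:
        for _attempt in range(max_attempts):
            try:
                copy_one(name)  # hypothetical callback that moves one file
            except OSError:
                continue  # treat as transient and try this file again
            else:
                report["succeeded"].append(name)
                break
        else:
            # Every attempt raised, so the file is reported as failed.
            report["failed"].append(name)
    return report
```

The returned dictionary plays the role of the end-of-transfer summary: one list of files that moved successfully and one list of files that did not.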
A
It's just a managed transfer interface. There's extensive documentation. They also have a REST API, so you can script your own clients around this. So if you didn't want to use the web interface, and you had a web service where you wanted to have data movement, some sort of data brokering going on in the background, you can write a REST tool that'll basically talk to the API and move stuff for you. All right, so we've got a bunch of different endpoints.
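As a rough illustration of scripting against the REST API mentioned above, the sketch below only builds the JSON body that a transfer request would carry. The field names reflect my understanding of the Globus Transfer API task document and should be checked against the Globus documentation. The endpoint IDs, submission ID, and paths are placeholders, and authentication and the HTTP call itself are deliberately left out.

```python
from typing import Dict, List, Tuple


def build_transfer_document(
    submission_id: str,
    source_endpoint: str,
    destination_endpoint: str,
    items: List[Tuple[str, str, bool]],
) -> Dict[str, object]:
    """Build a transfer request body.

    Each item is a (source_path, destination_path, recursive) tuple.
    Field names are assumed from the Globus Transfer API documentation.
    """
    return {
        "DATA_TYPE": "transfer",
        "submission_id": submission_id,
        "source_endpoint": source_endpoint,
        "destination_endpoint": destination_endpoint,
        "DATA": [
            {
                "DATA_TYPE": "transfer_item",
                "source_path": src,
                "destination_path": dst,
                "recursive": recursive,
            }
            for src, dst, recursive in items
        ],
    }
```

A background data-brokering service would serialize this with `json.dumps` and send it to the API with its own credentials. Moving a whole folder corresponds to a single item with `recursive` set to `True`.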
A
Most of this is actually no longer valid, because we don't have Edison or PDSF anymore, so you should just use the NERSC DTN nodes and the NERSC Cori nodes. I'm not even sure if we're telling people to use DTN JGI, but I think we still have that. So if you're a JGI user, you could use the DTN JGI node, but I think at this point you just want to use NERSC DTN, maybe NERSC Cori. Beyond that, I think a lot of these are obsolete at this point. All right.
A
So you log in with your username and password, and it takes you to this interface. Here you'll start by just typing in the name of your endpoint. In this case, we'll just start by typing in NERSC and see what shows up. There's a bunch of endpoints. Really, I think you mostly want to do stuff with NERSC DTN.
A
That's the data transfer node, and my home file system shows up here. Let's say I wanted to move data from here to NERSC HPSS. I'll bring that up, and then the HPSS file system shows up here, and then I can say: oh, I want one of these folders. Let's pick something: there's an animals folder I have here, which just has a bunch of animals in there. So I'm just gonna pick that folder and I'm gonna say go ahead and...
A
Nope, hang on. Let me reload the app, because there's actually just a little button that says transfer. So yeah, that's what I was looking for, this little start button. I guess the JavaScript didn't completely load. Anyway, you just pick a file and then you push start and it moves the data, or you pick a folder and you push start and it moves the data. And you can look at your transfer itself: you have an event log that captures all the activity.
A
The other thing that's useful here is that you can also move data between NERSC and your own resource. I actually have this set up to work on my laptop. There is a tool called Globus Connect Personal. You just download that tool: you click on Globus Connect down on their website, get Globus Connect Personal, and you just install it on your laptop, and it shows up as a little tool here. So this is basically hooked up to your laptop, and now I can move files between my laptop and Globus.
A
Sorry. So I think really what you should do is use Globus, if possible, because it'll just round-robin between the DTNs. Globus just says NERSC DTN, and if you use that, it'll take care of all the load balancing. I think there's a mode where it'll even try and, like, stream the transfers, if needed. So basically, if you've got four different files you're moving, it'll try and hit all the different nodes to be able to maximize the bandwidth.
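The round-robin idea can be sketched in a few lines: hand files to nodes in rotation so that each node carries a similar share of the work. This only illustrates the scheduling pattern, not what Globus does internally, and the node names used in the usage note are hypothetical.

```python
from itertools import cycle
from typing import Dict, List


def assign_round_robin(files: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Hand files to nodes in rotation so each node gets a similar share."""
    if not nodes:
        raise ValueError("at least one node is required")
    plan: Dict[str, List[str]] = {node: [] for node in nodes}
    # cycle() repeats the node list forever; zip() stops when files run out.
    for name, node in zip(files, cycle(nodes)):
        plan[node].append(name)
    return plan
```

With four files and four nodes (say, hypothetical names `dtn01` through `dtn04`), each node gets exactly one file, which matches the "hit all the different nodes" case described above.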
A
So again, that's probably another reason why you want to try using Globus instead of just doing it yourself. If you do need to just log into a DTN node, that's fine. And just so you know, Globus uses GridFTP under the covers as well. All right. So we already talked about the DTNs: they've got all the file systems mounted to transfer data.
A
Typically, you want to use SCP for files that are under 100 megabytes, because if a file gets too big, it's just gonna sit and spin for a long time, which means you can't leave your laptop. You have to leave your window open. Also, SCP is gonna be slower than Globus. So again, use it for small files, it's fine, but if you need managed transfers, use Globus.
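The rule of thumb above (SCP for small files, Globus for anything larger or anything that needs a managed transfer) can be written down as a tiny helper. This is only a sketch: it reads "100 megabytes" as 100 × 10^6 bytes, which is an assumption, and the function names are made up for illustration.

```python
import os

# "Under 100 megabytes" from the talk, read here as decimal megabytes.
SCP_SIZE_LIMIT_BYTES = 100 * 1000 * 1000


def suggest_tool(size_bytes: int, limit: int = SCP_SIZE_LIMIT_BYTES) -> str:
    """Return "scp" for files under the limit, otherwise "globus"."""
    if size_bytes < 0:
        raise ValueError("size must be non-negative")
    return "scp" if size_bytes < limit else "globus"


def suggest_tool_for_path(path: str) -> str:
    """Look up the size of an existing local file and suggest a tool."""
    return suggest_tool(os.path.getsize(path))
```

For example, `suggest_tool(5_000_000)` suggests `scp`, while a 2 GB file falls on the Globus side of the threshold.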
One thing we do tell people: you do have login access to the data transfer nodes, but don't use them for doing other things.
A
So don't compile your codes on the DTNs, and don't use them to run other services. The whole point of them is data transfer, and it's a shared resource, so if you start misusing it, you're impacting other people's performance. And you can still use a regular copy to move things between file systems.
A
We have an anonymous FTP site, which I'm not going to tell you about, but you can go and look at it. If you need external partners to transfer data, we have NERSC science gateways, which is again another way of getting your data out to people. These are web portals, which you can build to create sophisticated interfaces around your data. We have a service called Spin, which is a Docker-based system that you can build these portals on.
A
So you can ship your application in a container, it's connected to data on project, and now you can have this nice, sophisticated web app that serves up all of your information. You can go to the docs.nersc.gov page and look at how to build science gateways and how to get started on Spin. But at a bare minimum, you can just put data in your www folder on project and it becomes visible to the world. All right.
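The "bare minimum" option above amounts to copying a file into a web-served folder and making sure it is readable by everyone. The sketch below shows that pattern only. The exact location of the www folder and the permissions the web server actually requires are not stated in the talk, so the directory is passed in as a parameter and the `0644` mode is an assumption; check docs.nersc.gov for the real requirements.

```python
import os
import shutil
import stat


def publish_file(source: str, www_dir: str) -> str:
    """Copy a file into a web-served folder and make it world-readable."""
    os.makedirs(www_dir, exist_ok=True)
    target = shutil.copy(source, www_dir)
    # Read/write for the owner, read-only for group and others (mode 0644).
    os.chmod(target, stat.S_IRUSR | stat.S_IWUSR | stat.S_IRGRP | stat.S_IROTH)
    return target
```

The directories leading to the file also have to be traversable by the web server, which this sketch does not handle.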