From YouTube: 15 Deep Learning
Description
Part of the NERSC New User Training on June 16, 2020.
Please see https://www.nersc.gov/users/training/events/new-user-training-june-16-2020/ for the training day agenda and presentation slides.
Okay, good. So, hi, I'm Mustafa, one of the machine learning engineers at NERSC. I'll be talking to you in the next 20 minutes about the different deep learning capabilities that we have at NERSC, how to access the resources and how to use them efficiently, and I'll also give some links to further resources on how to do certain aspects of deep learning at NERSC.
So, to start with: this talk is focused on deep learning. We do, of course, also have the traditional machine learning software stack at NERSC — that is standard, and I'll talk a bit about it — but I'll focus mostly on deep learning. As you all know by now, deep learning is on the way to transforming science.
There's a lot of excitement about the potential of deep learning for solving many problems: some that we can't solve with traditional methods, and some that we can solve with traditional methods but where deep learning will help us do it faster. I won't go through the whole list of examples here.
I just want to highlight that within the DOE there are a lot of big investments, either already here or coming, for different applications of deep learning to the various sciences that the DOE is involved with, and there is a roughly 300-page report that came out of the town halls on AI for Science.
Machine learning is a particular way of doing AI using statistical methods, and deep learning is a subclass of machine learning that focuses on using neural networks to solve the same kinds of tasks. With neural networks we can actually solve even more tasks than we could with classical machine learning methods, but it's still a subset of machine learning that focuses on neural networks.
The approach we have for supporting deep learning works at multiple levels. First of all, we focus on making sure that we have a software stack that is optimized for performance on our machines. We work closely with the hardware vendors and also the software vendors — in this case, for PyTorch and TensorFlow, with Facebook and Google — to make sure the software gives the best performance that we can get out of our machines. So that's the software stack.
For accessing the resources, we provide tools that enable large-scale deep learning through the batch system. And finally, we also organize and conduct training and consulting programs for applications of deep learning to science; I'll talk a little bit about this later on. So, first, the software stack.
If you look at the plots to the right here — the first plot, this one; I don't know if you can see my cursor. There are different libraries that are popular for deep learning and machine learning in general. People use scikit-learn, and that's something that runs smoothly on our machines. For the purpose of this talk we'll focus on the things related to deep learning, which are Keras, TensorFlow 2.0, and PyTorch. These are the most popular libraries.
There are a few ways to use these at NERSC. One of them is just to use the NERSC modules, which I'll talk about; that's the most popular approach, at least for prototyping and testing things out in the beginning — people just use the NERSC modules. You can see that 89% of users do that. You can also set up your own conda environment
A
If
you
want
to
further
customize
beyond
what
you
can
do
with
the
modules
and
then
you
can
also,
some
people
prefer
to
build
from
source
or
to
use
shifter
and
shifter
usage
is
actually
on
the
rise
for
report,
a
software
stack.
So
that
is
lesson.
That's
that
I'll
be
talking
about
each
one
of
these
to
do
so.
All
of
this
is
for
single
node
or
a
single
GPU
performance
to
do
beyond
to
go
beyond
that.
as you know, if you have some experience with deep learning, it requires a lot of data and can take a long time to train, so you might need to distribute your training. That's another aspect that we focus on: how to do distributed training on our machines. I'll talk a bit about this later, and I'll also mention Shifter containers.
For the modules, it's pretty simple. We have TensorFlow modules and we also have PyTorch modules, and TensorFlow includes Keras. So if you want to use Keras, we provide Keras backed by the TensorFlow backend, which is TensorFlow 2.0, so you can just use the TensorFlow module. The way to check which versions are available is to run module avail tensorflow or module avail pytorch, which gives you a list of them; then you can load the one you want.
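(For reference, a minimal sanity check after loading one of those modules; the exact version strings you see will depend on which module you load.)

    # After loading a TensorFlow module, verify what you got.
    # Version numbers are whatever the module provides, not guarantees.
    import tensorflow as tf

    print(tf.__version__)        # the TensorFlow 2.x build from the module
    print(tf.keras.__version__)  # Keras ships inside TensorFlow 2.0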
If you want to add other packages on top of a module without having to create your own conda environment, you can just use pip install --user, which will install any package you want into your user area on top of the TensorFlow or PyTorch module. Sorry, I just want to check on time. Okay, so.
You can also create your own conda environment if you ever need to. And as I mentioned earlier, Shifter is becoming more common now for running deep learning stacks: we provide images of PyTorch and TensorFlow that are optimized for best performance on our machines. This is currently only for Cori GPU, and containers are what we plan to rely on in the future on Perlmutter.
If you want to submit your batch jobs with these images, we recommend that you use the Slurm batch options for Shifter rather than the interactive command shown here: you specify the image and the volumes to mount in the sbatch options, and after that you just run your script. If you get to use it and you have some feedback, please let us know — we are interested.
If things are working great, we'd love to hear about that. Especially if things are not working great for you, or you see a degradation in performance compared to the module or the bare-metal versions, please let us know. So that's in terms of how to access the software stack.
These are the general recommendations for how to do things. We recommend that you use the modules — that is the easiest way right now — and if you are on Cori GPU, you can also try the containers; that's what we recommend and encourage. If things don't work out for you, or you want to build your own conda environment or your own image, please check the docs to see how to get the best versions of TensorFlow or PyTorch for developing and testing your workflow.
If you're running on the batch system directly from a command line, then using the interactive QOS is the best way to do this: you get resources almost immediately and then you can do your prototyping. Or you can use Jupyter, which also just requests resources from the batch system.
One thing we recommend you do when you run your code, once you finish prototyping and you're about to run at full scale, is to check the utilization of the CPU or the GPU that you're using. If the utilization is significantly under 100%, that usually means there are some bottlenecks in your workflow, and the most common bottleneck is the data pipeline. Essentially, if you're running on a GPU, for example, the data pipeline cannot keep up with how fast the GPU is processing the data, so the GPU sits waiting for the data pipeline to fetch more data. There are multiple recommendations there.
The first one is to use the framework-specific API for data loading: for example, for PyTorch there is the PyTorch DataLoader, and for TensorFlow there's tf.data.Dataset, which we recommend you use. Those also give you enough knobs to tune the data ingestion pipeline, and hopefully that will give you the best performance.
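(For reference, a minimal sketch of what tuning those knobs can look like; the dataset, batch size, and worker counts below are illustrative assumptions, not NERSC-specific settings.)

    # Minimal sketch: framework data-loading APIs with tunable knobs.
    # Values here (batch size, num_workers, prefetch) are illustrative only.
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    dataset = TensorDataset(torch.randn(1024, 16), torch.randn(1024, 1))
    loader = DataLoader(
        dataset,
        batch_size=64,
        num_workers=4,    # parallel worker processes feeding the device
        pin_memory=True,  # faster host-to-device copies on GPU
    )

    # TensorFlow equivalent with tf.data:
    # import tensorflow as tf
    # ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(64)
    # ds = ds.prefetch(tf.data.experimental.AUTOTUNE)  # overlap I/O with compute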
If that doesn't work and you're on the CPU, you can use the burst buffer for staging your data. If you are on the GPU nodes, you can use the SSDs that are at /tmp on the Cori GPU node itself.
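(A minimal sketch of that staging idea, assuming your dataset is a single file on the parallel file system; the paths below are hypothetical examples.)

    # Minimal sketch: copy the dataset to node-local SSD before training.
    # Paths are hypothetical placeholders, not actual NERSC paths.
    import os
    import shutil

    src = "/global/cscratch1/sd/myuser/dataset.h5"     # hypothetical source
    dst = os.path.join("/tmp", os.path.basename(src))  # node-local SSD

    if not os.path.exists(dst):
        shutil.copy(src, dst)

    # Point your DataLoader / tf.data pipeline at `dst` instead of `src`.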
If you have any questions about that, please just ask, or send me an email, or file a ticket, and I'll send you more guidance on that. If everything works and you're sure the data pipeline is fine, but you still have utilization issues, then you might want to start looking into some profiling tools.
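(As one example of such a tool — a minimal sketch using PyTorch's built-in autograd profiler; the model and input are placeholders for your own workload.)

    # Minimal sketch: profile a forward pass with PyTorch's autograd profiler.
    import torch
    import torch.nn as nn
    from torch.autograd import profiler

    model = nn.Linear(128, 10)   # placeholder model
    x = torch.randn(64, 128)     # placeholder batch

    with profiler.profile(record_shapes=True) as prof:
        model(x)

    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))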
For distributed training: first, for TensorFlow we recommend that you use Uber's framework, Horovod. We test every version of this on our machines and make sure that it gives the best performance. Horovod, on a scale of a few nodes — whether it's a few tens of CPU nodes or a few GPU nodes, meaning tens of GPUs —
should give you almost ideal scaling if there are no I/O bottlenecks. That's what we recommend you use with TensorFlow, and you can also use it with Keras.
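(A minimal sketch of the Horovod-with-Keras pattern, assuming Horovod is installed alongside TensorFlow; the model, optimizer, and dataset are placeholders, and the processes would be launched with srun or mpirun.)

    # Minimal sketch: data-parallel training with Horovod + tf.keras.
    import tensorflow as tf
    import horovod.tensorflow.keras as hvd

    hvd.init()  # one process per rank, launched via srun/mpirun

    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])  # placeholder
    opt = tf.keras.optimizers.Adam(0.001 * hvd.size())  # scale LR with ranks
    opt = hvd.DistributedOptimizer(opt)                 # all-reduces gradients

    model.compile(loss="mse", optimizer=opt)

    callbacks = [hvd.callbacks.BroadcastGlobalVariablesCallback(0)]
    # model.fit(dataset, callbacks=callbacks,
    #           verbose=1 if hvd.rank() == 0 else 0)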
For PyTorch, we recommend that you use DistributedDataParallel, which is part of PyTorch itself. This is a class that helps you do distributed training.
There is one class called DataParallel, which is much easier to use — it's just one line — but it's not efficient, so down the line it would be a bit problematic to scale beyond a couple of GPUs. So we recommend that you use the DistributedDataParallel class. With DistributedDataParallel you actually need to specify the communications backend: on the CPU partition we recommend that you use the MPI backend, and on the GPU partition a GPU-aware backend such as NCCL.
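(A minimal sketch of the DistributedDataParallel setup; the backend choice, environment variables, and model are illustrative assumptions that depend on how the job is launched.)

    # Minimal sketch: PyTorch DistributedDataParallel setup.
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    if torch.cuda.is_available():
        # GPU nodes: NCCL backend; env:// init assumes MASTER_ADDR,
        # MASTER_PORT, RANK, and WORLD_SIZE are exported by the launcher.
        dist.init_process_group(backend="nccl", init_method="env://")
        model = nn.Linear(128, 10).cuda()  # placeholder model
        ddp_model = DDP(model, device_ids=[torch.cuda.current_device()])
    else:
        # CPU nodes: MPI backend picks up rank/size from the MPI launcher
        # (requires a PyTorch build with MPI support).
        dist.init_process_group(backend="mpi")
        ddp_model = DDP(nn.Linear(128, 10))

    # ddp_model trains like a normal model; gradients are all-reduced across ranks.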
Now a few words about workflow tools. First of all, Jupyter: as I mentioned earlier, it's available — you can access it through the NERSC JupyterHub; I'm sure you've seen that in the talks today — and it should work on both the Cori CPU and GPU nodes. For monitoring your training, the most popular training-monitoring framework is TensorBoard; both TensorFlow and PyTorch users use TensorBoard, it's very easy to use, and there's a lot of community around it.
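(A minimal sketch of logging metrics for TensorBoard from PyTorch; the log directory and the logged value are placeholders.)

    # Minimal sketch: write training metrics for TensorBoard from PyTorch.
    from torch.utils.tensorboard import SummaryWriter

    writer = SummaryWriter(log_dir="logs/example-run")  # hypothetical path

    for step in range(100):
        loss = 1.0 / (step + 1)  # stand-in for a real training loss
        writer.add_scalar("train/loss", loss, step)

    writer.close()
    # Then point TensorBoard at the "logs" directory to view the curves.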
There is also tooling for hyperparameter optimization and running many experiments. It has integration with the scheduler — it integrates seamlessly with Slurm — and that will save you a lot of time: you can just run it out of the box, and we have examples that you can access to see how to run it on Cori GPU. It's very popular in the community, a lot of people are using it, and it supports many different backends and different algorithms.
All you need to change is two lines: the number of nodes that you're using, and which PyTorch or TensorFlow version you're using. Then you just run your code and it should work out of the box. And it can use multiple nodes — not only a single GPU node with 8 GPUs, but multiple GPUs across multiple nodes. In this case I'm using two nodes that have 8 GPUs each, so it's running sixteen experiments at any time, as I mentioned earlier.