From YouTube: 14 - Deep Learning
Description
Part of the NERSC New User Training on September 28, 2022.
Please see https://www.nersc.gov/users/training/events/new-user-training-sept2022/ for the training day agenda and presentation slides.
All right, yeah, so congrats everyone on making it to the end; this is the last talk. I will try not to go over too long, but today I just wanted to give you a little overview of some of the deep learning for science things happening at NERSC, and then obviously tell you about the deep learning stack, focusing on Perlmutter, because that's our latest exciting machine and, of course, it has lots of GPUs on it. So it's very exciting for machine learning and deep learning workflows, and I'll discuss how to use some of them.
So obviously deep learning is a very exciting and growing field. It can enhance various scientific workflows in interesting ways: it can help you analyze very large, complex data sets, and it can potentially help you accelerate some computationally expensive simulations.
We see a lot of enthusiasm among scientific communities in adopting deep learning for various applications; there's a lot of growth in machine learning and science conferences and workshops, and there's been some significant recognition lately for achievements in AI. So, you know, the 2018 Turing Award, and some Gordon Bell prizes recently, were awarded for achievements in machine learning and deep learning. Obviously HPC centers like us are awarding allocations to do this type of work, and we're optimizing our systems to be good at doing things like machine learning. And in a sort of broader scope, the DOE is investing heavily in AI for science as well: there are a number of different funding calls out there, and there's this popular AI for Science Town Hall series, which produced a very long report.
It's a very exciting field to be working in, and it's obviously sort of unique as well, because there's a lot of interest from the industry side. A lot of research is being driven by industry stakeholders, and that has led to a huge proliferation of different machine learning techniques out there in the scientific machine learning area.
So historically we've seen this pretty significant trend of deep learning just getting bigger and bigger every year, right: the models are solving more and more complex tasks, and they're requiring more parameters. For example, if you look at large language models today, they have hundreds of billions, even trillions, of parameters.
We also see this trend reflected in our user base. We do a survey of our machine learning users every two years, and we see that people are interested in training larger and larger models and tackling more and more complex scientific machine learning tasks as systems like Perlmutter become more available and accessible.
For doing deep learning on an HPC system, you really want to be able to take advantage of the fact that you're running on a supercomputer, right: you want to be able to run parallel training, and there are a couple of different ways of doing that. The most common one is data parallelism. This is where, if you have a batch of data that you're training on, you split it up into smaller batches and send those out to each of the different processors in your job.
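The idea can be sketched in plain Python. This is a toy illustration, not any particular framework's API: each "worker" computes a gradient on its shard of the batch, and the gradients are averaged before the update, which is exactly the role an allreduce plays in real data-parallel training.

```python
# Toy data-parallel step: fit y = w*x by gradient descent on a batch
# that is split across "workers", whose gradients are then averaged
# (the job MPI/NCCL allreduce does in real frameworks).

def gradient(w, shard):
    # d/dw of the mean squared error 0.5*(w*x - y)^2 over one shard
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, batch, n_workers, lr):
    shard_size = len(batch) // n_workers
    shards = [batch[i * shard_size:(i + 1) * shard_size]
              for i in range(n_workers)]
    grads = [gradient(w, s) for s in shards]   # each "worker" computes locally
    avg_grad = sum(grads) / n_workers          # "allreduce": average gradients
    return w - lr * avg_grad

batch = [(x, 3.0 * x) for x in range(1, 9)]    # the true weight is 3.0
w = 0.0
for _ in range(50):
    w = data_parallel_step(w, batch, n_workers=4, lr=0.01)
print(round(w, 2))  # converges to 3.0
```

The averaged gradient is identical to the gradient over the whole batch, which is why data parallelism leaves the mathematics of the training step unchanged.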
Model parallelism, on the other hand, is a little bit more complicated to set up in practice, so people generally opt for data parallelism. But, depending on your problem, you might want to do some sort of hybrid parallelism technique where you do both data and model parallelism. Probably the most common form of model parallelism out there is called layer pipelining, where you parallelize just across the different layers of your model, so you have the first layer on your first GPU, and so on.
So, like I said, data parallelism is by far the most common strategy for scaling deep learning out, especially if you're scaling across nodes or doing multi-node trainings.
We see the majority of our users opting to use it. The great thing about data parallelism is that the leading frameworks, like TensorFlow and PyTorch, both support data parallelism and pipeline parallelism natively, so you don't really have to do much extra work to get those functional and performant.
If you do want some extra performance, especially in the case of TensorFlow, you can also use Horovod. That's probably the most popular distributed training framework that isn't actually built into TensorFlow or PyTorch, and there are a couple of other ones on this plot as well. Basically, all of these support either MPI or NCCL back ends. MPI is what you would use if you're running on a CPU cluster; obviously, nowadays most people are running on GPU systems, so they're using NVIDIA's NCCL library for communication between GPUs.
One form of data-parallel training, or scaling up, that we see is weak scaling, where you try to converge your training faster by taking fewer training steps, but each of those steps is a bigger step.
The way this works is, if you look at what's happening in the way you train these models, you're using the stochastic gradient descent algorithm: you sample your data, you get an estimate of the gradient with respect to your loss function, and you try to take a step that decreases your loss function, or decreases your error, right. So if you add more GPUs to your job, you can get a larger global batch size, and what that gives you is, hopefully, a less noisy, better estimate of the actual gradient that you care about. So hopefully it's safe to take a larger step; that is, you can use a larger learning rate in the gradient descent algorithm.
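As a concrete sketch of the bookkeeping involved: the "linear scaling rule" with warmup is a common heuristic from the large-batch training literature, not a NERSC-specific recipe, and the numbers below are made up for illustration.

```python
# Linear learning-rate scaling with warmup: a common heuristic for
# large-batch data-parallel training (always tune for your own problem).

def scaled_lr(base_lr, base_batch, global_batch):
    # grow the learning rate in proportion to the global batch size
    return base_lr * global_batch / base_batch

def lr_at_step(step, target_lr, warmup_steps):
    # ramp linearly from ~0 up to target_lr over the warmup period,
    # since jumping straight to a large LR often destabilizes training
    if step < warmup_steps:
        return target_lr * (step + 1) / warmup_steps
    return target_lr

# e.g. 16 GPUs with a per-GPU batch of 256 -> global batch 4096
target = scaled_lr(base_lr=0.1, base_batch=256, global_batch=4096)
print(target)                       # 1.6
print(lr_at_step(0, target, 100))   # 0.016
```

A decay schedule (cosine, step, etc.) would typically follow the warmup; that part is omitted here.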
In this cartoon example, there's a diagram showing that on single-GPU training you might have to take three steps, three different gradient updates, whereas if you have more GPUs with a bigger batch size, you have a better estimate, so you can just take one big step. That sounds great; in practice, obviously, there are some caveats. This often requires a lot of tuning to get exactly right if you want to converge stably at large scale, and so there are a lot of different considerations and little tricks you can apply, where you change the learning rate throughout the training: maybe warm it up and then scale it up, or slowly decay it. You can use different optimizers; there are these special adaptive optimizers, for example. There are a lot of details there, so if you're curious, I encourage you to go check out our Deep Learning at Scale tutorial.
Now, on to what the deep learning stack looks like on Perlmutter. Our general strategy here is to give you functional and high-performance installations out of the box. We focus on the most popular frameworks, obviously, but we also want you to have enough flexibility to customize things for your particular use cases: maybe install whatever Python packages you need for your domain-specific data analysis steps, or maybe you have a special data pipeline that you need to set up to read your data files. So flexibility is also key.

The top three frameworks we support right now are TensorFlow, Keras, and PyTorch, and Keras is now basically folded into TensorFlow, so you can access all the Keras API calls through tensorflow.keras. To do distributed training with either of these, you can of course use whatever is already built into each of them, or you can use the Horovod library that I mentioned. For example, our TensorFlow installation uses Horovod to help do distributed training; that's there by default. And then the external tools that are really useful for deep learning, which we've heard some great info on already, are Jupyter and Shifter; I'll mention a few more details on those later.
Yeah, like I said, out of the box, I think the easiest way for you to get up and running with deep learning on Perlmutter is to just use the modules that we've already installed. So we have TensorFlow and PyTorch modules. Of course, TensorFlow and PyTorch are just Python libraries, right; the top-level language that everyone loves in machine learning is Python. So these modules are basically conda environments that we've built with optimized installations of the software stack for TensorFlow and PyTorch.
We have a couple of different versions available, so if you need a particular version, you can explicitly load it, or you can just pick up whichever one is the default. We've already heard a lot about how to customize your Python environments; one easy way to do it with these modules is to use the `pip install --user` method, and that works because we've automatically set the Python user-base folder for you. So anything you install on top of the TensorFlow or PyTorch modules won't pollute any of your other Python environments, which is convenient.

Another great option is to do a direct conda clone of any of these. If you do `module display tensorflow` or `module display pytorch`, it'll show you the path of the actual conda environment corresponding to that module. You can clone that into your own personal version and then do whatever you want with it afterwards.
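Put together, those two customization routes look roughly like this. This is a site-specific command fragment, so the package name is just an example and the clone source path is a placeholder for whatever `module display` actually prints; check the NERSC docs for current module names.

```shell
# Route 1: load the provided module and layer your own packages on top.
# pip --user works here because the module presets the Python user base.
module load pytorch
pip install --user einops

# Route 2: clone the module's underlying conda environment to own it fully.
module display pytorch   # note the conda env path it prints
conda create --name mytorch --clone /path/printed/by/module/display
```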
Obviously, I encourage you to come back to this presentation afterwards and visit all the links to our documentation. There's a lot more info there, and some code examples that you can copy for all of these different use cases.
We just heard a great presentation on Shifter on Perlmutter; that's our current solution for supporting containers, and it's great. It's what I use for pretty much all of my deep learning workloads. I think it's pretty easy to use and, as Laurie mentioned, it's very performant, especially at scale. Even our Top500 entry used a container to run.
You can see the currently available images on the system by running `shifterimg images`, and since they're shared across all users, there are actually a lot of PyTorch and TensorFlow containers already there waiting, so you might even be able to just grab one of those and start using it.
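For example, listing and pulling images might look like this. Again a site-specific fragment: the image tag is only an example, and `shifterimg` exists on NERSC systems, not on a generic machine.

```shell
# See which PyTorch images are already on the system
shifterimg images | grep pytorch

# Pull a new image from NVIDIA's NGC registry (tag is just an example)
shifterimg pull docker:nvcr.io/nvidia/pytorch:22.08-py3
```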
I won't go over these in detail, but I will just mention that, as Laurie said, the NVIDIA containers for deep learning on GPUs are by far the best starting point, I think. These are the NGC, or NVIDIA GPU Cloud, containers. They've already been set up as optimized images with PyTorch or TensorFlow and Horovod; they have optimized drivers and CUDA runtimes, and NCCL and cuDNN installations, so literally everything you would need, and there are a lot of different versions available.
We also provide some versions of these that are NERSC specializations of them. Those just have a couple of useful extra Python packages that we see a lot of our deep learning users wanting or using frequently. For example, in our PyTorch one we install the einops library, because that's a pretty popular library for doing tensor manipulations in models.
In these we also have a parallel h5py installation, which is convenient if you have some training that you're doing and then you want to do parallel I/O afterwards.
You can also build your own containers if you want; it's very easy to build on top of NVIDIA's NGC containers. In fact, that's exactly what we do, and we have some examples of how to do that linked from our documentation.
You can also use the `pip install --user` method here if you want; you just have to set the Python user-base path yourself manually. And then finally, Laurie already went over this, so I don't even need to mention it, but the NVIDIA NGC containers use OpenMPI, so you need to do that little extra step where you disable the mpich module for Shifter and use `--mpi=pmi2`.
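In a job script, that extra step might look roughly like the fragment below. The flag spellings here follow my reading of NERSC's Shifter documentation, so verify them there before relying on this.

```shell
# Job-script fragment for running an NGC container under Shifter:
# pick a Shifter module set that doesn't inject MPICH over the
# container's OpenMPI, and launch with the PMI2 interface.
#SBATCH --image=nvcr.io/nvidia/pytorch:22.08-py3

srun --mpi=pmi2 shifter --module=gpu python train.py
```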
Now, some general guidelines if you're doing distributed training, or maybe you have single-GPU code and you want to make it multi-GPU code. If you're working in TensorFlow, we recommend using Horovod to do this, and that's just because, if you're going beyond the single-node scale, say multi-node training on 16 nodes, it's much easier in our opinion to use Horovod than the built-in TensorFlow distribution strategy.
Yeah, it's easy to use with our Slurm scheduler, and it uses MPI and NCCL to coordinate communications and send data between processes. It's also great because it has lots of examples online, so it's pretty easy to just follow along and start working with it quickly.
TensorFlow also has some really good profiling capabilities built in. So if you want to improve the performance of your training code, and look at what part might be slowing you down, there's a really easy way to just import the TensorFlow profiler and use it. For PyTorch, we don't really need anything beyond the library itself: PyTorch has a really good built-in library for distributed training called DistributedDataParallel.
It just wraps whatever model you've already created and makes it really easy to do distributed training. They've spent a lot of effort optimizing this and making examples, so it's a great starting point, and this one doesn't even need MPI: it just uses NCCL for all communications between GPUs.
Just for some extra general tips here: as I said, we recommend using our already-provided modules or containers if you can. That's a very good starting point; it probably limits the amount of setup work that you have to do, and we've already tested these pretty thoroughly for functionality and performance. It also allows us to track who's using what, which helps us set up our support strategy for future systems, so that's nice.
If you want to track your trainings, I recommend using either TensorBoard or Weights & Biases; these are external tools, which I'll talk about in a moment, that help you track what's going on during training. Then, of course, for performance tuning, you can do things like checking the CPU and GPU utilization to see if there are bottlenecks; you can use something like `top` or `nvidia-smi` for that. That will tell you, for example, if your GPU utilization is really low, which may be an indication that your data pipeline is not very efficient. This is often the most common source of bottlenecks we see in our users' training codes: the CPU is trying to get some data off the file system and provide it to the GPU for the training step, so the GPU is just sitting there waiting, which is not very efficient. To speed that up, you can use some of the recommendations that are built into these frameworks; TensorFlow and PyTorch both have a lot of recommendations. You can use multi-threading in your data loader, and you can try to stage data or cache it, so I definitely recommend following their tutorials on optimizing your data loader.
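The reason a multi-threaded loader helps can be shown without any framework at all: while the compute step (a stand-in for the GPU here) works on one batch, a background thread stages the next one, so compute never waits on I/O. This is a toy sketch; real frameworks provide this via their dataset/loader APIs.

```python
# Toy prefetching pipeline: a background thread stages "batches" into
# a bounded queue so the training step rarely blocks on data loading.
import queue
import threading

def loader(num_batches, q):
    for i in range(num_batches):
        q.put([i] * 4)   # pretend to read a batch from disk
    q.put(None)          # sentinel: no more data

def train(num_batches):
    q = queue.Queue(maxsize=2)   # bounded: at most 2 batches staged ahead
    threading.Thread(target=loader, args=(num_batches, q),
                     daemon=True).start()
    total = 0
    while True:
        batch = q.get()          # loading overlaps with the work below
        if batch is None:
            break
        total += sum(batch)      # stand-in for the training step
    return total

print(train(5))  # 0 + 4 + 8 + 12 + 16 = 40
```

The bounded queue is the important detail: it caps memory use while still keeping a couple of batches ready ahead of the consumer.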
If you really want to do a deep dive, you can of course profile your code. You can use NVIDIA's Nsight Systems tool for that, or you can use the built-in TensorFlow profiler; TensorBoard also has a profiler that works with PyTorch if you want, so I recommend those as well. I guess I don't really need to say much about Jupyter, since we already had an excellent presentation on that. I will just point out that we already have our TensorFlow and PyTorch modules installed as kernels.
So if you start up a server and start a notebook on that server, you can just select, say, TensorFlow, and it should work pretty much out of the box; you should be able to import TensorFlow easily, and the same goes for PyTorch. Or you can use your own custom kernel if there are specific libraries that you need. I also touched a little bit on TensorBoard, which is different from TensorFlow: TensorBoard can be used with either TensorFlow or PyTorch, and it's a great tool for visualizing and monitoring your experiments.
As you're doing a model training, you can track the loss over time, and you can add custom metrics, so if there's some specific statistic that you care about, you can see what its value is. We have a little TensorBoard helper in Jupyter: if you have a Jupyter notebook, you can just import it and it'll give you a URL to visit, which should be where all of your data gets displayed in a nice, convenient little dashboard.
Now, beyond that, it's also very important in deep learning to do hyperparameter tuning. Hyperparameter optimization is a key stage of the deep learning process, and obviously it can be sort of embarrassingly parallel if you're just searching over a wide range of parameters, so it's a good fit for systems like Perlmutter where you have lots of resources available. Because there are just so many tools out there for HPO, we don't really ask that you use one in particular.
We generally support whatever people want to use. We don't install these as their own separate things; some of them are already there in the TensorFlow and PyTorch modules that we built, but they're also probably easy to set up yourself if you need some custom solution for HPO.
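The "embarrassingly parallel" point is easy to see in code: every trial is independent, so trials can be farmed out to separate processes (or, on a cluster, separate nodes and GPUs) with no coordination beyond collecting results. The objective below is a made-up stand-in for "train a model and return its validation loss".

```python
# Toy parallel grid search: each hyperparameter combination is an
# independent trial, so a process pool can evaluate them concurrently.
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def trial(params):
    lr, depth = params
    # stand-in objective: pretend the best settings are lr=0.01, depth=4
    loss = (lr - 0.01) ** 2 + (depth - 4) ** 2
    return loss, params

if __name__ == "__main__":
    grid = list(product([0.001, 0.01, 0.1], [2, 4, 8]))
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(trial, grid))   # trials run in parallel
    best_loss, best_params = min(results)
    print(best_params)  # (0.01, 4)
```

Dedicated HPO tools add smarter search strategies (Bayesian optimization, early stopping of bad trials), but the parallel structure is the same.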
All right, I am going as fast as I can here; these are the last two slides, so hopefully we're not too much over time. I just wanted to also mention some additional resources, for people who are maybe newcomers, or who have some deep learning familiarity but don't know too much about applying it to actual scientific applications. The Deep Learning for Science school is something we put on a couple of years ago that has a lot of resources.
All of the lectures and demos are available, so I recommend visiting that and looking through it; there are some interesting topics there that definitely go beyond introduction-to-deep-learning-style material. I also mentioned the Deep Learning at Scale tutorial: that will give you a lot of detailed information on how to profile and optimize your code, and then how to start scaling it out across multiple GPUs and multiple nodes, up to maybe thousands of GPUs.
Yeah, so that's all I had for you today. Thanks for your attention, and thanks for your interest in deep learning. I hope you agree that there are a lot of good options for doing machine learning and deep learning on Perlmutter, and of course, file any tickets or reach out for any additional assistance you need. I'll just end with one more plug for the machine learning at NERSC survey, which is what we do every two years.