From YouTube: 03 - Overview of NERSC DL Stack - Wahid Bhimji
Description
Deep Learning for Science School 2019 - Lawrence Berkeley National Lab
Agenda and talk slides are available at: https://dl4sci-school.lbl.gov/agenda
Wahid is a big data architect here at NERSC, and he's a physicist by training.
He was in the UK handling data challenges for the Large Hadron Collider before coming here, and, you know, over the years he's definitely developed a lot of expertise in deep learning as applied to high energy physics, for classification and generative problems.
So I'm just going to give a super brief introduction to NERSC, talk about the production stack at NERSC, the tools that we provide and why, and then a little bit of practical information. That's probably what you'll actually want to pay attention to, at the end.
Okay, so you heard this from the earlier speaker. NERSC is the mission HPC center for the Department of Energy, so we support the full range of Department of Energy science and, you know, a vast number of DOE scientists. The main machine we have on the floor is Cori. There was an Edison box here until recently, but that's retired now, so the only machine really is Cori. It's predominantly made up of around 10,000 Knights Landing CPU nodes.
So that's where the bulk of the flops come from, and when this machine was installed it was, you know, among the biggest machines by flops in the country, but that's dropped down now. So we have this combination of Haswell as well as Xeon Phi nodes, and these are all connected with a high-speed interconnect from Cray. And then, of course, large file systems: both a Lustre file system and a flash burst buffer SSD.
Then there's a GPU test system, and that just has a relatively small number of V100 Volta GPUs, okay.
So the reason we have a GPU test system is partly because we're expecting a big GPU machine, our next machine, Perlmutter. This should have about four times the capability of Cori, and, you know, a lot of those flops will come from the GPU-accelerated nodes, which are exciting to people doing deep learning, but there are also many other parts.
So I guess we're hoping that you guys are going to push science to use deep learning and therefore exploit the GPU nodes. But even if you don't manage to do that, there'll still be a large amount of work that needs to run on CPU nodes, and so we'll have a CPU partition that's, you know, as big as Cori but composed of AMD CPUs. And so this machine should fly for deep learning.
So we'll have optimized software to provide that, and a lot of the preparation for that is happening now. And then the GPU nodes will each be comprised of four of these next-generation GPUs. So the current test machine has these V100 GPUs; this will actually have the next generation of GPUs, and, you know, a lot of the details about that are secret, but it will at least have tensor cores, of course, and NVLink for connecting the GPUs together, so you can use them all together. And then this will all be connected again by a high-speed interconnect.
But one of the differences on this machine is that its interconnect is Ethernet-compatible, so that will make it much easier to also transfer big data sets from outside into the machine, you know, for supporting experimental science. So that's exciting. And then, currently, we have this relatively small burst buffer as part of Cori, but on the new system the file system will be flash-based. And this is coming in late 2020, okay. So, to get to the production stack.
So, you know, as Prabhat mentioned, machine learning in science is certainly growing. Last year we did a survey, and there are lots of interesting results from that survey that we could talk about another time. But one thing we saw is that the respondents to the survey came from across various types of science, so there's interest across science.
With your projects, you know, you're learning more about supervised and unsupervised learning and different techniques, and across those there are science examples already, and we have in-depth experience of some of those. So, given that interest and need for deep learning, we want to provide a platform, if you like, for doing that. So, you know, at the top here are the scientists, or actual experiments, and they should have both, you know, interactive ways of doing that.
So this is where Jupyter notebooks and stuff come in, but they should also be able to, you know, plumb into automated pipelines, and then they should sit on top of suitable methods and stuff. So that's where things like this school help, to, you know, push and encourage cutting-edge methods. But then these should sit on top of optimized libraries, so we work on making sure that libraries work well on the hardware that we have. We also try and get the best hardware to meet this need.
Okay, so to be more specific, here's kind of like the deep learning stack on an HPC machine, and there's a bunch of things like the hardware here and the libraries that you probably don't have to worry about as a user. You know, we will be talking a bit about the deep learning libraries and the distributed libraries in Friday's session on distributed training, but most of what the user sees is really up here in these high-level frameworks. And talking about high-level frameworks:
PyTorch is also a significant and growing framework of interest, and Steve supports PyTorch on our machines, and a bunch of people are talking about it this week, so I'm not going to criticize PyTorch much, and I know there are many great things about it as well. But a lot of the exercises we'll be doing are in TensorFlow, which is the easiest for this work, particularly. Okay.
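As a preview of that Friday session, the core idea behind most distributed deep learning is synchronous data parallelism: each worker computes a gradient on its own shard of the data, the gradients are averaged across workers (an "allreduce"), and every worker applies the same update. Here is a minimal toy sketch in plain Python; the linear model, the shard layout, and the learning rate are illustrative assumptions, not the actual NERSC setup.

```python
# Toy sketch of synchronous data-parallel training.
def local_gradient(w, shard):
    # Gradient of mean squared error for the model y = w * x on one shard.
    return sum(2.0 * x * (w * x - y) for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.05):
    grads = [local_gradient(w, s) for s in shards]  # computed in parallel, one per worker
    g = sum(grads) / len(grads)                     # allreduce: average the gradients
    return w - lr * g                               # identical update on every worker

# Fit y = 3x from two shards of (x, y) pairs held by two "workers".
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0)]]
w = 0.0
for _ in range(200):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # prints 3.0
```

Because every worker sees the same averaged gradient, the model weights stay identical everywhere, which is what makes this scheme simple to reason about at scale.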
So now the practical part, so you've got no choice but to pay attention, okay.
So today's hands-on session, and the little one we have tomorrow, and the Thursday lunch self-guided one, will all be run in Jupyter at NERSC. For this, people have to have NERSC accounts. This is a different Jupyter site, Jupyter DL, and I'll show you it in a minute. Or, if you want, you can just run these exercises in Google's Colaboratory; we have the links for both. So you can just run them on Google's service if you want, but if you want to use this machine:
Okay, yeah, well, so, ultimately, at the moment there is a four-hour timeout. I mean, this is something we've just set up for the school to do the hands-on exercises, and I will come to what hours it's available and things like that. We have a reservation; in general, the service is at NERSC.
If you take that form, after I finish talking, to the registration desk, then you can get a training account that has a username and password. Right now, even if you have a NERSC account, if you want to use the reservation in this thing, you will need to use that training account, and you need to return the form to get the training account. But you can still run the notebooks on regular Jupyter if you want; you just won't get a GPU, and so they will run slowly.
So, you know, probably you want to do this. Just a comment: there is a little box for an OTP (I'll show you the Jupyter login); there isn't an OTP for these training accounts, so just leave that blank, okay. Then another practical thing: tomorrow's working lunch will make you work again, but again, it's a lightweight working lunch with an expert, so we have various rooms and various people, including the speakers from tomorrow.
Okay, so I'm just going to show you briefly how to run this Jupyter DL. So, as I mentioned, these GPUs are reserved during the hands-on sessions. Outside those hours the account will work, but we are sharing these GPUs with others, so you may not get a GPU, and in that case the server won't start up and you'll probably get an error message, but it will be pretty obvious what's happening; and similarly after 6 p.m.
Yeah, so there's an OTP box not filled in, is that what you said? Okay, yeah, and so you'll just put your training account and password here, and you don't use an OTP. So this is a bit small now, but basically this is the GPU option. So you just press Start, hopefully, and it takes a moment, because it's actually starting a batch job, and it might.
If you like, so, for example, gpustat is installed, which is a nice program for seeing what GPU you have; run that and you see GPU 0 or 1 here and its utilization. And then there are various notebooks, you know, that Steve will show you in the afternoon, that you can run. Okay, so one gotcha: occasionally, if you have multiple, if you just download...
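If gpustat isn't available in your environment, a similar check can be done from Python by shelling out to nvidia-smi. This helper is a hypothetical convenience, not part of the NERSC stack; it assumes only that nvidia-smi is on the PATH whenever a GPU node was actually allocated.

```python
# Hypothetical fallback for gpustat: list visible GPUs via nvidia-smi.
import shutil
import subprocess

def list_gpus():
    """Return the lines that `nvidia-smi -L` prints (one per GPU), or [] if none."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on PATH, so no GPU visible
    try:
        out = subprocess.run(["nvidia-smi", "-L"],
                             capture_output=True, text=True, check=True)
    except (subprocess.CalledProcessError, OSError):
        return []  # nvidia-smi present but failed, treat as no GPU
    return [line for line in out.stdout.splitlines() if line.strip()]

print(list_gpus())  # one 'GPU 0: ...' line per device on a GPU node, [] elsewhere
```

This is handy in a notebook's first cell to confirm whether the batch job actually landed on a GPU before running the exercises.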
Okay, and then one other point: so, as I mentioned, this is running on the batch system. It will hold a GPU for four hours, and just killing the window isn't enough to kill the process. So if you do want to be a nice citizen, you should go to the control panel and stop the server, like this, and then, you know, it gives you some feedback: has it stopped yet? It's stopped, so then you can start again, but then you can log out, or what have you.
Okay, so that's that information. I don't know if there are any questions on that. Actually, maybe I should take a question, if there are any questions on using this. So, Steve? Oh, okay, you have a... yeah, yeah: don't do hardcore training on the CPU. I mean, it should restrict you to one core, and... right, yeah.
Okay, we're here to help. So, you know, there's a bunch of people: at the top are the organizers. Mikhailo isn't here today, but she'll be here tomorrow. And then we have a bunch of people who've kindly volunteered to help out as TAs. They have a range of skills: some are more systems experts, some are more deep learning experts. So, you know, you'll just have to ask whoever, and they'll find the right person to help you. So commit all these faces to memory.
These are on the websites; these are all TAs, and Torsten will be helping out at the scaling session and stuff, as he is an expert. Okay, so, conclusions: deep learning is awesome for science, and at NERSC, you know, we build on tools and hardware and, you know, algorithms and stuff to make sure we can run these on our machines at scale. There are various challenges we face in doing this; it's not just computational, but also methodological and practical challenges. So we do welcome, you know, new ideas and collaborations and whatnot.