From YouTube: 8. Jupyter Deployment at NERSC
Description
June 12, 2019 Jupyter Community Workshop talk by Rollin Thomas, National Energy Research Scientific Computing Center
So anyway, we're the production user facility for HPC and data for Department of Energy funded researchers. That's people at universities and other national labs around the country. This is all our stuff. We don't have Edison anymore; we basically unplugged it and it disappeared last week. But we're essentially a huge file system and network with computers that we attach and then detach every five years or so.
One of the things that's a trend for us and other HPC facilities is people showing up with experimental and observational data sets, at least within the Department of Energy. A lot of those people would previously have stood up a departmental cluster or depended on local resources at their university, and they'd write into their grant: hey, we need to purchase this hardware and it needs to run for five years.
Increasingly, DOE has been telling those people that they need to figure out a way to run at the HPC facilities, the leadership-class facilities, but mainly that means here. So there are all of these experiments showing up with lots of data, and the stuff they want to do with it is different from the kind of stuff we've been doing for the past twenty years or so. So we're shifting; well, we're adding, basically, not really shifting: we're adding a bunch of data analytics and machine learning.
And real-time data analysis, because a lot of what these people want to do is take data at a beamline on a synchrotron or something like that and figure out: how should I rotate the sample? That's a big computation, so I need to ship the data over to some place that can actually be sitting there, ready to do a larger parallel computation, and send the answer back. Then they want to look at that and decide, okay, I'm going to go five degrees more this way, or whatever.
A
So
this
kind
of
dynamic,
workflow,
human-in-the-loop
and
analysis
and
steering
of
experiments
is
something
that
we're
we're
really
looking
forward
to
helping
people
with
the
system.
We've
got
on
the
floor.
Right
now
is
Cory.
This
is
our
first
system
that
was
supposed
to
address
the
needs
of
simulation
and
data
people
at
the
same
time,
and
so
it
has
a
bunch
of
these.
What
we
like
to
call
data
features,
data
friendly
features,
namely
slurm,
which
is
a
kind
of
a
huge
engagement
for
us,
is
working
with
the
CERN
developers.
It's got Shifter containers, the burst buffer, Globus file transfer. We have nodes set aside for data transfer and for workflows, and then there was like one node for things like Jupyter as part of the contract, and I think the Jupyter part was the best part.
Why did we take over running a hub service? What we noticed early on was that users figured out how to run the notebook via SSH tunneling, and then they could do Jupyter stuff at NERSC. They wrote blog posts about it: here's how you use Jupyter at NERSC, you just install Jupyter and then you can set up this complicated SSH tunnel and use that. We figured maybe we should step in and help.
It was around that time, actually. So, okay, we wanted to embrace this and make people not have to do SSH tunneling and all of that. JupyterHub helped us do that by letting us standardize the service, authenticate people the way we wanted, educate them, and help them manage the process of setting it up, so hopefully they can just get started doing Jupyter stuff.
Here's our history at NERSC, I guess. We invited Fernando to give a NERSC user group talk back in 2013, and we stood up a Jupyter installation on some hardware where we'd asked, hey, are you throwing that away, can we use it? So we set up a JupyterHub instance there, and a couple of people used it and they liked it.
The thing that they liked was that we could mount the NERSC global file system, so they could see their data sitting on the project file system, basically. That's the place where we tell people to put their data so they can share it with other people. It's not the high-performance file system; you can hit it while your job is running, but it's for sharing. So people could make plots and do little data analytics tasks on that one node.
But the next thing we did was move the place where notebooks spawn to be inside Cori. Well, on Cori; I don't mean inside, because inside and outside Cori are different things. This is more outside Cori: it's on a login node that we repurposed, or actually set aside from the beginning, for running Jupyter. Cori has something like 24 login nodes, which is a lot for us; really only 12 of them are in the load balancer for users to actually SSH into.
The higher-numbered login nodes are reserved for things like big-memory work, or Jupyter, or file transfer workflows, and things like that. So we got one node for our notebooks, and maybe 20 users used it for a while. But I think somebody said yesterday: you give users a resource and it becomes theirs, right? So everybody started wanting it for themselves, all on one node, basically. That worked for a while, even though the way we did it was not the best in terms of proper Jupyter components.
People really like to do a lot of their stuff at NERSC simply because Jupyter is there. So this was great for maybe 20 or 30 users, but as more users came on, they started to notice that it was one node, and that they could crash it.
A
Maybe
we're
gonna
change
them
around
to
be
Jupiter
stuff,
and
so
a
lot
of
the
work
that
we
did
was
was
kind
of
socializing
that
our
architecture
so
I'll
do
a
little
architecture,
diagram
demo.
We run the hub for Cori on a container infrastructure called Spin. It's Rancher underneath; it's running the old Rancher scheduler, but soon it's going to move to Kubernetes, so we'll make that jump this year. So we run the hub there, with a few other containers sitting alongside: we've split out the database, and we're splitting out the proxy.
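As a rough illustration of what "splitting out" the database and proxy means in JupyterHub terms, here is a minimal jupyterhub_config.py sketch; the hostnames and the token source are placeholders, not the actual deployment described in the talk.

```python
# Hedged sketch: point the hub at an externally managed database and an
# externally launched configurable-http-proxy instead of the in-process defaults.
import os

# hypothetical external database (could equally be MySQL)
c.JupyterHub.db_url = "postgresql://jupyterhub@db.example.org:5432/jupyterhub"

# proxy runs in its own container; the hub only talks to its API
c.ConfigurableHTTPProxy.should_start = False
c.ConfigurableHTTPProxy.api_url = "http://proxy.example.org:8001"
c.ConfigurableHTTPProxy.auth_token = os.environ["CONFIGPROXY_AUTH_TOKEN"]
```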
We have a couple of extra services that we run alongside to cull notebooks: we let them sit idle for 24 hours and then we shut them down.
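Idle culling like this can be wired up as a hub-managed service; a minimal sketch using the jupyterhub-idle-culler package (the 86400-second timeout matches the 24-hour policy above; the rest is illustrative, not the exact configuration in the talk):

```python
# Hedged sketch: cull single-user servers after 24 hours of inactivity.
c.JupyterHub.services = [
    {
        "name": "idle-culler",
        "command": ["python3", "-m", "jupyterhub_idle_culler", "--timeout=86400"],
    }
]
c.JupyterHub.load_roles = [
    {
        "name": "idle-culler",
        "scopes": ["list:users", "read:users:activity", "read:servers", "delete:servers"],
        "services": ["idle-culler"],
    }
]
```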
We have a monitoring container that runs alongside as well and sends information to our central data collector; it sits off to the side of Cori in a Docker container. And then we have two custom components in the classic sense. We have our own authenticator, because we have multi-factor authentication; we started out with GSISSH, but that started going away.
A
It
was
a
real
pain
to
keep
that
working,
because
it
was
a
service
that
needed
to
run
when
the
node
came
up,
but
we
have
our
own
custom
Authenticator
that
uses
our
internal
authentication
mechanism
and
our
own
kind
of
spawner
infrastructure
that
lets
us
do
what
I'm
about
to
show
you.
So
this
is
our
Authenticator.
Here
we
have
an
internal
API
for
managing
generation
of
networks
with
our
multi-factor
authentication.
Once
a
user
is
authenticated,
they
can
choose
where
their
notebook
is
going
to
spawn.
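To make that concrete, here is a minimal sketch of a custom JupyterHub authenticator that delegates to an internal multi-factor REST endpoint; the endpoint URL, field names, and response handling are hypothetical stand-ins, not NERSC's actual API.

```python
# Hedged sketch of an MFA-backed JupyterHub Authenticator.
import json
from jupyterhub.auth import Authenticator
from tornado.httpclient import AsyncHTTPClient

class MFAAuthenticator(Authenticator):
    # hypothetical internal endpoint that verifies username + one-time password
    verify_url = "https://auth.example.org/api/verify"

    async def authenticate(self, handler, data):
        client = AsyncHTTPClient()
        resp = await client.fetch(
            self.verify_url,
            method="POST",
            headers={"Content-Type": "application/json"},
            body=json.dumps({"username": data["username"], "otp": data["password"]}),
            raise_error=False,
        )
        if resp.code != 200:
            return None  # reject the login
        # Returning a dict lets us attach auth_state (accounts, images, ...) later.
        return {"name": data["username"]}
```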
One option is in the center itself. So if Cori is down for maintenance, say, and they have a paper deadline and they want to make a plot, we don't want to have to tell those people sorry. So we have another container sitting inside Spin that allows users to start up a notebook inside a shared container and at least make their plots. With Kubernetes we might be able to do something a bit more normal to spawn over there.
On the login nodes on Cori, we've now been able to repurpose three nodes, three nodes total.
We might be able to get a couple more. Those are spawned using an SSH spawner that we've written. I think there are a couple of SSH spawners out there, but ours sits on top of asyncssh to do that.
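As a rough sketch of the idea (not NERSC's actual SSHSpawner), a spawner built on asyncssh starts the single-user server on a remote login node over SSH and reports the host and port back to the hub; the host name below is a made-up placeholder.

```python
# Hedged sketch of an asyncssh-based JupyterHub spawner.
import asyncssh
from jupyterhub.spawner import Spawner

class SimpleSSHSpawner(Spawner):
    remote_host = "login13.example.org"  # hypothetical high-numbered login node

    async def start(self):
        self.port = 8888  # in practice, pick a free port on the remote side
        cmd = " ".join(self.cmd + self.get_args())
        self._conn = await asyncssh.connect(self.remote_host, username=self.user.name)
        # env requests may be restricted by sshd; a real spawner handles this more carefully
        self._proc = await self._conn.create_process(cmd, env=self.get_env())
        return self.remote_host, self.port

    async def poll(self):
        # None while the remote process is still running, exit status otherwise
        return self._proc.exit_status

    async def stop(self, now=False):
        self._proc.terminate()
        self._conn.close()
```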
Once you have a notebook running on a login node on Cori, here's the thing: we've extended the internal network on Cori, the one all the compute nodes are on, which generally don't have routable IPs, out to these high-numbered login nodes. So you can have a notebook running on node 13 or 14 or 19 and talk to, say, a Dask cluster running in a job that you started up through a regular job submission; all the notebook needs to know is the internal IP of the head node of that job. Okay, so that's super popular.
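From the notebook side, that last step might look something like this; the scheduler address is a hypothetical internal IP of the job's head node.

```python
# Hedged sketch: connect a notebook on a login node to a Dask scheduler
# launched inside a batch job on the internal network.
from dask.distributed import Client

client = Client("tcp://10.128.0.45:8786")  # hypothetical internal IP, default scheduler port
client.submit(sum, range(10)).result()     # quick sanity check that workers respond
```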
We think this could probably end up being the most popular way people combine Jupyter and the batch queues, because the notebook gets to stay around basically forever. There's another way to do this, which is a straight batch spawner: start the notebook up in a job. We have an API for associating IPs on the fly with jobs, for a small number of IPs, so part of the job startup is to hit that SDN API and get an IP address.
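A hedged sketch of that second route, using the batchspawner package's SlurmSpawner with a job script that asks a hypothetical SDN endpoint for a routable IP before launching the server; the endpoint, its response handling, and the SBATCH options are placeholders, not the actual NERSC setup.

```python
# Hedged sketch: notebook-in-a-batch-job with an SDN call in the job prologue.
c.JupyterHub.spawner_class = "batchspawner.SlurmSpawner"
c.SlurmSpawner.req_runtime = "04:00:00"
c.SlurmSpawner.batch_script = """#!/bin/bash
#SBATCH --job-name=jupyter
#SBATCH --time={runtime}
#SBATCH --constraint=haswell
# Ask the (hypothetical) SDN API to attach a routable IP to this job,
# then start the single-user server as usual.
curl -s "https://sdn.example.org/api/attach?jobid=$SLURM_JOB_ID"
{cmd}
"""
```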
So I really want to point out that a lot of this is infrastructure services we've added on the center side, for the SSH and SDN pieces; Shane is here, and there's a lot of infrastructure work that went into it. It really helps to have somebody who understands how the center infrastructure works to be able to do these kinds of things. And then there's batchspawner. I think some extensions were shown yesterday; one of them is this JupyterLab Slurm plugin that lets you look at the queue and stop jobs and things like that. That's in development; actually, William over there has been working on it.
Alright. I didn't talk about our deployment model, but we do a monthly deployment cycle. Somebody said they have A/B, or dev and test, or whatever; we have test, stage, and then the production stack.
What I'm showing here is the staging one, which is the one right before we go to production. We've customized the authenticator, and we've customized the login page template so that we can stick in our flavor of multi-factor authentication. And I'm logging in
as myself, and I'm staff, so I see some things that non-staff people don't see. So this is our console, or our home page, and again this is customized as well. What we wanted was to enable users to pick one of these to run at a time. So you can run, say, the shared one,
and one of those is a test setup. Then over here, you have to be in a special QOS to be able to see the GPU nodes, which are actually running on a separate Slurm controller and aren't really part of Cori. Anyway, I can push these buttons and they'll do things. So this is starting up a GPU node job; it's fairly fast. But if I go to start up a job on the CPU nodes using that spawner, it's pretty slow, because our Slurm is super, super busy.
Okay, so it might take three minutes before your job starts up. And then this is something I'll come back to in a minute: our users are always saying, well, I want to go back to the console, but it always leaves this extra window open, and if I stop my server it makes the page gray and there are errors and stuff, and I don't like it. Let me log in as the test user, though; so I have a test user.
The point I want to make, though, is: how do we do that, right? We've got to know some stuff about me, and we've got to know some stuff about users. This matters because, if we have say an options form, we don't want to list all the accounts that exist at NERSC, just the ones associated with the user. We don't want to list all the Shifter images that are at NERSC, just the ones they should care about.
We don't want to list all the reservations that are in Slurm right now, just the ones the user can actually submit to. So we have internal services that we've exposed through a REST API that let us get at that information. We do have a little bit of difficulty getting at that information for the home page, but we have no problem getting it at the options form stage, because that's a callable, a callback, a coroutine. So we've shoehorned in a little fix there that can await it.
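For illustration, a per-user options form along these lines can be written as a coroutine that reads the user's auth_state (or calls an internal REST API) and only offers what applies to that user; the auth_state keys and form fields below are hypothetical.

```python
# Hedged sketch: build the spawner options form per user from auth_state.
async def options_form(spawner):
    auth_state = await spawner.user.get_auth_state() or {}
    repos = auth_state.get("repos", ["default"])  # hypothetical key
    choices = "\n".join(f'<option value="{r}">{r}</option>' for r in repos)
    return (
        '<label for="repo">Charge this job to:</label>'
        f'<select name="repo">{choices}</select>'
    )

c.Spawner.options_form = options_form
```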
I actually have a real production version of one piece of that here, for the account you charge to, so that's something we've developed. But things like which reservations are out there, or which of your images are there: those were done for this demo by SSHing to the machine and running a command, so we should fix that, or at least have a recipe for it. Anyway.
So we have a new machine coming, and Jupyter is going to need to work on that. This is what that page is going to look like, maybe in a year or so; you'll have all these options.
Okay, so this is the thing that we ran into: we have these things we want to expose to users at the center, our computational resources, our file systems. If they're going to be submitting jobs, we need to make it easy for them to say, use my default repo, or use this particular repo that I want to use for this job, or I have a reservation and I can only submit from this repo, or whatever.
Cori has login nodes that you just log into. Generally the regular login nodes are for compiling code, writing software, looking at your data, maybe a little bit of interactive stuff; but then you submit jobs to the batch queue, which runs on compute nodes, and those aren't normally accessible the same way from the outside.
Where do the Jupyter nodes sit? Yeah, these are those repurposed login nodes. I should mention that a normal login node has maybe 50 people on it at any given time, and a lot of them are sitting there hardly doing anything, like editing a file. On the Jupyter nodes, we currently have something like 200 notebooks running concurrently at any given time across the three nodes. When it was a hundred on one node, the node was falling over like every other day.
Cgroups memory limits are in place, so if anybody blows past them, we know, and they may just not know what happened; but hey, don't do that. We've also talked to the Systems Group about other alternatives, like putting people into jobs they don't actually know they're running; Slurm can do that. It might be, you know, just me in a cgroup, yeah.
We do recognize that there are some security things that we need to address, and that's one of them. Also, if somebody starts up a Bokeh server and they have an SDN entry, their Bokeh, or sorry, their Dask or Bokeh server is sitting there, for instance, and they don't have it behind TLS. People could look at what they're doing; people could send things that mess up their Bokeh server, I guess. But these are the things.
We don't, yeah. The way I look at it is, you could ask that about pretty much anything you could do with the SDN API, and so we're actually still reviewing whether that's the way we want to do this. One of the things we're talking about is whether we should run CHP as the relay for that, on a, you know, on a service node, yeah.
Yeah, I appreciate that, okay. And this is my last slide: the wish list, which I should have put in quotes; it's not really a wish list, it's just stuff we didn't really know how to do and then thought maybe there's a better way. All the stuff I talked about that refers to, like, what accounts a user can charge to, or their Shifter images, all of that we're just sticking on auth_state, which gets set when you authenticate and can be refreshed, which is great.
So if they go into our accounting system and sign up for a new repo allocation, we can probably pick that up. The difficulty was getting at it at the point where we needed it, which was on our home page with the named servers, basically. Grabbing hold of it there... I mean, you have to do it from inside a coroutine. So the get method there was okay, but it's kind of weird-looking that it's just sitting there going, hey, by the way, go find out everything this user can do.
Maybe there's a better place to put this; I think it's kind of jammed in there, but hey, it worked.
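For what it's worth, a minimal sketch of the refresh piece: JupyterHub's refresh_user hook can return updated auth_state without a new login. The helper that queries the accounting system is hypothetical, and the authenticator is assumed to be something like the earlier sketch.

```python
# Hedged sketch: periodically refresh auth_state so new allocations show up.
from jupyterhub.auth import Authenticator

class MFAAuthenticator(Authenticator):
    auth_refresh_age = 3600  # re-check roughly every hour

    async def refresh_user(self, user, handler=None):
        info = await fetch_account_info(user.name)  # hypothetical internal REST call
        return {
            "name": user.name,
            "auth_state": {"repos": info["repos"], "images": info["images"]},
        }
```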
On the security side, I know there was a Discourse topic about this, started maybe a couple of months ago. One thing is, we would really like to know about the possibility of notebook-level auditing: every cell, everything that goes through the xterm extension and terminado, we're interested in finding out how to log. The hub is fine; all that logging already goes up to our central data collector, and you don't have to do anything, that's part of our infrastructure. And there is interest from our networking and security group on the code review side of things, so if you were looking for somebody to do this, I know a person who would be really excited to do that. And then there's a thing about WrapSpawner that I could just be doing wrong, which is that there's like a default.
There needs to be a default spawner hanging out there, and right now, for me in the container, it's LocalProcessSpawner, and the only reason it doesn't work is that there's no getpwnam entry or whatever, so it just fails. But I think there should be a spawner that doesn't do anything; that would be a good default for this, unless WrapSpawner can do it a different way. But something that would say: whatever you did, you picked a name for your server, like abcdefg.
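For context, the wrapspawner package's ProfilesSpawner is configured with child spawner classes along these lines; the point above is that the wrapper wants some usable child class even before the user has chosen a profile. The class paths name real packages, but the profile entries, host, and partition are just illustrative.

```python
# Hedged sketch of a wrapspawner.ProfilesSpawner configuration.
c.JupyterHub.spawner_class = "wrapspawner.ProfilesSpawner"
c.ProfilesSpawner.profiles = [
    ("Shared login node", "shared", "sshspawner.sshspawner.SSHSpawner",
     {"remote_hosts": ["login13.example.org"]}),
    ("Compute node via Slurm", "batch", "batchspawner.SlurmSpawner",
     {"req_partition": "regular"}),
]
```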