From YouTube: Introduction to Perlmutter
Description
Jay Srinivasan of NERSC presents an Introduction to Perlmutter. Recorded live via Zoom at GPUs for Science 2020. https://www.nersc.gov/users/training/gpus-for-science/gpus-for-science-2020/ Session Chair: Oisín Creaner
B
I'm... yeah.
A
Thank you all for inviting me to give you an introduction to Perlmutter. I'm Jay Srinivasan, the project director of the formal project that we have at NERSC to bring in the system.
A
To give a quick introduction to NERSC: I think most people here might be familiar with NERSC. We are the mission high performance computing facility for the Office of Science, and on the left you see a bunch of statistics about NERSC. We have thousands of users, hundreds of projects, and a lot of codes, and what that means is that we have a really diverse workload, characterized by simulation, by data analysis, and, more recently, by learning as well.
A
Some comments here - okay. So what that also means is that the systems we get - and you can see the roadmap here from earlier in the decade - have to be able to cater to that really diverse workload. So, starting in 2013,
we had Edison, followed by Cori, which gave our users an introduction to the manycore era. Then, later this year and into 2021, as I'll talk about, we're going to have the ninth generation of our systems, which we're calling Perlmutter, and that will have a mix of both CPU and GPU nodes. And then later in this decade we'll get to NERSC-10.
A
So what is Perlmutter? Right from the get-go, when we started the project, we decided it was going to be a system optimized for science. So what does that mean? It's a system that provides a substantial increase in performance over Cori, which is our current KNL-based system -
three to four times Cori. It has a mixture of both GPU-accelerated and CPU-only nodes that meet these three pillars of our needs: simulation, data analysis, and learning.
That's compute, storage, and networking. The data stack that's optimized for this system will enable us to support both analytics and machine learning at scale.
A
All of this is connected together using the next generation of interconnect by Cray, called Slingshot. That's an Ethernet-compatible interconnect, and what it does is basically open up the inside of these machines, in a sort of seamless fashion, to the outside world. It enables data movement much more easily than was possible before on systems like Edison or even Cori. The GPU-accelerated nodes, as I'll talk about, have four NVIDIA GPUs with the latest in cores and interconnect,
and they'll have one AMD Milan CPU, which is the next generation from what people are obviously on now, the Rome line. We'll have over 6,000 Ampere GPUs. The interconnect, as I've talked about, is this Ethernet-compatible high-performance interconnect, and we expect we'll be capable of terabit connections to and from the system.
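To make the four-GPU node layout concrete, here is a minimal C sketch of what a program sees on such a node, assuming the CUDA runtime API (illustrative only; this is not code from the talk, and the device names shown are whatever the driver reports):

```c
/* Minimal sketch: enumerate the GPUs visible on a node.
 * Assumes CUDA is available (compile with: nvcc list_gpus.cu -o list_gpus).
 * On a Perlmutter-style GPU node one would expect four A100 devices. */
#include <stdio.h>
#include <cuda_runtime.h>

int main(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) {
        fprintf(stderr, "no CUDA devices visible\n");
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        struct cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s, %.1f GiB memory\n",
               i, prop.name,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}
```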
A
So here's a chart similar to what I just showed before, but it also shows how we're bringing in the system. Because of various timing issues, and the way things roll out from the technology perspective,
we're bringing it in in two phases. We'll bring in the first phase late this year, and that will consist of the GPU-accelerated nodes, all of the storage, and all of the associated nodes that will help us run this as a system - that includes the login nodes and the nodes for high-memory workflows and so forth - as well as the storage and access to external storage. So it'll be fully integrated into NERSC, just like the systems we have.
The Milan CPUs will come in later in 2021, as will the client side of the high-performance interconnect, the Slingshot part.
A
All of those blades will go into a compute rack, and each rack will have 64 blades, so we'll have either 128 nodes or 256 nodes per rack, depending on whether it's a GPU-accelerated blade or a CPU-only blade. All of those racks are put together to give us the Perlmutter system, which has 12 GPU racks and 12 CPU racks. The other part that's important, when it becomes a whole system, is how you get it all to work together, and that's probably of most importance to you all as users.
A
We have activities in the areas of the network, the storage, the application readiness work, and the system software work, to ensure that this system will work really well for our users and be a productive system. And you can see that all four of those areas involve technologies that are new with Perlmutter - not just new to Perlmutter, but new technologies overall. The high-speed network is a brand new technology from Cray.
The all-flash storage is going to be one of the first times that all-flash storage is run on a shared, Lustre-based file system at this scale, for the diverse kinds of workloads that we have. Obviously GPUs aren't new, but running them in production, for our diverse workload, at the scale of users and the scale of science that our users do, is new, so getting our apps ready for it is an important part of that effort. And the system software that ties everything together is also a new revision of the system.
A
So if there are any questions, please speak up - I don't know how Regina or others want to handle the questions. You're going to have plenty of talks today on...
A
Right. Obviously we didn't have as grand a release as if people had been able to attend GTC in person, but hopefully everybody has listened to the talks from GTC and seen the features that Ampere has. We'll be getting the A100, which is an implementation of the GA100 GPU, and you can see all of these statistics -
A
You
know
stats
and
speeds
and
feeds
on
there
and
we're
looking
at
things
like
you
know,
almost
20
teraflops
with
the
tensor
core
on
fp64
and
so
forth.
So
the
other
features
that
really
are
of
importance
to,
we
believe
to
our
users
and
in
fact,
that
the
talks
over
the
next
couple
of
days
are
addressing
are
things
like
you
know?
How
are
these
things
connected
together
and
how
are
you
going
to
be
able
to
use?
the four of them that we have on each node on Perlmutter effectively, with NVLink 3. This Multi-Instance GPU technology that Ampere puts together is very interesting.
We're going to have it available on Perlmutter, of course, and to use it effectively we're going to have to integrate it and expose that technology to our users, to our workload manager, and to how users access the system from a scheduling point of view. That's something we're going to be looking at very closely between now and when Perlmutter comes into service.
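As a hedged sketch of what exposing MIG to software can look like, the NVML C library that ships with the NVIDIA driver lets a tool query whether a GPU has MIG mode enabled (illustrative only; this is not how NERSC's scheduler integration is implemented):

```c
/* Sketch: query MIG mode with NVML (link with -lnvidia-ml).
 * Illustrative only; the actual workload-manager integration
 * described in the talk is not shown here. */
#include <stdio.h>
#include <nvml.h>

int main(void) {
    if (nvmlInit() != NVML_SUCCESS) return 1;

    unsigned int count = 0;
    nvmlDeviceGetCount(&count);

    for (unsigned int i = 0; i < count; ++i) {
        nvmlDevice_t dev;
        unsigned int current = 0, pending = 0;
        if (nvmlDeviceGetHandleByIndex(i, &dev) != NVML_SUCCESS) continue;
        /* MIG is an A100-generation feature; older GPUs return
         * NVML_ERROR_NOT_SUPPORTED from this call. */
        if (nvmlDeviceGetMigMode(dev, &current, &pending) == NVML_SUCCESS)
            printf("GPU %u: MIG %s\n", i,
                   current == NVML_DEVICE_MIG_ENABLE ? "enabled" : "disabled");
    }
    nvmlShutdown();
    return 0;
}
```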
The TF32 support - and I think Jack and others talked about this - there's going to be a number of talks on mixed-precision work, and that's going to be very interesting. So I think that technology is going to be explored as well over the next couple of days, for the system that we have.
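To make the TF32 point concrete: on Ampere, libraries can route ordinary FP32 math through the tensor cores using the TF32 format. A minimal sketch, assuming cuBLAS from CUDA 11 or later (illustrative; not code from the talk):

```c
/* Sketch: opt an FP32 GEMM into TF32 tensor-core math on Ampere.
 * Assumes CUDA 11+ cuBLAS (compile with: nvcc tf32.cu -lcublas). */
#include <cublas_v2.h>

void sgemm_tf32(cublasHandle_t handle, int n,
                const float *dA, const float *dB, float *dC) {
    const float alpha = 1.0f, beta = 0.0f;
    /* TF32 keeps the FP32 exponent range but rounds the mantissa
     * to 10 bits, letting tensor cores accelerate FP32 GEMMs. */
    cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);
    cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);
    /* Restore strict FP32 behavior for code that needs it. */
    cublasSetMathMode(handle, CUBLAS_DEFAULT_MATH);
}
```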
I just wanted to touch on one specific aspect of the all-flash file system.
A
So this one: obviously it's all-flash, it's Lustre-based, and it's going to be fast, it's going to be usable, and it's optimized. It's fast across multiple dimensions: it has high bandwidth because it's all-flash, and it has excellent IOPS performance - sorry, this should say 3.2 - with 3.2 million file creates per second.
It's going to be usable for our users: there are 35 petabytes of usable capacity essentially on the machine. It's not a remote file system that you have to access through a small network pipe or anything; it's on the machine, part of the same fabric that the compute nodes - the GPU and the CPU nodes - are on,
so people are able to use it. And we're going to have data movement capabilities that are new, that allow people to move data seamlessly between the tiers. What are the tiers we're talking about? The storage on the machine, the external file systems that we have, and obviously things like archival storage and so forth. And then finally, there's a number of optimizations: Lustre clearly works at scale, as people know from Cori, but with all-flash
you have to worry about things like: how does small-file I/O perform? How do we take advantage of the high IOPS that are there? And so forth. We're making sure that the file system is optimized for that.
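The small-file concern is easy to picture with a sketch of the kind of microbenchmark one might run (illustrative only, not a NERSC tool): time how many small-file creates per second a single client can do, the metadata-heavy pattern that all-flash Lustre is meant to handle well.

```c
/* Sketch: time N small-file creates from one client. Real file-system
 * benchmarking uses many clients in parallel (e.g. tools in the
 * mdtest family); this only illustrates the access pattern. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

int main(void) {
    enum { N = 10000 };
    char path[64];
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < N; ++i) {
        snprintf(path, sizeof path, "f%05d.dat", i);
        int fd = open(path, O_CREAT | O_WRONLY, 0644);
        if (fd >= 0) { write(fd, "x", 1); close(fd); }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%d creates in %.2f s (%.0f creates/s)\n", N, s, N / s);
    return 0;
}
```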
A
Hopefully all of this makes your lives easier when you do use the system - and that includes things like system software and scheduling. How do we schedule these multiple resources? And, drilling down into the node, how do you make sure that things like the Multi-Instance GPU technology are available in a fashion that makes it useful for multiple users to use the same physical GPU, partitioned into multiple instances, and things like that?
The workflow architecture enables people to support data-intensive workloads and things like that; how do we make sure that's available on a system that's going to be new, that has a new system software stack, and that has this diverse workload on it? The storage
I just talked about: making sure that all of those features available on Perlmutter are tested and ready for our users. And the networking as well, which enables us to take advantage of this Ethernet-compatible network: making all of the features that allow Ethernet to connect up to the outside world in a seamless fashion available on the inside of the system as well.
A
Yeah, I just have a couple more slides, I think. The other aspect of this effort that's useful is the NESAP program, which I think you're going to hear a little bit about, obliquely.
A
I think that will show you how well we're doing. In fact, that's what has given us confidence that the GPU system and the CPU system - the way we've divided up the resources on our Perlmutter system into those two technologies - is going to be useful for our users. It's in fact what motivated us to ensure that we get the GPUs in and make them available to our users as soon as possible.
There's other work that isn't formally part of the project, but that NERSC is doing: we're working with PGI to enable OpenMP GPU acceleration, an effort that I think you're going to hear about a little bit. Obviously, performance portability is a key aspect of making sure that these systems are usable and productive for our users - not just at NERSC, because nobody runs just at NERSC; people run across resources in the whole DOE complex.
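A minimal sketch of what OpenMP GPU offload looks like in user code, assuming the `-mp=gpu` support in the NVIDIA HPC (formerly PGI) compilers that this effort targets (illustrative; not code from the talk):

```c
/* Sketch: a SAXPY offloaded to the GPU with OpenMP target directives.
 * With the NVIDIA HPC compilers this would be built roughly as:
 *   nvc -mp=gpu saxpy.c
 * The same source, minus the flag, still runs on CPU-only nodes,
 * which is the performance-portability point. */
#include <stdio.h>

void saxpy(int n, float a, const float *x, float *y) {
    #pragma omp target teams distribute parallel for \
            map(to: x[0:n]) map(tofrom: y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void) {
    enum { N = 1 << 20 };
    static float x[N], y[N];
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
    saxpy(N, 3.0f, x, y);
    printf("y[0] = %.1f\n", y[0]);  /* expect 5.0 */
    return 0;
}
```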
A
So, in summary: we're really excited to be able to introduce Perlmutter, the system that's optimized for science, to our broad user base. Our staff have been engaging with our users to ensure readiness for the system, and the effort that the postdocs here put in to bring this
A
This
set
of
sessions
to
our
users
is
sort
of
testament
to
that.
We
have
a
very
strong
training
effort
in
collaboration
with
nvidia
and
createhpe,
to
give
new
information
about
the
new
technologies
and
promoter
to
our
users,
and
we
really
look
forward
to
working
with
you
all
to
make
promoter
a
very
productive
platform,
the
next
generation
platform
for
our
users
and
I'll
stop
there
happy
to
take
any
questions
that
you
have.
B
We had a question there from Hugo. Hugo, do you want to unmute yourself?
C
Yeah, hello Jay. You showed the slide with the synergy of the different teams working on the network, on the app readiness, and the other boxes, for building Perlmutter and making it an efficient machine.
How would you describe the synergy between all these factors that makes Perlmutter a good supercomputer and easy to use by the end users? How do all these teams work together to make it transparent to the end user?
A
Yeah, that's a good question. Perlmutter is really the most important focus of NERSC right now, I would argue. What that means is that at NERSC we have a formal project that says: okay, we're going to bring in the next generation of the system that we have. Over 50 percent of NERSC staff are directly working on the effort to bring Perlmutter into production, and so, while we've split it up into these different groups, really
A
We,
we
have,
you
know
weekly
meetings
to
to
bring
in
to
bring
together
all
of
the
efforts
that
these
people
are
doing.
All
of
these
people
are
going
to
be
participating
and
are
already
participating
in
some
of
the
early
test.
A
Testbed
hardware
that
we
have,
which
isn't
yet
at
the
scale
where
we
can
expose
it
to
users
or
it
isn't,
doesn't
necessarily
have
all
of
the
technology
that
would
make
it
useful
to
expose
it
to
users,
but
for
the
staff
effort,
it's
possible
for
the
staff
to
start
getting
access
to
some
or
all
of
this
technology
gradually
between
now
and
the
end
of
this
year,
when
we
have
how
money
are
in
there,
and
so
all
of
those
efforts
that
people
are
working
on
sort
of
mesh
together
in
in
the
back
end,
if
you
will
right
without
necessarily
being
exposed
to
the
users
right
away
at
this
point.
A
But
it
is
part
of
one
big
project
that
you
know,
like
I
said
over
over
half
of
nurse
staff
are
directly
working
on
promoter
related
activities
at
this
point
right
and
have
been
for
the
last
year
really.
B
I'm going to need to move on, so if you do have any further questions, please do put them in the Q&A and we will bring them up.