From YouTube: NERSC Today and Over the Next Ten Years
Description
NERSC Now and Over the Next Ten Years, presented by NERSC Director Sudip Dosanjh. Recorded in Berkeley at NUG 2013, the annual meeting of the NERSC Users Group.
We do have other collaborations, but our primary mission is serving the needs of our users through high performance computing and extreme data analysis. As Kathy mentioned, we're seeing a growing importance of data at NERSC, and I'll be talking about that throughout the talk. As I mentioned, NERSC is unusual in that we have a long history.
There are other DOE computing facilities that are more recent, but NERSC was established back in 1974, and in 1996 NERSC moved to Berkeley Lab, which had an impact on those communities. HPSS became the mass storage platform in 1999, we established a facility-wide file system in 2005, and then we started our collaboration with the Joint Genome Institute to provide all of their computing back in 2010. A number of these milestones have really had to do with our growing mission in data.
We also work with computer companies to deploy advanced HPC and data resources. We deploy a wide range of different types of systems, often first of a kind, and push the state of the art in technology in different ways. We deployed Hopper, which was one of the first Cray petascale systems, with the new Gemini interconnect.
One thing that's different for us is that we directly support the DOE science mission; we're the primary computing facility for the DOE Office of Science. We allocate about 10% of the resources ourselves, but most of the allocations at NERSC are really done by the Office of Science. The six program offices allocate their base allocations, and then, when we put a new system on the floor, they can submit proposals for over-target allocations, and the Deputy Director for Science prioritizes those over-target requests.
So it's really about serving the needs of the DOE Office of Science. The chart up there shows the breakdown among the different offices, and the other thing is that the usage shifts as DOE priorities change; what we notice, for instance, is that materials science has gone up over the last 10 years.
There have been a number of notable accomplishments using NERSC resources. Simulations at NERSC were key to two Nobel Prizes, in 2007 and 2011. Data resources and services played an important role in two of Science Magazine's top 10 breakthroughs of 2012, Smithsonian Magazine's five surprising scientific milestones of 2012, and four of Science Magazine's insights of the last decade.
The other thing to note here is that a number of these, in addition to simulation, have involved data. Both the discovery of the Higgs boson and the measurement of the theta-13 neutrino mixing angle were focused on data. The three genomics results at the bottom were focused on data. And there's a supernova that was caught within hours of its explosion in 2011.
That was data transferred from the telescope to the NERSC systems and analyzed, and telescopes from around the world were redirected that same night. So we are seeing this shift that Kathy mentioned. The other thing that's really different for us is that we support a very broad user base: we have 4,500 users and we typically add 350 users per year.
The user base is geographically distributed; we have users in 47 states, and we have multinational projects, so we have users around the world. We have 10 states with over 100 users and 13 states with 50 to 99 users. The story I always tell is that I was on a Southwest flight wearing my Hopper shirt, I hadn't shaved, I had my tattered jeans on, and someone came up to me and said, "Do you work at NERSC?" So apparently we have users everywhere.
Other facilities might have a dozen or a few dozen users, with maybe a dozen code teams and codes that they worry about. We have 600 codes, and in terms of algorithms we have all kinds of different things, ranging from fusion to density functional theory to climate to lattice QCD, so we have to serve the very broad needs of this community. We also have people running at all kinds of different scales, as Kathy mentioned.
This is showing the job size breakdown on Hopper, which has about 153,000 cores. In red are jobs that use over 65,000 cores, and you can see that we have lots of people using over 65,000 cores and over 15,000 cores, and then we have very high volumes of smaller simulations. So we have to be able to support this very diverse workload, and we really have an operational priority, which is providing highly available HPC resources backed by exceptional user support.
We try to maintain very high availability of our resources, so we always have one large HPC system available at all times, and we try to have two systems on the floor if at all possible, because it usually takes several months to get one of these systems stabilized. Right now, both Argonne and Oak Ridge have been upgrading their systems and they've not been available for a period of time; you'll notice we don't do that, and really we can't do that.
Given our mission and our user base, our goal is really to maximize the productivity of our users, so we provide one-on-one consulting. This shows the number of tickets over time. We've handled this with essentially constant staff over the last 10 years, and ten years ago we were seeing about 3.4 tickets per user.
We've been asked by DOE to do strategic planning, so we've been busy doing that for several months, and I'm going to talk a little bit about that: what we project as the future needs and challenges, and then our strategy. Richard and Harvey Wasserman do these requirements reviews with the six program offices. There are reviews every three years with each office, and a number of you have probably attended some of those. The program managers invite a representative set of users.
Richard and Harvey work hard to have them identify their science goals and representative use cases, and based on those use cases they try to back out what the requirements are; then they rescale the estimates to account for users that are not at the meeting. We aggregate the results across the six offices, and then we try to validate them against other sources, including what we hear at this meeting.
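As a rough way to formalize that rescale-and-aggregate step (this is my reading of the process, not a formula from the reviews), you can think of it as scaling the attendees' stated needs by the share of each office's allocation they represent, then summing over the six offices:

\[
R_{\text{total}} \;\approx\; \sum_{o=1}^{6} \left( \sum_{i \in \text{attendees}(o)} r_i \right) \cdot \frac{A_o}{\sum_{i \in \text{attendees}(o)} a_i}
\]

where \(r_i\) is user \(i\)'s projected requirement, \(a_i\) is that user's current allocation, and \(A_o\) is office \(o\)'s total allocation.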
This tends to underestimate the need, because we're missing future users. But if we project out, the black line there is the historical NERSC trend over time of the computing that's available to the users. This is normalized in terms of Hopper, which is 1.3 petaflops, so one unit there is one Hopper-year, and it shows that over time the growth has been pretty linear.
We've been going up pretty linearly on this logarithmic scale, and shown in red are the actual hours delivered. When we deploy Edison, this is where we're going to be with Hopper and Edison. What's going on is that, with general-purpose x86 processors, it's going to get harder and harder to stay on this trend line.
We made the strategic decision with Edison to make it an x86-based system, because the users, in general, really weren't ready for GPUs or accelerators. But you can see that if we continue down that path we would fall below the historical trend line, so with NERSC-8 we're really looking at some kind of system with a more energy-efficient architecture.
That would let us get back closer to the trend, and we have a range here that depends on the budget; at the higher end of the budget we'll be able to get back on the trend line. Of course, the important thing is really what the users need, and it turns out that the users need a lot more than that. This is the aggregation from the requirements reviews.
If you plot this on a logarithmic scale, as I was discussing with Richard, you don't quite see it as much, but if we plot it on a linear scale, you can see that in the NERSC-8 time frame, if we were at the lower end here, we would be more than a factor of five below what's been identified as the science needs of the different offices.
The other thing we've been looking at is the data traffic into and out of NERSC, and that's also going up linearly on this logarithmic plot. There are a couple of notable things. One is that you see this slight drop here, but that was really because of some improvements in software and TCP auto-tuning. Then you saw this jump as we started to see more traffic; this is really from high energy physics. And this doesn't count the JGI traffic, so there would be another step on top of that.
What people are doing is transferring away lots of data, and we've gone up to a petabyte per month in terms of data traffic out. But for the last four years we've actually seen more data coming in than going out, and a number of times we've seen more than a petabyte per month. This plot includes JGI, whereas the previous one didn't, but we're seeing really staggering amounts of data coming into NERSC.
This is memory, this is processors, and this is instruments, things like sequencers and detectors. We're used to thinking that processors are on this Moore's Law curve, and that's a very fast rate of improvement, but instruments are improving at a much faster rate than Moore's Law.
If you look at things like cost per genome, it's dropping much faster than Moore's Law, and if you look at expected data production rates from things like the light sources, we're seeing that they're going to get up to terabits per second in the next five to ten years. When we were talking to the light sources, they were projecting their data needs: they're at about 65 terabytes per year now.
In 2009 they were expecting to get to about 1.9 petabytes per year in 2013, and if you just extrapolate this trend line out, they'd be up to exabytes in 2021. And there are other communities that we deal with that are going to be generating hundreds of petabytes of data that they need to analyze, and what they're seeing is that in a lot of cases you really can't analyze all the data.
And you really can't compare across data sets; a lot of scientific discovery comes from comparing across data sets, and they have very limited ability to do that right now. I won't spend as much time on this, but, as Kathy was pointing out, the computer industry roadmaps are not going to meet the mission needs that we're seeing here; there are great challenges with the technology as we go forward.
This would be the same amount of computing as that, but when you actually try to program it, and you look at the amount of memory that's available and the memory bandwidth, it's going to be a real challenge to get the same scientific productivity out of this system that you got out of the previous one. We really need to meet these challenges through hardware and software: we're going to need to rewrite some of the codes, but we're also going to need to influence the computer industry.
What I'm showing here is that we've been on this trend line which, as Kathy was showing, has been exponentially improving, but we're really beginning to see a fall-off in terms of what people are actually achieving on these systems; it's not keeping up.
That's the backdrop as we begin preparing for NERSC-8, and the other thing we're doing is really looking at how we can influence the computer industry to ensure that these systems can meet the mission and science needs of the Office of Science. Our second objective is to increase the productivity, usability, and impact of DOE user facilities by providing comprehensive data systems, and it's not just hardware; it's also the software and services that are needed to be able to do this.
Kathy already mentioned the new facility. Deploying this facility is critical in terms of being able to provide both the power and the space that's needed. This was a shot taken just down the hill; the retaining wall is in place and the foundation is being completed. Within the next couple of months the foundation will be completed and they'll start working on the structure. The plan is to move in early 2015, in the first quarter of 2015.
In terms of the different objectives that I mentioned, the first was providing usable exascale computing and storage systems. As I mentioned, we made NERSC-7 an x86-based system, but NERSC-8 will be our first pre-exascale system; we'll have another pre-exascale system, NERSC-9, in 2019, and an exascale system in 2023. Our strategy is pretty much what we've been doing in the past, which is having an open competition for the best solution, so we don't pick ahead of time which systems we're going to buy.
We really run a competition to see what people propose and try to pick the best of those. We focus on the performance of a broad range of applications, not a single benchmark. Our goal is not to build the best Linpack machine; it's really to build a system that works on our broad range of applications, because of the diversity in the codes and the algorithms.
We really need general-purpose architectures. What's new is that we want to do earlier procurements so we can have a greater influence on the design. There are these DOE FastForward and DesignForward efforts that I'm very involved with; these are collaborations with processor and memory companies, with system integrators, and with interconnect companies, and we're working very closely with them so that the research they're doing benefits DOE applications.
We want to provide support for legacy code, although that's going to be at less than optimal performance, and we would like to be able to get reasonable performance with MPI plus OpenMP, at least in the near term. NERSC-8 will support other programming models, and we're really not pre-selecting those; that will be based on the procurement that we're doing. We're also going to support optimized libraries so that, hopefully, people can get some of the performance just by using libraries that are highly tuned and optimized for these systems.
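To make the MPI-plus-OpenMP point concrete, here is a minimal, hypothetical sketch of that hybrid style (not a NERSC code; the problem, sizes, and build line are illustrative assumptions): MPI ranks provide the inter-node parallelism while OpenMP threads share the work within each rank.

/* Hypothetical hybrid MPI + OpenMP sketch; build with something like:
 *   mpicc -fopenmp hybrid_sum.c -o hybrid_sum
 * The problem (summing a range of integers) is purely illustrative. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;
    /* FUNNELED threading: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const long n = 1000000;          /* elements handled per rank (illustrative) */
    double local = 0.0;

    /* On-node parallelism: OpenMP threads share this rank's portion of the work. */
    #pragma omp parallel for reduction(+:local)
    for (long i = 0; i < n; i++) {
        long gi = (long)rank * n + i;    /* global index across all ranks */
        local += (double)gi;
    }

    /* Inter-node parallelism: MPI combines the per-rank partial sums. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("ranks=%d threads/rank=%d total=%.0f\n",
               nranks, omp_get_max_threads(), total);

    MPI_Finalize();
    return 0;
}

The appeal of this style is that the same code can run one MPI rank per node with many OpenMP threads, or many ranks with few threads, depending on what works best on a given architecture.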
Longer term, we really need a broader effort to converge on the next programming model, and that's something we're also looking at: how can we drive this? MPI plus OpenMP is probably more evolutionary, but we want to leave room for something that's revolutionary and much better. In some sense we are very focused on performance, on improving the performance of our codes on next-generation architectures, but programmability is really critical as well; we don't want systems that are next to impossible to program.
In terms of transitioning the codes, we're beginning to deploy testbeds. There are a number of testbeds at NERSC to help you, and us, gain experience with new technologies and to better understand some of the trade-offs. We're going to have in-depth collaborations with some selected users, and we're trying to cover the algorithm space with those.
I think it's really critical to note again that all the users will be impacted. When I get asked what fraction of those 600 codes will eventually need to make this transition, I think eventually all of them have to, because otherwise you're going to be stuck at today's performance levels. You're not going to see this Moore's Law improvement in your codes unless you're able to make the leap to next-generation architectures.
As I mentioned, we also want to influence industry more. We want to make sure that these future systems meet our needs and are more programmable and reliable. As part of that, we're partnering with Los Alamos and Sandia on our procurements in 2015 and 2019. We're already seeing that the larger size of these procurements is giving us more leverage and more interest: we had 10 different companies respond to our draft RFP.
We want to provide industry with greater information on our workload. There are all these co-design efforts, but in some sense companies like Intel and NVIDIA are really going to be more influenced by what the overall workload looks like than by one particular application. Those co-design efforts are important, but we also need to provide them with broader information through things like instrumentation and measurement. As I mentioned, we're already actively engaging with FastForward and DesignForward.
We have the computer architecture lab that's been established by ASCR; that's a Berkeley and Sandia collaboration. We want to serve as a conduit for information flow between the computer companies and our user community. In terms of our extreme data strategy, we're partnering with DOE experimental facilities to identify some of the requirements and create some early successes. We're developing and deploying new data resources already, but our plans are to deploy systems that are really focused on data in the in-between years, so in 2017 and 2021 we would deploy systems that are really focused on data analysis.
We want to provide a new class of HPC expertise. What we would like to do is enable people to rely on NERSC for data analysis the same way they currently do for computation, and we really have a unique opportunity here, with ESnet and all the ASCR research that's funded, to create end-to-end solutions in this space.
We think that's going to continue into the future, so we will have these compute-intensive architectures where the goal is to maximize the computational density and local bandwidth for a given power and cost constraint, and we're going to try to maximize the bandwidth density near the compute.
For data-intensive architectures, the goal is really to get the maximum data capacity and global bandwidth for a given power and cost constraint, so you want to bring more storage near the compute or, conversely, embed more compute near the storage. This also requires very different software and programming environments; people are interested in running databases, for instance.
One system would be the natural follow-on to the data systems that we currently have, deployed in 2017 and 2021. In blue, what I'm showing is the aggregate need that's been identified in the various requirements meetings; the yellow line is the historical trend line for NERSC; the light blue would be if we're limited by our current budget and power; and the red would be if we're limited only by budget.
In that case it's really the largest system that we could buy for a certain amount of money. In green is if we're limited by power; ASCR has told us that each of the computing centers should plan for a maximum of 30 megawatts, so we would be reaching 30 megawatts out here, and this would be that trend line. The bottom line here is that if we want to get anywhere near what the users have told us they need,
we really need to be much more aggressive in deploying some of the hardware that's projected for exascale systems, and we really need some active research with industry to try to push this curve up. For environmental reasons, and just for cost, it's very unlikely that anyone is ever going to want to deploy more than 30 megawatts of computing, and even staying at that level, which is a lot, we're going to be well short. So I'll just close
by saying that we do have a strategy and a plan for meeting the ever-growing computing and storage needs that we've identified with the community, and we really want to enable the science teams with the nation's largest data-intensive challenges to rely on NERSC to the same degree they already do for modeling and simulation. So I'll close with that. Hopefully that gave you some ideas of where we're headed.