From YouTube: Intro to GPU: 01 Why GPUs
So, you know, NERSC is the mission HPC center for the Department of Energy's Office of Science, and so what that means is that predominantly our mission is to advance science. And what the scientists continually tell us is that they need more and more cycles, more and more compute resources and storage resources, to stay competitive in sort of a global science.
And so this slide is just kind of a motivation of how and why that change is coming about. It's largely driven by the consumption of power and heat dissipation, which is pushing hardware vendors towards kind of lightweight cores and what I was describing as exascale-like architectures. So this plot here shows a trajectory of energy per flop over time, and you can see that these two flat lines here are basically business as usual, whereas the many-core and heterogeneous computing lines are down here.
As you can see, you make a substantial, several-orders-of-magnitude increase in capability by kind of switching from traditional heavyweight server processors to these lightweight processors. And we started this transition with the Cori system at NERSC, which is largely powered by these Intel Knights Landing many-core processors, and, you know, what we found is that Cori has been a boon to science in the U.S.
So, as further motivation for moving towards GPUs in particular: as we think about replacing the Edison system, which was decommissioned around the middle of last year, with the upcoming Perlmutter system, you can see the potential of GPUs to really increase our energy efficiency and the total capability that we can provide to the users. So this is an example of a code running at different scales on the Edison system
and on the Summit system, which has the current generation of NVIDIA GPUs, for a few different problem sizes. If you compare, for example, these blue squares with the red squares, and the blue circles with the red circles, you can see that we're essentially achieving an order of magnitude in energy efficiency, which is along the diagonal in this plot. The y axis is time, the x axis is power, and so time times power is energy.
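Since the plot itself isn't visible in the transcript, here is a minimal sketch of the relationship being described, with made-up placeholder numbers rather than the measured Edison and Summit values: energy is runtime times average power, so points of equal energy lie along the diagonal of a time-versus-power plot.

```python
# Energy consumed by a run is runtime multiplied by average power draw.
# The numbers below are illustrative placeholders, not measured values
# from the slide.

def energy_kj(runtime_s: float, power_kw: float) -> float:
    """Energy in kilojoules: seconds x kilowatts."""
    return runtime_s * power_kw

# Hypothetical CPU run: slower at comparable power -> more energy.
cpu_energy = energy_kj(runtime_s=1000.0, power_kw=300.0)
# Hypothetical GPU run: ~10x faster at the same power draw.
gpu_energy = energy_kj(runtime_s=100.0, power_kw=300.0)

print(f"CPU energy: {cpu_energy:.0f} kJ")
print(f"GPU energy: {gpu_energy:.0f} kJ")
print(f"Energy-efficiency gain: {cpu_energy / gpu_energy:.0f}x")
```

A 10x reduction in runtime at equal power is exactly a 10x reduction in energy, which is the order-of-magnitude gap the blue and red markers show.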
So, if you look along the diagonal, you get the average energy used for the simulation. This is, I think, really exciting: there's a lot of potential gain here as we go from the Edison system that just retired to the upcoming Perlmutter system. So NERSC users have been demonstrating kind of groundbreaking science on the KNL system,
these large-scale, energy-efficient computers. And I want to say that modernizing codes is possible. I mentioned a couple of slides back that, while Cori has been a boon, it also requires effort on behalf of the code teams, and what we found is that it's definitely possible. We kicked off the NESAP program for Cori and found that, on average, when these teams looked at their application's performance, analyzed it, and improved it,
they ended up with, on average, about a 3x improvement. And one of the other takeaways is that when you improve your application targeting one of these sort of exascale-like architectures, you end up learning things about your performance, learning things about your code, that end up improving it everywhere. So even the code running back on a more traditional HPC system like Edison ended up being about two times faster after the changes that you make.
OK! So let's talk about where this increase in performance is coming from on the exascale architectures. On KNL and GPUs, getting performance kind of relies on you effectively using the increased parallelism that is coming in the processors. So, for example, you have on the order of a hundred cores per processor, per chip (or I think the equivalent might be SMs on a GPU), with many
of what I would call hyperthreads on KNL, or warps on a GPU, to hide any latency. In the case of KNL, each one of those cores had what we call a vector processing unit that could process eight-double-precision-wide vectors at a time. So you could basically, instead of operating on a single number, operate on a vector of numbers.
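As a rough illustration of that idea, here is pure Python standing in for what the hardware does in a single instruction: a vector unit applies one operation to a whole chunk of numbers at once instead of one at a time, and the chunk width of 8 below matches the eight-double-wide vectors just described.

```python
VECTOR_WIDTH = 8  # doubles per vector operation, as on KNL

def scalar_scale(data, factor):
    # One number per "instruction": the traditional scalar view.
    return [x * factor for x in data]

def vector_scale(data, factor):
    # One chunk of VECTOR_WIDTH numbers per "instruction": each slice
    # models a single vector operation over all of its lanes.
    out = []
    for i in range(0, len(data), VECTOR_WIDTH):
        chunk = data[i:i + VECTOR_WIDTH]
        out.extend(x * factor for x in chunk)  # conceptually one vector op
    return out

data = [float(i) for i in range(16)]
assert scalar_scale(data, 2.0) == vector_scale(data, 2.0)
```

Both produce the same answer; the point is that the vectorized loop needs 8x fewer "instructions" to do it, which is where the hardware's speedup comes from.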
When you go to a GPU, that's basically a 32-wide vector that I guess we would typically call a warp, and then there are multiple flops available per vector lane using sort of advanced instructions like FMAs, which stands for fused multiply-adds. So you can do a multiply then an add, essentially in one cycle, as well as tensor instructions on the GPUs.
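To see how these factors multiply into a peak flop rate, here is a back-of-envelope sketch; the chip parameters are made up for illustration and are not Cori's or Perlmutter's actual specs. Counting an FMA as two flops per lane per cycle doubles the peak relative to separate multiply and add instructions.

```python
def peak_gflops(cores, vector_lanes, flops_per_lane_per_cycle, clock_ghz):
    """Back-of-envelope peak rate: cores x lanes x flops/lane/cycle x clock."""
    return cores * vector_lanes * flops_per_lane_per_cycle * clock_ghz

# Hypothetical KNL-like chip: 64 cores, 8-wide double-precision vectors,
# an FMA counted as 2 flops per lane per cycle, running at 1.3 GHz.
print(peak_gflops(64, 8, 2, 1.3))  # ~1331 GFLOP/s
```

Dropping FMA (2 -> 1 flop per lane per cycle) halves that number, which is why compilers and kernels work hard to keep the fused-multiply-add units busy.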
So, as a short way to describe this change from sort of traditional CPUs to GPUs on the throughput extreme: you can see the parallelism increasing across every single one of these lines, going from 64 cores, to hundreds of hardware threads, to potentially 64 warps per SM.
When we thought about the procurement of Perlmutter, we spent a lot of time determining how the workload would benefit from the GPUs, and we kind of did this analysis of the workload's readiness. This pie chart on the left here shows the breakdown of the NERSC workload by cycles across the different codes that are used at the center.
So the good news is that a good fraction of the codes that are at the center's core are already GPU-enabled, and then there are some down here where, you know, there's still work to be done, and in some cases we think it could be a challenge to get the GPUs to work.
This was actually, I think, the first time that we've named a system after somebody who's still alive, but I think that Saul was fairly humble. I think one of the things that he was worried about was whether people would have to type his entire last name every time they ssh to Perlmutter, and so I guess he and our director made a compromise: you would be able to log in with just saul.nersc.gov instead of the entire perlmutter.nersc.gov.
So we designed this from kind of the beginning as a system optimized for science, and as I said, part of our mission is to make sure that we can deliver the capability that the science community relies on. So a large fraction of the system will be GPU-accelerated, but there will be some CPU-only nodes to meet the needs of some of the large-scale simulation and data analysis projects that we think are going to require some time before they can port to the GPUs.
So, to tell you a little bit more about the specs of the system: here is the breakdown of the CPU nodes, and also, essentially, the CPU parts of the GPU nodes will be using the next-generation AMD Milan CPU. These are the specs for the current generation, and so for what you can expect, you can kind of put a greater-than-or-equal sign on them.
Essentially, you know, just assume bigger, better, faster for the next generation. And then here is what we're expecting for the GPU. So we'll have a configuration with one CPU and four GPUs per node. These again are the current-generation Volta specs, and so for the next generation I think you can again kind of put a greater-than-or-equal sign and just expect somewhat bigger, better, faster, but the Volta-next product hasn't been formally announced yet.
Okay, and so, as I said earlier, we've begun this process with NESAP, working with our teams on a number of the applications, getting them ready particularly for the GPU partition of Perlmutter. And so this is some of the early progress that we've been making, and it helps answer sort of: why GPUs?
You know, pretty good progress, in that the overall scientific throughput will go up pretty significantly. There are a couple of cases, you know, a challenging code here, for example ATLAS, where currently they're projecting actually worse performance on the GPUs than the CPUs, but
that's actively being tackled. And if we look at the different categories: we essentially have six categories, or different types of applications, and if we compare their projected GPU-to-CPU node performance on Perlmutter in this plot here, we can see that there's a significant performance increase projected for the GPU nodes over the CPU nodes for at least a representative app in each one of the categories.
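One way to see how per-app speedups like these translate into overall scientific throughput is to weight each app by its share of machine cycles; all the shares and speedups below are invented placeholders, not the actual NERSC workload data from the slide.

```python
# Aggregate workload speedup from per-app speedups, weighted by each
# app's share of machine cycles. All numbers here are hypothetical.

def workload_speedup(shares_and_speedups):
    """If app i uses share s_i of cycles and gets speedup x_i, total
    time shrinks to sum(s_i / x_i); throughput gain is the inverse."""
    remaining = sum(share / speedup for share, speedup in shares_and_speedups)
    return 1.0 / remaining

apps = [
    (0.5, 5.0),   # half the cycles, 5x faster on GPUs (hypothetical)
    (0.3, 9.0),   # e.g. a grids-of-particles-style app at 9x
    (0.2, 1.0),   # not yet ported: no speedup
]
print(f"{workload_speedup(apps):.2f}x overall")
```

Note how the unported 20% dominates: even with 5x and 9x app speedups, the aggregate here is only about 3x, which is why getting every category at least somewhat GPU-ready matters so much.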
A
It's
not
surprising
to
see
machine
learning,
really
high
I
think
everybody
knows
that
machine
learning
runs
really
well
on
the
GPUs.
It's
it's
I
think
it's
really
great
to
see
apps
in
each
of
these
other
categories
high,
you
know,
even
even
the
grids
of
particles,
I,
think
this
number
ends
up
being
about
a
9x
speed-up
and
that's
a
pretty
challenging
category.
That's
where
we
include
like
the
climate,
apps,
the
block,
structured,
great
apps
and
like
the
pic
and
particle
and
cell
codes,
for
example,
as
well
as
one
example
of
early
progress.
I'll just highlight this TomoPy application. So TomoPy is a tomographic reconstruction code that is used at, I think, the Advanced Photon Source at Argonne National Lab. Essentially they have a bunch of 2D images, where they kind of rotate a sample in front of a camera, and they try to reconstruct the 3D volume.
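The core idea of that reconstruction can be sketched in a few lines. This is a toy, unfiltered back-projection on a tiny 2D slice with just two axis-aligned views; it is hypothetical and much simplified relative to what TomoPy actually implements (many angles, interpolation, filtering), but it shows the principle: each projection measures sums of the object along one direction, and back-projection smears those sums back across the grid from every angle.

```python
# Toy unfiltered back-projection of a 2D slice from two axis-aligned
# views (0 and 90 degrees). Real reconstruction codes use many angles
# and filtering; this only illustrates the core idea.

def project_rows(img):   # 0-degree view: sum along each row
    return [sum(row) for row in img]

def project_cols(img):   # 90-degree view: sum along each column
    return [sum(col) for col in zip(*img)]

def back_project(row_sums, col_sums):
    n = len(row_sums)
    # Smear each measured sum back across the pixels its ray crossed.
    return [[row_sums[i] + col_sums[j] for j in range(n)] for i in range(n)]

phantom = [   # a bright 2x2 block in an otherwise empty slice
    [0, 0, 0, 0],
    [0, 9, 9, 0],
    [0, 9, 9, 0],
    [0, 0, 0, 0],
]
recon = back_project(project_rows(phantom), project_cols(phantom))
# The bright block reappears as the largest values in the reconstruction.
```

With only two angles the result is blurry, which is why real instruments take projections at many rotation angles, and why the per-angle work parallelizes so naturally onto GPUs.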
And, you know, one of the things I want to highlight is that this wasn't entirely just a straight port of the code, in the sense of "let's slap a few directives here and there": they actually did change the algorithm kind of fundamentally to use the GPUs. And I think, in our experience, we've found that, you know, sometimes you can.
So I'm just going to close here with a few practical notes about how you can go about using the GPUs on Cori, or sorry, on the upcoming Perlmutter system. So we have taken a practical approach here at NERSC. We realize that lots of folks have, you know, existing GPU codes or have thought about porting to GPUs in the past, and I think we're basically ready to engage.
You know, there's a compiler available to test on the Cori GPU system, and on Perlmutter in the near future, for this activity. The other thing that I think is important as you're getting started is thinking about what the optimization concepts are in moving towards, you know, an energy-efficient architecture, and GPUs in particular. And I think that in our conversations with users, we've discovered that users kind of want to know the answers to the following questions. So: what part of my code should I move to the GPU?
I'll just basically conclude here with the answer of "why GPUs?". Well, I think the practical answer is basically because they're coming, but I hope I've kind of convinced you that they're coming for exciting reasons, and that Perlmutter is really a system that is optimized for the scientific community. It'll include both these NVIDIA GPU-accelerated nodes, where a large fraction of the capability will be, as well as the CPU-only nodes. And so, with that, I will conclude, and I guess I could take questions here, or any questions.
Yeah, that's a good point. So, you know, some of the specifics of the architecture we can't quite talk about, because, I guess, not all of the products are completely announced. But one of the reasons why we're advocating for OpenMP is because it is kind of a portable approach between the different vendors.
As we talk here, there are two of these that are clearly vendor-specific, and those are CUDA and CUDA Fortran. You know, OpenACC, Kokkos, and RAJA would be potentially good performance-portable options as well. I think maybe your question is just more like a suggestion of something to do throughout the day, is that right?
You know, I would just comment that I do think it's actually nice, in some sense, that the upcoming systems at Argonne and Oak Ridge will also be GPUs, even if they are from different vendors. I think, for the first time in a while, the architectures at least look similar enough that there's kind of hope that you can portably code for all of them.