From YouTube: NVIDIA Quantum: cuQuantum and QODA
Description
Presents cuQuantum slides
Jin-Sung Kim (NVIDIA)
So, yeah, thanks everyone for having me. Thanks, Katie and Don and Neil, for having me today. My name is Jin-Sung Kim; I'm the developer relations manager on the quantum computing team at NVIDIA, formerly a research scientist at IBM Quantum. Nice to see a couple of familiar faces on this call today.
So the two main thrusts that we've been working on are, first, quantum circuit simulation. This is our cuQuantum SDK, which a couple of the previous speakers showed a few benchmarks around; I think there was some really nice work from Xanadu presented earlier in the day. cuQuantum is our SDK for accelerating quantum circuit simulation on NVIDIA GPUs. And then there's the other thing we'll also talk about today.
We'll have a quick presentation by Zohim Chandani on QODA. QODA is our platform for hybrid quantum-classical computation; it enables domain scientists to flexibly integrate quantum resources, whether emulated or actual, into a performant workflow within a single-source C++ environment, with Python bindings coming in the future.
So let me start by talking about cuQuantum. cuQuantum is deployed on Perlmutter now; we actually have a really nice container that Neil very helpfully deployed. The way to think about cuQuantum is that it sits below the layer of the quantum circuit simulator. So imagine that one is programming some sort of quantum computing application.
cuQuantum sits in the layer beneath the quantum circuit simulator and contains two libraries, cuStateVec and cuTensorNet. These libraries allow you to accelerate your computations on a GPU-accelerated back end, and we have some really nice benchmarks showing significant speedups of quantum circuit simulation compared to a single CPU.
So let's dive into the two leading quantum circuit simulation approaches today. The first one is the state vector simulation method, which I'm sure everyone on this call is familiar with. This is the gate-based emulation of a quantum computer, where you maintain the full 2^n state vector in memory, and every time you apply a quantum gate you update that state vector.
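The gate-by-gate update loop described here can be sketched in a few lines of NumPy. This is a toy illustration of the technique, not the cuStateVec API (cuStateVec provides equivalent primitives, GPU-accelerated):

```python
# Minimal state-vector simulation sketch: keep the full 2^n amplitude
# vector in memory and update it every time a gate is applied.
import numpy as np

def apply_1q_gate(state, gate, target, n):
    """Apply a 2x2 gate to qubit `target` of an n-qubit state vector."""
    # Reshape so the target qubit becomes its own axis, contract the
    # gate against that axis, then restore the flat 2^n layout.
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [target]))
    psi = np.moveaxis(psi, 0, target)
    return psi.reshape(-1)

n = 3
state = np.zeros(2**n, dtype=complex)
state[0] = 1.0                       # start in |000>
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
for q in range(n):                   # Hadamard on every qubit
    state = apply_1q_gate(state, H, q, n)
# state is now the uniform superposition: all amplitudes 1/sqrt(8).
```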
This is a very powerful technique for simulating quantum circuits; as I'm sure everyone here knows, you can simulate very deep circuits and very entangled circuits. But there is a hard memory trade-off, in that every time you add a qubit you double the memory required to simulate your system, so there is a practical limit of about 50 qubits that you can simulate, even on a supercomputer.
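That doubling is easy to make concrete with a back-of-the-envelope helper (my own illustration, assuming 16 bytes per complex128 amplitude), which shows why roughly 50 qubits is where even supercomputers run out of memory:

```python
# Each added qubit doubles the 2^n amplitude vector, and hence the memory.
def statevec_bytes(n_qubits: int, bytes_per_amp: int = 16) -> int:
    """Bytes needed to hold a full state vector of complex128 amplitudes."""
    return (2 ** n_qubits) * bytes_per_amp

# 30 qubits: 16 GiB -- fits on a large GPU.
# 40 qubits: 16 TiB -- needs a multi-node cluster.
# 50 qubits: 16 PiB -- beyond today's largest machines.
for n in (30, 40, 50):
    print(n, "qubits:", statevec_bytes(n), "bytes")
```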
There is also a complementary technique based on tensor network methods, and the way I think about this is that you only simulate the states that you need. By optimizing the path over which you contract your tensor network, you can dramatically reduce the memory footprint required in your workflow; with an optimal contraction path you can actually simulate hundreds or even thousands of qubits for many practical quantum circuits.

Just to illustrate the phase space that these two complementary techniques occupy, picture a qubits-versus-circuit-depth diagram. With the state vector method you can obviously simulate maybe a few tens of qubits, up to something on the order of 50, but you can do very deep circuits. On the other hand, tensor network simulation allows you to do hundreds or even thousands of qubits, at the expense of your circuit depth. In relation to this, current QPUs today occupy the lower-left region of the diagram, but in the future we expect them to be able to explore the unknown and unsimulatable region of this phase space.
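The effect of contraction-path choice can be demonstrated even with NumPy's built-in path optimizer, a small CPU stand-in for what cuTensorNet does at scale (the network below and its sizes are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# A chain of matrices with small outer dimensions and a fat middle:
# (2x200)(200x200)(200x200)(200x2). Contracted in a good order, the
# intermediates stay tiny; evaluated as one naive multi-index sum,
# the cost grows with the product of all the index ranges.
a = rng.standard_normal((2, 200))
b = rng.standard_normal((200, 200))
c = rng.standard_normal((200, 200))
d = rng.standard_normal((200, 2))

expr = "ij,jk,kl,lm->im"
path, report = np.einsum_path(expr, a, b, c, d, optimize="optimal")
result = np.einsum(expr, a, b, c, d, optimize=path)
# `report` is a human-readable summary that includes the FLOP estimate
# and the largest intermediate produced along the chosen path.
```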
So let me talk about the DGX cuQuantum Appliance. This is our container that is currently deployed on Perlmutter; it's currently integrated with Cirq as a front end, and it actually supports multi-GPU capabilities.
In our initial release of the DGX cuQuantum Appliance back in June, we had some really nice benchmarks showing almost a 100x speedup on a couple of different quantum algorithms of interest, and we showed some really nice strong-scaling measurements: up to eight GPUs, getting almost a 90x increase in speed.
Since then, we've put in a couple of performance optimizations, solely in software, so in our most recent release we're getting almost a 300x increase in performance for the quantum Fourier transform at 32 qubits. This is using all eight GPUs in a DGX A100 box. So really nice strong scaling and really nice performance benchmarks.
Overall, what I'll say about the container coming out in the future is this: we have a container that is slated to be released next month, in Q4, and in this next release we'll actually support Qiskit integration with multi-node and multi-GPU support.
We have some initial benchmarks of our multi-node performance: we're simulating up to 40 qubits on DGX A100 nodes, on 256 GPUs, and we're showing pretty nice weak-scaling measurements here; the execution time remains under a minute. We're also showing some really nice strong-scaling measurements for 32 qubits, again on 256 GPUs. These are for quantum volume, QAOA, and quantum phase estimation.
So really nice performance benchmarks supporting multi-node. What we also want to show is this record-breaking performance for simulating a quantum volume circuit of depth 10: about three and a half times faster than a 64-node CPU cluster, on just two DGX A100s.
Those were all benchmarks for cuStateVec. For cuTensorNet, as I mentioned before, if you do some optimization upfront and find the optimal contraction path, you can dramatically reduce the cost of your computation in terms of memory and performance.
So there are two things that we like to characterize tensor networks with: one is the quality of the path, which is the total contraction cost, and the other is the time to find this optimal contraction path, or the pathfinding time.
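Both metrics can be read off directly even in a small CPU example, here with NumPy's optimizer standing in for cuTensorNet's (the network shape is my own illustration):

```python
import time
import numpy as np

rng = np.random.default_rng(1)
# A short chain of rectangular matrices, to give the optimizer a choice.
shapes = [(4, 64), (64, 64), (64, 64), (64, 4)]
ops = [rng.standard_normal(s) for s in shapes]
expr = "ab,bc,cd,de->ae"

# Metric 1: pathfinding time -- how long the optimizer itself runs.
t0 = time.perf_counter()
path, report = np.einsum_path(expr, *ops, optimize="optimal")
pathfinding_seconds = time.perf_counter() - t0

# Metric 2: path quality -- the cost of the contraction order it found.
# The report string includes the optimized FLOP count and the speedup
# over a naive single-pass evaluation.
print(f"pathfinding took {pathfinding_seconds:.6f} s")
print(report)
```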
In comparison to some of the state-of-the-art packages out there, comparing against opt_einsum and CoTenGra, cuTensorNet is actually several orders of magnitude better than opt_einsum, and about 20-30x better than CoTenGra, in terms of the total contraction cost.
In terms of the time to find a contraction path, we're about an order of magnitude better than CoTenGra. So really, really nice metrics for our tensor network performance. And I just want to point out one really nice demonstration that we like to show.
Some of our colleagues at NVIDIA research developed a novel variational quantum algorithm and used it to attack a MaxCut problem with a known solution. We were able to scale this up to 20 nodes of our Selene supercomputer, and we were able to solve a 10,000-vertex problem, which corresponds to a 5,000-qubit simulation, with 93% accuracy. So really nice results, and there is still room to improve this performance even further, since we only used 20 nodes.
In terms of our ecosystem, we're partnering broadly across the ecosystem; we aim to partner with everyone and all simulators within it. We have a variety of industrial partners, we're partnering with all sorts of quantum startups, and we have integrations with all the major computing frameworks as well as a lot of HPC centers. So, in summary: cuQuantum is available today, and we support state vector and tensor network methods.