From YouTube: NVIDIA Quantum: cuQuantum and QODA
Description
Presents cuQuantum slides
Jin-Sung Kim (NVIDIA)
So, yeah, thanks everyone for having me. Thanks, Katie and Don and Neil, for having me today. My name is Jin-Sung Kim; I'm the developer relations manager on the quantum computing team at NVIDIA, formerly a research scientist at IBM Quantum. Nice to see a couple of familiar faces on this call today.
So the two main thrusts that we've been working on are, first, quantum circuit simulation. This is our cuQuantum SDK, which a couple of the previous speakers showed a few benchmarks around; I think there was some really nice work from Xanadu presented earlier in the day. cuQuantum is our SDK for accelerating quantum circuit simulation on NVIDIA GPUs. And then there's the other thing we'll also talk about today.
We'll have a quick presentation by Zohim Chandani on QODA. QODA is our platform for hybrid quantum-classical computation; it enables domain scientists to flexibly integrate quantum resources, whether emulated or actual, into a performant workflow within a single-source C++ environment, with Python bindings coming in the future.
So let me start by talking about cuQuantum. cuQuantum is deployed on Perlmutter now; we actually have a really nice container that Neil very helpfully deployed. The way to think about cuQuantum is that it sits below the layer of the quantum circuit simulator. So imagine that one is programming some sort of quantum computing application.
cuQuantum sits in the layer beneath the quantum circuit simulator and contains two libraries, cuStateVec and cuTensorNet. These libraries allow you to accelerate your computations on a GPU-accelerated back end, and we have some really nice benchmarks showing significant speedups of quantum circuit simulation compared to a single CPU.
So let's dive into the two leading quantum circuit simulation approaches today. The first one is the state vector simulation method, which I'm sure everyone on this call is familiar with. This is the gate-based emulation of a quantum computer, where you maintain the full 2^n state vector in memory, and every time you apply a quantum gate you update that state vector.
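The gate-by-gate update loop described here can be sketched in a few lines of NumPy. This is a toy illustration of the technique, not the cuStateVec API (cuStateVec provides equivalent primitives, GPU-accelerated):

```python
# Minimal state-vector simulation sketch: keep the full 2^n amplitude
# vector in memory and update it every time a gate is applied.
import numpy as np

def apply_1q_gate(state, gate, target, n):
    """Apply a 2x2 gate to qubit `target` of an n-qubit state vector."""
    # Reshape so the target qubit becomes its own axis, contract the
    # gate against that axis, then restore the flat 2^n layout.
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [target]))
    psi = np.moveaxis(psi, 0, target)
    return psi.reshape(-1)

n = 3
state = np.zeros(2**n, dtype=complex)
state[0] = 1.0                       # start in |000>
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
for q in range(n):                   # Hadamard on every qubit
    state = apply_1q_gate(state, H, q, n)
# state is now the uniform superposition: all amplitudes 1/sqrt(8).
```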
This is a very powerful technique for simulating quantum circuits; as I'm sure everyone here knows, you can simulate very deep circuits and very entangled circuits. But there is a hard memory trade-off, in that every time you add a qubit you double the memory required to simulate your system, so there is a practical limit of about 50 qubits that you can simulate, even on a supercomputer.
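That doubling is easy to make concrete with a back-of-the-envelope helper (my own illustration, assuming 16 bytes per complex128 amplitude), which shows why roughly 50 qubits is where even supercomputers run out of memory:

```python
# Each added qubit doubles the 2^n amplitude vector, and hence the memory.
def statevec_bytes(n_qubits: int, bytes_per_amp: int = 16) -> int:
    """Bytes needed to hold a full state vector of complex128 amplitudes."""
    return (2 ** n_qubits) * bytes_per_amp

# 30 qubits: 16 GiB -- fits on a large GPU.
# 40 qubits: 16 TiB -- needs a multi-node cluster.
# 50 qubits: 16 PiB -- beyond today's largest machines.
for n in (30, 40, 50):
    print(n, "qubits:", statevec_bytes(n), "bytes")
```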
There is also a complementary technique based on tensor network methods, and the way I think about this is that you only simulate the states that you need. By optimizing the path over which you contract your tensor network, you can dramatically reduce the memory footprint required in your workflow; with an optimal contraction path you can actually simulate hundreds or even thousands of qubits for many practical quantum circuits.

Just to illustrate the phase space that these two complementary techniques occupy, picture a qubits-versus-circuit-depth diagram. With the state vector method you can obviously simulate maybe a few tens of qubits, up to something on the order of 50, but you can do very deep circuits. On the other hand, tensor network simulation allows you to do hundreds or even thousands of qubits, at the expense of your circuit depth. In relation to this, current QPUs today occupy the lower-left region of the diagram, but in the future we expect them to be able to explore the unknown and unsimulatable region of this phase space.
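The effect of contraction-path choice can be demonstrated even with NumPy's built-in path optimizer, a small CPU stand-in for what cuTensorNet does at scale (the network below and its sizes are my own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# A chain of matrices with small outer dimensions and a fat middle:
# (2x200)(200x200)(200x200)(200x2). Contracted in a good order, the
# intermediates stay tiny; evaluated as one naive multi-index sum,
# the cost grows with the product of all the index ranges.
a = rng.standard_normal((2, 200))
b = rng.standard_normal((200, 200))
c = rng.standard_normal((200, 200))
d = rng.standard_normal((200, 2))

expr = "ij,jk,kl,lm->im"
path, report = np.einsum_path(expr, a, b, c, d, optimize="optimal")
result = np.einsum(expr, a, b, c, d, optimize=path)
# `report` is a human-readable summary that includes the FLOP estimate
# and the largest intermediate produced along the chosen path.
```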
So let me talk about the DGX cuQuantum Appliance. This is our container that is currently deployed on Perlmutter; it's currently integrated with Cirq as a front end, and it actually supports multi-GPU capabilities.
In our initial release of the DGX cuQuantum Appliance back in June, we had some really nice benchmarks showing almost a 100x speedup on a couple of different quantum algorithms of interest, and we showed some really nice strong-scaling measurements: up to eight GPUs, getting almost a 90x increase in speed.
Since then, we've put in a couple of performance optimizations, solely in software, so in our most recent release we're getting almost a 300x increase in performance for the quantum Fourier transform at 32 qubits. This is using all eight GPUs in a DGX A100 box. So really nice strong scaling and really nice performance benchmarks.
Overall, what I'll say about the container coming out in the future is this: we have a container that is slated to be released next month, in Q4, and in this next release we'll actually support Qiskit integration with multi-node and multi-GPU support.
We have some initial benchmarks of our multi-node performance: we're simulating up to 40 qubits on DGX A100 nodes, on 256 GPUs, and we're showing pretty nice weak-scaling measurements here; the execution time remains under a minute. We're also showing some really nice strong-scaling measurements for 32 qubits, again on 256 GPUs. These are for quantum volume, QAOA, and quantum phase estimation.
So really nice performance benchmarks supporting multi-node. What we also want to show is this record-breaking performance for simulating a quantum volume circuit of depth 10: about three and a half times faster than a 64-node CPU cluster, on just two DGX A100s.
Those were all benchmarks for cuStateVec. For cuTensorNet, as I mentioned before, if you do some optimization upfront and find the optimal contraction path, you can dramatically reduce the cost of your computation in terms of memory and performance.
So there are two things that we like to characterize tensor networks with: one is the quality of the path, which is the total contraction cost, and the other is the time to find this optimal contraction path, or the pathfinding time.
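Both metrics can be read off directly even in a small CPU example, here with NumPy's optimizer standing in for cuTensorNet's (the network shape is my own illustration):

```python
import time
import numpy as np

rng = np.random.default_rng(1)
# A short chain of rectangular matrices, to give the optimizer a choice.
shapes = [(4, 64), (64, 64), (64, 64), (64, 4)]
ops = [rng.standard_normal(s) for s in shapes]
expr = "ab,bc,cd,de->ae"

# Metric 1: pathfinding time -- how long the optimizer itself runs.
t0 = time.perf_counter()
path, report = np.einsum_path(expr, *ops, optimize="optimal")
pathfinding_seconds = time.perf_counter() - t0

# Metric 2: path quality -- the cost of the contraction order it found.
# The report string includes the optimized FLOP count and the speedup
# over a naive single-pass evaluation.
print(f"pathfinding took {pathfinding_seconds:.6f} s")
print(report)
```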
In comparison to some of the state-of-the-art packages out there, comparing against opt_einsum and CoTenGra, cuTensorNet is actually several orders of magnitude better than opt_einsum, and about 20-30x better than CoTenGra, in terms of the total contraction cost.
In terms of the time to find a contraction path, we're about an order of magnitude better than CoTenGra. So really, really nice metrics for our tensor network performance. And I just want to point out one really nice demonstration that we like to show.
Some of our colleagues at NVIDIA research developed a novel variational quantum algorithm and used it to attack a MaxCut problem with a known solution. We were able to scale this up to 20 nodes of our Selene supercomputer, and we were able to solve a 10,000-vertex problem, which corresponds to a 5,000-qubit simulation, with 93% accuracy. So really nice results, and there is still room to improve this performance even further, since we only used 20 nodes.
In terms of our ecosystem, we're partnering broadly across the ecosystem; we aim to partner with everyone and all simulators within it. We have a variety of industrial partners, we're partnering with all sorts of quantum startups, and we have integrations with all the major computing frameworks as well as a lot of HPC centers. So, in summary: cuQuantum is available today, and we support state vector and tensor network methods.