From YouTube: Large scale Hybrid Quantum Workflows with PennyLane
Lee J. O'Riordan (Xanadu)
So: large-scale hybrid quantum workflows with PennyLane. The general idea is that, first of all, I'll introduce everybody to what PennyLane is and our goals for the project and the library; then discuss a little bit about integrating with HPC tooling and software; and then move on to the results we've had from using Perlmutter over the past year.
So, great. I'll just give a quick introduction to PennyLane first. PennyLane itself is, in our eyes, the best way of allowing a researcher to build state-of-the-art hybrid, device-agnostic quantum algorithms. One of its really powerful points is that PennyLane allows you to incorporate multiple different types of devices in a single workflow.
Another thing I should say is that we work with a lot of people in a lot of places. On the hardware side, we're building our own hardware at Xanadu; on the software side, we partner with a lot of different organizations and companies to add support to PennyLane and to make use of PennyLane; and the same holds from the applications point of view as well.
We do a lot of development and research with other organizations, and so the goal of all this is to support quantum programming on essentially any platform.
Okay, so we really want to make sure that no matter what hardware you have available, you should be able to build an example within PennyLane and use PennyLane for your hardware access, whether that be integrated photonics for our own stack, superconducting qubits, simulators, or trapped ions.
We want to make sure you can also integrate your workflows into machine-learning tools like Torch, TensorFlow, and JAX, as well as make use of HPC platforms and cloud-native platforms. We do this with a composable hybrid quantum-classical design philosophy, treating quantum circuits themselves as functions.
So if you have some type of quantum problem that evaluates and feeds results to some classical neural network, or to some other part of a pipeline, which then offloads to another quantum device, we want to make that as seamless as possible with PennyLane. We do this with some help from what we call quantum nodes, or QNodes. This is where the integration comes in between quantum computers and classical scientific libraries within PennyLane.
If you have made use of TensorFlow, JAX, or Torch in the past, you'll notice that they each have their own native tensor-like objects, and each has its own way of tracking gradients, tracking the operations through a classical machine-learning model. So, to make sure we have a nice integration with the quantum circuits, we unwrap these tensors, feed them into our quantum circuits in the most efficient way, and then convert the output of a quantum circuit back into a tensor again.
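That unwrap-and-rewrap step can be sketched in plain Python. This is a conceptual stand-in, not PennyLane's actual implementation: a classical function plays the role of the quantum circuit, and the helper names are mine.

```python
import math

def unwrap(params):
    # Convert framework-native tensor-like objects into plain floats.
    # Here we accept anything float()-convertible, or a list of such objects.
    if isinstance(params, (list, tuple)):
        return [float(p) for p in params]
    return [float(params)]

def circuit(params):
    # Stand-in for a quantum circuit evaluation: returns a single
    # expectation-value-like number from the unwrapped parameters.
    return math.cos(sum(params))

def qnode(params):
    # The QNode pattern: unwrap the tensors, run the circuit, and hand a
    # plain numeric result back for the ML framework to re-wrap.
    return circuit(unwrap(params))

print(qnode([0.0, 0.0]))  # cos(0) = 1.0
```

In the real thing, the re-wrapping also restores the framework's gradient tracking, which is what lets the circuit sit inside a larger differentiable model.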
As far as what goes on inside the QNode, the classical machine-learning frameworks don't care: within the QNode itself it can be a simulated quantum circuit, it can be hardware that the parameters are passed on to in order to build a circuit and do evaluations, or some kind of hybrid combination. Okay, so just a quick overview as well: there are plenty of examples to go around. PennyLane has put education at the forefront, for both upcoming and cutting-edge research.
We build examples from papers all the time, and if you're looking to learn more I would suggest visiting the pennylane.ai/qml website. We don't stick to just our own simulators: there are non-Xanadu devices there, and we have GPU simulations, CPU simulations, everything you might want to see.
We expect the development of quantum computers, and of software for quantum computers, to be very tightly integrated with HPC platforms, so we started by focusing on ensuring PennyLane has, I guess, the suitability and tooling native to the HPC space. Right now there's a lot of focus, at least, on integrating with CUDA and especially cuQuantum from NVIDIA, to make sure we can take advantage of those A100s on Perlmutter in the most efficient ways.
We also have native C++-backed simulators, which offload with OpenMP to make sure we always run well on the given hardware, as well as new support for Kokkos, which I'll discuss briefly in a following slide. Some of the other work we're doing: obviously I mentioned the machine-learning framework integration, so PyTorch, TensorFlow, JAX.
These are natively supported with PennyLane, and you can easily build a hybrid quantum job that will work within a given workflow for these platforms. Last on the list, we're discussing distributed workloads, not with MPI this time, but with Ray and Dask. If anyone has ever played with the likes of Ray or Dask, these are great tools for task-based computation.
They natively support distribution of workloads, and we've had great success with that; I'll come back to Ray as well when I discuss the results on Perlmutter. Just to give a quick overview of our software suite: this is the busiest slide in the deck, and I'm sorry for all the words, but it's just easy to have it all in one place.
We want to make sure that PennyLane runs on everything, and we want to make sure that PennyLane runs fast on everything. To start with, we built a device we call lightning.qubit, which is a modern C++20 codebase. The idea is that if you have a modern compiler it will run natively, and we support batching of observables with gradients.
One of the things with PennyLane is that we want gradients supported natively, and you're going to be calculating gradients with respect to observables in your circuit. So we can independently batch these observables over OpenMP threads on a given CPU system. One of the things we added recently is automatically dispatched SIMD kernels.
In this case, if you have support for AVX-512 on your hardware, or AVX2, or even just AVX, we can query which instruction sets your hardware supports, and our internal dispatcher chooses the set of gate operations that will be the most performant on your system. The nice thing here is that you don't actually need to compile any of this from scratch to make it work: you just pip install pennylane, and you get this on Windows, on Mac, on Linux. This is our bare-bones device, and it was the fastest simulator we had built up until a few months back.
Obviously, recently we've been focusing a lot on lightning.gpu, which is our cuQuantum-backed simulator. With this we're getting the best performance on NVIDIA hardware, but we've also implemented native GPU support for what's called the adjoint method of gradient evaluation. This is a way of efficiently evaluating quantum-circuit gradients in, I guess, the most performant manner for classical hardware, where you have actual access to a state vector. And last but not least, we have multi-GPU support for the batching of these observable gradients as part of this device as well.
If you want to install this, assuming you're running on a Linux machine, you pip install pennylane-lightning-gpu and pip install cuquantum, and you're ready to go; again, no compilation needed. And lastly, the Kokkos device is one that I want to draw a little bit of focus to.
This is something we've put together recently, because we want to make sure we can support pretty much any accelerator that's available on the market right now, as well as those coming out over the next year or two, and it automatically multi-threads our gate kernels for us.
That can run over OpenMP or, previously, C++ threads, depending on how you compile it; we also support CUDA natively, HIP and ROCm if you want to compile for AMD GPUs, and SYCL if you want to compile for a SYCL-supported platform. Okay, so why all of these tools? The goal: variational quantum optimization problems, with a little bit of a detour as well into something called circuit cutting.
So let's focus a little bit on variational algorithms and their gradients. The general idea with a variational algorithm, or a parametric quantum circuit if you will, is that you have some set of parameters and some quantum circuit which will accept those parameters, and you can treat this effectively as a function: a black-box function where your function is your circuit.
Your parameters are passed in, and the output from your quantum circuit will differ depending on the incoming parameters and the algorithm you're putting together. One of the big things we support in PennyLane, as I mentioned, is native gradient support. We want to make sure that the parameters we're passing in can be updated based on some cost function, or some relative gradients, that we're interested in evaluating. This allows us to effectively navigate a potential landscape and find some type of solution to a given problem that is of interest to us.
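As a toy illustration of navigating that landscape, here is gradient descent on the cost cos(θ), the ⟨Z⟩ expectation after a single RX(θ) rotation on |0⟩. This is a minimal stand-alone sketch using the analytic gradient, not a real PennyLane workflow.

```python
import math

def cost(theta):
    # <Z> after RX(theta) on |0> is cos(theta): a one-parameter landscape.
    return math.cos(theta)

def grad(theta):
    # Analytic gradient of the cost: d/dtheta cos(theta) = -sin(theta).
    return -math.sin(theta)

theta, lr = 0.5, 0.4
for _ in range(100):
    theta -= lr * grad(theta)  # standard gradient-descent update

# The minimum of cos(theta) is at theta = pi, where the cost is -1.
print(theta, cost(theta))
```

In a real variational workflow, the only change is that cost and grad come from circuit executions rather than closed-form expressions.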
Next on the list: quantum circuits are natively differentiable, provided they're parametric, and this is easily supported in PennyLane right out of the gate. Two methods are, I guess, most prominently known. One is finite differences, which we're all familiar with from even classical types of problems. But in the quantum world we also have the parameter-shift rule, a way of saying we can build gradients from multiple executions of quantum circuits, and we know the scaling for these methods.
For n parameters passed in, we need to evaluate many circuits (with the parameter-shift rule, typically two circuit evaluations per parameter), and this can cause a problem depending on the type of workload you're trying to put together.
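The two gradient recipes just mentioned can be compared on the same toy cost cos(θ). For a gate generated by a Pauli rotation, the parameter-shift rule with shifts of ±π/2 is exact, while finite differences is an approximation; this is a minimal stand-alone sketch, not PennyLane's implementation.

```python
import math

def cost(theta):
    # Expectation <Z> after RX(theta) on |0>: cos(theta).
    return math.cos(theta)

def parameter_shift(f, theta):
    # Two circuit evaluations per parameter, shifted by +/- pi/2.
    s = math.pi / 2
    return (f(theta + s) - f(theta - s)) / 2

def finite_difference(f, theta, h=1e-6):
    # Classical central finite difference, also two evaluations.
    return (f(theta + h) - f(theta - h)) / (2 * h)

theta = 0.7
print(parameter_shift(cost, theta))    # matches -sin(0.7) exactly
print(finite_difference(cost, theta))  # approximates -sin(0.7)
```

Both recipes cost two evaluations per parameter here, which is where the "lots of parameters means lots of circuits" scaling problem comes from.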
Okay, a quick detour for a moment.
I'm going to talk about circuit cutting for the next minute or so, and just say that a tensor network is a quantum circuit, and a quantum circuit is a tensor network: these things are interchangeable.
Depending on how you formulate your problem, you can always convert a tensor network into a quantum circuit, or a quantum circuit into a tensor network, provided you choose appropriate operations that are supported by both paradigms, and in this regime you can apply methods that work on tensor networks to native state-vector simulation and quantum circuits.
One of the operations we can perform on a tensor network is cutting the indices connecting the tensors, or cutting the gates (which would be the tensors) themselves. So we can effectively break these components apart and evaluate them independently of one another, and we can actually do this to quantum circuits too. Take, for example, a 60-qubit quantum circuit.
Assuming that we have an appropriately built problem, we can decompose it into a large number of smaller circuits that can potentially fit on the available hardware. And with physical quantum systems coming online, and a certain number of logical qubits that will be available on those systems, we kind of expect there will be limits to what we can run on the hardware as it becomes available.
So the idea with this is to be able to break a problem down into bite-sized chunks that we can run on quantum hardware, and by taking these methods we can break circuits up.
We can stitch them back together classically after we evaluate all of their individual components, the nice thing being that we can evaluate all of these cuts independently from one another. You can then ask: can we farm these out to combinations of CPUs, GPUs, and QPUs? The answer is yes.
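The classical stitching step rests on a standard identity: the identity channel on the cut wire can be expanded in an operator basis (here, the Paulis), so the full expectation value becomes a sum of products of fragment expectation values that can each be evaluated on its own. Schematically, for a single cut wire:

```latex
% Resolving the identity channel on the cut wire in the Pauli basis:
\mathrm{Id}(\rho) \;=\; \frac{1}{2} \sum_{O \,\in\, \{I, X, Y, Z\}} \operatorname{Tr}\!\left[ O \rho \right] O
```

Each trace term becomes a measurement appended to the upstream fragment, and each basis operator O becomes a state preparation prepended to the downstream fragment, which is exactly why all of the resulting sub-circuit evaluations are independent and can be farmed out.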
Obviously, that is one of the nice things we've shown with this work, but first I want to take another quick detour into an example of how you can do this in PennyLane.
So here is our quantum circuit. What I'm going to do is create a two-qubit device, which in this case is our lightning.qubit simulator; next, I'm going to enable our circuit-cutting functionality in PennyLane; and then I'm going to build a three-qubit parametric circuit.
Obviously, if you're trying to simulate a three-qubit problem on a two-qubit device, you're going to run into issues. However, by adding a wire cut, we can effectively say:
we can break this circuit into multiple pieces, each smaller than the original composition, and then evaluate some type of expectation value function. The idea is that we want to make this as seamless as possible for the user, and the circuit cutting and stitching is effectively hidden behind the scenes.
As far as you're concerned, you just tell it to do the circuit cutting, and it evaluates everything as though it were the full three-qubit circuit.
You pass in your parameter, which needs requires_grad=True set to make sure your system is trainable, and you can evaluate your circuit with that parameter, or evaluate the gradient of the circuit with respect to that parameter. This works out of the box with PennyLane currently.
So now let's talk about scaling this up. Circuit execution for this type of problem runs into lots of evaluations pretty quickly.
The idea is that we can take a single forward pass of a quantum circuit: pass in parameters, break the circuit apart with our circuit cutting, and evaluate it as a given function within this type of workflow. The circuit transform effectively cuts our circuit down into smaller chunks; we evaluate those circuits independently of one another and get an output.
Then, using a post-processing classical reduction, we bring the results back into one numeric value of interest. As I mentioned before, gradients are also of interest to us, and each slice of a gradient is effectively a forward-pass execution in this framework. If you're passing in lots of parameters and you want to calculate lots of gradients, well, you need to scale up the number of evaluations you can perform independently.
So this is the workload we would expect to see for a large-scale run using this type of circuit-cutting-plus-quantum-gradients workload.
And what are we doing this for? The idea is to do quantum parameter optimization for QAOA, and the results are in the paper listed at the bottom of the page; I would suggest everybody have a read, as there are some very nice analytical results, but I'm most interested in the numerics in this talk.
The idea is that we can use this workload, where we're building the forward pass and building the gradients, to calculate the variational energy for QAOA problems. The first one I'm going to demonstrate is a 129-qubit problem, and then we'll look at variational parameter optimization of a 62-qubit problem that fits into this result space. So, on to the numerics.
Well, I think I have two minutes left. In terms of the variational energy calculation, we went up to 129 qubits, and we had some very nice analytical results that allowed us to check the same quantities for the problems we were looking at.
In this case we have two QAOA circuit layers, 25 nodes per QAOA cluster, and one node per inter-cluster connection.
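As a sanity check on the qubit counts quoted, here is the arithmetic, assuming the clustered-graph construction where each cluster contributes 25 nodes and each of the n-1 inter-cluster connections contributes one separator node. This formula is my reading of the setup, not stated explicitly in the talk.

```python
def qaoa_qubits(n_clusters, cluster_size=25):
    # Each cluster contributes cluster_size nodes; each inter-cluster
    # connection contributes one separator node.
    return n_clusters * cluster_size + (n_clusters - 1)

print(qaoa_qubits(5))  # 5 * 25 + 4 = 129, the largest problem shown
```

Growing the cluster count is then the knob for scaling the qubit requirements of the problem.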
This is how we build our problem graph, and the idea is that we can increase the number of clusters to increase the qubit requirements of our problem. We were able to run this nicely with PennyLane right out of the gate. The second problem we had was looking at parameter optimization over 62 qubits, for certain parameters of certain QAOA circuits.
Again, I'd suggest anyone interested feel free to jump back and have a read. In building this we employed a few tools: we used PennyLane to do all of the problem definition, the circuit cutting, and everything; we used Ray as part of the orchestration, with each of the individual circuits as a Ray remote task; we used NVIDIA's cuQuantum simulator as part of our GPU simulating device; and then, obviously, the wonderful Perlmutter to do all of the heavy lifting.
We were quite happy to see some very decent scaling results for this. Obviously it's strong scaling, so we had a fixed problem size; we could definitely make the problem harder and heavier to ensure we scale better with different problems, but in terms of what we wanted, we were very happy with the results.
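Strong scaling here means the total problem size stays fixed while the node count grows, so the usual figure of merit is parallel efficiency. A minimal helper makes the definition concrete; the timing numbers in the example are hypothetical, not from the Perlmutter runs.

```python
def speedup(t_base, t_n):
    # Ratio of baseline runtime to runtime at a larger node count.
    return t_base / t_n

def efficiency(t_base, n_base, t_n, n):
    # Strong-scaling efficiency: achieved speedup over ideal speedup.
    return speedup(t_base, t_n) / (n / n_base)

# Hypothetical timings: 4 nodes -> 100 s, 16 nodes -> 30 s.
print(speedup(100.0, 30.0))            # ~3.33x speedup
print(efficiency(100.0, 4, 30.0, 16))  # ~0.83 efficiency
```

With a fixed-size problem, efficiency tends to fall as nodes are added, which is why heavier problem instances would be expected to scale further.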
All of the data I've shown here today is in this repository on GitHub, and that is pretty much it from me.