Cloud Native Computing Foundation KubeCon + CloudNativeCon Europe 2019 (Barcelona), 1 Jun 2019

From YouTube: GPU Sharing for Machine Learning Workload on Kubernetes - Henry Zhang & Yang Yu, VMware

Description

Join us for Kubernetes Forums Seoul, Sydney, Bengaluru and Delhi - learn more at kubecon.io

Don't miss KubeCon + CloudNativeCon 2020 events in Amsterdam March 30 - April 2, Shanghai July 28-30 and Boston November 17-20! Learn more at kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

GPU Sharing for Machine Learning Workload on Kubernetes - Henry Zhang & Yang Yu, VMware

Machine learning is becoming more and more popular in the technology world, and the community is beginning to leverage Kubernetes to deploy and manage machine learning workloads.

One of the key challenges is scheduling GPU-intensive workloads. Kubernetes includes GPU support for applications, but GPU usage has some limitations:
1. GPU assignment is exclusive. Containers cannot share GPU resources.
2. A container can request one or more GPUs, but it is not possible to request a fraction of a GPU.
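Limitation 2 comes from how Kubernetes treats resource quantities: CPU may be requested in fractions, while extended resources such as `nvidia.com/gpu` must be whole numbers. The minimal sketch below mirrors that rule. Its parser is an assumption that handles only plain decimals and the `m` (milli) suffix, a small subset of the real quantity syntax, and the helper names are hypothetical.

```python
from fractions import Fraction


def parse_quantity(quantity: str) -> Fraction:
    """Parse a tiny subset of Kubernetes quantity strings: '2', '0.5', '500m'."""
    if quantity.endswith("m"):
        # The 'm' suffix means thousandths, so '500m' is one half.
        return Fraction(int(quantity[:-1]), 1000)
    return Fraction(quantity)


def is_valid_request(resource: str, quantity: str) -> bool:
    """Check a request against the CPU-versus-extended-resource rule."""
    value = parse_quantity(quantity)
    if value <= 0:
        return False
    if resource == "cpu":
        # CPU may be requested in fractions such as 0.1, 0.3 or 0.5.
        return True
    # Extended resources such as nvidia.com/gpu only accept whole numbers.
    return value.denominator == 1


if __name__ == "__main__":
    for resource, quantity in [("cpu", "500m"), ("nvidia.com/gpu", "0.5"), ("nvidia.com/gpu", "1")]:
        print(resource, quantity, is_valid_request(resource, quantity))
```

Using `Fraction` instead of `float` keeps the whole-number test exact, so `0.5` can never be rounded into an acceptable GPU count.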

This session introduces how to run workloads on GPUs in Kubernetes. It also demonstrates an approach that uses virtual GPU (vGPU) technology to let multiple pods access the same physical GPU concurrently. This approach not only increases the utilization of GPU resources but also allows more GPU workloads to be scheduled on the same physical GPU.

https://sched.co/MPb0

A

Let's get started. Hello, everyone, and thanks for staying for the last, but hopefully not the least, session of this conference. This is an introductory talk about how to share GPUs in a Kubernetes cluster using virtual GPU technology. It's very basic, but I hope the ideas we introduce here will inspire you to do more complex work later.

A

So, first of all, let me introduce ourselves. My name is Henry Zhang, and I'm the CTO of VMware China R&D, based in Beijing. I'm the original creator of Project Harbor, the cloud native registry, which became a CNCF incubation project last year. My daily job is incubating new projects in emerging technologies such as containers and AI. Today with me is my colleague, Yang.

B

Hi, I'm Yang, and I'm also from the VMware China R&D team. Before I joined the Kubernetes work, I worked on OpenStack, focusing on the Neutron component. I'm glad to see you here. It's the last session today, and also the last session of KubeCon Europe this year, so nice to meet you. Okay.

A

So, let's get started with the content. As you may know, there have been a few waves of AI enthusiasm in the industry over the last couple of decades. About two decades ago, when I was at university, my professor told me that we were in the era of AI, and it turned out that wasn't so realistic; we weren't quite there yet. But recently, we feel it is much closer to our daily lives, and we have many applications we can see and touch in our daily work and life.

A

AI promises many benefits, like reducing human effort, boosting productivity, cutting costs, and so on. Here are some basic AI concepts; I'm not going to go too deep into them. Generally speaking, AI has three key elements for success: the first is data for training models, the second is algorithms to construct the models, and the third is computing power to create the models.

A

With the proliferation of the internet, we already have a huge amount of data, and with the advancement of computer science research, we have many new training algorithms. In today's talk, we focus on the third element: computing power using GPUs, especially for machine learning workloads on Kubernetes. I'm sure many of you have heard many talks at this conference about Kubernetes use cases, and machine learning is one of the real-world use cases of Kubernetes.

A

Many machine learning workloads can be encapsulated in, or run as, containers for portability and manageability. So running machine learning workloads on Kubernetes can be a very good fit, because Kubernetes is now the de facto standard for containerized applications.

A

For example, we've heard from some users that they have vendors creating models for them, and they need to put all of these into the same machine learning pipeline, so they need their workloads on the same platform. Kubernetes is a natural choice: once the models are trained, they can run on Kubernetes, which becomes a standard platform for integrating everything.

A

So Kubernetes gives us a standardized platform for machine learning workloads, and it's ubiquitously available: Kubernetes is offered as a commodity service by many cloud service providers. To run machine learning workloads on Kubernetes, we need to use the Kubernetes device plugin framework to consume compute accelerators such as GPUs, FPGAs, or ASICs.

A

The idea is simple, and it's documented in Kubernetes. However, there are some limitations for GPU scheduling in Kubernetes. The first is exclusive assignment: once you assign a GPU to a particular pod, it can no longer be assigned to other pods, which means you cannot share it. The second limitation is that there's no fractional assignment.

A

For example, with CPU, you can assign 0.1, 0.3, or 0.5 of a CPU to a pod, but with GPUs you can only request whole numbers: 1, 2, 3, and so on. That creates limitations and a lack of flexibility for applications. In the community, we're seeing a lot of people trying to work around this limitation. One workaround is called model stuffing.

A

The idea is to put all kinds of models into one big container, so that they can run on Kubernetes and share the same GPU. So we're seeing a lot of requests in the community for sharing GPUs on the Kubernetes platform. Sharing GPUs can increase utilization and give more flexibility, but there's no satisfactory solution so far.

A

Some existing solutions in the community partially solve the problem by sharing GPU resources across different workloads and pods, but one fundamental restriction is that they provide no isolation or QoS guarantee. That means your workloads may affect each other, the so-called noisy neighbor problem. Our solution here uses a technique called virtual GPU to address these issues. GPU virtualization is not something new.

A

It's similar to CPU virtualization. I think many of you are familiar with virtualization, like virtual machines. With virtual machines, you can split your CPU for use by different workloads and different users, and we can do the same with GPU virtualization. Many vendors, like NVIDIA, AMD, and Intel, provide mechanisms for GPU virtualization and for sharing GPUs between VMs, so you get isolation and QoS for all these workloads.

A

As for hypervisors, mainstream ones like vSphere, KVM, and XenServer all support GPU virtualization. Now I'll talk about how we can use this virtualization in Kubernetes so that machine learning workloads can share GPUs. In this example, I'm going to use vSphere and NVIDIA vGPU for our demo and illustration purposes.

A

Here's a diagram showing how, in vSphere, a physical GPU can be split into smaller logical units called vGPUs and mapped into virtual machines. Each VM has one or more vGPU devices and can use them as native GPUs. Any workload inside the VM can use the vGPU as if it were a GPU in a physical machine.

A

Here's the configuration UI. Basically, a physical GPU's frame buffer is divided into fixed portions used by different vGPUs. For example, a physical GPU with 16 GB of memory can be partitioned into four vGPUs with 4 GB of memory each, or two vGPUs with 8 GB each. The compute engine is time-shared among the vGPUs, and a few scheduling policies are available to fit different scenarios, such as best effort, equal share, and fixed share.
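The memory split and the compute time-sharing above can be approximated with a little arithmetic. This is a simplified model, not NVIDIA's implementation: it assumes every vGPU on a GPU gets an equal, fixed slice of the frame buffer, and it reduces the three scheduling policies to the fraction of GPU time one active vGPU would receive. The function names are hypothetical.

```python
def vgpus_per_gpu(physical_gb: int, profile_gb: int) -> int:
    """Number of equal-size vGPUs that fit in one physical GPU's frame buffer."""
    if profile_gb <= 0 or physical_gb % profile_gb != 0:
        raise ValueError("profile size must evenly divide the physical frame buffer")
    return physical_gb // profile_gb


def compute_share(policy: str, max_vgpus: int, running: int, busy: int) -> float:
    """Approximate fraction of GPU time one active vGPU receives.

    max_vgpus: vGPUs the profile allows on the GPU (fixed share divides by this).
    running:   vGPUs currently powered on (equal share divides by this).
    busy:      vGPUs that currently have work queued (best effort divides by this).
    """
    if policy == "fixed_share":
        return 1 / max_vgpus
    if policy == "equal_share":
        return 1 / running
    if policy == "best_effort":
        return 1 / busy
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    slots = vgpus_per_gpu(16, 4)  # the 16 GB / 4 GB example from the talk
    for policy in ("fixed_share", "equal_share", "best_effort"):
        print(policy, compute_share(policy, max_vgpus=slots, running=2, busy=1))
```

The model makes the trade-off visible: fixed share gives predictable throughput even when neighbors are idle, while best effort lets a lone busy vGPU use the whole engine.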

A

In short, each vGPU gets a fixed amount of memory, but the compute engine is shared among the vGPUs. In this vSphere console, you can see that we can choose different profiles. In this example, you can see the grid_p100-8q profile for a P100 physical GPU, where 8q means 8 GB of memory. By choosing different profiles, you can create vGPUs of different sizes. Some people may worry about the overhead of vGPU.
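Profile names like the `grid_p100-8q` shown in the console encode the frame-buffer size in gigabytes before a trailing series letter. Assuming that naming pattern holds in general (only the `q` series from the talk is exercised here; other letters are an assumption), a hypothetical helper can pull out the board and memory size:

```python
import re

# Assumed pattern: grid_<board>-<frame buffer in GB><series letter>, e.g. grid_p100-8q.
PROFILE_RE = re.compile(r"^grid_(?P<board>[a-z0-9]+)-(?P<gb>\d+)(?P<series>[a-z])$")


def parse_profile(name: str) -> tuple[str, int, str]:
    """Split a vGPU profile name into (board, frame buffer in GB, series letter)."""
    match = PROFILE_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized profile name: {name}")
    return match["board"], int(match["gb"]), match["series"]


if __name__ == "__main__":
    print(parse_profile("grid_p100-8q"))
```

Parsing the size from the name is convenient for labeling nodes automatically, but the hypervisor's own inventory remains the authoritative source.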

A

Here's some research we did a while ago, studying the virtualization overhead of vGPU. The overhead is actually quite small: here, using an RNN for NLP training on the PTB (Penn Treebank) dataset, we observed about 4% overhead, which is practically negligible. So we don't need to worry too much about the overhead.
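The overhead figure is the relative slowdown of the same training job with and without virtualization. A one-line sketch, with illustrative timings that are hypothetical and not the measurements from the talk:

```python
def overhead_percent(native_seconds: float, virtual_seconds: float) -> float:
    """Slowdown of the vGPU run relative to the bare-metal run, in percent."""
    return 100 * (virtual_seconds - native_seconds) / native_seconds


if __name__ == "__main__":
    print(overhead_percent(native_seconds=100.0, virtual_seconds=104.0))
```

For example, a job that takes 100 s on bare metal and 104 s on a vGPU has 4% overhead.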

A

Now we come to how to set up a Kubernetes cluster for GPU sharing. I've introduced the basics of vGPU; here is a diagram of how we set it up. First, the hypervisor is configured with virtual GPUs. Next, we create VMs that use the virtual GPUs provided by the hypervisor. Then the machine learning applications inside each VM can use the vGPU device as if they were running on a native physical machine.

A

In this way, a physical GPU can be shared between different pods in Kubernetes. Each worker node sees what looks like a full GPU device, but in fact the GPU device on the worker node is a fraction of a physical GPU. For example, each of the purple ones is half of a physical GPU mapped into a VM, and each of the green ones is a quarter of a physical GPU, in terms of the GPU's computing power.

A

So each VM has a portion of the physical GPU; essentially, the workloads are sharing the physical GPU. Next, we'll talk about how to use these vGPUs in Kubernetes. A bit more setup is required in Kubernetes, so I'll let Yang talk about that and show a demo.

B

Now that we know we have a requirement for GPU sharing, I will introduce how to use virtual GPUs in Kubernetes.

B

First, you need to provision the worker nodes with vGPU devices and use the device plugin to discover and report the vGPU resources. Second, you need to label the worker nodes according to their different vGPU capabilities. As you can see in the diagram, worker VMs 1 and 2 are consuming vGPUs with an 8 GB frame buffer, which is half of a physical GPU.

B

So we label worker nodes 1 and 2 with a label indicating the 8 GB vGPU profile. Worker VMs 3, 4, 5, and 6 are consuming vGPUs with a 4 GB frame buffer, so we use a different label for worker nodes 3, 4, 5, and 6.
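The labeling scheme can be modeled as a small inventory. The label key `vgpu-profile` and its values are hypothetical, since the talk does not give the exact label used. Node selection follows the `nodeSelector` rule that every key/value pair in the selector must be present on the node.

```python
# Hypothetical inventory: worker node name -> labels describing its vGPU profile.
NODE_LABELS = {
    "worker-1": {"vgpu-profile": "8q"},
    "worker-2": {"vgpu-profile": "8q"},
    "worker-3": {"vgpu-profile": "4q"},
    "worker-4": {"vgpu-profile": "4q"},
    "worker-5": {"vgpu-profile": "4q"},
    "worker-6": {"vgpu-profile": "4q"},
}


def matching_nodes(node_selector: dict[str, str], nodes: dict[str, dict[str, str]]) -> list[str]:
    """Return the nodes whose labels contain every key/value pair in the selector."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in node_selector.items())
    )


if __name__ == "__main__":
    print(matching_nodes({"vgpu-profile": "8q"}, NODE_LABELS))
```

On a real cluster, the same labels would be applied with `kubectl label nodes <node> <key>=<value>`.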

B

Now you can define a pod that uses the virtual GPU. Here we use the NVIDIA device plugin as an example. It discovers vGPU resources just like physical GPU resources and reports the discovered resources to the kubelet. In your pod definition, you request one GPU resource and specify which vGPU size you want to consume using a node selector.
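A pod definition of this shape can be generated as a Python dict and printed as JSON, which `kubectl apply -f -` accepts. The label key `vgpu-profile`, the pod name, and the TensorFlow image tag are illustrative assumptions; `nvidia.com/gpu` is the resource name the NVIDIA device plugin advertises.

```python
import json


def vgpu_pod(name: str, image: str, profile: str) -> dict:
    """Build a pod manifest that asks for one GPU on a node with the given vGPU profile."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Hypothetical label key; it must match the label applied to the nodes.
            "nodeSelector": {"vgpu-profile": profile},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Inside the VM, the vGPU is advertised as one whole nvidia.com/gpu.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(vgpu_pod("tf-demo", "tensorflow/tensorflow:latest-gpu", "8q"), indent=2))
```

The request is always `1`; the node selector, not the number, decides whether that one GPU is half or a quarter of a physical card.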

B

In this example, we want to deploy a pod to a node with an 8 GB frame buffer vGPU. Even though you request one GPU resource here, it's actually consuming a vGPU that is half of a physical GPU. And because we leverage vGPU technology, QoS and isolation are provided by default. Now, let's see the demo.

B

We'll now show how to schedule a pod to a worker node with a specific vGPU device. Let's check the current Kubernetes cluster setup here.

B

You can see we have two worker nodes with different labels. Let's get more details for these two worker nodes. This is worker node 0. You can see it has a GPU resource, and the labels include the vGPU profile: here it's a 4 GB frame buffer. Let's look at the next worker node.

B

Here we use a different label; this worker node has an 8 GB frame buffer. Let's go to the worker nodes to check the specific vGPU profiles: you can see 4q and 8q here. These are NVIDIA Tesla vGPU devices. Now let's deploy a TensorFlow pod to a worker node.

B

Let's look at the pod. We need to monitor the vGPU here; later you'll see one container consuming the vGPU.

B

Let's see the pod YAML. Here we request one GPU resource and specify the node selector to consume the 4 GB frame buffer vGPU, so the pod will be scheduled to the first worker node.

B

It's running. You can see there is a process on worker node 0, using a 4q Tesla vGPU profile. That's it for this demo.

B

This is just a simple demo of consuming a vGPU using a node selector, but we can provide a more flexible method to consume fractional GPUs: we can extend the scheduler and change the device plugin.

B

We change the device plugin to discover and report the vGPU. Because the API server does not allow fractional numbers, we use 1000 to represent a physical GPU. For example, here we report a vGPU resource as 500, which means we discovered a vGPU that is half of a physical GPU.
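The integer encoding described here amounts to a pair of conversion helpers. The scale of 1000 units per physical GPU comes from the talk; the function names are hypothetical.

```python
UNITS_PER_GPU = 1000  # one whole physical GPU, as described in the talk


def to_units(fraction_of_gpu: float) -> int:
    """Convert a fraction of a physical GPU into the integer units the API server accepts."""
    units = round(fraction_of_gpu * UNITS_PER_GPU)
    if not 0 < units <= UNITS_PER_GPU:
        raise ValueError("a vGPU must be a positive fraction of at most one GPU")
    return units


def to_fraction(units: int) -> float:
    """Convert integer units back into a fraction of a physical GPU."""
    return units / UNITS_PER_GPU


if __name__ == "__main__":
    print(to_units(0.5), to_units(0.25), to_fraction(500))
```

Choosing 1000 rather than, say, 4 leaves room for uneven splits without ever changing the integer-only API.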

B

So here we report 500. Next, we extend the Kubernetes scheduler to support the new vGPU resource type: we use the filter and prioritize methods to select the best matching worker node, and the bind method to deploy the pod onto that worker node. We also need to add an annotation to the pod definition describing the requested vGPU resources.
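The filter, prioritize, and annotate steps can be sketched as plain functions. This ignores the real scheduler-extender HTTP interface, and the best-fit scoring rule and the annotation key `example.com/vgpu-units` are assumptions for illustration, not the speakers' code, which was still under development.

```python
from typing import Optional

ANNOTATION_KEY = "example.com/vgpu-units"  # hypothetical annotation name


def filter_nodes(free_units: dict[str, int], requested: int) -> list[str]:
    """Filter step: keep only nodes that still have enough free vGPU units."""
    return [node for node, free in free_units.items() if free >= requested]


def prioritize_nodes(free_units: dict[str, int], candidates: list[str], requested: int) -> dict[str, int]:
    """Prioritize step, assuming a best-fit policy: less leftover capacity scores higher."""
    return {node: 1000 - (free_units[node] - requested) for node in candidates}


def schedule(pod: dict, free_units: dict[str, int], requested: int) -> Optional[str]:
    """Pick a node, record the request on the pod, and deduct the units from that node."""
    candidates = filter_nodes(free_units, requested)
    if not candidates:
        return None
    scores = prioritize_nodes(free_units, candidates, requested)
    node = max(candidates, key=lambda name: scores[name])
    annotations = pod.setdefault("metadata", {}).setdefault("annotations", {})
    annotations[ANNOTATION_KEY] = str(requested)  # annotation values must be strings
    free_units[node] -= requested
    return node


if __name__ == "__main__":
    free = {"worker-1": 500, "worker-2": 300, "worker-3": 100}
    print(schedule({"metadata": {"name": "tf-demo"}}, free, 250), free)
```

Best fit packs small requests onto partly used nodes, keeping larger free blocks available for bigger requests.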

B

This annotation is then used to update the worker node's vGPU resources accordingly. This code is still under development, and we will open source it later. Next, Henry will introduce some use cases for vGPU.

A

One more thing: we just introduced a way to have fractional GPUs. Because we don't want to change the upstream code, we use 1000 to represent one whole GPU and 500 for half a GPU. By customizing the device plugin and extending the scheduler, we can achieve fractional GPUs, which makes them easier to use in practice. Next, I want to cover two use cases for this GPU sharing technique.

A

The first one is a traffic case. I'm based in Beijing, and in the city there are many cameras on the roads monitoring traffic every day. If a car on the road has expired insurance, the camera captures the vehicle, and we generate a notice to the owner of the car and may issue a fine.

A

So we automatically remind the owner to buy or renew insurance. For this to work, we need to identify each car in the images, so it essentially becomes an image recognition problem.

A

For each car we identify, we outline its bounding box and send it to an inference server to read the license plate number. Based on that, we check whether its insurance has expired, so we can decide whether to take any action. This seems pretty simple, right?

A

But what happens when there are many cars on the road? You essentially have a concurrency problem: many cars that you need to identify at the same time. If you use only one inference service with one ResNet instance, every request goes to that ResNet. Instead, we can scale out and use multiple instances of the inference service.

A

That way, each image recognition job is handled by one of the additional GPU-backed inference services. So we are effectively sharing the physical GPU among multiple requests served at the same time, which increases the utilization of the physical GPU. This is a very common use case for inference services, and virtual GPUs are useful for that purpose.
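Scaling out means several inference replicas, each on its own vGPU, serve requests at the same time. In Kubernetes, a Service would normally spread the traffic; the round-robin function below only illustrates the idea, and the replica and request names are hypothetical.

```python
from itertools import cycle


def dispatch(requests: list[str], replicas: list[str]) -> dict[str, list[str]]:
    """Assign recognition requests to inference replicas in round-robin order."""
    if not replicas:
        raise ValueError("at least one inference replica is required")
    assignment: dict[str, list[str]] = {replica: [] for replica in replicas}
    for request, replica in zip(requests, cycle(replicas)):
        assignment[replica].append(request)
    return assignment


if __name__ == "__main__":
    cars = [f"car-{i}" for i in range(1, 6)]
    print(dispatch(cars, ["resnet-0", "resnet-1"]))
```

With two replicas on two half-GPU vGPUs, both halves of the physical card stay busy instead of one model instance queuing every request.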

A

You can scale out as needed and shorten the response time, without worrying too much about buying a lot of very expensive hardware. That's one use case. The second use case I'm going to talk about is a university in China. They have very diverse machine learning workloads in a very large cluster shared by many students and faculty members doing research, teaching, and development: training, inference, and data processing, all running together.

A

By using vGPUs, we allow those people to share machine learning workloads on the same physical cluster, which increases utilization, density, and multi-tenancy. We define policies; for example, during the daytime, a physical box can be shared by four different people for different workloads, such as development, distributed training, or inference for teaching or learning purposes.

A

At night, when there are fewer users, we can suspend all the daytime worker nodes and replace them with a relatively large VM worker node that can use the full power of the physical GPU, so heavy, compute-intensive workloads like training can run on this bigger box.
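The day/night policy can be sketched as a small planner. The switch times, the `16q` profile name for a whole 16 GB GPU, and the wording of the steps are hypothetical; the actual suspend and resume would be done by a script or hypervisor API calls, which this sketch does not model.

```python
from datetime import time

# Hypothetical layouts for one 16 GB GPU: (vGPU profile, number of worker VMs).
DAY_LAYOUT = ("4q", 4)     # four 4 GB vGPUs shared by different users
NIGHT_LAYOUT = ("16q", 1)  # one large VM that uses the whole GPU


def desired_layout(now: time, day_start: time = time(8, 0), night_start: time = time(20, 0)) -> tuple[str, int]:
    """Return the layout the GPU should run at a given time of day."""
    return DAY_LAYOUT if day_start <= now < night_start else NIGHT_LAYOUT


def plan_switch(current: tuple[str, int], desired: tuple[str, int]) -> list[str]:
    """List the steps needed to move from the current layout to the desired one."""
    if current == desired:
        return []
    # All vGPUs on one physical GPU share a profile, so the old VMs must release
    # the frame buffer before VMs with a different profile can start.
    old_profile, old_count = current
    new_profile, new_count = desired
    return [
        f"suspend {old_count} worker VM(s) with profile {old_profile}",
        f"resume {new_count} worker VM(s) with profile {new_profile}",
    ]


if __name__ == "__main__":
    print(plan_switch(DAY_LAYOUT, desired_layout(time(23, 0))))
```

Suspending first and resuming second matters: the reverse order would fail because the daytime VMs still hold the whole frame buffer.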

A

All this switching back and forth can be automated with scripts or API calls. I'll let Yang show a demo of how we can switch profiles between different worker nodes, so you can see how this gives people flexibility in their configuration.

B

Let's assume we only have one host with a GPU device.

B

Here we have deployed four worker nodes on the same host, each with a 4 GB frame buffer. Let's check the Kubernetes cluster: you can see we have four worker nodes. Let's go to the worker nodes to see the vGPU profile. Taking two of them as an example, you can see we use the 4 GB profile.

B

Here it's also a 4 GB vGPU profile. Now let's suspend these four worker nodes.

B

This is a simple script that automates suspending the worker nodes. You can specify the label of the worker nodes you want to suspend.

B

So all the worker VMs will be powered off. Let's check the Kubernetes setup: you can see all the worker nodes are now NotReady. Now let's start a new worker node that consumes the whole physical GPU.

B

In this demo, the whole physical GPU is a Tesla V100 with 16 GB of memory.

B

Let's check the Kubernetes setup.

B

Here, another worker node is ready. Let's go to the worker node to check that it is using a whole physical GPU.

B

This is worker node 5.

B

You can see it's consuming a whole physical GPU. This is just a simple demo for use case two; you can develop your own script to suspend and resume whichever worker nodes you want. Okay.

A

So that's a very simple, fundamental demo, but I hope you get the idea of how we can share a GPU between different pods and worker nodes. This is especially useful in cases like inference services, where different services can share the same physical GPU. Basically, Kubernetes is suitable for machine learning workloads.

A

By using GPU virtualization for machine learning workloads, we can achieve a few benefits. Utilization is relatively high, so you have fewer idle physical devices, which matters because GPUs are expensive. We also get better scalability: you can run more concurrent and mixed workloads and handle more requests at the same time. And you get isolation and QoS, which many existing solutions in the community lack.

A

One more thing I want to mention is suspend: we can suspend a virtual machine along with its vGPU and resume it later. Also, in VMware we can do live migration, moving a workload from one machine to another.

A

That gives you better maintainability: when you need time for maintenance, you can migrate the worker node to another physical box without stopping your current training job. We also have snapshots and cloning. All these capabilities come from virtualization, of either the CPU or the GPU. So that is our idea of GPU sharing in Kubernetes, taking advantage of virtual GPU technologies.

A

I hope you get the idea. Again, this is simple and introductory, but I hope you can build more interesting work based on our introduction here. That's all we have for now.

A

Thank you. If you have any questions, I'd be happy to take them. Yang, can you pass the microphone?

C

Thank you, it's interesting stuff. How do you decide whether to just use the virtual GPU API versus switching out your worker node for a smaller or bigger GPU, like suspending it and starting it again?

A

Sorry, I didn't get the question.

C

So how do you decide between just using the vGPU API that you can use in Kubernetes versus suspending and recreating the node with a different-size GPU? When should you do that? Why would you ever need to do that?

A

The question is how we decide when to switch to a different vGPU profile. It depends on the workload. In my earlier example, we split sixteen gigabytes into four vGPUs of four gigabytes each, for smaller machine learning jobs. If the jobs are small and don't need the full power of the GPU, you can use smaller vGPUs.

C

Maybe I didn't describe it well. The vGPU API allows you to request it dynamically on worker nodes that already exist, right? So I can deploy a pod that requests 500 units, and it uses half of the GPU. But it also seemed like you actually have to shut down the node to change the size of the GPU. Why?

B

So your question is why we need to suspend some worker nodes and resume others, when a pod can already consume a specific vGPU resource in our demo, right?

B

That's because of the hypervisor. If we only have one host with a physical GPU device, the hypervisor doesn't give us the flexibility to choose any vGPU profile we want for each VM.

B

Currently, if one VM on the physical host uses a vGPU profile such as the 4 GB frame buffer, you cannot add an 8 GB vGPU to the same host. So if you want to deploy a new worker node with a different vGPU device on the same host, we need to suspend the existing VMs and resume a new one. I'm not sure if that answers your question.

A

I mean, the reasoning is that when you assign four VMs here, they have already taken all of the GPU card's frame buffer memory. So you cannot spin up another VM sharing the same physical GPU; we need to shut them down to release the GPU resources and then reorganize them as one big VM. That's why.
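The constraint described in this answer can be written as a placement check. It encodes the two rules stated in the talk: vGPUs on one physical GPU share a single profile, and their frame buffers must fit in the card's memory. The function name is hypothetical.

```python
def can_add_vgpu(physical_gb: int, existing_profiles_gb: list[int], new_profile_gb: int) -> bool:
    """Check whether another vGPU of the given size can be placed on a physical GPU."""
    if any(size != new_profile_gb for size in existing_profiles_gb):
        return False  # mixing profile sizes on one GPU is not allowed
    return sum(existing_profiles_gb) + new_profile_gb <= physical_gb


if __name__ == "__main__":
    # Four 4 GB vGPUs already fill a 16 GB card, so an 8 GB vGPU cannot be added.
    print(can_add_vgpu(16, [4, 4, 4, 4], 8))
```

The check fails for two separate reasons in the scenario from the question (wrong profile and no memory left), which is why the existing VMs must be suspended first.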

A

Hopefully that answers your question. Okay, anyone else? Yes.

D

I'm seeing this virtualization of the GPU, and I'm wondering if this is specifically for multi-tenant Kubernetes deployments, where you might create one Kubernetes cluster per tenant and don't want nodes from different clusters to share the same GPU, so you create a virtual GPU. For a single tenant, as I understand it, this isn't needed?

A

No, because for a single tenant there's also a use case. I know your point is that with multiple tenants, you want to separate and isolate them. But for a single tenant, there's still the noisy neighbor problem. Even within the same tenant, you have different jobs, and without isolation they can interfere with each other. Then you can't tell which one will run better or which one will finish first; it's hard to guarantee that.

D

But you can assign a fraction even in that case, so...

A

Without vGPU technology, all the workloads are scheduled together on the GPU, so you cannot guarantee how much of the physical chip each one gets.

D

Then what's the meaning of the fraction, if you assign a fraction but cannot enforce it?

A

No, the fraction only works with vGPU. Without vGPU, the fraction doesn't work.

D

So the fraction you just described in the pod refers to the partition in virtualization mode, not to how much of each vGPU is used? Sorry, I didn't get it; I thought the fraction number referred to a fraction inside your GPU.

A

The fraction is just a convenience. Without it, we rely on the node selector to assign the actual vGPU resource. With the fraction, we want to make it more flexible and convenient for users to request the resources they want.

A

It's not just for isolation; it lets you say "I only need half a GPU" or "a quarter of a GPU" with this number, through some customization of the Kubernetes device plugin and scheduler. Okay.

A

One more in the back.

E

So for this fractional allocation of devices, won't you get 500 Allocate calls in your device plugin? For each unit in your resource request, you get an Allocate call to the device plugin. So if you request, let's say, 300 units of vGPU, your device plugin will get 300 Allocate calls.

A

Here, this device plugin reports 500 because the vGPU is half of a physical GPU: this physical GPU has been divided into two identical vGPUs, each half of a physical GPU, and each is mapped into a worker node as one vGPU. And whenever we report it to the kubelet...

E

Will the kubelet call the plugin's Allocate 300 times?

A

Sorry?

E

The kubelet will call the device plugin 300 times.

A

No, we modified it. We report 500 to the kubelet, and when we do the allocation, we deduct the requested amount from that 500; in this case, 300.

A

Okay, you mean the allocation would take 500 calls.

A

Sorry, overloading the API? I think it's one call, not 300 calls. You mean 300 calls?

A

Okay, 500 times. I see what you mean. That's the way we're trying to simplify the API, but we need to do more work to verify it. We're not quite sure yet, because we're still in the process of making this work and haven't reached the final step. The idea is that we don't want to change the Kubernetes API server code, so we reuse integer numbers there.

A

But you would want to avoid the 300 calls, right? They could add significant cost.

A

For 1000? Sorry, what?

A

Okay, too many API calls, right. So we need to find a way to get around that, since it could incur a significant cost. I understand.
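The concern in this exchange depends on how units are exposed to the kubelet. In the device plugin API, a container's Allocate request carries a list of device IDs, so if a plugin advertised one device ID per unit (an assumption; the speakers do not say how their prototype works), a 300-unit request would name 300 IDs. The hypothetical helpers below only model that bookkeeping, not the gRPC API.

```python
def advertise_devices(gpu_id: str, units: int) -> list[str]:
    """Expose a vGPU worth `units` units as that many device IDs (assumed scheme)."""
    return [f"{gpu_id}-unit-{i}" for i in range(units)]


def allocate(free_ids: list[str], requested: int) -> list[str]:
    """Take `requested` unit IDs for one container and remove them from the free list."""
    if requested > len(free_ids):
        raise ValueError("not enough free units")
    taken = free_ids[:requested]
    del free_ids[:requested]
    return taken


if __name__ == "__main__":
    free = advertise_devices("vgpu0", 500)
    granted = allocate(free, 300)
    print(len(granted), len(free))
```

Under this scheme the size of the advertised device list grows with the unit scale, which is one reason a coarser scale or a custom allocation path may be preferable.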

F

I have another question: does this implementation of fractional GPUs work only for one physical GPU, or does it work for multiple physical GPUs? So you should have only one physical GPU, right?

A

The assumption is one vGPU per VM. We're still looking at how to improve it, but we haven't done that so far.

B

You mean you have four physical GPUs, for example, right? And you want to schedule your pod onto a physical GPU, or, more flexibly, onto either a vGPU or a physical GPU. The kubelet can work in conjunction with an enhanced device plugin to schedule the pod onto a specific GPU, but currently we only support one vGPU per VM.

B

Or if you want to pass through... you mean the pod wants to use more than one vGPU per VM on one physical GPU, right?

A

Right now, there's a limitation that you cannot have two fractional vGPUs inside one VM. We are currently working on whether we can do that.

F

Would it work in KVM?

A

KVM may have a different way to work it out.

F

Have you tested it?

A

We will try that, but so far we've only tried it on vSphere. KVM may have a different mechanism, but the idea we presented here should still make the fractional number work.

A

Okay, I think that's all for now. Thanks for coming to this session, the last one. Thank you very much.