KubeVirt Summit, 24 Feb 2021

From YouTube: Virtual Machine Batch API

Description

KubeVirt extends the Kubernetes ReplicaSets API to provide Virtual Machines with similar functionality and the same can be done with Kubernetes Jobs. In order to bulk schedule VirtualMachines, an admin could use a VirtualMachine Batch API, a VirtualMachineJob, to launch many VirtualMachines from a single API call.

In this session, we’d like to share ideas, discuss use cases, and consider possible solutions to bulk Virtual Machine scheduling.

Presenters:
- Huy Pham, NVIDIA. Github: huypham21

- Ryan Hallisey, NVIDIA. Github: rthallisey

A

All right, let's give this a go.

B

All right, we're pulling an audible. There we go.

A

Okay. All right, I will change the slide when you tell me.

B

Okay, we'll go right to slide two, and Huy, go ahead.

C

Yeah, so we are engineers, and we want to automate things, right? We want to build tools that users can use very easily. In our use case, we want to spawn VMs on demand; it needs to be automated, and the demand is high.

C

So let's say we spawn 100 VMs at a time. Right now, the automation queries the API server 100 times to request that 100 VMs be created, and that's repetitive. And let's say we only want to spawn identical VMs: it still needs to prepare 100 identical VMs. Now, to create a VM, it needs to go through a lot of steps: preparing the VM, preparing the VMI object, and preparing the pod. It's going to update the VMI spec, label it and add conditions, prepare the data volume, set up metadata for the VMI, and prepare pod templates, and that information is the same if we're using identical VMs.

C

But our case is different, as we want to have specific data per VM, and we don't require the VM to last long. It can be ephemeral. So we're thinking that coming up with an API like Kubernetes Jobs would be useful. So I will share my... Can you go to the next slide?

C

Oh yeah, so let me share our investigation. We think that the VMI ReplicaSet can fit into the picture here, because it allows a user to request a number of replicas.

C

However, it has some drawbacks that made us decide to have another tool for our use case. First, we need a tool that lets us decide which VM to scale down, and the VMI ReplicaSet wants to maintain the number of replicas instead of letting us pick which VMIs to remove. Second, each VMI is injected with specific data, so we want to have control of the specific instance.

C

Third, we dug inside the implementation of the VMI ReplicaSet and found out that it uses the VMI API, which repeats the process of preparing the pod template and the VMI template. However, we realized that we can share the common template to cut down the repeated steps needed to provision identical VMIs, and we see the need for a CRD that's similar to a Kubernetes Job to reduce the number of steps and API calls.
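The shared-template point can be illustrated with a minimal Python sketch. It is not KubeVirt code: `prepare_common_template`, `stamp_instances`, and the dictionary fields are hypothetical names chosen for the illustration, and plain dictionaries stand in for VMI objects. The idea is only that the identical part is prepared once and each instance receives a copy plus its own data.

```python
import copy


def prepare_common_template(image, memory):
    """Hypothetical stand-in for the preparation work that is identical for every VMI."""
    return {
        "metadata": {"labels": {"batch": "demo"}},
        "spec": {"image": image, "memory": memory, "userData": None},
    }


def stamp_instances(template, per_vm_data):
    """Copy the shared template once per VMI and inject only the unique part."""
    instances = []
    for index, data in enumerate(per_vm_data):
        vmi = copy.deepcopy(template)  # the shared template itself is never modified
        vmi["metadata"]["name"] = f"vmi-{index}"
        vmi["spec"]["userData"] = data
        instances.append(vmi)
    return instances


# The template is prepared once, then reused for 100 instances.
template = prepare_common_template("fedora", "1Gi")
vmis = stamp_instances(template, [f"token-{i}" for i in range(100)])
print(len(vmis), vmis[0]["metadata"]["name"], vmis[99]["spec"]["userData"])
```

The deep copy matters here: without it, every instance would share the same nested dictionaries and the per-VM data would overwrite itself.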

C

The API should be easy to use and let the user remove a specific VMI instance. So Ryan can talk more about our ideas on the next slide. He will share a document, and now he can take over.

B

Cool, thanks, Huy. I'll share the design in the chat here; it's also on Slack. Can you switch over to it when you get a chance, David? So thanks, Huy, for introducing the concept and what we're looking at.

B

I'll reintroduce it: we want to create hundreds of virtual machines, and what we end up with is loads of API calls that can get repetitive. It seems that Kubernetes has a similar way of dealing with this kind of thing with its batch API. With Jobs, you can create a bunch of pods, you can create them in parallel, and it will just launch however many you want.

B

It's nice, and it has some features: if you delete the Job, it'll clean up the pods, so there's sort of an attachment, and you can also delete the pods individually and the Job isn't necessarily going to go and create more. It's just a very light attachment between them. And there's an example that's out there (yeah, things are pulling up).

B

What's interesting is that AWS has this thing called Fleet, and here are some of the things I want to highlight in it. In your fleet, you can launch a bunch of VMs and you can do overrides. If you scroll down a little, you can actually see in that picture there's c4.large and a capacity for it, and then c5.large, so different types.

B

Things like that we can have control over: we want to launch this many VMs, and we want them to have these overrides. So there's a little bit of templating, as well as being able to control a large number, like doing a bulk request within a single API call.

B

So that's where we are and what we're thinking. To kick off discussion, I put one idea into the design doc as a way that we could look at this, so scroll down a little to the VirtualMachineJob. I highlighted the differences between, let's say, a Kubernetes Job and a VirtualMachineJob.

B

All the highlights in blue are things that would differ between the two. First, non-parallel or parallel creation, of virtual machines as opposed to pods. A fixed completion count: for pods, the Job will look for completion; for virtual machines, we could define completion a little bit more, to be a specific phase, something like Running, I was thinking. Things like templating would be the same. A Job owns all the pods it creates; instead, we'd own the virtual machines.

B

We'd have some sort of wait-for conditions, and then we're also going to wait until we've reached the allocated amount. Uniform metadata: this is a requirement for Jobs. For virtual machines, we'd probably want to have some unique metadata or user data, like cloud-init data. We want to pass secrets, passwords, things like that. We could see it being the case that you need to have unique metadata. So if you scroll a little farther down, I have an example here.

B

This is just a general idea. There's already the VirtualMachine API, which sort of serves as a template of things, and what I was thinking is: okay, we have this VirtualMachine object. What if you were to reference the VirtualMachine object and say, I want a certain number of those, and then maybe I do some overrides of them?

B

So you have 10 of those number one VMs, and we're looking to have them in a Running state, and we have those ten number one VMs in parallel, and we change the memory. Then maybe we have 50 of those number two VMs with some unique cloud-init data, something like that.

B

And maybe those virtual machines are created as well as the virtual machine instances being spawned with them, so there's that kind of relationship. And then at the bottom there I have some of what it could look like if you were to do a get of it: 10 number one VMs, and you can see them and that they're Running, something like that.
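The shape of the idea described here (reference an existing VirtualMachine, ask for a count, apply overrides, and treat a phase such as Running as completion) can be sketched in Python. Everything below is an assumption made for illustration: the field names (`completionPhase`, `batches`, `vmRef`, `overrides`, `cloudInitSecretLabel`), the VM names, and the helper functions are hypothetical and are not taken from the design doc or from any KubeVirt API.

```python
import copy

# Hypothetical VirtualMachineJob-like spec; field names are illustrative only.
vm_job = {
    "kind": "VirtualMachineJob",
    "spec": {
        "completionPhase": "Running",
        "batches": [
            {"vmRef": "vm-1", "count": 10, "overrides": {"memory": "2Gi"}},
            {"vmRef": "vm-2", "count": 50,
             "overrides": {"cloudInitSecretLabel": "unique-secrets"}},
        ],
    },
}

# Simplified stand-ins for existing VirtualMachine objects used as templates.
vm_templates = {
    "vm-1": {"memory": "1Gi", "cpu": 1},
    "vm-2": {"memory": "4Gi", "cpu": 2},
}


def expand(job, templates):
    """Turn each batch into `count` planned VMs: template copy plus overrides."""
    planned = []
    for batch in job["spec"]["batches"]:
        base = templates[batch["vmRef"]]
        for i in range(batch["count"]):
            vm = copy.deepcopy(base)
            vm.update(batch["overrides"])
            vm["name"] = f"{batch['vmRef']}-{i}"
            planned.append(vm)
    return planned


def is_complete(job, phases):
    """Completion here means: enough VMs have reached the requested phase."""
    wanted = sum(batch["count"] for batch in job["spec"]["batches"])
    target = job["spec"]["completionPhase"]
    reached = sum(1 for phase in phases.values() if phase == target)
    return reached >= wanted


planned = expand(vm_job, vm_templates)
print(len(planned), planned[0])
```

A single object like `vm_job` replaces 60 individual create requests from the client's point of view; the expansion would happen on the controller side.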

B

So I wanted to turn it over to anyone that has some thoughts about this, and we can record them and see what people think.

A

Yeah, I have an initial thought.

A

So I think one thing to clarify (I know we went over this, but I think it would help with the conversation) would be the functional differences between what's being expressed in this VirtualMachineJob and what we have in the VMI ReplicaSet today. The VMI ReplicaSet is always trying to reconcile x number of virtual machine instances: we take one out and it's going to spin up another one. And with the job, say you want 10 in parallel. What would happen if we manually shut down one of those virtual machines? What would you expect this VirtualMachineJob abstraction to do? Would it replace it, or is it saying that it's completed? What's the thought there?

B

So in the way that we're using a job, it wouldn't replace it. It would be a very light linkage between the job object and the actual virtual machines themselves. So, like I said, they wouldn't be recreated; you would have control over the individual virtual machines and you can do whatever you want with them.

B

The only control would be that if you were to remove the job itself, then the objects, the virtual machines themselves, would be removed.

A

So we'll just take this one that I have highlighted right here. You said parallelism 10, so we're going to get 10 of these virtual machine instances.

A

Say we spin up all 10 of them and the job still exists, the VirtualMachineJob still exists, and I delete one of them. We've declared that we want 10 in parallel.

A

Are we saying that it's okay, after we've reached a certain condition where all 10 have been created, that somebody could go in and begin deleting ones that were spun up by this job, and the job would not replace them? Or are we saying that the job is not in a reconciled state if we do that? I know I'm asking the same question twice; I'm just trying to drill down.

B

Yeah. So once we reach the completed state, the job is not going to be exercising any more control over the underlying assets. We've reached 10; now we're not going to be reconciling it at that point.

A

So it's ownership, really. Okay, at that point, it's at completion.

B

Very light ownership only, yeah. We have a completed state, and after we've reached that, we don't really care about them anymore or what they're doing.
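The difference between the two behaviors discussed in this exchange can be sketched as two tiny control loops in Python. This is a simplified model built on assumptions, not KubeVirt's controller logic: `replica_set_step` and `JobLikeController` are hypothetical names, and failure handling before completion is deliberately left out.

```python
def replica_set_step(desired, running_count):
    """ReplicaSet-style: always report how many instances to create to restore the count."""
    return max(desired - running_count, 0)


class JobLikeController:
    """Job-style as described: create up to the count once, then stop managing."""

    def __init__(self, desired):
        self.desired = desired
        self.created = 0
        self.completed = False

    def step(self, running_count):
        """Return how many VMs to create on this pass."""
        if self.completed:
            return 0  # completed: deletions are never replaced
        if running_count >= self.desired:
            self.completed = True
            return 0
        to_create = self.desired - self.created
        self.created += to_create
        return to_create


job = JobLikeController(desired=10)
first_pass = job.step(running_count=0)    # nothing exists yet, so create all 10
second_pass = job.step(running_count=10)  # all 10 running, so the job completes
after_delete = job.step(running_count=9)  # one VM deleted by hand afterwards
print(first_pass, second_pass, after_delete, replica_set_step(10, 9))
```

After completion the job-style controller returns zero even though only nine VMs remain, while the ReplicaSet-style function asks for one replacement.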

E

And would you wait there for the VMs, for, I don't know, a ready condition or something, until you say: okay, now I consider it created and I would not create a replacement? Or would you not want to create a new VM instance at all?

C

Yeah, so as I mentioned, we have specific data inside each VM, and if the VM is corrupted or if we want to remove the VM, it should not reconcile. So that's the use case that we see.

C

So if we spawn 10 VMs and we want to remove one, then it should be nine. It shouldn't reconcile back to 10.

E

Okay, so you really want to let the controller create 10 VMs and then just monitor, but not take any action at any cost, basically.

E

One question... or, I have to reformulate my question. Maybe someone else wants to continue.

B

There's a question in the chat, which is: "Is what you're really aiming at a batch start job, or more of a VM pool in the end?" So a VM pool is definitely something in this area, and this is, I guess, the discussion around the ReplicaSet.

B

Where we landed on this is: what is the right fit for a VM pool? Because I guess you could say that's what we're after, but what's the right fit? Originally that's where we started, with the ReplicaSet; for the reasons we mentioned, we didn't quite find that it fit exactly what we were looking for. So this isn't the perfect fit either, but it at least gives us the opportunity to get what we want out of it: having that flexibility, that granular control of the virtual machines after they get created.

B

So we can make it into a pool if we want to, with maybe another abstraction on top.

E

One more question, this one regarding the difference between a pool and a job. Let's suppose you create the 10 VMs and five of them are shut down, and it goes on and on, and maybe at the end there is only one running. What would be the trigger for you to create more VMs? Would you want to monitor the overall situation of how many VMs you have outside of this controller, and then create another batch when you think the time is right, or what is the idea there?

B

Yeah, that's right. So we monitor the count. That's how we reached this idea of pools: we'll monitor the count, and we'll always know that we're not at capacity right now and what we want to reach, and so we will reach it. So if we're missing five, we'll launch another five.
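The external monitoring described in this answer amounts to a shortfall calculation followed by another batch request. A minimal Python sketch, assuming hypothetical helper names (`shortfall`, `plan_next_batch`) and a hypothetical batch request shape:

```python
def shortfall(target_capacity, running_vms):
    """How many VMs are missing relative to the capacity we want to reach."""
    return max(target_capacity - len(running_vms), 0)


def plan_next_batch(target_capacity, running_vms, vm_ref):
    """Return a hypothetical batch request for the missing VMs, or None if at capacity."""
    missing = shortfall(target_capacity, running_vms)
    if missing == 0:
        return None
    return {"vmRef": vm_ref, "count": missing}


# Ten were launched, five have since gone away: ask for another five.
still_running = ["vm-1-0", "vm-1-3", "vm-1-4", "vm-1-7", "vm-1-9"]
print(plan_next_batch(10, still_running, "vm-1"))
```

In this model the decision to top up lives outside the job controller, which matches the "monitor the count, then launch another five" description.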

E

So I see why the virtual machine replica set is not a clear fit, and maybe also not the pool as it is. I wonder if you considered some kind of mixture, maybe something like a VM pool or a virtual machine deployment, where you have a special condition or something that you can somehow fill from the workload, which can say: okay, now I'm done, I'm not useful anymore. And combining this with something like horizontal autoscaling with the Kubernetes primitives.

B

Well, will that get us the uniqueness at the pod level, and the control that we would need, though?

E

Well, another question, then: what is the uniqueness in this case? You have a VirtualMachineJob. You say you want 10 VMs, and you point it to one VM. It seems it's called "10 VMs", the name, right?

E

And then you override a part of this VM, so you get 10 VMs which look exactly like this, right?

E

You don't get 10 VMs where every one has individual data; they are 10 identical ones, yeah?

B

Yeah, that's correct. This is good to expand on. What I'm thinking is, take the second one, the 50 number two VMs with unique cloud-init data. In this case, the idea would be that we would have some sort of secrets that would get passed, but it would be a unique secret for each of them.

B

The templating here is that there's going to be some sort of secret override, but whatever it is, it could be unique. So in this case, something that matches a label: secrets with this label will be handed out to all these virtual machines.

B

And there could be many that match that label, so they will all just be individually handed out to those virtual machines.

E

This would be an implicit property of the VirtualMachineJob controller, or...?

B

Yeah. I think this example said 50 VMs, so in this case we have 50 secrets with some sort of label. We then do the override and say we're going to hand out 50 secrets. Each of them will get a unique secret from this list of secrets that have this label.

B

That can give us sort of...

E

Sorry, sorry again, but in this case, would all 50 VMs have the same secret, or would there be 50 secrets created with different content?

B

There'd be 50 secrets with different content. All could have the same label.

E

And the secrets are pre-created outside already, and you somehow attach them? I guess I personally am missing a little bit of information on how this would work, and how, for instance, with this VirtualMachineJob, the VMs would find the corresponding secrets.

B

So the idea would be that some controller that's dealing with this VirtualMachineJob would recognize this override and would understand that I want a secret. I don't say it here, but suppose it had a label field for the override, and the label was called "50 unique secrets" or something, and I have 50 unique secrets that all have that label.

B

When I start one of the virtual machines, it'll be given one of the secrets with that label, and so on and so forth, all the way through 50, so they all end up with a different secret. Each of those secrets I created beforehand, and they all have unique data.
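The hand-out of pre-created secrets by label can be sketched as follows. This is an illustration under stated assumptions rather than a proposed implementation: `assign_secrets`, the label key `batch`, the label value, and the object names are all hypothetical, and plain dictionaries stand in for Secret objects.

```python
def assign_secrets(vm_names, secrets, label_key, label_value):
    """Give each VM a distinct pre-created secret that carries the requested label."""
    matching = sorted(
        name
        for name, labels in secrets.items()
        if labels.get(label_key) == label_value
    )
    if len(matching) < len(vm_names):
        raise ValueError("not enough labeled secrets for the requested VMs")
    # zip pairs the i-th VM with the i-th secret, so no secret is handed out twice
    return dict(zip(vm_names, matching))


# 50 secrets created beforehand, all carrying the same (hypothetical) label.
secrets = {f"secret-{i:02d}": {"batch": "unique-secrets"} for i in range(50)}
secrets["unrelated"] = {"batch": "something-else"}

vm_names = [f"vm-2-{i}" for i in range(50)]
assignment = assign_secrets(vm_names, secrets, "batch", "unique-secrets")
print(assignment["vm-2-0"], assignment["vm-2-49"])
```

Sorting the matching secrets makes the assignment deterministic, which would let a controller recompute the same mapping after a restart.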

E

So there is one case in Kubernetes, on StatefulSets, that maybe resembles the idea pretty well. Just to clarify: you can specify a PVC template on the StatefulSet and it will create the PVCs for you, but only if they do not already exist. In this case you can tell up front what the PVC names will be, so it will just automatically pick one up if it already exists.

E

Like, I don't know, pvc-1, pvc-2, pvc-3, and that will correspond to the created name of the pod. Is this the direction you're thinking about?
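The StatefulSet behavior referred to here relies on predictable names: pods are named `<statefulset-name>-<ordinal>` and each claim is named `<claim-template-name>-<pod-name>`. A small Python sketch of that naming rule, with hypothetical helper names and example values (`data`, `web`), shows how an existing claim can be recognized up front:

```python
def pod_name(statefulset, ordinal):
    """StatefulSet pods are named <statefulset-name>-<ordinal>."""
    return f"{statefulset}-{ordinal}"


def pvc_name(claim_template, statefulset, ordinal):
    """Claims are named <claim-template-name>-<pod-name>."""
    return f"{claim_template}-{pod_name(statefulset, ordinal)}"


def plan_claims(claim_template, statefulset, replicas, existing):
    """Decide, per replica, whether its claim must be created or can be reused."""
    plan = {}
    for ordinal in range(replicas):
        name = pvc_name(claim_template, statefulset, ordinal)
        plan[name] = "reuse" if name in existing else "create"
    return plan


# One claim was pre-created with the predictable name, so it is picked up as-is.
print(plan_claims("data", "web", 3, existing={"data-web-1"}))
```

The same pattern could apply to the secrets discussion: if VM names are predictable, pre-created objects can be matched to them by name instead of by label.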

B

Yeah, that sounds like it: getting that uniqueness for some of the pods, or some of the VMs.

E

Okay, I think I understand this better now, thanks.

B

Okay, there's another question: "So kind of what Agones (I don't know what that is) is doing for game servers on Kubernetes: you spin up a pool of X VMs and you'd want to claim them from the pool to own them, so the pool does not manage them anymore?" Yeah, so it's a very similar idea.

B

A lot of what I'm trying to nail home is: what's the Kubernetes way of doing this, and applying it to a virtual machine? I'm trying to follow the ReplicaSet model: Kubernetes has ReplicaSets, and so does KubeVirt. So what is the Kubernetes way of doing this in KubeVirt, to get at least a lot of what we're after, with a few little tweaks? The unique metadata, I would say, is one of them.

B

Yeah, let me check the chat to see if there are any more.

E

Good questions indeed from Kevin. But I guess what I'm missing here a little bit is this point.

E

What I see very often in Kubernetes is that you have a very clear purpose, like what Kevin is describing here, and then you implement all those parts in Kubernetes so that Kubernetes does all the work for you. So it does not just create 50 VMs for you; you have this explicit controller which covers your use case, in the sense that you say: okay, here I always want to have 50 spare VMs, and you can, for instance, then take them out and use them, maybe by relabeling or whatever.

E

And based on horizontal autoscaling, or just based on the number of unused VMs, it will replace the gone ones so that you always have something to pick from the pool. Then you can do with them whatever you want; you can shut them down. In the meantime, it has already created a replacement for you, so that you never run out of fresh instances. So that would be, from my perspective, a fully closed feature, let me put it this way.

E

I'm not sure if you understand what I'm trying to get at. Compared to that, in this case it's not entirely clear to me. It looks like it's only a little bit automated, not really taking the full feature into Kubernetes, really just batching API calls and not actually addressing the full use case.
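The pool behavior sketched in this comment (keep a fixed number of spare VMs, let users claim one by relabeling it, and replace claimed ones) can be modeled in a few lines of Python. This is a toy model under assumptions, not a design for a KubeVirt pool: `SparePool`, its method names, and the `spare`/`claimed` states are hypothetical.

```python
class SparePool:
    """Toy model: the pool only manages VMs that are still marked as spare."""

    def __init__(self, spare_target):
        self.spare_target = spare_target
        self.vms = {}  # VM name -> "spare" or "claimed"
        self._next_id = 0

    def reconcile(self):
        """Create VMs until the number of spares matches the target again."""
        spares = sum(1 for state in self.vms.values() if state == "spare")
        created = []
        for _ in range(self.spare_target - spares):
            name = f"pool-vm-{self._next_id}"
            self._next_id += 1
            self.vms[name] = "spare"
            created.append(name)
        return created

    def claim(self):
        """Relabel one spare VM as claimed; the pool stops managing it."""
        for name, state in self.vms.items():
            if state == "spare":
                self.vms[name] = "claimed"
                return name
        return None


pool = SparePool(spare_target=3)
initial = pool.reconcile()      # fills the pool with three spares
taken = pool.claim()            # a user takes one out of the pool
replacement = pool.reconcile()  # the pool creates one replacement
print(initial, taken, replacement)
```

Unlike the job-style controller, this one keeps reconciling, but only against the spare count, so claimed VMs remain fully under the user's control.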

B

It doesn't address the full use case, yeah. I do agree with you. I'm open to addressing the full use case, but finding the right balance is, I guess, the problem I'm having. And I'd be open to that question, if we want to say that KubeVirt wants to develop a virtual machine instance pool kind of API object, for example.

B

That would be really cool, and I'd definitely be interested in discussing that if folks think that's something that would fit. It seems awfully close to a replication controller with some nuances, so I wasn't sure if that was the right way to go with this. A job seemed to be a little bit closer, like something we could extend.

A

First off, we're really close on time; we have like a minute. I think we can get pretty close to the behavior you're wanting with the idea of the pool. We just need to give it a little bit more thought and take into account the use case you're trying to achieve, and try to see how the pool could be shaped to do that.

A

There's enough overlap that, if we're already considering a VM pool, I'm not sure how we could explain the VM job and the VM pool both existing. Anyway, yeah.

B

I think, between a job and a pool, in terms of virtual machines, a pool probably makes more sense. It's just a matter of whether or not... I think if we want to pursue the pool, if it fits, if we have an API that fits in a Kubernetes-like way, then yeah, I think that would make more sense.

D

And I'm sorry to step in, but I'm afraid that we will have to continue those discussions beyond the summit, because we are already at time for the next session.

D

So at this point I would like to thank you, Ryan and Huy, for leading and introducing the discussion, and thanks to David and Roman and everyone who participated. I would encourage you to follow up after the session, probably through Slack and most likely also the kubevirt-dev Google group, and communicate there. Okay, yeah.

D

Thank you very much.