KubeVirt SIG Performance and Scale, 9 Mar 2023

From YouTube: SIG - Performance and scale 2023-03-09

Description

Meeting Notes:
https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.tybh

B

Welcome to SIG Scale. This is March 9th, 2023. Okay, the link to the meeting notes is in the chat. Please add yourselves as attendees.

B

Okay, so first things first. We've been reviewing the performance job,

B

the periodic job results, with you for a little while now. One of the things we want to change is to eventually get away from having to do this, and move to publishing the results in graphs and reviewing them there instead of having to go through the job. So for now, here is what I'll do.

B

Let me just make sure that we're not seeing any failures that we need to address. Okay, so overall we're passing, which is good, so I think this is fine. And then there is the, let's see, test, which is passing the majority of the time. Okay, so the job looks good.

B

So we'll continue to monitor this in meetings, but eventually we want to get to having something that looks like this. You can see, for example, that this graph is gathering a lot of data about different jobs and rendering it. The idea is that we should be able to do the same thing with what we're doing right now with our jobs.

B

So Lubo had some ideas for this. I started a thread on kubevirt-dev with Lubo, and we can follow up there. Hopefully, in a future meeting, we can have a more in-depth discussion based on what he thinks we should do, and go from there.

B

Okay, performance cluster. I asked Brian to get this information, and I'm still in discussion with them to get it. I don't think this is done yet; I don't think we have this information, and I'm still working on getting the hardware specs. Once I have them, we'll publish them here, and I think we'll eventually publish this somewhere as part of the description of our performance cluster.

B

So it's clear: when we're looking at the results, we can see the hardware specs that the

B

jobs are running against. Okay, let's talk about KWOK. Alay, do you want to introduce this topic?

C

Yeah, can I share my screen?

C

So I just want to take some time today to informally talk about a new project that might be very interesting for the scale work that we are doing in this call.

C

So if you look at our performance cluster, it requires a large amount of hardware.

C

So one of the key ideas behind this new project, Kubernetes WithOut Kubelet, is: can we fake out some of the needed hardware so that we can create a fake notion of scale in the cluster, and then see how the control plane components scale with respect to those fake objects? Recently, an upstream project called KWOK was introduced.

C

This project has two controllers right now: one that works on the fake node status and one that works on fake pods.

C

So here is the blog post about that. Some use cases mentioned here are that this is largely going to be used for testing, learning, and development. So this helps us understand the performance and scalability of the control plane better, and the same goes if we extend this to the KubeVirt control plane.

C

This could be leveraged to understand the scalability of the KubeVirt control plane. Obviously, because of the fake nature of the objects, there is some concern about the accuracy and functionality of these fake objects. So, in order for the test to be meaningful, we should not rely on the functionality, but rather rely on the scale aspect and create the churn in the actual objects.

C

So with that, I want to quickly walk through and introduce how this works.

C

So internally, as I said, the entire project has two controllers. Those controllers can be deployed in a cluster environment or in a local environment.

C

These controllers have a notion of a stage. The way this works is that in a stage there is a template defined. These templates are rendered in the controller, and the rendered output is applied as the status of the fake objects.

C

The interesting part here is that you can combine a set of stages. Here you can see that there is a next section: after this stage is applied, what is the next stage that will be applied? That's what this section defines. And there are some time-related configuration parameters, like some amount of randomness (jitter) and after how much time this status will be applied. Those are all configured by this section, and the actual object is configured by this object.

C

So, to sum up the stage: this is not actually a CRD; it's something that you have to define at the start time of the controller. It is a configuration file that is supplied to the controller at the beginning, and the spec of the stage allows you to configure what resource to act on, when to act on it, and what status to apply.

C

Because this is defined in a stage kind of configuration, you can have a set of stages that your fake objects go through. That allows you to create a notion of fakeness in the cluster and still test the control plane as well as your controllers, because the amount of status updates or randomness generated from this can easily help in testing the control plane as well as the controller components of a project.
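A minimal Python sketch of the stage idea described above: what resource to act on, when to act, how long to wait (with jitter), and what status to apply. The `Stage` class, its field names, and the example values are hypothetical and chosen for illustration; this is not KWOK's actual configuration schema or code.

```python
import random
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Stage:
    """Hypothetical, simplified stand-in for one stage in the controller's config file."""
    name: str
    kind: str                      # what resource to act on, e.g. "Node" or "Pod"
    match_phase: Optional[str]     # when to act: required current status.phase (None = not set yet)
    delay_ms: int                  # base delay before the status is applied
    jitter_ms: int                 # upper bound of the random extra delay
    next_status: dict = field(default_factory=dict)  # what status to apply

    def matches(self, obj: dict) -> bool:
        # A stage only acts on objects of its kind that are in the expected phase.
        return obj["kind"] == self.kind and obj["status"].get("phase") == self.match_phase

    def pick_delay_ms(self, rng: random.Random) -> int:
        # Base delay plus a random jitter, so objects do not all transition at once.
        return self.delay_ms + rng.randint(0, self.jitter_ms)

    def apply(self, obj: dict) -> None:
        # Merge the stage's status into the object's status.
        obj["status"].update(self.next_status)


rng = random.Random(0)
stage = Stage("pod-running", "Pod", None, delay_ms=1000, jitter_ms=5000,
              next_status={"phase": "Running"})
pod = {"kind": "Pod", "status": {}}

if stage.matches(pod):
    delay = stage.pick_delay_ms(rng)   # a real controller would wait this long first
    stage.apply(pod)
```

After the stage has been applied once, `stage.matches(pod)` is false, which is what lets a following stage with a different match condition take over.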

B

So Alay, what you're showing here is that we could take the controller from KWOK and apply it to a cluster, and we will be able to launch a bunch of objects. This is going to stress the control plane for us, and that's going to give us an approximation of our scale. Is that right?

A

Is that right? Yeah.

C

Okay. And the two controllers that this project has by default are the pod and the node, so we should be able to create fake nodes to get the scale estimation.

B

And do you know if it is required to launch fake pods on fake nodes? What is the relationship: are they independent, so we can just create fake nodes to create pressure, or is it a requirement to use the two of them together?

C

So the way it works is you have to use a label selector to define what objects these controllers will act on. So you can create fake pods on real nodes; they just have to match.

C

The problem with that is you'd somehow have to tell the kubelet on the real node to not act on it, and that would be a little difficult to configure. In my little experiment so far, what I have observed is that these KWOK controllers just work on the label selector, so they can take over objects on the real node. But you would have to somehow stop the kubelet, or something, to make sure that the kubelet and the fake controller don't race against each other.
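The selector behavior described above can be sketched as follows. This is a simplified Python illustration, assuming a plain equality-based label match; the label `type: fake` and all object names are made up, and the real controllers' selector handling may differ.

```python
def selects(selector: dict, labels: dict) -> bool:
    """True when every key/value pair in the selector is present in the labels."""
    return all(labels.get(key) == value for key, value in selector.items())


# Hypothetical label used to mark objects that the fake controllers should manage.
FAKE_SELECTOR = {"type": "fake"}

nodes = {
    "fake-node-0": {"type": "fake"},
    "real-node-0": {},
}
pods = [
    {"name": "pod-a", "labels": {"type": "fake"}, "node": "fake-node-0"},
    {"name": "pod-b", "labels": {"type": "fake"}, "node": "real-node-0"},
    {"name": "pod-c", "labels": {}, "node": "real-node-0"},
]

# Pods the fake pod controller would act on: selection is by label only.
managed = [p["name"] for p in pods if selects(FAKE_SELECTOR, p["labels"])]

# Managed pods that sit on a real node: a kubelet there would also act on them,
# so the two writers could race on the pod status.
contested = [
    p["name"] for p in pods
    if selects(FAKE_SELECTOR, p["labels"]) and not selects(FAKE_SELECTOR, nodes[p["node"]])
]
```

Here `pod-b` is both managed and contested, which is the fake-pod-on-real-node case that needs the kubelet kept out of the way.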

B

I see. So I guess maybe the simplest way to look at this is that we probably would only want to launch the fake pods on the fake nodes, at least at first.

C

Yeah, so that's a bit of the introduction of the stage. I wanted to take a moment to go look at the default stages that will help us

A

in understanding. Just a second.

C

So these are the default stages for a node. There are two stages here: one is node initialize and the other is node heartbeat. What node initialize does is initialize the node status with a set of default resources and other status conditions, like the IP address, the hostname, the kubelet endpoint, and so on and so forth.

B

KWOK has a controller, and I'm seeing status here and some other fields. This is a fake node, but as I understand it, this fake node uses the real Node API, right? So how is this able to interact with status and some of the other fields that are really meant for the node controller? How is it able to interact with those fields, and how is the node controller not fighting against it?

C

So my understanding is that some of the fields in this status are owned by the kubelet, and the other fields are owned by the node controller. When you initialize the node, the status that is being applied only populates the fields that are owned by the kubelet, so in a way it fakes the kubelet. And once it has done that, the next part, the node heartbeat, is, like, this is

C

the part I think that is applied by the node controller, and because some agent is already applying this, the node controller is, you know, not taking a look. But that is one question I have, and I'm still trying to figure out why the node controller is not fighting. My initial guess is that, because there is no kubelet running on it, the node controller is not able to talk to the kubelet and so it's not updating the status. So

C

the node controller will come in and append the heartbeat status.

B

It's taking the kubelet side, so maybe that's right. That makes sense to me: when the controller that's faking the kubelet fills out the fields, then it's like, so I'm

B

assuming then that the node controller must do something. That would be interesting, because I don't see how it would, and it probably would want to do something, and then maybe the fake controller does a bunch of stuff afterwards or something. I would expect that the node controller does at least something. But that's fine; if it does or not, I

A

think it just illustrates

B

the full picture. To me, I think the key part that I understand is: okay, there's this controller that's faking the kubelet, and it's filling out these fields so that we can progress to the next stage, so the controller can do its work. Correct?

C

Yeah, I'll take that as an open question to figure out: why does this not race against the node controller? It's an interesting part. It will help a lot in understanding how this project can be used.

C

Yeah, and I wanted to complete the thought I was running with. The first stage, the initial stage, applies phase equal to Running to the node object. Then in the next stage you can see that the selector section says match status.phase. So once this fake object has gone into the Running state, with the delay mentioned in the spec, the node controller, well, the fake node controller, will append these status conditions to the fake object.

C

So what ends up happening is you simulate the fake nodes with some statuses, and you don't need the actual hardware for it. And then you can use a similar pod stage for letting your pods go to the Running state.
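The two node stages described above can be sketched in Python as a pair of functions, where the second only fires once the first has set the phase. The function names mirror the stage names from the discussion, but the bodies, field values, and the way conditions are written are illustrative assumptions, not the project's actual templates.

```python
def node_initialize(node: dict) -> bool:
    """First stage: acts only on nodes with no phase yet and fills in basic status."""
    if "phase" in node["status"]:
        return False
    node["status"].update({
        "phase": "Running",
        # Illustrative values standing in for the default resources and addresses.
        "addresses": [{"type": "InternalIP", "address": "10.0.0.1"}],
        "allocatable": {"cpu": "32", "memory": "256Gi", "pods": "110"},
    })
    return True


def node_heartbeat(node: dict, now: int) -> bool:
    """Second stage: selector matches status.phase == Running, then sets conditions."""
    if node["status"].get("phase") != "Running":
        return False
    node["status"]["conditions"] = [
        {"type": "Ready", "status": "True", "lastHeartbeatTime": now},
    ]
    return True


node = {"metadata": {"name": "fake-node-0"}, "status": {}}

steps = [
    node_heartbeat(node, now=0),    # does not match yet: the phase is not set
    node_initialize(node),          # matches: sets phase to Running
    node_heartbeat(node, now=30),   # now matches: writes the heartbeat condition
    node_initialize(node),          # no longer matches: already initialized
]
```

The ordering falls out of the match conditions alone, which is why no explicit sequencing between the two stages is needed in this sketch.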

C

Now I want to take a step back and

C

talk a little bit about how this can be leveraged with KubeVirt.

C

So if you look at the KubeVirt stack, when a user creates a VMI, the virt-controller sees that VMI and creates a pod for it. What we could potentially do is configure the VMI with a selector that selects fake nodes, and the pod created for it by the virt-controller will eventually land on those fake nodes. The pod can go into the Running state because those are actually fake pods, not running real compute behind them.

C

However, there is an interaction between the virt-launcher pod and the virt-handler: the virt-handler talks to the virt-launcher pod and moves the state of the VMIs forward, just like the kubelet does. So when I tried this, what ended up happening is: the VMI was created, the virt-controller created a fake pod for it, and the pod went to the Running state, but the VMI was stuck in the Scheduled state because there is no virt-handler running on the fake node to drive it forward.

C

So just like there are these stages for node and pod, I think we would have to add an extension for VMI to move these stages forward, that is, to move the VMI from Scheduled to the Running state. And once we have that functionality,

C

we can leverage this to create a constant number of fake VMIs in a cluster, which adds the scale, and then you can extrapolate how KubeVirt is performing by running real VMIs on a real cluster. So this reduces the need for real hardware by a factor of at least half, right?
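A rough Python sketch of the VMI extension discussed above: a fake stage that does what the missing virt-handler would otherwise do, moving a VMI from Scheduled to Running once its pod is Running on a fake node. The function name and the dictionary layout are hypothetical; no such extension existed at the time of this discussion.

```python
def advance_fake_vmi(vmi: dict, pod: dict, fake_nodes: set) -> bool:
    """Move a VMI from Scheduled to Running when its pod runs on a fake node.

    Returns True when the VMI phase was changed.
    """
    if vmi["status"]["phase"] != "Scheduled":
        return False      # only VMIs stuck in Scheduled are advanced
    if pod["status"]["phase"] != "Running":
        return False      # the fake pod stage has not finished yet
    if pod["spec"]["nodeName"] not in fake_nodes:
        return False      # real nodes have a real virt-handler; leave them alone
    vmi["status"]["phase"] = "Running"
    return True


fake_nodes = {"fake-node-0"}
vmi = {"metadata": {"name": "vmi-0"}, "status": {"phase": "Scheduled"}}
launcher_pod = {"spec": {"nodeName": "fake-node-0"}, "status": {"phase": "Running"}}

changed = advance_fake_vmi(vmi, launcher_pod, fake_nodes)
changed_again = advance_fake_vmi(vmi, launcher_pod, fake_nodes)
```

The node check keeps the sketch from touching VMIs on real nodes, where a real virt-handler would be driving the state.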

C

Depending on the test configuration we can scale down the actual hardware for the performance test.

C

Yeah, so those are the topics I wanted to discuss. Unfortunately, I don't have a demo for it, but I can target next week's meeting to create a small demo and share with the community how this project works.

B

It's pretty cool. So a fake virt-handler is one thing you've identified that we would need. We'd call it a fake VMI, which is really a fake virt-launcher running in a fake pod. Since it has no runtime, I guess what we could do, like we were saying, is just kind of fake the virt-handler side, right?

B

The launcher has a bunch of signals it sends back to the handler. I guess what we need to do is just fake the whole thing, as if the events already happened, and basically just update the object. This is, I think, what we'd do. Yeah. The other option, and this is the harder one,

B

is if you had the ability to run a real virt-handler. I think this could be possible, because technically what's happening is that the launcher is communicating over gRPC, and there's a domain socket there for events with the guest. So you could communicate over the socket and fake all the events being sent back to a real virt-handler. So there might be two approaches in terms of how we could do it.

B

I guess it depends on the environment. For example, if you have a real node and a real handler, you could use the fake launcher method. If you want to go with the full fake node and fake VMI method, you could do that as well. I guess the difference would be that one tests the virt-handler and the other doesn't. Both of them test the control plane;

B

one also tests the virt-handler.

C

Correct, yes. To elaborate on the second part, the one where you have a real node: it tests the scaling capability of the virt-handler without the kubelet in the picture. Because we have this fake pod and a kubelet, we will first have to figure out how to not race against the kubelet, and assume that the kubelet doesn't intervene in running that fake pod. Once you have that, you can understand how the virt-handler scales independently of the kubelet.

C

So that's one value. The second part is where the entire thing is fake and you just test the control plane, as in the API server as well as the virt-controller scalability. So yeah.

C

The third approach I had in mind was to run the real virt-handler on a different node, but ask it to reconcile virt-launcher pods on this fake node.

C

That could also be possible, since the virt-handler will not have a real pod.

B

So the controller you'd have to write here would have to be on the same node as the handler, and then it would have to send all the connections over the local socket.

B

So it would just be lying. Well, okay, it might be a little more complicated than that, but that would be the gist of it: you have to sort of lie about where a connection is coming from.

C

So I guess that part is common with a real virt-handler, right? Regardless of whether a real virt-handler is reconciling a fake pod on a real node or a fake pod on a fake node, we would have to send those signals from somewhere.

C

So yeah, you're right, it's a little bit complicated, but these are the three options we have. We'll have to think through the stack and understand what components need to be set up.

B

Yeah, that's cool. I think that makes sense. I think we should definitely evaluate these; it should be cool. I was thinking to myself while you were talking about this: since this runs in a cluster,

B

you could probably easily throw this into, you know, make cluster-up and see how this does. Or even, since we have this performance cluster,

B

it would be interesting to see if we can extend ourselves a little bit past the physical hardware and see what approximations we find, since that cluster is a little larger. So that sounds like it would be really cool to see.

C

Yeah, I think the easiest part would be to assume the fake node and fake pod combination and just try to drive the VMI objects from a controller. That might be something we can realize sooner than the other parts. So yeah, we can try that.

A

Cool. Okay, yeah, that sounds cool.

C

Okay, yeah, that's all I have. Thanks for the time.

B

Sure, thanks a lot. Okay.

A

Share again.

B

Okay, cool. Yeah, thanks a lot, Alay. I took some notes here, and next time, if we get a chance for the demo, you can show us whatever you've got. Cool. Okay, all right, I don't have any more topics, and we've still got more time. Is there anything else people want to bring up and discuss?

C

Hey Ryan, I was a little bit interested in getting an overview of the CI health and the graph that is constructed on the main page.

C

I wondered whether the graph data is coming from processing the project UI, or

B

I don't understand how this works. I mean, we can look at it.

B

Let's see, this is a link to a Grafana board. What is this? So I guess this is coming from Grafana.

A

CI health output results.

A

Let me see where this one's coming from.

B

So yeah, there's a bunch of things. I don't know where this is coming from, I guess.

C

Yeah, the reason I was curious is that, as you said, for our scale jobs it would be good to have some kind of data in graph format, so we have this historical collection over time. I was curious whether we can follow the same method to do this. So that's why I was looking at where to go poke.

B

Yeah, absolutely. I was hoping to hear a little more from Lubo in that discussion, and whether he had an idea, because there's some automation happening here. Like, how does this get here? I have no idea. There's a KubeVirt bot that does this; I don't know where the bot is wired up so that it's able to pull this data. I thought there was a way to pull this from Prow, and I thought that's what was going on here. And that's

B

what I was hoping, because in our jobs, right at the end of each job, we already have basically a format like this. It's not in JSON, but we could easily export it as a JSON object, and then we basically have everything we need; we just need to put it on a graph. So we pretty much have the hard part down, and we just need to figure out how to wire this up.
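As a rough illustration of the JSON export idea above, here is a minimal Python sketch that turns one job's end-of-run results into a JSON record that a graphing pipeline could ingest. The function name, job name, metric names, and values are all placeholders invented for the example; they are not the job's real output format or real measurements.

```python
import json


def export_results(job_name: str, timestamp: str, metrics: dict) -> str:
    """Serialize one job run as a JSON object, one record per run."""
    record = {"job": job_name, "timestamp": timestamp, "metrics": metrics}
    # sort_keys keeps the output stable, which makes records easy to diff.
    return json.dumps(record, sort_keys=True)


# Placeholder values only; a real job would fill these from its end-of-run summary.
line = export_results(
    job_name="example-periodic-performance-job",
    timestamp="2023-03-09T00:00:00Z",
    metrics={"example_vmi_start_p95_seconds": 0.0, "example_api_calls_total": 0},
)

# A collector could append one such line per run and plot the series over time.
parsed = json.loads(line)
```

Emitting one self-contained record per run keeps the job side simple and leaves the choice of storage and graphing tool open.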

B

So yeah, I don't know. I'm hoping to hear a little bit more from Lubo and Daniel so we can get started on this. I would definitely love to see this. This is great, because we'd have a good visualization over time. That is something we don't really have when we look at our CI jobs; we can only look at a few at a time.

B

So it's hard to see over time. This is over two or three years or so, so it would be nice to have the same kind of format.

C

Okay, thank you. Yeah, I'll follow that thread.

B

Yeah, it's on kubevirt-dev, and I can tell you as well. Okay, all right. Are there any more topics, any other things people want to bring up?

A

Sounds like no. Okay.

B

All right, well, just a reminder: please add yourself as an attendee in the meeting notes, because it's important to see the attendance for each meeting. Okay, thank you, everyone. Bye-bye.