From YouTube: SIG - Performance and scale 2021-10-14
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.phpt2kytr3mt
A
Okay, welcome to SIG Scale, it's October 14th. The link to the document is in the chat; please add yourself as an attendee. Okay, so let's get started. The first thing is the item that I added. It's pretty large, there are some snippets in here. So if you want to add agenda items, just add them at the top here, just so I don't miss them, but we can start with this.
A
So I was looking a little bit at tracing, and there is already some tracing in the code. The problem I was actually looking at was trying to figure out what's going on in the transition between Scheduling and Scheduled. I see a lot of time get taken up in that area, and I wanted to do a little bit of tracing. So I found this library; the Kubernetes API server actually uses it.
A
So basically, the way it works is: you get traces that, if they go over a certain amount of time, print to the log the amount of time from when you started the trace to when it stopped. And it has some cool things: you can add steps, and it takes the difference between the steps and ends up printing them out. So what I did was I actually added it.
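[The library isn't named on the recording; from the description (log-if-long traces with named steps, used by kube-apiserver) it is presumably k8s.io/utils/trace. A minimal sketch of that API, under that assumption, with illustrative names and values:]

```go
package main

import (
	"time"

	utiltrace "k8s.io/utils/trace"
)

// reconcileSomething is a hypothetical worker function.
func reconcileSomething() {
	// Open a trace; the field key/value here are illustrative.
	trace := utiltrace.New("reconcile VMI", utiltrace.Field{Key: "key", Value: "default/testvmi"})
	// On return, print the whole trace (total time plus per-step deltas)
	// to the log, but only if it exceeded the threshold.
	defer trace.LogIfLong(100 * time.Millisecond)

	// ... do some work ...
	trace.Step("fetched VMI")

	// ... do more work ...
	trace.Step("updated status")
}

func main() { reconcileSomething() }
```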
A
I looked in the controller, since that's where I'd expect most of the time between Scheduling and Scheduled to be spent, and I found something that's kind of weird to me. I don't have the full picture yet, but to explain what I'm doing: I have this thing keyed on the queue, and I have a count.
A
I'll show you the code, that's probably easier. I added this right in the execute function. Can you see my terminal, by the way? Making sure I'm showing everything. Yep? Okay. So what I do is: I start a trace with a key in this execute function; if I requeue, I just record a step; and I stop after we do the Forget. And when I do the recording, every time the key gets seen on the queue again, I just increment a counter over and over and record the time between the last step and this step, which is the key with the count. So that's how this comes out. You can see q1, q2; there's a three in there too, it's just fast.
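[A minimal sketch of the instrumentation as described, assuming the standard client-go workqueue and a single worker (the maps are not synchronized); the names and the requeue path are hypothetical:]

```go
package tracedqueue

import (
	"fmt"
	"time"

	"k8s.io/client-go/util/workqueue"
	utiltrace "k8s.io/utils/trace"
)

var (
	traces = map[string]*utiltrace.Trace{} // open trace per key
	counts = map[string]int{}              // passes through the queue per key
)

// runWorker mirrors the execute loop described above: pop a key, process
// it, then either requeue (leaving the trace open) or Forget (closing it).
func runWorker(queue workqueue.RateLimitingInterface, process func(key string) error) {
	for {
		item, shutdown := queue.Get()
		if shutdown {
			return
		}
		key := item.(string)

		// First time this key is seen: open a trace.
		if _, ok := traces[key]; !ok {
			traces[key] = utiltrace.New("execute", utiltrace.Field{Key: "key", Value: key})
		}
		// Record a step named q1, q2, ... carrying the pass count; the
		// trace keeps the delta since the previous step automatically.
		counts[key]++
		traces[key].Step(fmt.Sprintf("q%d", counts[key]))

		if err := process(key); err != nil {
			queue.AddRateLimited(key) // requeue; trace stays open
		} else {
			queue.Forget(key)
			traces[key].LogIfLong(time.Second) // log total and per-step times if slow
			delete(traces, key)
			delete(counts, key)
		}
		queue.Done(key)
	}
}
```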
A
I actually went and looked at the object, and it's pretty accurate in terms of the total time. But the thing that was weird, and I saw this on pretty much every VMI that I looked at: right around the ninth time, or the eighth time, somewhere around there, that this object goes through the queue, you can see this.
A
Oh sorry, was that a question? So this is in the virt-controller; it's in the watch, the VMI execute loop.
B
Yeah, okay, so you start the trace the first time you see the key, and you end the trace once it's been forgotten. And if there's an error, I guess that's what I'm interested in the most: if there's an error, then we just do a step. Do you know, for the 43-second or whatever gap trace, if there were errors that occurred during that?
A
If we call execute here, I should see them here; I should catch them here, and I don't see any. I don't see anything occur for the 43 seconds. The only thing I get, I guess you could see, is that between q8 and q9 there's an update-status step in there, but it's very quick. It ends almost instantly and we go right to q9.
A
Yeah, okay. This total time would be the time it takes from when we first saw the key to when we finished processing it. The time between the queue steps would be when, like you said, we're executing: we're doing work on a key, it's popped off the queue and we're in execute.
B
46 seconds to execute a key. So do the Prometheus metrics show that? I would expect... we had some sort of metric around workqueue duration or something like that. Maybe we don't, but I would expect that to show spikes like this as well, if this is occurring. This is a really unexpected 46 seconds in virt-controller.
A
If I go back... all I did was bring a big cluster up, so you can try this. I can give you the patch, if you want.
B
If you could put that patch or your branch in the notes or something, that'd be helpful.

A
Sure.

B
My first instinct when I'm hearing this is that there's something unexpected happening with your patch, and less likely with virt-controller. This sounds crazy.
A
Yeah, I don't know, it's weird. I'm still trying to figure it out, because there's just something bizarre versus what I expected. I'm going to keep investigating this, but I wanted to see if there are any thoughts around it. What's cool, though, is that this library is really easy to integrate. It might be something... I don't know if it makes sense, if you want to do it in...
A
If we could do this in logging or something... I find this to be valuable. We could set the threshold to anything, like one second, which would probably be more reasonable, and we could actually see all the steps it takes for anything at or over one second. It might be something easy we can do to improve tracing. I don't know how this would integrate with other tools, like Jaeger and stuff, but it seemed like a pretty serviceable, easy on-ramp to get some information.
A
Yeah, and like you said, Dave, we do see some of this, like with the longest-running... was it remaining work or something, the metrics for this? I don't know if I have them somewhere here, Marcelo's pictures, but we do see pretty often that there are ones that are fairly long. I think we've seen 10 minutes.
A
I think, if I remember correctly, we see some very long ones. Let me see if I can find his previous document somewhere in here.
B
Yeah, so the thing that's surprising to me about this is that virt-controller isn't doing anything that's blocking. I mean, it makes some API calls and things like that, but I think those have deadlines, and under normal operation we're talking milliseconds for those. So for this to be causing 45-second, or 43, whatever it is, delays during execute... I can't think of anything that we do that would come close. That's very strange.
B
That
sounds
like
the
amount
of
time
for
scheduling
or
something
like
that
to
be
reflected,
and
even
that's
high
for
for
some
clusters.
So
time.
A
This is about a minute, right? Yeah, but the scheduling step is pretty fast, Scheduling to Scheduled. Okay, so it's all the way to Running, but here it is a second, yeah.
C
So where was this? ... No, yeah, so Pending to Scheduling is one second, right?
C
From the trace's perspective it would look like the time is really spent in the controller loop, but here it looks like it's more the phase where they were waiting to get scheduled, or something. Could you check the pod itself? Yeah, there you can see when it got created and when it got ready. That's also interesting for us.
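[For reference, a small client-go sketch of that check, reading the pod's creation timestamp and Ready condition; the function name and client wiring are hypothetical:]

```go
package podcheck

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// printPodLatency prints the gap between pod creation and readiness,
// which is where scheduling and image-pull time would show up.
func printPodLatency(client kubernetes.Interface, ns, name string) error {
	pod, err := client.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
	if err != nil {
		return err
	}
	for _, cond := range pod.Status.Conditions {
		if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
			fmt.Printf("created %s, ready %s (gap %s)\n",
				pod.CreationTimestamp.Time,
				cond.LastTransitionTime.Time,
				cond.LastTransitionTime.Sub(pod.CreationTimestamp.Time))
		}
	}
	return nil
}
```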
C
Okay, I agree, but this explains the long time, from the timestamps and when the VMI was captured, right? Yes. So, no wonder: there is a huge number in the trace. The trace is even bigger than, or as big as, what we see here, but as we can see, it's not reflected in the timestamps here that there would be an additional delay where nothing happens in between. That's what I...
A
...wanted to see. So this is a minute, yeah, okay, and then that's still within that 40-second window. So what would we say we're doing? We're just waiting for this right here; this has to finish.
C
So I mean, if it's already on the node and it's tagged, it should... if not, you should use IfNotPresent as the pull policy.
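[In pod-spec terms that policy looks like the following sketch; the container name and image here are illustrative, not the actual KubeVirt values:]

```go
package pullpolicy

import corev1 "k8s.io/api/core/v1"

// With PullIfNotPresent the kubelet only contacts the registry when the
// image is missing from the node, avoiding a pull-time gap like the one
// discussed above.
var computeContainer = corev1.Container{
	Name:            "compute",                                         // illustrative
	Image:           "registry.example/kubevirt/virt-launcher:v0.46.0", // illustrative
	ImagePullPolicy: corev1.PullIfNotPresent,
}
```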
C
Then it's pretty simple: normally it should. Oh no, no, just this one, not yours... oh yeah, you're just pulling it normally, so the first time you schedule the VM it gets pulled, but only the first time.
C
And if you go down, it already says the container image is already present on the machine below, so it took more than one minute to find out that everything is already there. Ah no, sorry: the first pull is from the init container; the second message is for the container later on, which uses it too.
A
Yeah, I mean, I see it on all of them. There's five, 55.
A
I might try this in a different environment, because I don't understand that either. Okay, I'll play around with this, it's kind of interesting. Anyway, I just kind of found that pattern and thought it was weird. I can share it with you guys; I'll show you the patch afterwards, and if you guys want to play around with it, we can share it, whatever. Okay, all right. We have no other topics.
A
Do people have any other things before we go down that rabbit hole?
A
Okay, hold on, can we review some of these first? Let's see, we talked a little bit about this last time; there are some updates from last time.
A
So, if you restart the virt-controller, you can actually see this. Here's a picture of the metrics in Prometheus: you can see the label gets dropped off.
A
Then it gets reattached; the label gets picked up again.
A
So I'm not sure which watch, or which metric, does the labeling. We lose it somewhere when the virt-controller restarts, and then when another event occurs, we find the VMI again and label it based on its current phase.
A
What I've seen is that it won't ever reattach if you just have a bunch of running VMIs sitting there; if you just let the VMIs sit, it doesn't ever reattach. But for some reason, when I did a delete, I noticed that a few of them did start to reattach. So my thought was that maybe an event causes it to relocate the object and reattach the label.
A
Okay, let's see: profiling under high load. There are two more here; I just wonder how this is going.
D
Yes, so there's an open PR. I've addressed comments today from Janusz and David, and I'm just waiting for a response; I hope to have it merged soon. Okay, yeah, I haven't done the profiling on the live cluster yet. I mean, I did a bit, but I don't have precise results. I hope to have it ready in the next couple of days, so maybe I can share it in the next couple of weeks. At first glance, it looks like we spend a lot of time marshalling and unmarshalling data.
D
Additionally, we're using the standard encoding/json library, and there are more efficient libraries for marshalling and unmarshalling which use less CPU, they just do fewer operations, and less memory. So I'll try to see how much of an improvement we can get by simply replacing the dependency. Okay.
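[No specific replacement library is named on the recording; as one example of the kind of drop-in swap being described, json-iterator exposes a config that is API-compatible with encoding/json:]

```go
package encoding

import jsoniter "github.com/json-iterator/go"

// Shadowing the stdlib package name keeps call sites unchanged while
// routing Marshal/Unmarshal through the faster implementation.
var json = jsoniter.ConfigCompatibleWithStandardLibrary

// vmiStatus is an illustrative payload type.
type vmiStatus struct {
	Phase string `json:"phase"`
}

func encode(s vmiStatus) ([]byte, error) {
	return json.Marshal(s) // same signature as encoding/json
}
```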
A
I don't think we have Marcelo here, but we still have some open items on the previous experiments that Marcelo did. Okay, so we said we're going to profile; that'll be the next step for this one, and then...
A
Okay, all right, those were the bugs. Are there any other features or PRs out there that we want to look at? I saw David's request merged, and then David, you also have the change for CI that went in. Do we have any data at this point from the CI gathering, towards any of the thresholds?
B
We should, yeah. I haven't looked at those periodics, but it's a file that is stored as an artifact, or it should be, so we can begin; I mean, we could use that today to come up with some thresholds.
B
So we would want to look at the performance or the density test Prow periodic.
B
And look at the artifacts there to find the perf audit results.
C
Yeah, the config is also stored as an artifact, if you look here in the file.
A
Okay, all right, maybe next time we can do a report for that. All right, I pushed my changes... I mean, well, actually, before we do that: do we have any other topics you want to discuss?
A
Okay, all right. Also, could people add themselves as attendees? I've heard a bunch of people talk, and I only see two people. It's just to show that people are here. All right, I did push these changes; if you want to look at them, let me see, I'll link them.
A
All right, well, there's a link to the branch; try it. I guess for this, let me play around some more and see if I can figure out what this is. Maybe there's a mistake in here or something, but I want to see what's going on. And then, either way, depending on what I find, I'm going to maybe come up with a proposal for how, or what's a reasonable way, we could add this.
A
I think this is pretty convenient: maybe just very simple tracing, and maybe start with the virt-controller or something. Maybe it doesn't even have to be configurable at first, just something simple we can post in the logs. I don't think it'd be too verbose if we pick some sort of reasonable time threshold for the logging, maybe something like a few seconds, like five seconds or something.
A
Okay, okay, all right. Well, I don't have any other issues; I think we can close early then, if we have no more discussions or anything. And then, yeah, check this out; I'll message you guys on Slack afterwards, David.