From YouTube: SIG - Performance and scale 2021-07-15
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.u0jl6a1n3b1m
A
Okay, welcome to SIG Scale, July 15th. I added our notes to the chat and yourself as an attendee, and please add items to the agenda. Let me see, I added a few; I've been tracking a few things. I think there are a few new PRs that we can talk about and highlight, so please add those if you want to talk about them. All right, so why don't we start off with the first thing there, 5835, which I think is Marcelo's.
B
There is only one thing left, which is to configure Prometheus to add some special labels, but maybe we can keep this PR, as we discussed, as simple as possible, and we can include this Prometheus label later on, in the future.
A
We're seeing... Roman, okay. So, Marcelo, can you summarize, then, where we are with the PR? Because I'm not caught up. What is it, what do we have right now in here, what's it going to do right now?
A
Okay, we create 100 VMs and wait. Okay, gotcha. And is this reflected here in the CI somewhere, in one of these, or is it something we just have to enable? Oh, it is, okay. Which one of these jobs is it, just so I know?
B
No, it's not in the CI yet, because... okay.
A
Okay, great. Good start. So is there anything else we want to do with this, everyone? Okay, with 5835 it looks like everything's passed. Do we want to go ahead and LGTM this? Are people comfortable with that, or do we do another pass on it?
C
Yeah, it's pretty good. As I said before, the Prometheus label, like Marcelo said, the Prometheus label thing doesn't necessarily have to be there. That's what I wanted to say. Okay, so I'm just giving it the last pass.
E
Oh, interesting. Okay, so if we're looking at that perf audit tool that I made, if I wanted to support that, I would need to add a label, I guess.
C
What's interesting right now, mostly, is that when we start running a second test, for instance, or a third test, the global metrics collection can easily distinguish them, for creating dashboards and so on.
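A minimal sketch of what that could look like once such a label exists: if every scale-test run stamps its metrics with a run identifier, dashboards and ad-hoc queries can be filtered per run. The label name, metric name, and helper below are illustrative assumptions, not what the PR actually implements.

```go
package main

import "fmt"

// testRunLabel is a hypothetical label name a scale-test run could attach to the
// metrics it produces, so overlapping runs stay distinguishable.
const testRunLabel = "test_run"

// vmiCountQuery builds a PromQL query filtered to a single run; the metric name
// here is only an example, not necessarily what KubeVirt exposes.
func vmiCountQuery(runID string) string {
	return fmt.Sprintf(`sum(kubevirt_vmi_phase_count{%s=%q}) by (phase)`, testRunLabel, runID)
}

func main() {
	fmt.Println(vmiCountQuery("scale-2021-07-15-run2"))
}
```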
B
Yeah, if you can open it, just a quick comment about that. So I ran some... well, I would not say a perf-scale test; it's only 300 VMs that I could create. First of all, it's only 300 because I have three nodes and there is a limit of 110 pods per node. I then increased that to 200-something pods, but kubevirtci also needs to be configured to allow creating more VMs on a node, so more than 110 VMs.
B
You know, I didn't try to do that, so I will play with that later. I don't know if kubevirtci has some easy way, some environment variable that we could configure for that; if not, maybe it's worth including, because it would make it easier to deploy KubeVirt with more VMs enabled, to deploy more VMs per node. As we mentioned, we want to create many more VMs per node for our perf-scale test, yeah.
B
This is something that we need to look at. Then I ran this test and created this dashboard; maybe I can share it with you guys, I don't know where. The graphs on the dashboard are pretty much focused on KubeVirt, so it's around the workqueue. Oh, I think we forgot to mention here: I enabled, well, there was some extra logic needed to have the workqueue metrics.
A
So do you want to walk through each of these? I mean, if you have the dashboard around, it would be cool to see: you could share your screen and go through some of these. If people can't see this, I can try and zoom in or something. But do you want to walk through each of these and talk a little bit about what we're seeing?
B
Right, maybe just this first one, it's more interesting. So the first thing is: I did two tests, one...
B
Yeah, and then I created many different scale tests, starting from 10, then 20, and I put a sleep of five minutes between each run. That's why you see these bumps, and it would be 10 VMs, 20 VMs, 30 VMs. If you scroll a little bit in the upper part... no, the... yeah, it's 10, 20, 40 and so on, up to 300 or so.
B
Do you see that? It's the 404 code from the API requests, so it's "page not found". It's increasing a lot and it surprised me; so it's calling something.
B
It's giving "page not found" for the KubeVirt API. Not sure what it is, because the metric doesn't tell us that, but maybe this can be related to the performance, I don't know. It's just increasing a lot, just "page not found" requests.
B
I ran this before the fix for the... the 589, you know, the things that David's PR did, and just to confirm: actually I don't see any 509 calls anymore. It's the write request code that I saw in there; it wasn't the PUT request, the 500. And...
B
That is, you know, increasing with the virt-controller VMI... and then one thing that's also interesting is the workqueue adding more events, which is actually the disruption budget. I don't know what this controller is doing, but it seems to be the most intensive one, and maybe it's...
E
Yeah, that's surprising. So the disruption budget controller is adding a disruption budget for every VMI that has the eviction strategy set to live-migrate, to ensure that the pod, the VMI pod, can't be evicted or just killed, and we hook into that to cause a live migration. And that does track with what you're saying with the 404s and things like that; it's probably related to that disruption budget controller.
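For context, a rough sketch of the VMI field the disruption budget controller keys off of, assuming the current kubevirt.io Go API layout (older releases expose the same types from a different import path); this is illustrative, not code from the PR being discussed.

```go
package main

import (
	"fmt"

	v1 "kubevirt.io/api/core/v1" // older KubeVirt releases: kubevirt.io/client-go/api/v1
)

func main() {
	// A VMI whose spec requests the LiveMigrate eviction strategy; for such VMIs the
	// disruption budget controller maintains a PodDisruptionBudget so the virt-launcher
	// pod is live-migrated rather than evicted or killed outright.
	strategy := v1.EvictionStrategyLiveMigrate
	vmi := v1.VirtualMachineInstance{}
	vmi.Spec.EvictionStrategy = &strategy
	fmt.Println("evictionStrategy:", *vmi.Spec.EvictionStrategy)
}
```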
A
Would someone mind writing a few notes about what we're seeing while I'm sharing over here? Because this is interesting. This is an interesting correlation: all these graphs, they all look the same. We've got workqueue depth increasing at almost the same rate as all these other things are increasing, and we've also got this over here. It's kind of interesting.
E
Well, every iteration adds more VMs, right? Yep, yep. So it's... yeah.
A
Yeah, well, it's interesting to see: here we've got the virt-handler workqueue depth, which is not increasing, or is it... yeah, and then we've got the controllers kind of exploding fairly quickly. I don't know, I mean, it's hard to say; we've got a ton of data here. But wait, what would be our expectation? I guess that would be the question. We would like to see very small increases.
C
I guess it all goes back to what Ryan's colleague said regarding the queries per second in the client config, so it completely makes sense from that perspective, and we'll see that on everything. I mean, the 404s are indeed very surprising, as you already said, but apart from that it looks like it all comes down to: we have many objects, we try to create a lot of objects, but we are rate-limiting ourselves.
B
No, I didn't check that, but I can try to see. Another thing that's interesting: if you keep scrolling on the workqueue graphs, go below, yeah, a little bit more down, yeah. So the workqueue also has "unfinished work" and "longest running processor". Although they look similar, they represent different things. The workqueue latency is also interesting; we were talking before about how long it takes to process an event in the workqueue, isn't it? It's almost constant at ten.
B
Yeah, you can go to the list, you know, to the table, just to see exactly the metric, but it's the workqueue latency, yeah.
B
Yeah, so it's how long an item stays in the workqueue before being requested. So it's the time that it's been waiting, you know, how long it stays waiting in the queue.
B
Yes, this is an official metric, okay. It's nothing that I came up with myself; it's the official metric from the workqueue. And for the unfinished work, what they say is: how many seconds of work has been done that is in progress and hasn't been observed by the workqueue.
B
So it's something that's being processed and it's not... you know, maybe my interpretation differs, you guys can help with that, but it's something that's being processed and not being finished. And then what they say here is that a larger number means stuck threads, threads that are, you know, on some lock, I don't know, something, but the throughput is low and it keeps going up here.
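For reference, these are the standard client-go workqueue metrics being discussed, assuming virt-controller exposes them under the usual client-go names (a given KubeVirt release may prefix or rename them); the PromQL printed at the end is just one illustrative way to chart the queue latency.

```go
package main

import "fmt"

const (
	// Seconds an item waits in the workqueue before processing starts
	// ("how long it stays waiting in the queue").
	queueDuration = "workqueue_queue_duration_seconds"
	// Seconds spent actually processing an item once it is picked up.
	workDuration = "workqueue_work_duration_seconds"
	// Seconds of in-progress work not yet observed by work_duration;
	// a steadily growing value hints at stuck or blocked worker threads.
	unfinishedWork = "workqueue_unfinished_work_seconds"
	// Seconds the single longest-running processor has been busy.
	longestRunning = "workqueue_longest_running_processor_seconds"
	// Items currently waiting in the queue.
	depth = "workqueue_depth"
)

func main() {
	// Example: p99 queue latency per controller queue over 5-minute windows.
	fmt.Printf("histogram_quantile(0.99, sum(rate(%s_bucket[5m])) by (name, le))\n", queueDuration)
}
```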
A
What would be a good example? I don't know if anyone can think of an example, like if we had a key that we're processing, maybe we're reconciling, or maybe we're trying to change the state from Scheduling to Running or something, and we're processing it and it gets stuck. Is that what this is like?
B
Yeah, it seems to be that. We can double-check the code, you know, just to make sure, but it seems to be that, yeah.
E
But that's very long, more than 10 seconds. 10 seconds, yeah, that's huge. We would expect sub... I would expect less than a second. Hey, are we looking at p99 or p90?
H
I think this sounds like a symptom of the rate limiter on the KubeVirt side, because this number of seconds usually means that when you make an update or create call, the rate limiting will just throttle you for a couple of seconds if you are making too many requests.
H
These requests... you also need to remember that there will be a reconcile, so you will have, like, five more requests that will happen due to updates, right? So you may have this sleep on every creation, but then you also need to take into consideration that there will be more requests happening for a single VMI.
H
So even though you are trying to lessen the burden on the API server, this will still be happening. And the QPS, as was mentioned, is quite low; it's like five, and the burst, I think the default is ten. So it's like five requests per second, which is nothing, right? So then the sleep comes in and it will throttle a single request for a couple of seconds.
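As a rough illustration of the knobs being discussed, this is roughly where the client-side QPS and burst live in a client-go rest.Config; the values shown are placeholders, not a recommendation from the meeting.

```go
package main

import (
	"fmt"

	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load whatever kubeconfig the environment provides.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	// client-go's defaults are roughly QPS=5 and Burst=10, so a burst of create/update
	// calls from a scale test gets throttled client-side for seconds at a time.
	// Raising them (illustrative numbers only) moves the limiting back to the API server.
	cfg.QPS = 50
	cfg.Burst = 100
	fmt.Printf("client rate limits: QPS=%v Burst=%v\n", cfg.QPS, cfg.Burst)
	// cfg would then be handed to kubernetes.NewForConfig or the KubeVirt client constructor.
}
```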
E
...variable, so it sounds like we have to do that before any of this is going to be measurable for us.
B
I have this, the second scenario.
A
Okay, wait, is that with or without David's... yeah.
E
Can you just summarize those results real quick? Like, did you see a notable improvement or not?
B
Not for the performance of the VM creation. I just see that the 404s were gone. So, okay.
E
And I see that you have the VMI spec. Well, you describe what's in the VMI at the top; is the VMI spec itself something you can share? And in addition to that, can we share these dashboards? I mean, actually commit them to code where we can import them and things like that into our own setups.
C
The place for that would be, I think we discussed it already, a good point would be in project-infra, where we already have other dashboards, I think. Okay.
A
This is that, yeah. This is the repo you shared; do you have the link somewhere? Just throw it in there. And then, Marcelo, maybe you can push all these edits there and we can start going against it.
E
As soon as we can start reproducing this individually, I think we're going to see a lot of progress get made. Because, you know, when I hear about this, okay, the 404 issue, for example, I immediately want to figure out what's going on there. With this dashboard, if I can import it, I'll immediately see results with my own stress testing.
B
Yeah, definitely. So if you guys point me to where to put this, you know, the source of this dashboard, I can put it there, yeah. And also we can see the things that are interesting; it's what we were discussing before, how long it's taking to process a key. It's, you know, the right side of the unfinished... the stuck, the unfinished work of the workqueue, and we have now the histogram.
B
If you... yeah, the longest running processor, you see, this is the time that it's taking to process the key, and it's, I would say, 200 milliseconds, which is fast enough, isn't it? So the problem is with the tasks that are waiting for something, and maybe this is related to the 500s, you know, the 404s; it's waiting to get something, some calls, and then it's waiting too much again.
A
Yeah, I think we have so much here; in every single one of these I see improvements everywhere. All right, so what about memory usage, network, things like these? Let's see, you get increased virt-handler usage.
B
It's also fine for what I would expect, and the Kubernetes API requests also increase, but then it's very fast, you know, at processing things.
A
Yeah, it's just like... okay, that's... yeah, okay, yeah, I see it. Okay, yeah, that's kind of really fun. Okay, the virt-handler over here, so we're saying that we expect it to grow in memory usage as we get more VMIs. Well, what's interesting about this: I would be interested in this at scale, because we're going to hit a limit eventually, right? A node is going to have a certain number of VMIs that you can have on it.
A
It's just going to hit a limit eventually. I'm wondering what will change at scale, like when we hit those limits, do we still see this graph increase?
A
If we're doing 200 and we have two nodes, we'd hit the limit right here, and at 350 with three nodes we're filling the whole thing. What I'm wondering is, based on scale, if we increase scale and just kind of keep filling up our data center with VMs, does this graph continue to climb? Because I would expect it to stay flat.
A
Okay, well, I mean, I think we've gotten through... so this was... so the top one is with David's change and the bottom ones are without. So, what I want, because we see a lot of things in here: I can create issues around all of them in terms of what we expect from each of these scenarios, or something, or maybe, Marcelo, you can help me with some of these, because I see a ton of things that we could...
A
...we could check here. For each of these graphs in Grafana, maybe we can just outline what our expectations are given a test like this, and then we can try to work backwards, find what's going on, and get it to what we expect.
A
Okay, cool, that looks awesome, thanks for sharing that. Okay, all right, I'll do that. So let me...
A
Issues around... okay, we'll create a bunch of those and I'll track them in here. Okay, so this is one PR; there are some others. Before we even go to this, let's see, I saw David's... oh yeah, this one.
E
Yeah, yeah, I'll talk about it briefly. So, in addition to dashboards and things like that, I wanted to begin talking about, or investigating, a tool that we can run during a performance test that gives us a report that lets us compare a baseline to previous results.
E
So the idea would be to create a tool that would be able to pull the Prometheus queries we're interested in over the duration of the test, create a report out of that, and then programmatically compare that to previous results to determine if the delta is outside of our performance threshold, which would then fail. So we would take this a little bit further: if we continue in this direction, we would come up with a way of determining the delta.
E
Yep, cool. And I just had a really simple... so right now, all it does is get the time to creation over a duration. You could give it a start and end time, or you could just give it an end time and a duration, and it would figure out how far back to go and things like that. We can have more queries there.
E
Whatever is important to us, pretty much. This just gives us a way of quickly adding queries and creating some sort of structured results report out of it.
A
Cool, okay. Yeah, folks, be sure to review that one. And were there any other PRs that I missed here? I don't know, but there were supposed to be some.
E
And then, maybe just to talk about the tool further, maybe we could get a little bit of discussion going here; we have time. Does the approach that I'm taking here make sense to everyone? Is this something that we should continue with and begin putting more behind, or... I guess, look at that workflow that I have here; maybe go down a little bit to the example.
E
Just so you can see the whole thing. Yeah, yeah, so: taking a start time, running your stress test, getting the end time, and then capturing results over that time period. Is that the way we want to do this, or is there a different way people would be interested in capturing these results and reporting them? I just want to make sure that we all come to some sort of consensus that this makes sense for us.
E
Yeah, it's all optional. So, right, the API has three ways of representing time. You can give it a start time and an end time, and it's going to capture over that period. You can give it just an end time and a duration, and it's going to take the end time and subtract that duration to calculate the start time. Or, if you didn't want to do that, you can give it nothing and it'll just get, like, the last 10 minutes or something.
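A minimal sketch of the "end time plus duration" mode of such a capture, using the Prometheus Go client to evaluate a range query; the Prometheus address, query string, and step are placeholders, not details from the tool under review.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

// captureRange evaluates a query over [end-duration, end], mirroring the
// "end time + duration" mode described above.
func captureRange(promURL, query string, end time.Time, duration time.Duration) error {
	client, err := api.NewClient(api.Config{Address: promURL})
	if err != nil {
		return err
	}
	promAPI := v1.NewAPI(client)
	r := v1.Range{Start: end.Add(-duration), End: end, Step: 30 * time.Second}
	val, warnings, err := promAPI.QueryRange(context.Background(), query, r)
	if err != nil {
		return err
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(val) // raw matrix; a real tool would fold this into a structured report
	return nil
}

func main() {
	// Query string is illustrative only.
	_ = captureRange("http://localhost:9090",
		`histogram_quantile(0.95, sum(rate(rest_client_request_duration_seconds_bucket[1m])) by (le))`,
		time.Now(), 10*time.Minute)
}
```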
A
Yeah, so we're doing it retroactively: we're looking back, scraping metrics, getting data, and generating a report. The other thing, I think I sent you that, is we could also look at doing a count. That's the only other metric I can think of, like the number of items that we could look at, so that we could say: I want the last 100, because I don't know when that was run.
C
I see multiple ways: you could run it this way, or you could just say to it, in this case we know that the cluster is probably not running too long, so just look at the metrics from the last day or whatever. If it knows what the test label metric looks like, it could auto-detect the tests and just fetch them all and create reports without any further config. It's all possible, but I mean, it can be more or less smart, I don't know.
A
Yeah, the direction, though, what we were asking before, that seems to make sense: we're doing it after the fact and we're getting all the data. I think our assumption was that we can do it after the fact because we expect everything to be in Prometheus, and if we feel like we need to do it during the run, we have the possibility to do that later.
A
Yeah, we're getting it to standard out, so from here we can... I can run this as just a dev: I've got my cluster from make cluster-up, I just committed to a PR, I tested it, it worked, and I just want to check, okay, do I meet the performance thresholds, just like I would run make test to verify that all my unit tests or my functional tests pass. So, the same kind of thing: get that information, and then we can do all sorts of things with this later.
E
I think it definitely makes sense. Okay, so here's the next step: how do we compare the delta? Is that something that should be done programmatically within the test, letting the test figure this out itself, or does it make sense to create external code, like accompanying this tool with a delta-finding tool?
B
I think you need to have some input for the thresholds that we are expecting, because it can change per environment. You know, it depends on your cluster, how powerful your VMs or your bare-metal nodes or whatever you are running are; it will change, so I think it just needs to be configurable.
E
Do you think that I should create a tool that knows how to interpret two results and determine a configurable delta between the two? Or are we comfortable with the test just implementing this logic itself, or should I create some sort of reusable logic to calculate deltas based on thresholds?
E
Hopefully I'm describing that accurately; I'm having a little trouble.
A
Yeah, so either... you know, I just ran this test, I got my run times here in seconds, my worst, my 95th, my 50th. I want to know if I've met a threshold, and I could find out if I met the threshold by comparing against my last run, or I could compare it against, let's say, a baseline we have for all of KubeVirt for a specific release.
A
Where should that be calculated? Should that be done in the reporting tool?
A
Well, I mean, I still like the idea. I'm trying to think of, even when you're saying it should be in the test, I'm trying to think of how, if I wanted to know I'm within thresholds with my code before I push it... I want a way to do that. I want to know outside of the test. I guess the only thing I'm trying to figure out is whether it should be part of this. Maybe we just post both: maybe we post the report, the data, and we also compare it to our baseline for a specific release, and we have them both here. Maybe, and then what...
B
But what I mean about the KubeVirt baseline is: it will be from an environment that we control, but if you are testing on your laptop, for example, it will be completely different, so then it's not comparable. That's what I was saying. But if you run it before your changes and after your changes on your laptop, then of course you can compare; it's the same environment. But then it's very small, this is it, so you can just get the two files and look. But yeah.
A
Well, do we expect... I mean, it's a good point: we don't want to compare apples to oranges, we don't want to give false positives or anything. I guess, yeah, there's a hard line here, because we do want to avoid that.
A
I'm also struggling with how much of a difference there is: if I launch, like, 10 VMs on my laptop, as an estimate, how big of a difference is that from what's going to run in CI? Because otherwise, if we're saying this is apples and oranges, then we're saying that a developer running this test and comparing it to the upstream release baseline is useless no matter what, and I think that's true.
B
Yeah, then in this case the delta is nice, because you can actually show the percentage of how much worse it became. We could expect that if it becomes ten percent worse on your laptop, it will be around ten percent worse in the other environments too, isn't it? So, yeah, maybe the delta might be nice. However, when the developer pushes the PR they will see this test there, isn't it? And that's the goal, so they can go to the job and check it, exactly.
A
We'll get it on the PR; that'll be our gate, and that will be consistent every time, so yeah. So I guess, then, in terms of the developer persona, the focus is that it is meant to be run before they attach their code and then after, to get the delta, just to, you know, measure.
A
Yeah, I mean, I think it's valuable. I don't know if everyone's going to use it, but if you're doing perf work, then this is a valuable tool, because this is exactly what you're targeting. So I think it opens the possibility for people who are looking to find that information; we can quantify it for them before they even push the change.
A
Yeah, and then it would also extend to the CI too, because, like you said, Dave, we just default the config file or whatever to the baseline in CI. So it's not like it changes the workflow at all; now we're just comparing it, and we're always going to make sure we're comparing like samples, whether it's the dev persona or the CI. Yeah, that makes sense, okay, yeah. So a config option is what it sounds like we could use.
A
I think, I mean, sorry, to summarize: I think it makes sense that in this tool we have a config option so that you have the opportunity to run this twice with your delta, to measure improvement, and then CI would do the exact same thing; it would just point to a baseline that we have committed somewhere in the repo, and that would be its baseline for the delta.
E
It would take two results.json files in this case, and you would be able to generate, based on those two... well, let me back up for a second. I expect this results.json file that's generated from this perfscale audit tool to become more complex; it's probably going to have a lot of things in it, a whole lot of things.
E
If we really want to get detailed. So, having a tool that can take two of these results files and give you an understanding of what the delta is, and let that be used as a pass or fail, I think having a separate tool that can do that is probably what I'll end up doing. So you'd retroactively have two files, feed them into this tool, and it would spit out a delta.
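A minimal sketch of that separate comparison step, under the assumption that results.json boils down to named numeric values; the file names, field layout, and 10% threshold are placeholders rather than anything agreed in the meeting.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Result is a deliberately simplified guess at what the audit tool's results.json
// might contain; the real layout is still being defined in the PR.
type Result struct {
	Values map[string]float64 `json:"values"` // e.g. "vmiCreationP95Seconds": 42.0
}

func load(path string) (Result, error) {
	var r Result
	data, err := os.ReadFile(path)
	if err != nil {
		return r, err
	}
	return r, json.Unmarshal(data, &r)
}

func main() {
	baseline, err1 := load("baseline-results.json")
	current, err2 := load("current-results.json")
	if err1 != nil || err2 != nil {
		fmt.Println("failed to load results:", err1, err2)
		os.Exit(1)
	}
	const threshold = 0.10 // fail if any metric regresses by more than 10%
	failed := false
	for name, base := range baseline.Values {
		cur, ok := current.Values[name]
		if !ok || base == 0 {
			continue
		}
		delta := (cur - base) / base
		fmt.Printf("%s: baseline=%.2f current=%.2f delta=%+.1f%%\n", name, base, cur, delta*100)
		if delta > threshold {
			failed = true
		}
	}
	if failed {
		os.Exit(1) // non-zero exit lets CI or a make target gate on the comparison
	}
}
```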
A
So, yeah, instead of continuing to build it all into this results.json, we'll basically keep them separate: we'll have one file for the deltas and one for the percentiles or whatever it is we end up with in this results.json.
B
Okay, so maybe an idea: instead of having two different tools, this tool can have an option like "collect" and then another option like "compare", or something like that, and then you have just the same binary to do those things, you know, collect and compare.
A
Great, okay, that was kind of what I wanted to talk about, actually, in this framework: getting to the report and talking more about extending it. The only other topic I had left was generating a load.
A
We only have four minutes, though, so maybe we can table it for next time, since there's a lot of work going on here and we're not going to get to this just yet, and we have a bunch of issues that have been taken.
A
Okay, all right. Is there anything else we should discuss? I think we have three minutes left.
A
Okay, all right, I guess we'll end a few minutes early then. Marcelo, I'll sync with you after; we'll put together some of these and we'll add them to the doc here. All right, thank you, everybody, have a good day.