From YouTube: SIG - Performance and scale 2022-06-30
Description
Meeting Notes:
https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.tybh
A: Okay, welcome to SIG Scale. It's June 30th, 2022. I'll link to the notes in the chat.
A: Okay, please add yourself as an attendee. Okay, let's go over the performance periodic tests. All right, so Marcelo, you had the fix for the unexpected number of VMIs. This is strange; I don't know why this only showed up now. I found that surprising. Maybe it's because the test changed and now it's catching it, or something.
B: Yeah, as I mentioned, I did a quick fix, and I'm trying to find the notes to put my name in. So, I did a quick fix, because we are creating, you know, in the end, 101, so we should be fine.
B: Yeah, it should be density, yeah, and here.
A: Doesn't delete... yep, yep, okay, makes sense, makes sense. And then where's the wait 100 happening? Which... it's in your PR, where you changed this?
A: Okay, so VM... okay, so test performance density, okay. I was right there; it was right there. There's your change, right there, okay, makes sense. Okay, got it, got it; yeah, that makes sense. Okay.
B: Cool, yeah. Okay, maybe I should put some comments here, but I did it very, very fast.
A: No, no, that's fine! I think this is actually really good. So I think before, we were maybe not waiting for the multi... I think this might just confirm that we might have had a bug previously, and now, with the waiting, it's fixed. Because if we were consistently seeing this, then we are consistently checking that we have the right number of VMs. So I think that's right; I think that makes sense.
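The fix being discussed, waiting until the cluster actually reports the expected VMI count before the test asserts, can be sketched as a generic polling helper (a minimal sketch; the helper name, the injected clock, and the count of 101 are illustrative, not KubeVirt's actual test code):

```python
import time

def wait_for_count(get_count, expected, timeout=300.0, interval=1.0,
                   now=time.monotonic, sleep=time.sleep):
    """Poll get_count() until it returns `expected` or `timeout` elapses.

    Returns True once the expected count is observed, False on timeout.
    `now` and `sleep` are injectable so tests can use a fake clock.
    """
    deadline = now() + timeout
    while True:
        if get_count() == expected:
            return True
        if now() >= deadline:
            return False
        sleep(interval)
```

A density test that creates 101 VMIs would call `wait_for_count(count_running_vmis, 101)` before checking results, instead of asserting the count immediately, which is the race the quick fix closes.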
A: Well, we had some successes since... what was this? I don't know when this merged, but we had a bunch in here, so that's good. I bet this is the... oh no, this is something else. This is...
A: This failed immediately. This probably has something to do with an endpoint being down, I'm guessing.
A: I want to see if there's some... oh, here we go, here's our answer. So this is the one from the bug, from...
C: Yeah, sorry. If you want to assign it to me, go ahead, feel free to do so. I'll try to dig into that.
B: And the other thing that I was thinking is, you know, when I create the range, like, 400... I don't know if 200 and 400 really make sense. I think 600 is fine, because I see more issues with 600, for example when I look at the graph on the dashboard. Anyway, what I want to mention is: maybe it's better to have separate jobs for each number.
B: Instead of having one job that creates 200, 400 and 600, we have different jobs. You know, it would fix the delete issue, but of course we don't want to hide the issue; I'm just thinking that, because if we put in, you know, a bigger interval... I don't know. Can you open the Grafana dashboard? Let me send it to you here.
B: Okay, so did you see these bumps here? We can see the VM counter; maybe it's the easier one, the lower... okay, yeah.
B: So first of all, you can see that the previous tests, the 600, were failing; now the 600 is working for the newest ones. But you see there was the test at 100, and the other test that creates 200, 400 and 600, and they are very tight together, you see. So it's really hard to analyze like that.
B: Yeah, exactly, so it's easier to visualize. You know, I'm just saying that if it fails, it's easier to know exactly which test, instead of digging through the logs to see which one failed. And also, maybe we don't need 200: we have 100, maybe 400, 600. You know, 600 is the maximum; we cannot have more than that, we have only three nodes, right? And maybe 400... I don't know, we can check.
B: I don't know if it's really valuable to see 400, because we have 100, which is important, and then 600, the top of our range. And if we want to introduce more tests, it can be something else, for example the steady state. You know, we can save time here: we need to timeshare, we cannot run parallel tests, otherwise they will impact each other. So we can cut down to only 100 and 600 and then include another test.
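Splitting the combined 200/400/600 job into one job per count, as proposed above, could look like this (a sketch; the job-name pattern and fields are hypothetical, not the project's real CI configuration):

```python
def make_density_jobs(counts, base_name="performance-density"):
    """Expand a list of VM counts into one independent job definition each,
    so a failure or a dashboard spike maps directly to a single count."""
    return [
        {
            "name": f"{base_name}-{n}-vms",
            "env": {"VM_COUNT": str(n)},
        }
        for n in counts
    ]

# Dropping 200 and 400, as suggested, leaves two focused jobs:
jobs = make_density_jobs([100, 600])
```

Each job then shows up as its own series on the dashboard, instead of three bursts packed tightly together in one run.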
A: Okay, yeah, maybe we can tweak this a little bit. I think, yeah, just a 600 in isolation would be interesting, yeah, and then...
A: And we still want to do... this is burst; we still want steady state, which we could do. I mean, this 600 max is very valuable, we've got a lot of information; it'd be cool to see 600 steady state, and then we can configure the rate, and there's an infinite number of tests.
A
We
can
configure
there,
which
is
the
rate
of
deletions
and
recreation
as
long
as
we
can
do
yeah,
okay
yeah,
but
I
like
the
idea
that,
like
they're
likely
conceptualized
that
these
are
two
separate
things
and
they
deal
definitely
two
different
results.
So
yeah.
Maybe
if
we're
getting
a
lot
of
results
from
600,
then
maybe
we
just
do
a
600
in
isolation
and
and
we
do
we
maybe
do
some
measurements
off-
that
we
do
the
measurements
off
100
and
then
we
can.
A: Yeah, makes sense, okay, yeah. But at some point... well, maybe we'll come back to it. I mean, my guess is that steady state will kind of take over this, because steady state is kind of like this, right? We're just creating and deleting, creating and deleting, but instead of starting at what we started with, you know, the higher value is probably what we'd do. So I mean, this is kind of like a steady-state test, I guess.
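The create/delete/recreate loop described here can be sketched as a steady-state churn driver with a configurable replacement rate (a sketch; `create` and `delete` stand in for whatever client calls actually make and remove VMIs):

```python
import itertools
import time

def churn(create, delete, population, rate_per_s, cycles, sleep=time.sleep):
    """Hold the population size constant while replacing one member per tick,
    at `rate_per_s` replacements per second, for `cycles` ticks."""
    period = 1.0 / rate_per_s
    ring = itertools.cycle(range(len(population)))
    for _ in range(cycles):
        i = next(ring)
        delete(population[i])      # remove the current occupant of the slot...
        population[i] = create()   # ...and immediately refill it
        sleep(period)
    return cycles
```

The two axes mentioned later in the discussion map directly onto `len(population)` (where you start: 100 up to 600) and `rate_per_s` (how fast you delete and recreate).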
B: Yeah, and I think it's good to replace it somehow, because, you know... we can even, you know, increase the 200, the 100.
B: If we see that 100 is not showing enough data for us, and have the steady state. And I wouldn't break it into too many tests, because otherwise, as you saw in the Grafana, it might start to be very hard, you know, to compare things, yeah.
A: You know, for us at NVIDIA, our VM count is limited by the number of GPUs, so we don't have incredible density; whereas this, on three nodes, is incredible density. So I mean, this is a good range; I think it's pretty valuable, we'll get data from both. And then, yeah, then the steady state... and then I think with steady state we'll have to play around a lot with, you know, the right...
A
The
right
I
mean
we
kind
of
want
to
find
a
balance
like
this.
Something
like
this
where
I,
but
I
think
we
sort
of
have
two
different
axes
here
like
where
we
start
from,
is
one
like
100
600
and
then
the
rates
that
we
do
the
delete
and
the
recreates
is
this
or
the
other
axi
axis,
so
it
we'll
have
to
we'll
plan
to
play
with
a
little
bit.
A
So
I
think
that's
probably
how
we'll
go
with
this.
Okay
still
still
works
to
do
that
on
the
steady
state
test.
That
is,
it's
not
fully
complete,
but
we'll
have
to
yeah
something
we
can
we'll
do
in
the
future.
Okay
makes
sense,
and
this
and
this
grafana
is
this.
Something
like
is
this.
If
I
is
this
like
this
is
publicly
accessible
right,
like
I
think
it
lasts,
yeah.
Okay,
let
me
make.
B: ...recreation or something: the VM latency, the API request latency, the number of requests. Because we have here, you know, requests per second, for example, for the different components, and you can see the request duration. Oh, you can see one metric that is interesting if you go down... yeah, oh, but up a little bit now, yeah. Just the rate limiter duration; can you click on the virt-controller?
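The requests-per-second panels being read here are typically computed from monotonically increasing counters; a minimal version of that calculation, mirroring what Grafana's `rate()` does over a window (the sample format is illustrative):

```python
def per_second_rate(samples):
    """Average per-second increase of a counter, computed from
    (timestamp_seconds, counter_value) samples, first to last."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need a positive time window")
    # Counters only go up (barring restarts), so clamp at zero.
    return max(v1 - v0, 0) / (t1 - t0)
```

The same idea applies to the request-duration and rate-limiter-duration panels, just with histogram sums and counts instead of a single counter.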
A: Yeah, no, that's really good. Let's see... so did you... I think there were some bugs that you had associated with... I think, like, the work... this one, right? Mm-hmm.
A: Completed... I thought maybe... did I mark it closed? Oh.
A: That definitely was... okay, that's cool, and then, yeah. Oh, and then there was one thing we talked about last time, I just remembered: you mentioned... did Andrew... I don't know if we have Andrew here, but did you ever speak with him about the virt-controller workqueue?
B: We just need to be aware of that, but it seems to me something weird, because it's scaling with the number of created VMs, and, you know, requeueing a key that many times... I don't know if it's too many, but compared to the other controllers it seems to be very high, isn't it? So it's definitely something we need to keep an eye on.
A: Okay, I think we'll... yeah, if you've got some time, maybe... so.
A
Lay
we'll
have
it
so
you
can
do.
How
do
we
take
we'll
do
this?
Let's,
let's
fix
the
performance
job
first
just
so
we
get
that
one
out
of
the
way.
So
have
you
look
into
that
one
first
and
then
we'll
next
meeting,
let's
I'll
book
some
time,
because
I
need
to
do
some
research
on
this
one
myself.
Maybe
we
can
all
do
some
research
and
we
can.
We
can
have
this
as
a
discussion
topic
for
next
meeting
and
see
what
we
find
and
then
we
can
update
the
card.
A
Okay,
all
right
yeah-
I
just
I
wanted
to
have
it
here
just
a
week,
so
I
remember
to
come
back
to
this.
Okay,
so
yeah
this
one's
definitely
interesting
we'll
have
to
so
anyway,
like,
like,
I
said
we'll
follow
up
on
this
one.
We'll
do
some
investigation
we'll
come
back
to
it.
Okay,
let
me
see
I
just
here
so
from
last
time:
marcelo
did
you
get
a
chance
to
do
any
tracing,
or
did
you
get
any
of
the
tracing
results?
You
have
it
available.
B: In a way I'm in doubt, you know, about the results, because it's not showing any of the other tracing points. And then I was thinking: maybe I ran the wrong KubeVirt build, or it's fine, and the other trace points were simply not higher than one second, and that's why they were not appearing in the log. So I'm sure that I deployed it in the right way, but I'm in doubt because I didn't see any other trace points.
B: I think I don't have the cluster anymore, and it would be hard to do that again, but a way to do it would be to remove the one-second threshold, or lower it to 100 milliseconds, for example; then everything would appear, and we could see exactly the time at each point where we put the trace, but...
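The one-second cutoff being described, where trace points below the threshold never reach the log, can be sketched as a context manager; lowering `threshold_s` (say, to 0.1) makes the quieter points appear, as suggested (the names and the injected clock are illustrative, not the actual tracing code):

```python
import time

class TracePoint:
    """Time a step and log it only when it exceeds `threshold_s`,
    mirroring the one-second cutoff described above."""

    def __init__(self, name, threshold_s=1.0, emit=print,
                 clock=time.perf_counter):
        self.name, self.threshold_s = name, threshold_s
        self.emit, self.clock = emit, clock

    def __enter__(self):
        self.start = self.clock()
        return self

    def __exit__(self, *exc):
        elapsed = self.clock() - self.start
        if elapsed >= self.threshold_s:
            self.emit(f"trace {self.name}: {elapsed:.3f}s")
        return False  # never swallow exceptions
```

With the default threshold a 0.5-second step logs nothing, which is exactly the "missing trace points" symptom; at 0.1 the same step shows up.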
E: I was away from these discussions here, unfortunately. Since NVIDIA already released the source code of their drivers, my team and I are trying to start developing GPU live migration. I sent a link in the chat window so you can understand what the unknowns from NVIDIA are.
E
What
is
the
correct
forum
to
talk
about
what
have
been
done
already
regarding
live
migration
before
we
understanding
what
have
been
done
so
far
and
discuss
who
have
done
that
work?
How
to
implement
the
gpu
live
migration?
Also,
since
we
have
already
the
the
open
source
drivers
of
nvidia.
A: Probably the Wednesday meeting, for the... the KubeVirt...
B: Yeah, there is also the Slack, because, you know, this is regarding a feature, isn't it? I didn't play with live migration for GPUs, so I'm not aware of it.
E: Yes, I asked there already; nobody answered. Well, let's wait then, give it some time.
B: Things get lost there; maybe just rephrase and send it again, you know, and simplify. Just say: I have an issue with GPU live migration, who's in charge of that, can you help? Because maybe sometimes, when the message is too big... I don't know if you did that, but people get lazy to read. You know, just go straight to it. Okay.
E: Okay, thank you so much, guys. Okay... I was away also: is there a date yet for release version 1.0?
A: I don't think there's been a publicly stated target at this point. I mean, all I can say is that we've had a lot of discussions about it, and I think it's moving along; I would say it's a lot closer than it was, you know, a few months ago, and it's definitely something that's in focus for a lot of people. But yeah, there isn't really a hard date.
A
I
mean
there's
like
I
think,
there's
a
document
floating
around
somewhere
with
like
the
with
the
remaining
items.
I
think
the
last
from
what
I
recall.
The
last
remaining
item
was
a
policy
for
how
we
handle
decker
decrementing
apis
and
when
you
started
like
alpha
apis
or
things
like
that,
that's
that's
already
being
worked
on,
there's
already
a
document
for
it
somewhere.
It's,
I
think
it's
in
the
community
repo
and
I
think
after
that
merges.
I
think
that
was
the
last
item
to
then
say
that
we
have
everything
for
v1.
A
So
it's
coming
up,
but
there
isn't.
There
isn't
a
date
that
I
know
of
that's.
You
know
say
when
it's
going
to
be,
but
I
I
expect
it'll
do
some
more
info
soon,
because
I
mean
I
can
tell
you
that
at
least
you
know.
From
my
perspective,
I'm
definitely
very
interested
as
well
like.
A
This
is
something
you
know
we
also
want
to.
We
also
want
to
get
to
so
it
should
be
soon.
I
mean
it's
just
I
can't
I
don't
know
can't
say
when
yeah
there's
another
release
date.
The
feeling
is,
that's
that
that
is
this
year,
correct,
yeah,
my
feeling
is:
is
this
year
my
feelings
is,
I
I
feel
I
think
just
based
on
conversations
and
what's
remaining,
I
I
feel
confident
and
yeah.
The
thing
this
year
will
give
you
one.
A
Okay,
all
right,
let
me
go
back
to
let's
see,
look
back
at
what
we
had
last
meeting.
So
so
that's
good
marcelo.
It's
awesome
to
see
that
this.
How
much
that
that
impacted
the
work
you
that's
really
good.
Did
you
did
you
ever
rerun
the?
Have
you
rerun
the
test
recently
or
you
actually
know
you
did
right.
You
ran
it
yeah
here
you
go,
you
ran
it
after
and
you
already
saw
you
recorded
the
performance
improvement.
So
that's
really
good
at
some
point.
A
I
still
want
to
do
that
that
test,
where
we
can
like
because,
like
you
have
here
like
a
really
good
measurement
of
like
how
much
of
an
improvement
that
you've
made,
but
we
have
like
we
still
don't
have
anything
to
measure
against
so
that
we
can
publish
you
know
our
data
still
something
that
bothers
me.
I
wish
we
had.
You
know
like
this
patched.
It
had
this
much
of
an
effect
on
performance
or
something
you
know
what
I
mean.
B: Exactly, this one. So this is just showing what the improvement is: it's like a 62-times improvement, you know, in the latency to create 1000 VMs, something like that, yeah.
A: But I guess... I mean, sorry, what I mean is: this is really good, but with your scenario you were able to induce this problem. I guess maybe what I'm saying is that we need specific scenarios that we can measure against, to say: we tested a thousand VMs on a 12-node cluster, and this PR made this improvement.
A
But
we
don't
have
that
standard
to
like
measure
across
releases
to
say:
okay,
here's
what
it
was
two
or
three
five:
zero.
Four,
two:
zero:
five:
zero,
zero,
five!
Three
and
you
can
see
the
improvement
like
we
don't
we
don't
quite
have
that
measurements
just
because.
Well
I
mean
it's
it's
difficult
to.
Maybe
it's
something
we'll
have
to
do.
We
can
work
on
the
performance
clusters
like
we
can
yeah.
A
You
know
as
we
can
use
to
like
get
to
get
the
standardization
or
something
because
I
think
like
I
mean
it
really
just
because
this
is
because
I
mean
this
is
great
work
and
it
just
needs
to
be
highlighted
like
and
after
even
at
the
release
level
like.
There
is
a
massive
improvement
here.
E: I would like to ask you something here: we are also doing some tests, but at much higher volume: 10,000 VMs across a 1250-node cluster.
A
Yeah
that
that's
what
I'm
saying
is
that,
like
your,
your
your
environment
is
different,
and
so
your
performance
is
going
to
vary,
but
what
we're
saying
is
like
so
marcelo's
got
this
pr
here
that
that
greatly
improves
performance
on
a
12-0
cluster.
When
you
create
high
amounts
of
density,
which
is
you
know,
a
thousand
vms,
it
eventually
basically
shows
here
like
it.
The
the
vmi
creation
latency
just
is
very
high
with
with
that,
before
this
change
and
then
with
it
it
gets
it
gets
much
faster.
A
So
I
don't
know
if
you're
seeing
this
on
your
clusters,
it's
it's
hard
to
say,
but
I
mean
at
least
for
you
know
a
cluster
that
that
gets
this
much
density.
A
We
should
see
an
improvement,
so
I
mean
you,
yours
might
be.
Yours
might
be
a
little
bit
less.
So
it's
hard
to
say,
but
the
test
is
different
is
really
what
the
point
is.
B: Okay, and the VM automatically starts the VMI?
E: Because these VMs change size over time, and with that we also change the number of nodes in the cluster. The same cluster, for 10,000 users, can reach anywhere between 157 nodes and 1250 nodes, because all the VMs are different: one has two virtual CPUs and four gigabytes of RAM, another one has four virtual CPUs and eight gigabytes of RAM, and things like that.
E: Also, there are some clusters in Brazil, clusters in the US, clusters in Europe, clusters in Asia, and we are doing everything to be, let's say, scalable all over the world, which is the only way our solution works. But your numbers seem to be worse than what we are getting; that's why I'm asking you about the PVCs behind the scenes as well.
E: We grab 300 gigabytes of RAM to have a RAM disk, and we expose those 300 gigabytes of RAM on every node as a cluster file system to the VMs; that's why we are sometimes reaching speeds a million times faster than NVMe storage, and that's why we are getting better numbers. Behind the scenes, how we make everything happen: so you understand, we were working before with Rook and Ceph, but with Rook and Ceph the deduplication part is at the alpha stage, so we rolled back to GlusterFS. That's, you know, how we are exposing these RAM disks to the VMs and why we are getting better numbers than yours.
B: And what's the number of, you know, parallel VM creations per cluster?
E
We
have
in
each
cluster
ten
thousand
users,
and
we
are,
we
have
hundred
thousand
co-current
users
logging
in
at
the
seven
a.m.
Before
you
understand
these
are
like
spread
all
over
the
clusters.
We
have.
Every
cluster
handle
ten
thousand
users,
only
okay,
and
we
have
more
than
hundred
thousand
concurrent
users.
A: Well, there's also... you said this, so I think what I heard was 10,000 users per cluster, and then, I think, a thousand nodes; is that what it is, per cluster?
A: Your speech is cutting out; can you say that again? Oh: how many Kubernetes API servers do you have per cluster? Three.
B: Yeah, yeah... that's very interesting, you know: how the Kubernetes control plane actually impacts the KubeVirt control plane. This is a test that I didn't have the time to do, and it would be interesting, you know, to increase the data.
E: But, you know, we are building infrastructure for one million concurrent users to be in production in the second semester; so you understand, this is going to be, like, hundreds of clusters.
B
And
are:
do
you
have
any
plan
to
you
know
maybe
show
some
metrics
from
the
cluster.
I
don't
know
if
you,
if
that's.
A: Do you see any PVC... so you said you have high-performance PVCs, but do you see any latency when actually creating them, when Kubernetes actually goes in and creates them and deletes them? Do you see any issues at all with that? Because I don't think that has anything to do with this on the creation side.
A
Yeah,
this
is
something
we've
at
least
observed:
there's
a
there's,
a
pvc
protection
controller
in
kubernetes
that
builds
up
quite
a
work,
cue
on
deletion,
and
it's
we've
at
least
from
some
some
of
our
testing
that
we
have.
This
is
one
thing
we
run
into
a
lot
as
the
work
queue
grows
very
large
and
it
can
cause
latency
issues
during
creation,
vmi
creation,
just
because
we're
deleting
the
the
pvcs
and
this
controller
is
using
a
lot
of
the
api
services
resources.
A: Yeah, makes sense; it's a similar use case. So what we do too... because the point is, right, you delete the PVCs after each VMI is finished, right? So you have a lot of delete requests, which is what leads you to this, the cleanup being very slow. Yeah, exactly, okay. Very similar problems to what we've been dealing with internally. Okay, interesting.
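The dynamic being described, a burst of PVC delete requests outrunning the protection controller so its workqueue balloons, can be modeled with a simple per-tick simulation (a toy model, not the controller's actual rate-limiting logic):

```python
def simulate_workqueue(arrivals, capacity_per_tick):
    """Track queue depth per tick: arrivals[i] keys are enqueued at tick i,
    and the controller reconciles at most capacity_per_tick keys per tick.
    Returns the queue depth after each tick."""
    depth, history = 0, []
    for enqueued in arrivals:
        depth = max(depth + enqueued - capacity_per_tick, 0)
        history.append(depth)
    return history
```

A burst of 1000 deletions against a controller clearing 50 keys per tick leaves a backlog that takes roughly 20 ticks to drain; while it drains, the controller keeps consuming API-server budget, which is where the VMI-creation latency comes from.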
E: Can I ask what kind of CPU you are using: Intel, AMD, ARM? AMD, yeah? I would like to know. We are using Intel.
A: Okay, well, anyway, this exercise is kind of... I mean, it's interesting. Like I was saying, with this scenario you definitely hit a pressure point here, Marcelo, with the density. Maybe it's just because you went over, like, 80 or 90, or something, and that's just based on the configuration of your cluster, of three API servers or something like that; that's when we run into this kind of latency.
A
It
might
be
something
like
that,
but
it's
still
it's
just
it's
beside
the
point,
because
it's
it's
a
valid
use
case.
So
it's
something
we
need
to
address
and
you're
totally
right
that
the
the
qps
should
be
higher,
like
just
based
on
what
your
analysis
was.
So
it
makes
sense,
but
it's
interesting
how
it
affects
people
differently.
B: Yeah, I was doing that, but, okay, you know... I'm changing projects right now; I talked to Ryan, so I'm, you know, smoothly going to leave KubeVirt... you know, a couple of projects, unfortunately.
B
But
we
were
we're
improving
the
cold
and
trying
to
you
know
brian
also
put
some
traces
in
the
code
and
create
also
a
sequence
diagram.
You
know
to
understand
how
the
workflow
that
the
vmi
goes
at
least
for
some
part
of
it.
It
has
more
things
that
we
were
discussing
before,
but
it's
the.
I
think
the
whole
goal
here
is
to
understand
bottom
ax
and
then
try
to
you
know
to
identify
that
in
the
code
and
improve
the
code.
A: Well then, that's exactly right: that's what Marcelo has been testing.
E: ...the number of VMs per node... the Kubernetes number of pods per node, per VM, on Kubernetes now?
B: You can increase that; for OpenShift, by default, it's already increased to 400, 500, I think. And I did some tests; for example, in ours I did some tests where I was creating 400 VMs on one node, so I changed something in the KubeVirt code to do that also. You can increase that, but...
B: Yeah, I think officially... not OpenShift; officially, Kubernetes says that we can support 250 per node in a safe way. Okay, of course it depends, because the problem when you have too many pods per node is that the kubelet starts to be overloaded, okay, and, you know, the container runtime starts to be overloaded and things get nasty. But 250 is okay; as I mentioned to you, I could run 400 very tiny VMs, okay, without big problems.
B
More
than
400
is
start
to
be
like
too
many,
and
and
if
you,
if
you
aim
to
go
to
200,
it
should
be
fine,
we
are
actually
creating
200.
Now,
in
our
you
know,
perform
steps.
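The per-node limits quoted here (250 pods per node as the stated safe figure, 400 tiny VMs as a practical ceiling) can be turned into a quick feasibility check, since each running VMI is backed by one virt-launcher pod (a sketch; the per-node system-pod overhead is an illustrative guess):

```python
import math

def pods_per_node(vm_count, nodes, system_pods_per_node=10):
    """Estimate the per-node pod count for an evenly spread VM fleet:
    one virt-launcher pod per VMI plus fixed system-pod overhead."""
    return math.ceil(vm_count / nodes) + system_pods_per_node

def fits(vm_count, nodes, max_pods=250, system_pods_per_node=10):
    """True when the spread stays within the node's maxPods setting."""
    return pods_per_node(vm_count, nodes, system_pods_per_node) <= max_pods
```

By this estimate, the 600-VM burst test on three nodes (about 210 pods per node) stays inside the 250 figure, while 1000 VMs on the same three nodes would not.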
E: It's not what's...
A: Yeah, I was going to ask you, because, with your GPU workload, do you run into any issues with the way you slice your GPUs? Do you run into issues on any of the smaller configurations, like an eight-to-one or a four-to-one? Do you slice that small, and do you run into any issues with performance?
A: I don't know... do you use CPU...
E: 16 gigabytes: sixteen of one gig, eight of two gigs, four of four gigs, or two of eight gigs. It's as simple as that, okay.
A: But do you... so I guess, from here: on your node you have what appears to be one physical GPU, which you then pass through to one customer's VM? Because you said you do slice, but it gets passed through as a physical GPU.
A: Yeah, yeah, I... well, I know it is, but I mean... okay, so you're slicing it up into an eighth, and you're using, like, VFIO or something to... yeah, okay, correct.
A: So, in the case where you have, like, the eight-to-one vGPU or something, on the eight-to-one... are you at all reaching six-to-one? Okay. And the sixteen-to-one GPUs, are you at all having any performance issues with the...
A: So then, each... so then I guess: how do you allocate the CPUs for each of them? So you give one a 1/16 GPU, and do you allocate a whole CPU?
A: So you don't do anything with pinning, or any memory bandwidth allocations?
A: Exactly; it isn't supported right now, which is essentially what I'm talking about. So I was wondering if you had a solution that was... because KubeVirt doesn't support it. So I was wondering if you had some other solution that you could publish, because it's something that's interesting too; I mean, there are a lot of things in that area that would be interesting to see in the community.
E: What we are also working on, though this is one or two years of work that we plan to have finished, is in the video I sent you here: to finalize the paravirtualized virtio 3D GPU for Linux. We plan to have it also for Windows and Mac.
A
Yeah,
well,
that's
that's
cool.
I
mean
you
should,
at
some
point,
it'd
be
cool
to
share
your
some
of
your
the
phase
transition
times.
It'd
be
cool
to
see
like
how
you
guys
perform
with
those
transition
times
to
in
your
cluster.
If
you,
maybe
you
have
some
optimizations
that
we
can,
we
might
be
able
to
publish
so
that
others
can
can
copy
it.
So
it
would
be
cool
to
see
at
some
point
yeah.
That's
the
goal,
great,
okay!
Well,
all
right!
A
Well,
thanks
for
sharing,
so
any
more
any
more
topics,
some
people,
I
think,
already
covered.
Quite
quite
a
few
things.
I
think,
there's
nothing
else
yeah.
If
there's
anything
else
going
on
in
the
left
side.
E: If you have any further scalability issues... that's why I plan to always be here, to share what we are already reaching. We are doing a stress test for 100,000 concurrent users across multiple clusters, across multiple regions, behind the scenes.
A: ...work like Marcelo's, a lot of work on the load generator, and we have the burst test. Eventually we want to get to doing steady state, which is probably more in line with your use case, and it'll be interesting. I mean, we obviously don't have as much hardware as you do, but maybe we can try and simulate some of the pressure at a lower scale and see if we can find...
B: Yeah, yeah, great. And again, for some of the bottlenecks that you see, if it's possible, open an issue and, you know, refer to it, describing the bottleneck a little bit, so we can work on that, and discuss that there also.
B: And when you join the meeting, yeah, if it's okay for you, put your name here in the meeting notes; I think it's...
E: I'm trying to find this PDF... the Google Doc, where is it? So I understand... normally...
E: When I enter late, it doesn't show what I have sent before. I...
B: Oh, you got it now. It's just because it kind of, you know, draws attention to how many people are attending this meeting, and more people is better; it brings more attention to performance, yeah.
E: And hr, where are you working? Are you from IBM also?
A: Yeah, it's just... it's nvidia.com! It's my nick here.
A: Yeah, well, I mean, the reason, Andre, I was so interested in a lot of what you're doing is because it's actually a lot like what we're doing at NVIDIA. We have almost identical use cases on the infrastructure side; not the end-user side, not exactly that side, but it's very similar on the infra side. You know, so that's cool, your scale and everything. So yeah, definitely share the problems that you're seeing, you know. I bet you there...