From YouTube: 2021-10-28 Kubernetes SIG Scalability Meeting
Description: Agenda and meeting notes - https://docs.google.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?ts=5d1e2a5b
B: Yeah, I just started looking at it, so I opened an issue so that you can track it.
C: Oh, slow down, you're going into problem solving; let's first think about the problem definition, right. What are we...
C: What do we want to accomplish here, right? I think there are a few different things we want to accomplish. One was articulated years ago, when we first started the APF work: testing whether we can turn off the client-side rate limiting and have a system that works. Another was articulated much more recently, when we started introducing constants in the support for long lists and watches: finding good values for those constants.
A: Yeah, I think the plan makes sense. I'm just wondering, from the client-side perspective, because we want to turn off this throttling on the client side: there are some things that can probably still impact the cluster, like creating a connection, which is kind of heavy. So, you know, to some degree we could probably try, but anyway, APF might not help in this case.
E: Client-side throttling, or rate limiting, won't really help with establishing connections, because we are reusing connections across requests, right? So unless you are explicitly terminating the connection in your client, which is not even possible if you are using client-go, for example, which is what our components are doing, then you will be reusing connections anyway. So it's not like a problem that is affecting us, right, or our... okay, yes.
C: I would put it in a slightly different way. I think maybe it's a valid concern that, in general, in the real world, there may be scenarios with a lot of connections being created, but the client-side rate limiting doesn't affect that anyway.
C: So the switch from using client-side rate limiting to relying only on the APF self-protection is really independent of that.
E: So, yeah, that makes sense; I think that's roughly what I was trying to say. Yes. I think we may also consider not turning off client-side rate limiting by default in every single client: we may potentially remove it only in some or all of the three components, or maybe even some out of the three, but not remove it in the library itself, just in some components. I think that's also an option here, although...
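For concreteness: the client-side throttling in question is client-go's QPS/Burst token bucket on rest.Config, and disabling it per component, rather than in the library, is a small change. A minimal sketch, assuming a standard kubeconfig-based setup:

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	// Option 1: replace the default QPS/Burst token bucket with a
	// limiter that always admits requests immediately.
	cfg.RateLimiter = flowcontrol.NewFakeAlwaysRateLimiter()
	// Option 2 (recent client-go): a negative QPS disables client-side
	// rate limiting entirely, leaving protection to server-side APF.
	// cfg.QPS = -1

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("unthrottled client ready:", client != nil)
}
```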
C: Well, I think, again, maybe we're getting ahead of ourselves, right. I think our first question is: we want to understand what will happen if it gets turned off. I guess maybe it's fair to say: what turn-off scenarios do we want to explore? Maybe not a complete turn-off but a partial turn-off. But the first question, before we decide what to actually do, is: we want to do some tests, right, and that's the conversation here. What tests do we want to do, and for what purpose?
B: Do we see any modeling today on the client side for our, like, big-scale tests?
E: We do. In particular, we are, to some extent, artificially throttling the tests themselves, and also, yes, in particular the scheduler and the controller manager, when creating pods, are throttled by client-side rate limiting.
C: Yeah, I'm not sure either. I think, and I suppose, in fact, it's kind of difficult to interpret, right, because really the question you want to ask is comparing this world to a hypothetical alternate world: what would happen if we had a different rate limit? And no metric can really tell you that, because it's a system-behavior question.
F: So I think that is... can people hear me? Yes? Hey. I think there is one metric which probably indirectly says that there is some sort of throttling, though not necessarily just with API calls. A bunch of our controllers, for example within kube-controller-manager, have this backlog, the work queue depth metric, I think, which says how many things are still to be processed.

F: I have to check whether the scheduler also has something similar; I think the scheduler doesn't, but yeah, this metric is...
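For reference, controllers built on client-go's workqueue get that depth gauge essentially for free once the queue is named. A minimal sketch, assuming the usual controller wiring (the queue name is illustrative, not from the meeting):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Naming the queue is what wires it to per-queue metrics: with the
	// metrics provider that kube components register (from
	// k8s.io/component-base), this exports gauges such as
	//   workqueue_depth{name="example-controller"}
	// which is the backlog signal discussed above.
	q := workqueue.NewNamedRateLimitingQueue(
		workqueue.DefaultControllerRateLimiter(),
		"example-controller", // hypothetical queue name
	)
	q.Add("some-key")
	fmt.Println("queue depth:", q.Len()) // Len backs the depth gauge
}
```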
C: Right: that queue depth is a consequence of a lot of things interacting, right. I mean, you've got the number of syncers, you know: what's the number of syncers running concurrently, how fast does each API call run, how many API calls does each sync loop iteration take? There's a bunch of stuff wrapped up into that.
F: Yeah, another way, which I used sometimes in the past, and which gives a stronger indication that there is actually throttling: you have audit logs, and if you have audit logs for the calls coming from that component, you can just query through those. You can count the number of calls coming from it and match that up with the QPS that we've configured for it, to see whether it's being saturated. But that's still not, like, a direct...
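A rough sketch of that audit-log check, assuming JSON-lines audit output and filtering by the component's username (the field names follow the audit.k8s.io/v1 Event schema; the username is illustrative):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// auditEvent picks out just the fields we need from an
// audit.k8s.io/v1 Event encoded as one JSON object per line.
type auditEvent struct {
	RequestReceivedTimestamp time.Time `json:"requestReceivedTimestamp"`
	User                     struct {
		Username string `json:"username"`
	} `json:"user"`
}

// Count requests per second for one component; comparing the peaks
// against its configured QPS suggests whether the limiter is saturated.
func main() {
	const username = "system:kube-scheduler" // component to inspect
	perSecond := map[int64]int{}

	sc := bufio.NewScanner(os.Stdin) // e.g. piped from the audit log
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024)
	for sc.Scan() {
		var ev auditEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // skip malformed lines
		}
		if ev.User.Username == username {
			perSecond[ev.RequestReceivedTimestamp.Unix()]++
		}
	}
	for sec, n := range perSecond {
		fmt.Printf("%s %d req/s\n", time.Unix(sec, 0).Format(time.RFC3339), n)
	}
}
```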
E: Yeah, I think getting rid of all the rate limits in our existing load test would be the first thing that would be interesting to see. The second is that we can even speed it up: it's super easy to speed this test up, say it should be going as fast as possible, and see what will happen to the system.
E: So I think... this won't be a signal that we are fine, but it will give us some signal; it at least gives us something. I mean, in the past, when we were trying to speed up the test and run it as fast as possible, it was basically blowing up the cluster.
C: Okay, I think that's quite plausible; let's start with something simple and we'll just see what happens. I mean, the simplest, probably the smallest, thing to do would be to leave the test speed as it is and just try turning off all the client-side rate limiting, and let's see what happens.
A: Okay, yeah, and I believe we will still see the difference because of, for example, the scheduler, which we know is hitting the limit.
C: All right, so that's one branch of the investigation. The other branch is tuning the constants, and in fact this could kind of be done on this first branch, right, because really the constants are in service of this self-protection. But we do have some constants.
C: I think we have three constants: one is in the support for list requests, and there are two constants in the support for watch requests. And right now they're numbers that, you know, are just guesses. We should have some evidence for setting them.
E: Yeah, so I think that the one for list requests is not a complete guess, I mean, right... No, no, no.
C: All right, so what would be a plausible way that we could do this? I mean, one of the concerns I have is that the regular testing is run in scenarios where there's concurrent load from other activities; it's totally uncontrolled, so performance results are variable and it's tough to draw conclusions from that. Do we have tests that are run with, you know, nothing else competing for CPU or network, or anything like that?
C: Great, yes, that's what I was hoping for, right. The confounding factor is that the tests that are regularly run on PRs are run on machines that are also handling other stuff, and, you know, we have a lot of flakiness due to the load; the behavior varies a lot due to the amount of concurrent load, just the interference from concurrent activity.
E: Yes, yes, but I think that our existing scalability tests may also not be perfect for that, in particular for lists, because we are not doing a lot of heavy lists, or a lot of lists in general.
E: Try to overload the API server with purely list calls: let's maybe start with purely list calls of different sizes, see what will happen, and try to tune that variable based on that.
C: I agree we should start with something simple like that. I do have a question, though: don't we need to, in some sense, compare this to the normal case? Right, I mean, if we have a situation where we find out what setting of the constant just gets us to some CPU limit we want to enforce, for example... So, in my mind, here's the way I see what's going on here.
C
The
reason
we're
doing
this
is
because
we
have
this
regulation
of
concurrent
requests,
which
is
basically
making
the
assumption
that
the
cost
of
serving
a
request
is
proportional
to
its
duration,
right,
because
the
concurrency
is
the
product
of
service
duration
and
arrival
rate,
and
what
we
find
is
that
for
lists,
the
cost
of
serving
is
bigger
than
for
most
requests,
so
to
set
the
constant
we're
looking
for
a
ratio
or
how
that
ratio
between
duration
and
cost
varies
between
normal
requests
and
the
list
requests.
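Spelled out, the relationship C is describing is Little's law plus a cost-per-unit-duration correction for lists. A short sketch in our own notation (the symbols are not from the meeting):

```latex
% Little's law: regulated concurrency = arrival rate times duration.
C = \lambda \cdot D
% APF's concurrency limits implicitly assume cost \propto duration.
% Lists violate that, so the constant to tune is roughly the ratio
r = \frac{\mathrm{cost}_{\mathrm{list}} / D_{\mathrm{list}}}
         {\mathrm{cost}_{\mathrm{normal}} / D_{\mathrm{normal}}}
% i.e. a list request should be charged about r times the concurrency
% (seats) of an ordinary request of equal duration.
```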
C: Agreed. I think this can be done in a synthetic way, and that would be best, because we can get the strongest signal-to-noise ratio in a synthetic test.
A: Okay, so I guess we agree that we should create synthetic tests for, kind of, estimating this ratio between the get calls and list calls.
C
It
would
probably
be
interesting
to
have
some
variety,
because
I'm
sure
that
the
cost
of
all
the
other
requests
not
exactly
the
same
it
would
you
know
what
all
we
need
to
do
is
get
the
list
calls
in
this
ballpark,
but
we
need
to
know
what
that
ballpark
is
so
having
a
variety
would
tell
us
what
the
ballpark
is.
So
I
think
that
would
be
good.
C: ...you know, somehow, what's appropriate for every request. But then...
C
Yeah,
in
some
sense,
we're
straying
out
of
six
scalability
back
into
api
machinery,
but
right
right,
but
I
think
yeah.
I
would
like
at
some
point
to
have
something
that
could
you
know
automatically
auto
tune
all
the
requests,
but
I
think
that's
not
the
question
for
here
right.
I
think
the
question
for
here
is
given:
let's
just
talk
about
how
to
tune
with
the
the
constants
and
what
we've
got
so
for
four
lists.
E: Yeah, I think we can; we probably don't have to be super smart here. I think if we just introduce some mix of, as I mentioned, reads and writes, and different sizes of objects, we can...
C: Could we do something like run a real, you know, e2e scenario, record all the API calls, and then play them back at our leisure, without having to worry about all the other stuff that's going on affecting the timing and so on?
C: If we could, I think we could, you know, kind of bifurcate what we do, right. One is to say: let's have a representative mix of requests, play them as fast as we can, subject to a given concurrency limit, right, and see what the resulting CPU and network is. And then, instead of that, just do heavy lists.
C: You know, after loading it up so there's data, just do a bunch of heavy lists, right, and tune the constant till we get to the point where that also produces the same CPU and network.
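If the record-and-replay idea pans out, the replay side could be as small as re-issuing captured read-only request paths at full speed. A rough sketch, assuming requests were captured one URL path per line (the capture format is hypothetical):

```go
package main

import (
	"bufio"
	"context"
	"os"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// Replay previously recorded read-only API calls as fast as the client
// allows, so timing no longer depends on the original workload.
func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cfg.QPS = -1 // replay at full speed; let server-side APF regulate
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// One recorded request path per line, e.g.
	// /api/v1/namespaces/default/pods (hypothetical capture format).
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		_ = client.RESTClient().
			Get().
			AbsPath(sc.Text()).
			Do(context.Background()).
			Error()
	}
}
```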
C: Let me just recite that back in a little more detail. I think you're suggesting two sides of the experiment: on one side we run some e2e tests and look at the request rate and the CPU; then, on the other side, we replace those ordinary requests with, you know, after loading up some data, just a bunch of heavy lists, and tune the constant until we get the same relationship between request rate and CPU.
C: Now, that's not going to work, because the cost has got to depend on what requests we're doing; otherwise it's just going to be a function of request rate.
E: ...starting the pods, and node-originating signals, in particular node heartbeats and updating endpoint slices. So we don't really need to, like, simulate the test; we can play a little bit with the ratio of those calls, but we know what is happening there, right, and it can be...
E
We
can
generate
like
four
types
of
five
or
five
types
of
calls
and
like
that,
the
simulation
will
be
pretty
simple.
To
do.
E: Yeah, so that's roughly what I was trying... So, yeah, I think what Mike is saying is that we first should do pretty much this, without lists even, but yeah. I mean, it kind of boils down to a similar thing; we can do it in one shot or we can do it in two shots, and I'm fine with both. So, yes.
B: Are you saying that when we do lists, we will be doing only lists? That doesn't... I think if you have a mix, that probably is closer to the real-world scenario.
E: I agree, yes. I mean, we should have at least some non-list calls. We should think about the duration, and probably experiment with different ratios of non-list to list calls. But yes, I think we should have both.
C: Yeah, I mean, again, I think it should be sufficient to be able to run it in, kind of, two modes, yeah. If we have a mix, that in some sense is diluting the signal, right: if lists are a fraction of the load, then they're going to be a fraction of what happens. It'll be easier to see and analyze what happens if we have two pure, distinct modes.
E: Yeah, so how I personally envision the test is that you have, like, this one knob, where you are saying that N percent of the load is coming from those calls, the simple, single-object calls, and 100 minus N is coming from lists. And just by changing one constant in our test we can run either only lists, or only those single-object calls, or a combination of both.
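A minimal sketch of that single knob, assuming a synthetic driver built on client-go (the namespace, object name, and flag are illustrative, not from the meeting):

```go
package main

import (
	"context"
	"flag"
	"math/rand"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// One flag steers the mix: 0 = only single-object GETs,
// 100 = only full LISTs, anything between = a blend.
var listPercent = flag.Int("list-percent", 50, "percent of calls that are LISTs")

func main() {
	flag.Parse()
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cfg.QPS = -1 // rely on server-side APF, not client-side throttling
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()
	pods := client.CoreV1().Pods("load-test") // hypothetical namespace
	for i := 0; i < 10000; i++ {
		if rand.Intn(100) < *listPercent {
			// Heavy call: list everything in the namespace.
			_, _ = pods.List(ctx, metav1.ListOptions{})
		} else {
			// Light call: fetch one pre-created object.
			_, _ = pods.Get(ctx, "target-pod", metav1.GetOptions{})
		}
	}
}
```

Running it at 0 and 100 gives the two pure modes discussed above; intermediate values test whether the mixed behavior matches the prediction from the extremes.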
C: That'd be great, right: that way, we could test the hypothesis by actually trying some mixes in between and seeing if they also match the prediction from the extremes.
E: ...a little bit more, but we should probably try to conclude it somehow. So, Abu, would you be able to start something around that?
B: Yeah, yeah. I mean, from our initial discussion at the last meeting we thought about adding a webhook, but we can just go with the first version, which is what we discussed today, and then we can go from there.
C: You know, wearing another one of my hats that you guys haven't seen: right now I'm trying to do some performance studies looking at latency out to watch clients. I wanted to talk about how to support that, but I guess we're out of time for today. If anyone wants to follow up on the Slack channel, that would be great, because I want to make progress soon.