From YouTube: 2021-04-01 Kubernetes SIG Scalability Meeting
Description
Agenda and meeting notes - https://docs.google.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?ts=5d1e2a5b
A: Okay, hello everyone, this is the SIG Scalability meeting, February 18th. We don't seem to have anything on the agenda yet. To be honest, I was mostly busy in the last weeks with feature freeze and production readiness, and I'm still recovering from that.
B: Hey Wojtek, I can go first. So, are you able to hear me?
B: Yes, yes. Okay, so I had a question specifically with respect to endpoint slices. Now that we are actually splitting those endpoints into different blocks, do you know how we are measuring, or whether we plan to measure, the network programming latencies and so on with that? How are we going to measure the p99?
A: It seems I was muted. No, I think the definition doesn't really change. The definition was basically per pod, more or less, so it doesn't really matter whether we are using Endpoints or EndpointSlices; it doesn't change anything.
A: We are basically measuring from when the pod turned ready or not ready to when that is reflected in iptables.
A: Okay, well, it doesn't have to be iptables, but I guess we didn't implement it for anything other than iptables. I might be wrong, but that's my feeling, if I remember correctly.
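For context, the SLI described here yields one latency sample per endpoint change (the time from a pod's readiness flip to the corresponding iptables update), and the p99 is then taken over those samples. A minimal sketch of that aggregation, with invented sample values:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    a fraction p of all samples are at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered))  # 1-based nearest-rank index
    return ordered[rank - 1]

# hypothetical per-endpoint network-programming latencies, in seconds
latencies = [0.2, 0.4, 0.3, 5.1, 0.6, 0.5, 0.4, 0.7, 0.3, 0.2]
p99 = percentile(latencies, 0.99)
```

The definition is per pod, so whether the updates arrive via Endpoints or EndpointSlices only changes how the samples are delivered, not how the percentile is computed.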
C: Hi, I have a question around infrastructure for scale testing. There is a perf-dash project inside the perf-tests repository, and I wonder how actively it is developed. I see there is a public installation of it, and I was wondering how feasible it is to have it installed locally.
A: It is definitely feasible; at Google, for example, we are running our own instance internally too, so it's generally quite easy to set up. Regarding how actively it's being developed: not very actively. What we are thinking about is switching at least some parts of our tooling, in particular perf-dash, to something called Mako.
A: For load generation and things like that it doesn't really work well for our purposes, but the part covering gathering results, processing them, detecting regressions, and analyzing the data is actually pretty generic, and ideally we would like to migrate to that. That's why we are not really investing in perf-dash, but we don't have much capacity to do that either.
C: Got it, thank you. So mainly you publish the results to Testgrid, which is part of SIG Testing and really part of the testing infrastructure, and there is perf-dash, to be replaced by Mako. Does that conclude the set of visualization tooling around it, or are there more projects?
A: No, I think that's it. Okay.
A: Mako, yeah. Let me find it.
D: Yeah, this is the first time I'm attending this meeting. I'm Wilson and I'm from ByteDance, and we are working on cluster scalability projects. We are trying to set up clusters of around 10k to 100k nodes; the target is very ambitious, but we are working on this.
D: That's why I'm talking with the different teams, to see if we can get some ideas, and also share some of our ideas and optimizations.
D: And get into open source, or contribute back to the open source projects, or something like this. Okay, yeah, nice to meet you.
B: Hey, I can go ahead with another question. This is more of a curiosity, and I probably haven't fully thought this idea through yet. When we're reading something from etcd, when we do a regular read without any resource version, the LIST call goes to etcd.
B: So in such cases, today, the whole response comes back from etcd, right? Is it possible, let's say, to not do that and just return, for example, the resource version that the response would have had? And then, if the API server has a cache which is at least at that resource version, it can just serve from that. Is that something which...
A: Let me find it here; I think I will probably add it to our agenda notes. You can keep discussing, I will probably find it pretty quickly.
B: Okay, yeah, so that's one thing. I mean, I had a question around that.
B: You have probably already discussed it on that thread, but if you're going to serve from a cache which is at least at that resource version or higher, is that going to cause problems when there are two API server instances? Say you're requesting RV x, but one instance is at x+5 and the other is at x+10, and the first call goes to the second one.
A: I think what we were thinking about is a slightly different version of it, which is: you send the request to etcd, but not for the value, just for its resource version. You can do that, so: give me the resource version of that particular object. And what you basically do on the API server is take the object with that resource version. You need to have a quite specialized cache for that, but it's doable.
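A toy model of the idea described here, with invented names: the API server quorum-reads only the object's current revision from etcd, and a specialized cache, keyed by resource version, supplies the already-deserialized object:

```python
class RevisionCache:
    """Sketch of the 'specialized cache': the apiserver keeps recently
    seen object versions keyed by resource version, quorum-reads only
    the current revision from etcd (a cheap, value-free read), and
    serves the decoded object from memory, skipping the etcd payload
    transfer and the deserialization cost."""

    def __init__(self):
        self._by_rv = {}  # resource version -> decoded object

    def store(self, rv, obj):
        self._by_rv[rv] = obj

    def get(self, quorum_read_rv):
        """quorum_read_rv: the object's revision as reported by etcd.
        Returns the cached object, or None on a miss, in which case a
        full read from etcd would still be needed."""
        return self._by_rv.get(quorum_read_rv)

cache = RevisionCache()
cache.store(41, {"name": "pod-a", "ready": False})
cache.store(42, {"name": "pod-a", "ready": True})
# etcd (quorum) reports revision 42 -> serve the object from the cache
obj = cache.get(42)
```

This only sketches the happy path; as noted later in the discussion, handling a cache that has moved past the requested version is where it gets hard.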
A: Oh, so you're saying some kind of quorum read: you send a request to etcd for the resource version, which is a quorum read. I think, or actually I'm sure, the API server is by default using quorum reads to etcd. So you know what the exact version of the object in etcd is, but you don't send the data itself from etcd, and you don't have to deserialize it and so on, because hopefully you already have it in the cache; you basically save that deserialization and the sending of the data.
B: I see. So, just digging a bit deeper: let's say you got this resource version, but your cache does not have information at exactly that version; it has something which is further into the future.
A: Yeah, that doesn't work now; you pretty much need to re-implement the whole thing, re-implement the watch cache in a completely different way. This is the link; this is where the discussion about that starts.
D: Okay, I saw that from the API server it's using linearizable requests to the etcd server, so I thought it was just doing one read: you send a request to any one of the members, but by default, I think, it's a linearizable read. But I'm not sure; I didn't check the details of the quorum read. Do you have any more information about this?
D: Oh, at the client level, right, okay; I'll go back and double-check that. But from what I've seen, I thought it was just doing linearizable reads, so that when you send to one node, the follower will send the request to the leader, and then later, when the leader makes sure that everything up to the current commit index has been applied, the follower returns.
A: Yeah, I don't know, to be honest; I don't know the exact details of Raft and how it's implemented underneath, but conceptually it's a quorum read here.
D: Okay, so we are doing it on the client; we are not doing anything specific, just using the client, and it's doing the quorum read. Is that what you're saying? Yes? Okay, I see. I think it's probably just using the mechanism I just described, internally in the etcd server, to do the quorum read.
D: Yeah, we discussed the API server and the etcd server. Internally, in our company, we are actually thinking about how to make the etcd backend do sharding. The motivation is that one etcd cluster can basically serve many reads, but for writes a single cluster is limited.
D: So, as the cluster size increases, we saw that this component, the etcd cluster behind the API server, becomes a bottleneck.
So
we
are
what
we
are
planning
actually
to
do
the
to
the
sharding
for
the
for
the
aps:
server
like
backhand,
so
that
means
like
for
the
api
server
instead
of
having
one
activity
cluster
it
can
like
use
several
activity
cluster
and
like
do
the
key
charting.
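A minimal sketch of the key-sharding idea, under the simpler fixed-shard-count assumption discussed later; the backend endpoints and the hash choice are purely illustrative:

```python
import hashlib

# hypothetical etcd backend endpoints; a fixed shard count is assumed
ETCD_BACKENDS = ["etcd-0:2379", "etcd-1:2379", "etcd-2:2379"]

def backend_for(key):
    """Pick an etcd cluster for a storage key by stable hashing, so the
    same key always lands on the same backend and keys (and therefore
    load) spread evenly across the backends."""
    digest = hashlib.sha256(key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(ETCD_BACKENDS)
    return ETCD_BACKENDS[index]

# every pod key deterministically maps to one backend
backend = backend_for("/registry/pods/default/pod-a")
```

Hashing by full key is only one option; sharding by namespace or by object name prefix would trade even distribution for keeping related keys (and range reads over them) on one backend.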
D: I know that the API server can already use different etcd clusters for different resources, but we saw that as the cluster size increases significantly, the number of pods itself becomes very significant. So what we want is to have the resources distributed evenly, and also the load distributed evenly, among the different etcd clusters.
D: I discussed this in the community before, and what they suggested is to file a KEP, so that people can discover and discuss it. And actually, from my initial design...
D: Actually, we want to make use of the resource version being a string inside the API server. What we are planning to do is: each of the different etcd cluster backends has its own resource version, and the resource version string on the API server actually contains a vector, which is the combination of the different etcd backend cluster versions made into one string. In this way we know the different resource versions on the different etcd backend clusters.
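The vector resource version could be encoded along these lines (a hypothetical encoding for illustration only; the actual scheme would be defined in the KEP):

```python
def encode_rv(revisions):
    """Pack per-backend etcd revisions into the single opaque string the
    API exposes as resourceVersion, e.g. {0: 10, 1: 7} -> '0=10,1=7'."""
    return ",".join(f"{shard}={rev}" for shard, rev in sorted(revisions.items()))

def decode_rv(rv):
    """Recover the per-backend revisions from the combined string."""
    out = {}
    for part in rv.split(","):
        shard, rev = part.split("=")
        out[int(shard)] = int(rev)
    return out

combined = encode_rv({0: 10, 1: 7, 2: 33})
assert decode_rv(combined) == {0: 10, 1: 7, 2: 33}
```

Because clients treat resourceVersion as opaque, such an encoding is API-compatible, but it does break the assumption (relied on in places) that resource versions are single etcd revisions that can be compared as integers.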
D: Yes, that was our initial plan. And since the API server is not required to use any distributed transactions or anything like that, because the API server is just using some basic functionality, such as simple transactions, update and delete operations, and also the watch functionality, I think this might be possible.
A: We've been considering that a couple of years ago; you can probably try looking into the repository, I remember those discussions happening in the past. There were a couple of things that I can't remember exactly; I would need to think about it more. Watch is certainly the non-trivial part here, how to make it work, basically.
A: Yeah, watch is definitely one of the things that we need to pay a lot of attention to. The other thing is how complicated the change will be. And the other thing, well, it depends on the exact setup, but if you know up front how many shards you want to have, that sounds like a much simpler problem than if you would like to allow resharding later.
D: Yeah, resharding is definitely an issue; you need some coordinator, or something to help during the...
B: Hey, one last question. Do you think it makes sense to allow asking for a non-quorum read through the client, if we don't actually care about the results being linearizable? I don't know if the client already does this; my understanding is that it doesn't.
A: So, if you don't care about having the most fresh result, you can simply opt in for the watch cache, right, by setting resource version zero.
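In Kubernetes terms the opt-in is per request: an unset resourceVersion forces a quorum read from etcd, while resourceVersion="0" allows the apiserver to answer from its watch cache, possibly with stale data, and an exact value asks for data at least that fresh. A toy dispatch model of those semantics (the strategy labels are invented):

```python
def read_strategy(resource_version):
    """Where a GET/LIST is served, keyed by the request's resourceVersion:
    - unset ("")  -> quorum read from etcd (most fresh, most expensive)
    - "0"         -> any cached state is acceptable; serve from watch cache
    - exact value -> serve from cache once it has caught up to that version
    """
    if resource_version == "":
        return "etcd-quorum"
    if resource_version == "0":
        return "watch-cache-any"
    return "watch-cache-at-least"

strategy = read_strategy("0")
```

This is why the default client behavior matters: clients that leave the resource version unset all land on the etcd-quorum path, even when they would tolerate cached data.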
A: That may always happen, right? So it's not even necessarily too old, actually.
B: Because this is actually causing problems. I mean, I didn't give the background on this, sorry. We were observing this on some of the clusters that we manage: what's happening is that all those requests which are going to etcd are by default becoming quorum reads, and the client, to be honest, probably doesn't care that much about them being quorum.
A: Yeah, I think the question is which direction we would like to go. If we want to proceed with the direction that we've been discussing with Clayton, which I mentioned, this kind of redesign of the watch cache, I think it may be possible to just implement pagination there, and then you would be able to paginate from the cache directly.