From YouTube: Kubernetes SIG API Machinery 20210519
Description
SIG API Machinery Meeting, see the related agenda.
A
Hold on, it's recording, thank you. Welcome, everybody, to the SIG API Machinery meeting for Kubernetes; today is May 19, 2021. Thank you to everybody who has joined, and I hope everybody is safe and doing well in these crazy times. We have a packed agenda today, which is great, and we are going to get to it right away. I think the first topic is from Yusuke; I will hand it over to you. Let me know if you need to share your screen, or I can just browse through the links you pasted. Thanks.
B
Let's see, okay, can people hear me? Yes, okay, cool. First of all, thanks for having me. Let me give a super tiny intro about myself, because this is my first time here. I'm Yusuke, I work at Google, and I'm the engineering manager for a third-party operator known as Config Connector. One of the discussions we keep having in the design of things is how object references should work, so a little bit of background.
B
Config Connector is an operator that helps you manage resources in the Google Cloud Platform, so there tend to be a lot of relationships, and that motivated a discussion around trying to understand what the best practices really are for object references. So I filed this PR, and I would love people to take a look through it, especially those who have context.
B
Maybe the easiest thing to look at quickly is the files, and then go down to the object reference examples. There's conversation we can dive into, but I've been incorporating feedback, and that's where we've landed; you can just stop there, maybe, right?
B
That's a good place, too. The guidance from people in this group, David, Daniel and so on, is basically to move away from what previously existed as the best practice documented in api-conventions.md, where you use the ObjectReference type, which included apiVersion and kind fields to specify the resource you wanted to refer to, and to shift towards group and resource fields, which are actually unique identifiers, versus the former, which causes ambiguity for kinds that are mounted on multiple resource paths.
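For illustration only, here is a minimal Go sketch of the two shapes being contrasted; the type and field names are hypothetical examples for this discussion, not an agreed-upon Kubernetes API.

```go
// Illustrative sketch: the legacy apiVersion/kind shape vs. the group/resource
// shape being discussed. Names here are hypothetical, not from the PR.
package refs

// Legacy style (old api-conventions.md guidance): apiVersion + kind is
// ambiguous when the same kind is served on multiple resource paths.
type LegacyTargetRef struct {
	APIVersion string `json:"apiVersion,omitempty"` // actually group/version
	Kind       string `json:"kind"`
	Namespace  string `json:"namespace,omitempty"`
	Name       string `json:"name"`
}

// Proposed style: group + resource uniquely identify what is referenced.
type TargetRef struct {
	Group     string `json:"group,omitempty"` // "" means the core group
	Resource  string `json:"resource"`        // lowercase plural, e.g. "deployments"
	Namespace string `json:"namespace,omitempty"`
	Name      string `json:"name"`
}
```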
B
So that's one of the highlights I wanted to give, and then there's probably another point: one of the big discussion items right now is how field paths should work.
B
There are examples of situations where you might want to reference an arbitrary object's field, and you may want to specify the field from which you want to extract the value. That's effectively where this proposal is right now, and I have a couple of specific questions I want to dive into.
B
But are there any questions I can answer regarding the proposal or the conversation we've had up to this point?
C
I guess I can say something, since I think I've left, well, I don't know, maybe a plurality if not a majority of the comments on the PR. I don't think what you've suggested is a huge change; in fact, I'm a little confused about how it seems like a large discussion for the change. I also don't think anyone's substantially arguing.
C
I just want to clarify one thing you said about apiVersion. For anyone who hasn't been here for the five years or whatever that we've been doing this: apiVersion is misnamed. It actually includes the group; it's group and version in that field. So, explicitly switching to group, I guess I don't have an issue with it. We don't do it anywhere else, but I don't necessarily have a problem with it.
D
I like the change to group, and I certainly like the change to resource. Oh yeah, it has to be resource and not kind for this purpose. Yes, I will admit I did not check back after my first couple of rounds of comments.
D
C
D
C
I didn't see that either, but yeah. If we're referencing a specific field, we have to lock that down to a specific schema; otherwise it might not be a valid field selector. And it's okay for meta fields, like name, namespace, uid, controller ref, etc., because we can say those are effectively never breakable.
E
What we said with meta/v1 is that we will probably never introduce a breaking change; if we did, that would be part of the schema change of the reference, and it would be up to the author to make sure it's backwards compatible. I can't think of a use case, but an example might be: let's say we were crazy and removed clusterName, and someone was actually using clusterName in an object ref. It's up to them and their object reference to do the mapping themselves and resolve the ambiguity.
C
Yeah, this is actually a good point, and I wonder, do we actually have a use case anywhere for putting in a field path? I wonder if we could leave that undefined and have a PR just about adding a field reference type, because of what Clayton just said, right: it might make sense that you're referencing a particular object that's some resource, but your field path is part of meta, and meta could, like.
C
Oh yeah, that's even trickier. We haven't talked about that in a long time, but kinds changing under resources, yeah.
B
Yeah, that's a good point, and just for clarity of the conversation: field path is actually today sort of semi-standard. This isn't the right place to post links, but there's a core ObjectReference struct that exists, that's leveraged in the API documentation as the previous convention, and it actually includes a fieldPath field. So my intention was to clarify anything that already clearly had a meta convention, if that makes sense.
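For reference, the core struct being discussed looks roughly like this (abridged from k8s.io/api/core/v1.ObjectReference); it is the legacy, kitchen-sink reference type the new guidance moves away from, and it is where the fieldPath field mentioned above lives.

```go
// Abridged sketch of the existing k8s.io/api/core/v1 ObjectReference.
package refs

type ObjectReference struct {
	Kind            string `json:"kind,omitempty"`
	Namespace       string `json:"namespace,omitempty"`
	Name            string `json:"name,omitempty"`
	UID             string `json:"uid,omitempty"` // types.UID in the real type
	APIVersion      string `json:"apiVersion,omitempty"`
	ResourceVersion string `json:"resourceVersion,omitempty"`
	// FieldPath points at a field within the referenced object,
	// e.g. "spec.containers[2]" for a pod's third container.
	FieldPath string `json:"fieldPath,omitempty"`
}
```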
E
D
In fact, I enumerated the six or so reasons why we don't want it. What happened, I think, is that this doc didn't match the godoc on the field, which I don't have handy, but the godoc on ObjectReference talks about shortcomings: things like you can't provide documentation about what a field is used for, you can't provide information about.
E
And that's very similar to conditions, where we basically eventually said you should have typed conventions and we use duck typing for the conventions. If you're a generic consumer looking at status conditions, you're effectively duck typing it, but in Go code you should have typed condition types that are specific to your object and your schema. They should follow the duck-type convention, but they're not required to, effectively, and we didn't want people sharing conditions.
D
D
I'd have people go through and make a final pass.
B
Yeah, I guess I don't know what the full delta is between this and approval. If we're at approval level, that's great, but if there's any feedback I should incorporate before then, yeah.
C
Yeah, I think it probably just needs another pass or two, probably from me or David, and I don't know if there was anything major wrong with it.
C
C
B
I had one question about follow-up action items from this. Because the recommended structure is not present in the Kubernetes core code, is there a desire to add such a type to the core types, or do we just leave it as: this isn't implemented, this is a recommendation, without any implementation even in the core Kubernetes code?
D
C
I don't know that we need one right now. I'd be amenable to adding a fully general reference type, but I don't know that any existing API actually supports fully arbitrary references. Maybe, I don't know, owner references, yeah, maybe owner references do, but obviously we can't change those.
E
E
This is what the field means, this is the behavior of the field, and so actually sharing the type leads to, I think, worse documentation.
C
Yusuke's point about there being no canonical place to copy-paste your reference from, and that might be.
E
G
Evolving a type is super hard. As soon as you have a reference type like that, someone is using it for something and they want to add a namespace field or add a whatever field, and it's easy for there to be accidents where you expand usage.
C
Yeah, that seems like a reasonable place to put it to me: copy-paste this. Actually, the only use case I can think of for a fully general, arbitrary object with an arbitrary field reference is something like a find-and-replace, like insert an environment variable or something, get it from this random API object, but that also has to deal with, like.
E
C
Okay, I mean, yeah, if you're going to reference a field, you have to do that. If you're not going to reference a field, then you're counting on the interpreter of the reference to know what kinds of objects might be there and what to do with them. Yeah.
D
We use fully generic references to be able to say: if something has gone wrong with this resource, look at these other resources, which may be cluster scoped, may be namespace scoped, maybe in a different namespace, are of types that we don't know, and maybe a list, to help diagnose the problem. It effectively builds a way to make a super-describe command, right, where you recursively check all these things.
B
D
But even so, for the fields on our data-collection one, we document how it's for data collection and what happens if it's left empty, and those semantics could vary depending on what you were trying to do. Yeah.
C
Yeah, so I think we're all agreeing that it's reasonable to put an example Go type in this documentation for people to copy-paste, and give them the instruction to, you know, write your own docs on it.
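As an illustration of "copy-paste it and write your own docs", here is a minimal sketch of what a consuming API author might end up with; the Widget and credentialsSecretRef names are hypothetical, and the point is the use-case-specific doc comments, not the exact shape.

```go
// Hypothetical example of a copy-pasted reference type with documentation
// tailored to the consuming API, as discussed above.
package widgets

type WidgetSpec struct {
	// credentialsSecretRef points to the Secret holding credentials for the
	// widget's backend. The Secret must be in the same namespace as the
	// Widget and must contain the keys "username" and "password".
	// If empty, the controller falls back to the in-cluster service account.
	CredentialsSecretRef SecretReference `json:"credentialsSecretRef,omitempty"`
}

// SecretReference uses the short, name-only form because the target type
// (Secret) is fixed by this API; group and resource would be redundant here.
type SecretReference struct {
	// name of the referenced Secret.
	Name string `json:"name"`
}
```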
B
Okay, yep, totally sounds good. Then one quick design question on that one, specifically with the resources: there's a situation where I think you might want to name, for example, a subresource like scale for a deployment with something like deployments/scale, but I know that doesn't directly translate into a path you can paste into the API, because the syntax is deployments/name/scale. So this is a mini edge case that might lead to some ambiguity.
E
There's a longer-running thread here: even kubectl doesn't deal with such resources gracefully, but I've talked a couple of times about going and fixing that; I've had three historical PRs trying to make subresources more general. I would probably say, just based on historical lessons:
E
adding a subresource is just another field on your object ref, and someone would have to define the semantics of that. I would say we would not want to collapse it into the resource field. Historically, when we've merged underlying fields into a single field, a.k.a. apiVersion, we ended up regretting it a bit later; we did that for backwards compatibility.
E
If we don't have a backwards-compatibility case, I would say: if we're making a recommendation, don't bundle subresource with resource; subresource is a distinct thing.
G
Yeah, I agree, and I think the question in this context is: are you expecting clients of this reference to deal with subresources? If you are, then include a subresource coordinate for them, and if you're not, then don't. But that's where a generic reference would get us in trouble.
G
If there was a subresource field that was there but pretty much no one used it, then unless they took steps to prevent data from being in that field, people could stick random data in there, and then you'd have clients that were or were not expecting it. I think that's the point of the specific use: saying, we expect you to deal with subresources, so if you have that, include a field for it. Yeah.
E
Concretely, an example would be HPA pointing to a workload controller. It's up to the implementer of the HPA API to define what they want there, but the object reference is implicitly using scale. It was totally reasonable for HPA to have an implementation that bypassed the scale subresource on things that didn't have the scale subresource and just made a guess.
E
That's up to the implementation, and when you define your API and what your object reference is, you should clarify that for your users. You didn't need a subresource field on the HPA reference because scale was implicit, and then, when scale didn't exist for extension types, the controller could do something reasonable. So you didn't need to specify subresource in your API.
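For context, this is roughly what the existing HPA target reference looks like (abridged from k8s.io/api/autoscaling/v2): there is no subresource field, because the scale subresource is implied by the HPA API itself, as described above.

```go
// Abridged sketch of the autoscaling/v2 CrossVersionObjectReference used
// by HorizontalPodAutoscaler's scaleTargetRef.
package hpa

type CrossVersionObjectReference struct {
	// Kind of the referent, e.g. "Deployment".
	Kind string `json:"kind"`
	// Name of the referent.
	Name string `json:"name"`
	// APIVersion of the referent (group/version).
	APIVersion string `json:"apiVersion,omitempty"`
}
```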
B
Yeah, that makes sense. I don't have a concrete use case for why someone might want to declare a subresource on a generic reference anyway, so I'll just omit it from the recommendation, and then we can come back to it if someone wants to try something.
G
One thing, and I hadn't looked over the doc update after it had gone through a few iterations: it might be useful to say, as a client,
G
how would you use this, and make the point that if you lack the version coordinate, you have to hit discovery or have a priori knowledge to say: in this group, find a version that has this resource. There's still an aspect of using discovery data or a priori knowledge, and of saying, given these fields, a group and a resource and a name,
G
see which versions have this resource, then construct your URL like this. That would make concrete the idea that this resource field is not a glomming together of resource and subresource, yeah.
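A minimal sketch of the client-side resolution step being described, assuming a group+resource+name reference and using client-go's discovery interface; the resolve function and URL construction are illustrative only (the core group and subresources are ignored for brevity, and this is not something prescribed by the PR).

```go
// Illustrative sketch: resolve a group/resource/name reference to a request
// path by asking discovery which served version has that resource.
package resolve

import (
	"fmt"

	"k8s.io/client-go/discovery"
)

func resolve(dc discovery.DiscoveryInterface, group, resource, namespace, name string) (string, error) {
	groups, err := dc.ServerGroups()
	if err != nil {
		return "", err
	}
	for _, g := range groups.Groups {
		if g.Name != group {
			continue
		}
		for _, v := range g.Versions {
			rl, err := dc.ServerResourcesForGroupVersion(v.GroupVersion)
			if err != nil {
				continue
			}
			for _, r := range rl.APIResources {
				if r.Name == resource {
					// e.g. /apis/apps/v1/namespaces/<ns>/deployments/<name>.
					// The core ("") group uses /api/v1 instead; omitted here.
					return fmt.Sprintf("/apis/%s/namespaces/%s/%s/%s",
						v.GroupVersion, namespace, resource, name), nil
				}
			}
		}
	}
	return "", fmt.Errorf("no served version found for %s.%s", resource, group)
}
```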
B
That makes sense. The current PR sort of describes controller behavior in English, but it doesn't enumerate the steps explicitly, I think. If it's desired, I can go back and actually write a step-by-step of what the steps are and the expectations there.
G
H
B
C
Okay, who wants to talk about client-go load balancing?
D
I'll go ahead and talk about this one. It's come up a couple of times where a person thinks that they want it. Should we have someone in favor of this advertise it to us?
D
I don't know if Stefan is here. Sorry, I'm in.
E
In general, client-side load balancing complements, and does not completely replace, server-side load balancing; clients have advantages that servers don't. We made a run at this historically for a couple of specific use cases, specifically around the kubelet, dealing with inadequacies of the broad range of load balancers we deal with. I think, as a reasonably generic core client library,
E
client-go should have the ability to handle a few very specific use cases. I think there's a spectrum, from a simple fallback, which would be useful for places where you expect a load balancer to work but there's a local option you could use, all the way up to, potentially, a cautious approach where, if someone would like to do something a little bit more intelligent with client-side load balancing to a set of API servers, that should be possible.
E
I would advocate that it should be possible to do this, and we should have some guidance around when we think it should be turned on and what we think our clients should do. But I know, and Stefan and I were arguing a little bit about this, there's a ton of complexity here, and getting it wrong has a lot of subtle problems, so we need to make sure we have a concrete path where someone could reason about it and we could test it.
E
E
D
So can you talk about what the advantages are? It sounds like one advantage would be: I don't have to set up a load balancer. What else?
C
I'll add one thing in the pro column, which is that priority and fairness takes the approach that each API server divides up its available concurrency independently, which means that, as a client, if you want to take full advantage of your allotted concurrency, you have to talk to all the API servers separately.
C
Maybe priority and fairness could make different choices, but it seems like the other choices involve all the API servers talking to each other about who's doing what, so we're definitely doing it this way for the moment, for simplicity.
C
So that's a reason why you might want a client to be able to talk to all the API servers in the group. I don't actually buy Clayton's last point about maybe needing to talk to different API servers; it seems like those API servers.
E
API servers, which is: if you have webhooks on a cluster and you want to run those webhooks, you should be running two replicas, and kube-proxy is not perfect. From a resiliency perspective, I would say a client on the cluster using a well-written client-go load-balancing plugin can do better than kube-proxy in many cases, especially for core resiliency operations.
C
C
C
It's possible to get better reliability if you have more connections, but it's also possible to get worse reliability with more connections, right? You have to actually use the connection that's good and not the connections that are broken. So hopefully we wouldn't add a bunch of bugs, but yeah. Well, I think that's.
D
D
E
The userspace kube-proxy did this originally, and then we switched to a higher-performance option for maximum kernel throughput for all of the world's most demanding services, and we actually made kube-proxy worse, right? You used to get connection failover behavior, which was awesome, and then we regressed it when we moved to iptables. Some people moved to IPVS, some networking plugins implement this, some don't. So kube-proxy is good, but, and, David and Daniel, I'm not actually advocating that we should just rush into this.
E
There are advantages, but it comes with risk. That's my fundamental con: it's very hard to do this correctly. I've watched us struggle over the last seven years to get it right in all cases. We would want a compelling use case, or the ability to clearly indicate to a consumer when you should use this, before we rush ahead. So maybe that's my con, which I think agrees with your con.
D
I've got other things, like other use cases you mentioned, where I've got a kubelet and I want to go directly to a kube-apiserver. Again, the question there is: if you can't get one load balancer right, this load balancer is effectively fanned out to every kubelet, and in that behavior model there seems to be a scaling issue. Also, if one doesn't work right, why are the 500 of them in your cluster going to work right?
E
Another trade-off, and this is worth noting: the kube-apiserver has a certain set of guarantees that are intended to make it easy to reason with, but to preserve some of these scaling characteristics you have to not leverage that. It's certainly possible that arbitrary servers behind a load balancer don't actually work correctly and you need to let the load balancer handle that; things like session affinity or locality are really just workarounds.
E
There is an angle here: we don't actually do all that great, in our tests, some of our use cases, and our controllers, at handling this behavior; people can easily get tricked by the different state of the watch cache, for instance. It would be nice, although not required, to actually exercise that as part of normal operations, because again, it is a behavior that should work correctly, and regressions to it are not valuable.
E
So if we can cobble together enough other advantages for using it with some component, that might actually mitigate it; the one that comes to mind for me is the kubelet.
E
Being able to do something similar to what kube-proxy does, looking at the API endpoints of the service: I don't know that that's the best case for all deployers, but I would argue it would be reasonably effective at dealing with one of our largest scale dimensions. Wojtek might argue differently, but the ability of the nodes to act as a thundering herd on the API servers behind a poor load balancer is something that I do think falls within our scalability responsibilities.
D
We are trying to close some of the gap on the kubelet thundering-herd problem. Some of that is coming; the biggest change will be related to priority and fairness. There was a KEP merged for 1.22 describing a way to handle watch initialization, so when kubelets are dealing with an outage and they all storm at once to relist or re-watch their secrets, that will now be handled.
E
Mike had gotten three quarters of the way to implementing this before, and we stopped because we were getting concerned about the complexity of it. I'm not opposed to another go at it; I just think it needs to be done in a very responsible way, and the responsible way is: a clear use case, a simple enough implementation that matches the use case and actually shows a benefit for a series of scenarios.
E
We could argue about that, and then: what's the test regimen that makes it an optional, valuable thing someone can use alongside client-go? Maybe it doesn't start in client-go; I could see that path. Client-side load balancing is just the dual of server-side load balancing; they both have trade-offs. If the trade-offs don't work for all of our use cases, can we argue that it works for none of our use cases? I'm a little skeptical about that. And aggregated.
C
Here's my question with client-side load balancing: if we're going to push it down into the Kubernetes client, why not push it down into the Go HTTP client? Why not give the HTTP/2 client a list of IP addresses instead of a single one and let it handle it? It's already maintaining a bunch of channels over the same connection; why not just let it maintain a bunch of channels over multiple connections?
E
C
Yeah, I can live with that formulation, right? We're not the only people on the planet to have a need for some client-side load balancing, or maybe we don't; we're not the only people to think that we have a need for client-side load balancing. So yeah, sure. Is it really that hard? I don't know, I've never tried it.
G
G
To have as a capability: if it's expressed in a kubeconfig, then that seems to indicate it would be consumable by all of the clients we produce that read kubeconfigs. So that's one question. Another question is: do we expect this to behave in a sort of load-balance-y, health-check-y way, like check healthz or readyz or whatever and route to the server that we.
C
No, you don't need any readyz checking, I don't think, because HTTP/2 already has this ping frame and we're already using it to detect whether the existing connection went away. This was a major issue with HTTP/2: connections would die and it wouldn't notice for like 15 minutes, so now we do this application-level ping. I think if we were going to maintain multiple connections, we would just do the same thing.
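For reference, a minimal sketch of the HTTP/2 ping-frame health check being described, using the knobs exposed by golang.org/x/net/http2; the timeout values below are illustrative, not necessarily what client-go configures.

```go
// Sketch: enable HTTP/2 PING-based liveness checks on an http.Transport.
package main

import (
	"net/http"
	"time"

	"golang.org/x/net/http2"
)

func newPingingTransport() (*http.Transport, error) {
	t := &http.Transport{} // plus TLS config, proxies, etc. as needed
	h2, err := http2.ConfigureTransports(t)
	if err != nil {
		return nil, err
	}
	// If no frame is read for ReadIdleTimeout, send an HTTP/2 PING; if the
	// peer doesn't answer within PingTimeout, close the connection so the
	// next request dials a fresh (hopefully healthy) backend.
	h2.ReadIdleTimeout = 30 * time.Second
	h2.PingTimeout = 15 * time.Second
	return t, nil
}
```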
D
Readyz, right. If we don't have readyz, you can end up in cases where you are trying to get a resource, say a CRD, and you try against a server that isn't ready yet; its discovery information isn't ready because the cache hasn't warmed, and it fails.
E
Nailed it. Not all the world's API servers, including ours, are good enough to do the right thing. We have versions that have bugs. I agree those are bugs in many cases, but they're not.
G
D
G
G
This was something we hit in our etcd client, where we had static configuration of the etcd members. Etcd has a provision for dynamically discovering members, but we don't use it, because we need to be able to still function if we restart, and if you use the membership-change functionality, then over time you could drift and end up talking to a completely different set of servers. Then, if you restarted, you'd go back to your static list and be broken. So, understanding how someone would use this:
G
it's not something we necessarily have to solve, but we should at least make sure that it's usable. If you've got multiple servers that you're talking to, how do you add a new one to the list, and how do you take one out of the list? What does that look like?
C
I have a question about option one; it says the kubeconfig is updated. I don't understand why you would put this in the kubeconfig.
D
D
I mean, okay, fine, so snip that part. It still faces the problem of readyz fan-out, because these can go up and down, and it faces problems of how you distribute endpoints, which, interestingly enough, are both things that Jordan just called out. I don't think that not using readyz is an option. We tried that with a localhost connection and experienced bizarre failures as kube-apiservers would come up and be in half-ready states, inconsistent with the rest of the cluster. Yeah, and effectively.
C
There is the case where you're going through a load balancer and you only have one IP address, and the only way to get multiple connections is to make multiple connections to the same IP address and hope that you get different backends. That's one case. Another case is you're running in the cluster, or you've got a DNS name that you can resolve, or you can read the endpoints table, and in that case you get a list of IP addresses.
C
You can make multiple connections. And the final case is the weird ones, like the kubelet and kube-proxy, where some turtle has to be at the bottom of the stack: you can't read the endpoints table until you get a connection. So how do we program that? Today, I guess we'd have to evolve that, or let those particular libraries worry about it. So I guess I'm saying there's not a general solution for that that works for every client.
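For the "read the endpoints table" path mentioned above, a minimal client-go sketch might look like the following, once you already have one working connection; refresh, failure handling, and the bootstrap "turtle" problem are deliberately left out, and this is illustrative rather than anything proposed in the meeting.

```go
// Sketch: list the Endpoints of the default/kubernetes service to learn the
// individual API server addresses behind the service VIP.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func apiServerAddrs(ctx context.Context, cs kubernetes.Interface) ([]string, error) {
	ep, err := cs.CoreV1().Endpoints("default").Get(ctx, "kubernetes", metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	var addrs []string
	for _, subset := range ep.Subsets {
		for _, addr := range subset.Addresses {
			for _, port := range subset.Ports {
				addrs = append(addrs, fmt.Sprintf("%s:%d", addr.IP, port.Port))
			}
		}
	}
	return addrs, nil
}
```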
G
Oh, I thought of one more question: do we expect this to be transparent to the thing that is using the client, in the sense that if I do a write and it results in a connection error, the client does a failover to the next server and tries the write again?
G
Some of the idempotence and domain-specific things, like readyz, require a high-level understanding of what's going on, which means it wouldn't be as simple as saying: all right, Go HTTP client, here's a list of IPs, do failover for me, because Go doesn't have any idea about our write semantics or our readyz. So whether we want a high-level.
G
C
G
G
C
G
About the possibility of two requests talking to two API servers: that already exists today. I'm much more concerned about the complexity, and, like David said, writing load balancers is hard. If we're unhappy with the ones we are already talking to, that are purpose-built for this, I'm skeptical that we can do better.
E
Well, but I think this is a different audience. I would say, in general, the spectrum of load balancers used with Kubernetes that are not cloud load balancers come with widely varying quality and configuration.
E
I think that's kind of in our lane, but it's a question of: can we do this in a way that's responsible for the broad set and doesn't introduce additional failure modes? To your point, Jordan, it's: how do we make everybody's Kubernetes work pretty well, consistently, and then not get ahead of ourselves and make some people's Kubernetes work worse, because they do have a competent load balancer set up or the service is handled properly.
D
E
And that's actually what many people do. I guess the question is: Kubernetes is not responsible for whether your network is set up properly; that's a responsibility I think distributions have to handle. But then there's a flip side, which is: what part would we say is conformance, from an expectation that kubelets can talk to the API server? I think this is a sliding scale.
E
E
That being said, that's a lot of duplication, so then there's an argument, too: if you can solve some classes of problems that plague Kubernetes correctly and offer good recommendations, that's probably a net win for most people, and maybe Cluster API or some other cluster-lifecycle group would be willing to engage in some of those discussions more than SIG API Machinery.
E
D
I do have reservations about whether we would do better than existing load balancers. I think what I found probably most persuasive was, we.
E
I have webhooks, too. There's a good chance, if you're running your webhooks on Kubernetes, operationally: if you touch a webhook on Kubernetes and you're using kube-proxy in iptables mode, half your requests are getting rejected if one of your endpoints goes down. So maybe that would be a better place to start helping people mitigate self-inflicted webhook problems. Well.
D
D
Handling endpoints in our aggregation layer more effectively, I could certainly get behind that, right? That's an area where we've already decided to try to take ownership. It runs in a sub-par way, and if someone proved that they could handle failover there better, that would be, and that gets.
E
C
Yeah, I don't know. I know the Go HTTP/2 client didn't solve this for a single connection, so I think that's bad news for us. Are we going to do a better job than they did, and on multiple connections? I don't know. Yeah, I think.
C
But then, if you don't try to do failover and you make users select the connection that their requests go over, that doesn't seem great either.
D
D
C
Yeah, I think it'd be an interesting experiment to try something that is, I don't know, an alternative to golang's HTTP/2 client and see how good a job can be done. If you get an awesome replacement, then that makes things a lot easier; if it turns out to be really hard, I guess that would be information.
C
Also, we've got nine minutes left. I don't know that we're going to reach a decision on this; I'm not even sure we got to the point where we know what everybody is wanting to see out of a decision.
D
I spoke to him on Slack before this. I think Clayton represented it okay, probably even more fairly than I would have; I would have made a good attempt.
F
Your inherent skepticism, I think, does you credit.
C
What do we want to do? Do we want to bring this up next time for a rehash and see if anybody changes their mind?
G
I think the idea is that you could start it without modifying client-go, either by doing something like what David suggested, where there are places where we already do custom endpoint dialing and connection management, or, I mean, client-go gives you the facility to plug in your own dialer, plug in your own transport.
G
G
G
A client-go-plus-plus that does some of these things, in an experimental location, by using that transport wrapper or dialer, seems like a safe place to experiment. Yeah, I would like to see at least some of these questions answered; some people were saying, obviously no, and other people were saying, obviously yes, so at least reaching an understanding of what is being proposed would be a good next step, combined with some experimental proof-of-concept stuff, maybe.
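A minimal sketch of the "plug in your own dialer" hook being referred to: rest.Config lets a caller supply a Dial function, so an experiment could pick among several API server addresses there. The address list, the naive round-robin policy, and the helper below are illustrative assumptions, not an existing client-go feature.

```go
// Sketch: route a rest.Config's connections across several backends via a
// custom Dial function, as one possible out-of-tree experiment.
package main

import (
	"context"
	"net"
	"sync/atomic"

	"k8s.io/client-go/rest"
)

func withMultiHostDial(cfg *rest.Config, addrs []string) *rest.Config {
	var next uint64
	cfg.Dial = func(ctx context.Context, network, _ string) (net.Conn, error) {
		// Naive round-robin over the configured backends; a real experiment
		// would need health awareness, failover policy, and care that the
		// serving certificates (or TLS ServerName) are valid for every backend.
		addr := addrs[atomic.AddUint64(&next, 1)%uint64(len(addrs))]
		d := &net.Dialer{}
		return d.DialContext(ctx, network, addr)
	}
	return cfg
}
```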
A
Okay, maybe we add it for the next meeting to see, you know, what Daniel said, or if somebody is saying, okay, let's do what Jordan is suggesting: let's do a client-go-plus-plus with some of these proofs of concept and show it.
C
Yeah, okay, let's see. Vivek, I see you're on; is your topic short or is it going to be long? I don't know, this is the first time I've seen it, I think.
J
It depends, but I just want to give an intro on what we're trying to do with etcd and how we can help Kubernetes incorporate what we're trying to do, or see if Kubernetes is really interested in doing something like this. So I'll just brief it; I think it'll be a two-minute introduction. Kubernetes is backed by etcd.
J
Etcd is a key-value store, a distributed version of a key-value store, and the idea is that you have a specific amount of storage that it uses. Currently, the storage isolation between Kubernetes and other clients isn't that great. For example, if Kubernetes fills up all eight gigs of the database that was assigned to etcd, then other clients are locked out.
J
The idea is to provide storage isolation by saying, okay, Kubernetes, you get six gigs of this, and the other two gigs are assigned to other people that use that etcd, essentially making it multi-tenancy friendly. We assign it by key, so you can say Kubernetes gets six gigs, or maybe five thousand keys that it can store in etcd, while the other clients get about two thousand, and then whatever the remaining space is.
J
Something along those lines. Essentially, we can call it quota namespace management, where each key prefix is like a namespace in etcd. We're just trying to see how we can help, because Kubernetes already has something like that within its own ecosystem for quotas and stuff, so we're trying to see if Kubernetes can leverage what we're.
E
C
C
We're going to fix that with priority and fairness, I think, not at the etcd level. The issue is, if we do something like this and split up quota at the etcd level, then you can get weird priority inversions and situations where you need to do a few writes and only some of them succeed.
E
Right, but there are other aspects beyond priority and fairness. Maybe you were thinking of using priority and fairness for this, but the volume of writes that are actually executed against etcd is quite different: someone very reasonably carrying out a patch operation is sending a very small amount of traffic through P&F but has high write amplification.
E
Very large objects are different. So I'm interested in this mostly from the, I don't think defense in depth is.
C
The thing that we're adding soon, hopefully, is going to have an estimate of how many watchers there are for a key and penalize requests based on the number of watchers, right. So.
D
C
E
I think there are some different trade-offs, too, but today you could do a bunch of slow, totally fair operations and exhaust all the etcd space from a single writer, and not all parts of quota solve all parts of that. So.
F
No, I'm kind of going, well, let me just say, ultimately.
E
This is a different problem than what P&F is trying to solve, and the best option would be that it's orthogonal. I don't know that it's as critical; until you close the barn door in P&F there are still some challenges, but the exhaustion of etcd is effectively disk usage, and it is also memory usage, and it is also CPU usage.
E
D
C
Yeah, I don't think we can do this justice in one more minute. I don't mind if people do this, but I don't think it's useful for Kubernetes, because if we've gotten into the situation where the API server admitted something and it's failing some sort of quota check at the etcd layer, that is not going to be good for the API server. So, well.
E
I was going to actually suggest: another dimension here is that we do want to get folks in the community rallied around problems that SIG API Machinery maybe doesn't have to care about but that draw investment, right, like some of the stuff with the minimal API server, which Jason talked about, or KCP, or people who want to use the API server in heavily multi-tenant environments. Maybe there's an angle here where the core ask for API Machinery would be: API
E
Machinery is not interested today, because it's got a lot on its plate, but maybe there are ways we can go after improvements that could be proven out and tested orthogonally to what API Machinery, this group, is doing right now, and broaden the scope of what some of these protections could be. The same way we said we're not going to implement a different storage engine than etcd in SIG API Machinery
E
in the near term, we didn't close the door on people going and experimenting with that, and we could improve the ways people could do that themselves through better factoring of the API server and so forth.
A
I hate to cut the discussion, but we are over time. If everybody is fine, this sounds like something we could continue at the next meeting, and we stop it here. Is that okay? That sounds.