From YouTube: Kubernetes SIG API Machinery 20180620
Description
For more information on this public meeting see this page: https://github.com/kubernetes/community/tree/master/sig-api-machinery
A
B
Yeah, I'm happy to talk about this one. So I think this got added to the agenda because of "who are we trying to serve CRDs consistently?" or something. Anyway, I've been noticing a pattern where there's no good way for API servers to do things sort of at the same time, right? Like if we want to sync up discovery. So the scenario here is: you're in an HA situation, and we have to consider the maximally hard case, where you're doing a rolling upgrade of the API servers.

B
It's a very similar problem to: I've turned on a CRD, and I'm not sure whether all of the API servers have turned it on yet or not. Additionally, suppose I do a rolling update and toggle an API server flag, like I turn a group on or off via runtime-config. And there's also, say, I add an APIService, like an aggregated API server.

B
How do I know when that has been added to the API servers? So there are, I think, two components to the problem. One is that, for some things, users need to know when we're done, when it's active. And to support that, I think the API server itself needs to have some information about the state of the other API servers in the system.

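A minimal sketch of the gap being described, assuming a standard client-go setup (the kubeconfig path and the batch/v2alpha1 group/version below are only illustrative): discovery only tells you what the one API server you happened to reach is serving, so during a rolling upgrade this loop can succeed against one replica while another replica still lacks the group.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/client-go/discovery"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Hypothetical kubeconfig path.
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	dc, err := discovery.NewDiscoveryClientForConfig(cfg)
	if err != nil {
		panic(err)
	}
	for {
		// Errors if this particular API server does not (yet) serve the group/version.
		if _, err := dc.ServerResourcesForGroupVersion("batch/v2alpha1"); err == nil {
			fmt.Println("this replica serves batch/v2alpha1; other replicas may not yet")
			return
		}
		time.Sleep(2 * time.Second)
	}
}
```
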
B
We need some sort of forcing function to make this better. I am okay with additive things that don't go in a wrong direction and also don't change this behavior. I'm not okay with adding things that solve this piecemeal without considering the whole thing, because that makes the problem larger instead.

B
I mean, I guess it sort of depends on what you're going to put in the configz endpoint. Another aspect of this is that we need to look at the flags the API server takes on the command line. I think there are two categories: one is specific things about this API server on this host, and the other is flags that need to be synchronized between all API servers in order to make sense. And really, flipping a flag in the latter category requires a rolling, like a rolling...

C
Update or something of that nature. In that case, though, I don't know. So I see something like the dynamic auditing, the dynamic config for auditing, as analogous to dynamic admission, where you want to have a uniform set of audit rules across your cluster, right? And I see that as distinct from something like wanting to set a particular flag on the kube-apiserver to, I don't know, change its host bind parameter or something, right? So, yes.

B
Yeah, I agree that we should let people continue to add things. I think we shouldn't let them, we should make sure that they don't add things that still have this problem, basically, unless they're offering a solution that can be generalized, because I don't want one-off solutions for each of these things. That seems like a poor state to end up in. Yeah, and David mentioned admission webhooks.

D
I think it just got scoped down to strip out a lot of the multi-level cluster-scoped/namespace-scoped bits, to not try to subdivide policy around what kinds of audit you should be able to set, basically letting the dynamic version have the same ability as the currently provided static version, yeah.

B
Even with something like dynamic auditing, we should be careful with that API. Like, if it's referencing, say, a file on the file system as a destination for dynamic auditing, that should not be in an API that's shared among all API servers, right? Because that file could be on different hosts, right. So I...

D
Think the remaining questions were: is the dynamic config a singleton initially, or does it allow for multiple destinations with different policies? The former is a much easier path from where we are today, but pretty limited. The latter possibly requires big changes to the audit pipeline, which today is optimized around short-circuiting and not doing any more work once the policy says no more detail is needed. So I think those were the last two questions left: singleton versus...

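As a rough illustration of the "multiple destinations with different policies" option, here is a hypothetical sketch of what a per-destination dynamic audit object could look like. The type and field names are purely illustrative, not the API under discussion; the point is only that each sink carries its own policy and a webhook destination rather than a host-local file.

```go
// Hypothetical shape, for illustration only.
type AuditSink struct {
	Name    string
	Policy  AuditPolicy   // per-destination policy
	Webhook WebhookConfig // delivery target; never a file on one host's disk
}

type AuditPolicy struct {
	Level  string   // e.g. "Metadata" or "RequestResponse"
	Stages []string // e.g. "ResponseComplete"
}

type WebhookConfig struct {
	URL      string
	CABundle []byte
}
```

The singleton alternative would collapse this to a single cluster-wide object, which is easier to wire into today's short-circuiting audit pipeline but cannot express different policies for different backends.
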
F
I don't think there's a use for having multiple backends, and maybe different policies for those, so it's tricky. I mean, we're trying... I guess strip it down to just talking about the webhooks. I definitely think there's a use for multiple webhooks, but I do heed the warning that that may be difficult.

B
Yeah, I think it makes sense to have multiple webhooks. Like, I can imagine people making standard consumers of, or servers for, these webhooks for various audit backends, and maybe if you want to roll over from one to the other it makes sense to enable both for a while. So I think that's nice to users. That doesn't mean it's worth the cost, sure. And it's also fine as long as you don't make the API too horrible.

K
Probably having side effects [inaudible], for example, and also changing the storage layer so that it would avoid storing the object in the case of a dry run. The goal is to go as deep as we can in the stack, so that we can return an object that is going to be as close as possible to what it would have been if it had been stored. The goal is to get an idea of what would have happened if it had happened, without having it happen.

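For context, this is roughly how a client issues such a request with the dry-run mechanism that eventually shipped (shown with current client-go signatures; the namespace and object are illustrative): the server runs defaulting, admission and validation and returns the object as it would have been persisted, without writing it.

```go
// Imports assumed: "context",
// corev1 "k8s.io/api/core/v1",
// metav1 "k8s.io/apimachinery/pkg/apis/meta/v1",
// "k8s.io/client-go/kubernetes".
func dryRunCreate(clientset kubernetes.Interface, pod *corev1.Pod) (*corev1.Pod, error) {
	// DryRun: "All" asks the server to process the request fully but not persist it.
	return clientset.CoreV1().Pods(pod.Namespace).Create(
		context.TODO(), pod,
		metav1.CreateOptions{DryRun: []string{metav1.DryRunAll}},
	)
}
```
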
C
Yeah, so it looks fairly straightforward, but there are a couple of things worth noting. We talked in the comments about admission controller config. It seems like a good idea to force an admission webhook, the admission piece, to say "yes, I support dry run", because not supporting dry run and being called is kind of a big deal. And it can default to false now and to true later, in a later version, because we know we're going to have more versions to come before it goes GA. The other was inside...

C
Of our actual storage stack, there are REST interfaces for things like... there's rest update, graceful deleter, creater, things like that. I think it's going to be worth having a new interface there for things that support dry run, so that someone will actually get a compile failure when they try to use it. And so...

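A hypothetical sketch of that compile-time idea; the names here are illustrative, not the actual interfaces in k8s.io/apiserver/pkg/registry/rest. The point is that dry-run-aware code paths would accept only storage implementing the new interface, so wiring in a storage that doesn't support dry run fails at compile time rather than at runtime.

```go
// Imports assumed: "context", "k8s.io/apimachinery/pkg/runtime".
// Illustrative only; not the real apiserver interfaces.
type DryRunnableCreater interface {
	Create(ctx context.Context, obj runtime.Object, dryRun bool) (runtime.Object, error)
}

// A handler that supports dry run would require the new interface:
func installDryRunCreate(storage DryRunnableCreater) {
	// ... install the create handler with dry-run support ...
}

// Passing a storage that only implements a plain Create(ctx, obj) method
// would not satisfy DryRunnableCreater, producing the desired compile failure.
```
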
C
And there was one other... oh, the behavior of something like quota. Oh, Daniel just typed it: yes, different callers want to do different things. So I think it's worth noting that the major... quota... I think we all agree quota should not bump usage, but whether or not quota should reject is less obvious. I want to use dry run to see if they have enough quota; other people want to use dry run to see what the object would look like, and so if quota rejects, it makes it impossible to fulfill that latter use case.

E
Other cases here: there are different types of UIs. A CLI is not the same as a web UI. A UI doesn't necessarily need to validate; a controller that somebody's building an experience around, like a deployment, doesn't need the same thing the deployment controller needs when it needs to validate, or when you're creating a pod directly. So, like, even at the highest level, if we just say "don't put [inaudible] in at the general level", I think that might even just be... people need a hook.

E
I'm not saying bypass RBAC. I'm saying there is a valid use case, which is: I want a system to look at a bunch of objects, and I don't have a namespace. Am I, as a client, asking the question "is this object valid for this server namespace, or valid for my rules"? I'm not saying that we have to solve that in this particular API flow, but like...

C
The information I was... the same... for Clayton's point, I was trying to explain why I might have some things that want to run quota and be enforced, and some things that don't. As opposed to the no-namespace thing, I think that just representing the query parameter as a string, threading the string through, and then having individual admission webhooks look and say "do I support this string, do I know what it means", like, I don't reject... kind of, hey, we probably need...

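A minimal sketch of an admission webhook checking that flag, assuming the dry-run indicator that later shipped on the AdmissionReview request (the DryRun field in admission/v1beta1): a webhook with external side effects can refuse dry-run calls rather than silently performing them.

```go
// Imports assumed:
// admissionv1beta1 "k8s.io/api/admission/v1beta1"
// metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
func admit(req *admissionv1beta1.AdmissionRequest) *admissionv1beta1.AdmissionResponse {
	if req.DryRun != nil && *req.DryRun {
		// This webhook has side effects it cannot suppress, so it declines
		// to participate in dry-run requests.
		return &admissionv1beta1.AdmissionResponse{
			UID:     req.UID,
			Allowed: false,
			Result:  &metav1.Status{Message: "webhook has side effects and does not support dry run"},
		}
	}
	// ... normal admission logic (with side effects) ...
	return &admissionv1beta1.AdmissionResponse{UID: req.UID, Allowed: true}
}
```
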
E
Well, if we just want a tight list of phases, like, logically, in the validation... I've got to go back and look at Andy's original doc, but we actually talked about the different types of validation. Is the object internally consistent? Is the object consistent with the other resources that might be in that namespace? Is the object consistent with limits that are applied to you? I think there's one other distinction. Again, I don't know that we need to implement all of them.

G
So, hi, I'm Jenna. I'm usually in SIG Apps, and I worked on the workloads APIs. So in SIG Apps we've observed some limitations of garbage collection, and that's why I prepared this proposal to expand garbage collection, and I plan to work on this [inaudible]. I don't plan to discuss the details of the design today, because I just published it yesterday. I'd love everyone to take a look and give me some input, and maybe we can discuss it in the next meeting.

L
Yes, I'm here. I just wanted to pick your brains and develop some understanding of... well, my original question was scaling in the garbage collector, but actually Clayton gave an answer that really answered the real question, which is really: I'd like to understand everything that has some need to load all the objects. I mean, that's... so others can sort of understand that too.

L
So did you follow up on Clayton's answer... or let me just pick it up. So, in the API servers, there is... you may help me with the terminology, right... there's this thing called the kube-aggregator, and what I think of as the main API server is built on the kube-aggregator, and then something that provides the in-tree types, and someone with out-of-tree types can create additional APIService objects that register where to find the handlers, yeah.

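For reference, this is roughly what such a registration looks like with the apiregistration.k8s.io types; the group, version and service names below are illustrative. The object tells the kube-aggregator where to route requests for that group/version.

```go
// Imports assumed:
// metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
// apiregistrationv1 "k8s.io/kube-aggregator/pkg/apis/apiregistration/v1"
var exampleAPIService = &apiregistrationv1.APIService{
	ObjectMeta: metav1.ObjectMeta{Name: "v1alpha1.example.com"},
	Spec: apiregistrationv1.APIServiceSpec{
		Group:   "example.com",
		Version: "v1alpha1",
		// The aggregated API server backing this group/version.
		Service: &apiregistrationv1.ServiceReference{
			Namespace: "example-system",
			Name:      "example-apiserver",
		},
		GroupPriorityMinimum: 1000,
		VersionPriority:      15,
	},
}
```
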
C
It's that way, yes. The informers should be shared amongst them all, and...

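A minimal sketch of that shared-informer pattern using client-go (names are illustrative): all controllers in the process pull their listers and informers from one factory, so each resource type is listed, watched and cached once rather than once per controller.

```go
// Imports assumed: "time", "k8s.io/client-go/informers", "k8s.io/client-go/kubernetes".
func startSharedInformers(clientset kubernetes.Interface, stopCh <-chan struct{}) {
	factory := informers.NewSharedInformerFactory(clientset, 10*time.Minute)

	// Each controller asks the same factory for its informers/listers.
	_ = factory.Core().V1().Pods().Lister()
	_ = factory.Apps().V1().ReplicaSets().Lister()

	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
}
```
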
L
To ask about both, and thank Clayton for answering about both; I was just trying to discuss them one at a time. So I think I've gotten most of the answer, then, about the API servers. So each one is going to load every object of every type that it's responsible for, and I think Clayton mentioned that if there are multiple versions in play, the watch cache is going to hold a copy in each of those versions, right? Yeah.

E
I believe it only stores metadata... yes, the graph... but you've got the informers backing it, which store the whole object. We do have a few cases where we have started to have reflector-driven caches that only cache a subset of the object, defined by the cache, like, you know, the resource version, the name and the namespace and a couple of other fields.

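A minimal sketch of that metadata-only caching direction, using the metadata client and informers that later shipped in client-go (the resource chosen is illustrative): the cache then holds PartialObjectMetadata (name, namespace, resourceVersion, labels, ownerReferences and so on) instead of full objects.

```go
// Imports assumed: "time",
// corev1 "k8s.io/api/core/v1",
// "k8s.io/client-go/metadata",
// "k8s.io/client-go/metadata/metadatainformer",
// "k8s.io/client-go/rest".
func startMetadataOnlyInformer(cfg *rest.Config, stopCh <-chan struct{}) error {
	metaClient, err := metadata.NewForConfig(cfg)
	if err != nil {
		return err
	}
	factory := metadatainformer.NewSharedInformerFactory(metaClient, 10*time.Minute)

	// The backing cache stores only object metadata, not full pods.
	factory.ForResource(corev1.SchemeGroupVersion.WithResource("pods"))

	factory.Start(stopCh)
	return nil
}
```
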
L
Okay, so yeah, so let's go on to the garbage collector. So the garbage collector runs in the controller manager, right? Okay, and the garbage collector loads everything except events. So is it loading the internal version, or which version? The external version? Which external version? Yeah. That's actually... yeah.

B
So I have a somewhat related thought, which is: eventually we're going to have to think about sharding our controllers by namespace. I think that is the only way to do it, because eventually we're going to have clusters that are big enough that we don't want to run all the controllers on one system for all resources, yeah.

B
So the only way I could think of that will make sharding work is you have to shard... you can't shard within a namespace; you have to shard by, you have to include, entire namespaces. And the reason I think it has to be that way is because there are a lot of controllers that depend on the visibility guarantees. So like, if you're the deployment controller, you have to look over all the ReplicaSets in the namespace to see if any of them are owned by the deployment you're trying to control. So, because of this visibility

B
concern, I don't think that our system... I don't think it makes sense to shard our system into pieces smaller than a namespace. So I think we need to think about sharding groups that include entire namespaces, groups of namespaces. What we don't have right now from the API server: we support watching all namespaces or an individual namespace, but we don't have an option where you can watch several namespaces. I mean, you could set that up with a series of watches yourself, but we don't.

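A minimal sketch of what that means in practice with client-go (the namespace list is illustrative): informer factories can be scoped to a single namespace, so a controller sharded by namespace would have to run one factory, and therefore one set of list/watch streams, per namespace in its shard; there is no single "watch this set of namespaces" call.

```go
// Imports assumed: "time", "k8s.io/client-go/informers", "k8s.io/client-go/kubernetes".
func startShard(clientset kubernetes.Interface, namespaces []string, stopCh <-chan struct{}) {
	for _, ns := range namespaces {
		// One factory (one set of watches) per namespace in this shard.
		factory := informers.NewSharedInformerFactoryWithOptions(
			clientset, 10*time.Minute, informers.WithNamespace(ns))
		_ = factory.Apps().V1().ReplicaSets().Lister()
		factory.Start(stopCh)
	}
}
```
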
E
So I'm going to push back a little bit, which is: I don't want to do sharding for the next year, because I've got clusters bigger than everybody else's, except maybe Huawei's, and it hasn't been a practical problem yet. Except David keeps making things more inefficient, but that's not David's problem; that's just our architecture.

E
So like, right now the biggest clusters I know of peaked at about 20 gigs for the controllers, which was... it wasn't quite the 5x load, but it was something like 3 to 4x... 3 or 4x overlap due to redundant caches, the garbage collector caching things, and...

E
We've bounced off the limit several times on eight, yeah. It's kind of one of those things where I just... I want to get everybody in the community to kind of buy off before we go do something complicated, because there's a lot of work to do sharding in practice. Do we really expect clusters to get past the five or ten million key boundary?

E
I mean, I know, like, most of the high-cardinality, by-namespace resources are RBAC and quota, anything that you need one of per namespace, because namespace roughly equals tenant, and that's about a million with 20,000 namespaces. I mean, if you want to push that by an order of magnitude, it's within reason.

B
I know what it's going to ask: are those 27 billion keys going to be in one namespace, or are you going to use multiple namespaces? Yeah, they'd all be split across many namespaces. Okay, so I mean you could... if you want to manage watching each namespace individually, you could actually shard the controller manager today. It wouldn't be easy. The thing that you can't do is get the API server to, or tell it, "I want to watch this range, this big set, of namespaces". You can't get that.

L
Right, so I'm thinking about something... you know, I'm talking about making an infrastructure cloud control plane using the Kubernetes API machinery. So an infrastructure cloud will typically have a large number of tenants, all right? So we want to be able to handle, you know, thousands of tenants, so I'd hate to make a shard for each tenant; that would be too small. And I think that's where you're going, right? You want to handle some kind of aggregation there. Yeah... I can tell you, like.

B
Like, there are sort of two ways you can do multi-tenancy, right: you can make things really, really big, or you can make things really, really small. So, right, if it were basically cheap or free to operate an API server, then it wouldn't be so bad to run one per user. But it's not. So you could consider a middle ground, which is to run multiple... multiple clusters... that has the additional, like...

E
We can certainly... a couple would get an order of magnitude improvement in efficiency there, but I think they'd be about the bar without more expensive stuff. I mean, I don't know; I don't disagree that some form of sharding would be useful. I would want to derive that from sharding based on what makes organizational tenancies possible, or like security boundaries, like the difference between a cluster admin and every other user on the cluster, trying to think through those cases there. But there are several proposals on those lines, like, yeah.

E
I mean, I'm just... I'm not trying to be condescending or anything. It's like: we're running really big, and it's not even our main problem. Like, our problem is that none of the controllers deal well at all with any failure modes, and so, lighting up at a million keys, like, these are the big databases, but the problems are not getting to two million or three million or four million keys. The problems are all... most of the stuff we work... we...

E
OK, is not efficient when you have... and while we hit this, like, I kind of feel like the Huawei use cases, and some of ours, like another thing, they're an absolute upper bound, but they kind of feel like, you know, there are probably like six or seven people who care about this scale. I would much rather spend our time on everybody who's at an order of magnitude smaller scale and make their lives easier, yeah.

E
The DaemonSet controller, like... it's some of the edge cases for failure modes: if a particular node flakes out, it impacts the entire DaemonSet controller. If a particular namespace has an insane request and, like, you know, you're getting error backoffs because of admission plugins that don't allow certain things going through the controller machinery... and generalizing

E
the lessons we've learned on controllers would probably benefit anybody who's at a high-cardinality tenancy, and we haven't spent a ton of time, like, you know, making the generic controller, or the work that's going on in Kubebuilder, resilient to that. If a particular namespace is erroring out, maybe we should just take that namespace out of the queue for a while; it's stuff like that. So it's not that we're 100% broken; it's just that, for a lot of what we've seen, the mixed-use clusters tend to fail.

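A minimal sketch of that "back off the failing key" idea, using client-go's workqueue (the controller name and sync function are hypothetical): a key, for example a namespace, that keeps failing is requeued with exponential backoff instead of hot-looping and starving everything else.

```go
// Imports assumed: "k8s.io/client-go/util/workqueue".
var queue = workqueue.NewNamedRateLimitingQueue(
	workqueue.DefaultControllerRateLimiter(), "example-controller")

// processNextItem runs one key through a hypothetical syncHandler.
func processNextItem(syncHandler func(key string) error) bool {
	key, quit := queue.Get()
	if quit {
		return false
	}
	defer queue.Done(key)

	if err := syncHandler(key.(string)); err != nil {
		// Per-key exponential backoff: a namespace that keeps erroring is
		// retried less and less often.
		queue.AddRateLimited(key)
		return true
	}
	queue.Forget(key)
	return true
}
```
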
M
Okay, so can you hear me? Yeah... yes, okay, great. So it's about a recommendation. So I've been talking with Frederick from instrumentation, and right now in the API library there's a struct which, when you register metrics with Prometheus, lets you define an endpoint which, when you send a DELETE request to it, basically resets the metrics, or just part of them. And according to him this is a very bad anti-pattern and we should probably get rid of it. And my question is whether we should get rid of it [inaudible].

E
It exists... it was for SIG Scalability, so you need to talk to SIG Scalability. SIG Scalability had an inconsistent endpoint; they had added this originally so they could reset metrics during runs, and then we moved it to DELETE, which is where it is today. But then, in general, this was a SIG Scalability thing.

E
I mean, this use case was only because of something that SIG Scalability testing was doing in their testing, just to make something easier. I don't really think that this is a requirement for us. For their use case, the reason it was put in, they could probably just, at this point, parse the metrics and do a delta calculation; it's not that difficult.

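A minimal sketch of that alternative (the metrics URL and metric name are illustrative): scrape /metrics before and after a test run, parse the Prometheus text format, and take the difference, instead of relying on a DELETE endpoint that resets metrics in place.

```go
// Imports assumed: "net/http", "github.com/prometheus/common/expfmt".
func scrapeCounter(metricsURL, name string) (float64, error) {
	resp, err := http.Get(metricsURL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	var parser expfmt.TextParser
	families, err := parser.TextToMetricFamilies(resp.Body)
	if err != nil {
		return 0, err
	}

	total := 0.0
	if mf, ok := families[name]; ok {
		for _, m := range mf.GetMetric() {
			total += m.GetCounter().GetValue() // sums across label sets
		}
	}
	return total, nil
}

// Usage sketch:
//   before, _ := scrapeCounter(url, "apiserver_request_count")
//   ... run the scenario ...
//   after, _ := scrapeCounter(url, "apiserver_request_count")
//   delta := after - before
```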