From YouTube: Kubernetes SIG Node 20210713
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
So good morning, everyone. Today is July 13th and this is our weekly SIG Node meeting. Last week we cancelled, so we have two weeks of updates; thanks for everyone's effort. We had the branch cut and everything, and I personally think it was pretty smooth this time: a lot of things got merged. I know we have some items left, and I also see your exception request there.
B
Yeah, I can do that. Do you also want to talk about KEPs first, or bugs first?
C
I think we can discuss the KEPs retrospective maybe next time, once we know the status of the exception requests. Then we can just go through the list that was accepted for the milestone and discuss maybe the reasons why things didn't make it, or made it through. So yeah.
B
Still pending, so yeah, KEPs next time. Then in terms of PRs, in terms of the board, I think everything that had the milestone has been either looked at or merged. Everything else right now: I think on the board there was one remaining thing where I was like, "Oh, Renault, can you please look at this?" I'm pretty sure that one got merged now. So other than, I think, some of the stuff that got an exception request approved, everything else is good for this milestone.

I haven't started looking at some of the burndown stuff, but last I checked it was all pretty close. There is one thing with, I think, the pod lifecycle refactoring: I found a small bug where, in certain circumstances, there was an NPE, so Clayton's looking at that. But I think we're doing pretty well, and for the CI burndown we're definitely doing really well this release. So yeah, not too bad.
B
Now that we've hit code freeze, which is good, maybe everybody can just take kind of a chill July. Get it all done.
C
To see what's going on in terms of PRs, just go through the links and you can check out what was closed and what was merged. It's very interesting to watch what people are working on.
A
And also, surprisingly, this time, I know there are still failures and the e2e test failures, but compared to the last couple of years I think this time is actually pretty smooth. In the past, a lot of the time we were crossing our fingers; so far at least it's not a blocker. A lot of the changes got merged, right: approved changes got merged. So far it's pretty good. That's why I think it was smooth this time.
D
Yeah, hi everyone. So this came up in the SIG Node CI meeting a couple of weeks ago: we've been finding it a little challenging to find reviewers for areas related to the container manager and all the resource managers, as well as the Pod Resources API. So I just wanted to bring this up. Kevin has been extremely helpful in this area, and this became even more evident because he was out on paternity leave, and I think he was working through his annual leave to help us with the reviews.

So I think it's probably not the best place to be, as a community, relying on just one person. My question here was: we're all pretty familiar with the requirements for being a kubelet approver, or sorry, a kubelet reviewer. I was wondering if there is a way that you could maybe propose yourself as a reviewer for a subsystem; that probably might help with this.
A
Maybe
other
people
please
feel
free
to
jimmy,
but
I
want
to
share
one
thing:
this
is
online
with
last
meeting
we
have,
I
think,
allah
brought
up
approval
status
and
behind
using
and
actually
the
direct-
and
I
we
talked
about-
we
think
about
it
today.
It
is
the
signal
approval.
A
It
is
too
coarse,
green,
it's
not
defining
green,
like
exactly
what
you
see
when
we
first
build
the
signal
the
I
want
to
share
with
here,
because
a
lot
of
people
is
new
here.
Actually,
we
are
on
purpose
to
build
the
several
area
of
many
sample
area.
This
is
also
I
share
with
a
new
name
that
new
tk
naughty
in
google,
so
we
built
a
sub
area.
A
If
you
look
at
the
signal,
the
responsibility,
there's
the
basically
have
the
five
area
we
built
in
initial
name,
so
there's
the
like
the
pod
life
cycle
management,
for
example,
for
kubernetes
pleg
and
the
cri
all
those
kind
of
things.
If
you
talk
to
manu
and
and
also
nanta
uiji
in
the
past,
they
are,
they
are
actually
responsible
for
those
kind
of
things,
and
then
we
have
like
the
resource
management
in
the
past.
A
The
resource
management
we
have
the
vishnu
direct
and
myself
and
also
joining
david,
the
dashboard
later,
so
all
those
people
we
on
that
area,
then
there's
the
monitoring,
logging
and
there's
the
many
people
and
there's
the
nodes
that
created
like
a
team
eclair.
So
you
can
see
there
are
also
even
storages.
So
you
can
see
that
today
is
a
storage
chair,
shot.
Actually,
our
injury
is
the
signal
and
then
later
found
the
storage
same
symphony,
the
like
the
team
or
claire
used
to
be
signaled
in
charge
about
the
know.
A
The
secret
event
later
become
the
sig
us
cheer
at
the
tech
lead,
so
they're
same
thing,
kind
of
keep
going
that
way,
but
over
time
it's
kind
of
like
the
way
I
lose
those
boundaries
and
also
a
lot
of
people
move
and
to
the
different
area.
So
that's
kind
of
the
the
today's
problem
so
derek
and
I
unfortunately
on
vacation
we
supposed
to
have
the
meeting
and
discuss
with
the
existing
approver
and
figure
out
how
to.
A
I
also
talked
to
the
clinton
and
about
how
to
make
a
previous
subtle
area
not
clearly
defined
cyber
area.
But
do
we
do
actually?
At
least
that's
can
go
when
I
first
found
it
the
signal
and
we
want
to
build
a
different
area.
So
maybe
it's
different,
because
today
we
have
already
have
this
cigars
right,
so
we
don't
need.
Maybe
we
don't
need
to
know
the
security,
but
we
need
some
people
review
security
perspective
on
the
node
side
like,
for
example,
username
space
ziggler.
A
Also
so
there's
not
many
of
those
things
and
clear
over
time
and
the
people
join
the
different,
safe
and
lose
the,
and
also
some
people
also
even
at
the
the
leave
this
community.
So
that's
why
we
need
to
rebuild
the
community
and
with
a
little
bit
of
a
structure.
So
this
is
a
really
good
question
and
a
good
time
and
yeah.
A
We
understand
that
the
problem,
so
we
really
want
to
have
that
build
the
more
finer
grand
narrative,
the
structure,
reviewer
structure,
a
pool
structure
and
have
like
the
small
server
community,
so
people
feel
they
support
each
other,
and
so
I
don't
have
an
answer,
but
that's
kind
of
I
want
to
share,
hopefully
initially
how
we
did,
and
I
do
see
that
we
lose
the
viewer,
like
the
a
lot
of
areas
become
to
single
point
finger
and
the
soul
responsibility.
A
This
is
kind
of
problem
today,
yeah,
please
feel
free
to
chain
people
and
how
do
they
feel
and
once
direct
has
come
back,
we
will
discuss
and
more
detail
and
to
figure
out
and
feel
free
also
to
chiming
see
how
will
build
the
structure,
how
to
gain
the
knowledge
how
to
get
help
from
the
community
on
different
areas.
Yeah.
C
Specifically for this area, for the CPU manager, I think Rubin from Google will be taking a look at PRs. I will start mentioning him more and more. I mean, it doesn't mean that we'll have a reviewer immediately, but at least we'll have more attention on this.
A
Yeah. Okay, do you want to share the status of the VPA work?
F
It came in quite late at the end; my own schedule slipped a little bit at the start because I was distracted by the COVID situation in India, so it started from there. But I think it's better to take this in 1.23, given the feedback we have. I think a couple of major areas of rework that need to be done are in the API part: the status, how we are reporting it, and where we are converting it. More importantly, when we get the container status back from the CRI, we are converting it over there; initially that seemed convenient, and there were reasons to do that.

But as I was going through a second pass of the code last week, I discovered another area that needs to be addressed. I've taken care of update containers, both Windows and Linux, but on the return path I didn't do that, so there is a really good reason why Elana's feedback makes a lot of sense here.

So I'm going to work on that. Now that we have more time, I'm going to work on it and do it right, but I'm hoping to see if we can get this in early in 1.23, whenever the branch opens for check-in. Is it going to be like early August, or what is it generally?

I was just wondering what target time frame we should shoot for. I'm thinking early to mid August is a good time frame to get this in; the branch should be open by then, and it's already pretty far along. I think Elana is engaged and I don't want to lose momentum, but I think it's important to do it right. Elana also mentioned that not all the reviewers are on there, and I wanted to identify and loop them in, if there's anybody missing.

As far as I know, Tim has approved it and he likes it, and there are a few comments that might come from Abdullah or SIG Scheduling; they were looking at it. It's a very small change there for them, but I'll ask them to take a look from the broader perspective, to get input from there.
B
Like a final sign-off on the whole thing. So there were three things that I flagged, that I was concerned about when I did the review. The first was that nobody who works on pkg/kubelet/cm had reviewed the changes there, and they were pretty substantial.

The second concern was that there were changes made to the container status lifecycle that caused things that were supposed to be purely transformative functions to be mutating data, so they were no longer thread-safe. So that was a problem. And the third thing was that there were no end-to-end tests, and those need to be in place for alpha. They don't have to be exhaustive, but you have to have some end-to-end tests.

So given all of those things, unfortunately, the work-in-progress label only got pulled off of that PR like a week before code freeze. I think we could have resolved all of those things for this release if it had been ready for review like a month before the deadline, but because that wasn't the case, people didn't get a chance to look at it early enough, so we couldn't catch that stuff early enough, and that's why it slipped.
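(A toy Go sketch of the thread-safety concern above, with entirely made-up types, not kubelet code: a purely transformative status conversion is safe to call concurrently, while a variant that mutates its shared, cached input is not.)

```go
package status

import "sync"

// ContainerStatus is a made-up stand-in for a cached status record.
type ContainerStatus struct {
	Name      string
	Resources map[string]string
}

// toAPIStatus is purely transformative: it builds a fresh value and never
// writes to its input, so concurrent callers sharing a cache are safe.
func toAPIStatus(in *ContainerStatus) ContainerStatus {
	out := ContainerStatus{Name: in.Name, Resources: map[string]string{}}
	for k, v := range in.Resources {
		out.Resources[k] = v
	}
	return out
}

// badToAPIStatus mutates the shared input in place; two goroutines
// converting the same cached entry now race on the map write.
func badToAPIStatus(in *ContainerStatus) *ContainerStatus {
	in.Resources["cpu"] = "adjusted"
	return in
}

// Example: concurrent use of the pure version is fine.
func Example() {
	shared := &ContainerStatus{Name: "c1", Resources: map[string]string{"cpu": "500m"}}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() { defer wg.Done(); _ = toAPIStatus(shared) }()
	}
	wg.Wait()
}
```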
F
Yeah, that's okay, and I think I've asked Chen Wang to work with you on the tests, the e2e tests, how to get them running; she's going to reach out to you and we'll take care of that. And regarding the container status conversion, yeah, we had TODOs on that; Elana flagged them, and I figured, okay, since the feature is going to be disabled, we can work through it during beta. But now that we have time, we'll just get it in; I want to.

When we wanted to change this so close to the code freeze, I was getting uncomfortable with it, but this is the right decision: put it in early in the next release so that we get a lot of bake time. And there are quite a few people who are looking to try this out, so we'll have no shortage of hands exposing issues, potential issues, finding them early even before it becomes official alpha.
B
Yeah, it just has to be fully baked. I think the only issue was that the reviews got so compressed that for any changes that needed to happen there, there just wasn't time to make them. So yeah.
F
Well, I think we have a few weeks now, and I'm going to try, I have some other things to attend to as well, but I'll try to keep some momentum on this and address the key concerns that we have. For the tests, I think Chen Wang from Red Hat will reach out to you; she's working on it.
B
Changes like that, I'm not the only one who can review those; we have an entire group, and in fact I can't even approve those, I'm just a reviewer. So yeah, if you're more curious about that sort of topic, adding it to our CI subgroup agenda is a thing you could do as well.
F
Yeah, I'm going to try and get involved as much as possible. There's a lot of work that I've left to finish this out, so I want to catch up with that as well and come back to this. Okay.
E
Yeah, okay. In terms of the implementation, just because we have some more time, I want to make sure that we fix two issues. One is the status part: today the problem is that after the container is updated, there's no signal sent back to the kubelet, so the PLEG doesn't know that it needs to refresh the cache. Since now we have time, we may want to spend some time thinking about how to solve that problem.

The other one was marked as a TODO because the time frame was too tight, but not now, because we have time; maybe we also want to address it. It's how to handle the resize retry: today it's done by each pod worker, and we'd need to introduce a big lock within the kubelet to handle it. Now that we have time, maybe we can look into letting the sync loop handle the retry instead of each pod worker.

So because now we have time, maybe we have some more time to discuss how to properly implement those two things.
F
Yeah, let's do that. And I don't know if you want to reconsider the idea of retrying when pods depart the node. That, I feel, is a smart way to do things, because if you retry when everything is stable, then the retry answer is going to come back the same; nothing has changed, so why are you asking again? But if a pod leaves, a pod that consumes resources, then you have a better shot at the retry saying yes, you're good.
E
Yeah, but I think in the kubelet, usually what we do is use the periodic cleanup to make sure, I mean, to reconcile all this data, to make sure it's in the right state. Because if you make it only event-triggered, maybe the event never comes and we don't get the chance to do the thing. So usually we have event-triggered updates, but we also have the background periodic cleanup, or reconcile sync, to sync the state.

So to me, I feel like the resize retry, if we cannot fit it in at the first attempt, maybe we can do it that way, but yeah. I think we can discuss more whether it's event-triggered or the periodic thing. But to me it's probably better to do the periodic sync; it seems more reliable to me.
F
Oh
yeah
periodical
sync
will
work
event
triggered.
Is
it's
not
a
r,
it's
more
like
a
hand,
so
we
can
have
even
triggered
at
a
later
point.
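(A minimal Go sketch of that "AND, not OR" idea, not kubelet code: event-triggered reconciles for responsiveness, plus a periodic tick as the reliability backstop in case an event never arrives. The channel and interval are illustrative.)

```go
package main

import (
	"fmt"
	"time"
)

// syncLoop reconciles on each incoming event and, independently, on a
// periodic tick, so a lost or never-sent event cannot wedge the state.
func syncLoop(events <-chan string, reconcile func(reason string)) {
	ticker := time.NewTicker(10 * time.Second) // periodic backstop
	defer ticker.Stop()
	for {
		select {
		case ev, ok := <-events:
			if !ok {
				return
			}
			reconcile("event: " + ev) // fast path when a signal arrives
		case <-ticker.C:
			reconcile("periodic") // catches anything the events missed
		}
	}
}

func main() {
	events := make(chan string, 1)
	events <- "pod-deleted"
	close(events)
	syncLoop(events, func(reason string) { fmt.Println("reconcile:", reason) })
}
```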
F
If we see that when pods leave, some other pod comes in and takes what was supposed to be the resource: that shouldn't happen now, because we're using the max. In the previous iteration of the design we were not doing that, so once a pod cleared out, a new pod could be admitted while a pod that's pending resize hasn't gotten what it needs, just because it's waiting on the period and the timing isn't right. But now we are.
F
We
are
using
scheduler
max
of
the
two,
and
that
has
its
downsides,
but
let's
go
with
what
it
has,
what
it
does
and
see
if
that
works.
Another
change
that
I
think
that
will
help
us
is,
and
this
is
yeah
I
want
to
mention
this
so
tim,
I
think,
has
been
working
with
the
api
machinery
guys
to
see
if
we
can
get
the
status
as
something
we
can
rely
on
as
a
persistent
guarantee
from
the
store,
and
it's
that
cap
is
all
in,
but
name
approved.
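(A rough illustration, not the actual scheduler code, of what "max of the two" buys: account for the larger of the desired spec and currently allocated resources while a resize is pending.)

```go
package resize

// maxMilli returns the larger of the desired (spec) and allocated CPU,
// in millicores; accounting with this value keeps a pending-resize pod's
// capacity from being handed to a newly admitted pod.
func maxMilli(specMilli, allocatedMilli int64) int64 {
	if specMilli > allocatedMilli {
		return specMilli
	}
	return allocatedMilli
}

// Example: a pod shrinking from 1000m to 500m is still accounted at
// 1000m until the resize is actually actuated.
var accounted = maxMilli(500, 1000) // == 1000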
F
So
one
thing
that
we
are
doing
in
kubelet
is:
we
do
this
unwieldy,
get
part
qos
computation
every
time
we
want
to
see
which
class
it
belongs
to?
This
is
a
tangential
change.
Independent
of
this
feature.
What
we
should
do
is
look
at
replacing
that
with
or
that
function
essentially
becomes
once
the
part
is
bootstrapped.
The
qos
class
is
on
the
port
status.
F
Just
rely
on
that,
alternatively,
is
to
use
our
pod
structure
c
group
structure
to
get
that
information,
but
I
think
we
can
rely
on
the
particulars
class
and
the
power
status.
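(A sketch of that replacement in Go, assuming the v1 API's status field and the in-tree helper at k8s.io/kubernetes/pkg/apis/core/v1/helper/qos; the fallback covers pods whose status isn't populated yet.)

```go
package qosclass

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/kubernetes/pkg/apis/core/v1/helper/qos"
)

// podQOSClass prefers the QoS class persisted on the pod status and falls
// back to the existing GetPodQOS computation when status isn't set yet.
func podQOSClass(pod *v1.Pod) v1.PodQOSClass {
	if pod.Status.QOSClass != "" {
		return pod.Status.QOSClass
	}
	return qos.GetPodQOS(pod)
}
```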
F
This
can
be
a
separate
like
a.
I
don't
think
it's
a
it
deserves
to
be.
A
cap
can
be
a
bug
fix.
We
can
drive
that
just
to
just
to
get
everyone's
attention
on
this
one.
That's
a
potential
thing
we
could
do.
Cubelet
has
a
change,
has
a
few
places
where
this
change
is
important,
and
then
there
is
one
change
where
one
change
in,
I
think
somewhere
in
the
quota
handling
resource
quota
that
relies
on
it,
but
yeah.
A
So, really sorry, I got lost when you mentioned another KEP, there was some distraction, but please put that link in; if you already have it, include the link. Also, I noticed you mentioned that SIG Scheduling will review; please also come back to update the review status if there are any decisions made. And one more thing: you mentioned something about missing reviewers?
B
It wasn't so much not sufficient reviewers as that there were certain reviewers that needed to be engaged that were not. For example, there was nobody from pkg/kubelet/cm who had taken a look at those changes, and we probably want the folks maintaining that code to look at those changes, like we talked about earlier in the meeting with folks who want to become reviewers or approvers of just that subcomponent.
A
Yes, yes. But before we get to that one: actually, at least Derek and I are on that; we are both pkg/kubelet/cm reviewers and approvers. That's why we asked originally; we were missing the API reviewer, and this is why I asked for Tim Hockin's help. We were also missing some of the resource management reviewers, and this is where David Ashpole jumped in to help even more, but right now he's so busy.

So that's why he cannot help more, but at least that part is being reviewed, and Derek and I actually reviewed the design. The reason this cannot merge right now is just that an API change came from Tim Hockin; in the past we agreed on the API, and the API approvers approved the previous API, but the new changes actually caused a lot of implementation churn, including CRI-level changes.

This is why I asked for Lantao's help, especially related to the PLEG and the pod lifecycle, because Lantao has the most experience there. So I thought we covered most things, and Mrunal also, I believe, took a look in the past. That's why I was kind of wondering which reviewer we were missing.
B
Some folks were looking at that code specifically; I saw, for example, comments, and I also commented on some of the pod lifecycle stuff. But there was effectively nothing saying "yeah, looks good" or "no, these changes need to be made," so I wasn't sure if it had gotten looked at, because the PR only got to reviewable like a week before the deadline. That's not a lot of time for turnaround. So that was it.
A
Yes,
yes
thanks
for
this
one,
so
I
totally
agree
because
I
don't
think
of
the
pi's
ready
and
then
this
is
why
I
comment
before
this.
Today's
meeting,
because
I
know
the
relay
is
asked-
is
go
on
google
for
1922
and
then
I
think
about
is
not
to
go,
because
it's
not
ready
like
that.
I
commented
before
this
meeting
and
but
I
want
to
make
sure
we
have
enough
of
the
reviewer
includes
in
that
one
and
make
sure
we
next
finished.
A
We
got
enough
attention
and
because
this
is
the
big
change
the
I
worry
about
this
is
this.
Stableness
are
our
system,
but
at
the
same
time
this
is
important
feature
and
so
how
to
balance
those
kind
of
things
and
also
the
clayton
recently
have
the
pleg
big
enhancement
and
I
think
renee
you
need
to
rebase
and
you
need
the
bigger
rebase
there.
I
Cool, can you see my screen now? Cool. So we discussed this before: I had a KEP earlier to add out-of-tree pluggable support to the pod admit handler, but we decided not to go down that path because API Machinery came up with something to add on the control plane side where a user cannot delete a webhook, and that kind of mitigated the risk that we called out.

But we recently learned that SIG API Machinery has dropped that plan of adding a manifest-based webhook, due to some concerns they had with respect to Kubernetes admin permissions being mishandled because of that manifest-based webhook. So they dropped that plan, and they are working on a different approach to see how to mitigate that.

And so that's why I wanted to reopen this KEP on adding this out-of-tree plugin for the pod admit handler, and bring it up here to see what you think, and then decide whether we want to go down this path or not. I have three quick slides just to cover our use case, because we didn't go deeper in the last meeting. So I wanted to go over this and then get some feedback from you all.

We run the nodes on our side, so our customers don't actually see the worker nodes, and for these Fargate worker nodes there are only a few types of workloads that we support. There are also a few components that we run on the node that we don't want customers to get access to. So we need to make sure that we don't run pods that do not work on the Fargate worker nodes today.
I
What
we
do
is
like
we
have
a
validation
web
book
and
that
validation
web
book
validates
all
the
fields
and
if,
if
some
of
the
fields
are
not
supported
for
our
worker,
node
type
like
it
just
rejects
the
request.
But
there
are
also
cases
where,
like
customer,
they
delete
these
validation
web
book,
but
still
have
their
own
pods
to
be
like
running
on.
I
Our
forget
worker
nodes,
so
in
order
to
like
prevent
this
from
the
from
customer
directly
like
attaching
their
pods
to
run
on
our
forget
worker
nodes,
we
evaluated
bunch
of
options
that
are
already
available.
So
one
of
the
option
we
evaluated
is
adding
tanks
to
the
nodes,
but
customer
can
tolerate
the
pod
with
any
tanks
and
still
have
their
parts
to
be
scheduled
on
this
forget
worker
nodes,
another
one
we
tried
was
to
have
a
psps
but
same
like
validation
of
a
book.
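(As an aside on why the taint approach fails as enforcement: a toleration with operator Exists and an empty key matches every taint, so any tenant can opt their pods back onto the tainted nodes. A minimal sketch:)

```go
package admission

import v1 "k8s.io/api/core/v1"

// tolerateEverything matches all taints: an empty key with operator
// Exists tolerates any taint, which is why taints alone cannot keep
// tenant pods off managed nodes.
var tolerateEverything = v1.Toleration{Operator: v1.TolerationOpExists}
```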
I
Cluster
admins
can
go
and
delete
those
psps
or
validation
of
web
book
and
then
still
have
their
pods
scheduled
onto
this
forget
worker
node.
So
when
we
were
evaluating
other
options
that
can
best
fit
our
use
case,
what
we
found
is
the
pod
admit,
handlers
that
we
have
in
cubelet,
so
just
to
set
some
context
for
like
people
who
might
not
be
aware
of
the
spot,
admit
handlers.
So
in
cubelet
there
are
like
different
set
of
handlers
and
one
is
called
as
a
soft
admit
handler.
Another
is
called
a
hard
admit
handler.
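(For context, the in-tree handler shape is roughly the following, paraphrased from the kubelet's pkg/kubelet/lifecycle package, together with a hypothetical node-local handler of the kind being proposed; the label and rejection logic are illustrative, not part of any real API.)

```go
package lifecycle

import v1 "k8s.io/api/core/v1"

// PodAdmitAttributes carries the pod under consideration and the pods
// already admitted to the node.
type PodAdmitAttributes struct {
	Pod       *v1.Pod
	OtherPods []*v1.Pod
}

// PodAdmitResult reports whether the pod may run on this node.
type PodAdmitResult struct {
	Admit   bool
	Reason  string
	Message string
}

// PodAdmitHandler is consulted by the kubelet before running a pod.
type PodAdmitHandler interface {
	Admit(attrs *PodAdmitAttributes) PodAdmitResult
}

// managedNodeAdmitHandler is a hypothetical handler that rejects pods
// missing a platform label, regardless of which webhooks survived on
// the control plane.
type managedNodeAdmitHandler struct{}

func (managedNodeAdmitHandler) Admit(attrs *PodAdmitAttributes) PodAdmitResult {
	if attrs.Pod.Labels["example.com/platform-approved"] != "true" {
		return PodAdmitResult{Admit: false, Reason: "UnsupportedPod",
			Message: "pod is not permitted on this managed node"}
	}
	return PodAdmitResult{Admit: true}
}
```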
I
It
might
be
better
for
us
to
add
a
another
another
handler
over
here,
which
kind
of
like
validate
whatever
we
wanted
to
validate,
therefore,
like,
even
if
customer
deletes
the
validation
web
book
or
even
if
they
have
a
pod
which
can
tolerate
any
taint.
But
when
the
cubelet
gets
the
part,
it
will
call
this
soft
admit
handler,
and
then
it
will
call
our
admit
handler,
and
this
will
say
whether
to
actually
run
the
pod
or
not
and
yeah.
Obviously,
adding
the
code
specific
to
aws
worker
node
here
doesn't
make
sense.
I
They
check
for
the
docker
version
or
the
kernel
versions
to
see
if
the
particular
docker
has
these
functionalities,
for
example,
there's
no
new
privileges
or
the
process
mount
and
then
and
then
admits
the
part
and
then
decides
to
whether
to
admit
the
pod
or
not,
and
we
thought
like
using
external
process
might
be
helpful
for
other
use
cases
as
well
and
that's
why
we
wanted
to
propose
this
idea
and
see
what
everybody
thinks.
So
we
have
like
few
suggestions
on
the
way
that
we
can
start
admit
handler
on
the
way
we
designed
it
inside.
I
So
we
had
like
three
different
plugin
types.
One
is
the
shell
based
one,
which
is
similar
to
the
cni
networking
model
that
cubelet
interacts
with
another
one.
Is
the
grpc
or
a
unique
socket
unit
socket,
is
similar
to
how
the
csi
or
the
ebs
drivers
or
the
csi
drivers
are
implemented,
and
we
can
also
have
a
grpc
based
model
as
well
so
before
we
like
go
into
like
how
we
wanted
this
configurations.
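(A minimal sketch in Go of the gRPC-over-Unix-socket flavor, modeled on how device plugins talk to the kubelet; the socket path and any Admit RPC service are hypothetical, not an existing kubelet API.)

```go
package admitplugin

import (
	"context"
	"net"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dialAdmitPlugin connects to a (hypothetical) admit plugin listening on
// a Unix socket. The kubelet side would hold this connection and call a
// hypothetical Admit RPC with the pod spec before running each pod.
func dialAdmitPlugin(socketPath string) (*grpc.ClientConn, error) {
	dialer := func(ctx context.Context, addr string) (net.Conn, error) {
		return (&net.Dialer{}).DialContext(ctx, "unix", addr)
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	return grpc.DialContext(ctx, socketPath,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithContextDialer(dialer))
}
```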
F
Sorry,
I
just
want
a
quick
question
on
that.
Did
a
mutating
web
hook,
not
work
to
you
know,
fix
up
the
part
before
it
even
gets
to
the
scheduler.
I
Yeah,
so
the
thing
is
like
in
aws
like
customers,
whoever
creates
cluster,
they
get
the
admin
access,
so
they
can
go,
delete
those
mutating
web
book
configuration
or
validating
web
book
configurations.
So
that's
why,
like
say,
api
missionary.
What
they
came
up
with
was
with
to
add
some
sort
of
like
a
like
a
static,
manifest
based
web
book,
which
cannot
be
deleted
by
the
cluster
admins,
but
that
kind
of
goes
like
in
the
way
of
cluster
admin
permission.
So
that's
why
they
didn't.
They
didn't
go
down
that
model.
E
One
question
I
have
two
questions:
one
is
that
there
is
many
to
prevent
customers
from
scheduling
parts
to
those
nodes
by
mistake.
It's
money
for
mistake.
Oh
this
is,
is
this
is
supposed
to
be
some
strong
enforcement,
and
the
second
question
is
that
even
I
mean
cluster
I
mean,
can
can
walk
around
the
new
emission
web
hook,
but
they
can
probably
also-
I
don't
know
whether
they
can
access
the
fog
in
node,
but
if
they
can
access
into
it,
they
can
also
remove
the
handler
right.
So
what's
the
difference?
Yeah,
that's.
I
Correct
so
yeah,
so
we
will
go
over
the
first
question,
so
the
first
question
was
whether
this
whether
to
stop
the
part
to
be
landing
on
the
far
gate,
node
is
the
only
concern,
but
that
is
one
of
the
concern
other
concerns.
Since
we
manage
these
nodes,
we
monitor
we
do
the
recovery
process
on
this
node.
So
we
do
have
like
few
of
our
process
that
are
running
and
we
don't
want
customer
to
get
access
to
those
or
see
the
logs
for
those
as
well.
G
I had a question. So what happens after the webhook isn't there, or whatever, and a pod gets to the node and it's rejected? It shouldn't be there; it should have been on this other node pool, let's say. Now what? Is it just sitting there, and it's up to the user to say, "oh, the scheduler didn't do it right and I ended up over here"? Or were they supposed to have a node selector? I just worry.

I think this is kind of interesting: you could imagine having a couple of different node pools with more severe sensitivity concerns in one of the pools, and you don't want those workloads coming over there; this could be used as a second layer of isolation between different node pools running in the cluster.

But my concern is: what's the feedback path for the scheduler to see this and then go ahead and maybe do the right thing that it should have done in the first place? If it passed the scheduler the first time, how do we handle it now?
I
Right
yeah,
so
that's
a
good
point.
So
what
we
kind
of
do
over
here
is
like
in
this
model
like
we
have
two
different
schedulers
on
all
our
clusters
or,
if
you
create
and
cluster
in
aws
like
you,
will
see
two
schedulers
running
on
your
cluster,
so
one
is
called
the
far
gate
schedule.
Another
one
is
the
default
scheduler
and
for
gate
scheduler.
They
act
only
on
the
pods,
which
has
the
scheduler
name
set
to
fargate,
and
the
default
scheduler,
of
course,
doesn't
act
on
them
because
the
scheduler
name
is
set
to
forget.
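(A sketch of that split; "fargate-scheduler" below is a placeholder, not a documented value. A pod carrying a non-default spec.schedulerName is skipped by the default scheduler and handled only by the scheduler that watches for that name.)

```go
package podspecs

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// fargatePod is handled only by the scheduler registered under the
// matching schedulerName; the default scheduler ignores it.
var fargatePod = &v1.Pod{
	ObjectMeta: metav1.ObjectMeta{Name: "web"},
	Spec: v1.PodSpec{
		SchedulerName: "fargate-scheduler", // placeholder name
		Containers:    []v1.Container{{Name: "app", Image: "nginx"}},
	},
}
```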
J
So I think this is going to be very helpful. I think I've run into some very similar use cases for this as well, and I think it's probably in a previous slide where you mentioned the third case, which is node-managed resources. Yeah, node-local validations, yeah.

It could be something that basically the scheduler or the admission hooks cannot do. In our use cases, we may have cases where it depends on how each core is filled, how much of a NUMA node is filled, and these pods could be coming onto these nodes based on certain annotations you've added to them; all of these things are just totally transparent to the control plane. If we can customize it, have it be pluggable at the node level, we can choose to admit or reject, and have the scheduler watching over these rejections and act.
A
One thing that has been criticized the most: today's scheduler doesn't do resource-usage-aware scheduling, right? Even with the VPA work earlier, we were discussing many things like that. There is a lot of information that is not represented at the scheduler level, the cluster level, so the scheduler sometimes doesn't make good placements.

But the use case being discussed here is actually quite different from the standard Kubernetes use case, I have to say. They basically want to use the node-local pluggable pod admission controller to, how should I say, reject these kinds of pods. Their scheduler is more like a receiver of the customer's request.

It sounds to me more like a virtual node, a fake node, using this to do another level of admission, maybe at the Fargate virtual-node level. So it's different. But here is the problem I want to share.

This was SIG Node's biggest concern with this concept in the past. Once we enable this kind of thing, as was just asked, you reject a pod, and that can cause a ping-pong situation, because the majority of our use cases are not the Fargate use case, right? The majority use case for Kubernetes today is a proper cluster-level scheduler, pluggable for a given cluster.

There are a bunch of resources there, and they are connected; it's not all virtual. There are real compute resources, storage resources, and all those kinds of device resources, and at the cluster level the scheduler understands those resources and makes the real scheduling decision. The node-level admission would reject a pod because the scheduler may not know the real node state, but it should be real.

We have tried to avoid the ping-pong situation, where we reject something, then the scheduler schedules it back, and it gets rejected again, and the pod at the node level is always stuck in the Pending state. I understand that in the Fargate situation there won't be such a situation, because your Fargate scheduler is just a receiver of a job from a random customer, I guess, and based on what you just mentioned, the scheduler basically just admits it and tries, and you want the pod admission controller to decide.
I
Yeah, so, Dawn, a quick clarification on that: the Fargate worker node is actually a worker node, but the problem that we are having is when the customer bypasses the scheduler and directly assigns a pod to the node itself. If they have their pod spec with the nodeName set directly, then it goes to the node without going through the scheduler. That was the problem that we had.
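(The bypass being described, as a sketch: a pod created with spec.nodeName pre-set never visits any scheduler, so neither schedulerName nor scheduling plugins apply; the kubelet on that node picks it up directly, making a node-side admit check the last gate. The node name below is a placeholder.)

```go
package podspecs

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// directPod skips scheduling entirely: with NodeName already set, the
// pod is bound to that node and only node-side checks remain.
var directPod = &v1.Pod{
	ObjectMeta: metav1.ObjectMeta{Name: "direct"},
	Spec: v1.PodSpec{
		NodeName:   "fargate-node-1", // placeholder node name
		Containers: []v1.Container{{Name: "app", Image: "nginx"}},
	},
}
```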
K
Okay, I had a quick question. I'm just wondering what happened, because I know there was a KEP for static webhooks, basically for the API server, where you can register a static webhook so that it's registered when the kube-apiserver starts up. That solves the issue of customers, or other folks, being able to delete that webhook. It seems like that would address this issue as well and would follow the existing webhook pattern. So I'm just wondering what the downside of that approach is; I believe there's a KEP open for that currently.
I
Yeah,
so
that's
true
so
and
because
of
that
cap
is
what,
like
we
closed,
the
earlier
kept
for
that
we
opened
as
well
in
the
sig
node
and
but
then.
What
I
heard
from
daniel
like
recently
is
that
the
sig
api
missionary
decided
not
to
go
down
that
path
of
adding
a
static
web
book,
because
that
kind
of
like
prevents
cluster
admin
from
with
few
few
pro
that
introduces
few
problems
for
cluster
admin
permissions.
I
So
that's
why
I
guess
sick
api
missionary
decided
not
to
go
down
that
path,
but
I
didn't
dig
deeper
into
why
they
didn't
go
down,
but
given
that
that
particular
cap
is
no
longer
valid,
that's
why,
like
we
wanted
to
like
discuss
about
the
spot,
admit
handler
again
and
that
kept
whatever
you
said
has
opened
like
he
was
planning
to
close
it.
But
I
don't
think
he
followed
up
on
that,
but
I
can
check
with
daniel
on
that
danielle
or
vivek.
Whoever
is
the
author
for
that
pr.
I
So
like
as
part
of
this
kept
like
in
the
under
the
motivation
section,
we
we,
like,
I
called
out
bunch
of
other
things
that
we
evaluated
as
well.
So
what
I
would
like
is
maybe
like
take
a
look
at
those
as
well
and
then
maybe
in
the
next
sig
note
meeting
we
can
see
if
there
are
any
other
feasible
approach
or
if
we
wanted
to
go
down
this
path.
And
then
we
can
discuss
about
the
in
interaction
between
cubelet
and
then
this
binaries
and
other
stuff.
Later.
F
Just
a
wild
idea:
is
it
hard
to
do
like
a
custom
list
watcher
that
the
kubelet
listens
to
since
you're
configuring
the
node
pretty
much
pass
through
for
all
purposes,
except
the
filtering
action
that
it
needs
to
take.
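(For reference, the kubelet already builds its API server pod source from a ListWatch filtered on spec.nodeName, roughly as sketched below with client-go; the "wild idea" is a variant of this source whose filter also applies node-local policy.)

```go
package podsource

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// podListWatch lists and watches only the pods bound to nodeName, the
// same field selector the kubelet uses for its own pod source.
func podListWatch(client kubernetes.Interface, nodeName string) *cache.ListWatch {
	selector := fields.OneTermEqualSelector("spec.nodeName", nodeName)
	return cache.NewListWatchFromClient(
		client.CoreV1().RESTClient(), "pods", metav1.NamespaceAll, selector)
}
```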
I
And-
and
you
are
saying
like
use
this
custom
list
watcher
and
then
reject
the
part.
H
It's
it's
potentially
useful
for
anyone
who
runs
kublet
on
a
multi-tenant
environment,
so
say,
for
example,
if
you're
running
kubernetes
nodes
inside
you
know
vms
on
shared
infrastructure,
where
you
can
also
mutate
nodes
outside
of
the
kubernetes
control
plane.
I
Yeah,
if
you
have
some
use
case
or
if
you
know
about
something
and
if
you
can
comment
on
the
kepler,
I
can
follow
with
you
later
as
well.
And
then
we
can
include
that
as
well
as
part
of
this
skip.
I
Does
anyone
have
any
other
questions,
or
else
like
we
can
so
I
was
thinking
we
can
discuss
this
week
so
give
a
week
time
or
two
weeks
time
for
everybody
to
go
over
this
and
and
feel
free
to
add
your
suggestions
or
feedback.
I
can
follow
with
you
offline
and
then
we
can
bring
it
up
in
the
next
week
meeting.
B
Sorry, I'm just trying to find: is there a link or a KEP or something like that to look at? I didn't see anything in the chat.

Okay, if it's in the meeting minutes, then people should be able to find it.
A
So yeah, I think that's a good plan. So let's at least, please, reach out with your use cases. I think your use cases we know, and we even have our own potential use cases, but the particular concern is also real for us. That's why, yeah, please reach out with use cases, so we can take them into consideration and figure out how to make this generically useful first.
J
Yeah,
so
it
seems
like
that's
well
aware,
but
I
just
wanted
to
maybe
add
to
that
which
is
you
know
when
I
tried
to
do
this
in
the
past
I
looked
at
the
topology
manager,
which
can
do
an
admission
and
the
cpu
manager,
which
also
does
animation,
and
I
was
like.
Oh
in
our
case,
we
have
something
that's
sort
of
like
partially
managed
out
of
the
core.
So
what
if
I
can
have
a
web
hook
to
it,
and
I
can't
based
on
my
logic,
design
that'll
just
be
super
awesome.
A
Okay,
thanks
and
also,
I
think,
there's
someone
earlier
sorry.
I
forgot
I
I
didn't
remember
your
name,
and
you
mentioned
multiple
attendance
use
cases
please
also
and
make
make
them
more
clear
and
list
into
the
cap
appear
there
yeah.