From YouTube: Kubernetes SIG Node 20220628
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
A: All right, well, welcome everyone to the June 28th SIG Node meeting.
B: Yes, hi. I was away for the last couple of weeks — there was a conference last week in Austin — and I'm now back for one week. I fixed the issue that you had found in the unit test, and also the spurious CRI field that Mike identified. That was something we put into the KEP and the initial version of the code before 1.20, where I believe the Windows containers came in, and I missed looking at that one while I was porting things over.
B: So we can remove that safely without any issues — I've already done that. What remains, in my opinion, is the question you had about the CRI interaction on the resize status and how it's generated. I went back and verified with the test, with dockershim, the transitions, and what I need to add so that we can close the loop until we have the containerd change in — but probably a partial test is possible.
B: I'll look into that. The other comment, which I think we should nail down, is about the expected behavior of the runtime; I think there is still some fog around what the runtime is expected to do. I'm trying to understand myself whether what we get today when we look at the cgroup stats is the cached value or the enforced value, because our test currently looks at the cgroup directly — it's invasive, and it shouldn't be doing that.
B: But in the absence of the container status implementation, we are verifying that the update has been enforced by looking at the values configured on the cgroup. So—
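
For illustration, here is a minimal sketch of the kind of invasive check being described — reading the limit configured on the container's cgroup straight off the filesystem. The path and helper name are hypothetical, not the actual e2e test code, and a real test would have to resolve the container's cgroup directory from the kubelet's cgroup driver configuration:

```go
// Sketch: verify that a memory-limit resize was enforced by reading the
// value configured on the container's cgroup. The path below is a
// placeholder; a real test must resolve the pod/container cgroup
// directory from the kubelet's cgroup driver configuration.
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readCgroupMemoryLimit handles both cgroup v2 (memory.max) and
// cgroup v1 (memory.limit_in_bytes). It returns -1 for "no limit".
func readCgroupMemoryLimit(cgroupDir string) (int64, error) {
	for _, name := range []string{"memory.max", "memory.limit_in_bytes"} {
		data, err := os.ReadFile(cgroupDir + "/" + name)
		if err != nil {
			continue // not this hierarchy; try the other file
		}
		s := strings.TrimSpace(string(data))
		if s == "max" { // cgroup v2 spelling for "unlimited"
			return -1, nil
		}
		return strconv.ParseInt(s, 10, 64)
	}
	return 0, fmt.Errorf("no memory limit file under %s", cgroupDir)
}

func main() {
	// Hypothetical cgroup path for a pod's container.
	limit, err := readCgroupMemoryLimit("/sys/fs/cgroup/kubepods/pod-abc/ctr-xyz")
	if err != nil {
		panic(err)
	}
	fmt.Println("enforced memory limit (bytes):", limit)
}
```

The point raised in the meeting is exactly that a test like this peeks behind the runtime's back, which is why the speakers want the enforced value reported through the container status instead.
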
C: I don't know whether — yeah, I'm here too. So I think, while looking into the runc behavior, there was a delta between v1 and v2, yeah. But what we're trying to do is: if the update succeeds, then we can assume that the update went through; otherwise, the runtime will throw an error. That's okay, right? Then you don't need another loop to go and figure out whether it was applied or not.
A: Okay, let me make sure I heard that properly — you know how my audio is hooked up. The question I had on there was whether the update action was atomic or not atomic from the perspective of the caller. You're saying the desired behavior would be that it's atomic, but right now on cgroup v1 it might not be, and on v2 it is. So—
B: Yes. The memory example I put in there was because it's the most explicit way of seeing what happens, and I believe that's what I demoed at KubeCon Barcelona when I showed the earlier design of this. So you try to run a stress test that allocates beyond the limits that you have, and the OOM killer comes in and whacks it.
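
As a concrete picture of that demo scenario, a toy allocator like the sketch below, run in a container whose memory limit it exceeds, gets killed by the kernel OOM killer once it passes the limit. The chunk size and output are arbitrary; any stress tool does the same thing:

```go
// Sketch: allocate memory without bound so that, inside a container
// with a memory limit (say 64Mi), the kernel OOM killer eventually
// kills the process — the observable "whack" described above.
package main

import "fmt"

func main() {
	var hoard [][]byte // keep references so the GC cannot reclaim them
	for i := 1; ; i++ {
		chunk := make([]byte, 16<<20) // 16 MiB per iteration
		for j := range chunk {
			chunk[j] = 1 // touch every page so memory is actually committed
		}
		hoard = append(hoard, chunk)
		fmt.Printf("allocated %d MiB\n", i*16)
	}
}
```
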
B: That's — that's okay, but I—
B: Yeah, no, I'm thinking long term, so we should probably look for a behavior that's consistent. The asynchronous approach seems to be, in my mind, the more reliable choice here — you cache and then you apply, which is what I think Peter mentioned in that comment — and that's fine, as long as when you query back, it tells us what the enforced value is for the limits.
B: Limits are one thing, but for requests — today we are locally checkpointing requests, and that's what we use: okay, this is the amount of memory we have reserved for your pod. That could change to: okay, we'll get that information observed from what the runtime is telling us. And I think it's more important for it to be true, as opposed to a cached value that we're going to apply; that will probably result in better behavior, or correctness.
B: Where I'm more concerned is cgroup v2, where we have memory requests — that's where I'm getting a little more concerned about what happens, and this is something we'll probably also make into guidance for other runtime implementations as well. So we should probably think through this a bit more.
A: Yeah, so I guess my bias would be that the kubelet can make an update request, and if it succeeds, the assumption from the kubelet is that it did succeed — I'm not left in a vague state. If I am left in a vague state, then I'm even more wanting the callback from the runtime to tell me how I'm in a vague state, so I can make a call to update again to reconcile. So I'm—
B: That's — yeah, that would be desirable. But for now, as far as the update-container-resources behavior goes — I think I tried this before — this is more applicable, or relevant, when a pod has multiple containers and we are increasing the memory on one container and decreasing it on the other. In that case, we order it such that, let's say, we do the decrease first and then the increase.
B: So even for a brief moment we are not overshooting the total net end-result memory, and if one of those fails, we bail out at that point and then come back and try the series again later, so that we're not trying to apply something downstream that requires an increase when a decrease has failed. That behavior is there currently in how the multiple-container update is being processed.
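
A minimal sketch of that ordering policy — all memory decreases applied before any increases, with the series abandoned on the first failure so it can be retried whole — might look like the following. All type and function names here are hypothetical, not the kubelet's actual code:

```go
// Sketch of the ordering policy: apply all memory decreases before any
// increases so the pod never transiently exceeds its net target, and
// abandon the series on the first failure so it can be retried whole.
package main

import "fmt"

type resize struct {
	container          string
	oldBytes, newBytes int64
}

// applyLimit stands in for the CRI update call to the runtime.
func applyLimit(r resize) error {
	fmt.Printf("update %s: %d -> %d bytes\n", r.container, r.oldBytes, r.newBytes)
	return nil
}

func applyMemoryResizes(all []resize) error {
	var decreases, increases []resize
	for _, r := range all {
		if r.newBytes < r.oldBytes {
			decreases = append(decreases, r)
		} else {
			increases = append(increases, r)
		}
	}
	for _, r := range append(decreases, increases...) {
		if err := applyLimit(r); err != nil {
			// Bail out; the caller retries the whole series later.
			return fmt.Errorf("resize of %q failed, abandoning series: %w", r.container, err)
		}
	}
	return nil
}

func main() {
	_ = applyMemoryResizes([]resize{
		{"cache", 256 << 20, 128 << 20}, // decrease runs first
		{"web", 128 << 20, 256 << 20},   // increase runs second
	})
}
```
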
B: It's for retaining that behavior. For CPU, I realize it's easier to manage, but for memory at least we should see if we can do it synchronously — at least try and then fail. If it succeeds, then great.
A: Maybe you and Bernal could get a comment on the PR that says something like: in my view, it'd be good if we can make the update call transactional for the impacted resources, so CPU and memory; and then, if we know that that's not fully achievable in runc today, get that enumerated as a beta blocker, I guess. I wouldn't necessarily make it an alpha blocker, but yeah.
A: I don't think we plan to expand this beyond CPU and memory, although I think it's good to think through it. But even memory limits today on v1 — I think we still hit this problem, since we let you change either value. Huge pages aren't overcommitted on a node, are not resizable in the current design, and may not even be managed by runc — so yeah, let's treat that separately, I guess.
B: And I agree this is not an alpha blocker, although we should resolve it before beta. The feature gate is turned off by default, so it's not going to harm anyone, and we can always constrain it: okay, we'll take it one container at a time instead of doing a series of containers at a time. We can always do that as a fallback. So—
E: Right. Well, I'll take a look at that. Can you put it in the CRI API too, while you're at it?
B: If we need to change the sequence, yes — once we identify the exact wording; it should be like four or five short sentences. I want to get to that, and then yeah. So—
A
I'll,
look
at
the
other
changes.
I
think
we're
getting
close
on
this
one,
and
this
is
obviously
important.
So
great
yeah.
Another.
B: One more housekeeping, logistical question here: this KEP is sprawled all over the place. I think we have the API goodness — an LGTM from Tim — and I want to talk to SIG Scheduling and see: should we break the scheduling piece — it's a very small piece — out into its own PR, so that it makes it easier to get the LGTMs to merge this? Otherwise we probably need at least three different—
B: No, for the PR to merge — because it's touching, it's got its fingers everywhere in Kubernetes, I think. So for any file it touches there is an owner, and that owner needs to — like, SIG Scheduling needs to give an LGTM. We—
B: No, no, no — I'm getting to the logistical part of how, and with whom, it's best to get this merged. Okay: it would be easier if it was a separate PR, at least for the scheduling part — it reduces the number of LGTMs needed to merge this. That should be ready and it should come in; it won't be able to merge until this is in, and that's good. That's how it's going to flow, yeah.
A: So I was good with the API and quota pieces, and the scheduler piece, I agree, wasn't that complicated. So yeah, if we nail down the last set of node stuff, then I think maybe a rapid-fire breakup into discrete PRs would be quick.
A: All right — well, let's get a comment on there just describing the runtime behavior, and a comment on the CRI itself that says, yeah—
A: —the recommended order, okay. All right, well, thanks so much for now, and Manuel and Peter and others, and Mike, for joining us. Thank—
E: Great. So just a little bit of background here: the special resource operator is meant to manage how to treat kernel modules for Kubernetes — for us as well, in the OpenShift space. We got approval from SIG Node last year to create a repo and then, from there, work on transitioning a code base into that space, in order to continue with that further development.
E: So the request for the meeting today is to see if there are any objections to renaming that repo from the special resource operator to the kernel module management operator. Once that rename takes place, we can start pushing the code base that we've been developing into that space and begin working out from there, if that's, like, okay. Just wanted to put that out as a proposal — not very technical, but—
A: —really get everybody's thoughts on that. My first ask would be — if I recall, it was folks across NVIDIA, Red Hat, maybe others, who were working in the community on this, and we had a presentation from svanko in the video on what the operator, as envisioned, was going to be. I guess I'm trying to figure out if the name change implies a change in vision, and if there's, like, a representative README that you could provide—
A: —one that explains the change. Because, aside from the rename, it's not clear if the ask is changing scope of any kind. Does that make sense?
E: It does. The scope of what we're looking to do with this operator actually reduces in footprint, and the idea behind this was to simplify the code base and make it more friendly to use. But for those who may not know the historical background of SRO: it was primarily Helm-based, and the original idea behind SRO was to use that as a stepping stone to then instantiate other partner solutions inside of Kubernetes. And so, as we started to traverse that path—
E: —we realized that that's not a sustainable support experience, and it's not something that would be, over the long term, an easy way to manage. And so, as we were looking to push the code upstream, we were removing all references to partner-specific Helm charts to deploy their solutions, and we changed our consumption model to be something more along—
E: —the lines of: we're just going to deploy and manage the lifecycle of your kernel module and device plugin inside of Kubernetes, and the partners themselves would build their orchestration piece inside of their own operator, which would then have SRO as a dependency. As we made that strategic change, we then decided, you know — hey, this is an opportunity for a rename and an opportunity for us to simplify things.
A: So is it possible — I know you have a link to, I guess, the project meeting that was being, maybe—
E: —yes, we have a link to our community meeting notes in there, where we've discussed the proposals, with the change, the renames, and the agreements that took place there. I don't have a specific README there today that describes the differences between what SRO originally proposed versus what we're doing with KMMO. KMMO itself is just a fraction of what SRO was intended to do, from a Helm perspective. So—
A: That's fine. I just think it's important that this is understood, or transparent, and that if we deviate from what we originally presented, we clarify it. So would you be kind enough — maybe the folks that were participating in the community meeting — to write up that README and send it to the mailing list, so folks have a chance to review it? In the same way, I guess, that we had a chance to review the original presentations from smoko.
E: Absolutely — and on the call with us today is our lead in that community space, and he's also got some slides that we can share around KMMO: the design behind KMMO and the use—
A: All right — well, thanks, Brett, and the group that was exploring the space. Let's just do that as a step forward before proceeding, to make sure there's no disagreement from folks on whether it's substantially changing the nature of the project as presented. Sure, all right. So moving on — up next, Adrian, it looks like you are ready to be merging your implementation. Is that right? Yeah.
G: No, that's — that's exactly it. There are a couple of reviews — positive reviews have been done — and so now the final approval is what's missing; we're waiting for that. That's all there is currently, yes.
A: Okay. And then, since it came up in the KEP review — I guess this exposes a new endpoint on the kubelet server process — I don't know, if you can, just make sure that it's not open like that. I forget what we said for securing that endpoint, but that would probably be what I would focus on.
A: Yeah — like, I thought this had a new REST endpoint you could hit on the kubelet server process that said do the checkpoint, right, if we're not mistaken? Yep, yeah. And I thought Tim Allclair had some questions on how it was secured, so I don't know if we've deviated or anything on that. Just look at that closely, I guess.
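
For reference, a client call against the checkpoint endpoint being discussed might look roughly like the sketch below. The path shape (POST /checkpoint/{namespace}/{pod}/{container}) follows the container-checkpointing proposal; the node name, port, and pod/container names are placeholders, and — as the discussion notes — the real endpoint sits on the kubelet's authenticated HTTPS port, so a production client must present credentials and verify the serving certificate rather than skip verification as this demo does:

```go
// Sketch: calling the proposed kubelet checkpoint endpoint. The URL
// pieces are placeholders; the TLS config is demo-only and must not
// be used outside a toy setup.
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

func main() {
	client := &http.Client{Transport: &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true}, // demo only
	}}
	url := "https://node1:10250/checkpoint/default/mypod/mycontainer"
	resp, err := client.Post(url, "application/json", nil)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status) // expect 200 once authorized
}
```
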
D: Hi, can you hear me? Yeah? Okay, so this is related to the topology-aware scheduling work. We had a discussion back in 2020 to get a repo created for populating the CRD-based API that is consumed by the two components that are part of topology-aware scheduling.
D: I had proposed a PR to, kind of, populate that repository, and the feedback at the time that we got was that it'd be better to pursue these components out of tree — and that is what we've been doing so far. So we have the NFD topology-updater component, which is part of NFD and is consuming this API, and the topology-aware scheduler plugin, which is in the scheduler-plugins repository. Both these components are part of projects that are under the kubernetes-sigs umbrella, and they are consuming an API—
D
That
is
part
of
an
external
GitHub
organization,
which
is
what
we
created
to
unblock
ourselves.
So
I
just
wanted
to
bring
this
discussion
back
to
signal
to
see
if
it
makes
sense
to
to
maybe
pursue
the
direction
again
and
populate
these
apis,
or
this
API
again
and
I
think
it.
The
issue
has
been
linked
as
part
of
the
agenda
and
I
have
a
bunch
of
PRS
as
well,
but
I
just
want
to
see
if
it
makes
sense
to
go
ahead
in
this
direction.
D
Is
there
anything
I
can
do
to
help
that?
Well.
A
Maybe
others
on
the
call
have
a
fresher
memory.
I
recall
we
so
the
flow
on
this
was
we.
We
had
apis
that
allowed
you
to
introspect
qubit
assignment,
and
then
the
cups
had
explored
building
layers
on
top
to
do
technology.
We're
scheduling,
I,
don't
recall
if
we
had
the
intermediate
step.
That
said,
should
we
create
that
as
a
repo
in
case
eggs
or
not,
or
is
the
ass
that
we
did?
We
create
the
repo
and
the
repo.
D: —was just never populated. Exactly. So we got approval to create the repo, but at the time we were proposing that the scheduler plugin be within Kubernetes itself as an in-tree plugin. So it was suggested that, you know, we explore these two components out of tree: one component is the one that exposes the information on a per-NUMA basis, and the other one is the scheduler plugin that consumes this information in order to make a topology-aware scheduling decision.
D: Kind of both. I think we didn't have the capacity to pursue this, and the other reason was that previously we didn't have the components, per se, that were consuming the API mature enough — it was just early phase, essentially APIs being consumed by, like, random folks and things like that.
D
Then
we
created
like
a
dedicated
GitHub
Organization
for
topology,
where
scheduling
work
and
the
API
has
been
relatively
stable
since
and
we've
had
out
of
three
compute
components
that
have
made
their
way
into
appropriate
repositories,
one
of
them
being
NFD
and
the
other
one
being
a
scheduler
plug-in
Repository.
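
For readers following along, the shared CRD being discussed is the NodeResourceTopology API: one object per node, carrying per-NUMA-zone resource accounting that the NFD topology updater writes and the scheduler plugin reads. The sketch below paraphrases its shape from memory of the out-of-tree noderesourcetopology-api; field names and types are approximate, not a verbatim copy:

```go
// Paraphrased sketch of the NodeResourceTopology CRD types: one object
// per node with per-NUMA-zone resource accounting. Field names and
// types are approximate recollections, not the published API.
package topology

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

type NodeResourceTopology struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	// Topology-manager policies in effect on the node.
	TopologyPolicies []string `json:"topologyPolicies"`
	// One entry per NUMA zone on the node.
	Zones []Zone `json:"zones"`
}

type Zone struct {
	Name      string         `json:"name"`
	Type      string         `json:"type"` // e.g. "Node" for a NUMA node
	Resources []ResourceInfo `json:"resources,omitempty"`
}

type ResourceInfo struct {
	Name        string `json:"name"`        // e.g. "cpu", "memory"
	Capacity    string `json:"capacity"`    // quantities simplified to
	Allocatable string `json:"allocatable"` // strings in this sketch
}
```
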
A: Yeah, I don't know how other folks feel. I don't see a reason to deviate from what was already, I guess, thought through, but my memory is a little weak right now on this. Does anyone else have opinions on this?
F: I think it would be good to move those API definitions out of the external organization to kubernetes-sigs, because that means we can use all the CNCF infrastructure for, like, approvals, reviews, automation, and so on. So it's a good step forward, I think.
D: Right, yeah. So the flow would be: the proposed API would be within the kubernetes project, in staging, and then the publishing bot would create a mirror of it under the kubernetes organization.
D: Again, I'm not sure — previously it was suggested to us that the CRD-based API be part of the kubernetes organization. I'm not sure whether, if we were to propose it under kubernetes, we would run into circular dependencies, because we have those components already under kubernetes-sigs, and they'd be consuming an API within kubernetes staging, I believe.
A: Yeah, yeah — it looks like we had discussed this back in November 2020.
A: So I think it's fair that each of us just takes a moment to refresh on what our thoughts were, but I guess my own feeling in this body was: it's been a weird few years, and so it's good that people are able to pick this up now — and thanks, Alexander, for sponsoring it. I'll just quickly look through what the history was, but it seems fine to me that we should populate it, particularly if folks in the community are willing and able to work on it.
A: The staging directory itself will require some higher-level approval, so we'll probably need to find somebody outside of just within the SIG, at the top level. And then maybe the ask would be, in the interim — it looks like you have yourself and at least one other individual as a security contact; whether this list is up to date, or whether others want to be added, that would probably be another thing I would look at. But yeah, maybe, Alex—
A: —if you have time, maybe you can help out, and just make sure that we're doing the right things. But it looks like — seems fine to me, so we can—
A: I'm not objecting to a KEP — you know, linked to, like, a KEP template — and we just had the enhancement freeze and that whole thing. Since this will be derived from a staging directory within k/k itself, I'm just trying to think through whether there was anything we need to do with respect to the enhancements process, to make sure that SIG Release is happy.
A: So maybe, before we merge it, we can also get a release lead to comment on whether there was something special here we needed to do. Okay.
A: Right, yeah — yeah, I just want to make sure we're doing the right things and that nothing else needs to be undone, or looked at incorrectly, I guess, given all of us are getting up to speed on what we thought in 2020. So yeah, with that — just give us some time, maybe today or tomorrow, to refresh, and then we'll comment on the PR and make sure the release team is good.
A: All right — well, thanks so much to those who brought topics forward, and we'll see you all... I think next week might be the Fourth of July holiday in the U.S. Maybe we should decide if we—
A: —meet or not. Yeah, I'm not sure how many people will be here or not, either way. Well, I guess we'll play it by ear on that Tuesday morning. So for folks who are here next week, we'll see you next week. Bye, everyone.