Kubernetes SIG Node, 11 May 2021

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: Kubernetes SIG Node 20210511

Description

Meeting Agenda:

https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU

A

All right so uh welcome everyone to the may 11th uh trinity, signed meeting. uh We have a few items on today's agenda uh along with our regular uh order. Business. um Sergey did you wanna? I don't know if you're available do you want to give any updates on where we are with uh uh incoming and upcoming code? I guess, especially in light of probably focusing on caps.

B

Yeah I first became I don't have much update. I pasted the table um and um I was super under the water last week, so if uh somebody else can um say something what's happening uh in regards to signals, I mean I think this table is a great way for me to just review what happened. So I will do that.

A

Sure yeah, I would say at a macro level, I know for myself: I've been focusing on helping people shepherd their cups to the upcoming deadline, for I guess the 13th, um and so maybe uh in the spirit of just giving an overall update. For that. I don't know. If renault and alana we wanted to talk through the items that we were tracking in the shared google doc, and at least we could.

C

Yeah, uh I mean from the enhancements team's perspective, they're tracking, in the cap spreadsheet, for the release, rather than like our dock. So uh after last week's meeting everything that we agreed upon that we wanted to do for this release I put into the spreadsheet and anything that like didn't, make it because it was red or there wasn't a pr or anything like that. I did not add to the spreadsheet. So uh I can.

C

I don't have screen sharing, but I can at least give you sort of the stats. uh According to the release team, they are currently tracking right now, 22 enhancements for sig, node and one has merged the rest are at risk. So the deadline is this thursday, at which point things need to be merged, uh like you know, meet all of the criteria in order to be considered for inclusion and the release and anything that uh misses that deadline, an exception would need to be filed.

A

Yeah so at least for myself, when going through this this morning and will continue this afternoon, I think these set qualified domain name kept graduating at the ga. That was tags, so I think across voytech and myself that was good. um The setcomp as default work that tim, eau, claire and sasha were pushing. I I had to prove that as well, uh that looked pretty good.

A

um I know marinal had a cap open just before this call on priority ordering uh of shutdown among priority classes. I think I had one minor knit on that, but that also looked good and then um I know alana had a number of comments on your swap one, but I think we're getting there.

C

Oh, I was able to figure out the filter view. So, if you want to give me co-host, I can share my screen and show you what the release team is currently tracking for node.

A

Yeah, you should be um good to go now.

D

Okay, while waiting for atlanta, I also have the quick update the contender notified, and I think the owner already reached me, so they want to retarget the two next units, not this one and because size also overwhelmed and also the cab owner also is busy, and, I believe the sad weather said, weather less and the ci4, I think almost getting suppose and the the peters and the david one and the swipe, I think we kind of get the front highlight, will already agree a while back and there's there's some detail for counter.

D

We need, I didn't miss. The basement. Understanding is, should be okay for this one, that's the stuff, I think, can we maybe we'll discuss? Oh by the way. Another thing debug container also, I think, should be okay, it's more it's from like the cap owner how much his time committed for that one. But I think the car provided basically is ready.

A

All right cool, um so alana do you have this.

C

Yes, this is a spreadsheet here, so these are everything that the release team is considering at risk right now, so here's the one that they have said is tracked, which I guess has been merged and all that uh other than that one which has been merged. I would not trust what it says in this pr status column right now, because I know, uh for example, this one says prr approved, but it's not so, uh but we're hopefully going to use this once.

C

This is all filled out in this column uh to be able to track whether or not prr is done uh and uh when all of the prs have merged as an update for caps and uh they meet all the criteria, then you know this will go from at risk to being tracked.

C

These are the enhancements contacts for each thing. So this is the person on the in on the release team, who's tracking that enhancement and is making sure that you've met all of the criteria, um and I think uh almost all of them are something graduating. We have two deprecations uh and it's kind of a mix across the board between alpha beta and I think, there's only a few things going to stable, so yeah.

A

The thing I'm confused on is uh reconciling don's feedback and I guess seth's concerns and the author's concern on container notifier and now that that's marked as tracked. That's.

C

A

C

On with that one I mean we can. We can specifically look at that one if you want. Let me see if I can find it, but.

D

I guess maybe cat is merged, but there's no way they're going to finish the real implementation. This.

C

Well, we can remove it from the milestone if we want to.

A

Yeah, I think it definitely should be removed from the milestone.

D

Yeah yeah, I can't I can't I can read targeted to 1.283. There are things they already asked.

A

um Yeah, I I know much of my afternoon is going to be spent reading the rest of the caps here. So um so, hopefully we can yeah.

E

I think another one another one I think the usernames faces. Probably unless you know someone is actually gonna work on it, I don't.

D

Yes, I I I'm so sorry I brought this back. We couldn't find the people in this uh we need, but do we definitely want to find the people as soon as possible, but for this release there's no. Everyone is overloaded. Okay, okay, yeah.

E

So sorry about this, I actually gave up yeah. You know what I think. Maybe we can just target like a cap during this time frame and then the next one we can target implementation.

D

Yes, yes, the same thing for container notifier, because I know it says overloaded. I already call out and internally try to find the another reviewer and the pattern is success to review this one yeah.

C

I'm really not sure why um the release team called it track, maybe because tim hawkin put an lgtm and an approve on it, but there's no prr done here, so uh I'm not sure and and also tim is not uh approver for sake node, so I mean we can go talk to the enhancements team and tell them that, but this isn't merged. So I'm not sure why that would be called tracked.

D

I think they just misunderstand team only just because in the past we have like a back force like the even direct high involved several meeting with team. It's more is from the scope and the content notifier scope, because that could be huge and then api how we are represent. So I think the teams also have some compromise this time and uh okay with the lyric down scope and also okay, with the current api, similar like the vpa right so back, we agree about the design scope, but there's the back force on the api.

D

So so that's kind of he only approved on the um he's on the api level. So that's different.

C

Yeah, because I'm just looking at like the this pr column, I don't think it's accurate because I have, I think I was very close to pr approving on this one, but I did not yet approve it. I know that this one has not yet been approved for me and so on and so forth, so uh someone's probably enough to go through and update this, uh but at least you know this is this is the list of what the enhancements team is tracking and the issue numbers and whatnot?

C

Basically, everything on here right now is currently at risk. So we probably need to prioritize what what we think is actually going to make it into the release versus like what we think is just not ready.

A

Yeah so at least for my part, I think I've gotten through a number of these cups this morning, we'll get through the latter half this afternoon um and I hope to have any comments needed for those by end the day today um and hopefully that won't be too bad um and I think anyone else in the community that wants to do that. You know please let your comments be known, uh thanks, alana for uh pointer to this.

A

um I think the next item on here was uh vinay on your two cups, but um not sure if you had additional information, you want to share beyond what we already discussed now, which was we're working on getting in so.

F

Not much uh the only item that I was concerned about is, uh if we can identify a code.

F

Review in june july time frame and david ashball, uh even jordan, leggett and tim uh to to a certain extent, look right quite closely at the code before tim figured, we probably want to remove the resources allocated being in the api in the pod spec and move it to checkpointing in the couplet. So my plan is to take that code, which has been pretty well reviewed and move it lift and shift surgically on most parts of it to the latest code base.

F

So the minimal review would be required for the for the status updates and api. uh The pod spec update and the container uh the sync the sync pod loop.

F

uh The new thing would be uh your checkpointing in the kublet that will be new code and I'm wondering if there is a reviewer available to look at that closely. So we ensure that uh this goes in with high quality.

D

Okay, I I can't ask how I know that today have some.

B

D

Yeah, I will ask your help on this one, because he'd have all the experience in the past and he kept asking me what he wanted to come back to the signal where.

F

To help so I think it's.

D

Perfect because he also participated all those old discussions.

F

Yeah he I think he was there parts of it at least the cri part. He looked at it and made some. We made some uh changes to the cap, so that's great! So if we can get him, I'm going to plan on starting I'm going to start working on the code. What remains is to look at the prr section. I added that move things around a little bit to line up with the latest templates, so I believe derek and you can look at it and uh elena elena has already looked at it.

F

I think the first pass and uh for the most part correct me if I'm wrong, but you think it's close. So we need uh three of you to sign off and then the cap should be official.

C

Yeah, I just did the prr review, so the prr is very close. I didn't read the rest of the cap. Okay,.

A

All right awesome, um the only comment is, I think um I don't think the checkpoint stuff has evolved deeply since uh lantau had looked at the area, but um I'm happy to help out there as well. So.

F

Okay, yeah I'm gonna, look at the cpu manager example and then just pretty much follow that and.

A

Yeah, that's what I was gonna reference is. Basically everything that was done in those areas has been pretty stable.

B

A

F

A

Think the next family agenda was a follow-up on the disc issue. I think.

A

uh Is um jack here to want to talk about that.

G

Hey folks, so just real, quick I'll, just paste a pr. This is a candidate fix to address um it's actually documented in code, so there are known races between setting up the container manager and getting node status. Sometimes the ephemeral storage allocatable data is not yet around, and so I've been spending a lot of time in the last couple weeks, figuring out a way to surgically improve this to uh minimize the race condition without refactoring everything, because a lot of it is pretty hairy there.

G

But if anybody's opinionated about that there's the pr. I don't think I I'm going to spend too much time talking about in detail, because it's it's hard to summarize.

A

What's the I think, last week, when we talked about this, I was like probably see advisor was late and getting some information is that basically, the kldr.

G

That's exactly right, and it's actually known it's like there are code comments that describes c advisors, sort of role in the various flows. So there's some serialization that happens. The advisor comes sort of next after the initial container manager instantiation, which means that we can't do a root fs. Quite yet um in in most cases it seems the the first node status request comes after c advisor has started and the container instead, I'm already at a super low level.

G

Anyway, the container manager gets that information, so the node status has the right allocatable storage data, but that doesn't always happen and sometimes the node introspection is faster than that, and so the node data representation includes an allocatable storage of zero.

A

Yeah, um so just trying to think through, like what's the actual impact of this race, because uh last release, uh or so we had a correctness issue in the cubelet, where, um for example, um it might have been able to list watch pods but had not yet been able to list, watch the node resource itself and then, um as a consequence, there was an issue where the cubelet could launch a pod that may not have been actually feasible to schedule on that local node, um and then we had done a number of fixes to try to get right in the first fix, resulted in kind of a it was more correct, but then was slower and then impacted cube, adm startup times, and then uh we iterated together to try to get a faster fix.

A

That was also uh correct. What I'm wondering here is like this seems like it will more greatly impact cuba dm startup times, and I'm wondering uh if, if there's a budget that this use case is being driven from that says, you know how quickly we expect all these things to.

G

Yeah, that's a that's a great call. I think all that is actually open. So the pr right now has some retry functionality under the with a timeout. So we can drive the timeout super low. If we're concerned about that. But to answer your first question really the repro, the canonical repro is the cube adm um configuration that includes local lcd, so you're, delivering an fcd pod via etsy kubernetes, manifests and since 120, that includes ephemeral, storage requirements um and for most qmedium joins.

G

That's okay, there's actually a lot of ways to kind of like try to simplify things, but for the second when, when you join the second note in particular, it.

D

A

Don't know if it's just me, but I lost jack's audio.

H

Yeah looks like the program did.

G

A

I think maybe you hear you again.

G

Okay, testing testing one two: three, I'm not sure where it cut off, but um did you did? I explain the second ncd node quorum thing.

A

That that's, where your audio would cut off, but in general.

G

A

If I can read on the issue, I was just mostly trying to be concerned on um correctness, fixes that then introduced latency.

D

A

Then resulted in a lot of uh um work and on this particular issue I don't know if we have a better, faster path to get uh more time squeezed out so um anyway, I was just curious on that. One I'll read it.

G

Yeah, no, I think it's a great call. I think that should be totally under consideration. I mean I plan to actually go after this go to cube adm and maybe advocate that we roll back the ephemeral storage requirement, but I feel like it's the best thing to do due diligence here, because there is a race, and so if we can identify that race and make a decision on what we want to do about it, that's really the foundational thing.

G

So I will, uh I think the pr I have in place is totally open to to even not retrying. We can just do it just in time. If we don't have allocatable storage, we can try once to see if c advisor has started. I mean it's it's hard to explain the stuff in the abstract, get to kind of know the code, but that would that would mitigate any potential like 20. Second 30. Second, warm-up delays, um waiting for allocatable storage, on, like the nominal cubitium case, even when ephemeral storage isn't a factor yeah that makes.

A

G

um Someone has a comment, I think.

H

Yeah, I have some comments, I'm aldo from six scheduling, um so I was looking at this issue because I saw it in the scheduler log, sometimes that we also get ephemeral, storage zero uh for some time. But then that's fine, because the scalar will retry right. um So in that scenario, wouldn't be an issue, you wouldn't get parts in the cubelet that the that require ephemeral, storage.

H

uh Now the problem is when the node restarts uh and it has already run in parts.

H

um Then at that point you have a pod that should should work in that node and then, after the restart it it fails and- and the issue is that we have remaining parts or fail pods that don't get don't don't get a garbage collected. So that's pretty much the issue um and that you just get those spots that need to be recreated by some other controller right. um So my my suggestion or my question would be: is it possible to leave any pods or like pots that don't require uh ephemeral, storage unaffected by the delays?

H

And only if you know only if a part requires ephemeral, storage and wheezy and we see zero in the allocatable, then we kind of don't fail the pod yet but retry it later.

A

Yeah, so for each of those paths we have to work our way through. On that um I mean all things are good, so it's probably possible on the static pod issue, like we had to delay a start of a static pod until we had known that, we had in the past, been able to list watch the node from the api server um so that there is some parts of the code we've added gating.

A

But um uh I think if, if you had a test case for the static pod scenario, that was like consuming ephemeral storage and we showed it failing- um that's a great way of us, then getting it fixed right so um that, if, if that's impacting anyone right now and even the cubadm community, we can better enriching our test. Cases is a nice way of ensuring that we get it working.

A

I guess if it did regress, but um the actual like where, in that sync loop or in the pod, at mission check on the cubic side the handle they re-cue it for later. I'd have to think through. You know, with someone else together on the best place to put that.

G

Yeah, the problem I'll have a brief comment during that, but the challenge there is. uh That was my first approach as well. um But if you add retry tolerance, there you're you're not really actually guaranteed, depending on what kind of race conditions area you're in that that it's ever going to succeed under the hood.

G

So I I felt like the best most surgical change would be to at the point where you're actually gathering data to detect when ephemeral storage has not ever been gathered, which is the particular race at the start of the container manager and wait on block on c advisor for a small amount of time. Trusting that in most cases it's going to come up quickly after and you'll be able to get that data.

G

If you just retry in the predicates flow, the race condition still be present, so you can retry for 20 minutes, but you're still, you've still got.

A

Zero, maybe I don't know jack, even if you had a look at your pr, if you had an ede in needy node that just simulated starting a static pod with the thermal storage, that's probably even if you don't, the cube, adm community doesn't leverage it. It probably at least gets us on a path to know that we're not going to have issues where static pods would be consuming. It.

G

Yeah, that's literally what I've been running uh h, a at a control, plane, cube, adm, build cluster, so yeah.

A

But I'm just saying restrict that even down to just a smaller ede that starts to stop.

G

A

Yeah yeah all right cool, all right, well uh I'll, look at the pr and go from there jack thanks a lot. Thanks um tim. uh I know we talked about the psp release last week. Were there new discussion points we want to talk to.

A

Am I looking at my agenda? Oh I'm, looking at the wrong week's agenda. Oh thank you. uh Are there any other topics? People want to raise today? Otherwise, we'll give back time to continue our cup review processes.

C

I had a quick thing, which was: I have gone through, and I've updated the spreadsheet and everything that I could find a pr reviewer I put into the spreadsheet at least half of them. Don't even have someone assigned. This is mandatory for your cap to merge. So some of them I've gone and commented and said please assign one, uh but if or the release team already has. Please make sure that you do that.

C

If, uh because otherwise, your cap won't be able to merge without a prr reviewer and if nobody's assigned, no one will know to review it. So.

A

All right: well, thanks, alana, that I guess we will end today's meeting. Yeah.

I

I have a quick question: if uh give me a moment, I'm trying to get a sense of where the cisco's set up uh what the current status of cis cuddles is and sort of marking new cisco levels is safe. uh My reading of the recent enhancement proposal is that it was largely destabilizing or like documenting the existing allow and save cisco's mechanism. uh Is there anything sort of on the roadmap for adding new systems, or where should I be asking about how to add new cisco's? That should be safe.

A

Yeah, so some history on that the the original set of cis controls that were identified as safe were probably enumerated about. I don't know order of three to five years ago.

I

A

It's possible that new things have become safe, that we're unsafe at that moment in time. um So if you have a a more updated set that you we can demonstrate are safe and we know the kernel versions at which they're safe to be exercised.

A

um Maybe I think we'd all invite the feedback and we can think through the best way of expanding the safe list.

I

Okay does is that just a code change or does that need a cap or what's work, I'm new to this? So what's the workflow.

A

Yeah, so uh I think before you even jump into a cap or code change, just sharing what you sounds like you may have discovered in in your.

I

A

A simple google doc next week or your convenience, you know great place to do that: okay,.

I

So put something on the agenda for next week that has sort of a link to here's, a bunch of systems I found and here's why I think they're safe yeah. That's.

C

Perfect, I found an issue too in kkk.

I

Yeah, okay kkk is kubernetes.

A

Okay, the issue we'll have to work through is like the underlying kernel versions or stuff, like that that I will challenge our way.

I

Of determining.

A

uh Yes, you're.

I

Not okay! Yes, I will. I will take that is there? Is there a kernel range I should be looking at? Is there even a supported kernel for kubernetes.

E

It's hard because, like rel, does back ports, so rel may say four point something, but it has newer stuff back ported so like just looking at the version is gonna, be.

I

Okay, okay, I will, I will look from four dot something onwards and then we can. We can figure it out. Next week I can come back with some text for next week.

D

You can also look at the kubernetes kubernetes, have the system, validation test and the link against that test, and then say what do we put there? If, if I wrote her a long time ago, we basically just 3.18 up and about, but we should refresh that one, we totally okay to refresh that whoa, because that's the couple years ago we have that for kubernetes yeah.

I

Where is this sorry, cuba.

J

A

Yeah, so if you look in the cube code base, there's at least it used to be, there was a validation.

D

The system validator.

I

Okay: okay, the the link that was sent to the chat.

D

A

Yeah yeah, so there's like a kernel, validator dot go, might be what I want to check out. Thank you all right, all the best and look forward to hearing on that. Next uh next time, you're convinced jeffrey talk later bye, everyone, my folks.