From YouTube: Kubernetes SIG Node 20230411
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
GMT20230411-170451_Recording_1522x928.mp4
A
Hi everyone, welcome to the SIG Node weekly meeting on April 11, 2023. We have a couple of topics on the agenda. I see Dawn already responded to Kevin's request for approval on the DRA KEP, so we can quickly move on to the second topic, which is the n-3 skew by Jordan and Derek.
B
Yeah, Jordan, do you want to kick off the discussion on this? Yeah.
A
I just gave you host, Jordan. Yeah, great, I think this works.
D
Yeah, so I wanted to kick off a discussion. I'm kind of making the rounds of the various SIGs that touch things on the node or care about skew policies, so I talked with Cluster Lifecycle last week and SIG Arch last week, and then Node and Network this week. Node and Network are the two SIGs that actually own node components, the kubelet and kube-proxy.
D
So I care a lot about what folks here think about this, but the too-long-didn't-read version of this is: it would be really great if the oldest node that we support and the newest control plane we support work together.
D
That's the goal, and that was actually the goal of the current skew policy, which says n-2 nodes support current control planes. But when we moved to a yearly support period, I guess a year or two ago, we realized that users actually need a couple of months of overlap after we release a new version for them to qualify and upgrade to it. And so, if we release three minor versions in a year, we actually support the oldest minor version...
D
...for a couple of months after we cut a new minor version, so there's roughly a 14-month support window, and that two-month period is intended to let users qualify and upgrade. And so, if we strictly support nodes that are two versions older, then in order for users to stay within the supported skew, they actually have to upgrade their nodes twice if they want to leave their node pools at the oldest supported version and then jump them to the newest supported version.
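To illustrate the arithmetic Jordan describes, here is a minimal sketch of the skew check in Go. The helper and the hard-coded version numbers are illustrative assumptions, not code from Kubernetes or the KEP.

```go
package main

import "fmt"

// kubeletSupported reports whether a kubelet minor version is within the
// allowed skew of a control-plane minor version (both on the 1.y series).
func kubeletSupported(controlPlaneMinor, kubeletMinor, maxSkew int) bool {
	return kubeletMinor <= controlPlaneMinor && controlPlaneMinor-kubeletMinor <= maxSkew
}

func main() {
	// With n-2 skew, a 1.25 node pool falls out of support as soon as the
	// control plane moves to 1.28, so it must be upgraded twice
	// (1.25 -> 1.26/1.27, then -> 1.28). With n-3 skew, one upgrade suffices.
	for _, skew := range []int{2, 3} {
		fmt.Printf("control plane 1.28, node 1.25, n-%d skew: supported=%v\n",
			skew, kubeletSupported(28, 25, skew))
	}
}
```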
D
Talking with Derek and other folks, it's pretty clear that node upgrades in particular are way more disruptive to users' workloads. For control plane upgrades you normally have one or three control plane members, and user workloads don't have to take dependencies on control planes. So if the user workloads are running pods that don't actually care about the kube-apiserver, they're happy to keep running even while the control plane's upgrading.
D
Node minor-version upgrades, though, require draining or spinning up new nodes, so every workload in the cluster is going to have to get recreated or restarted in the process of doing a minor node upgrade. Making people do two of those, especially when you might have thousands of nodes in a cluster, is actually way more disruptive. So the goal was to see if we could let people just do a single node pool upgrade to get from the oldest version to the newest version.
D
So that was the goal. The next thing we looked at was what it would actually cost us to do this, and there were three types of changes that I looked at. This is where I would like feedback from folks in SIG Node.
D
Let me know if we've missed types of work or types of changes that would be impacted by expanding to one more version. The three types of changes we looked at were: how fast we can roll out new features, how quickly we can drop support for old, deprecated, no-longer-supported features, and then the third category was REST API changes. The third category is actually in really good shape; ever since around 1.19, I think, all the things that the kubelet and kube-proxy use are stable, they're at v1 level.
B
Jordan, maybe just one other comment that I realized we didn't capture in our discussion: the alternative is that we could, as a community, just say we'll do more to support in-place updates of kubelets. I think it's worthwhile to maybe put some language in the KEP to explain why we wouldn't necessarily recommend that, because typically that would impact the kubelet's ability to adapt to operating system updates.
B
So, for example, doing a cgroup v1 to v2 migration in place is not really a thing we can do. So maybe just some language in here that calls out that the kubelet could have chosen to do in-place updates.
B
But we as a community probably feel that that is a bad idea, because it would inhibit our ability to actually keep up with the pace of operating system innovation. So far it's been a benefit that we always recommend users drain their nodes before doing that maintenance. But I could see some folks pushing back on this and saying, well, why doesn't the SIG just say you can change the kubelet binary, you don't even need to do a drain? I don't think it's a good idea for the SIG to take on that posture, but I just want to call it out.
D
Just muted, Dawn? Yep, I think you were; feel free to unmute and interrupt me at any time if you have something you want to say. In terms of what work we would actually take on to achieve the goal of upgrading a node pool from the oldest version to the newest version with as little disruption as possible: I agree that supporting in-place upgrades across minor versions is probably way more work and therefore way less likely to actually get done.
D
I really like incremental improvements that give us a lot of bang for the buck, and so I did want to jump to some of the analysis. I actually looked through those types of changes over the past couple of years to see how many new features we actually delayed until the oldest node supported the feature, and there were actually very few of them.
D
Typically, we roll out new features and we just say: if you want to use the new feature, you have to upgrade your nodes to a version that supports that feature, which I think is probably pretty reasonable. You want to use a new feature, you have to upgrade to a version that has that feature.
D
There may be user-experience things we could improve there, to make it more obvious when a user tries to use a feature and their nodes aren't new enough: make it fail in nicer ways, or tell them earlier in the process. We can make user-experience improvements, but generally we don't wait for all supported skewed nodes to support a feature before we say you can enable this and use it on newer nodes.
D
Most of the time, the only times we will delay is when it's a security issue. I think Pod Security Standards waited to relax requirements on Windows nodes, or Windows pods, until we were sure that all kubelets would honor the pod OS field.
D
More common was dropping deprecated functionality from the control plane once the n-2 node didn't need it. There were a few instances in SIG Storage around dropping in-tree volume plugins once the n-2 kubelet was guaranteed to be using CSI migration.
D
But again, the cost of just letting old code hang around for one more release and then dropping it is actually pretty low. At least that's the feedback I've gotten from SIG Node; I still have to talk to some of the other SIGs. But if we can make users' lives better by letting them upgrade their nodes, at the cost of ignoring a deprecated package for one more release before we delete it, that doesn't seem like a terrible trade-off. It's not impacting velocity of new features.
D
So then this was just showing homework. I went back two years, to 1.22 I guess, and looked at enablement of new features, removal of deprecated stuff, and then removal of beta APIs, and tried to see which of these would have caused problems with n-3 skew. There was only one example of a new feature that I could find; sorry, two examples, both of which were SIG Auth, so maybe I'm shooting my own SIG in the foot.
D
What I'm looking for from this SIG is sort of a gut reaction to this support-the-oldest-node-against-the-newest-control-plane goal, and then pointing out anything that we missed in this analysis in terms of types of work or types of features. If there were features that we waited to roll out until the oldest node supported them that I didn't have in this list, that would be helpful to know. And then some of the alternatives or other ways of accomplishing this goal, like what Derek pointed out. So I'll stop talking there and let other people talk.
B
My response, Jordan, is that there's just widespread agreement that this is the right thing to do, and there was an oversight when we changed the project support policy, I think.
B
Do people find the language in the KEP clear about how to handle new features? I'm trying to think about features that are in flight right now from SIG Node.
B
For in-place pod resource resizing, is the language in the KEP here clear about how we would choose to enable that, potentially in the future, on by default on the API server side? Or is it unclear? Or, if there's a particular resource that the kubelet is not yet tracking but we've thought about tracking: I think Rinaldi, you and I, and Sergey and Dawn had a conversation about file descriptors, for example.
B
To me, the key thing is that we are comfortable with the language that's in the KEP, to understand when and how we choose to allow a feature to go on and off by default, or to give appropriate guidance to those who come to the SIG on the time frame they might be looking at for that feature to be on by default. If the language here is clear, that's good. If it's not, then that's probably the best thing we could gather together on as a community, to make sure that we give proper guidance going forward.
D
If there are specific questions around how this behaves on older nodes, that might be a good sample question to put there as a prompt. I will note that we already have n-1 and n-2 support, so hopefully people are already asking these questions and already thinking about how this rolls out, and hopefully the only impact of this proposal would be on people who are actually waiting until the n-2 nodes have a feature enabled to turn something on in the control plane.
D
So my sense is that it's not actually impacting most features. Most features enable in a way where the feature just won't work with an n-1 or n-2 node, and we would just tell the user: if you want to use this feature, you have to upgrade to a newer node.
D
I also included a link. Where was it? Here we go. One of the ways that we enable features today is that we just default them on in a given release, and then, as that release propagates back into skewed nodes, more and more releases support the feature. That's not actually a really terrific way to roll out features.
D
It kind of is, in some ways, but it's the worst of both worlds: it's slow to make progress, because there's roughly a four-month gap between each release, and it also leaves clusters that have skewed nodes in a state where maybe the feature is allowed at the control plane but doesn't work with older nodes. So the current state is actually something that could be improved, and I linked to a KEP.
D
Daniel has that KEP in progress, which is maybe trying to improve the way we toggle feature flags on, so that instead of just being tied to a release, it can actually be more cluster-aware. So, if you had a cluster where all the nodes and the API server are on the newest version, then great, the feature enables. But if you still have nodes on an older version that didn't support a feature, maybe we wouldn't default the feature on; maybe we would wait until your nodes supported it.
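A rough sketch of that cluster-aware idea, as Jordan summarizes it, is shown below. This is an illustrative assumption of how such a check might look, not the design in Daniel's KEP; the function names and versions are hypothetical.

```go
package main

import "fmt"

// minimumKubeletMinor returns the lowest kubelet minor version among the
// cluster's nodes (each value is the "y" in a 1.y version).
func minimumKubeletMinor(nodeMinors []int) int {
	min := nodeMinors[0]
	for _, m := range nodeMinors[1:] {
		if m < min {
			min = m
		}
	}
	return min
}

// enableByClusterState defaults the feature on only when the oldest node
// already understands it, rather than purely by control-plane release.
func enableByClusterState(featureMinMinor int, nodeMinors []int) bool {
	return len(nodeMinors) > 0 && minimumKubeletMinor(nodeMinors) >= featureMinMinor
}

func main() {
	nodes := []int{25, 27, 28}                   // skewed node pool: 1.25, 1.27, 1.28
	fmt.Println(enableByClusterState(26, nodes)) // false: a 1.25 node remains
	fmt.Println(enableByClusterState(25, nodes)) // true: every node supports it
}
```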
D
So I think there's room for improvement in how we do feature rollouts. I think that's orthogonal to whether we have n-2 or n-3 support, but for those who are interested, I would definitely encourage reading Daniel's KEP and weighing in with an eye towards skewed node and control plane rollouts.
D
Okay, that was all I had. If there are no more questions here, feel free to read the n-3 KEP and the questions there, or ping me if there are things I missed or you want to see, and we'll try to get it updated.
D
So, by default, if we don't make any changes in the control plane in a new release, then it has just as good support for n-3 nodes as for n-2 nodes. Ideally, if it's not disrupting plans that SIGs have in place, I would like to see the next version of the control plane, 1.28, support back three versions: keep as good support for 1.25 nodes as 1.27 had. I tried to do some forward-looking analysis to see what plans SIGs had, and SIG Storage was the only one that I could find. So yeah, I would like 1.28 control planes to support 1.25 nodes just as well as 1.27 control planes did. That's my goal.
A
Thanks. All right, thanks, Jordan and Derek. Folks on the call, please take a look at the KEP and chime in if you have any thoughts. Okay.
C
Sorry, I just had a comment that for the in-place pod resource update, I'd have to take a closer look at the KEP, but from what I can tell, maybe it will need some additional changes to ensure that if you request a resize on a pod that's running on a node that doesn't have this feature at all, then we reject that request early in the API server. Jordan, does that sound about right?
D
Maybe. Again, we already have the possibility of skew, right? You could be running on a node that's one or two versions older. So...
D
All right, so yeah, let's sync up and look at what the in-place design was going to propose for skew handling and see if this adjusts anything. Yeah, sounds...
B
Good. My read was that it would delay turning that feature on in the control plane by default by one release, yeah, but it wouldn't preclude the ability of others to use that feature in clusters that they knew, in their local deployment posture, were at a satisfactory level. So to me it just changed when it defaulted on. But yeah, if you and Vinay can sync up on that, that's kind of how I read our language.
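For the early API-server rejection raised just above, a minimal sketch of what such a check could look like follows. The version threshold, helper, and error text are hypothetical placeholders, not the actual KEP or apiserver validation code.

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative assumption: first minor version whose kubelet handles resize.
const resizeMinKubeletMinor = 27

// validateResize rejects the request before it ever reaches the node when the
// pod's node runs a kubelet too old to perform in-place resize.
func validateResize(nodeKubeletMinor int) error {
	if nodeKubeletMinor < resizeMinKubeletMinor {
		return errors.New("node's kubelet is too old for in-place pod resize")
	}
	return nil
}

func main() {
	fmt.Println(validateResize(25)) // rejected: kubelet too old
	fmt.Println(validateResize(28)) // <nil>: allowed
}
```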
E
So we're still making good progress on updates. We've had some other work come out in the last week where we wanted to be, but we're also waiting, looking for feedback from one of the main members. I think we had Dawn assigned to this, but Dawn hasn't made a lot of the calls, so I'm looking for help, right.
B
Is this (apologies, I'm coming back from being out for a week) the same overall plugin design we were discussing in the past? To my knowledge, the issues that we were encountering were around bootstrapping.
B
The only thing I wasn't aware of is whether we had a satisfactory resolution to the bootstrapping challenges, or if there was a proposal to update them. I'll...
B
...look to read through the link here, but I think I would kind of plus-one Renault's feeling of maybe having folks give an update on the latest state of the discussions for those who haven't been able to attend, because otherwise, yeah, I was under the impression that we were still blocked on the core bootstrapping problem.
A
So maybe we can schedule it for the week after KubeCon. I think that brings us to the end of the agenda, and I think next week is KubeCon. So do we want to cancel the call or keep it?
A
Yeah, all right. Meanwhile, folks, reach out on SIG Node if you have anything that you need. Thanks for joining, see you all in a couple of weeks. Bye now.