Kubernetes SIG Arch - KEP Reading Club, 2 Aug 2021

From YouTube: [2nd August 2021] SIG Arch - KEP Reading Club Community Meeting

Description

Kubernetes Enhancement Proposal (KEP) Reading Club is an initiative by sig-architecture.

KEPs covered in this session:

- https://github.com/kubernetes/enhancements/pull/2640
- https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2000-graceful-node-shutdown

A

Okay, hi, welcome. I don't know which session of the KEP Reading Club this is, but this is the next one. Today is the 2nd of August 2021 and, as a general reminder, this meeting, like all community meetings, follows the CNCF Code of Conduct.

Generally, be excellent to each other, and since this meeting is also being recorded, be mindful of what you say.

Today's two KEPs are Secret protection and graceful node shutdown.

C

Really, there's a bit of noise from someone's mic.

A

Okay, so let's try to get through the first one, and it's okay, we can take our time. Let's not rush it, because we had a little bit of difficulty, right, so let's just take our time.

We'll probably give 15 minutes or so for this one KEP, and then we can discuss. It's okay if we can only discuss one today; I'm fine with that as well. So just put a thumbs-up reaction on your screen or in the chat if you'd like to get started, and I can start a timer for that.

B

Awesome, okay.

A

Does anyone need any help with where it's located?

C

Yeah, can you post that link?

A

It's just the PR.

A

So the first one I'm posting is for Secret protection. The next one is for graceful node shutdown; I'll post that once...

Yeah, I think that's it. Okay.

A

Okay, sorry about that. Let me just give a brief overview. Every two weeks we have this session, and people suggest KEPs that are of interest to them, or, if they are the author of a KEP, they suggest it for us to go through, read, understand, and provide feedback on if we can. Otherwise, if you find a KEP interesting, you suggest it, and we all read it together and discuss it afterwards.

KEP stands for Kubernetes Enhancement Proposal. It's the mechanism for getting new features in, deprecating older features, deleting old features, or updating current features. So that's the whole process of getting those things into the Kubernetes project, and the purpose of this club is basically to familiarize ourselves with that process. Reading KEPs is the best way to familiarize ourselves with upcoming features. Okay.

A

So, "could you briefly explain what PRR is?" PRR stands for Production Readiness Review. I personally am not too familiar with the whole thought process behind it, but let me just note that down, because I think Dims is the best person...

E

Yeah, I mean, I didn't even know what the full form of PRR was, so even that much was helpful. Thank you.

A

Okay, I think we can get started. I posted the link, so I'm going to bring up a 10-minute timer. This is a soft timer this time; we can extend it if people don't finish.

A

And starting the timer in three two and.

A

A

The 10 minutes are up, so if anyone needs an extension, just react or put it in the chat.

E

We can wait for two more minutes.

E

E

E

uh Thanks mark, we can go back if everyone is ready.

A

I think the author of this KEP has also joined, so thank you for joining in.

C

A

If folks have questions, we can probably get started.

C

Your voice is very low. Were you saying the author is here?

A

Yeah, I said the author also.

C

A

Thanks for joining in, uh if we have questions, we can probably get started.

E

I have a very general question before starting with the KEP: what do we mean by a feature gate? We are talking about protecting Secrets and ConfigMaps in two cases, for in-tree and out-of-tree resources, and one of them is enabled through a feature gate and the other through a flag. So what do we really mean by a feature gate in general?

A

I can probably take "what is a feature gate", but I'm not too sure about the distinction between enabling through the flag and enabling through the feature gate for in-tree and out-of-tree. A feature gate, in general, is a mechanism for enabling features that are alpha or beta. If a feature is in alpha, then its feature gate is set to false by default, so you won't be able to use that feature in the released binaries unless you turn it on.

So if you want to use a certain feature which is in alpha, you have to explicitly set the feature gate, saying that this feature's gate equals true, and then that component will work with the assumption that this feature exists as part of the Kubernetes system.

And if a feature is in beta, then its feature gate is set to true by default, meaning that if you want to disable it, you have to explicitly go and disable it. When a feature is in beta, it's usually meant to get further feedback from the community and to be continually improved on.

So that's why it's set to true by default, and it's in a relatively stable state. When it finally reaches GA, after getting feedback and building consensus and things like that, you don't really need a feature gate anymore. It's there in Kubernetes by default, and you can't really disable it.
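To make the alpha, beta, and GA defaults described above concrete, here is a minimal Python sketch of how a component could resolve its gates. It is an illustration only, not Kubernetes code: the `Stage`, `FeatureSpec`, `resolve`, and `parse_overrides` names are hypothetical, as are any gate names you pass in. The `Name=true,Other=false` string mirrors the format accepted by the `--feature-gates` flag on Kubernetes components.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    ALPHA = "alpha"
    BETA = "beta"
    GA = "ga"


@dataclass(frozen=True)
class FeatureSpec:
    name: str
    stage: Stage

    @property
    def default(self) -> bool:
        # Alpha gates default to off; beta and GA gates default to on.
        return self.stage is not Stage.ALPHA

    @property
    def locked(self) -> bool:
        # Assumption of this sketch: a GA feature can no longer be switched off.
        return self.stage is Stage.GA


def parse_overrides(flag_value: str) -> dict:
    """Parse a 'Name=true,Other=false' string into {name: bool}."""
    overrides = {}
    for item in filter(None, flag_value.split(",")):
        name, _, value = item.partition("=")
        overrides[name.strip()] = value.strip().lower() == "true"
    return overrides


def resolve(specs, overrides) -> dict:
    """Return the effective on/off state of every known gate."""
    known = {spec.name for spec in specs}
    for name in overrides:
        if name not in known:
            raise KeyError(f"unknown feature gate: {name}")
    enabled = {}
    for spec in specs:
        wanted = overrides.get(spec.name, spec.default)
        if spec.locked and wanted != spec.default:
            raise ValueError(f"{spec.name} is GA and cannot be changed")
        enabled[spec.name] = wanted
    return enabled
```

With no overrides, an alpha gate resolves to off and a beta gate to on; an operator has to opt in to alpha features and opt out of beta ones, and in this sketch an attempt to switch off a GA gate raises an error.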

C

This enabling and disabling happens at the time of installing the Kubernetes binaries on the nodes, right?

A

Yeah, whenever you run the binary, you can set a feature gate. Yeah, thanks for sending that link.

A

Sorry, first things first, just to make sure I don't mispronounce your name: is it Masaki? Did I get that right?

D

Oh yeah, I'm Masaki.

A

Hi, sorry. Do you want to take the second half of the question? So for in-tree you enable it through a feature gate, but for out-of-tree you do it through a flag.

D

Yeah, thank you for your explanation. Yeah, that's right. For in-tree, a feature gate is needed, but for out-of-tree we don't have a mechanism to do it, so yeah.

A

E

Since we have read it, it would be helpful if you could give us a brief overview of the KEP itself in your own words.

E

Am I audible to others?

A

Okay, I think we can revisit that soon, but are there any specific questions?

A

Related, I think I had a few.

E

I have just one more. We are talking about in-tree and out-of-tree resources and enabling Secret and ConfigMap protection on both of them. What I understood after reading the KEP is that we are introducing two new controllers. Is that right?

B

A

So, okay, got it. Just as a follow-up to that: the finalizers that are added to these ConfigMaps and Secrets, are these already present, or does the KEP introduce these finalizers?

D

It introduces a new finalizer.

A

E

In a very broad sense, how are we checking whether something is being used, for example whether a Secret is being used by a pod? I see there are lots of tests written there.

So what's the process? How are we checking whether a Secret or a ConfigMap is being used by a certain resource or not?

D

Actually, the implementation plan is not yet decided because we are still in discussion, but as for the idea in the current KEP, the check happens on deletion: if there is a finalizer, we can add logic that runs on deletion and checks whether the Secret is used. It checks all pods and persistent volumes to see whether any of them references the particular Secret being deleted.
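The idea described above, scanning pods and persistent volumes for references before letting a finalizer go, could be sketched roughly as follows. This is not the KEP's implementation (which, as noted, was still under discussion). It operates on plain dictionaries shaped like Kubernetes manifests, the function names are hypothetical, and the finalizer name `example.com/secret-protection` is a made-up placeholder.

```python
def pod_secret_names(pod: dict) -> set:
    """Collect the names of Secrets referenced by a Pod manifest."""
    spec = pod.get("spec", {})
    names = set()
    for volume in spec.get("volumes", []):
        secret = volume.get("secret")
        if secret:
            names.add(secret["secretName"])
    for ref in spec.get("imagePullSecrets", []):
        names.add(ref["name"])
    for container in spec.get("containers", []) + spec.get("initContainers", []):
        for env in container.get("env", []):
            key_ref = env.get("valueFrom", {}).get("secretKeyRef")
            if key_ref:
                names.add(key_ref["name"])
        for source in container.get("envFrom", []):
            if "secretRef" in source:
                names.add(source["secretRef"]["name"])
    return names


def pv_secret_refs(pv: dict) -> set:
    """Collect (namespace, name) pairs for Secrets referenced by a CSI PersistentVolume."""
    csi = pv.get("spec", {}).get("csi", {})
    refs = set()
    for field, value in csi.items():
        # CSI volume sources name their Secret references "...SecretRef".
        if field.endswith("SecretRef") and value:
            refs.add((value.get("namespace"), value["name"]))
    return refs


def secret_in_use(namespace: str, name: str, pods: list, pvs: list) -> bool:
    """True if any Pod in the namespace or any PV references the Secret."""
    for pod in pods:
        same_ns = pod["metadata"].get("namespace") == namespace
        if same_ns and name in pod_secret_names(pod):
            return True
    return any((namespace, name) in pv_secret_refs(pv) for pv in pvs)


def finalizers_after_check(secret, pods, pvs, finalizer="example.com/secret-protection"):
    """Return the finalizers a controller would leave on a Secret that is being deleted."""
    meta = secret["metadata"]
    current = list(meta.get("finalizers", []))
    if secret_in_use(meta["namespace"], meta["name"], pods, pvs):
        return current  # still referenced: keep blocking deletion
    return [f for f in current if f != finalizer]
```

Pods are matched by namespace because a Pod can only reference Secrets in its own namespace, whereas a PersistentVolume is cluster-scoped and carries the Secret's namespace in the reference itself.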

E

Okay, yeah, makes sense. I have just one more question. There is a section on risks and mitigations, and it talks about a condition where everything in a namespace is deleted, but because a PV still remains after the resources in the namespace are deleted, the Secret will remain, since the PVs are using the Secret as volumes or something like that.

So there is a situation there, and we are considering this one of the blocking conditions, because here the user would have to go and manually delete the finalizers to delete the Secrets. So do our design details actually take care of this particular situation, or is it a known issue at the moment? Is it a blocker?

D

It is actually still an issue, but I'm thinking about a way to resolve it. Actually, in SIG Storage there is a discussion about adding an annotation with such information on the PV side. So if we add this kind of information, we can utilize it, but until then we have no way to find this information from the PV.

E

Okay, and somewhere down in the KEP, just give me a second... it's mentioned that if we have to manually... okay, I think manually deleting the Secret's finalizer is a known thing. Okay, sorry, yeah, that's all from me.

A

A

So, just as a closing question before we move to the next one: in what areas can people reach out and offer help, and how can they contribute further to this particular feature? How much work is still to be done, and where can people help out, if anyone is...

D

D

Oh, could you repeat the question again.

A

Sure. As a closing question for this particular KEP: what work is remaining, and where can people help and contribute to this particular feature if they're interested?

D

Yeah, actually, this KEP has a lot of discussion around how to implement it. I first thought we could go ahead with an implementation similar to PV and PVC protection, but the discussion continues. We are thinking about another way to protect against deletion in certain situations, so I've created another KEP for in-use protection. It's a generic mechanism to protect a resource when it is in use.

If you are interested in this particular Secret protection KEP, I would hope you also join the discussion on the other KEP. Could you check the link?

C

Yeah good thanks.

D

And also, any feedback is welcome here.

A

I think this is the generic mechanism.

D

Yes, yeah. I'm not sure we need to rely on the generic mechanism, but the discussion around it will help.

E

Generally, this Secret protection and ConfigMap protection, this entire new feature, will fall under which SIG? Precisely, which SIG will track this work?

D

As for Secret protection, it is under SIG Storage, and the generic mechanism for protection should be under SIG API Machinery.

E

Okay yeah. Thank you. Thank you.

D

Yeah you're welcome.

A

D

A

Does anyone have any questions before we move on to the next one?

A

C

I think we can move on. Okay, awesome.

A

C

Thank you so much for taking out the time, yeah.

A

Okay, so the next one is graceful node shutdown. Let me just paste the link.

A

A

And I'll start a 10 minute timer in three two and.

A

C

Should we start, since less than 15 minutes are left?

A

Yeah, there are two minutes left on the timer.

A

C

Yeah, that's name. I.

C

A

Does anyone have any initial questions?

E

So we are talking about adding a new config field, the shutdown grace period, and about having a graceful shutdown for pods, which we don't have at the moment. If I am understanding it properly, we are talking about adding a new field in the kubelet config, right? Yeah, okay. And I have a follow-up question: somewhere here we are talking about how we are going to calculate the time, that is, how long it will wait.

We are talking about categorizing pods into system-critical and another category as well, so system-node-critical and system-cluster-critical. For this kind of categorization, do we already have some mechanism at this moment? What we call system-cluster-critical or system-node-critical, is that something that will come with this? Yeah, I think, okay.

A

Yeah, I think system-cluster-critical and system-node-critical are existing priority classes. In scheduling you can have different priority classes.

So you can mark certain things as system-node-critical or system-cluster-critical, and those pods... I don't have a very thorough idea about it, but those pods will take priority over other pods while scheduling happens.

E

Okay, so it's already there. Yeah, it happens with priority classes. Okay, thank you.

C

So what I wanted to ask was: it mentions that the reason this needs to be done is so that the pods can be gracefully shut down and have some buffer time. Does anyone know of any examples where pods need to do something essential before they shut down, which is what we are trying to provide for here?

A

So one thing that I can think of: let's say you are running some application which opens or creates certain resources, and once it shuts down, you want to make sure that those resources are cleaned up in the way that you define. I can't think of a use case with a very negative effect right now, but I'm sure there is something that...

C

I just can't think of one right now either. I mean, this example alone makes it clear.

A

Yeah, and also, most importantly, when you want the pod to follow the life cycle that you expect it to, that is scheduled, running, then terminating, and so on. You want it to follow the normal life cycle rather than being abruptly terminated. The next thing I'm about to say I'm not sure of, and anyone here can correct me if I'm wrong, but I think there could be another implication.

If a node shuts down and all the pods there also shut down abruptly, and let's say you didn't want some pod there to shut down, then to recreate that pod and reschedule it you would need an additional check. If you follow the normal life cycle instead, you would get to know that something is shutting down, and then in the normal workflow of things you could...

C

So you're saying, basically, that another pod might not get rescheduled in place of this one, which was killed?

A

C

A

No, it will get rescheduled, but yeah, I don't have enough clarity on this...

E

I have just one open question, or one point to make. By adding graceful node shutdown, we are making sure that the pods have time to shut down properly, or something like that, right? We are trying to help at the pod level with this graceful node shutdown.

Okay. So there is something known as a Pod Disruption Budget, which also...

Did you hear anything at all? Because I was speaking and I don't know... okay, okay, sorry.

E

A

You cut off and right when you sent feelings.

E

Okay, so I was saying there is something known as a Pod Disruption Budget, which does the same kind of thing. One example of where a Pod Disruption Budget is used is when we are upgrading a cluster. In those cases we are upgrading the platform; for example, we have to upgrade the Kubernetes version itself on a production cluster.

So what we do is try to do it gracefully, node by node: we drain one node and put its pods on a different node. In that situation, having a graceful pod shutdown helps because we don't want to delete everything that's scheduled right away. We first need to check what is important and what needs to be scheduled on a different node before just deleting it.

So I had this question when I was reading this KEP for the very first time: how is this particular KEP different from, or related to, the Pod Disruption Budget? If anyone has any idea, I'd like to talk about it.

A

I don't really have too much of an idea, because this is... yeah.

C

A

Don't know too much about, I.

C

Think we can start a thread on the SIG Arch channel and post it there.

A

I have one question.

B

A

Can you maybe type it out in the chat, if possible? That might...

A

A

Are you typing it out?

A

Yeah, I am messaging in the chat.

A

Okay, I see. From an initial glance at the link you sent, what I understand is that this particular KEP isn't, or at least I think it isn't, related to PDBs.

B

Because is doing is.

A

If you specify a Pod Disruption Budget for a particular pod with, let's say, a replica count of five, and you say a PDB of three for that...

E

Yeah, there should be that number of pods present at any moment when we are... yeah.
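The arithmetic behind that example (five replicas, a budget of three) can be written down as a tiny Python sketch. It assumes the budget is expressed as a minimum number of available pods; the function names are hypothetical, and the real disruption controller tracks more state than this.

```python
def disruptions_allowed(current_healthy: int, min_available: int) -> int:
    """How many pods may be voluntarily evicted right now without breaking the budget."""
    return max(0, current_healthy - min_available)


def can_evict(current_healthy: int, min_available: int) -> bool:
    """True if one more voluntary eviction still leaves min_available pods healthy."""
    return disruptions_allowed(current_healthy, min_available) > 0
```

With five healthy replicas and a minimum of three, two pods can be evicted; once only three remain healthy, further voluntary evictions are refused until replacements come up.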

A

Something like that. What this one is saying is just within the sandbox of the node, or at least its boundaries.

If the node is set for shutdown, then through systemd the node shutdown will be delayed by so many seconds, and in those seconds the kubelet can attempt to gracefully shut down critical and non-critical pods based on the different policies that were mentioned. Once those are done, the node can go ahead and shut down, but it won't really ensure that the disruption budget is actually met.

E

Yeah, that was my question when I was reading it: when we implement this feature, would it take care of the Pod Disruption Budget if one is set, or would it just wait for that particular number of seconds that we are introducing and then shut down right away?

So my question was along those lines: if we have a Pod Disruption Budget set as well, is it going to respect that or not?

C

That would be in the implementation. Is there a PR for this?

A

This one: what is it.

C

Like, in discussion.

A

This was part of 1.22 in SIG Node, so I think there is a PR, but that's a good question. I've written it down in the doc, so you can probably attach it to the questions thread so that you can get a better answer. Thank...

A

A

You had a question related to preStop hooks. What is it?

A

I couldn't hear anything at that time.

A

E

And you can type it out properly; we'll just wait. Sorry about cutting you off that time. Please type, yeah.

A

So, sorry for going over time. If folks want to drop off, that's certainly all right, and I'll see you next session, but I'm here for some more time if anyone wants to talk about things related to this.

Okay, are you asking what preStop hooks are? Okay, what's the use of that? I am not fully sure what the use of that is.

A

A

Let me just look.

A

C

I just linked something to you in the chat, and that is the official documentation. I think that should be helpful.

A

Yeah, so from what I understand, right before a container is terminated, some logic defined through a preStop hook for that container is executed. So if you want some action to take place inside the container before it's deleted, or rather terminated, then you would define a preStop hook. A postStart hook is the similar counterpart for container start.

C

So basically like that, deleting part you were talking about in case of.

A

E

And I'm just reading the official documentation under preStop, and there is a very interesting line there. It says the Pod's termination grace period countdown begins before the preStop hook is executed, so regardless of the outcome of the handler, the container will eventually terminate within the Pod's termination grace period. So it means the preStop logic, whatever it is, doesn't really have to finish successfully. It will just wait for the outcome during that grace period, and whatever the result, it's going to terminate regardless, right?
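The point from the documentation, that the preStop hook spends time out of the same grace period budget, can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: all durations are in seconds, the function and key names are hypothetical, and it ignores details of the real kubelet such as any short extension granted when a hook overruns.

```python
def termination_timeline(grace_period: int, prestop_seconds: int, cleanup_seconds: int) -> dict:
    """Model one container's shutdown: preStop runs first, then the app handles the stop signal.

    The countdown starts before preStop, so both phases share one budget.
    """
    prestop_used = min(prestop_seconds, grace_period)
    remaining = grace_period - prestop_used
    cleanup_used = min(cleanup_seconds, remaining)
    finished_cleanly = prestop_seconds + cleanup_seconds <= grace_period
    return {
        "prestop_completed": prestop_seconds <= grace_period,
        "time_left_after_prestop": remaining,
        "stopped_at": prestop_used + cleanup_used if finished_cleanly else grace_period,
        "force_killed": not finished_cleanly,
    }
```

In this model, a 10-second hook under a 30-second grace period leaves 20 seconds for the application's own cleanup, while a 40-second hook uses up the whole budget and the container is killed at the 30-second mark regardless of the hook's outcome.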

E

So what we are trying to add with this feature is some more time on top of the pod's...

C

I think we are trying to make sure this preStop hook gets triggered, because if something immediate happens on the node, then the preStop hook won't get triggered, right? Because the node didn't know that it had to shut things down. At least that was my understanding of what this would...

A

A

So this graceful shutdown is to delay the shutdown of the node, as in the actual box or machine, by some amount, so that in that time frame you can actually terminate these pods and the containers in those pods. That's why you have the shutdown grace period minus something as the grace period of the pod. So that is the grace period that is set above in the kubelet config.
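The "shutdown grace period minus something" split can be sketched in Python as below. The two inputs correspond to the kubelet configuration fields `shutdownGracePeriod` and `shutdownGracePeriodCriticalPods` from the graceful node shutdown KEP. Everything else is an assumption of this sketch: the function names are hypothetical, pods are treated as critical purely by priority (2000000000 is the value of the built-in `system-cluster-critical` priority class, and `system-node-critical` is higher), and each pod is assumed to get the smaller of its own termination grace period and its share of the node's budget.

```python
SYSTEM_CRITICAL_PRIORITY = 2_000_000_000  # value of the system-cluster-critical class


def is_critical(priority: int) -> bool:
    """Assumption: a pod counts as critical if its priority is at least system-cluster-critical."""
    return priority >= SYSTEM_CRITICAL_PRIORITY


def shutdown_plan(shutdown_grace_period: int, critical_grace_period: int, pods: list) -> dict:
    """Return the grace period (seconds) each pod gets during node shutdown.

    pods is a list of (name, priority, termination_grace_period_seconds) tuples.
    """
    if critical_grace_period > shutdown_grace_period:
        raise ValueError("critical pod grace period cannot exceed the total")
    # Regular pods are terminated first, in the time not reserved for critical pods.
    regular_budget = shutdown_grace_period - critical_grace_period
    plan = {}
    for name, priority, own_grace in pods:
        budget = critical_grace_period if is_critical(priority) else regular_budget
        plan[name] = min(own_grace, budget)
    return plan
```

For example, with a total of 30 seconds and 10 seconds reserved for critical pods, a regular pod that asked for 60 seconds is cut down to 20, while a critical pod that asked for 30 gets 10.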

A

So these two are.

C

So, basically, provide a grace period to the node so that the pod can use its graceful shutdown. Is that what you mean? Okay, yeah, that makes sense.

E

So we are saying that even though there is a termination grace period for pods at the moment, because the node doesn't have one, pod termination doesn't happen properly, so we are trying to help the pod. It's...

C

Basically, I guess a very simplistic way to put it is: if the electricity goes out, then Windows doesn't know that it has to close the programs during shutdown. I guess that is how I understand it.

E

Yeah, that's the impression I got after reading: we are switching off the machine with the power button. In my mind, when the power is off, everything is really gone. So, like you asked, what's the use case we are trying to help here? It would be interesting to know what the example is.

A

If there is no power supply, then yeah, you don't really have a choice. But in case a shutdown signal is received, instead of immediately initiating a shutdown and not giving leeway for graceful termination, systemd facilitates that delay by some amount.

A

For further questions, feel free to either add them in the doc or just start a thread, or...

C

In the thread, yeah.

A

Yeah thanks for joining and hopefully.
