Description
Meeting Notes: https://docs.google.com/document/d/16CEsBSSGm3sMpvB_cFnKnqqi1OxhIcyX3lVwBpIyMHc/edit#heading=h.ouxlycri8nmz
Discussed status of self-hosting dependencies: DaemonSet surge updates and kubelet checkpointing.
A: Okay, hello and welcome to the October 11th, 2017 edition of the self-hosting / HA breakout meeting for SIG Cluster Lifecycle. Let's see, we had started talking briefly before the recording about surge updates. I see that Aaron is now here, so maybe we can continue that discussion, Aaron. I don't know if Diego's going to show up also.
A: So I was just mentioning to Tim that Brian gave a comment on the issue last night saying that he didn't think we should add it, effectively, and Tim and I were talking about the fact that we had chosen to use DaemonSets for the master components, but maybe we should reassess that decision in light of Brian's comment. So I guess there are sort of two paths forward. One is to go talk to Brian and Kenneth and try to get SIG Apps...
C: Well, the original motivation behind using DaemonSets was to work around the scheduling snafu that existed with not having the taints be schedulable when the network wasn't there yet, right? And I believe that's all fixed now; it all exists, or is in place. So I don't know if that condition, which was the primary driver behind choosing DaemonSets... I don't know if that's still a constraint that we actually have anymore.
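For context, the scheduling behavior referred to here is, as I understand it, that DaemonSet pods automatically tolerate the not-ready and network-unavailable taints, so they can land on a node before its network is up. A minimal sketch of such tolerations (taint keys per upstream conventions, not something stated in the meeting):

```yaml
# Tolerations of the kind the DaemonSet controller adds automatically,
# letting a control-plane pod schedule onto a node whose network is
# not ready yet.
tolerations:
- key: node.kubernetes.io/network-unavailable
  operator: Exists
  effect: NoSchedule
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
```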
B: The phased approach kind of complicated that a little bit, but the stuff that we ran into with using Deployments is essentially this: you want these components to be spread across nodes in an HA situation, but if you're bootstrapping onto a single node, or it happens that they all end up on one server (which is actually pretty common), you want to start using anti-affinity rules on your Deployment. So you're saying: okay...
B: And so essentially, our anti-affinity allows pods of the same object but different versions to coexist on the same node, but it won't allow pods of the same version of the same object on a node. So during an upgrade, we create a new Deployment with a label that changes, so that even though it's technically the same component (the scheduler, let's say), it'll be able to be co-located during the upgrade process.
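The rule being described might look roughly like the sketch below; the `k8s-app` and `version` labels are illustrative, not taken from the actual manifests. Same-version pods of the component repel each other per node, while the upgrade's new Deployment carries a different `version` label and so can co-locate:

```yaml
# Illustrative sketch: same-version scheduler pods are spread across
# nodes, but a new Deployment whose pods carry version: v2 does not
# match this selector and can share a node during the upgrade.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - topologyKey: kubernetes.io/hostname
      labelSelector:
        matchLabels:
          k8s-app: kube-scheduler
          version: v1
```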
C: To take a step back, this is the conversation about using Deployments, but with DaemonSets the only problem that I'm aware of is if you have a DaemonSet of one. If you have a DaemonSet of more than one, this problem doesn't exist at all, because it doesn't matter if you kill the one that's there and start a new one. It only matters for a DaemonSet of one on the upgrade, so it's a special condition, right?
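A sketch of why the DaemonSet-of-one case is special: with the rolling update strategy, the controller deletes the old pod on a node before starting its replacement, so on a single node the component is briefly gone. Field names follow the DaemonSet API; the example itself is illustrative:

```yaml
# With RollingUpdate, the old pod on each node is deleted before its
# replacement starts; on a one-node cluster that means downtime for
# the component -- the special condition discussed above.
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    maxUnavailable: 1
```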
C: I think we're doing a lot of rigmarole and workaround for what is essentially going to be a single one-off case, which we already actually have code to handle. So I'd want to talk with Lucas to verify that, you know, the one-off code that we've written for upgrading a single node.
C: We might want to maintain that for a longer period of time, but the DaemonSet solution works fine for most HA scenarios where you go to zero and come back, because you'll have multiple controller managers and they'll just bounce, and you'll have multiple schedulers and they'll also bounce, and you'll have multiple API servers and they're just load-balanced. So I don't see a problem there other than the one-node scenario, yeah.
C: Yes, I'd have to think through the scenarios here where you might want to do the conversion scenario, right? Where, if it's a single node and you're doing an upgrade, it's an on-the-machine type of process, right? So kubeadm could do all the work for you, doing A, then B, then C, in that order, for the upgrade process. But this problem goes away again once you actually have high availability.
B: The other thing is that we're speaking about this in terms of upgrades, but ideally, it being self-hosted, if you want to change a flag on your scheduler or controller manager, you're going to edit it, and that's the same process as an update. And so if we're saying that this logic of being able to safely update lives in a tool like that, then that tool becomes the only mechanism for making those kinds of changes to that component.
B: I think there are enough options here that I want to continue to say yes: we do it today and it works. But that's where DaemonSets would have been nice in this case. Another thing that we've internally floated is just writing a babysitter controller that sits there, and all it knows how to do is extract the pod spec out of a Deployment object (specifically the controller manager or scheduler) if they don't show up for too long, and then just inject it into the cluster.
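The babysitter idea could be sketched roughly as below, working on plain manifest dicts instead of a live API; the component name, labels, and timeout are all hypothetical, not from the meeting:

```python
import copy
import time

# Hypothetical threshold: how long a component may be missing before
# the babysitter intervenes.
MISSING_TOO_LONG_SECONDS = 120

def extract_static_pod(deployment: dict) -> dict:
    """Turn a Deployment's pod template into a standalone Pod manifest."""
    template = deployment["spec"]["template"]
    pod = {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": copy.deepcopy(template.get("metadata", {})),
        "spec": copy.deepcopy(template["spec"]),
    }
    name = deployment["metadata"]["name"]
    pod["metadata"]["name"] = f"{name}-babysitter"
    return pod

def needs_rescue(last_seen_ts: float, now: float = None) -> bool:
    """True when the component has not been seen for too long."""
    now = time.time() if now is None else now
    return now - last_seen_ts > MISSING_TOO_LONG_SECONDS

# Example: a stripped-down scheduler Deployment manifest.
deployment = {
    "metadata": {"name": "kube-scheduler"},
    "spec": {
        "template": {
            "metadata": {"labels": {"k8s-app": "kube-scheduler"}},
            "spec": {"containers": [{"name": "scheduler",
                                     "image": "example/scheduler:v1"}]},
        }
    },
}

if needs_rescue(last_seen_ts=0, now=1000):  # missing for 1000s: rescue it
    static_pod = extract_static_pod(deployment)
    print(static_pod["metadata"]["name"])   # kube-scheduler-babysitter
```

A real version would watch the API server and write the extracted manifest into the kubelet's static-pod directory, but the core operation is just this template extraction.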
A: I guess the update strategy seems a lot cleaner to me, because you don't have fragmentation between single-master and multi-master scenarios; you're doing exactly the same thing in both cases. It would also make it a lot easier to switch between the two, right? If you start out with three masters and you decide you really only need one, and you go down to one, then unless you add extra things to your cluster to make it sort of safe to run a single master, you are now in a very precarious situation, so...
C: Our top priority in the workloads area today is [inaudible]. This seems complicated; it's arguable whether we should ever do it, but we should not do it in 1.9. I am flummoxed, though, by the idea that we have these notions of being able to have feature-gated items to enable these types of things, right? And if it's completely behind a feature gate, I do not understand why this would be a problem.
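The kind of feature gating being referred to is, if I recall the kubeadm flags of that era correctly, along these lines (the gate name is an assumption, not from the meeting):

```shell
# Illustrative: enabling the alpha self-hosting behavior behind a
# feature gate at init time (gate name assumed, per ~1.8/1.9 kubeadm).
kubeadm init --feature-gates=SelfHosting=true
```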
A: I talked with Don yesterday briefly and mentioned, as here, that it's considered P0 from our side, and that I'm talking with you tomorrow. My plan today, once I finish up some work, is to start my rebase and change all the names to bootstrap/checkpoint, because I don't want to have any more arguments; the arguments originally started from the generic "checkpoint" name, and I think, for the sake of expedience and just to get things done, I'm just going to call it bootstrap checkpoint, unless there's any ambiguity other people care about.