A
All right, hello everyone. Today is Thursday, August 5th, and this is the Cluster API Provider Azure office hours. As always, we abide by the CNCF code of conduct, so please raise your hand and be respectful to everyone on the call.
A
If you'd like to raise a topic, please add it to the agenda's open discussion section, and if you need access to the agenda, you can join the cluster-lifecycle mailing list. All right, so let's get started. Oh, and if you can, please add your name to the attendee list as well. I don't see any new faces here today, so I'm gonna skip the welcome and let's just go straight into the discussion, I guess.
A
The first one... oh yeah, so the first three are mine, but let's take them one at a time. The first thing I wanted to bring up is: I noticed there was a lot of churn in the Azure managed cluster area recently. Lots of really great improvements and PRs going in, a couple of bug fixes. And I was thinking, since that area is emerging to be quite distinct from the rest of the self-managed clusters code, and since most of the code has been written by a couple of people who are mostly not the same people writing most of the code for the self-managed clusters, that it might make sense to add a separate OWNERS file. Those files would be a subset of the overall code base, so any of the overall maintainers could still approve PRs, but it would allow the people in that OWNERS file to also review those PRs independently and have, you know, maybe more targeted reviews for those files. I don't know what people think, if there are any opinions or dissent on this.
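For reference, a minimal sketch of what such an area OWNERS file could look like, following the upstream Kubernetes OWNERS conventions the repo already uses; the path and handles below are placeholders, not actual assignments:

```yaml
# exp/managedclusters/OWNERS (hypothetical path)
# Root approvers still apply everywhere; this only adds area-specific
# reviewers and approvers for the managed clusters code.
reviewers:
  - managed-clusters-reviewer-1
  - managed-clusters-reviewer-2
approvers:
  - managed-clusters-approver-1
labels:
  - area/managed-clusters
```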
C
David, I was actually going to say the same thing. Yeah, Ace has been contributing a lot to the project, and I think it would be great to, you know, recognize that.
A
Yeah, plus one to that. I will talk to Ace, because I don't know if he has the bandwidth to take that on officially. I haven't talked to anyone yet that I was thinking of adding to that file, so that would be something to see, whether they're willing. But also, in general, I was thinking of adding a few reviewers, not necessarily approvers, for managed clusters: mostly the people who have been contributing a lot lately. So Lochan (I don't know if that's the first name, but LochanRn, I think, is his GitHub handle), and maybe Nicola, you know, if you keep contributing in that area, if you're interested in becoming a reviewer.
A
All right, next topic. Yeah, so we have an open issue for renaming the master branch to the main branch.
A
It's been open for a while, and we said we would wait for the alpha 4 release to be out before doing that, just because we didn't want to break all our test signal right before the release. I think now that we're past that, we're in a good place to start thinking about it again. Fortunately, the Kubernetes maintainers, or the Kubernetes contributors group, have made it pretty easy: they've outlined the exact steps that you need to take before and during the transition, so I think we just need to follow that. The two things I wanted to figure out here are, first of all: is there anyone who's interested in taking that on, being the owner, and maybe delegating some of the tasks to others?
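For reference, the mechanical part of such a rename is roughly the following; this is a hedged sketch assuming a remote named origin, and the authoritative checklist is the one the contributors group published:

```sh
# Rename the branch locally and publish it.
git checkout master
git branch -m master main
git push origin main

# Then switch the repo's default branch to main in the GitHub settings,
# and retarget open PRs, branch protection, and CI job configs (test-infra).

# Once everything is green, delete the old branch.
git push origin --delete master
```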
E
It seems like this week would have been a good time to do it, because there are fewer people paying attention, as far as I can tell, although there's still been a lot of activity. But sooner rather than later, obviously, because ripping the band-aid off is a good thing. Do other projects, or CAPI, have a timeline for when they're going to switch that we should coordinate with, or is it okay to just do this independently?
A
I think it's okay to do it independently, but we should definitely post on Slack about our intentions, to, you know, see if we should coordinate, in case anyone thinks we should coordinate with the other projects. I think GCP has already done it, so they're ahead. Okay.
C
Yeah, any thought to getting the Calico patch in there and then cutting a release? Is that something that we are interested in doing? The only reason I'm saying it is, it seems like that's pretty close, and if we get that in, we could cut a release and then move the branch. Would that be easier, or would moving the branch first be easier?
A
Yeah, so I don't think moving the branch would prevent us from cutting a release or anything like that. It's just that it might disturb the testing on PRs for a little bit. So if we wanted to get that merged, I think it would probably be better if we merge it first. As you said, it's pretty close, and I think from Nader's investigation last night it's just missing a configuration that I need to go and update.
A
Actually, it might be a little better if we do it Monday, so that we're all around to fix it if something goes wrong. But I don't know: what are people's thoughts on that?
E
Sorry, yeah, I was just... I'd love to help with this, because for whatever reason I really like this kind of Git stuff, but I'm gonna be gone until Wednesday. So I can't, if we're gonna do it right away.
A
Okay, well, so I guess... I don't think anyone else volunteered, so I wouldn't mind doing it, but I don't think there's a huge urgency to do it right now. We should get it done soon, though.
A
Actually, what I propose is that I can try to get all the groundwork laid out by next Wednesday, like get all those PRs in place, so that we're in a good place to actually cut over when you're back, and you and I can coordinate on that. How does that sound?
A
All right. David, I added an item because I saw you said something in a comment about how we should discuss this in office hours. So, let's discuss it.
C
Fantastic. So that item is using a service principal directly, without actually using NMI or anything else.
C
The reason why I wanted to bring it up, or I thought it would be good to discuss, is to kind of talk about the use case and understand whether or not we want to support it. So the use case is really:
C
I don't want to run AAD Pod Identity, or I can't run it, and so, for whatever reason, I need to use a service principal only; that's just the only way I can make this work.
C
Is that a reasonable scenario? Does it even make sense? Can we run AAD Pod Identity wherever we want? Is there a reason why somebody wouldn't want to run, you know, AAD Pod Identity, and if so, is this a viable solution? Anybody have any ideas?
D
So in our experience, we noticed that some use cases from some customers can cause the NMI app to crash a lot, or to get restarted a lot, so in general it can add, you know, a little bit of pain when it comes to operations. So I could assume that maybe customers like that would like to try not to use it, if they had the option not to use it.
B
I haven't really looked at the PR too much, but what are we losing by not using NMI? I remember when we started doing that, we wanted to use NMI because it's already tested and established, and we know it does all the things the right way. So what are we losing by just doing the thing ourselves?
C
So we aren't really losing anything; the token client for autorest will do the right things. What we lose on is code complexity: it becomes more complex. We have more permutations of identities; that's a loss.
C
It is not quite as simple, so there's just more documentation for it. On the other hand, we gain the ability to not have a dependency on AAD Pod Identity if we don't absolutely need to, and I guess that's pretty much it.
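As a rough illustration of what using a service principal directly means here, this is a minimal sketch built on the go-autorest ADAL token client mentioned above. The credentials are placeholders; in CAPZ they would come from the cluster's identity configuration, not hard-coded strings:

```go
package main

import (
	"fmt"

	"github.com/Azure/go-autorest/autorest"
	"github.com/Azure/go-autorest/autorest/adal"
	"github.com/Azure/go-autorest/autorest/azure"
)

func main() {
	// Placeholder credentials, for illustration only.
	tenantID, clientID, clientSecret := "<tenant-id>", "<client-id>", "<client-secret>"

	oauthConfig, err := adal.NewOAuthConfig(azure.PublicCloud.ActiveDirectoryEndpoint, tenantID)
	if err != nil {
		panic(err)
	}

	// Acquire tokens with the service principal itself: no NMI pod
	// intercepting IMDS calls, no aad-pod-identity deployment required.
	spToken, err := adal.NewServicePrincipalToken(*oauthConfig, clientID, clientSecret,
		azure.PublicCloud.ResourceManagerEndpoint)
	if err != nil {
		panic(err)
	}

	// Any autorest-based ARM client can then authenticate with this authorizer.
	authorizer := autorest.NewBearerAuthorizer(spToken)
	fmt.Printf("authorizer ready: %T\n", authorizer)
}
```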
A
So that seems like the negative, like what we lose by adding that option, right? But what about... let's say we decide this is a valid use case for some users, so we should have that as an option, and we have both side by side. What are the reasons that we should encourage someone to pick, you know, using pod identity versus just using this? What do they gain out of it, if it's the same thing?
C
The way I see it is, we've had folks complain, or not complain, but open issues for stuff like: I'm already running AAD Pod Identity of this version, and I don't want to run the version that you're using, or I can't run the version that we are putting out in the infrastructure YAML.
C
So they're going to use their version of it, and perhaps at some point, you know, we end up having an incompatibility issue there, or maybe it's not right, what they have it configured to watch, only certain namespaces or something. This can start to be a little bit painful. So I think this lends flexibility: to be able to not have to configure AAD Pod Identity, not have to use it, and have something a little bit more self-contained.
B
Yeah, I guess I was trying to ask as well: maybe the future version is one where we don't want to have pod identity anymore, and we can have both at the beginning, and then, if everything is all right with doing it this way, we can remove the dependency. Would that be our goal?
C
I think pod identity is the right way forward. As Cecile was alluding to, and I think she already knew the answer, pod identity is probably a more secure solution than writing service principals to a file. Since, you know, with pod identity (I will say, if you're using user-assigned managed identities), at that point pod identity enables you to have a rotating secret that's deployed by the Azure fabric, so that you don't have to worry about rotating a service principal.
C
And then, you know, redeploying your machines, because you need to make sure that service principal is in the azure.json file on each individual machine. The secret gets rotated for you.
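To make the contrast concrete: with a service principal, the credential sits in azure.json on every node, so rotating it means updating that file and rolling machines, whereas a managed identity keeps no secret on disk. Field names below are from the Azure cloud provider config; the values are placeholders:

```json
{
  "cloud": "AzurePublicCloud",
  "tenantId": "<tenant-id>",
  "aadClientId": "<client-id>",
  "aadClientSecret": "<client-secret>"
}
```

With a user-assigned managed identity, the last two fields are replaced by `"useManagedIdentityExtension": true` and `"userAssignedIdentityID": "<identity-client-id>"`, and Azure rotates the underlying credential for you.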
C
It is a better solution for long-term use and for, you know, lowering the amount of operational day-to-day kind of stuff that you would have to do, but it does mean, you know, you have to take a dependency on this other piece of infrastructure.
B
The one tricky part that I remember, because I worked on adding pod identity, is that we couldn't figure out a good way of making installing it optional. To serve the value of having this feature, you wouldn't install pod identity, and I don't know how you would do that.
C
Is that indeed the consensus?
G
So I think the main complexity is making it optional. That's where, like, I think we got stuck: we didn't find a good way of making it optional. The only thing I could come up with was having a replica field and making it zero if they want to disable it, but the CRDs still get installed, the cluster bindings, the CRDs. So that's where we were kind of stuck.
D
Yeah, I have one more example actually, specifically about AAD Pod Identity. If we have an option not to use it, it increases complexity on the CAPZ side, but on the other hand, for somebody that wants to use and deploy CAPZ, the complexity is lowered. It's less complex: there is one less thing to deploy.
D
We had an example a few months ago. Soon we will be moving to using CAPZ, and CAPI in general, fully in production, and we have a different way of deploying apps and services to our clusters. That's how we deploy things currently, and it will be like that in the future as well.
D
We are not using the exact same way to deploy CAPZ that, you know, you can read about in the CAPZ repo; we have a different way. We package our manifests differently, and we use the same code, the same project, not modified, but we deploy it differently. So having AAD Pod Identity in there was an additional task for us. We had it before, but it was an optional app that our customers could use, and now it's not optional anymore.
A
But in terms of installing it, we could publish a separate infrastructure components manifest that is without AAD Pod Identity, with it completely removed, and then have clusterctl still use the one with AAD Pod Identity by default, and have that be the default, since that's our 90% use case.
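A rough sketch of how a user could opt into such an alternate manifest through clusterctl's provider configuration; the URL and file name are hypothetical, only the clusterctl.yaml override mechanism itself is real:

```yaml
# ~/.cluster-api/clusterctl.yaml
providers:
  - name: azure
    type: InfrastructureProvider
    # Hypothetical manifest published without aad-pod-identity baked in.
    url: https://example.com/capz/releases/v0.x/infrastructure-components-no-pod-identity.yaml
```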
B
I think having the separate infrastructure YAML is probably the best idea, so you can choose whichever one you want. But we just have to be careful about which things we test and require in our tests, just to make sure it's covered and things don't get broken without us noticing.
C
Agreed, plus one to the PR, and thank you so much for working on it.
A
Well, all right, let's move on, if no one else has anything on the topic. All right, so I just wanted to talk a little bit about a proposal that I've been working on. It's in the PR queue.
A
If you haven't looked at it, please take a look; I'm looking for feedback. I've been working on the code while this is being reviewed, so I'll have a POC to show pretty soon, but it's still in progress. Basically, for the idea of this PR, I'll just give a little summary, and then I can answer any questions if there are any. Let me see... how do I... where's the file.
A
Okay, so the TL;DR is (we've talked about this before in office hours): right now, when we create Azure resources, we block on the completion of the operation. So when we do a create or a delete, that is, a PUT or a DELETE, we'll send that PUT to Azure and then we'll poll Azure until that operation is complete and we can return.
A
So what this means is, essentially, if a virtual machine creation takes, let's say, two minutes on a bad day, then for those two minutes our controller loop is stuck waiting for that creation to complete, and it's not doing anything else. That also means it's not returning any information to the user: from the user's perspective, things are hanging, and they're not seeing any updates.
A
They don't even know that the create has started, and it can be quite slow. It also means that if you have, let's say, a thousand machines that you're trying to deploy, and you only have concurrency set to 10, which is the default in our controllers, you'll process 10 virtual machines at a time. That means you'll wait for those 10 to be created before moving on to create the next 10, and so on. That could take quite a while if you're in a dynamic environment like cluster autoscaler.
A
So what I'm proposing is that we follow the pattern that was set as a precedent in Azure machine pools by David, which essentially stores the state of that long-running operation in the status of the object, and uses that to check on the operation the next time around. So instead of waiting for an operation to complete... let me show you the diagram.
A
So let's look at delete for a second, because it's a little easier. Basically, whenever you start a resource deletion, the first thing we'll do is check: was there a previously running operation?
A
Do I know about a previously running operation? If there is no long-running operation in progress, that means you're doing this from scratch, so we'll attempt to delete the resource, which means we'll send a delete call to Azure. And one little difference from what's been done in Azure machine pool (this is taken from Cheyenne's earlier proposal) is that here I'm waiting for the thing to complete for X seconds.
A
Right now that's set to, I think, five, but I'm playing with that number. It basically just says: if it is a relatively short operation, done in a few seconds, then we can wait for it, and we don't need to requeue later.
A
If it doesn't complete before that timeout, then that means we need to store its state for later. If it does complete, then we don't need to store the state anymore; we make sure it's empty, nil. And then, no matter what happens, we always end by updating the status to set the conditions of the object, and then going back.
A
If you do have a long-running operation in progress, then, instead of trying to delete, you try to see where that operation is at. So you poll the status from Azure, and if it's done, then you can update the status and say things have changed: my resource is now deleted, so I'm in that good state. If it's not done yet, you just requeue, and I think right now we're requeueing in, say, 15 seconds.
A
That
way,
if
we
don't
immediately
review,
because
that
wouldn't
really
be
like
that,
doesn't
really
give
us
much
advantage
because
then,
like
it's,
probably
not
gonna,
be
done
in.
You
know
a
few
milliseconds
if
it
wasn't
done
now,
but
in
15
seconds,
there's
a
good
chance
that
it
might
so
those
numbers
I'm
still
trying
to
play
with
to
like
find,
what's
optimal
and
trying
to
like
do
different
performance
testing
to
see
which
ones
will
work
best
but
yeah,
a
slight
difference
with
create
or
reconciling
and
by
the
way.
Those
are
like.
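A minimal, self-contained sketch of the delete flow just described. The Poller and Future types are illustrative stand-ins, not the real CAPZ interfaces; the actual proposal serializes the future into the object's status:

```go
package reconcile

import (
	"context"
	"errors"
	"time"
)

// Future is a placeholder for the stored handle of an Azure long-running operation.
type Future struct{ OperationID string }

// Poller abstracts the two Azure calls this sketch needs.
type Poller interface {
	BeginDelete(ctx context.Context, name string) (*Future, error)
	IsDone(ctx context.Context, f *Future) (bool, error)
}

// ErrInProgress tells the controller to requeue (e.g. after ~15s) instead of blocking.
var ErrInProgress = errors.New("operation in progress, requeue")

// DeleteAsync reuses a stored future if one exists; otherwise it starts the
// delete and waits only a few seconds inline before handing the future back
// to be stored in status. A nil returned future means the operation is done.
func DeleteAsync(ctx context.Context, p Poller, name string, stored *Future) (*Future, error) {
	future := stored
	if future == nil {
		f, err := p.BeginDelete(ctx, name)
		if err != nil {
			return nil, err
		}
		future = f
		// Give short operations ~5s to finish inline so we can skip the requeue.
		deadline := time.Now().Add(5 * time.Second)
		for time.Now().Before(deadline) {
			done, err := p.IsDone(ctx, future)
			if err != nil {
				return future, err
			}
			if done {
				return nil, nil // finished inline: caller clears stored state
			}
			time.Sleep(time.Second)
		}
		return future, ErrInProgress // store the future, requeue later
	}
	// A previous reconcile started this operation: poll it exactly once.
	done, err := p.IsDone(ctx, future)
	if err != nil {
		return future, err
	}
	if done {
		return nil, nil
	}
	return future, ErrInProgress
}
```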
A
And by the way, this applies to all of the Azure services, if you're familiar with the code base; so, for example, this would be a virtual network delete. For reconcile, the slight difference is that before creating, we do a get. We're not doing that consistently in every service right now, but I'm hoping we can change that to be a little more consistent. And the reason for that is that if you look at the Azure API (actually, I have it open here, the API resource limits):
A
You have more reads than you have writes, right? That's a very common pattern, and reads are generally cheaper, so you want to do a read if it helps you avoid doing a write, if possible. So every time we do a get, we ask: is the resource already there, and does it have everything it needs? If it does, then we skip the write; we only do the write when it's absolutely necessary, so we use those quotas wisely. Same for the delete.
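In code, the read-before-write idea amounts to something like this sketch; the Client interface is a hypothetical stand-in for an Azure service client:

```go
package reconcile

import "context"

// Client is an illustrative stand-in for an Azure service client.
type Client interface {
	// Get reports whether the resource exists and already matches the spec.
	Get(ctx context.Context, name string) (found, upToDate bool, err error)
	Put(ctx context.Context, name string) error
}

// EnsureResource spends a cheap read to avoid a scarce write: if the GET
// shows the resource exists with the desired settings, the PUT is skipped.
func EnsureResource(ctx context.Context, c Client, name string) error {
	found, upToDate, err := c.Get(ctx, name)
	if err != nil {
		return err
	}
	if found && upToDate {
		return nil // nothing to do, no write quota consumed
	}
	return c.Put(ctx, name)
}
```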
A
Oh, and this is a full reconcile of an entire AzureCluster reconcile loop. It's not very zoomed in, but note that every time there's a context deadline exceeded in one of the services, we short-circuit out of the loop, and this is once per service as well.
A
Usually we reconcile multiple public IPs per loop: you have a public IP for one load balancer, and then you have another one for another load balancer. We'll try to do all of them before we short-circuit out, because the assumption is (and I explained this in the doc somewhere) that a public IP will never depend on another public IP, so it's safe to do those in parallel, or concurrently.
A
So even if one of them is not done, we can start doing the other one. We'll kick off all the public IP creates, or all the public IP deletes, and then, if any of them is not done, that means we can't proceed to the NAT gateways, so we short-circuit and update the status. And then maybe the last thing I want to mention is the proposal on adding a bunch of new conditions.
A
Yeah, so these are all the conditions that I'm proposing we add. It basically gives a way more granular status update on which resources exactly have been created already and which ones haven't.
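A hedged sketch of the shape of such conditions, in the style of Cluster API condition types; the names below are illustrative, not the proposal's final list:

```go
package conditions

// ConditionType stands in for Cluster API's clusterv1.ConditionType.
type ConditionType string

// One condition per Azure service, so the object's status shows exactly
// which resources have been created and which are still pending.
const (
	VNetReadyCondition          ConditionType = "VNetReady"
	SubnetsReadyCondition       ConditionType = "SubnetsReady"
	PublicIPsReadyCondition     ConditionType = "PublicIPsReady"
	NATGatewaysReadyCondition   ConditionType = "NATGatewaysReady"
	LoadBalancersReadyCondition ConditionType = "LoadBalancersReady"
)
```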
A
So yeah, I just wanted to show that and give a little chance for questions, like live questions, if anyone has read through it and has questions or comments. But if you haven't read through it, I would love your feedback.
C
The ordering of resources: this is something that we talked about, and I just wanted to bring it up because there were two async reconciliation proposals, and there are two types of asynchrony, or concurrency, that we are looking at. This one is still serial in the way that we are approaching resources, which is not entirely the order in which they could be done.
C
So there is still another opportunity for optimization after this, where we could build a DAG of resources and, you know, do a few in parallel across multiple services, and then another set, and then another set. This is really applicable, I think, mostly or only for cluster, but that was basically the heart of the other concurrency proposal.
C
So I just wanted to bring that up. And great work; the proposal looks really great, and I hope folks will read through it, because it's actually a really cool look at how to, you know, use Azure and use it well.
A
Thanks, yeah. I actually documented this as an alternative, the parallel reconciliation, and noted that it could be done in the future; it's not mutually exclusive. Also note that in the non-goals I wrote: increase or decrease the overall duration of a reconciliation. That's very important. It means this proposal is not trying to reduce the time it takes; it's just trying to do it in a way that gives us a faster reaction time.
A
We improve the UX, and also there's the case of, in the end game, having many, many machines. I don't think you could have many clusters... well, you could, technically, but yeah. You could create those concurrently and kick off the creates before they all finish, and that's where you would gain time, like if you're trying to create 200 VMs. On one VM you're not going to gain any.
B
I just want to say quickly: if anybody has any PRs that they can close, or finish off before we start renaming, that's probably helpful, because everything will get re-triggered on all existing open PRs. So if you can, finish it off, or close it if it's not active; that would be helpful.
A
Yeah, that's a great point. And also, if you have a PR that's ready and just waiting for people to review it: I know we've gotten a bit behind on some of the PRs lately, because we have fewer people, and people go on vacation and everything, but please ping us and we'll try to take a look and unblock you.
C
Sorry, I was trying to fill in the doc at the same time. Matt and I, and also Dan and Jordan, have been looking at machine pool machines, and the proposal is open in CAPI. This will start to move... the idea there is to move machine pool machines, kind of like what we did with Azure machine pool machines, up into CAPI, so as to be able to provide machine representations for machine pools.
C
Indeed. So there's a lot of common functionality for machines: health checking...
C
So it's really exposure at the generic level and, you know, reuse of functionality that already exists.
G
So this is the question, very specific to cluster autoscaler: I was thinking that cluster autoscaler, like, scales the resources based on the replica count?
C
That's a really fantastic question. Cluster autoscaler has a lot of really cool logic to be able to look at which nodes are least used and what scheduling would be impacted by deleting individual nodes. For us, when we lower the replica count, what happens is we just go and tell the virtual machine scale set: hey, we're lowering the replica count. Actually, that's not true; let me rewind. If we were to tell VMSS: hey, we want fewer replicas...
C
VMSS has no idea what is scheduled on those machines, and it would just shoot the machine, and all of a sudden that workload has been preempted without any kind of safe, you know, cordon and drain. So we actually delete individual machines, but first we go through cordon and drain, where we, you know, drain off the workload, move it out to other machines, and then we delete it.
C
So, you know, you don't want to be running a web app and all of a sudden your user gets disconnected, and hopefully the load balancer helps you out at some point; you want to do that proactively, you want to drain the workload correctly. So we can't actually rely on VMSS just to decrement the replica count. From the machine pool level, we can decrement the replica count and then handle it appropriately in the CAPZ layer.
A
So I actually thought the same thing: I thought that we only changed the replica count. And I actually started on a prototype a few months ago to get machine pools working with cluster autoscaler, and I got it almost there; it's pretty much all there, except for this one function.
A
That function is basically not possible for machine pools. What it does is delete a specific node, and so you need to be able to... the way we do it for machines is by annotating the Cluster API machine to say that it should be deleted, and for machine pools we can't do that. And I think there's another reason for that: cluster autoscaler doesn't only deal with scaling the replica count up and down. Historically, it also has functionality to remediate unhealthy nodes, which arguably is a little outside of its scope, but that's what it does. And so the provider implementations of cluster autoscaler have to implement that delete node function, which will delete an unhealthy node. So that's also why we need that.
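For reference, the machine-level mechanism being described looks roughly like this, assuming Cluster API's delete-machine annotation; the machine name is a placeholder:

```sh
# Mark a specific CAPI Machine so the controller prioritizes deleting it
# when the owning MachineSet/MachineDeployment scales down.
kubectl annotate machine my-machine-abc123 cluster.x-k8s.io/delete-machine="true"
```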
A
Cool. I actually have a question for you on this proposal, David: what happens if a provider wants to use the provider capacity directly at some point? Let's say I'm a provider who's implementing my infra machine pool, and my infrastructure allows, I don't know, autoscaling or something with draining. Let's say one of the providers integrates Kubernetes functionality into their implementation of machine pool, and they want to be able to autoscale without using the CAPI draining code.
C
So that's a really awesome question, and I would love that. Can you add that note to the proposal? We need to think through that a little bit better right now.
A
Yeah, I think that's something the CAPA provider was doing, especially because they didn't have autoscaler working at the time: kind of ignoring the replica count and just letting that be handled by... and then using the cluster autoscaler provider for AWS, not the CAPI one.
C
Yeah, I think it's a bad practice to ignore the... oh, I'm sorry, yeah, Nicola.
D
Go ahead, sorry. That's exactly what we have to do in our implementation of the Azure operator. For now we're using only the CRDs from CAPI and CAPZ, and since we are running cluster autoscaler in regular Azure mode, that is, the Azure provider for cluster autoscaler, in quite a lot of places we had to make sure that we are ignoring the machine pool replica count, so cluster autoscaler can do its job regularly. And just to add...
D
As for the situation that Cecile described, about the cloud provider, for example, or VMSS having some implementation where they take care of draining the workload:
D
It's not quite that, but there can be a similar situation, where people are using some of those apps that watch, for instance, the metadata service. We use that, for example, in our Azure implementation: we watch for machine termination events and we drain the nodes automatically. So it's not something that Azure provides, but Azure supports the mechanism, so you can actually implement that on your own, and I guess that would probably be in conflict if cluster autoscaler tried to do the same thing. And I think that AWS, for example, even has... I don't know if it's official, but in the AWS GitHub organization there is an app to do that, and it's quite well maintained, I think.
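For context, a node-local watcher like the one described polls the Azure Instance Metadata Service scheduled events endpoint. A minimal sketch (the drain step is left as a comment, and this only works from inside an Azure VM):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// scheduledEvents mirrors a subset of the IMDS scheduled events payload.
type scheduledEvents struct {
	Events []struct {
		EventType string   `json:"EventType"` // e.g. "Preempt", "Terminate"
		Resources []string `json:"Resources"`
		NotBefore string   `json:"NotBefore"`
	} `json:"Events"`
}

func main() {
	req, err := http.NewRequest(http.MethodGet,
		"http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Metadata", "true") // IMDS requires this header

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var events scheduledEvents
	if err := json.NewDecoder(resp.Body).Decode(&events); err != nil {
		panic(err)
	}
	for _, e := range events.Events {
		// A real handler would cordon and drain the node here, then POST an
		// acknowledgement so Azure proceeds without waiting out the timeout.
		fmt.Printf("%s scheduled for %v, not before %s\n", e.EventType, e.Resources, e.NotBefore)
	}
}
```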
C
Yeah, there's a call-out in this proposal about spot instance utilization. We haven't filled it out completely yet, but that's one of the scenarios, right?
C
You want to catch those events, and you want to have the node notify Kubernetes: hey, I'm going away, cordoning and draining myself, then take me out. That is definitely something that needs to come into play there. And the way that would feed upward is that we're looking at having infrastructure references exposed from the infrastructure provider machine pool implementation, so that as one of the machines goes away, its infrastructure reference also gets cleaned up in CAPI.
C
I think it needs to be paired with what you're saying, Nicola: with something on the machine that's going to catch those instance metadata notifications. That's something we need to describe in detail here.
D
Yeah, for Azure it's a little bit tricky for spot VMs, because you get only 30 seconds to do whatever you would like to do.
A
Yeah, so I'm curious if you've looked at this proposal. It's actually been under review for a while, needing more reviews, but that was exactly the discussion we were having on one of the comments. So basically, this is proposing that we use the termination handler for, like, interrupting spot instance workloads, and the big difference with Azure is that we only get 30 seconds, so I'm not sure that the proposal as is would work well.
C
Yeah. For autoscaling, Nicola, if you're interested, I would love to see an issue or a proposal around how to enhance machine pool to better handle autoscaler functionality, because, like you're saying, the replica count goes right out the window.
C
Really, it's more like, you know, min and max, and maybe some set of configuration. In some way, we have to be able to mark the replica count as: hey, this doesn't even matter anymore, this is useless information. And it would be great to get that solidified before we move it out of experimental.
D
Okay, cool, I'll try to write up what I have. I can tell you what we did: as a sort of quick fix, we added annotations for cluster autoscaler, and the logic is simple. If the annotations are set, we use those and we ignore the replica count; otherwise, we use the replica count. It would definitely be nice to have some more robust solution in CAPI. I'm not sure what that could be, but yeah, I will give it a thought.
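For comparison, the cluster-autoscaler Cluster API provider expresses the same idea with min/max node-group annotations; a hedged sketch, where the resource version and values are placeholders:

```yaml
apiVersion: exp.cluster.x-k8s.io/v1alpha3
kind: MachinePool
metadata:
  name: my-machine-pool
  annotations:
    # When these are set, the autoscaler manages the size within the bounds,
    # and the declared replica count effectively stops mattering.
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"
spec:
  replicas: 1
```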
A
All right, I think we're at time. Thanks, everyone, and I'll talk to you... one sec.