Kubernetes SIG Cluster Lifecycle, 24 Jan 2018

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: Kubernetes SIG Cluster Lifecycle 20180124 - Cluster API

Description

Meeting Notes: https://docs.google.com/document/d/16ils69KImmE94RlmzjWDrkmFZysgB2J4lGnYMRN89WM/edit#heading=h.io2hjkir5u89

Highlights:
- Namespacing for api objects
- Summary of the node-controller-manager design review
- Discussion about managed machine states
- Status of PRs for Machines & MachineSets
- Validation custom provider configs

A

Hello and welcome to the cluster API breakout meeting for Wednesday January 24th 2018. We have a number of follow-up items from last week, so I thought we should starts meeting with those. If you have other meeting topics you'd like to discuss, please add them to the bottom of the agenda.

A

The first follow-up I want to talk about was there's an issue. We open number four six three talks about how we should namespace API objects and a particular. This relates to how to whether we want to support putting nodes in a cluster that represent nodes in a different cluster and we poked at this. Last week a number of people went in with input on the issue which was awesome and so Chris. Do you and I sort of summarize, where we're at and try to close this one up.

B

Yeah sorry, if it's kind of hard to hear me, but um the consensus was we do want to do namespaces from cluster AP resources. The main debate was over whether we put a field or annotation on the cluster object to let it know whether it's being locally managed or remotely. There was no good consensus on that, and so we I I'm leaning in the direction of not doing that. With the caveat that if we find that we need it, we can add it easily. Later.

B

um Annotations can be added for individual, like deployments that actually care about whether it's open, managed or not, and if we see that's used widely enough, we should probably figure out a better way to support it or at least standardize on the annotation. That's the high-level summary.

A

So for anybody who hadn't looked that issue? Does that sound reasonable? Is that gonna block any use cases that you know about.

C

A couple of us had talked about the field: I, don't think it would block us in any way. It seems fine that we could add it later if we need to.

D

So I understand this will actually can we add a link to the issue. So maybe I can just read a line. Yes,.

A

There's a link in the meeting notes: I, don't know if you have of the MIDI notes, handy and I'll stick a link into or if somebody beats me to it into zoom as well. Okay, thank you. I.

D

Asked because one of the things that I'm contemplating is I want to have a cluster that manages other clusters, but I want some isolation, and so it's unclear to me how many remote cluster objects I want to store in my management, cluster.

A

What kind of isolation are you talking about there so.

D

Any kind of configuration information secrets- the idea would be that if the management cluster was compromised, how many customer customers would be affected.

A

Yeah so then you're getting in towards some of the multi-tenancy discussions around kubernetes and how sort of strict of separation can you enforce and I? Think if part of that depends on what level of compromise is in the management cluster right? If, if somebody can take over, you know a single pod in the management cluster and you can restrict the access of that pod to the rest of the system. You probably have a better story there, especially if you're running pods on different nodes, where you have sort of the VM or hypervisor isolation.

A

If somebody gets root credentials on your management cluster, you know no matter how you do the isolation. You're. Probably out of luck right.

B

Yeah I just want to mention that nothing in the cluster API right now insists that you have credentials some people do put like project names and stuff like that in writing, but at least in our on our demo for a Q Khan, we had a service account that was in the pod definition itself for the controller. So it's how you would do isolation with any kubernetes application. At that point,.

A

Okay, so cuz I think it sounds like we have consensus here that we do want to support namespacing of resources. We are not gonna, add any special fields or necessarily enforce any standard conventions on what resource in a namespace means, and if those sort of conventions come out. Naturally, as people start using the system, then we can start enforcing those at some point in the future.

A

This I believe this implies we're gonna, add some new bits to some of our API objects like in particular. What namespace are in is that correct, or do they already support that via the meta imports.

B

Dream is just part of the object meta such as we have to use it. It changes how we register the types with the API server, but I think that's easily changed.

E

There would be automatically generated generating the clients if you want to make all of those chips.

E

B

It has some slight implication on quiet generation, but that shouldn't be that difficult, either yeah any code that uses the current client will have to change to a date. That namespace call at some points, maybe depending on which client calls the users.

A

Okay, so I were doing two action items in the notes. It sounds like we need to upgrade our up their registration logic to be namespace aware, and we need to make sure that we get the client generation updated as well to be named so far, do you have any other applications that we should write down, or is that sound like it to you.

B

Though that sounds like major ones and uh I would ask that a fan. Look into that since he's doing Nia the API aggregation stuff right now and we might as well just do it. There.

F

Is the expectation then you're on multiple controllers, one for namespace or is somehow configure to deal with multiple databases? I.

G

Don't think it right I, don't think it's decided yet so how do deployed the controller is still open. Question I.

B

Think you can have cases for both I mean this doesn't restrict what the scope of the controller is. You could have a controller per namespace or you could write a controller that works on multiple namespaces I.

F

Guess I, mostly I'm, mostly wondering about the default recommended controllers, would be the chosen strategy. There I think.

B

I would personally recommend a controller for namespace because you might be managing different types of clusters per namespace and you might want to have different flags on your controller for each of those clusters too. Even if they are the same provider.

A

Yeah I guess I would also say that I think in some ways this is a lot of the clusters that we would, you know, create through tooling that we're sort of singleton clusters that weren't part of a larger system probably aren't, can need to do this. Remote node management scenario- and this is something we're more building for flexibility extensibility for people who want to run a cluster that effectively manages other question. I.

A

Think in that scenario, where you have this sort of master cluster, that's managing room of clusters, controller per namespace or a single controller is sort of up to the person managing the master cluster, but for posts for people who are just managing a single cluster that they've created. They will probably just have one controller that manages the local cluster.

E

It also makes sense to consider the possibility, where administrator of the master cluster might not want to reveal certain C, RDS or certain or might want to keep some obstruction from the slave clusters right, so the node controller manager. Maybe we want to have two different kinds of plans, so say, for example, one client which points to the master clusters API server.

E

Let's say for for now registering the important CR, DS machine deployment and so on, and maybe one more a client which basically points to shoot blister wood, where we might want to look at the available nodes and so on so, but it can be made generate in a way that user, who has a master/slave kind of architecture, can do it that way, but a user who has the same cluster, which is both I, mean what a simple cluster is both master and slave.

E

He should also be able to do with a single flag, so that kind of thing may be helpful.

E

Because certain CR DS are certain apart, like secrets that we are using for the EWS a SS credentials in node controller manager, right so as administrator of the master cluster, we would not like to give it to the slave clusters, so we would like to hard hide certain parts to the master. So two hooks of two clients may make sense.

A

So in that case, what I feel like your architecture would be is: is you let users talk to your master cluster, to list machines that would describe the nodes in their client cluster right medium? When Chris and I were talking about this yesterday or right now we have a field, the Machine specification in machine status, part rather that points it's a node ref and it points to a node. But now we're talking about that.

A

That node is not part of the same cluster that the Machine lives in it's actually in a different cluster and right now, I, don't think we have a way to represent sort of these cross. Api server resources.

B

I mean way to do that is to like, if you get a machine object, and you want to look at it snowed, you list the cluster in the same namespaces, that machine object. It has an API endpoint and the node will always live at that API endpoint, whether it's the same cluster or remote. One.

B

But yes, I agree. We should be a better way to note that I'm just saying for now. That's one way you can think about it to get around.

E

The namespace do we have API server running for the slave cluster. We would still expect that administrator of the slim cluster might want to have anyone excess on the slave cluster right. So what we want is maybe, in our calculation what we wanted was. Maybe we can hide a specific things like cigarettes and Muslim class level of configurations. We keep those things.

E

Those kind of series on the master user still gets the benefit of the deployment or scaling up or scaling down so some kind of subset of CR DS here and there, but that's a that's more about creating a hook inside for two different clients and feeding two cube convicts, but maybe we might want to think of urgently predator.

C

E

We will be entertained both of the possibilities.

B

Talking about maybe just are backhauls saying they have least access to the machines, but they don't have worst access to any of the secrets.

E

Secrets, but as admin of slave cluster, he can pretty much do anything that he wants to do right. So we want to go one layer on top of it, so that only the admin of the slow cluster keynote.

E

We check to see our dislike machine deployment and so on. Yes,.

F

I think the use key is being described is if I have a master and slave Questor configuration I want the admin of a slave Questor to resize the cluster. So how do they get access to this cluster API to remove notes? But honestly they do a lot more than that.

F

That's at least what I got from this comment. Yes, yes,.

A

So the set of desired machines for that slave cluster lives in the master cluster. What kind what Chris is saying is: can you put our back rules into the master cluster that allows the admin of the slave cluster to resize a machine set in a master clusters API, but not add secrets or machine classes or whatever the other resources you want to protect are, and that would be our Beck rules and them in a master cluster which is not nor tor. Not then modifiable by someone who has complete control over the site, cluster.

E

Yes, that can that can probably solve problem at some extent, but it depends that what level of visibility we might want to give. So we might want to hide the complete master cluster from the slave cluster. It just depends on different kinds of architectures in one layer we might want to give some visibility of the mistype luster to the sale cluster set mean at some point. We might not mean why I might not want to give any kind of visibility of the master cluster. Just for the usage of reasons.

E

How big is good, it's in place, but yeah.

E

Just wondering this is this trying to figure out the generate? Yes,.

A

I'm trying to think like are you talking about sort of splitting the resources across the clusters like in your case, you'd put the machine classes in the master cluster, but the machines and the machine sets into the slave cluster. It would reference classes in the master, cluster flaky.

E

Stuff, like that, some critical information goes to master well, there's zero visibility for this live.

A

I'm not sure maybe read-only visibility, but if there's zero visibility, then you're referencing in an object that you can't see like I, guess your controller might be able to see both from.

E

From mustard there's still, a visibility of the complete slave right so muster can still do a lot of things.

A

Interesting I think we're actually kind of veering a little bit into the second topic for the the meeting anyway, because that on Monday we had interesting hour-long discussion about the node controller manager, which is sort of alluding to in this conversation. I linked from this many notes to a notes.

A

Talk that we put together from that describe that discussion, where we had I sort of puts a bunch of questions after reading through the design, Docs and the documentation and the types files to try and figure out how much overlap there was with that existing code base and and the cluster API proposal that we we've got going here and I think there is a lot of overlap conceptually.

A

There are a couple of edge cases like the one we're discussing now, which are interesting because, as if you folks have a couple of interesting use cases where they're trying to separate different personas in a particular persona of someone who can sort of manage the underlying infrastructure versus a persona of someone who can sort of scale that infrastructure, but not not manage some of the details of how its interacted with which I think are not yet well addressed in an existing cluster API proposal.

A

And so we've been thinking since then about about a sort of how to fold some of those things in so I. Think that was it for me, at least that was a really useful conversation to have.

A

So again, I think I think the notes of people when you go take a look at those I. Don't know if there's any follow-up we want to have in this meeting, I think that the our sort of ran out before we got to a great conclusion points or even sort of got to the next steps we're still going through some of the questions so I think we'll probably have some some follow-up discussion there about sort of the the right path forward.

A

As anybody who wasn't at that meeting curious about any more details about about what happened, I'm- happy to recap- some or I- don't always say we can do the whole thing. It would take another hour, but any particular details.

H

I'm fairly interested in in general I have a few days ago. I finally got the prototype AWS Fork of cube, deploy, kind of working and I need to push that over to you to as a PR just for discussion purposes. So as I'm looking at these node controller manager notes. There's a lot of interesting comments um were later relating to been during, for instance, and management of secrets that are relevant. uh This is the first time I've looked at these notes, so I don't have a lot set.

A

Great well I'm sorry, we weren't able to loop you in earlier because it sounds like you might have been interested in showing up to that meeting. Oh.

A

All right, if there are no other comments, I'm gonna, I'm, gonna skip I'm gonna switch the order. The next two things on the agenda. One thing we talked about last week was trying to define sort of a standard machine life cycle. If you will of the sort of states that a machine goes through as it's coming up being managed by the cluster API and then eventually going away, and so Philip and I sat down.

A

We sort of talked through these things verbally last week, but I think that was probably pretty confusing for most people and film. I sat down and sort of drew up a picture of what this looked like and tried to describe what the different states were. We shared that out at the end of last week on Friday to give people a couple of days to look at it before this meeting I.

A

There were definitely some people who jumped on it and added some comments, which was awesome but I kind of wanted to present it again this week with a picture and sort of talks through the states. One comment that was really interesting was Daniel, Smith who's. The lead of the CG API machinery group mentioned that for other parts of kubernetes they explicitly did not create such a picture or diagram of the state transitions because they thought that would prevent them from being able to make changes going forward, and that is one thing.

A

I plan to talk with him about in the next couple days is whether this is just a bad idea to write this down and try to sort of codify what the states and the state transitions are or, if there's a subset that we can codify. That would at least allow us to build some common tooling without tying our hands going forward.

A

Assuming we can get sort of past that discussion. If you guys pull up the picture here, I can sort of talk through it again and I think there were some questions about a couple of the states in the dock and I. Think people who asked those questions are here, which is even better so I, think draining was may be contentious. So Martin had asked whether we should have an explicit draining state in the API partially, because you can do drains out of band.

A

Is that the concern you had there? Martin? Oh yeah,.

I

I mean it's because if you got drained, then we might some other condition than some other condition, depending on the on the types of operations that you want. What about coasters, so it did entity we could become bloated. In my opinion, we should just focus on the on the the least amount of states that we want to support.

A

What do you think the the minimum set of states that we care about is I mean it's be? Creating servings deleting deleted yeah.

I

And probably just more or less probably complete, rink or something but to the dosa optional. In my opinion, if you look at the normal virtual machine, those are more as the states that you want.

A

To yeah I think it's interesting because, like if I look at what we did in in gke for doing upgrades, we we drain before doing upgrades to move things off the vm, because we delete the vm and replace it with a new one, and I think it is kind of nice to be able to see the progression of those steps like you can tell.

A

Am I still draining, or am I actually you know reconfiguring or replacing this underlying VM sort of two distinct things, because otherwise we end up with, like your machine, is basically always in a serving state, and you can't tell what's happening to it during that sort of long piece of the lifecycle of the machine where it has been created. But now it's just. Is it always just happy and running after that? Or can we tell what's what's happen to it in the meantime, I think.

A

That's why we were trying to sort of pluck out some of these other states of you know you can reconfigure a machine without draining it or you can reconfigure a machine after draining it, and you know that reconfiguration could be changing. The kubernetes version contingent the container at runtime changing the underlying operating system like there are lots that changing parameters on the cubelet. Now there are lots of different things that we might want to to change and be able to have controllers sort of tracked that state declaratively, and so I think some of this.

A

These transitions might be more of like an internal API that we want to enforce between different controllers. So we have a set of sort of States and machines, walkthrough, that's consistent across in different environments, and maybe we don't expose all of those out to the end-user and the public facing API. But it's more of like we agree that internally, as we're building controllers, we're gonna sort of implement the similar workflow.

A

Because I think Daniels concerned was, if you put this in your public API, that sort of ties your hands going forward, because people start to depend upon the states and the state transitions which makes it hard for you to change them like. How do you add an extra line right or change where a line goes.

E

Maybe just just it's a thought, so we tried to learn from the existing cubanía disappear for the motor controller manager and what we did is we basically added a last operation field, so the last operation could be create, delete and in place. It could also be update and then there are two layers, so one layer represents the machine. State and the second layer represents the machine phase. Yes, so in Machine State we got only like only very minimal. Only three states like processing failed and successful.

E

So you can represent that the last last state was if it was create, then it was processing failed or it got successful and on the second layer the Lillian machine face can be shown as part of the status. But you say it's either it's pending or it's available, but not yet join to the cluster or it's running already or somebody has to got the tournament terminating trigger the deletion. So it's an a terminating phase or it is completely unknown phase or its failed, States or draining phase.

E

So there could be more phases for the current status, but the state of the machine state could be minimal and last operation could identify that. What was the last thing that we tried to do? It was creation or deletion on a reputation system. That's what we did for eventually for controller manager, good controller information.

A

So it seems like you're you're, still exposing a similar number of states, you're just doing it in across two different fields. That's not right! Yes,.

E

So it's more of a layered thing, so certain certain controllers might want to rely more on the Machine state than machine phase, so say, for example, my machine set controller might not want to check the Machine phase, but it will just check that the last last last operation was create and the Machine State for that was so it can already act on it.

E

It doesn't need to know that what is the exact machine phase machine phase would be more for the user or human perspective that what was really happening to the Machine right now, whether it is available to join available, then it means that the machine is created. Button has not joined right now running the process is running heavy, so we tried to create two layers of the machine students, interesting.

A

Yeah I think what you said there there was I thought was most interesting was that they might be targeted at different use cases where the states were more targeted at the higher level. Automation and the phases were more for people and human consumption.

A

I know that in the way we're trying to handle errors right now in the cluster API. It's also split that way where we have like here's, an error string that supposed to be machine, readable and here's, an error that supposed to be for a human right, I think so we're we were thinking about some of those similar concerns about splitting the different audiences apart, because it's often really difficult to do something that satisfies, but for those audiences.

I

About that as well, we also have unique where this weekend events. So there were no discussions, don't you do this points for events, so some mistrust on standardization or some ideas about more as deep as that we won't get on the machine resource yeah.

A

I think an events is to me one of those things that falls under the human audience category right because of the way that they are tracked by the system and sorted by the system. It's I think in pretty much impossible to build automation that relies on events to drive it forward, because events can easily be lost right, so yeah I think for automation.

A

You need something more persistent like a field in your API, but you can definitely help help the human audience and help debugging and understanding of what's going on the system through generating events and I.

A

Don't I, don't think our prototype implementation generates any events yet nor we discussed sort of when it would make sense to generate events, but I mean if you look at the state transition diagram, I, think sort of each of those state transition arrows is is a good candidate for something that's changing in the system that you might be interested in like we are now drain this node. We are now updating the OS. We are not recreating this node, like those all make sense for points when we generate events.

A

So I guess the question then becomes: does it help the the automation to know that they know those intermediate states or the intermediate states more for for human consumption where something like events or your your two-level phase and status split makes more sense. Philip I, don't know if you have any any thoughts, you're, very quiet, yeah.

J

So yeah feedback first so I think like it might help us to try to like brainstorm what we would need in the higher level controllers, because that would drive, probably the the requirements in this like state state that we need right. So not sure if we there was this discussion happening and I was not present or like, for example. What do we need to create machine deployment which respects like disruption, budgets and can do row bugs things like that right? So is there anything that that you eat from the machine itself.

A

Something on on our end, we haven't haven't explored the heart. What the higher level controllers would look like quite yet I know that the the note controller manager, folks from s IP, they have those controllers in place right and they they have this fields and so I'm curious what they think is necessary to build those higher-level controllers.

E

So I think to stick this Way's we would need at least I mean machine said would not need to know lot of details. It would just need to know that at certain point, machine is in a healthy state or you can always categorize or always bunch up bunch of searching faces, ease and term it as a healthy answered term and categorize certain other phases and tell me them as unhealthy right.

E

So Machine State still considers the phase like the initial phase, where it is creating when we call it pending and when it is covering the available state and when it is running state we categorize all of them as a healthy, and then we categorize other phases where it's more of a failed. Well, it's a couplet fails or discretion happens.

E

So in all of these cases, machine controller declares declares the state of the machine to be filled, and at that point machines the Machine set decides that now this machine has to be removed and needs to be recreated and on one layer on top of it machine deployments. So we don't really expect the machine deployment to interfere much with the machines we expect only machines said to interfere with the statuses of the machines made from the from the Machine deployment.

E

We only list out certain certain categories like these are the available set of what are the number of number of available nodes? What are the number of labeled nodes available which are available nodes? Which means which are the curve which are the nodes which are fully labeled according according to the label, selector criteria and so on?

E

But the major part, major part of reaction based on the status or the state of the machine is, is from the machine set, so maybe in feature we might want to categorize all bunch of certain phases into a single phase and the minimal set of state from on which machines machine set one period and not lot of this well machine said, gets confused that in this phase, I hope to do that and that.

J

So, for example, you mentioned their health health status of the machine. So do you do distinguish, it does like one of the states or phases, or this is like something separate.

E

So so, currently we let's say the pending state, which is the initial. When the machine is created, we call it pending. Then one machine is created but yet to join. We call it available available phase and then once it's connected, we call it running phase right. So all three, all these three machine phases we call them machine state, is running or successful, I think yeah, so we called the running machine state.

E

So all these three different phases gets categorized as healthy or running machine state and in the rest of the cases where, where the Machine phases like it's failed or by the deletion, it failed or something can happen. Something else has happened like it has gone into the unknown state. So we have also seen the cases where your machine goes into a black hole. So you won't know what happened. I mean networking has gone down or something right, so the Machine goes into unknownst unknown, face so catch.

E

The word face and state yeah, so the Machine goes into unknown phase, so that unknown face with wait for five minutes and after that we convert that unknown face in profile, so that machine that initially for five minutes, the machine was in a healthy state, but then it becomes in that way. Instead, so that's what so, we more or less try to bunch up the faces into a minimal state, because eventually machine sucks to know whether it should recreate the machine or not to maintain the number of healthy replicas the it doesn't really care.

E

Whether machine was down from the cloud provider or cubelet went down so on.

E

Maybe I can try document these parts and put up in one of the proposal. Yes,.

J

It's like the interesting part of like the interesting to me. The most is the the part where or this higher level controller needs to, and so for example, there is some unavailability budget. So, for example, someone configure the cluster that no more than two nodes should be should be unavailable because of upgrades and and so on, right. The rest of them should be.

J

You know, capacity for pots to be able to schedule, and so then, like the higher level controller, needs to understand that the upgrade should slow down, because some note is unhealthy or the state is basic like he's under repair, and then you know that, then the update should go slower or stop until it's enough nodes are fixed, so stuff like that I, like I, feel like like.

J

We may need to actually understand that one note is another unavailable for scheduling because it's drained, so this is something that will not just wait and see what will happen because we know it's drained. That's why it's another rule versus it's just unhealthy because, as you said, like networking is flaky. So this machine, or something like that, so maybe distinguishing those two things- is helpful for economy. That.

E

That's a good question, so we have considered this part on the machine deployment controller right. So in the nation deployment controller we have exposed two fields or mechs search and mix unavailable machines, so Mac search is self-explanatory.

E

That number of healthy machines, plus the number that we mentioned as part of the mixers, is the maximum amount of machines deployment. Will any point in time create right and makes unavailable. Machines is the number of machines which any point in time will be unavailable in the cluster. Well, while it is doing the rolling update so deployments, job is to mainly mainly to a rolling update post, rolling update order rolling back.

E

So, as you said, if loading update is happening and let's say, two machines got updated properly and the third machine is stuck so at that point we wait till the third machine actually gets up and becomes healthy. If it is not becoming healthy, then we then the rolling update automatically gets paused. um So we have this thing: I need similar the way the deployments were in Cuban. It is right now for the pods.

E

So if you see, if, when you are doing the running up, wait for the pods, you have capabilities of pausing manually in between, so you can actually execute the command and pause the rolling update in between all if running, upgrade is failing. If the new pods, the new version of the pods, are getting stuck somewhere, the rolling update automatically gets paused so that thing happens in machine deployment as well. So eventually you will end up in a hybrid cluster Abell.

E

It should not happen that all of the machines go Sandow and Hill, this state and stuff like that. So that's that's. Actually. That was actually a very good point that we need to stop rolling update in between from the deployment controller if things are not going and they're. Actually, we also plan to think of a scenario where we need to think one layer on top.

E

So let's assume that we have applications running in the older version right and when we are doing the rolling update, we should actually also ensure that those applications are able to run properly on the newer version of the machines that we are transferring. So we should also check the pods are getting into the ready state and that's where we actually meet to check the PDB so for disruption budgets. So we need to check those parts and then the application layer.

E

So we have not yet implemented the insurance on the application layer that the applications actually gets up again when the rolling of flow discipline. But we have that thing in the pipeline.

A

And similarly, and when you're choosing the next node to replace during your update, you should also be respecting the part, disruption, budgets and slowing down the update, not just based on max unavailable on the machine deployment, but also based on the pod disruption budgets. The pods that are running on those nodes.

J

Yes, so that was also pretty interesting, so the dis waiting for whether the pods can actually successfully run on this new version of of machine. So how do you? How do you wait for that like what? What is the idea here? Do we have like another phase where you're like validating that this new version is fine for next few minutes before you proceed to the next machine, so.

E

That part is yet to be implemented, so we don't have it in the implementation, but just what you say it is correct at some point you will have to label the machine or we will have to create a new phase for the machine. The machine might be healthy. It might not be really ready for further routing updates.

J

E

That's a good point: if you still.

J

Keep these skills still keep this token for disruption. You don't release it for the next machine until you are out of this face or state or whatever lethality.

E

The machine needs to be the thing is yet to be: it's not implementing I.

I

Think is that, for example, the replica set you need to know which machine clock should delete or, for example, arbitrary control like the outage carer or somato scammer can actually tell you do it I want to delete that machine, the annotation or by similar means, so this I think should also be present in the yeah it isn't or in the API or just in your control.

E

Yes, I think that's a slightly separate part well when we try to integrate the autoscaler right. So it's really interesting that with Otto Skinner we will be able to scale up very easily. That's not a big problem, but when we do a scale down, we need to think about a lot of things yeah.

E

So we need to really need to check that what kind of parts are running in it or we actually need to prioritize the machines that which machine should be scaled down first now, in that case, we still might want to put the intelligence inside the other deployment on machines set in a way that, whether we put a label or a notation on the machine object, so that deployment can understand that how could I have got firing machines, but this is the machine that should be scaled down. First yeah.

E

That thing that is I, think very interesting. Question and I was also very curious and I asked in one of the channel that I wanted to know that is there any plane from the past or the scalar guys to to into think or already have a proposal for the design of integration of autoscaler and machine api.

E

A

There are two answers: one is that they they already do. The first thing you talked about inside the autoscaler. So when they're looking at scaling down the cluster, because I think that it's under utilized, they look for a machine to scale down that will have the least disruption right. So they look for a machine that isn't running. You know singleton pods. It won't come back because they're not underneath the controller.

A

That's maybe only running system pods, you know, or things that are in daemon sets that they won't delete a node that will disrespect upon destruction, budget or something running on that know it right, so they already take all that into account when they choose, which node to delete all right. So they're not saying scale down this MIG by one they're saying delete this particular node from the MiG or the SG and I think they would want that same feature in the machines out of the machine deploy.

A

Maybe I of we are smart enough to know which thing to get rid of. Please leave this one for us, not please scale, but down by one and I trust you to pick the right one, because there are cases where the clusters underutilized, but they don't want to scale down by one, because it will violate one of these constraints. That's your first answer.

A

The second answer is I've been talked a lot with Marcin, who is from the Google Warsaw office and is part of the sig auto scaling group about integration between the cluster API and the autoscaler, and you know he was pestering me yesterday about like timelines and when they should start trying to rebase and so I think there are discussions there, I'm, not sure. If, there's anything, that's been written down about what the autoscaler would look like or what the proposal is quite yet, but I can ask him. I should see him today.

E

A

Bill did you have any other other questions on this topic? I feel like we're petering out of.

J

A

I'm not sure we've wrapped up. Yes,.

J

I think we should maybe wrap up with some some action items, so I think it sounds like the better understanding of the use cases of like what do we actually need as an input. So, for example, we talked a lot about healthy versus unhealthy and those phases and States so so, for example, the dog that the Roberts and I prepared lee doesn't even talk about any health states, or also it's not talking about like intended States. Not all the failures are like outside of scope.

J

So it's not something that it's touched on and I totally agree with. You guys that it's it's important, that's an input for the higher level controllers, so I think we should have a is full picture first before we can but understand like before. We will be confident this is really de the states or phases that we want to end up with all right, so these I'm not comfortable yet with you know, seeing this to.

A

The end: okay, how do we make for progress on that? That sounds like a somewhat nebulous action item. If we should yeah you.

F

A

Is there anything more concretely that we can try to do you know maybe before next week, when we circle back.

J

So I think we should so I will try to think a bit more because this is not something I already have in mind about these disc, healthy versus unhealthy, how these can be or if either modes, how with that bit that can be combined with River States. So that's I have any opinion on that. So I would definitely trying to figure out this part, but I can hear that there already some proposal so I also need to catch up on that on what is already there.

A

Yeah I think Hardwick, you said: would it be helpful if we wrote something up about what we're doing right now and I think the answer is definitely yes, if you guys could sort of like start a Google Doc, which makes it easier for people to comment on.

A

That would be really really helpful for us to to help the better understanding of use cases why you guys made the design decisions that you did yes and how we can either fold some of those decisions into the cluster API or make sure we're being the same use cases in a slightly different way. Mm-Hmm.

E

A

If you can take that as an AI on your side, that'd be awesome.

A

Excellent, so the last thing I had on the agenda was to discuss the machines machines, PR and the Machine set PR, so the machines PR has been mostly idle as Jacob as I mentioned last week, transition to working on SEO instead of human Eddie's, he put forward a proposal of how to wrap that up and get it merged, which everyone agreed to and I need to go poke him to actually finish that off, because it's not his full-time job anymore.

A

So it's kind of a side project for him which plans to get that merged and then Fang as part of moving things from CR DS to a great ap is sort of moving those types over into the new location.

A

So I think that one we have a relatively clear path forward on, although we I've been poking at him for the last couple days and we haven't finished so I'm hoping to get that wrapped up this week, the Machine set PR has been open for a little while now the the Lucy guy sent that one over and I don't know that it has any comments on it. Yet so I'm not sure if anybody's really looked at it very deeply.

A

I guess Martin looks like you added at least one comment which is which was a while back I. Think a lot of the discussions were having sort of swirl around things that might need to fold into that.

A

So the discussion that we started on Monday about the distinction between just having machines or having machines and machine classes, and that I think in my mind, folds into the machines FPR in some ways, because I'm Nick not yet convinced we need classes at the lowest level, but I think we probably need them at a higher level, especially to enable things like the autoscaler and so I'm trying to figure out.

A

If there's a way, we can propose a machine set API where we bundle machine sets or even machine deployments, with machine classes at the higher level, where we expect users to want those ways to sort of stamp out lots of copies of things. But then at the lowest level, where we have machines. We don't have that complexity so that a user who wants to just use machines doesn't have to worry about linking together, multiple different objects.

A

So that's sort of what I've started thinking about in terms of where I think that might where we might be able to end up with something that sort of meets both use cases of having something very simple for users as they ramp up, but also sophisticated enough to meet the needs of the autoscaler, and they need some more complex deployment scenarios.

E

Maybe you just on a side note if we discuss this on Monday, but we could not get enough time on it. So we might also want to figure out a middle ground for a motion.

E

Machine class and provided config, so middle ground might look like following so you might have a single CR deformation class and the main benefit that I see from the provided configure way is that we can add the arbitrary number of key value pairs and we can make the Machine controller smart enough to understand it and talk to cloud provider where we don't really need to do. Api, versioning all the time when something change.

E

So what we can do, what we can't other do is we can have a stable fields in the machine class such as waist image or disk, discreet, eye disk size and so on, which are already very stable and they don't need to be modified. But then we can have a new field in the machine class. Let's call it where you configure itself and that's where we can put the arbitrary key value pairs and the Machine controllers would still assume if anything changes on the cloud provider. We do not really want to immediately change the version.

E

So now we can make the motion controller smart enough to understand and leverage that new features that came up on the cloud you know so just thinking through it, but thought me may make sense at some point, because details redundancy of lore of Troy Durkin fix inside the machine looks scary to me. The only reason is that the only reason is that we still have a question on there.

E

We go the cloud a night weight or whether we really go the post post reboot way right and if, if anyone decides to go the cloud I night weight and the guy would want to put the cloud config inside the provider country and that's where it will explode so yeah.

A

I think I've seen that pattern in other API. It's where you have like here are the things that we are pretty confident about. That we think are stable fields that are gonna, be consistent across impairments.

A

You know you can even represent it instead of machine type, as you could say number of cores, and you could have a mapping on different clouds to what type of machines that means, and then you have basically like a extra key value chunk that you can add arbitrary, other config things that will get passed through right so that that pattern is is established in other api's they're, trying to abstract different cloud environments and.

E

The cloud in ID part that I mentioned is actually something that that can prevent a lot of us from brightly consuming because, if I my cloud, you know it would be few KB's and if you try to put it as part of the config, then each machine having that cloud in a cloud a night inside it will be really scary.

E

A

We talked about a little bit with allude to you guys at cube con of of having that cloud in it instead of being stored in the API itself, be something that's maybe inside the machine controller right. So if the machine API object doesn't have like the entire cloud in it like startup script and all that sort of stuff, it just says like I'm, underneath this controller, the controller looks at that and says: oh great I know for you. I apply this cloud in it.

A

That's another way to hide it from users where users can't see it or modify it, but not sort of burden them with like it being dumped in their faces. They're trying to to look through the different API objects.

A

We have seven minutes and Martin added something at the bottom of the agenda, which I want to make sure we don't we're at a time for yeah.

I

So more or less like now, we have like the Machine API, but then everyone without their custom needs much more or less like custom, configuration, etc. So how this the API, the the machine API server, actually do validation for the configuration on the running machine. So when you create a when you submit in the machine, you probably want to check if the require, if the configuration is ok to and the only way I can think right.

I

Now is everyone to have like external web hook or what was called Mission Control yeah for external external admission, yeah logic, the.

A

Thing I, don't know if you are a jump in there. We talked about this earlier this week, so.

G

For the because Cullen demons machines are used in CRT, it's really.

J

No way to do any.

G

If it's valid configuration or not here, but when we move to the aggregate API server, it's possible to add a validation when you create the new machine object. So in our case you can like the general, has an error in such case.

I

Yeah, but this is only for the machine for the machine API but y'all for the custom bit it's more recent work together with the thank you.

G

A

I think we did talk about maybe using admission controllers for that. The other option that we talked about is the machine controller could see that this new resource was created. The provider config I think it's a string right now what we talked about, making that, like an actual real typing humanities, there's an issue open.

A

The intent was for that field to actually be itself a kubernetes object right. So it still has you know v1 whatever. This is now object. It's for this controller, that's encoded in a way that we can. We can do better sort of board backwards and forwards compatibility and versioning on that.

A

The controller understands how to read so the controller can read that in do its own versioning and then spit back an error on the machine where it says you created a machine with this desired state and that desired state is invalid, right and I think there might actually be an error type in the top level machine object that basically says like an invalid desired state, and so that wouldn't be you submit it and it immediately gets rejected you'd submit it. It would be sort of accepted as a possible desired State.

A

The controller would look at that and say, like this, isn't actually possible and kick it back out and basically take that machine and say this machine cannot be instantiated.

A

Likewise, you know there are going to be cases, especially as we talked about with the generic past a couple of extra fields to the underlying cloud provider that aren't represented directly at the top level, where we're gonna get something in and we're not actually mailed to know a priori whether it's a valid state, we're gonna call out to the underlying cloud. It's gonna reject that back and we're gonna have to say. Sorry, you specified, you know it's an availability zone that doesn't exist or you specified.

A

You know an instance type that doesn't exist right, and so we also have to take care of that case anyway, and so we can sort of use the error reporting mechanism in a similar fashion for both cases and that's not to mention things like you ran out of quota. We can't scale you up, like that's another case, where it's more of a transient error where you could add quota and it could then later be reconciled. They're gonna be states where you're sort of in a permanent error state where your configure is just plain invalid.

F

Only a time at which you, how would you clear there I guess for the transient cases I'm not too familiar with the API? How things are working out terms of like janetti's api is but will be part.

A

Of there, so, if like, if there's like a quota error, that's a transient case I could imagine the Machine controller, saying I could try to incentive machine. You know in OBS told me, I had no more quota for that. You know project in that zone. I'm gonna wait now. Maybe I'll try again in ten minutes or at that is.

J

A

Like have an exponential, back-off and I, you know eventually I start trying once every hour I will keep trying like I'm not going to give up, because I know this could be resolved externally by an actor going changing photo, whereas if they started crate of Vienna, it says like sorry, you specified in machines, but it doesn't exist. There's not really much point in trying again, so you can basically put in sort of a terminal this. This desired machines that you gave us is not going to ever work.

A

So I think it's useful to distinguish between those two cases make sense.

A

Does that help? Nor did you are you still confused.

I

Yeah yeah well yeah, there's a transcendentalism! Well I would be like if there was a way just to check for for like if this type of string or whatever that can happen, drink yeah the issue, a mission, object, yeah.

A

Yes, I think we can look at admission controllers for that I think Chris admin Chris had to drop off, but I think he mentioned that it was possible to do that with admission controllers. So that's one way to look at it and, if we're running our own extension API server, presumably you can stick admission controllers in that API server. To do some of that initial validation also.

A

Which is a little unfortunate because you're decoupling it from like the actual controller code, and so you don't want those things to get out of sync, but it does still give you the flexibility of being able to have different. You know different machine controllers with different input, validation without having to surface everything at the top level and try to generically abstract across every cloud, which is I, think something we were really scared of trying to do it.

A

We aren't we're not trying to build another terraform, so all right with that, we are just about out of time. Thank you for everyone coming. This has been a very productive and fruitful conversation. I will be out of town I'm out of the office next week, so I'll find somebody else to run the meeting.

A

Please continue to show up, don't don't not show up just cuz I'm, not here, there's a lot of stuff to discuss and I look forward to watching the recording and reviewing the notes when I get back and I will see everybody in two weeks. Thanks.