From YouTube: Kubernetes SIG Cluster Lifecycle 20171018 - Cluster API
Description
Meeting Notes: https://docs.google.com/document/d/16ils69KImmE94RlmzjWDrkmFZysgB2J4lGnYMRN89WM/edit#heading=h.w4xw7yvzdgs
Highlights:
- Reviewed the pitch deck for the cluster API (join the SIG Cluster Lifecycle mailing list to get access)
- Process going forward
- Overlap with the cluster registry
- Reviewed an early version of the proposed node API
A: Hello, everyone, and welcome to the inaugural meeting for the cluster API effort, which is part of SIG Cluster Lifecycle. Today is Wednesday, October 18th, and since this is our first meeting, I think we're going to start by just talking about what we think we're doing and why everyone's here. I'm sorry that my camera doesn't work; it worked at the meeting two hours ago this morning using the same software and appears to have broken since then.
A: So I'll go first. My name's Robert Bailey; I'm listed as "cluster ops" in Zoom, so I can record the meeting. I work at Google, I'm one of the leads of SIG Cluster Lifecycle, and I'm working on the cluster API effort because I think it's important for us to standardize not just at the kubeadm level, but also at the level of specifying what clusters look like, to meet a number of different important ops use cases. I'm just gonna go across the top and call on people. Chris?
B: I'm Chris Ramsey. I'm also on the cluster lifecycle team here at Google, and I'm mostly interested in the control plane definition in the API group. I want to see this effort become a standard way to deploy clusters and make it easy to automate, maintain, and operate them, and I think getting the control plane definition just right for continuous operation is one of the most critical pieces of that. Cool.
E: Hi, Alex Mohr. I'm also at Google on the cluster lifecycle team, managing it, so I just thought I'd tag along and see what's going on. My particular interest is really about making the cluster ops experience better, right; I know we're focusing on deployment and such initially, but also that longer-term lifecycle of keeping clusters happy and running, and so on.
F: I'm involved with cluster lifecycle and kops, and I wrote kubicorn, and now a lot of other people contribute to that project, which is great. I believe that the fact that we don't have an infrastructure API is hard for me to swallow, so I really want to see this thing take off. I really want to fight for it, and I intend to devote a lot of work time and a lot of personal time to it.
H: Yeah, I'm Matt, from a UK company called Jetstack. One of the things we've been doing recently is a cloud-agnostic Kubernetes provisioner, based on the work that we've previously done with some of our customers. We open sourced it last week; it's Tarmac with a "k", Tarmak. Now we're looking at the cluster API, or rather the very early, humble beginnings of the cluster API work, so we're keen to contribute, really.
A: Awesome, I think I got everyone; please poke me if I missed you and you'd like to introduce yourself. I kind of wanted to start by going over a pitch deck that one of our PMs helped us put together for why we wanted to start working on a cluster API. Hopefully this resonates pretty well with people like Matt, who started on this effort themselves. Jacob was nice enough to share his screen, since I'm doing the recording, and we'll just walk through this really quickly.
A: I know a lot of people have probably seen something like this, so it will not be a surprise, but I want to set the stage for where we think we're headed. So, on the next slide: the current state of the world for cluster management and cluster ops, as Alex mentioned, is very fragmented, and we have lots and lots of different tools, right, as Chris mentioned.
A: We've explicitly even said that that's not part of core Kubernetes. Justin Santa Barbara sent a pull request a couple of years ago to try to put the notion of node pools into core, which was rejected, and as a result everybody has reimplemented things like node pools. People are employing different types of upgrade strategies, and there's no consistency in terms of cluster configuration, admission controllers, and so on across different clusters, and as a result...
A: We don't have a foundation to build higher-level cluster management or cluster ops tools, so things like the cluster autoscaler get reimplemented. Talking with the cluster autoscaling team, they've actually started implementing, themselves, something akin to a cloud provider interface, which is what we're trying to get rid of in core, or to a cluster API definition interface that is portable across clouds. So I think we're...
A: So the cluster API is meant to be a declarative way to create, configure, and manage clusters, where we have controllers, in the Kubernetes sense of controllers, that will reconcile desired versus actual state. If you think about a lot of the existing systems, this is sort of how they're built; kops is built this way, for example.
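
That reconcile pattern is the standard Kubernetes controller loop. Below is a minimal, generic sketch of the idea in Go; the types and numbers are purely illustrative assumptions, not anything from the proposal itself:

```go
// Toy reconcile loop: converge observed state toward declared state.
package main

import (
	"fmt"
	"time"
)

// ClusterState is a stand-in for "what the cluster API object declares"
// versus "what the cloud provider actually reports".
type ClusterState struct{ Nodes int }

func desired() ClusterState { return ClusterState{Nodes: 3} } // read from the declared spec
func actual() ClusterState  { return ClusterState{Nodes: 2} } // observed from the provider

func reconcile(want, have ClusterState) {
	switch {
	case have.Nodes < want.Nodes:
		fmt.Printf("adding %d node(s)\n", want.Nodes-have.Nodes)
	case have.Nodes > want.Nodes:
		fmt.Printf("removing %d node(s)\n", have.Nodes-want.Nodes)
	default:
		fmt.Println("in sync")
	}
}

func main() {
	// A real controller runs forever, driven by watches; a few
	// iterations are enough to show the shape of the loop.
	for i := 0; i < 3; i++ {
		reconcile(desired(), actual())
		time.Sleep(100 * time.Millisecond)
	}
}
```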
A: Next slide. From a diagram point of view, what this looks like is that you'd have a user interacting with a deployment tool; the deployment tool uses the cluster API, plus the sort of glue layer that I mentioned, to actually instantiate a cluster on top of a given cloud environment.
A: You then have cluster automation that can be pointed at that same cluster API, so that the automation can introspect what the different nodes look like and what the pools of nodes look like, and can automatically scale, add nodes to the cluster, or remove nodes. You can build things like cluster repair or monitoring that also target that API and are able to take automated actions on top of the cluster.
A: The next couple of slides are just strawman, rough estimations of what the API might look like. All of these are just here to point out that we've started thinking about what this would look like in terms of structures, and that the resulting YAML you'd get would look pretty familiar if you've had to deal with Kubernetes YAML files, except that instead of describing your application, it would describe your cluster.
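
As a rough illustration of that "familiar Kubernetes object, but for a cluster" idea (these are not the structs from the slides; every field name here is an assumption for the sake of the example), the Go types behind such a YAML might look like:

```go
// Illustrative only: the general Kubernetes-object shape such a type would follow.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Cluster declares the desired state of a single Kubernetes cluster.
type Cluster struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              ClusterSpec `json:"spec"`
}

// ClusterSpec holds the portable parts of the definition; anything
// cloud-specific would hide behind an opaque provider config.
type ClusterSpec struct {
	KubernetesVersion string `json:"kubernetesVersion"`        // e.g. "1.8.1"
	ServiceCIDR       string `json:"serviceCIDR,omitempty"`    // cluster networking
	ProviderConfig    string `json:"providerConfig,omitempty"` // opaque, provider-specific
}
```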
A: And so, just as you use version control as the source of truth for applications, you can do the same thing at this level with the clusters themselves: you can check in your cluster definitions and use those as the source of truth. Companies actually do this today using Terraform; they check Terraform configs into git and use that to describe what their cluster should look like, and that works great for their environments. But again, it's not portable across environments, across companies, and across infrastructure, and so with the cluster API you'd...
A: Some risks that we see: the API design here is going to be pretty tricky. We can't make the API so generic that you can use the same YAML everywhere but can't actually expose the underlying power of the different environments. Conversely, if the API is so specific that it works great in one environment but not in others, or you can't extract the general pieces out, then it's going to become really difficult to make things that are truly portable. And a good example...
A: It's been a long time that people have been thinking about this problem, but nobody's really pushed hard to make it happen, and the longer we wait, the harder it's going to be to drive people to consensus and to a standard. The last slide talks about what we're doing from the Google side. You've probably noticed that there are a lot of Google people on the call today; we have a number of people that we're spinning up on this effort to make sure that it makes some significant forward progress.
A: We've started working on a spec. We want to share that spec and start getting feedback, especially from people who've also tried to do this, and we're starting to build a prototype against it to run on Google's cloud; we hope that people will do the same on other platforms. We're also looking to figure out which tools we can port, to prove out the cluster API in the near term. And the last bullet here: I convinced our PM to resolve that, so it's no longer an open question.
A: It was open when we wrote the slides, but the code is going to end up in the kube-deploy repository. I just merged a PR an hour ago to create a directory for the cluster API, and we can start putting everything there. The nice thing is that it's in the Kubernetes organization, it's got the CLA bot attached to it and so forth, so we know that contributions will be covered legally going forward in the same way as the rest of Kubernetes.
A: The idea is to put it there for now, until we have a better home for it. The problem right now is that, with the recent election of the steering committee, the incubator process has been put on hold, and the process for creating new repositories in Kubernetes is also on hold. I talked to the cluster Federation folks last week; they have their own repository that they created about two days before that hold was put in place, so they kind of squeaked by with their own cluster registry repository.
A: But at this point, new repositories are not being approved, and nobody's graduating from incubation until the steering committee figures out what that whole process and procedure should look like. So Brian Grant had recommended that we squat in an existing repository; kube-deploy is almost completely unused at this point, and we can use that as a reasonable home for now. But the intent is not for it to live there forever; that's just an intermediate step.
I: I had a couple of questions about scope. It seems like there's potentially a massive, huge scope here, especially if it includes things like doing cluster upgrades across clouds, considering that's a problem that virtually no one has solved even for single clouds. I'm just curious what the explicit goals are. Is there a goal to homogenize the top-level input? It seems like that's something everyone agrees upon. Is there a goal to make clusters portable between tools as well, where...
A: You couldn't necessarily just say "because it has a cluster API, I can now use kubicorn to provision and add new nodes to that cluster", because when you tell the cluster API you want a new node, the reconciler that's running might, you know, be using Ansible based on your configuration. It's possible that you could replace that reconciler with a new one that is the kubicorn reconciler and sort of switch midstream that way, but I don't think that's an explicit goal.
I: Or maybe, put another way: the goal is not necessarily portability between deployment tools as much as it is for add-ons that you might run in the cluster that need to interact with the cluster, know about the cluster, and have a concept of a high-level view of the cluster. Is that fair to say? Because it seems like we're making a distinction between tools that actually deploy the cluster versus tools that operate on the cluster, like the autoscaler. Well...
F: I think, to me though, going from 0 to 1 should be no different than going from 1 to 2, yeah.
I: That's actually exactly my concern: the autoscaler is probably going to need to know about some things that exist beyond the API as well. Take the autoscaler for Azure, for example; depending on which one you're looking at, both of them have dependencies that aren't obvious from the API level on how the node templates are created or how they're managed, so...
A: So maybe this would help. Say you have a consistent way to add capacity to your cluster, right; the autoscaler is just automation for adding capacity to your cluster, but even if I'm doing it manually and I say I want to add three new nodes to my cluster, if there's a consistent way to say, you know, "kubectl create" a CRD resource that describes that I want a new node with one core, and that's it, then that, at a high level, can be made to be portable.
A: Then you could say "give me a one-core node", and the fact that I'm running on Azure means the provisioner knows how to map that one-core node onto an Azure node that it spins up, initializes, and joins to my cluster. I could run that same command against my GCP cluster, and the GCP provisioner knows to add a single-core node, initialize it, and add it to my cluster. The autoscaler then effectively just automates that process, where it can look at the set of nodes that exist.
A: It can say: if I were to add a node like this and run a simulation, this pod would now go from pending to scheduled; so let me declaratively say "I think a new node should exist, and it looks like this", and the provisioner for whatever cloud it's running on can add that node, and the pod becomes scheduled. Okay.
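
To make that concrete, here is a minimal sketch of the kind of declarative object a user or the autoscaler might create to ask for one more single-core node; the type and field names are assumptions for illustration, not the proposed API:

```go
// Hypothetical "give me a node that looks like this" request; a
// provider-specific reconciler would map it onto an Azure VM, a GCE
// instance, or a bare-metal host and join it to the cluster.
package main

import "fmt"

type MachineRequest struct {
	Name   string
	Cores  int
	Memory string
	Labels map[string]string
}

func main() {
	req := MachineRequest{
		Name:   "worker-extra-1",
		Cores:  1,
		Memory: "3.75Gi",
		Labels: map[string]string{"role": "worker"},
	}
	// In practice this would be submitted to the API server (for example
	// with `kubectl create -f machine.yaml`) rather than printed here.
	fmt.Printf("requesting node: %+v\n", req)
}
```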
C: The issue is that the autoscaler has kind of two responsibilities today: it not only decides that we should scale up and how large the cluster should be, it also needs every cloud provider that we support to know exactly how to create a new node on Azure versus GCE. It doesn't have this intermediary interface where it could just decide "I want to scale up, and this is something close to what I want" and then have a completely independent process actually fulfill that.
C: Yeah, like, let's say on GCE, but in a new kind of way that also works if I manually scale my cluster, or if I have a specialized GPU scaler that knows about my machine learning workload. Having that inflection point allows both sides to evolve independently, instead of one tool having to do everything. Yep.
F: I have questions about process, if we can. I'm kind of imagining we're all sitting here shaking our heads yes, saying "okay, this sounds like we're all thinking of the same thing, at least vaguely", and want to begin work on this. I imagine this is going to be sort of a proposal-style exercise we're going to go through. What does that process look like, and how would we start making formal proposals of what an API might look like one day?
C: In fact, if we have time in this meeting, I can go through some slides on the way I've been tackling it, how I've been thinking about it, and what it looks like concretely today. At least for the node portion I have something that is interesting enough to discuss; I'll save it for later today, I think.
F: And then one thing on the control plane side that I wanted to mention: we had talked about it being almost a one-to-one mapping of kubeadm moving forward, so yes, there's a possibility that there may not be much work from us on that front. But yeah, anyway, I vote for going through the node stuff. Anybody else? Yeah.
J: I have one thing I'd like to ask before you dive into the technical side. On the multi-cluster side, there's been an effort to create a cluster registry, both for discovery of the clusters that are available to a given organization and also for Federation to know which clusters to target, and I guess I'm just wondering: is there any overlap here? It kind of seems like, if you're using this cluster API to manage your clusters...
J: ...that would be an easy entry point for discovery for the class of users that just want to use a cluster registry for that use case, and maybe it would provide a way for Federation to ignore a lot of the details and just point to the clusters that are managed using the cluster API. I'm just curious whether there's been any discussion with the multi-cluster team around this issue.
C: Yeah, so early on, when we were brainstorming scope, and as people have brought up, scope is so important in this project so that we don't just boil the ocean and take forever to actually deliver something, we did a lot of brainstorming about this: do we want to completely subsume the Federation effort? Do we want to be able to have a single control plane that houses the definition for multiple clusters and lets you reconcile remote clusters, with, like, one master cluster...
C: ...that's actually doing the reconciling across many different clusters? Then, when we found out about the cluster registry project, it seemed like a really nice delineation of concerns would be a cluster API for a single API server that houses the definition for that one cluster. It really simplifies things if you say it should only house the definition for a single cluster, and that any sort of aggregation you need across clusters should use the cluster registry, so that it has really nice semantics.
C: If you need listing, you use the cluster registry to first discover all of the clusters that you care about; then, once you connect to them, you can use the cluster API to actually actuate each one of them independently, and that decoupling, I think, really simplifies things. But if there's a really strong need to revisit that separation, this is the time to bring it up.
J: No.
C: And if you want to write a controller or a tool that is multi-cluster aware, I think it's reasonable to say "here is the endpoint of a cluster registry", use that to discover all the clusters, and then actuate them based off of that, instead of having the cluster API attempt to handle both scenarios. Okay.
A: That's a really good point, though, because it wasn't clear to us when we started thinking about this either, whether the cluster API was a single-cluster or a multi-cluster API. I think, as Jacob said, we've fallen on the side of a single-cluster API, because it's a lot simpler and because there's another group of folks that are actively working on the multi-cluster use cases. And Chris asked in chat whether we implicitly create a registry by doing this, and I think the answer is: not exactly. I think we can...
A: We could, if we wanted to; specific deployment tools could create a registry inside of the cluster. The registry right now is implemented as a pod that you can run in a Kubernetes cluster, so kubicorn could spin up a cluster and install the cluster API and also a registry inside that single cluster. I think there are also going to be use cases where there's an external cluster registry, and you want to create a cluster and register it with that separate registry as well.
C: A bit of an open question that I think is a little implicit here is: if we're doing a Kubernetes-style API, where do these objects actually live? For the node portion, for all the use cases I've been thinking through, I think it makes a lot of sense for them to live in the cluster where the nodes are supposed to exist. It allows you to actuate them on the cluster itself; you can just kubectl apply or kubectl edit and create new ones.
C: For the control plane, on the other hand, you might want to keep the authority on what your control plane should look like completely off the cluster, and that gives you flexibility without having to bake into the actual API where these objects should live. It might be a safer operation: in case you accidentally wipe out your control plane completely, you don't prevent the reconcilers from even being able to access what the control plane should look like in order to fix it. But I don't think we have that same problem, necessarily, with the node portion.
C: Actually, maybe we should go through these tenets, because this is kind of important context. For the overall cluster API effort, and also for this machines API, we've strictly said we're not trying to get into Kubernetes core. As Robert said, the precedent has been set that core Kubernetes does not care about machine management; there's even been a proposal, which was turned down, to try to add the concept of a node pool to Kubernetes core. So all of this is going to be strictly outside of Kubernetes core.
C: It's a little TBD whether these will be API extensions using API aggregation or whether they will be custom resource definitions. We definitely want it to be Kubernetes style, but we're not trying to get this into the actual Kubernetes codebase. Also, as a general goal, in order to maintain portability, it's important to call out which types we think are reusable and shareable across clusters and completely different environments, versus the types that are strictly tied to exactly one provider, versus the types that are strictly tied to a specific cluster.
C: The priorities here are a little in flux; we're trying to get a better understanding, with our PM on the Google side, of what we think we really want to deliver this quarter versus what we should ultimately have in scope long term. So, just going over a few things: node creation, super simple. Given some form of template that somewhat describes your node, can we create a new node, and can we delete nodes? Specific node deletion is important for a few tools; the cluster autoscaler, for instance, doesn't just want...
C: ...let's say we have a notion of a node pool, a group of nodes with just a number to scale up how many instances you want; that's not really powerful enough for something like the cluster autoscaler, which actually wants to look at the workloads running on specific nodes, target the ones that are nearly idle, and delete those specifically, instead of just generically saying "well, I want a few fewer instances, please randomly delete some".
C: Exactly what deletion means is a little TBD at this point. Presumably we would drain; I don't know if that should be automatic. Right now all of our draining code, as far as I understand, is client-side: it's baked into kubectl, and it's baked into other clients that want to specifically evict pods from nodes before they're taken out of service. I don't know whether that should automatically be part of this API, but we at least need some form of "I would like to delete this node".
A: Yeah, I mean, right now that code is duplicated, right: it's in kubectl, the cluster autoscaler also marks nodes unschedulable and drains them before removing specific nodes, and so does the GKE resizer. There is an existing issue to move drain into the server, and I think, as we go here, we're going to try to bump the priority of that issue and/or help implement it, because, like other things that started out in kubectl as client-side code, the draining should really be done server-side.
C: And that keeps the flexibility. Let's assume that deleting a node actually just rips it out of existence as soon as you say you want it deleted. You then have the flexibility of first evicting the pods yourself, maybe in a workload-aware way or in just a generic "evict the pods first" way, and our API doesn't necessarily need to represent that. But if we build in the behavior that you always safely evict, then it removes that flexibility. That said, I understand that we could also have the strategy of how to delete baked into the API.
H: Just an observation here: in the node, or the machine, you're sort of assuming that it's going to have an OS image and a Kubernetes version. Is there a thought that you would also account for more generic machines that are used as part of a Kubernetes cluster, such as etcd instances? Because we face the problem that, when standing up a cluster, it is not just kubelet machines and control plane machines; you also, as part of the cluster, need to stand up etcd, and in our case we're also using Vault for standing up the cluster PKI. So we would really like to refer to all machines in the same way, because otherwise you start having to differentiate the way you refer to an etcd machine from the control plane from the nodes. Yes?
C: So we're definitely thinking that nodes would have this notion of a role, which kubicorn has now, and kops has this as well, where you say "I want you to create a node, and it's strictly a master node", or "it's strictly a worker node", or "it's master and worker, so you should be able to schedule workloads on it".
A: I'm actually really interested to drill into your use case. So you have an etcd machine; are you running etcd just as, like, a systemd process? I think most people these days run etcd inside of a container, which means you would also be running a kubelet on that machine, but it sounds like you aren't doing that. No?
C: Interesting, yeah. A few initial responses to that. The way the API is structured right now, we could continue to add these enumerated roles that we think would potentially have use across different environments, and one of them could be important enough, or significant enough, to say "I want an etcd-only node", if that's a pattern we think is going to be generally reusable.
C: The other thing is that, as Robert said earlier, the power of this API comes from the ability to actuate after deployment, and I don't know whether, as an operational thing, you'd use this standard set of tools to operate on those machines the way you do on Kubernetes nodes, say to take that etcd node out of service or to add another etcd node. So another potential way to tackle this is that you can still have a completely custom installer...
That's
says
in
addition
to
all
these
worker
nodes
and
the
normal
master
nodes.
I
know
how
to
create
these
other
hosts
that
are
Etsy,
be
specific
and
don't
have
cubelet
and
I'll
just
create
those
as
special
snowflakes
and
then
later
on.
I
can
still
use
the
cluster
API
and
other
existing
tools
to
work.
On
my
you
know,
strictly
speaking
worker
nodes
or
to
upgrade
my
control
plane
or
all
these
other
things,
but
maybe
you
won't
get
the
full
value
of
the
cluster
API
for
those
EDD
nodes.
That's
just
one
other
way
of
looking
at
it.
H: I think in that case we would sign up to have a more dedicated controller that might be responsible, for instance, for a rolling upgrade of etcd itself. Yeah, I'm just thinking that when we're standing up a cluster there are other roles that might play a part, which you would want to deal with in the same way; but yeah, I recognize it's probably more specific.
C: That's a really interesting use case. One of the other factors that we've been conscious of is that not only do a lot of people want to create a cluster, they also want to create a whole lot of infrastructure around it to support the cluster's needs, and so we've kind of been struggling to draw the line for where this specific API should end and where you should start just opting into other, better tooling for those purposes.
C: I think the last thing we want to do is accidentally re-implement Terraform, which allows you to create completely arbitrary cloud resources and is really good at doing that, but those resources might be completely orthogonal to bringing up your cluster or maintaining it over time. So I don't know exactly where the line is drawn, but that's definitely something we want to stay away from completely, I think.
I: I'm curious: what does node deletion look like from the perspective of a declarative API? Are we imagining that all of the nodes are listed one by one, individually, in this cluster API, or is this just something that's called out as a capability the overall ecosystem needs to have?
C: At this point, on this slide, it was the latter: overall, we should have this ability. To give you a brief preview: when we were originally iterating on how to represent nodes, we started with the concept of grouping. It's just so natural to say we're going to have a lot of them, and a lot of the use cases are going to become obvious, like that I probably want to be able to scale them up and down.
C: But there are also use cases where I just want to experiment with a one-off node, and I don't necessarily want to create a whole set that's just size one. So, and this is absolutely up for debate, the way that I'd been thinking about it so far in the API is to have an API object that declaratively says "I would like a new node to exist, and I would like it to look like this". Right now the concept of a node as an object exists in core, and it has a spec and a status, but it isn't really declarative.
C: It reports the version of the kubelet that it's running, for example, but you can't just say in the spec "well, I'd like to run this version" and have it upgraded. So, if you imagine a node as it exists today in core as essentially the status of the kubelet, I want to create a new object that is actually the specification for it.
C: If you create one of these, it's kind of like creating a pod: at first it's unscheduled, and then ultimately it's bound to a node, and you can delete that pod and remove it from that node. Here, you create this declarative object, which I'm just calling a "machine" at this point, with the spec of what you'd like the node to look like, and then that node should come into existence, if you're in an environment where it can be auto-provisioned; deleting the machine would then delete that node.
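
A rough sketch of what such a declarative machine object could look like as a Go type, pulling together the fields discussed in this meeting (role, OS image, kubelet and container runtime versions); every name here is an illustrative assumption, not the struct definitions in the PR mentioned later:

```go
// Hypothetical machine type, illustrative only.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// Machine declares "a node like this should exist"; a provider-specific
// controller reconciles it into an actual instance joined to the cluster.
type Machine struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`
	Spec              MachineSpec   `json:"spec"`
	Status            MachineStatus `json:"status,omitempty"`
}

type MachineSpec struct {
	Roles            []string `json:"roles"`                      // e.g. "master", "worker"
	OSImage          string   `json:"osImage,omitempty"`          // meaning is provider-specific
	KubeletVersion   string   `json:"kubeletVersion"`             // e.g. "1.8.1"
	ContainerRuntime string   `json:"containerRuntime,omitempty"` // e.g. "docker 17.03"
	ProviderConfig   string   `json:"providerConfig,omitempty"`   // opaque, provider-specific
}

type MachineStatus struct {
	// NodeName would reference the core v1 Node once the instance joins.
	NodeName string `json:"nodeName,omitempty"`
}
```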
F: A question about node creation. One of the things we see a lot in kubicorn, and we saw this in kops, is users who want to bring their own aftermarket logic while bootstrapping a node. A really simple use case is: I have some security tool that I need to yum install before I can run Kubernetes pods, some security logic that users and their sysadmins want to run. We turned that down in kops and said it was an anti-pattern.
E: I'll throw out that Compute Engine VMs have the notion of startup scripts, so this is a really common style of thing. My take would be that we should enable you to pass such blobs through to the controller. Is it worth having a standardized mechanism across all controllers to support that functionality? That's a fair question, right.
F: Like, on one hand it could be up to the controller to care about what the actual implementation of bringing up the node looks like, and on the other hand we could actually define a blob that says "go run these bash scripts, these commands, in this order" or something, yeah.
A: I'll also mention that on GKE we also turned down that pattern and said that we aren't going to support node-specific startup scripts and you should use DaemonSets instead. We initially got some pushback from customers, but recently people have not been asking for startup scripts for node customization anymore.
A: Excellent, I like this. I'll do a quick time check: we have six minutes before the meeting ends, so my proposal is to get through the other three slides on capabilities today and then have Jacob send a PR with the actual API struct definitions, rather than trying to go through those during this meeting, and we can review them next week.
C: The concept of what an OS image is could be different in different environments. If you're in a cloud, maybe it's a single string; in GCE we actually have this nested structure to define a single OS image, with the project and family and an ID; and if you're on premise, your concept of an OS image might be more like "I'm using Debian at this version" plus maybe some other metadata. But it's on a per-node basis, again, kind of.
C: If anyone feels strongly, this could be on a per-group basis instead, or not be considered for this API at all. It seems useful to be able to actually upgrade nodes, though. The way I'd phrased it here, we want the notion of "I want to upgrade this node" to be declarative, but the way that gets actuated could be different in different environments.
C: So if you're in a cloud environment, the controllers that you've set up might just strictly replace the node with a brand-new node conforming to the new spec, bound to our declarative notion of a machine, whereas on premise it could be actuated by apt-get upgrade or some other in-place installation that doesn't actually replace the node. But as a capability, upgrading the operating system, or whatever that definition is in your environment, seems useful, independent of being able to upgrade the Kubernetes version, which would be the actual kubelet.
C: Anything on that before I go to the next slide? Actually, let me blast through just the next two slides, and then we can come back if there are any questions or comments. So, I kind of alluded to this: very similarly, we'd specify the container runtime that you want to use, and its version.
C: I don't know how big a use case it will be to say "I want to switch from Docker to a different runtime", but certainly being able to say "I want this machine to run Docker at this exact version", and being able to independently upgrade that in a rolling, node-by-node way, without affecting the version of Kubernetes or the operating system image, seems useful. This is at a lower priority, though, because it's something you could achieve by making a new OS image, if you're in the cloud, and just having the runtime pre-bundled in the operating system image.
C: Still, I think it's worth calling out as a separate field; it seems like really useful functionality to be able to rev the container runtime independent of the kubelet and independent of the operating system. And as a super low priority, this might be addressed with the concept of bootstrap scripts, or it might be useful from an overall "what does Kubernetes look like as a distribution" point of view, but if we're already versioning the kubelet, we could version the container runtime the same way.
What?
C
C
I
think
this
is
a
less
important
thing
to
immediately
focus
on,
certainly
I
think
the
project
wouldn't
be
as
successful
if
we
couldn't
eventually
rebase
the
autoscaler
on
top
of
it,
but
we
already
have
auto
scaling
for
a
lot
of
different
cloud
providers.
It's
not
new
functionality
that
we're
enabling
it
just
gives
us
kind
of
a
better,
consistent
view
of
nodes.
So,
but
but
if
we
want
to
tackle
auto
scaling,
they
have
these
concrete
requirements
of
being
able
to
do
accurate
predictions.
C: So, instead of just saying "I want this instance type and I would like several more machines of it" and then, once they come up, figuring out how much memory and how many CPU cores they have, the autoscaler needs enough information to actually predict, ahead of time: if I scale up this set of nodes, or create this new node on the fly, what capacity will we have after that?
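
One way to express that, sketched purely as an assumption (nothing like this was settled in the meeting): a provider-agnostic machine class could carry the capacity the autoscaler needs for its scheduling simulations before any instance exists.

```go
// Illustrative only: a provider-agnostic description of what a fresh node
// of a given class would provide, so an autoscaler can simulate scheduling
// before creating the instance.
package main

import "fmt"

type MachineClass struct {
	Name     string
	CPUCores int
	MemoryGi float64
	GPUs     int
}

// fits reports whether a pending pod's request would fit on a brand-new
// node of this class (ignoring daemonset overhead for brevity).
func (c MachineClass) fits(cpuCores int, memGi float64) bool {
	return cpuCores <= c.CPUCores && memGi <= c.MemoryGi
}

func main() {
	classes := []MachineClass{
		{Name: "small", CPUCores: 1, MemoryGi: 3.75},
		{Name: "gpu", CPUCores: 8, MemoryGi: 30, GPUs: 1},
	}
	for _, class := range classes {
		fmt.Printf("pod(cpu=2, mem=4Gi) fits on a new %q node: %v\n",
			class.Name, class.fits(2, 4))
	}
}
```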
C: That lets it decide what it needs to scale right now and, instead of being completely ignorant of cost and just scaling up whatever looks like a good machine, you could actually have weighting mechanisms to say "try these, they're the most efficient in this case right now". Okay, so I've raced through those, and we only have a few minutes left, but does anyone want to add more capabilities that they feel strongly about, or drill into any particular one of these and discuss their needs?
C: I can get into it; there's a lot of technical detail there. The way that I was thinking about it was making sure that we wouldn't build any sort of pricing into the node portion of this, the machines API, but that maybe we support all of the provider-agnostic inputs the providers need in order to be able to model the price the way they do today. Otherwise it's kind of a hard stop for them, and they'll never rebase on top of this API.
C: I think we're at time. If there are any other concerns, reach out to any of us; I think the best place is probably, well, we can certainly wait until next week's meeting, but there's also a conversation still going on on the cluster lifecycle mailing list and in our Slack channel, and certainly reach out to any of us individually. I'm going to stop sharing my screen and stop the recorder.