From YouTube: Kubernetes SIG Cluster Lifecycle 20180214 - Cluster API
Description
Meeting Notes: https://docs.google.com/document/d/16ils69KImmE94RlmzjWDrkmFZysgB2J4lGnYMRN89WM/edit#heading=h.xvr33m5suu00
Highlights:
- CRDs vs. aggregated APIs with @erictune
- Splitting API groups
- Type for provider config
- Terminal vs. transient errors
- Container runtime in machine spec
A: Hello, and welcome to this Valentine's Day edition, February 14th, 2018, of the SIG Cluster Lifecycle Cluster API breakout working group. Today we are lucky enough to have Eric Tune joining us. After our conversation last week about CRDs versus API aggregation, we looped Eric in to come and continue that conversation and hopefully get to a good conclusion, with both short and long term answers for the future of the project. So, I don't know, Eric...
B: Okay, so my pitch is: use CRDs whenever possible. CRDs have way more users than aggregated APIs. Here's a bunch of people that are using them and seem to like them. I've talked to most of these people, I work with them very closely and talk to them at least weekly about their needs around CRDs, and I am highly motivated to keep them happy, as they are close peers. I think other people in the Kubernetes space outside of Google are also excited about CRDs and share a motivation to support that use case, things that easily install on your Kubernetes cluster. Also, a lot of people are building operators, and there's much other peripheral tooling. So a lot of people are succeeding with CRDs; that's the momentum I just talked about. The next point is the amount of code that you need to write.
B: If you just want to do basic validation and define a schema, then you write no code to use CRDs, except for your controller, which you're going to write either way. If you want to do fancy things with validation, then you end up writing some code for that, and it's similar to an aggregated API server.
B: Then you're going to be in the noise in terms of the API server and etcd load from your resource, so you shouldn't worry about that, and there's one less moving part in your cluster in that case. So they can do a lot; currently they can do quite a bit of validation. Some examples are min and max values for fields, pattern matches for strings, uniqueness requirements, oneOf. If you follow that link to OpenAPI v3, you'll see there's a ton of things that you can do without writing any code, just a schema.
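For reference, the schema-only validation Eric describes can be expressed entirely declaratively. Below is a minimal sketch using the apiextensions v1beta1 Go types of that era; the machines.cluster.example.com group and the replicas/version fields are illustrative, not the project's actual definitions.

```go
package example

import (
	apiextensionsv1beta1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// float64Ptr is a small helper for the *float64 fields in JSONSchemaProps.
func float64Ptr(f float64) *float64 { return &f }

// machineCRD declares a CRD whose validation is pure OpenAPI v3 schema:
// min/max on a numeric field and a pattern match on a string field,
// with no admission code written at all.
var machineCRD = apiextensionsv1beta1.CustomResourceDefinition{
	ObjectMeta: metav1.ObjectMeta{Name: "machines.cluster.example.com"},
	Spec: apiextensionsv1beta1.CustomResourceDefinitionSpec{
		Group:   "cluster.example.com", // illustrative group
		Version: "v1alpha1",
		Scope:   apiextensionsv1beta1.NamespaceScoped,
		Names: apiextensionsv1beta1.CustomResourceDefinitionNames{
			Plural: "machines", Singular: "machine", Kind: "Machine",
		},
		Validation: &apiextensionsv1beta1.CustomResourceValidation{
			OpenAPIV3Schema: &apiextensionsv1beta1.JSONSchemaProps{
				Properties: map[string]apiextensionsv1beta1.JSONSchemaProps{
					"spec": {
						Properties: map[string]apiextensionsv1beta1.JSONSchemaProps{
							"replicas": {Type: "integer", Minimum: float64Ptr(0), Maximum: float64Ptr(100)},
							"version":  {Type: "string", Pattern: `^v\d+\.\d+\.\d+$`},
						},
					},
				},
			},
		},
	},
}
```

Everything above is data rather than code: the API server enforces the min/max and pattern constraints itself, which is the "write no code" point being made.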
B: So the biggest gap we know people have asked for is multi-version support. We're trying to squeak something out for 1.10 that will at least allow you to promote from, say, v1alpha1 to v1beta1, or to v1, whatever, when there are no changes in your schema, just communicating "this is more production ready." And then, probably not in 1.10 but by 1.11, we're going to have at least the ability for you to rename fields as you go from version to version, and then we definitely will eventually have an escape hatch where you can do arbitrary transformations. It's probably worth having a conversation, if you guys want, about how versioning and storage sharing works in Kubernetes, if that's not clear to everyone.
A: No, why don't you finish the presentation? Then you can go back and I'll start questions.
A: Right. So if we go back to slide 3, we were talking about benefits and trying to figure out how those apply to us in particular. On the forking and rebasing code point: we found that there's an apiserver-builder, which allows you to auto-generate most of those, you know, six thousand lines of code, which I think significantly decreases the pain of forking and maintaining your own code base and gets rid of the rebasing pain entirely. And some people last week mentioned that rebasing can be rather difficult with the API server.
A: Yeah, that's good to know. So, for running a separate etcd: you talked about how, if the number of clusters or machines is small compared to pods, that's in the noise in terms of storage. But one thing we've talked about is that, for reliability and availability, we actually would prefer to have it on a separate storage environment, so that being able to figure out the desired state of your cluster is always possible.
A: And I think that the main Kubernetes API server is more likely to break, especially as everybody else starts using CRDs, which increases the chances that it becomes fragile in unpredictable ways. Having the machines and cluster API in a separate etcd, one that's less likely to fall over if we're not doing proper garbage collection of leaking Job objects or something, seems like a good reliability story for us.
B: That does make sense; I think that is a strong, good reason to want to have separate storage. I'll point out another thing you can do, which I maybe should have talked about in these slides: you can run the latest release of the Kubernetes API server, turn off the other APIs you don't need, basically have no nodes, and install your resources as CRDs inside that API server. So again, you're actually not owning any code.
A: OK. And the last one, I think, lower net memory requirements, is tightly coupled to that same deployment scenario, where that's not necessarily a benefit to us: regardless of whether it's a forked API server or our own with all of the default APIs turned off, it's roughly the same footprint in terms of resource usage. Yeah.
A: And I think that's the conclusion we came to last week: that short-term, tactically, for a number of reasons, including the fact that a lot of stuff that's on the CRD roadmap isn't there today, and we want to have stuff working today, it makes sense to continue with an aggregated API server in the short term.
A: I think what we're really looking at is: where is the long-term convergence? Do we want to always have a separate aggregated API server, or do we want to converge back with the desire to have people using CRDs? Even if we have a deployment model where we run our own API server binary, if it's not a forked API server, if it's just an API server with CRDs installed...
A: It's sort of both, right? So there's a discussion on a GitHub issue, and I think Chris was driving this, about whether machines and clusters should be namespaced. What we decided on was that we would allow them to be namespaced, and that, by convention although not enforced, if they're in the default namespace, that means they're the local cluster's resources, and if they're in other namespaces, they could represent remote clusters.
A: So "kubectl get machines" hits the same aggregated API endpoint; it'll go to two different API servers and give you back the right resources. That's the most common case, but we wanted to leave the door open for the case where you'd be running controllers to manage things remotely, because it sounded like there are a number of people in the working group for whom that would be very useful down the road. Okay.
A: The other thing I wanted to follow up on: it sounds like, since we're pushing most people to use CRDs, that's where a lot of the community momentum is going to be. Since, at least for the short term, we're going to be using API aggregation, I want to make sure that that is a well supported path. If everybody else jumps off of it and we are the only people still using it, I worry that we're going to end up becoming the de facto owners, and I really don't want to have to own, you know, the apiserver-builder, or the whole use case of running your own aggregated API server. So I want to make sure that remains a first-class, supported deployment model, at least until CRDs have feature parity with aggregated APIs, I think.
B: It will be. I think your larger risk, from my standpoint, is that, given the additional freedom that owning your own code gives you, you decide to set off in your own direction, and your API deviates from the conventions that are possible in CRDs; and then we layer on additional structure in the project, and then you're left figuring out how to rebase onto those new conventions, and the newer clients don't like to talk to you, and you can't take advantage of, you know, a lot of stuff in the future.
B: I mean, when I was talking about bringing the apiextensions API to GA this year, that would be an even stronger commitment to it being possible to use the extension API server mechanism. But that's not quite what you're saying, no: you really want the API server libraries to continue to be usable to build your own binary.
E: First of all, I very much appreciate the work you are doing to make adding API objects easier. But why are we doing CRDs, and not making the API machinery make API extension servers easier, right? I mean, could we not get most of those benefits by having, you know, a super smooth build, or by having, as we talked about, that etcd being able to inject into an existing etcd for the GKE scenario, that sort of thing?
B: So, two reasons, I guess three. One is: you've been in the community for a long time, Justin, so you understand a lot of the nuances of Kubernetes APIs. Versioning, style, apply, all that stuff takes a long time to build an understanding of, and the way that we keep people on that golden path is by giving them fewer choices, and CRDs give them significantly fewer choices. We want the Kubernetes API platform to be cohesive, and we want to tell people to start on the easiest path. You guys are pros, so I'm comfortable that you're going to wander off and use the hard thing and not screw it up, but with most people I would not be comfortable. And then the third reason is that we want to be able to bring new features without you having to rebase. So, for example, we're going to move apply to the server side; we can't do that if, you know, you guys don't rebase to pick up those changes. Or, like, the chunked APIs: you'd have to rebase to pick that up, while with CRDs you'll just get chunking for free, because you're just installing a declarative definition. So those are the three reasons why we like that path. But we realize you need an escape hatch, and I'm committed to there being some kind of escape hatch. There may be some rebase pain on that escape hatch, but it'll always be possible to do that.
B: A trick you can do: if you start on a CRD and you want to move to aggregation, you can install a facade that then goes back and, you know, talks both to its own storage and to the existing CRDs, and then manages the migration itself. I haven't figured out all the details, but there's no reason your aggregated API server can't go back and look at the old resources; you might have to bump the version or something like that.
C: That kind of approaches a topic I was curious about: when and if CRDs become feature-sufficient for what we need, and we just want to run the standard API server with all the other API groups turned off, is there any sort of defined transition plan from aggregated APIs, as long as they don't violate certain things, to CRDs?
B: We don't have a way... like, we're not sure we'll ever have arbitrary subresources for CRDs, or do weird things in, you know, the storage layer, or try to guarantee atomicity across, like, adding multi-object transactions, which you can do because you've modified the code. Those are things I can't help you with if you do that. I'm not sure that's answering your question.
B: I'm trying to keep people on CRDs. I can't anticipate all the things that could go wrong that would prevent you from moving, so I can't write that doc, which is why I'm trying to discourage people unless they have a strong reason. Most people are using CRDs, so mostly my effort goes into keeping them there and understanding why you aren't there. Okay.
B: No... Phil Wittrock, who's one of the people on that, has a thing called Kubebuilder that he's working on. He hasn't announced it yet, so I shouldn't have stolen his fire, but he's actively working on the next generation of it; I don't know what his plan is for it.
A: OK, it's reassuring to know that the guy working on it is still working on it. That was my point before: I don't want us to become the owners of it just because it's open source, right? That's not the primary mission of our SIG, and it's not really our value-add for the community. And so we'd really like, maybe not just Phil but the API machinery SIG, to sort of commit to owning that going forward.
A: What we're hoping is that they own it long enough, and give us a transition path off, so that we can be on the happy path once CRDs are feature complete, like Chris was saying. I don't think we're planning on doing anything that will prevent us from doing that; we just need new features that don't exist there today.
A: Thanks, Eric, I think this is really useful, and it helps bridge our connection, or at least starts to bridge our connection, with API machinery. So, like you said, we're going to have to become friends with them going forward, to make sure that this stuff continues working as we expect. Yeah.
C: That's a great idea, because currently the machines depend on input from the cluster type, and splitting it out may turn into a versioning nightmare where, say, machines v1beta1 needs at least cluster v1alpha2 or something like that. I was just wanting to understand that; happy to take the discussion to the issue if it was discussed ad nauseam last week.
C: It is a different type, yes, but it is closely related to the cluster, and I think you need... there are basically some cluster-level configurations, ones that need to be known cluster-wide, that the machine has to pull from. Like, the pod CIDR has to be known cluster-wide; so, you know, if you're using kubeadm, it's the tenth offset of that pod CIDR for DNS, and every node needs to know that. Yeah.
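As an aside, the "tenth offset" convention is simple arithmetic over the CIDR base address. Below is a rough sketch; note that kubeadm actually derives the cluster DNS address from the service CIDR, and nthIPInCIDR is a hypothetical helper, not kubeadm's code.

```go
package example

import (
	"fmt"
	"net"
)

// nthIPInCIDR returns the nth address inside the given CIDR block,
// e.g. nthIPInCIDR("10.96.0.0/12", 10) -> 10.96.0.10, the usual
// cluster-DNS address kubeadm derives from the service CIDR.
func nthIPInCIDR(cidr string, n int) (net.IP, error) {
	_, ipnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, err
	}
	ip := ipnet.IP.To4()
	if ip == nil {
		return nil, fmt.Errorf("only IPv4 is handled in this sketch")
	}
	// Convert the base address to an integer, add the offset, convert back.
	base := uint32(ip[0])<<24 | uint32(ip[1])<<16 | uint32(ip[2])<<8 | uint32(ip[3])
	base += uint32(n)
	out := net.IPv4(byte(base>>24), byte(base>>16), byte(base>>8), byte(base))
	if !ipnet.Contains(out) {
		return nil, fmt.Errorf("offset %d falls outside %s", n, cidr)
	}
	return out, nil
}
```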
G: In our project we currently have, more or less, the notion of a project, which is nothing less than a namespace, and inside of this project, inside of this namespace, you could have multiple clusters. So, at least from the way we actually do our deployments of Kubernetes clusters, we really want to keep this notion.
A: I think that makes sense. I think we were trying to keep them very loosely coupled, but it sounds like they're not semantically loosely coupled anyway, so making the connection explicit makes a lot of sense, because then you can easily tie them together. And it gives you the flexibility to have more than one, which right now you could do but it just doesn't make sense; if you tie them together explicitly, then you can have more than one and it actually does make sense.
A: So I think in that case we should probably close the issue that's linked here, that was added to the alpha milestone, and replace it with an issue to explicitly tie the resources together, and mark that for the alpha milestone. I think, Devon, you volunteered to create the new issue, yes? Once you do, if you could ping either Rodrigo or myself; if you don't have permission to add milestones, we will do that. Okay, sure.
A: Right. So next, there was an issue that was extracted from the initial machines PR, about changing the type of providerConfig. I just wanted to mention this briefly and ask if there are any objections. I think so far I've only seen people saying yes, both on the initial PR and on this issue, but I wanted to put it in front of folks to make sure nobody was saying no before we just went ahead and did it.
G: And I'm not sure that the client automatically handles runtime.RawExtension. It's like a struct with the raw data inside of it, and there's also an optional Object field, and I'm not sure that the client, even the automatically generated ones, populates this Object. So we have to do some magic with the callers.
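For readers following along: runtime.RawExtension carries raw JSON bytes plus an optional typed Object that generated clients generally leave nil, which is the concern here. A minimal sketch of the pattern under discussion follows; MachineSpec and GCEProviderConfig are illustrative shapes, not the project's final types.

```go
package example

import (
	"encoding/json"

	"k8s.io/apimachinery/pkg/runtime"
)

// MachineSpec embeds opaque, provider-specific configuration.
// Only RawExtension.Raw is reliably populated by generated clients;
// the typed Object field is generally left nil, which is the concern
// raised in the discussion above.
type MachineSpec struct {
	ProviderConfig runtime.RawExtension `json:"providerConfig"`
}

// GCEProviderConfig is a hypothetical provider-specific payload.
type GCEProviderConfig struct {
	Zone        string `json:"zone"`
	MachineType string `json:"machineType"`
}

// decodeGCEConfig shows the "magic with the callers": each provider's
// controller unmarshals the raw bytes into its own typed struct, so
// the generic API server never needs the provider types registered.
func decodeGCEConfig(spec MachineSpec) (*GCEProviderConfig, error) {
	var cfg GCEProviderConfig
	if err := json.Unmarshal(spec.ProviderConfig.Raw, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}
```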
G: Yeah, I mean, internal versus external versions only matter for the API server, more or less, because otherwise there's nothing to do the conversion. And I don't think the external version... I mean, by default the converter, the decoder, actually complained, I think, that it cannot be coerced to a runtime.Object interface, more or less.
C: Yeah, I'm a little concerned with having to register all the provider config types with a generic API server. I'm not against it; I just need to be convinced that that's not going to be an issue. Like, if I'm just deploying, say, to an AWS cluster, I don't want to have to have the Azure types, the GCE types, and all the other types registered with that API server.
A: If we're just registering objects, it's probably okay. If we're passing extra flags, that seems a little bit less flexible, because I shouldn't have to restart the API server if I want to manage something different: you basically install a different machine controller into the cluster, and that shouldn't require restarting the API server.
D: Right, yeah. I have limited experience on this, but I feel like the aggregation is done by a plain resource path registered with the main API server, and it doesn't necessarily know all the details about the object; it just forwards all the requests for anything in front of it, from the aggregator API server to the extension.
A: Excellent. All right, if there's nothing else there, we'll move on. I was going through the API definitions last night and came across a number of things that I wanted to bring up during the call today; some of these, hopefully, will be somewhat quick, hopefully most of them if not all of them. So the first one is terminal versus transient errors. Looking at the machines API, the documentation basically reflects that we decided not to use conditions, because Eric Tune and Brian Grant suggested that conditions were on their way out and we shouldn't be using them for anything new, and the new way you're supposed to do things is rolling errors up into the top level of your status objects. And so we have two error fields.
A: We have an error reason and an error message, and the documentation says they should not be set for transient errors that you're expected to recover from; they should only be set for terminal errors, which is a case that you're expected to never, ever get out of. And then, if you look at the documentation for the actual reasons there could be errors, many of them say "this is a transient error," which seems contradictory with the documentation of what they're supposed to be set for, which is a terminal error that is not actually transient.
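The two fields in question look roughly like this. The sketch below approximates the early machines API as described on the call; names and doc comments are paraphrased, not copied verbatim.

```go
package example

// MachineStatusError is the machine-readable "reason" enumerating why a
// machine entered an error state.
type MachineStatusError string

const (
	// Documented with examples like "timeout connecting to GCE", which the
	// discussion points out reads as transient rather than terminal.
	CreateMachineError MachineStatusError = "CreateError"
	UpdateMachineError MachineStatusError = "UpdateError"
	DeleteMachineError MachineStatusError = "DeleteError"
)

// MachineStatus rolls errors up to the top level instead of using
// conditions. Both fields are pointers: nil means "no terminal error".
type MachineStatus struct {
	// ErrorReason is meant to be set only for terminal problems the
	// controller does not expect to recover from.
	ErrorReason *MachineStatusError `json:"errorReason,omitempty"`
	// ErrorMessage is the human-readable counterpart of ErrorReason.
	ErrorMessage *string `json:"errorMessage,omitempty"`
}
```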
A: In addition, in the proposed machine state diagram, which we've talked about a couple of times on this call, we purposely did not put in any sort of terminal error state. That's something we learned from GKE, where we do have a terminal error state in our internal state diagram for clusters, and we've found that it can be kind of a painful situation to be in: because there is a terminal error state, you can go there and there's really no way out, and in a lot of the cases, if you look at the different types of errors, they are actually potentially recoverable, right? So, for a create machine error, the example given is "timeout connecting to GCE," which to me sounds like: if you try it again, maybe the service was just down, GCE would come back up, and you actually would be able to create a machine at some point in the future.
A: So it's not clear to me, if we don't have conditions, which seem to represent states in a state machine, how we want to use these error fields: what information do we want the error fields to convey back to clients of the machines API? I'm going to stop talking and let somebody else jump in here.
A: Oh, another thing that I had on my list of things to talk about, maybe I'll bring it up right now: should we expose explicit state machines? Daniel Smith's feedback, in the document that I've linked to, is that they decided not to do that with some of the other Kubernetes types, because they thought that once those states were exposed, and the transitions between those states were exposed, trying to change the flow between states would constitute a breaking API change.
A: I know that if you look at some of the other machine management tools, like the ones that you folks have, they've got two fields that represent state: one field that represents the steady states, and another field that represents the most recent transition. I don't know if you have tried to change what those states are, and whether you found that it caused any sort of compatibility problems with clients or not.
A: Oh, I mean, we have a sort of implicit notion of a node being ready, right? You have a machine whose create causes the node to be created, and the node goes into a ready state. And if you look at the MachineSet API that went in, kind of like ReplicaSet, it's got fields for a number of ready pods and a number of available pods, which is slightly different, right?
A: So we can sort of take that implicit state, from the kubelet reporting status and saying that the node is ready, as a state machine. But in the machine state diagram, that is one of the potential states, and it actually represents more than one potential state, right? Because if a node is saying it's ready, it could be in the serving state or it could be in the drained state, and the kubelet will still report ready either way.
A: If it's in the drained state, you sort of infer that, because it'll tell you that it's been cordoned, but it doesn't actually tell you whether it's been drained or not; it just tells you that, having been cordoned, it's not schedulable. And so I think we have the potential to have a more refined state machine that we could expose, and we'd have to make sure we reconciled that with the actual state on the node. Oh yeah, and right now it's not explicitly exposed, it's sort of implicitly exposed.
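Concretely, cordoning is the only one of these states the core API records directly: it sets the node's unschedulable bit, and draining leaves no additional marker. A minimal sketch:

```go
package example

import corev1 "k8s.io/api/core/v1"

// isCordoned reports whether the node has been cordoned. Cordoning
// (kubectl cordon) sets Spec.Unschedulable; draining additionally
// evicts pods but leaves no dedicated marker on the Node object, so
// "drained" cannot be distinguished from "merely cordoned" here.
func isCordoned(node *corev1.Node) bool {
	return node.Spec.Unschedulable
}
```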
E: One way to solve this would be to drive it by the use case of the MachineSet: say what exactly does a MachineSet need, and that is what we should make sure we put on the Machine. And then maybe we find that the error text, or whatever, is not even used, so it's purely informational. Yeah.
A: I mean, it was put there because we needed a place to surface errors initially, right? So take create machine: if you try to create a machine that is just completely invalid, some of those cases you can catch during input validation stages and some you can't, right? You go out to the cloud provider and it says "sorry, this doesn't work," and that was sort of a way to surface it back to the user.
A: So you'd say, like, we aren't going to bother trying again. But I think some of the error types that crept in there are things where we should be trying again, right? If you're out of resources, that could be a transient state; if there's a stockout, that sounds like a transient state to me. Those things do get resolved, or could get resolved, and you need to bubble up the fact that it's not working right now somewhere, right? I think that's what conditions were used for. Events?
A: Yeah, the documentation says we'll produce events, but events are not really reliable for higher-level controllers to be built on top of, right? Events are more transient; they don't have the same guaranteed storage as your status field does, so it's easy to miss events. Right, yeah.
E: I guess the point is that the ReplicaSet doesn't do anything with the knowledge that a pod's image can't be pulled, right? It won't behave differently, and so the information is currently only of value to humans who are looking at the events, and maybe to our system in the future. But I guess the question is: could a ReplicaSet do something smarter with that information, and, more pertinently, could our MachineSets do something smarter with it?
J: OK. So, on the discussion about conditions: I added a link to the meeting notes as well, with some history on conditions and why or why not to use them. The thought process, after reading quite a bit yesterday, is that you would use events for anything that is transient to the controller, where the controller is retrying and we just want to surface something about what it's actually doing when failures are happening.
J: That actually is right, although, yeah, if you try again later you might be able to fix it. The same goes for any internal errors, you know, internal cloud errors or stockouts, things that might not be expected. If they keep happening, eventually the controller gives up, and that's kind of the thought process that was followed with the MachineSet, and we can revisit it and make it consistent across everything as needed.
A: Should it give up, though, or should it just back off and retry at low frequency? I think, in both those cases you mentioned, we shouldn't give up and say we're never going to try to create this machine as part of a machine set again. Maybe the difference is that on machines we say "we tried to create the machine and we gave up," and that allows the machine set to try again later, and it can do the backoff. And I think that was Justin's point: what fields do we need for our automation to work?
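If the machine controller backs off rather than parking in a terminal state, the stock apimachinery helper is one way to express that. A sketch of the behavior being proposed; createMachine is a hypothetical stand-in for the actuator call:

```go
package example

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

// reconcileWithBackoff retries a hypothetical createMachine call with
// exponential backoff instead of marking the machine terminally failed.
// If the budget is exhausted it returns wait.ErrWaitTimeout, and a
// higher-level controller (e.g. the machine set) can try again later.
func reconcileWithBackoff(createMachine func() error) error {
	backoff := wait.Backoff{
		Duration: 5 * time.Second, // first retry delay
		Factor:   2.0,             // double the delay each attempt
		Steps:    5,               // give up (for now) after 5 tries
	}
	return wait.ExponentialBackoff(backoff, func() (bool, error) {
		if err := createMachine(); err != nil {
			// Treat the failure as transient: swallow it and retry.
			return false, nil
		}
		return true, nil
	})
}
```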
A: There are two things. One is: how do we get error messages out to users so that they can manually fix things if necessary, like if there's a quota problem? In the same way that, if you try to pull an image that doesn't exist on a pod, your replica set won't scale up, and you can look at it and figure out why and fix it. So we need a place to bubble up the things that people might need to go fix out-of-band, and then we need to have enough fields there to build the higher-level tools. Maybe the fields there are sufficient for both of those, and we just need to update the documentation to make it less confusing; but what I found reading through it was that it sounded sort of self-contradictory. So maybe the answer is: we keep building the machine sets, start looking at machine deployments, and figure out if the fields are correct, and if so, let's just make sure the documentation is clean. Yeah.
A: Cool. So, next thing: we have the container runtime in the machine spec. It's used in a field that's also in the machine status, and we probably don't want to change that part, because I think it is useful to have in the machine status. But in some recent discussions I've had with folks from SIG Node, it sounds like the future direction for container runtimes is that they are going to be tightly coupled to, and bundled with, the underlying operating system.
A: So, for example, if you're running RHEL, you might get CRI-O as your container runtime; if you're running Ubuntu, you might get Docker; and if you're running CoreOS, maybe you get rkt. But the intention wouldn't be to go and install Docker on CoreOS; it would be: we get CoreOS, and it has rkt.
A: That's been validated as a functional pair of OS plus container runtime. All the container runtimes implement the CRI interface, so the kubelet doesn't really care too much what's running underneath it, as long as it passes the node validation. Right now we have, in our declarative API, a way to specify "I want this container runtime with this kubelet," and it's starting to sound...
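The declarative knob being referred to looks roughly like this; a sketch approximating the shape of the early cluster-api version fields (names illustrative, not a verbatim copy of the project's types):

```go
package example

// ContainerRuntimeInfo pins a specific runtime; the discussion above
// suggests most users would leave this empty and accept whatever the
// OS image bundles (CRI-O on RHEL, Docker on Ubuntu, rkt on CoreOS).
type ContainerRuntimeInfo struct {
	Name    string `json:"name"`    // e.g. "docker", "crio", "rkt"
	Version string `json:"version"` // e.g. "17.03"
}

// MachineVersionInfo is the per-machine version request in the spec;
// the kubelet version is what users mostly care about, while the
// runtime is increasingly an OS-level detail behind the CRI.
type MachineVersionInfo struct {
	Kubelet          string                `json:"kubelet"`
	ContainerRuntime *ContainerRuntimeInfo `json:"containerRuntime,omitempty"`
}
```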
J: How pervasive, you know, are images with the runtime baked into them? If you go to any cloud provider, or if I have, like, an on-premises setup, and anyone started using our image... do we have a majority of images with the container runtime today, such that if I were to pick a random one, I would get one? That's kind of where I'm going with this question.
E: I can answer from my experience on AWS, which has a pretty broad range: most of the stock images, Debian, Ubuntu, RHEL, CentOS, do not include a runtime at all. Some of the ones which are, like... kops, I mean, builds one specifically, which bakes in the relevant version of Docker, but that is something we do ourselves, and it's purely an optimization to speed up boot.
E: Most of them do not have Docker built in. The complexity is that, I think, Docker's speed of versioning doesn't match up with the lifespan of OS releases very well, so it is a pain for people to bundle it. I think there are exceptions: I think Red Hat has Atomic, right? I'm sure their image does. CoreOS includes Docker, because you basically can't install software on CoreOS, or whatever they're shipping these days, I don't know. But in general, yeah.
A: I think that's true: the state of the world today is that, if you ask for just an OS from Amazon, you're not going to get one that has a container runtime, so you have to install one; and maybe for something like CI it's important to be able to specify a version. I guess what I'm saying is, looking forward, I think what we're going to see, as projects like wardroom from Heptio get spun up, and as people build more things like Atomic or CoreOS that are container-optimized operating systems, is that we'll start to see a tight coupling. And if you say "give me an Atomic image and install rkt on it," it's just not going to make any sense, because Atomic is going to come bundled with a version of Docker, and that's the version of Docker that works.
A: That's interesting. So I think there is a provision in at least one of those fields now; I mean, you can leave them all blank and you basically just get the defaults, right? So maybe we say, by convention, we expect most people to leave these things blank and just take the defaults, but there are some escape hatches where you might want to specify a specific version, and some machine controllers will just reject some or all of those requests if it's not supported. Yeah.
A: So the comment for that field basically says we are copying it over from the node, because that puts it in the same structure and format in the spec and status, which makes it really easy for controllers; and the field names are all different in the node object, so if you follow the reference, from an end user's point of view, someone trying to introspect machines and nodes, it makes things difficult, because the fields are inconsistently named. That basically puts the burden on the controller to do the mapping: "I understand what's reported in the node object, and I'll map that over to the machine object," to make it easier for consumers of the machine API. And that's another decision we could make: let's change the burden and put it on the user, because the information is already surfaced in the API elsewhere and people should just follow the reference.
A: So, like Justin was saying, if you're on Amazon and you say "give me a stock Ubuntu and I want this version of Docker installed," you can put that in the provider spec, or, you know, the machine controller could just pick the right version based on that OS. You can encode that logic of how to choose it at runtime in a couple of different places, and the provider spec would be one place you can put it.
J: Yeah, I think, just from the user's point of view of the API: if this is something where, like, I'm doing my own image, or I have this image I've been using, and I need to create a new image and bake in the Docker version or any other software just to satisfy the API, that's one extra step and more work for them, and that's just another consideration, right? If the majority of users, you know, could do that, or wouldn't need to do that, then that's something to consider.
J: I think we're converging, then: at least supporting an image that comes with the runtime is something we certainly want to do, and probably have some sort of node spec that we're going to build against, to make sure the runtime is in place, and so on. I think we're not sure if we're going to keep the installation code that installs onto the images without all that, right?
A: Yeah, I think that's the other thing I was kind of getting at here: most people just say "give me a machine with this version of the kubelet," and because we are abstracted from the container runtime through the CRI interface, and because it looks like we're moving towards having it tightly coupled with the OS, that's something that users probably shouldn't care about.
A: Chris also points out in chat that the users probably don't care, but ops folks probably do care about this spec. And to be clear, the users I'm talking about are the ops folks, right? The primary users of the cluster API are the ops folks, not the application developers. So if we think that the ops folks do care, then maybe it is worth leaving in. Yeah.
A: We're trying to leave the API flexible enough to do both, and let machine controllers decide if they want to try to implement both. Certainly blue/green is easy, because you can always create new machines and delete old machines; you can orchestrate that at the higher level. In-place is more difficult, and I think most people have not implemented in-place upgrades for Kubernetes today, but I don't want to close the door to doing that later; I think there are some valid use cases where we want to do that. Okay, thanks.
A: All right, I had a couple of other things to discuss, but it looks like we're just about out of time, so I'm going to punt those to next week, and thank everyone for coming; we will see you all again soon. A few people had action items to go and open issues: please make sure you follow up on those. And certainly, if people want to keep chatting, we've got Slack and email and so forth. So thank you, everyone, for coming, and we'll see you again soon.