From YouTube: TGI Kubernetes 182: Karpenter
Description
Join Naadir Jeewa and Marko Bevc of The Scale Factory to discuss the new Kubernetes cluster autoscaling solution Karpenter ( https://karpenter.sh/ ).
Episode notes at: https://tgik.io/notes-182
Photo by Ricky Kharawala on Unsplash
A: Hi, so do you want to introduce yourself?

B: First, thank you for having me. My name is Marko Bevc; I work as Head of Consultancy at The Scale Factory. We're an AWS partner, and we work in the AWS space, closely with containers and containerized workloads, where we help our customers achieve more with their cloud workloads. So yeah, that's me.
A: Cool. And if you're not familiar with who I am: I've been working on Kubernetes Cluster API for AWS for a long while. Full disclosure, Marko and I used to work together, and we'll be talking about Karpenter because Marko's presented on Karpenter a few times, so he's joining as a subject matter expert. I'm currently an engineering lead for Tanzu Kubernetes Grid at VMware.

A: Let's see who's around. Happy Friday! Someone from Israel, from home — thanks, happy Friday to you. Hi Martin from the Netherlands; from Saudi Arabia; Egypt. Cool. So, as ever, our first thing — oh, I should post a link to the notes. The notes are going to be at tgik.io/notes-182, and — let's check — okay, yep, that's the notes, so feel free to edit in real time. So the first thing we're going to do is our week in review.
A
If
I
can
find
the
correct
things
now,
the
banners
that's
white
one.
So
there
we
go
all
right,
let's
so
first
things
first
in
week
in
review,
so
from
kubernetes,
quite
we've
had
a
couple
of
new
patch
releases.
One
thing
we've
that's
happened
in
the
last
week
is
docker
shim
support's
been
with
me.
I
probably
should
share
my
screen.
Let's
do
that
hold
on
let's
there
we
go.
A
There,
okay,
so
dr
shim
support
has
moved
from
cuba
adm
now.
Some
of
you
might
remember
the
sort
of
semi
controversy
around
this
when
it
was
first
announced
about
a
year
ago.
I
think
that
we
would.
This
was
going
to
happen
in
the
kubernetes
project
and
there
was
a
lot
of
confusion
around
what
that
actually
meant.
A
Yes,
so
don't
panic
is
the
answer,
so
doc
is
not
going
to
go
anywhere.
You
can
still
build
your
containers
using
docker,
and
what
we
mean
by
this
is
the
using
docker
on
its
own,
as
the
container
run
time
for
kubernetes
is
going
to
go
away
and
in
mostly
in
favor
of
using
container
d
actually
for
most
what
this
means
for
most
people,
which
docker
ultimately
uses
anyway.
So
really
nothing
really
changes.
It's
just
we're
just
removing
one
of
the
middle
people.
A
I
guess
in
between
like
container
d
and
kubernetes,
so
don't
worry
still
we're
still
expecting
people
to
use
their
favorite
tooling
to
create
images.
So
just
in
case
you
might
see
that
in
your
panic,
so
well
so
just
notice
in
the
cncf
sandbox
and
we
had
a
new
project
join
and
that's
devfile.io
which
I'd
not
seen
before,
but
I
think
this
is
around.
A
I
don't
know
if
you
okay,
what
was
the
equivalent
back
in
the
docker
days?
What
was
it
there
was
that
method
of
running
you
had
like
a
yaml
file
and
it
would
run
a
bunch
of
docker
containers
and
you
stick
it
in
your
repo
who
remembers
that
I've
totally
forgotten.
A
Jesus
anyway,
it's
it's
a
replacement
for
that
and
also,
I
think,
it's
also
a
bit
from
what
I've
seen
it's
a
bit
like
tilt.dev,
so
which
is
something
we
I've
used
a
lot
until
we're
using
cluster
api.
It
it's
a
mixture
of
actually
using
stala
compost.
That's
right!
Darker
compost,
that's
the
one!
Yes!
So
this
is
kind
of
like
docker
compose
and
the
bit.
I
think
there's
been
a
couple
of
other
similar
sorts
of
projects
as
well.
So
it's
another
one
in
that
space.
So
let's
join
the
cncs
sandbox
people
interested.
A
We
might
do
tg
ak
on
this
might
take
give
it
give
it
a
spin
find
out
what
it's
all
about.
What
I'm
really
familiar
with
is
tilt
which
uses
the
starlark
language,
and
we
use
that
a
lot
in
in
cluster
api
and
but
it
says
compose,
is
very
light
alive.
You
can
push
from
compost
to
ecl
or
aks
yeah.
Thanks
for
that,
I
stand
corrected.
A
Yeah
cool
is
there
and
if
you
haven't
seen-
and
please
don't
all
just
sort
of
start
watching
this
instead,
like
I,
I
know,
like
you-
know,
youtube
short
attention
spans
and
all
them
fickle
viewers.
You
know
it's
not
like.
We've
got
the
sponsorship
spots,
you
know
not
gonna,
do
any
here,
we're
sponsored
by
skillshare
or
brilliant.org,
or
anything
like
that,
so
you
can
stick
around,
but
the
kubernetes
documentaries,
just
out
their
first
part,
was
premiered.
Earlier
today
I
haven't
seen
it
yet.
A: It's got a lot of the founders from Kubernetes. I suggest everyone watches it — but not right now; okay, later. All right. So today we're going to talk about Karpenter, which has been on the horizon for a while now. I've certainly had some conversations with AWS around this previously, talking about how we might use it in Cluster API, but yeah — I think they did a formal announcement at the end of... was it...?

B: I think it made GA at re:Invent, but the first time it piqued my interest was, I think, KubeCon Europe — KubeCon EU — so it must have been almost a year ago now.
A: Yeah, thanks. And it's here where I reveal that, as far as I'm concerned — for myself — the emperor has no clothes: I do not run Kubernetes in production.

A: Personally, I develop on the Kubernetes project itself, but I don't use it in production much. Actually, for the last couple of months I've been doing design stuff and other bits, so I haven't actually used Kubernetes for ages — so this is going to be fun, isn't it, as we play around. I think the way I'm going to play this is to revisit cluster autoscaler first. Well, people are probably sick of me going on about it — I'm from the Cluster API space.

A: So I'm going to redeploy cluster autoscaler with Cluster API, and then we'll take a look at that and go through the shortcomings, and then we'll switch over to Karpenter on EKS, for reasons that I'll get into later, and go through what the differences are, why we might want to support Cluster API using Karpenter, and just get some ideas and see what that's about. Does that work for you, Marko?
B: Yeah, sounds good to me. Definitely sounds like a good plan; I'm quite interested to see where that leads us today.
A: Yeah, thank you Vlad for dropping the link. The talk that Marko mentioned was from Ellis Tarn and Prateek Gogia from Amazon, at KubeCon EU 2021 — it was Ellis that I had talked to previously about this. So, all right. If I go to my terminal: I have spun up both clusters. I'll just close — I've got an EKS cluster, which was spun up with eksctl —

A: — in the background, if I just go back.

A: For those who don't know how Cluster API works: Cluster API uses Kubernetes itself to deploy clusters, so I have kind — Kubernetes in Docker. Hi Vlad, hi Ellis, thanks for joining, all right. I might throw you a link, Ellis — might get you on; that's fine.
A
Yeah
all
right,
so,
yes,
I've
got
a
kind
clusters,
that's
kubernetes
and
docker.
If
it's
like
this
bit
like
using
minikube
in
which
cluster
api
is
running,
I
had
defined
a
cluster
which
is
called
tgik182,
and
I
can
probably
do
this.
One.
A
All
right
there
you
go
so
it's
made
up
of
it's
an
aws
cluster
api
cluster
using
cube
adm
control
plane,
which
is
kind
of
defaults
around
cluster
api.
We
have
a
bunch
of
machines.
A
We
have
a
machine
deployment,
I
think
yeah,
which
has
no
probably
want
to
give
it
some
replicas.
So
machine
deployments
are
like
normal
deployments,
except
we're
scaling
machines
instead
of
let's
just
d1
for
now,
instead
of
pods,
so
we're
just
going
to
create
one
machine.
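For reference, the pair of objects being shown here looks roughly like this — a minimal sketch against the v1beta1 Cluster API types; the names, Kubernetes version and the t3.xlarge instance type (mentioned later in the session) are illustrative:

    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDeployment
    metadata:
      name: tgik182-md-0
    spec:
      clusterName: tgik182
      replicas: 1            # scaling machines, not pods
      template:
        spec:
          clusterName: tgik182
          version: v1.22.0
          bootstrap:
            configRef:
              apiVersion: bootstrap.cluster.x-k8s.io/v1beta1
              kind: KubeadmConfigTemplate
              name: tgik182-md-0
          infrastructureRef:
            apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
            kind: AWSMachineTemplate
            name: tgik182-md-0
    ---
    # the linked template pins the EC2 instance type
    apiVersion: infrastructure.cluster.x-k8s.io/v1beta1
    kind: AWSMachineTemplate
    metadata:
      name: tgik182-md-0
    spec:
      template:
        spec:
          instanceType: t3.xlarge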
A: So that's doing its stuff. We have — this is the EKS cluster.

A: Yeah, good point — I'm just ticking off that we've covered the week in review banner as well. Right, and we have this cluster, and I think I deployed a CNI to it earlier — yeah, so we're running Antrea. Oh — oh yeah, because we just spun up the new node, so that's two. Cool, fine!

A: That will settle down in a minute. Right, so we need to deploy cluster autoscaler on this. The instructions have changed since I last looked at this, which was probably about 18 months ago, which is kind of ironic, since I was temporarily a reviewer for this and —

A: — probably shouldn't have been. Yeah, so I was formerly a reviewer for the PRs that were coming in to the cluster autoscaler Cluster API provider. Right. So I think when this first started, you had to run two copies of the cluster autoscaler: one in your management cluster, one in your actual cluster. Today you only need to do it in the one place, and that's because we can set this setting. So we're actually going to run the autoscaler in my kind cluster, on my laptop.
A
So
in
production
usage,
you're
gonna
have
one
permanent
management
cluster,
which
is
probably
going
to
be
in
aws,
and
that's
going
to
manage
lots
of
other
clusters
that
create
underneath
it,
and
you
would
just
install
auto
scan
on
that.
I've
not,
and
there
is
a
process
in
which
you
can
move
resources
from
your
local
machine
over
to
a
newly,
create
created
workload,
clustering,
converting
management
cluster.
We're
not
going
to
do
that
today,
because
it's
a
bit
of
a
pain
there
are
tools
to.
There
is
a
see
there
is
a
cluster.
A
A
Basically,
if
I
did
this,
it
would
convert
that
cluster
that
I've
ma
made
into
its
own
self-managed
management
class.
Then
we'd
run
it
all
together,
but
for
the
purposes
of
this
and
to
sort
of
simulate
how
you
would
do
this
for
real
we're
going
to
pretend
that
my
local
docker
based
kubernetes
cluster
is
a
permanent
management
cluster
and
that's
where
you
would
run
one
autoscaler
that
manages
all
of
the
other
clusters.
A
So
hopefully
I
have
got
this
manifest
properly.
So
I
took
the
examples
and
it's
just
going
to
create
clusters
going
namespace
gonna,
stick
a
deployment
in
it
and
we're
gonna
set
the
setting.
That's
needed
a
bunch
of
cluster
role
bindings
that
allow
it
mostly
they
can.
A
Yeah
they
can
watch
the
relevant
cluster
api
resources,
which
are
machine
deployments
and
the
scale
sub
resource
machines
and
machine
sets
so
yeah.
So
that's
pretty
much
it
and
hopefully
that
works,
and
I
haven't
screwed
up
networking
which
I
did
35
minutes
ago.
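The "setting" being referred to is the autoscaler's cloud provider, plus the split between the two clusters it talks to. A rough sketch of the relevant container arguments, assuming the flags documented for the clusterapi provider (image tag and kubeconfig paths illustrative):

    # cluster-autoscaler deployed to the management (kind) cluster
    containers:
      - name: cluster-autoscaler
        image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.22.2
        command:
          - /cluster-autoscaler
          - --cloud-provider=clusterapi
          # where the Cluster API objects (MachineDeployments etc.) live
          - --cloud-config=/mnt/kubeconfig/management.kubeconfig
          # the workload cluster, watched for unschedulable pods
          - --kubeconfig=/mnt/kubeconfig/workload.kubeconfig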
A: Yeah, so I was just going to deploy — I already have that workload. So uvh has a nice little demo, so I'm just going to use it. Also, these are not permanent clusters. So, all right. One thing: please watch out and tell me off if I'm deploying into the wrong context, because I've done this like five times already today, deploying things into the wrong place. Right.

A: Okay, all right.

A: It doesn't matter — basically, what's going to happen... well, if we have a look at what's happening on that other cluster, if we're having a bad time — I think it's — yeah, there we go. We can have a look at why it's pending, and it should be pretty — yeah. I'm not convinced that the output of this is particularly helpful today, but once you scroll all the way up to the top, it's fairly obvious.
A: Okay, I think — right. Okay, so we've got that running.

A: The other "emperor has no clothes" thing: I've never used cluster autoscaler, so that's fun.

A: Yeah, I don't know if that counts as a Heptio project — I think that's more or less just something someone at Heptio did. Yeah, Vlad asks about auto-discovery: it does have auto-discovery, but it needs certain tags. Yes, these ones. So on this one here I've got a MachineDeployment, and I had already manually scaled it to one earlier, right?

A: And the way this works is: this MachineDeployment is linked to an AWSMachineTemplate. So that's really going to define —
A: It's just one instance type — I think I configured it as t3.xlarge much earlier — yeah, so it can spin up a number of t3.xlarges.

A: These specs I forget, but I just picked it because it sounds like it's not going to crash. We're using an upstream CI default value — that's why I use that one — or I might be using 2xlarge, I'm not sure. I know it's not so small that once you deploy the CNI and stuff it just starts falling over. Right, so we're going to add the metadata — let's go back to the docs. Okay, I need these annotations.
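The annotations in question are the min/max bounds the autoscaler discovers scalable node groups by — roughly this, with the keys as documented for the clusterapi provider and the values illustrative:

    apiVersion: cluster.x-k8s.io/v1beta1
    kind: MachineDeployment
    metadata:
      name: tgik182-md-0
      annotations:
        cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
        cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "5"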
A: Almost certainly. I think I've got enough quota in my AWS account for about 200, so that should be fine.

A: "Couldn't find template for node group MachineDeployment" — interesting; you don't have that in the —

A: All right, it's still complaining that it couldn't find the template for the node group, but we'll leave that. I don't know what static_autoscaler.go is — maybe it's just —

A: Yeah, Carlos — we'll get to that. Any additional providers? As far as I know, it's only AWS.
A
Vlad,
I
always
have
to
relearn
these
things
too.
Everybody
does
took
me
literally
10
15
minutes
figure
out
how
to
get
a
crd
without
auto,
complete
and
just
typing
random
things.
Yeah.
That's
pretty
much.
My
my
everyday.
A
I
don't
there
was
a
twitter
fred
about
this.
Don't
like
I
don't
like
the
cube
ctx
changing
between
chills.
I
believe
a
few
days
ago.
B
A
A
A
I
don't
think
I
needed
to
do
that.
Oh
okay,
so
it's
still.
A
A
A
A
A: The question, Vlad: am I following the AWS cluster autoscaler provider? No, I am following the Cluster API provider for cluster autoscaler, which is supposed to be infrastructure agnostic. It does come with some limitations. I mean, that's also part of why I want to look at Karpenter: the Cluster API autoscaler provider doesn't know about things like what instance type it is; it doesn't know if it's got GPUs or whatever.

A: So there are actually quite a lot of limitations to the Cluster API provider for cluster autoscaler, but it is somewhat infrastructure agnostic.

A: So it's not going to create any Auto Scaling Groups on its own. All it's going to do is change the value — I think it's already done it.
A
This
controller
plus
the
api
aws
controller,
is
the
thing
that
is
scaling
the
which
is
doing
creating
ect
machines
individually,
doing
it
one
by
one.
So
it
creates
a
new
ec2
instance.
There's
a
whole
bunch
of
things
that
happen
where
it
generates
a
temporary
credentials.
A
For
that
you
know
to
then
join
the
kubernetes
cluster,
there's
a
bunch
of
stuff
that
happens,
but
so
cluster
autoscalers
only
interface
is
a
hook
into
the
workload
cluster
to
be
able
to
monitor,
like
pods,
that
are
waiting
to
be
scheduled,
etc,
and
I
hook
into
the
management
cluster
side
to
be
able
to
scale
machine
deployment
resources.
It
doesn't
need
any
other
credentials
than
that,
so
it
operates
on
its
own.
So
I
don't
have
to
do
anything
with
iem
beyond
what
I've
already
given
to
cluster
api.
B: Yep, I think so. So if you're going to try to scale it up to something that's not going to fit, I think we're going to end up in an interesting situation, yeah.

A: What we're going to see is a bunch of pods pending, and a bunch of them will still be running, because it's not going to get rid of the old ones until it's had a chance to see these running — which it's never going to do now, because we can't. So if we were to do a describe on that —
A: Okay, right. Let's park cluster autoscaler for a moment — I'm interested in coming back to it maybe later.

A: I wouldn't see a situation where you'd want — well, maybe; right, I don't know. Would you run — you might still use cluster autoscaler on EKS with Cluster API, possibly, yeah. It would work, yeah.
B: As we've seen with the cluster autoscaler, one of the reasons why you'd probably want to go with something else rather than cluster autoscaler is exactly the reason we've just seen: if your workloads don't always match the resources you have, that's obviously one big limitation with cluster autoscaler and similar solutions in this space, right?

B: And it's not just the cluster autoscaler. It's probably worth mentioning that the reason we showed you the cluster autoscaler is because it's, let's call it, the de facto autoscaling solution out there — or at least the most popular solution at this point. But, you know, in case you're adding things that don't fit —
B: Karpenter is also called "nodeless" autoscaling, and the reason it's called that is because it eliminates the whole concept of node groups. Instead of scaling in a way where you have node groups to which you add identical resources — identical nodes — it works on a different level: it eliminates that concept, tries to provision resources for you outside of it, and then helps the scheduler to efficiently —
B: — add those resources to your cluster, based on the workloads you're trying to schedule. And there are other advantages as well. Speed is a big one: with the cluster autoscaler, you need to wait until your AWS Auto Scaling Group figures out "oh, I need to scale up" and adds instances, and all of that procedure can take up to maybe three to five minutes, I guess, depending on the — oh, thanks Dom, yeah.
B: "The node group is a concept in EKS" — yes, correct, but it's implemented using — sorry — Auto Scaling Groups, which, like I said, can take up to five minutes to provision your resources, based on what you're requesting, which region, and other bits. Whereas Karpenter is actually using direct API access, and it's using —

B: — I think the API call it's using is CreateFleet. That literally requests a fleet of resources for you through the API calls, and they're spun up in probably the quickest way possible, I would say, that you can actually get those resources available from AWS, as they're being provisioned.
B: So it doesn't wait for resources to be available and then rely on the scheduler to figure out where pods go; it pre-determines that. I think it actually marks the pods using a node annotation — it annotates the pods with where they need to go — so the whole thing happens in a time span of usually about a minute.
B: So that's one of the problem spaces it's solving. The other one, for example: if you're using something like cluster autoscaler, you're also limited in where your nodes will be provisioned — it would just automatically pick a random region where the next node would be added — whereas, for example, if you're using persistent volumes, those might actually be provisioned in your cloud provider, such as AWS, in a different region from where you're getting the nodes, right?

B: Sorry — not region, AZ; yeah, you guys already have it. Karpenter actually provisions the nodes in exactly the same zone where you actually need them, which is also a big advantage there. And, like we mentioned before, the nodes that you're getting are actually the right size. With the cluster autoscaler, you would always end up with the same type of nodes that the original Auto Scaling Group or template used, whereas Karpenter is actually using — I think they're calling it a really fast kind of controller —
B
That
kind
of
it's
using
a
bin
packing
algorithm,
that
kind
of
tries
to
figure
out
and
kind
of
figure
out
which
resources
can
actually
provision
in
order
to
fit
all
the
specific
parts
that
need
to
schedule.
So
it
would
actually
use
the
optimized
instance
size
that
would
satisfy
the
the
need
for
scheduling
your
workloads,
which
is
also
kind
of
cool
as
well.
B
So
obviously
it
will
provision
the
you
know
the
nodes
from
the
list
that
you
provide,
but
at
the
same
time
it
will
pick
the
optimal
one
for
the
specific
specific
need
that
you
have
at
this
point.
So,
for
example,
if
you
need
to
provision
a
single
part,
you
might
end
up
with
a
really
small
instance.
If
you
have
like
a
a
large
amount
of
pots
or
a
huge
one
like
on
the
deer,
show
that's
before
right.
A: Cool, yeah — thanks for that. Ellis is saying Karpenter has a full scheduler implementation; also, the scheduler is only in the loop when capacity already exists. Okay, there are some questions about Azure. Yeah, so Azure uses VM Scale Sets — you kind of have to do it that way in Azure as well. There is a difference — actually, one thing I like about Azure, not too big obviously: you can put everything in a resource group.

A: You can also hit delete on a resource group and everything goes along with it, which is really nice when you want to clean up — you don't have to use tools like aws-nuke and stuff. Now, you can kind of use the Resource Groups Tagging API to do some of that. But one of the limitations around resource groups in Azure is that there's some weird number — you can only have about 834 resources in one, or something; it's some weird limit — and every VM counts towards that limit.
A: So if you want to do a large cluster in Azure, you absolutely have to use VM Scale Sets, and from a Cluster API perspective that's implemented using something called a MachinePool resource, which I don't think the cluster autoscaler quite supports yet, because there needs to be additional information for the cluster autoscaler to figure out what the machine type is and whatever. So I think what happens today is that the cluster autoscaler, in the Cluster API context —

A: — determines the size of an instance by looking at the MachineDeployment, looking at the AWSMachineTemplate related to the MachineDeployment, going into the workload cluster, getting the information from the node, and then inferring that any node created as an additional replica of the MachineDeployment would have the same memory or CPU capacity. But because a MachinePool can go down to zero — as soon as you've got the option of scaling to zero, we lose all that information.
A
So
there
is
a
design
document
in
the
cluster
api
project
to
give
more
information
around
how
that
would
work,
and
then
that
would
allow
asgs
essentially
to
work
with
cluster
api,
but
class
api
based
cluster
auto
scada
today
does
not
use
auto
scanning
groups
or
vm
sets
in
it.
Lets
cluster
api
creates
machines
on
its
own
there's
a
question
from
changuin:
can
the
user
customize
the
node
size
and
type
to
optimize
the
cost.
B: Oh yeah, absolutely. We're obviously going to see it a little bit better as we go through the example, but there's a concept of a Provisioner, which is defined through a CRD, and in the CRD you can actually define a list of instances that you want to provision from. So you can limit it down to a certain set of instance types that you're happy to have provisioned — and obviously, when you're provisioning workloads, they need to fit into those instance sizes.
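A rough sketch of what that Provisioner looks like, assuming the karpenter.sh/v1alpha5 API that was current around GA — the instance types, discovery tags and instance profile name here are illustrative:

    apiVersion: karpenter.sh/v1alpha5
    kind: Provisioner
    metadata:
      name: default
    spec:
      requirements:
        # restrict what Karpenter is allowed to launch
        - key: node.kubernetes.io/instance-type
          operator: In
          values: ["m5.large", "m5.2xlarge", "r4.large"]
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]
      provider:
        instanceProfile: KarpenterNodeInstanceProfile-tgik182
        subnetSelector:
          karpenter.sh/discovery: tgik182
        securityGroupSelector:
          karpenter.sh/discovery: tgik182
      # reclaim empty nodes after 30 seconds
      ttlSecondsAfterEmpty: 30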
B: But yes, that's the thing you can do. I'd just like to go back to what I believe Ellis posted in the comments as well: Karpenter actually has a full scheduler implementation in the source.

B: What I really like about Karpenter compared to cluster autoscaler is that it's completely decoupled from the scheduler. For example, if you're using cluster autoscaler, you actually need to use the same version as the Kubernetes version you're currently running it on, whereas Karpenter is completely decoupled — it's not really taking any dependencies on the scheduler itself, which is quite nice; but at the same time it works quite closely together with the scheduler.
A: Yeah — I didn't realize that was the reason why the cluster autoscaler is so tightly coupled to the Kubernetes version. It's certainly been a pain for us, in that it's just yet another dependency we need to think about — from a vendor perspective, for versioning all the components — yet another one we need to take into consideration whenever Kubernetes gets bumped.

B: Exactly. And, for example, I was just looking at one of the issues — I think it was a well-known issue on the label that was provided when you want to scale down to zero.
B
There
are
some
kind
of
rough
edges
around
there
and
there
is
an
issue
opened
and
even
though
it's
gonna
be
merged,
for
example,
the
way
how
the
merging
process
works
it's
going
to
be
merged
in
the
next
version
of
cluster
autoscaler,
which
is
1.24,
and
it's
even
gonna,
made
it
in
1.25
or
something
like
that
to
the
eks,
which
you
know
you
can
imagine
that
the
you
know
the
release
cycle
is
quite
long
if
you're
waiting
for
a
feature
that
will
make
it
to
your
eks
cluster.
So.
A: Yeah. All right, I guess let's try and get this going. So I've got — we have that. Right, we need to create an IAM role.

A: Apparently — yep, yeah, okay, let's try — YOLO that.

A: I'm sure it's fine, but you know.

A: Yeah, it's going to have the fleets and the RunInstances and TerminateInstances and stuff. So, okay — does it always use fleet, or is there a case where it might use the RunInstances API instead?
B: I think, how it currently works, it just provisions using CreateFleet. If I remember correctly from the initial talk that was posted before in the comments, it was explained that they were contemplating between one and another kind of API call — basically they're literally the same thing — so they just decided to go with fleet, but it should offer the same functionality.
A: All right — oh no, I don't really care, but — I don't really care about the labels, plus the name. Right, all right. That's recent: "amazing-badger"! Apparently. I did not choose that name; that's just one randomly generated by —

A: Okay, I've not used eksctl for this, but I guess that does some — what does that do?
B: It's just going to create an IAM identity mapping, so you're going to end up with a service account mapped to a specific role, I believe.
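The command being run is along these lines (as in the Karpenter getting-started guide; the role and cluster names are illustrative):

    eksctl create iamidentitymapping \
      --cluster "${CLUSTER_NAME}" \
      --arn "arn:aws:iam::${AWS_ACCOUNT_ID}:role/KarpenterNodeRole-${CLUSTER_NAME}" \
      --username 'system:node:{{EC2PrivateDNSName}}' \
      --group system:bootstrappers \
      --group system:nodes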
B: Yeah — oh, thanks a lot, yeah. There is a getting-started example with Terraform as well. So, for the question about whether you need to know CloudFormation: there's an example of how to get started with Terraform too, so you don't need to know CloudFormation.

B: But yeah, as we're going along — what we're creating now is literally the IAM roles needed to provision the nodes as we're running the controller, yeah.
A: The CloudFormation templates do a lot of the same things — or you can just run create and it just does it for you; just about the same thing, except a bit more stuff, because we're managing the VPCs as well. All right, so that stack was created. We now need to do that. So is this — because is this IAM? Is this fiddling around with —

A: If someone's got some time, it might be worth finding or creating an issue for that one. Oh yeah, Justin just said the "create iamidentitymapping" command modifies the aws-auth ConfigMap — all right, yeah, so it's aws-iam-authenticator then; that makes sense. I guess we can just take a look at that ConfigMap after I've run this — why don't we take a look before and after, okay? Where is that ConfigMap...
A: Yeah, so for those who don't know: EKS uses a project called aws-iam-authenticator. It's what's used to have nodes join the cluster; it's also how authentication works, and what allows you to authenticate to the Kubernetes cluster itself using AWS credentials. And this was genuinely an old Heptio project, in fact, which was donated to the Kubernetes project.

A: Hold on, we'll find out in a minute. So — have you exported the cluster name?
A: I got a lot of output — there's a lot of... this is just a dev account, to be fair, but there's a load of additional garbage in this dev account, for reasons. Yeah, that's fine. So I'll put my screen back — there we go. So we got that: that's that node role, blah blah blah. That's why we wanted the account ID.

A: Right, and if we were to —
B: Yeah, so the reason why this is created is that when Karpenter is spinning up new nodes, they will be spun up using the Karpenter node IAM role and instance profile, which will have enough permissions to join the cluster, and which maps to the group in the IAM here — sorry.

A: So is it just something internal to its provisioner, or am I going to have to give this to Karpenter at some point, then?
A: Yeah, that will make sense. So what happens here, for those who aren't familiar — if I remember this correctly. You're right — plus one for kube-ps1.

A: "Tired of missing autocomplete" — well, Bash will tab-complete.
A: Yeah, so the way this works is: we need a way to make sure that not just any old machine joins your Kubernetes cluster, because that could be bad. So we spin up a machine with an IAM role; it's going to have some credentials from the instance metadata service. Then, to authenticate to the Kubernetes cluster, it doesn't actually call aws sts get-caller-identity — it creates a signed request for it. AWS uses HMAC — the AWS SigV4 scheme — as its signing mechanism for requests. It creates the signed —
A: The machine has credentials valid for this IAM role, and the authenticator then looks it up in the mapping — it looks it up in the ConfigMap — and says: right, if someone comes in and they're able to present a signed request that matches this IAM role, then I will issue a service account token which is a member of these groups — the bootstrappers group — with this identity. And that just happens to be exactly what's required for kubelet to be able to register itself against the control plane. So that's how this is working.
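What that mapping looks like inside the aws-auth ConfigMap in kube-system, roughly (account ID and role name illustrative):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: aws-auth
      namespace: kube-system
    data:
      mapRoles: |
        # anyone presenting a request signed by this role is issued
        # the node username plus the groups kubelet needs to register
        - rolearn: arn:aws:iam::111122223333:role/KarpenterNodeRole-tgik182
          username: system:node:{{EC2PrivateDNSName}}
          groups:
            - system:bootstrappers
            - system:nodes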
A: Right, so we've got that. The next bit is — whatever the Helm is; I've not used Helm for, like, a million years, so that's going to be fun. Right, so now we need to create an IAM role — now we're coming to the OIDC bit. We're going to create a service account for use with Karpenter, because Karpenter itself needs permissions to create EC2 instances, run fleets, and all those API calls. So we are now going to do that bit.

A: And — oh my god, come on — there we go, copy, paste. I don't trust any of my environment variables anymore. Well, AWS_ACCOUNT_ID is pretty fine, but this one is the —
A: Oh, how do I enable the — okay, I need to enable — do I need to turn on the plugin? How do I turn on the OIDC stuff? Enable...

A: Right, we're in Ireland.
B: Maybe that's going to cut it. There's a --cluster flag and a name.

B: I think there's an --approve flag that you need to provide as well.
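That is, something like:

    eksctl utils associate-iam-oidc-provider \
      --cluster "${CLUSTER_NAME}" \
      --approve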
A: So, for people not familiar with the OIDC stuff: this is kind of based on bits which were introduced in, what, Kubernetes 1.21 or 1.20, I think. Because OIDC has been used in quite a few places, support was added to make Kubernetes act as an OIDC provider, and once you do that, there's the AssumeRoleWithWebIdentity federation API in AWS, which allows you to exchange OIDC tokens for AWS credentials.
A: So, given that Kubernetes can act as an OIDC provider, we can make that Kubernetes cluster trusted by AWS, and we then have the ability to swap Kubernetes service account tokens for IAM role credentials. And then there's a webhook — IRSA, I forgot what IRSA stands for [IAM Roles for Service Accounts] — there's an EKS pod identity webhook that can take annotations off of service accounts and exchange them for IAM roles, and that's kind of what this is.
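The end result is a ServiceAccount annotation that the EKS pod identity webhook acts on — the annotation key is the documented IRSA one; the role name is illustrative:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: karpenter
      namespace: karpenter
      annotations:
        # the webhook injects web-identity credentials for this role
        # into pods that use this service account
        eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/KarpenterControllerRole-tgik182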
A: Okay, so we now have this Helm command that we're going to whack in. I guess that should be the real one — the one with the number.
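The Helm invocation is along these lines — a sketch based on the chart repository the getting-started docs used at the time; the chart version and value names may differ between releases:

    helm repo add karpenter https://charts.karpenter.sh
    helm repo update
    helm upgrade --install karpenter karpenter/karpenter \
      --namespace karpenter --create-namespace \
      --version 0.5.3 \
      --set controller.clusterName="${CLUSTER_NAME}" \
      --set controller.clusterEndpoint="$(aws eks describe-cluster \
          --name "${CLUSTER_NAME}" --query 'cluster.endpoint' --output text)" \
      --wait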
A: So we have that prime-numbers business running. How many nodes do I have right now, anyway? I couldn't even check. So I've got two — yeah, so I've got one node — oh yeah. Going back to what Dom was saying: node group is EKS-specific, yep. So yeah: MachineDeployments in Cluster API, node groups in EKS, node pools in Cloud Foundry or our old Tanzu Kubernetes Grid Integrated — everyone's got similar concepts.

A: Same — so we have two; we've got the prime numbers. And actually, what I didn't even check: it's m5.large. What is an m5.large these days? I don't know EC2 instances anymore.

B: No, that's G... not sure what the N is.
A: ...of RAM, two vCPUs. So we have two of those. Right, so, okay — if I were to edit that deployment, I guess —

A: Interesting — oh yeah, look, we're going to fake it out there.

A: All right, we're going to pretend that this needs, for mystery reasons, 10 — let's say 10; that's obviously too much now.
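Which amounts to bumping the replicas, e.g. (the deployment name here is hypothetical — whatever the demo workload is called):

    kubectl scale deployment inflate --replicas=10
    kubectl get pods --watch   # watch the pending pods get placed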
A: Nothing super exciting yet — it's just saying it's reconciling nodes, workload count 10. Let's have a look in here.

B: So, as you deployed the Helm chart, it actually created a CRD for us, which we're going to use here, and using that you can actually define things like we mentioned before: what kind of instances you want to provision, which AZs you're limiting it to, and things like that. All right.
A: Taking it down — and then the instance profile has got the wrong name, so without the hyphen. So we do that. So here we want the alpha5 — it's good to —

A: It's what keeps the VMware accountants happy.

A: Oh, I've got a lot — oh, there are lots of subnets.

A: All right, so what's it looking for?

A: Let's have a look at this Provisioner resource.

B: Yeah, that's the status we're looking for. Okay, so it's bin-packed one pod for a single node.

A: Yeah, so we can see the computed packing of one node for one pod, with the instance type options — so it's using some —
A: I think this information is, if I remember correctly, still compiled in, right? There's a lookup table inside.

B: Yeah — no, I think it just needed some time to pick it up, but obviously there is a pod that has been bound to a node that has been provisioned, and if you look at the logs you're going to see that it picked an r4.large — so that was kind of the optimal instance for the workload that we're looking for.
A: That's true. Whereas with the fleet API you can send in all of them, and the fleet request will try to get the best one — or the best-cost one — for you. Yeah: you can use EC2 instance fleet, which gives you the same outcome as a single EC2 instance request, but you can specify multiple instance types.

A: Oh, I forget his name — Joe... JoJo someone... oh my god, my memory is terrible; go away from Cluster API for a week and this is what happens. So, we have had some conversations about doing that in Cluster API. It would be a migration, because, you know, people have got one set of IAM permissions and they've been using that forever.
A: We could enable a Cluster API provisioner that works on AWS through it — Karpenter wouldn't need any extra permissions — and then we could start looking at how to support vSphere, for example. vSphere would be quite interesting, right? Because in vSphere you basically have a slider — a moving slider — for the amount of memory or CPUs. You could almost get a sort of Fargate-type experience, where you provision a single node for that single pod, which has exactly the right CPU and exactly the right memory.

A: ...could land on somebody else's replay or something, right? So we have these cloud providers — so we have — let's have a look at the provisioning; it's about the provisioner here.

B: No, this is the —
B: — it's prepared for different cloud providers, right? Currently it only supports AWS, but it should be possible to extend that to, you know, VMware or Google or Azure.

A: ...sense, yeah. And I think some of this is cleaner than — yeah; and, as we said before, we're not importing the Kubernetes code, so that means we're not having to vendor it all in together. Carlos says "pod autopilot concept" — what's that? Do you know what that is, pod autopilot?
A: Yeah — I think, probably just to say, so everyone's clear: I had some conversations with some users earlier this week, and there are some limitations with Karpenter at the moment. I mean, it is a fairly early-stage project. It's GA from AWS's perspective, but it's still an alpha API, and there is still only one provider, the AWS one. But yeah, just be aware: pod affinity and pod anti-affinity are not yet supported, so there are some limitations there.

B: I mean, the one that I most commonly hear is, I think, that currently the node storage that is provisioned is kind of baked to 20 gig — the launch template that's used has that predefined. Obviously you can get around it by specifying a custom launch template, but it's currently not supported in the provider directly.
A: Yeah, so that's interesting. So once you — if you — yeah, so Karpenter will find those difficult to schedule, because it doesn't support affinity or anti-affinity. So once you have that, yeah.

A: Okay, so there are limitations — just beware: the scheduler is not 100% complete today. And I think maybe, if it's not in there already, it might be worth making that clear in the documentation: what are the limits, what are the limitations, what does work, what doesn't work — just so it's clear to your end users. Because otherwise — and this is just from my own sort of open source experience — if you don't make clear what is possible —

A: — people will try to do things, unexpectedly. And interestingly, yeah: the EKS built-in CoreDNS leverages a pod anti-affinity rule. So that's interesting — there are some caveats that you need to be aware of.
A: Also cool — we've been doing —

B: — it for quite a while now. Do we quickly want to try something really quickly? At least I found a really nice feature as well. Okay: they do provide a finalizer on the node. So, for example, if you go to the console and just try to delete the node, it's interesting that it actually has a finalizer on it, and it actually tries to drain the node gracefully, which is kind of a nice feature as well.
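A quick sketch of that behaviour — the finalizer name shown is the one Karpenter uses for this, so treat it as indicative rather than exact for your version:

    # deleting the node cordons and drains it first, via the finalizer
    kubectl delete node "${NODE_NAME}"
    # the Node object lingers until the drain completes:
    kubectl get node "${NODE_NAME}" -o jsonpath='{.metadata.finalizers}'
    # => ["karpenter.sh/termination"]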
A: I didn't check that — yeah, it got deleted. Okay: cordoned the node, okay, yeah, cool. So that's similar functionality to what we have in Cluster API: if you delete a Machine, it starts doing the same sort of thing. I guess the benefit of this is that we get that bin packing back quite quickly as well.

A: That's pretty nice. All right, maybe we should call it a day — I think we've been going for an hour fifty; I think that's a good point to stop. I'll let you go to sleep — I mean, I've had multiple Monster Energies, so I'm unfortunately not going to be able to sleep — but yeah. Let me be a normal YouTuber and say, you know: if you've enjoyed the show, then click like. We're not sponsored.
A: We don't get very much from YouTube — none of that matters — but yeah, do click the like button. If you didn't like it, then leave a comment and tell us what we can improve. And then also hit the subscribe button and click the little bell icon, so you get notified every time we do a stream.

B: No, I have to say I really enjoyed the session — really, thank you for having me. It was great fun, you know, doing it from scratch, and maybe, like you said, you haven't really had experience with it, so it was really nice to get across all those bumps that users would usually hit if they deployed it the first time around. So yeah — good stuff.

A: All right, thank you very much, Marko, for joining us, and thanks to everyone for tuning in. I think there is a show next week — yes, there is; I forgot what the topic is, but we'll have another one, so see you next week. Everyone, goodbye!