From YouTube: Cloud Native Live: Kubernetes automatic rightsizing
A: Okay, so hi! Welcome to Cloud Native Live, where we dive into the code behind cloud native. I am Muhammad, a CNCF Ambassador, and I will be your host tonight. Every week we bring a new set of presenters to showcase how to work with cloud native technologies. They will build things, they will break things, and they will answer your questions. In today's session I'm stoked to introduce Andy, who will be presenting on Kubernetes automatic rightsizing.
A: This is an official livestream of the CNCF and, as such, is subject to the CNCF code of conduct. Please do not add anything to the chat or questions that would be in violation of the code of conduct. Basically, please be respectful to all of our fellow participants and presenters. With that, I'll hand it over to Andy to kick off the presentation. Let's add Andy to the session. Hey Andy, how are you?
B: All right, so my name's Andy. I'm the CTO at Fairwinds, and author and maintainer of several of our open source projects, including Goldilocks, which I'll talk a little bit about today. But today I want to talk about something I've been working on over the last couple of months that will slowly be making its way into Goldilocks, which is automated rightsizing.
B: For those who aren't familiar with Goldilocks: Goldilocks is a wrapper around the Vertical Pod Autoscaler (VPA) project that lets you automatically provision vertical pod autoscalers and then view the resource recommendations for all of the pods in your cluster in a single dashboard. Now, up until this point, Goldilocks has been really focused on recommendations: how do we see what resources our pods are using, give ourselves a baseline for setting those going forward, and tweak them?
B: What I'd like to start to explore further, as we go into the future, is how we start to utilize the automatic rightsizing abilities of the Vertical Pod Autoscaler in a safe and effective way, so that we can increase the utilization percentages of our clusters. So many of the clusters that I work with are utilizing so little of the resources available in them, because we tend to over-provision, and I know that's a really hot topic right now, because we're all worried about cost across the board.
B: At least a lot of us are. So today I want to show how we can set up a cluster, actually using four different open source projects, that automatically sizes all of the workloads in the cluster and allows it to autoscale.
B: So please interrupt me at any time with questions. Keep them coming, and we'll just kind of dive into the setup here, and I will show what we have going on.
B: All right, is my screen share up? All right, so I have an EKS cluster here, and I have four different technologies running in this cluster that are going to help us out today. The first one is cluster autoscaling: I need to be able to get new nodes in my cluster, and I need that to be relatively flexible, because we need different node types in order to maximize the utilization of our cluster.
B: So we have Karpenter running here. For those not familiar with Karpenter, it's an open source autoscaler for Kubernetes. You create an object called a provisioner, and so we can take a look at the provisioner here in the cluster. The provisioner essentially gives you the ability to say, okay, I want these kinds of nodes, but it also allows you to specify a certain amount of flexibility. So there are some values in here that are important.
B: We have instance category. I've listed three different instance categories that we can get in this cluster: C-class, M-class, and R-class instances, so that's compute optimized, general purpose, and memory optimized. I want Karpenter to be able to pick nodes that have different balances of CPU and memory based on the workloads that I'm going to deploy in this cluster. In order to save on cost, I'm allowing it to only provision spot instances.
B: That's kind of up to you, whether you want to do that in your environment, but I'm doing it because this is a sandbox and I don't want to spend a lot of money on it. And then you can cap out the amount of resources that Karpenter has. This is just sort of a safety thing for me: I don't want my cluster to blow up. We'll talk about some of the pitfalls of automatic rightsizing later on, and I will explain why this has to be in there. And then there's the only other thing that I like to enable in Karpenter, which is not super related to this.
B: Yeah, all right. So the only other thing that I like to do in Karpenter, which is not required for automatic rightsizing but does a nice job of keeping my cluster fresh and allows me to do upgrades more easily, is set a TTL on my nodes. My nodes will expire after a day; no node will live longer than a day, with varying load in my cluster.
B: That tends to not happen anyway, because Karpenter is constantly rebalancing. And that reminds me of the last thing, which is consolidation. This gives Karpenter the ability to evict pods, move them around to different nodes, and sort of rebalance how the cluster is structured, which is really important for automated rightsizing. So Karpenter is the first tool we have in place, and again, we've got consolidation enabled: true.
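A minimal sketch of the kind of Provisioner described here; the API shown is the v1alpha5-era Karpenter spec, and the names and values are illustrative rather than the exact manifest from the demo:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    # compute-optimized (c), general purpose (m), memory-optimized (r)
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: ["c", "m", "r"]
    # spot only, to keep a sandbox cluster cheap
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  # cap total resources so an errant recommendation can't blow up the cluster
  limits:
    resources:
      cpu: "64"        # illustrative cap
      memory: 256Gi    # illustrative cap
  # nodes expire after a day, keeping the cluster fresh for upgrades
  ttlSecondsUntilExpired: 86400
  # let Karpenter evict pods and repack nodes
  consolidation:
    enabled: true
  providerRef:
    name: default      # assumes a matching AWSNodeTemplate exists
```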
B: If we want to look at my values file, it's fairly straightforward. It's really just giving Karpenter access to the IAM role ARN that it needs to do its job, then some service monitoring so that I can get some metrics in the cluster, and really not much else, just some security things that I have enabled in this cluster.
B: All right, I see we have some folks with video issues. I think we look okay from my previews, so I'm going to keep going. The second tool that we have configured in the cluster is the Vertical Pod Autoscaler. We are going to install that with the Fairwinds chart, which is at github.com/FairwindsOps/charts.
B: We have a vpa chart there that allows you to install the Vertical Pod Autoscaler. I'm using the latest version of it, and I'm honestly using a good portion of the defaults, except that I'm enabling the admission controller, which I believe is not the default in our chart. That's so that we can actually do automatic vertical pod autoscaling. We need to have a certificate in place; I'm using cert-manager to generate that certificate and manage the mutating webhook configuration. Beyond that, I think the VPA is a fairly standard configuration.
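A hedged sketch of Helm values for the Fairwinds vpa chart along the lines described (admission controller on, certificate handled by cert-manager); exact key names vary by chart version, so treat this as illustrative:

```yaml
# values.yaml for the fairwinds-stable/vpa chart (sketch)
recommender:
  enabled: true
updater:
  enabled: true
admissionController:
  # not enabled by default in the chart; required for update mode "Auto",
  # since the admission controller is what mutates pod requests at creation
  enabled: true
  # the webhook needs a TLS certificate; here it is assumed that cert-manager
  # issues it and manages the MutatingWebhookConfiguration, as described
```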
B: Oh, the last thing we have to do: we want long-lived data to feed our Vertical Pod Autoscaler. This is super important to getting accurate recommendations from it, so we have it hooked up to Prometheus. In the recommender, which is one of the components of the Vertical Pod Autoscaler, we give it a Prometheus address, we give it a minimum CPU and a minimum memory, and we say the storage type needs to be Prometheus. That will allow our Vertical Pod Autoscaler to reference this Prometheus in the cluster to get metrics.
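Those recommender settings map to upstream VPA recommender arguments, roughly like the following sketch (the Prometheus address and minimum values are illustrative):

```yaml
# extra arguments passed to the VPA recommender via the chart (sketch)
recommender:
  extraArgs:
    storage: prometheus
    prometheus-address: http://prometheus-operated.monitoring.svc.cluster.local:9090
    pod-recommendation-min-cpu-millicores: "15"
    pod-recommendation-min-memory-mb: "100"
    history-length: 8d   # the recommender wants about eight days of history
```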
B: So if we go take a look, we can see we have Prometheus running in this cluster. We have things like machine CPU cores available; we have all the various metrics. I'm using the standard kube-prometheus-stack installation to get that. I seem to have lost my comment feed,
B: so I'll have to rely on you to throw questions at me as they pop up.

A: Okay, I will do that.

B: All right, so: kube-prometheus-stack collecting all the metrics in the cluster, a relatively default configuration there, and then the Vertical Pod Autoscaler pointing at that Prometheus, using Prometheus as its storage. So we've got Karpenter.
B: We've got the Vertical Pod Autoscaler, and a couple of different values associated with those, and now we get into the next bit, which is Goldilocks. Goldilocks allows you to create VPAs for all of your workloads. So if we look in this cluster and we do a get on vertical pod autoscalers across all the namespaces, we'll see that we have one for every single workload in this cluster. There are quite a lot of different workloads: we're running Argo CD, we're running Prometheus, we're running a few different demo apps. We've got one in the team-one namespace, we've got a Yelb app, and we've got a basic demo app running as well.
B: I'll talk about the actual applications in a minute, but we can see we have a lot of different vertical pod autoscalers. In order to do that, we installed Goldilocks using a Helm chart, like I've mentioned, from the same Fairwinds stable repository. The only thing that we're doing here that's not standard is that we're setting this on-by-default flag. When we tell the controller and the dashboard on-by-default, it means we don't have to annotate the namespaces that the objects are in in order for Goldilocks to create vertical pod autoscalers; it will just create one for everything in the cluster automatically, no matter what. When we configure Goldilocks this way, it will create a VPA for every object, but the VPA won't be turned on. It'll be in mode "off", which is just the recommendation mode, and which is the default for how Goldilocks operates.
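The on-by-default setup described here looks roughly like this in the Goldilocks Helm values; this is a sketch, and the flag wiring may differ by chart version:

```yaml
# values.yaml for the fairwinds-stable/goldilocks chart (sketch)
controller:
  flags:
    on-by-default: "true"   # create a VPA for every workload, no namespace labels needed
dashboard:
  flags:
    on-by-default: "true"   # show every namespace in the dashboard
```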
B: So the last thing that we do is modify how Goldilocks creates these VPA objects. Let's go ahead and get the namespace yelb.
B: This is one of our demo applications, the Yelb application, and we've added two annotations to this namespace. This is a sort of not-well-known feature of Goldilocks that allows you to modify how the VPA is created. The first one we have here is goldilocks.fairwinds.com/vpa-update-mode set to auto. That's going to put all of the vertical pod autoscalers in the yelb namespace into automatic mode, which means that when a pod gets created in that namespace, a mutating admission webhook is going to set the resource requests for the pods created there. And you'll notice it's turned on, on auto, for all of my namespaces, so every single pod that gets created in this cluster has its CPU and memory requests set by this mutating admission webhook, which is a little bit terrifying.
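The two namespace annotations being shown look roughly like this sketch; the namespace is the Yelb demo app, and the resource-policy JSON mirrors the caps discussed just below:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: yelb
  annotations:
    # put every VPA Goldilocks creates in this namespace into automatic mode
    goldilocks.fairwinds.com/vpa-update-mode: "auto"
    # JSON form of a VPA container policy, capping what the VPA may set
    goldilocks.fairwinds.com/vpa-resource-policy: |
      {
        "containerPolicies": [
          {
            "containerName": "*",
            "maxAllowed": { "cpu": "4", "memory": "6Gi" }
          }
        ]
      }
```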
B: You're changing things on the fly as you're creating them, all the time, which is why I'm doing this in a sandbox cluster; we'll talk about some more of the pitfalls of that later. And the last thing that we have is the ability to control minimums and maximums via this container policy, or rather this vpa-resource-policy annotation.
B: It's probably easier if we look at the VPA object that gets created itself. We'll take a look at this yelb-ui VPA in the yelb namespace, and we will see that Goldilocks has added this resource policy that defines the behavior of the Vertical Pod Autoscaler's automatic rightsizing in this namespace. This applies to all containers in every pod, because we have a star here, and in this case we're saying the maximum allowed is four CPUs and six gigs of memory.
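The generated object then looks something like this sketch of a yelb-ui VPA (the object name is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: goldilocks-yelb-ui   # illustrative name
  namespace: yelb
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: yelb-ui
  updatePolicy:
    updateMode: Auto          # from the vpa-update-mode annotation
  resourcePolicy:
    containerPolicies:
      - containerName: "*"    # applies to all containers in every pod
        maxAllowed:
          cpu: "4"
          memory: 6Gi
```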
B: I think it's important to pick a value for this. I had some early experimentation that was really interesting, where the VPA, when it doesn't have a full amount of data, will sometimes recommend very large or very small amounts. In the case of it recommending very large amounts, what can happen is, say, it thinks a pod needs 16 CPUs; I think I had one where it said this pod needs 16 CPUs.
B: Well, that wasn't accurate, but it went ahead and modified that pod to request 16 CPUs, and then Karpenter, in all of its flexibility, very happily obliged, and I think it created an m5.12xlarge in my cluster, which is a very large instance size that I was not expecting. So you want to have these caps on here just for a little bit of safety. Those are controlled, again, through that annotation we showed earlier.
B: The annotation is just a JSON form of this container policy that we're looking at, so you can modify it on a namespace level. You could even add additional policies, so that specific containers are allowed to request more. But I definitely recommend having this resource policy in place, just to cap things. The other thing you can use is a LimitRange, which is a Kubernetes object; the Vertical Pod Autoscaler respects limit ranges as well. I found this resource policy to be a little bit more flexible and easier to work with than the LimitRange object.
B: But that's, you know, a decent amount of memory that it's requesting there. And so we have recommendations for all of our pods, we're collecting all of the metrics in Prometheus, we're allowing Goldilocks to create these VPAs, and then Karpenter is giving us new nodes in our cluster.
B: Based on the requests coming into the cluster, you can kind of see all of the different dynamic pieces going into this that allow all of these workloads to be right-sized. The last thing that we need to talk about is horizontal pod autoscaling. So we're vertically sizing, we're setting our requests and limits, but we also have applications like this.
B: Let's go to the yelb namespace. We also have applications that need to horizontally scale. If we take a look here, we'll see that we have two replicas of the app server running. That should actually be more; I'm not sure, we'll dig into that in a minute.
B: But if we're also vertically scaling on CPU, we don't necessarily want to horizontally scale on CPU, because those two will be at odds with each other and possibly conflict. So the real key to this is being able to horizontally scale on a separate metric, and since we already have Prometheus metrics, if we go take a look here, we could get something like nginx.
B: Let's see: nginx_ingress_controller_requests. We're using ingress-nginx, we're getting requests, and we can divide that up by the different ingresses in here. So, for example, take the Argo CD ingress: we can see how many requests it's getting. We also have metrics such as network traffic coming in, or latency, for this particular ingress.
B: Those are available, but setting up the HPA with those can be a little bit difficult. So I'm going to add in a fourth project; actually, I guess we're up to five now, because we've got Prometheus, Goldilocks, the VPA, and Karpenter. So I'm going to add a fifth project, which is KEDA. Or how do folks say it? I don't know; I don't think we can do a poll here, but I'm curious whether it's pronounced "keita" or "keddah". I'm going to go with KEDA for today.
B: So KEDA is a nice controller that allows you to create horizontal pod autoscalers with a different spec, called a ScaledObject. I'm installing KEDA in the keda namespace with a fairly standard set of values; I think I'm just setting some resource requests and adding some Prometheus information so that I can get metrics. Other than that, I'm using a fairly stock install of KEDA.
B: And so with KEDA, what we get is these ScaledObjects. The nice thing about them, and let me switch tabs here for a second, is that if we take a look at the spec, say the yelb app server ScaledObject, it's a very straightforward spec, very similar to a horizontal pod autoscaler, that allows us to specify a Prometheus query out of the box. So we can say: here's where my Prometheus lives. This should look familiar from when we configured the VPA.
B: This is what I want to call this metric; this is the threshold that I want to shoot for per pod, so this would be 10,000 requests per pod; and then I can put in a query. If we take this query for the yelb app server and we go punch it into Prometheus, we can see its value at this given time. Right now it's very low.
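A sketch of a ScaledObject along those lines, with the Prometheus trigger fields just described; the server address, metric name, and query are illustrative, not the exact manifest from the demo:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: yelb-appserver
  namespace: yelb
spec:
  scaleTargetRef:
    name: yelb-appserver          # the Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 40
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc.cluster.local:9090
        metricName: requests_per_pod   # what to call the metric
        threshold: "10000"             # target ~10,000 requests per pod
        # illustrative query; the real one counts requests reaching the app server
        query: sum(rate(nginx_ingress_controller_requests{exported_service="yelb-appserver"}[2m]))
```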
B: So if we look at the HPA that's currently available here, if I'm in the right namespace, we'll go back to the yelb namespace: here's that yelb app server HPA, and we can see that 10,000 target. Currently we're at 277, so we're at the minimum of 2 out of a maximum of 40 pods. But I didn't have to create this HPA, and I didn't have to write a Prometheus metrics adapter to do it; KEDA just did that for me. So I'm a huge fan of this project, and it is really the last piece of this, so that we can scale all of our pods horizontally based on metrics other than CPU and memory.
B: And then the last thing I need to do, since this is not a real environment, is generate some load. So I'm going to go over here real quick and just double-check on my load generation, because it doesn't seem to be working. I'm using a tool called k6; it just runs a whole bunch of load against various endpoints. And that seems to be working; we'll see what happens.
B: Yeah, okay, that window wasn't important; I'm not going to worry too much about it. So, to recap again, because I think there's a lot going on here, and it's sort of tough to put all the pieces together in your head if you haven't done this before: we've got the Kubernetes metrics coming in through Prometheus, and we've got the horizontal pod autoscaler, configured via KEDA using Prometheus metrics that are not CPU and memory, to scale horizontally.
B: We've got our Vertical Pod Autoscaler scaling pods up and down in their resource requests, and then we have Goldilocks creating those VPA objects automatically for us and configuring them in that automatic mode. So in theory, everything should be completely dynamic. As I increase load on the cluster, we should potentially see the vertical size of some of these pods getting bigger as they start to actually consume resources.
B: We should see horizontal scaling, where all the different horizontally scalable workloads in the cluster scale in and out, and then we should also see the cluster creating new nodes to accommodate those, and then maybe reshuffling them over time. And so, as you start to think about this, you're like: oh, how am I supposed to wrap my head around this? If there's something wrong, how do I look at this happening? So here's what I've been working on.
B: We've got 228 requests coming in to that app server, and let's just zoom in on that time period. This is the HPA doing its job: the HPA wants to keep the number of replicas such that the per-pod number of requests is at 10,000. So this is good; we want to see that. And then we have, on the lower end, the UI pod. This is a multi-tiered thing.
B: There's a UI and there's a back end, so the UI is also scaling horizontally. That's the HPA in action, doing its work. We can also look at latency, because the next thing that we have to consider is the balancing factor: we can vertically scale and horizontally scale all we want.
B: But what balance is that? What's on the other side of the equation? Generally, on the other side of the equation is some sort of performance metric: we need to have enough resources to have good performance in our cluster. So we see here we're tracking latency on the yelb UI ingress.
B
You
know
this
particular
one
has
been
sort
of
high
around
600
milliseconds
I'm,
not
actually
sure
why
it's
something
I've
been
planning
to
look
into,
but
you
know
we
want
to
have
another
metric
to
balance
against,
and
one
thing
that
I've
done
in
other
namespaces
is
add.
This
latency
metric
as
a
second
scaling
metric
for
the
horizontal
part
of
the
scalar,
because
that
key
to
that
ketta
spec
lets,
you
add
multiple
metrics
as
targets,
because
you
can
do
multi
multi-metric
pod,
Auto
scalers.
B: So that is one option for balancing performance with your resource requests. Down here we just have the raw number of requests coming into the ingress, and then over here is where we start to look at the VPAs, the vertical pod autoscalers.
B: If we look here, and I'm just going to filter down to the app server because it's a little bit easier to see, we've got the target from the Vertical Pod Autoscaler, which is its recommendation, and then we have the actual request. Now, these should be the same, which might be worth looking into, but they are very close. So we can track what the Vertical Pod Autoscaler is doing for CPU, and then over here we have memory. Let's go ahead and filter this one down as well.
B: We're using 60 millicores and the VPA is targeting 25. I'm looking at a very small window of time in this particular graph, just so that we can see the graph. My guess is, if this hovers at 65 for long enough, the Vertical Pod Autoscaler will start to bump that up. And then the same with memory: if we look at just the app server, for some reason we're hovering around 1.7 gigs; not sure what's going on there.
B: It's acting a little weirdly, so it might not be the best example, but we have a place now where we can start to see all the different pieces. And then over here we have just a little graph showing Karpenter doing its job: this is how many machines it's created and terminated. I've been working on adding graphs for seeing the types of nodes, but if we take a look at the cluster right now, we can see what we have.
B: We have our base instance group, a single managed instance group that allows us to run the Karpenter controller and things like that. I've changed that to a C5 instance type, because I've noticed that this cluster is particularly CPU heavy. But then the nodes that have a provisioner listed were created by Karpenter, so we have a c5.2xlarge and a c5.4xlarge.
B: Obviously, Karpenter also recognizes that we're very CPU constrained in this cluster, not memory constrained, and we can see how it's reacting to that by giving us compute-optimized instances.
B: One tool I absolutely can use to look at how this is functioning is called kube-capacity, written by Rob Scott, if anybody's familiar with him. If we run kube-capacity and we add the utilization flag, we can see... well, maybe, there it goes.
B: We have a CPU utilization of 83 percent. That's pretty good across the cluster; I don't see it that high very often. And then we have a memory utilization of 20 percent. Now, that seems a little low; I would love that to be higher. But if we dig into it a little bit, we can see that we're using 10 gigs of memory across the cluster and we're using 24 and a half CPUs. That's about a two-to-one CPU-to-memory ratio.
B: And if you look through the instance size list available in Amazon, you will find no instances give you that sort of spread of memory to CPU. So I've been having a debate with some of my co-workers about whether I just have some non-ideal workloads running in this cluster that are not your average workload, or whether there's something to dig into further there. But having a CPU utilization above 80 feels really good to me, and it's sort of what I'm going for.
B: Along with, if we go back to maybe one of our other demo apps, a latency value hovering under 100 milliseconds for this particular application. So that feels pretty good to me. And, well, it was here; it went back up, so we'll have to look into that, but we also dropped our requests.
B: I'd also like to share just a little bit about the demo apps. We have, I think, three of them running in this cluster. We have Yelb, which I've mentioned a couple of times, and which does not seem to be functioning at the moment.
B: Always got to break something, don't we? Nope. All right, we have this demo application, which just constantly pings the back end and shows you which pod it's talking to. The color tiering is a little off, so you can't see it well, but that's the name of the pod that's being hit. So we can see the horizontal autoscaling in action here.
The
Emoji
Emoji
photo
app
from
our
friends
at
buoyance
who
make
Linker
D
where
you
vote
on
an
emoji,
and
you
can
see
how
many
votes
they
have
because
I'm
generating
traffic
against
this.
So
some
of
these
have
a
lot
of
votes,
180,
000
or
so
all
right.
So
those
are
the
three
apps
we're
running
and
yeah.
B: So that's kind of the general setup. It obviously took me a half hour to get through the whole setup, because it took me probably two weeks just to build all this and get it working. The goal in the future is to make this easier to do. It's such a complex process; there are so many different pitfalls.
B: There are so many different levers you can pull and knobs you can turn that we want to start to understand how all of these tools work together, and then build an easier story going forward. So that's kind of the goal here. Do we have any questions? Do you have any questions?
B: Yeah, so I think an important thing to talk about is the various pitfalls. I've talked about a couple of the issues, one of them being that the VPA requires eight days of data to really give a good recommendation. So if we go back and we look at Prometheus, and we grab a CPU utilization graph, or really any graph here...
B: We need to be able to see a week of data, right? So I've got a Prometheus instance set up here that's retaining eight days' worth of data. That can be a considerable amount of information to store for any Prometheus instance, depending on the size of your cluster. So that's one thing to worry about: how are we storing all this Prometheus data for long-term storage? A week may not be a problem; it hasn't been too bad for this cluster, but it might be in a much larger environment.
B: So that's the first thing to consider. The second is understanding how the VPA works. Over those eight days, the Vertical Pod Autoscaler uses a decaying histogram of the utilization for CPU, and it uses memory peaks over an interval, to generate its recommendation. And it can only set requests; it cannot set limits. When it sets the requests, it can adjust the limits proportionally and move them up or down.
B: So if we have an initial amount, it's going to move the limits proportionally, but it won't set limits by default. Actually, in this cluster I have very few CPU limits or memory limits set. This might be risky in certain environments or for certain workloads, so that's something to evaluate if you go to set something like this up: do I need CPU and/or memory limits?
B: I know CPU limits are a hotly debated topic, and I won't dive into the details of that today. But, you know: do I need them? Where should I put them? They're going to limit the ability to scale, or to utilize more resources, as effectively. I had a really hard time getting to that CPU utilization of 80 percent across the cluster without removing limits, so that's definitely something to consider in your individual evaluation.
B: The next thing to think about, and let me just pull up my notes here, is the dangers of using Karpenter. I pointed out earlier that you might get an m5.12xlarge or a c5.12xlarge spun up because of an errant recommendation from the VPA, so capping is important.
B: The other thing to be aware of with Karpenter is that there are lots of ways to use node selectors and resource requests or annotations that restrict Karpenter's ability to function. I could create a pod that said: I want to run on a c5.2xlarge specifically, and that will force Karpenter to create a c5.2xlarge.
B: Well, maybe that's not the best choice for the balance of price and compacting workloads that Karpenter wants, and so what I recommend is using some sort of policy engine to restrict that in your cluster. I actually have, in this cluster, oops, wrong window, there we go, some OPA policies. These are being applied by Fairwinds Insights; I won't talk too much about Fairwinds Insights today. We have some OPA policies for Karpenter that restrict specific things; specifically, we're restricting the ability to use the node selector karpenter.k8s.aws/instance-family.
B: I think there may be other node selectors that Karpenter respects that I need to restrict, but essentially we're saying you can't create a pod in this cluster with this node selector, and we're enforcing that via OPA at admission time, so that we don't let workloads in the cluster mess with Karpenter's ability to compact.
B: So that's one thing to be aware of with Karpenter. I think I've talked about this in other content that we've put out before, but use some sort of policy to control the workloads coming in, so they can't break Karpenter.
B: The other one that we have is the Karpenter do-not-evict annotation. You can tell Karpenter never to evict a given pod, but what that means is that you can't scale down, and you can't let Karpenter move workloads around and compact the cluster, because evicting pods is the mechanism by which it moves them around. So instead of using do-not-evict, what we use is pod disruption budgets on some of our apps.
B: If we look at the various services for the emojivoto app, we have a pod disruption budget of maxUnavailable: 1, so Karpenter can only evict one pod at a time and we're not affecting performance. We're not letting Karpenter just wipe out our whole service; we're using these pod disruption budgets to protect us.
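A sketch of the kind of PodDisruptionBudget being described, here against one of the emojivoto services; the labels are illustrative:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
  namespace: emojivoto
spec:
  maxUnavailable: 1    # Karpenter (or a node drain) may evict only one pod at a time
  selector:
    matchLabels:
      app: web-svc     # illustrative label
```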
B: The sum of this is autoscaling 101: things that you should be doing no matter what, whether you're automatically rightsizing or not. If you're horizontally scaling and you have multiple replicas, you should probably have a pod disruption budget. That'll protect you in the event of nodes being drained for an upgrade, or various other events like that. So pod disruption budgets are super important.
B: Another thing to be aware of, and we talked about performance a little bit: especially when you're using an ingress controller, one interesting thing here is that we're automatically rightsizing not just the workloads, but also the ingress controller that serves those workloads.
B: So we have a horizontal pod autoscaler that is, I believe, working on the query we're using here; I think it's a requests-per-second metric, average value, on http requests total, so just the pure number of requests coming into the ingress controller. But the ingress controller needs to scale relative to all the workloads in the cluster, because it's serving all of the traffic, and so your ingress controller may end up getting fairly large.
B: We'll probably see we're using five CPUs per instance of the ingress controller, and we have four of them. So you need to be aware of the relative scaling of your ingress controller, with traffic funneling out behind it, and be extra sensitive about how this particular workload scales; I think that's super important. And then, really, just monitoring: I had to do a lot of extra stuff to get all of these metrics into Grafana for these various workloads. So something to be cognizant of is all of the Prometheus configuration needed to get that working.
B: For instance, in order to get the VPA state into kube-state-metrics, you have to add a custom resource state configuration for the Vertical Pod Autoscaler. This is what's pulling those VPA container recommendations into Prometheus so that I can see them. I'm also starting to pull in the Karpenter annotations for the various nodes.
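A heavily abbreviated sketch of what that looks like in kube-prometheus-stack values, using the kube-state-metrics custom-resource-state feature; the metric definition is elided as illustrative, and the real configuration also needs RBAC for the VPA API group, shown here as an assumption:

```yaml
kube-state-metrics:
  rbac:
    extraRules:
      - apiGroups: ["autoscaling.k8s.io"]
        resources: ["verticalpodautoscalers"]
        verbs: ["list", "watch"]
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
          - groupVersionKind:
              group: autoscaling.k8s.io
              version: v1
              kind: VerticalPodAutoscaler
            # metric definitions walking status.recommendation.containerRecommendations
            # (target CPU and memory) go here; see the kube-state-metrics docs
            metrics: []
```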
B: With that, we can create dashboards, monitor it, and keep an eye on the size of our cluster. All right, and then, just to show the history of working on this, this is the last month of data I've got in this cluster. We started out at 28 percent CPU utilization, or down at 16 here, and we've gone all the way up to 87, and we should be hovering right around 80-ish here over the next few days. And then memory, we know from what I've shown before...
B: ...is still fairly low; something that I'll be working on in this particular environment. So yeah, I was hoping for more questions; there's a lot going on here.
A: We don't have any questions at the moment, I guess. There was one question earlier, like "which one should we choose", while you were actually showing the labs, but other than that there are no questions left. I guess, viewers, you can ask your questions, whatever doubts you have. Other than that, yeah, you can continue the session, I guess. Okay.
B: Let's see, I'm trying to think if I have a ton else here to cover. Okay.
B: I've been experimenting with different ways to scale. You may not have an ingress metric available.
B: You may only have sort of the base Kubernetes metrics for your pod, or just what comes out of the box with the kube-prometheus-stack. We need a metric that is not CPU and not memory, but maybe we don't have ingress requests, and maybe we don't have latency to scale on. So what else can we scale on? Here we're actually scaling on container network receive bytes, so just the raw amount of data coming into that container.
B: That's a great question. I'm really hoping that Karpenter expands out to the other cloud providers in the near future, but I know that's not necessarily a priority for them; a lot of those folks work for AWS, and I get that, that makes sense. But I'll focus on the ones I know better.
B: I'm not super familiar with AKS, so I'll focus on GKE. I know you have the ability to give GKE control over your instance sizes and allow it to sort of dynamically pick them. The other thing you can do is just use the slightly more traditional cluster-autoscaler, while being more cognizant of your node sizes, or rather your node types. cluster-autoscaler works in all the cloud providers; it will give you more and fewer nodes based on demand.
B: Based on how many pods you have. So I would definitely recommend starting with that. It's not as intelligent; it can't pick instance types, and so what you need to do in that case is monitor that utilization: look at, say, that kube-capacity output. Actually, out of the box with kube-prometheus-stack, we have the cluster utilization metrics that we can see here.
B: So we can see the CPU saturation and start to look at that balance between CPU and memory usage. If you have, say, 16 cores total in use, and you have 32 gigs of memory in use at load, then something with a one-to-two ratio, which is a compute-optimized size, is going to be appropriate. Then you can adjust your cluster-autoscaler settings to do that, and potentially utilize multiple node groups within cluster-autoscaler.
B: That gives you options if you have enough disparate workloads to need both, maybe some memory-optimized and some compute-optimized, or things like that. So any type of cluster autoscaler will get you closer to automatic resource management; it's just not going to be as intelligent as something like Karpenter. And then there are commercial options, like Spot, and I think it's cast.ai that has an AI-driven spot instance generator. I think both of those work across multiple cloud providers, so there are commercial options to look into there as well.
B: So, great question, thanks. All right, we were talking about other metrics that we can scale on. Container network receive bytes, or transmit bytes, or using both of those metrics, can work. But if we go take a look at this metric, we'll go back to our Prometheus, and I'm going to zoom back out just a little bit because we're going to go show the graph.
B: Just adding them all together, this is the entire cluster. So if we drop the sum and we look at the rates for all the workloads across the cluster, and this might take a minute to query the last week, we'll see that the levels are so different. That's entirely dependent on the app.
B: If we take a look at this demo app here, we're just sending a little ping request every few seconds to the pod, so that's a very small request size.
B: Whereas this particular application, when it needs to send back the full list of all the votes, that's probably a much larger object, so it's going to be transmitting more data. Same with the emoji one; there's much more information being transmitted back to me. So it can work, but you have to tune that autoscaler for each app. It's the same amount of data I'm querying here, but they're very different apps.
B: So that's something to keep an eye on. Another thing you could do is use the raw number of connections, or other metrics that are available in Prometheus. One of the things that I'm starting to look into is how we can sort of generalize the horizontal pod autoscaler metric, to make a recommendation for a starting point.
B: That's much easier than trying to craft this yourself and then go tune this threshold number based on your percentage. So that's one area to look at, and then the other thing is multi-faceted scalers. Here we can see, is it this one...
B: This ScaledObject for the web service on the emojivoto app has multiple metrics. We're scaling on that received bytes total, and notice a very different threshold for this app, which I've had to tweak over time. And then we're also scaling on the P95 latency of the ingress controller response duration metric for this app. So we're trying to keep our 95th percentile latency around half a second; actually, this comes back in milliseconds, so I'm trying to keep it around 500 milliseconds or lower.
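A sketch of such a multi-trigger ScaledObject; both thresholds and both queries are illustrative, not the exact manifest:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web
  namespace: emojivoto
spec:
  scaleTargetRef:
    name: web
  triggers:
    # scale on raw network bytes received by the workload
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc.cluster.local:9090
        threshold: "50000"   # bytes/second per pod; tuned per app
        query: sum(rate(container_network_receive_bytes_total{namespace="emojivoto", pod=~"web-.*"}[2m]))
    # also scale on p95 ingress latency, targeting ~500ms per pod
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc.cluster.local:9090
        threshold: "500"     # milliseconds
        query: histogram_quantile(0.95, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress="web"}[5m])) by (le)) * 1000
```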
B: So we have two metrics here that we're horizontally scaling on, which balance each other out, or hopefully balance each other out. Another thing to experiment with, if you're going to go down this route, is really thinking about what's important for what you're horizontally scaling. If latency is the most important thing for that application, maybe you should only be scaling on latency. It's going to affect your automated rightsizing, right?
B: You may get a little bit more over-provisioned resources to provide that, but that may be the trade-off you're willing to make. So I think the hardest thing about all of this is choosing metrics and then tuning those different numbers to the values that you want, and in that case load testing is super important. You need to be able to generate some sort of even remotely realistic load against your service in a non-production environment, in order to play with these values.
B: So again, I used k6, which is actually now owned by Grafana, to run the load. You just write little JavaScript snippets that hit your app, and it can run those.
B: If we go back over here to this tiny window I'm running: let's see, this one's running 100,000 requests, this one's running 200,000 requests, and running them pretty quickly against the app. So we're generating a decent amount of traffic, as evidenced by some of our requests here, and actually we can go take a look at just the ingress controller graph here: we're doing 589 requests.
B: For a small cluster like this, that's more than it does just sitting there day to day doing nothing, so we actually get some real numbers here. Load testing is super important to be able to tweak these numbers and set up all these different things.
B: All right, well, now's the time to ask all the questions if you have them. So again, six different technologies today: we've got Prometheus, we've got KEDA, we've got the Vertical Pod Autoscaler, we've got Karpenter (or cluster-autoscaler, depending on where you're at), and then Goldilocks to sort of tie it all together. Keep an eye out on Goldilocks over the next six months or so; hopefully we'll be releasing more features related to this sort of concept of automated rightsizing.
B: It's really where I want to take Goldilocks in the next iteration of it. I think recommendations were a great place to start, and I think folks appreciate those, and the Goldilocks dashboard won't be going anywhere. We'll still be showing you your recommendations there; it's just that maybe you didn't set them yourself, because you're letting the VPA take them over now.
A: Thanks; we'll watch for the questions to follow. Okay, so, other than this: what is your insight here? You've talked about a lot, so what are some best practices regarding these tools? What do you suggest? What should be done?
B: I've definitely had some mistakes in this testing process, where I've blown up the cluster really big, or the entire thing's fallen over because nginx isn't getting enough CPU and just can't serve any traffic. So there's a lot of risk associated with trying to automatically rightsize, and so, as a best practice: don't run it in prod right now, I think.
A: Yeah, I think we have a question: how do you use Goldilocks for apps that have bursty spikes in memory usage?
B: That's a good question, and that's a tricky one with the Vertical Pod Autoscaler. If it's bursty in a semi-consistent manner, in that you're going to get bursts throughout the day, the Vertical Pod Autoscaler should account for that, because it is using memory spikes within a window to calculate its target, so in theory it should be able to handle it. Now, if you think you're going to have, say, a spike once a week that you need to account for, then I think you have to handle that more aggressively yourself.
B: You do have to be careful with requesting, or having memory limits higher than, the memory available on a node, or doing that in too many places, but that is one potential way to mitigate it. And then keep in mind that the VPA will also take OOM kills into account: if you get OOM-killed, the VPA will bump up the next recommendation to a higher memory amount, and so it will sort of self-correct over time, potentially.
B: But if you're expecting specific burstiness, that's something you can test: write a load script that generates that burst and see how it looks. And then the other option is, just for those particular workloads, don't use Goldilocks, don't use the VPA.
B: If you've seen OOM kills, definitely bump that memory limit up and keep an eye on it, but burstiness here can be inherently difficult. The other option is to drop the memory limit and let the node OOM-kill the workload if you run out of memory, but then you've got to watch your node sizes and your system-level OOM kills, not just your Kubernetes cgroup ones. So, yeah.
A: Let's see if any question pops up. Okay, so, other than this: that was really an awesome session, and we've been shown a lot visually, so honestly, that was awesome. Is there something you would like to add? Oh, there is another question: what happens when scaling on memory or CPU limits?
B: I don't quite follow the question, but I'll try to answer as best I can. KEDA can definitely scale on memory or CPU. I have intentionally not done that here, because it conflicts with the VPA. If you try to run the VPA and KEDA, or any autoscaler on CPU, at the same time, you're going to get unexpected results, which is why I'm not using CPU or memory there.
A: Okay, yeah, I guess so. People are saying thanks for your great session. So I guess, if there's nothing more to add, we can end this session. Yeah, great, okay, thanks.
B: DaemonSets are an interesting thing to consider here. DaemonSets would be counted as sort of the node overhead in Karpenter's calculation of whether you should add another node or not, and Karpenter does have the ability to take that into account. So, ideally, that should be handled by Karpenter.
B: Well, Goldilocks is older, that's for sure; it's been around longer. But Goldilocks uses the VPA just to provide recommendations; there's very little cost functionality in Goldilocks. There's a little bit, but it's very specific and very limited in its ability. Kubecost has a lot more cost-focused functionality, whereas Goldilocks is much more focused on just resource requests and limits. So that's kind of the big differentiator.
A: The big differentiator, yeah. Okay.