From YouTube: Cloud Native Live: Building Stability
A: Welcome to Cloud Native Live, where we dive into the code behind cloud native. I'm Annie Talvasto, I'm a CNCF ambassador as well as a senior product marketing manager at Camunda, and I will be your host tonight. Very excited to have everyone join today.

A: So this week we have Andy Suderman from Fairwinds to talk about building stability. But before we get to the topic of today, another exciting thing that's happening in the CNCF universe at the moment is the KubeCon Europe co-located events. CFPs are closing soon, so if you have any talk ideas, or need to come up with some, you can go ahead and submit them now; soon will be too late. And, as always, this is an official live stream of the CNCF, and as such it is subject to the CNCF code of conduct.
B: Great, thank you. So today I wanted to talk about resource requests and limits, and those who know me are familiar with some of the things that we do at Fairwinds. Resource requests and limits are kind of a pet peeve of mine, you might say, or just a thing that I commonly harp on and talk about. We spend a lot of time telling people to set their resource requests and limits on all their workloads, and we tell them.

B: But what we don't talk about as much, at least out in the open, is what happens when you don't set them properly. What we don't get to see very often, except in real-life clusters, are some of the negative side effects that can happen. So I've been wanting to do this for a while: I put together some demos of the different things that you can break. Hopefully today we'll get to break some stuff, see some things fall over, and have an idea of why that's happened. Perfect.
B: So, all the code I'll be using today... is the screen share up? Sorry, I can't actually tell.

B: Right. So everything I'll be doing today is in a GitHub repository that I made public this morning, so if you want to go tinker with this, you can. What I have here is a GKE cluster.
B: I have just n2-standard-2 nodes, so they've all got two CPUs and eight gigs of memory. I've enabled node autoscaling across the three zones, so by default I have three nodes, and I can scale up to two nodes per zone, giving me six nodes total. And then I have an application running in this cluster. If you saw my last live stream a few months ago, I used the same app; it's kind of a fun little app.
B: You can just go to the app and vote for where you want to have lunch, and the counter goes up. We see how many page views there have been, we see the name of the backend that we're connected to, and all of this is stored in a DB in the cluster. So if we go take a look at our cluster, in the yelb namespace we have the app server, which is kind of the backend, and we have the DB.

B: We have the UI, and there's a Redis server for caching as well. All the code to deploy this into the cluster is in this repository, in the app directory. So if you want to deploy it to a cluster, you can just kubectl apply that app folder and get this app running. By default it creates a load balancer; we have just an IP address here. Very simple setup, really easy to recreate. And right now this app is, as far as we know, functioning; it seems to work.
B: I can click, I can vote for stuff, and it seems to be doing its job, and we see it's using relatively low amounts of CPU and memory right now. We have, you know, one millicore and 36 megs of memory going on here. So we have a happy app; that's great!
B: So what I'm going to do now is: I have in here this stress YAML, and it works with an open source tool that we have. Essentially I'm going to deploy two different Helm charts, which I'm about to deploy, that are going to spin up a program called stress, which is just going to eat up CPU and memory in the cluster.
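For anyone following along in the repo, here is a minimal sketch of what such a stress workload could look like; the image, args, and replica count are illustrative assumptions, not the actual Helm chart values from the demo.

```yaml
# Hypothetical stress Deployment (not the chart from the demo repo).
# There is deliberately no resources: block, so the pods get the
# BestEffort QoS class and will eat whatever CPU and memory is free.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: stress
  namespace: stress
spec:
  replicas: 10
  selector:
    matchLabels:
      app: stress
  template:
    metadata:
      labels:
        app: stress
    spec:
      containers:
        - name: stress
          image: progrium/stress        # assumption: any image bundling the stress tool
          args: ["--cpu", "2", "--vm", "1", "--vm-bytes", "512M"]
```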
B: Yeah, no problem. What I'm using here is k9s; it's just kind of a TUI, a terminal user interface, for interacting with Kubernetes. Essentially, what this is showing us is: if I do a kubectl top pods, I can see the current utilization of the pods in the namespace of my current context, which is the yelb namespace.

B: That's essentially the same thing I'm seeing here. The nice thing that k9s (or "canines", I'm not sure how people say it) does for us is that it also shows us the percentage of the request and the percentage of the limit that we're using. So that's percent CPU of request, percent CPU of limit, percent memory of request, percent memory of limit, which is a nice thing to see as we go through the rest of this demo, and it's why I'm using k9s in this case.
B: What they're going to do is hopefully succeed at just overwhelming the nodes, and we have a couple of different ways to look at this. We can do kubectl top nodes and see the CPU percentage and the memory percentage utilization currently. We can also see this in the node view in k9s, or we have another tool that I typically use, which we'll use a couple of times throughout this demo, called kube-capacity. If we pass it the usage flag and make this a bit wider, because there's a lot of output... there we go. We can see the CPU request and limit totals for the nodes: what every pod on that node is requesting, the total limits of all of the pods on that node, and then the current utilization. And you'll notice that all three of these nodes are now packed at 103 percent CPU usage and over a hundred percent memory utilization as well, and hopefully our view down here will start to catch up with that. So, yep, there we go: CPU is at 2000. We have two CPUs on these nodes, so that would be full utilization; it's 103 percent of the available CPU on the node. And so now what we're going to do is, well...
B: First, I'm just going to go click on the app and see if it's still working, because that's the easiest way to check. But another thing we have in place for this demo that's going to be useful is another file I have in the repo.

B: It's a JavaScript file called load.js, and I'm going to use a tool called k6, which is a load-testing tool, to run a load test against this app. Essentially, it's going to go click on those buttons. The default is set to 10 iterations, so it's going to go in, load the main page, click on each button, and do that 10 times. And right now I'm using two of what they call virtual users, so it's going to use a couple of separate processes, essentially, to do that, so they all kind of happen in parallel. And if we look here at the request duration: the average HTTP request duration for this test was 78 milliseconds, and the average iteration time was about 4.8 seconds.
B: This is the baseline of what I expect for this app; if I had run this before we started stressing the cluster, this is the performance we would have seen. So we can see that the nodes being fully utilized, and completely overwhelmed by this other application that is behaving improperly, is not affecting the application that we're running in our cluster, because we set our resource requests and limits to give it the guaranteed QoS class. This is why, for anything that is critical or important to you, I generally recommend that you use that guaranteed QoS class: set your requests and limits to exactly the same values, and set them to a reasonable number that you've tested.
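As a minimal sketch (illustrative numbers, not the demo's actual values), a pod lands in the Guaranteed QoS class when every container in it sets requests equal to limits for both CPU and memory:

```yaml
# Guaranteed QoS: requests == limits for both resources, in every container.
resources:
  requests:
    cpu: 200m
    memory: 256Mi
  limits:
    cpu: 200m
    memory: 256Mi
```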
A
Yeah
great
and
I
see
a
comment
from
gary
sorry,
I
showed
up
late,
not
sure
if
I
missed
it,
was
there
a
link
given
for
any
of
these
tools.
I
don't
think
we've
given
links
so
far,
but
we
can
see
if
we
can
add
some
during
the
duration
of
this
webinar
live.
B
Definitely
definitely
if
we
can
share
the
link
to
the
github
repository,
there's
actually
a
section
at
the
bottom,
that
has
a
list
of
tools
used
and
some
links
to
those
as
well.
So
if
we
can
just
share
that
initial
github
repository
url,
that.
B: Thank you. So now we've seen kind of what a... well, actually, there's one more thing to show, sorry. The other thing we're going to do is go take a look at the pods in that stress namespace, because we're spinning up a whole bunch of pods that are attempting to use way more CPU and memory than is available, and they have no resource requests and limits set on them whatsoever, and we're going to see something particularly ugly here.

B: We're going to see a whole lot of pods that have been evicted, because they're trying to use so many resources and because they essentially have the best-effort QoS class: we haven't set any requests or limits, we've just said "try to run this, see what happens." We're going to see them get evicted as the node hits the memory-pressure condition. We're running out of memory on the node, and we need to find some pods to get rid of to make space for other things.

B: These pods are the first on the chopping block to get removed, because they have no resource requests and limits set. So now we've really seen the detriments of not setting any resource requests and limits: you're going to see pod evictions, and you're going to see potential issues with applications running. And we've also seen the benefits of setting your resource requests and limits properly on your critical apps, so that they're not affected by other workloads in the cluster that may do bad things.
B: Let's see, any other questions about that? I think we're all right. So I'm going to delete the stress namespace and we're going to stop stressing this cluster so much. You may have noticed here in our node list that we have six nodes now; we've scaled up to our maximum number of nodes because of all of the extra pods I've been attempting to schedule and all of the memory pressure on them.

B: It's also interesting to note that if we were using the cluster autoscaler in this case, if this wasn't a GKE node pool, it's possible we would not have scaled up the cluster, because there are no resource requests for the pods that need to be scheduled, and so it may not have known to. The pods wouldn't have gone into a pending state, and the cluster autoscaler wouldn't have known what type of node to spin up. So in another type of cluster this may have had even more detrimental effects, not allowing the cluster to scale up.
B: There's been a lot of debate in the community about CPU limits and CPU throttling, what you should set your CPU limits to, and Linux kernel bugs that resulted in more CPU throttling than you would expect. So I'm just going to kind of cover what it looks like when you're experiencing a lot of CPU throttling.

B: The first thing I'm going to do is put a little bit of stress on the cluster; I'm just going to schedule some pods that use some extra CPU, just to create a little bit of extra noise in the cluster while we do this. This is the same app I was running before, but we're just stressing CPU, and we're not running nearly as many of the pods, so that we don't get quite the same behavior.
B: And then what we're going to do is go take a look at the app server deployment. I'm going to edit it, find the resources block, and turn this way down. So originally we had a CPU request of 100 and a limit of 200, or it may have been a bit different from that, but that's roughly right. I'm going to turn this way down to a CPU request of 10 millicores and a limit of 10 millicores, and we're going to take a look at the pods in that deployment.
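The edit described above boils down to a resources block like this (a sketch using the values from the walkthrough; the rest of the deployment spec is unchanged):

```yaml
# CPU request and limit both turned down to 10 millicores; a limit this low
# means the container is throttled almost as soon as it tries to do anything.
resources:
  requests:
    cpu: 10m
  limits:
    cpu: 10m
```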
B: It's running; we're waiting for it to go into a ready state, and now we start to see readiness probes failing. We're doing a GET request to the getstats API endpoint as our readiness probe, in order to tell Kubernetes when our pod is ready to receive traffic, and these are just failing, and the liveness probe is now failing as well; it's the same API endpoint.

B: So I would expect that. And if we try to get the logs for this pod and grab the previous one... oh, there's no previous yet. We'll grab the logs, and the pod's not logging anything; nothing's happening. So essentially what's happening here is: we have throttled the CPU down so far that this app can't even serve requests anymore. It just can't serve these requests.
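For context, the probes being discussed look roughly like this; the endpoint path, port, and timings here are assumptions for illustration, not values pulled from the demo repo:

```yaml
# Readiness and liveness probes hitting the same HTTP endpoint. If the
# container is so CPU-throttled that it can't answer within the timeout,
# it is marked unready and, for the liveness probe, eventually restarted.
readinessProbe:
  httpGet:
    path: /api/getstats
    port: 4567
  periodSeconds: 5
  timeoutSeconds: 2
livenessProbe:
  httpGet:
    path: /api/getstats
    port: 4567
  periodSeconds: 10
  timeoutSeconds: 2
```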
B
So
if
you
have
random
intermittent
failures
of
probes
that
you
can't
explain
if
you
have
a
pod
that
doesn't
come
up
just
because
the
probes
are
failing
and
there's
no
logging
or
maybe
there's
some
logging,
but
it's
very
intermittent.
This
is
usually
evidence
of
cpu
throttle.
Now
you
can
go.
Look
at
graphs
from
prometheus
or
from
stackdriver
and
construct
grafts
disease
or
whatever
your
monitoring
tool
is
but
first
thing
I
always
look
at
when
I
see
just
unexpected
probe
failures.
I
know
the
app
should
respond
on
that.
B
That
endpoint
is
the
the
cpu
limits,
specifically
the
limits,
because
that's
what
controls
the
cpu
throttling
so
we're
gonna
go
back
to
our
deployment
and
we're
gonna
edit
this
again
and
I'm
gonna
turn
this
up
to
something
a
little
bit
more
reasonable.
10
millicourts
is
tiny.
We
originally
had
100
millicourse,
so
let's
bump
this
up
to
40..
Maybe
we
just
started
this
thing
up.
Maybe
we
would
kind
of
expect
it
should
only
take
you
know:
40
50
millicourse.
We
want
to
be
conservative.
B: If we look at the last pod that tried to start, which is now terminating, we see it was well over its CPU limit and request, and it crashed twice, probably killed because of its failing probes, so it just wasn't in a good state. So we're going to try this new one here and see how it goes. Let me take a look at the logs on that.

B: It's also fun to note that, due to the wonderful features of Kubernetes, our app has actually been functional this entire time. That new pod, because of our deployment strategy, just didn't come up, but our other two pods were still running, and so if I had gone and clicked on the app or run my load test here, we would have seen that the app is still fully functional. All right, so we have one running... two running.
A
And
then
we
have
a
question
from
mark
about
the
slack
channel
for
the
chat.
I
think
it
was
linked
or
told
a
bit
earlier.
Yes,
there
we
go,
you
can
see
there,
so
you
can
join
in
there.
But
obviously
you
can
also
ask
the
questions,
as
you
did
mark
already,
via
the
chat
in
your
preferred
streaming
provider
so
that
you
already
are
doing
really
well
on
that
front
perfect
and
then
actually
there's
a
question
from
muhammad
again.
A
Sorry,
if
I'm
I'm
failing
in
pronouncing
the
name
by
the
way,
so
they
say,
look
still
like
looks
like
still.
Cpu
takes
100
percent
off
40,
better
to
increase
limit,
cpu
100m,
and
then
thank
you
so
much
to
jonathan
for
saying
awesome
great.
That
you're
excited
to
be
here.
B
There's
a
great
point
about
using
100
of
40
millicourse
as
the
limit
right
now
we're
sitting
at
two
percent
and
62
on
our
two
pods
that
are
running
with.
I
would
assume
very
little
to
no
traffic
unless
everyone
watching
the
live
stream
has
gone
and
started
to
click
on
this
thing
so
and,
and
that
is
actually
part
of
the
demo.
So
I'm
gonna
talk
about
that
here
in
a
second.
B
So
now
we've
got
it
we're
sitting
at
40
millicourse
our
app
seems
to
be
running
we're
passing
the
probes,
everything's
kind
of
stabilized
out,
and
so
I'm
going
to
run
my
my
little
benchmark
thing
here
that
we
ran.
B: And you may have already realized it's taking far longer than it did last time. If we look, we see our average request duration was 500 milliseconds and our average iteration was 12 seconds, so we've more than doubled the amount of time it takes to run this test. And this is a very small test; it's not indicative of any real-life traffic or anything like that.

B: If I ran this for a lot longer, leave the virtual users where they are but change the iterations to something like 10,000, and just let this sit, we would see that the app will use 100 percent of that CPU, and it's just sitting there getting throttled over and over and over again. This would be another good opportunity to go take a look at our metrics graphs and see that throttling in action.
B
You
have
to
be
a
little
bit
careful
with
those
kind
of
graphs,
because
sometimes
they
can
be
misleading.
Sometimes
you
see
throttling
and
you're,
not
sure.
If
it's
you
know
consistently
a
problem.
What
I
prefer
to
do
is
take
a
look
at
actual
latency.
Just
look
at
your
application
performance.
Look
at
those
gold
metrics
and
see
if
your
app
is
performing
the
way
you
expect,
and
in
this
case,
at
40
millicourse.
B
We
know
that
our
request,
duration
of
500
milliseconds,
is
way
too
high.
That's
just
not
right
for
what
our
application,
what
we
expect
our
application
to
do,
and
so
the
the
suggestion
in
the
chat
to
bump
up
to
a
a
limit
of
a
hundred
millicourse,
definitely
a
great
idea.
So
we're
gonna
do
that.
Let's
find
the
resources
block
and
bump
this
up
to
back
up
to
100
and
100
and
hope
our
app
starts
to
perform.
B: It's also interesting to note that during that test we never actually saw the CPU spike. CPU throttling is a very complex mechanism; I've watched a few videos on it and I'm not sure I fully understand it, but I understand the effects of it. And so if we see that our app isn't performing, our limits are too low, and we're seeing a lot of CPU throttling: turn those limits up, even if you're not necessarily seeing full utilization all the time.
A: Well, while we wait for that as well, there's another question from Antoine: is it a good idea to profile applications, you know, for worst-case CPU usage, and then adapt it?
B: You are correct in the last comment as well. The last comment is "resources.cpu is not necessary"; I'm assuming you mean the request. The CPU request is not the same as the limit, and you can keep the requests low and the limits high, and that is definitely true, but I am going to cover that a little bit later in the demos, so we'll talk about it in a few minutes. So, we ran our tests; we're back down to 120 milliseconds, still a little higher than what we started with.

B: That happens. All right, yep, we're back down under 100 milliseconds, so 200 seems to be a pretty decent sweet spot for us. I'm going to leave it there for now and we're going to move on. I believe the default in the actual YAML that we used to deploy this is 200 for both the request and the limit; it's set that way because we want to start with the best values. So if you go to run this demo yourself, or tweak it, you should start with that at 200.
B: So that's the end of the CPU throttling demonstration. I'm going to move on to the third demo, which is much simpler than the last one.

B: What happens if we just turn down the memory requests and limits? So we're going to edit this again. This one most folks are probably familiar with; it's a really common thing, and it's relatively straightforward, because the reaction to running out of memory is much easier to understand than CPU throttling. CPU throttling, as I talked about, is fairly complicated.
B: What happens when we turn the memory limit down is that we're just going to keep getting killed over and over again. So if we go describe this pod, we see the last state was terminated; we were OOM-killed, or out-of-memory killed, and the exit code for that is always 137. You may not necessarily see the reason shown as OOMKilled.

B: You may just see terminated, but if that exit code is 137, you know there's not enough memory there, and your pod is just going to sit there and crash over and over again. So we're going to go back and edit this again (oops, wrong button), edit, and find the resources. We started, I think, at 50, so let's bump this up to 20, because obviously 10 megabytes was not enough for this app to even start, just like in the CPU one.
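The corresponding edit, sketched out with the values from the walkthrough (the limit gets bumped again, to 40, shortly after):

```yaml
# Memory bumped from 10Mi to 20Mi after the container was OOMKilled
# (exit code 137) before it could even finish starting.
resources:
  requests:
    memory: 20Mi
  limits:
    memory: 20Mi
```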
B: I believe this app uses the most memory at startup, so we're going to go back up to 40 here.

B: You are correct, but this is not a Java-based app, as far as I know. I actually don't remember exactly what it's written in; we can go find out. This is the repo, it's linked in my repo, and it looks like, yeah, the front end is JavaScript and the back end is, I'm going to guess, Ruby, by the GitHub analysis here. I'm not going to go dig through it, because that's not so important to the demo. So we're at 40 megabytes, and we see, just at steady state...
B: Well, I'm not running any traffic right now, and we're using seventy percent of our memory limit. So I'm going to run that load test again and just let it run for a few minutes here, and a lot of times what we'll see is intermittent OOM kills. So things won't necessarily... you know, the app will start up, but under load it will start to fall over.

B: This is an opportunity either to adjust our memory limits up and have a little bit of buffer for it to surge, or to turn on a horizontal pod autoscaler, and that's usually a much better option: to scale horizontally rather than vertically. But as we'll see, this app is not so much memory-bound as it is CPU-bound; it seems to use a fairly consistent amount of memory, and so I'm not likely to get a ton of results out of this. But we'll see what happens.
B: If we go take a look, we'll see how much traffic we've done: we're at 22,596 requests, so we've done another couple of thousand or so. The app is still running just fine, and we can take a look at the stats.

B: So it's actually running quite well, and another thing to think about here is that 92 percent memory utilization might be a good thing; that might be exactly where we want to be. If our golden metrics, or our golden signals, aren't showing any issues, and our latency is still where it needs to be, 92 percent is fine. I'm just going to let it run.
B: Predictive autoscaling... that would be the golden goose, wouldn't it? I don't know of any great things out there. There are a couple of solutions for sort of creating buffer space to reduce the amount of time it takes to scale up: the node over-provisioner is a common solution, where you essentially just run a very low-priority pod that sits and holds those resources available, and then automatically gets evicted when those resources are needed by something more important. So essentially you keep those nodes kind of pre-warmed, and that can reduce the amount of time it takes to scale up. As far as predictive autoscaling goes, you really have to know the patterns of your traffic coming in to be able to do that, so it's a much more complex problem to solve, and I don't personally know of any great solutions out there for it, generally.
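A minimal sketch of that over-provisioner pattern, assuming the common approach of a negative-priority placeholder deployment; the names, sizes, and replica count here are made up for illustration:

```yaml
# A PriorityClass below the default of 0, so placeholder pods are the first
# to be preempted when real workloads need room.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10
globalDefault: false
description: Placeholder pods that reserve headroom for scale-ups.
---
# Pause containers that do nothing but hold the requested capacity. When a
# real pod needs the space, these get evicted and the cluster autoscaler
# brings up a fresh node behind them.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioner
spec:
  replicas: 2
  selector:
    matchLabels:
      app: overprovisioner
  template:
    metadata:
      labels:
        app: overprovisioner
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: 500m
              memory: 512Mi
```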
B: Generally, the solution is to just scale more aggressively, so turn your targets down, or things like that. So, we're still running at 93 percent here; we have an average request duration of 67 milliseconds, or 69 milliseconds, so it's actually doing quite well, even at 91 to 95 percent memory utilization.

B: I think this is a great spot to be at. I don't really care that it's at 95, 96, 97 percent, as long as it's not occasionally getting OOM-killed. If we do start to see those occasional OOM kills, then we may want to increase that just a little bit, but, as I said before, I don't think this app is memory-bound; it's very much CPU-bound, and so I'm not going to worry about it. So that was the easy demo: OOM kills, very simple.
B: We all relatively understand those, so I am going to move on to the fourth demo, and this is my favorite, because we see this a lot in the clusters that we run for our customers and that I've run for people in the past. It's very common to think: well, I know my app.

B: I know my traffic is going to be bursty, I know that my resource utilization is going to be bursty; you know, I don't need to request as much as my limits are. I'm going to take my requests and my limits, use that burstable QoS class, and set them really far apart.
B: So I'm going to request 10 millicores, because that's what I need to start with; that's really all it needs to get going, that's kind of what it uses at steady state. And I'm going to set my limits... and I'm doing this backwards on the screen, because I'm trying to talk while I type, but I'm going to set my requests to 10 millicores, which we already know is far too little for this app from the earlier CPU throttling tests that we were doing.

B: But I'm going to set my limit to 500 millicores. So our limit is way up there, we're not going to get CPU throttling, but, you know, "I don't need to request as much." It's a very common thing to do. And I'm also going to do the same with my memory. This isn't actually in the script; I'm going a little bit off script here, but we're just going to see what happens when we do that.
B: I'm going to set the memory request to 10 megabytes and the limit to 100, which is way higher than what we had before, and so I'm going to set that at the same time. Where this really becomes a problem (it's not so much a problem when you're running a static number of pods; it can be, but where it really, really becomes a problem) is when we start using horizontal pod autoscaling.
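Put together, the resources block being described looks roughly like this: a Burstable pod whose requests are a small fraction of its limits.

```yaml
# Burstable QoS with requests far below limits. The scheduler and the HPA
# both reason from the requests, so both will badly underestimate what
# this pod actually consumes under load.
resources:
  requests:
    cpu: 10m
    memory: 10Mi
  limits:
    cpu: 500m
    memory: 100Mi
```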
B: So I'm going to edit that HPA, that horizontal pod autoscaler, and I'm going to let it scale way up: I'm going to say min replicas 2, max replicas 200. And what we're going to see here, if we take a look at that HPA, is that it's a CPU autoscaler and we're targeting 50 percent CPU utilization, and the interesting thing about this is that...

B: Scaling is based on the request, and scheduling is also based on the request. So by requesting only 10 millicores, but allowing a limit of 200, or whatever I set it to (500, that huge number), we're going to see some probably interesting behavior in our cluster, and this is where I like to go back to that kube-capacity tool.
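The HPA in the demo looks roughly like this; a sketch using the autoscaling/v2 API, with the target name assumed rather than copied from the repo:

```yaml
# CPU-based HPA targeting 50% of the *request*. With a 10m request and a
# 500m limit, each pod can legitimately run at many times its request, so
# the HPA scales out far more aggressively than the traffic warrants.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: yelb-appserver
  namespace: yelb
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: yelb-appserver
  minReplicas: 2
  maxReplicas: 200
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```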
B: So let's just keep watching this for a minute. And actually, I'm going to keep running load against this, and I'm going to run a little bit more load. Say we got quite a surge in traffic: I'm going to use 30 virtual users instead of 10.

B: We're going to see what happens. So we're already sitting at 280 percent of our memory request and 278 percent of our memory request on these two pods, so we're asking for not nearly enough. We've scheduled this pod on a node thinking that it only needs 10 megabytes of memory and 10 millicores.

B: The scheduler has made this decision to put it somewhere, but that's way off, because now we're using three times that amount already. So the scheduler has already made a bad choice, because we told it to. And let's also... not demo one, sorry.
B: So if we take a look at our HPA, we'll see that we are now at fifteen hundred percent of our target CPU usage, and so what this is going to do is just explode the number of pods; we're going to scale way up here for really no good reason. So, if we look at our stats...

B: And take a look at our current request duration. Let's see... okay, nope.

B: So if we take a look at the number of pods here, we now have 75 pods. If we take a look at our HPA, we're starting to settle down on our target here, but we probably don't need this many pods.
B: For the amount of traffic I'm running, I would not expect to need 70 replicas. That's going to go up again in a second because of where we're at here; we're using a hundred pods for a relatively small amount of traffic. So first, we're going to see problems with scaling: scaling is going to happen too fast, because the percentage of our request is so far off from what we're actually using.

B: The second thing we're going to start to see is that nodes are going to get overwhelmed. So this is where that kube-capacity tool I talked about comes back in (it was written by an old co-worker of mine, Rob Scott), and it's going to sum up your CPU limits per node. So if we look at all the pods running on this node, we add them up.
B: You can get this information from a kubectl describe node output, but this really sums it up nicely in an easy-to-read way. You'll see that our summed CPU limits are currently 500, 400, 300 percent of our node capacity.

B: So we have the opportunity to attempt to use 500 percent of our CPU. I didn't really constrain on memory here, but I've seen this happen with memory, where you get to three or four hundred percent of the memory available on the node, and as soon as you get a large amount of traffic, those pods start to consume more and more resources.
B: I'm not saying that all applications have to be in the guaranteed QoS class, but for your critical applications, use that guaranteed QoS class, and when you're going to use burstable, be mindful of the entire cluster and the ecosystem that you're deploying into. Understand that if your combined usage starts to get to a point where maybe you could hit 600 percent of a node, that's probably not a great situation to be in, so be mindful of those summed limits and those summed requests.

B: So we'll just run the regular test real quick here and see what our current numbers come back at... looks like we're at 382 milliseconds average, with an iteration of 10 seconds. So even with all these extra pods spun up, we're still not getting great performance.
A: Right, and then there's an audience question again, which is amazing. From Jonathan: is this part of chaos engineering over Kubernetes?

B: I mean, you could kind of say that what I'm doing is a form of chaos engineering. If you really wanted to introduce some interesting chaos into your cluster, deploying something like that stress application to attempt to eat resources would be a form of chaos engineering. So, sure, yes. Thanks for the question.
B: All right, well, that's actually my last demo. I know we're a little bit early on time. Are there any other... I'm sure I can come up with some other ways to break this cluster. Any questions, other questions from the Slack channel, or anything like that?

B: Sorry, I didn't quite catch that. Wait, what's the channel? In the Cloud Native... I can go pull these up.
B: Oh, from YouTube, okay. Oh, I see: k6 versus pytest for endurance testing. I really enjoy k6; it's super easy to run, writing the tests is easy, their documentation is great, and their cloud product is actually pretty nice. It works for a whole bunch of different situations, and I've been using it for a while. I don't know if I've used pytest directly for this type of testing myself.

B: So I don't know if I can necessarily give a good recommendation there, but I'm a huge fan of k6; it's very simple to get running.
A: Great, keep the questions coming if there are more; we have time to take them. And then a LinkedIn user asks: is this a Kubernetes-native application?

B: No, I don't believe it is, originally. I will say it runs quite well in Kubernetes; I've had good luck with it as a demo application, and all the different pieces are nice. I grabbed the YAML for deploying it from the repository and modified it pretty heavily, so it definitely runs well in Kubernetes. Whether I could say it's necessarily, like, originally Kubernetes-native, I'm honestly not sure.
A: Great, and thank you to you, Dave, by the way, for asking the question. Great. So, any other questions, or do you have some other way to break things more?

A: While we wait for that inspiration to strike again, I actually have a question. So you talked a bit about different projects and so forth, but do you have a favorite CNCF project, or another open source project, regarding stability for Kubernetes?
B: Definitely. I'm a little bit biased, in that Fairwinds has released some open source projects. Goldilocks is one; it is designed for setting your resource requests and limits, or setting up a baseline for them, using the vertical pod autoscaler project, which is part of the Kubernetes project, to get recommendations on how to set those things initially.
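For reference, Goldilocks works by creating recommendation-only Vertical Pod Autoscaler objects for your workloads; a hand-written equivalent would look roughly like this (a sketch; the target name is assumed):

```yaml
# VPA in "Off" mode: it only publishes request recommendations
# (visible with kubectl describe vpa) and never mutates running pods.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: yelb-appserver
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: yelb-appserver
  updatePolicy:
    updateMode: "Off"
```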
B: ...and liveness probes, readiness probes, and stuff like that. So those are kind of our two main ones around stability, and then, of course, any of the tools that I showed today. All the Kubernetes-native tools: cluster autoscaler, and metrics server, which is obviously in use here as part of GKE. So I think that's a good list of tools that I frequently use.
A: Great, perfect. And yes, Jonathan, thank you for hyping us up: "awesome, thanks for the share, for this increased capacity over Kubernetes." Thanks a lot; thank you for attending and asking questions, and good day and good night and so forth to you as well. Great. And then there's, I think, a question on the Slack side as well: so, it's clear about setting good requests, but what would be the best practice for setting good limits?
B: Well, I feel like I touched on this a little bit. If you're going to set your requests and limits differently, if you're not going to use the guaranteed QoS class, I would say a good guideline is to just not keep them too far apart. As a very, very generic baseline, you could start with setting the limit maybe 10 percent over your request, depending on various variables.

B: So if you're going to use burstable, just don't keep them too far apart, and be mindful of the whole ecosystem. But in general, if it's a critical workload, if it's your main application, if it's sensitive to being throttled or to any sort of disruption, then using that guaranteed QoS class is going to be your best route: set your resource requests and limits the same, and set them high enough that your app can function and you get the lowest latency possible.
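As a rough illustration of that guideline (arbitrary numbers), a Burstable workload where the limits sit only slightly above the requests:

```yaml
# Limits kept close to requests (~10% headroom), so the sum of limits on a
# node can never drift far above what was actually scheduled onto it.
resources:
  requests:
    cpu: 100m
    memory: 256Mi
  limits:
    cpu: 110m
    memory: 280Mi
```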
B: It was made by a company called Carbon Relay, which is now part of StormForge, I believe; forgive me if I get your names wrong. Essentially, what it does is allow you to run a series of tests. You set some variables that you want to monitor: you say, I want to look at my latency, and I want to balance that against how many resources I'm using, so effectively my cost. So, I want to minimize latency...

B: ...I want to minimize cost as well, and I want to tweak the CPU requests and limits. You give it a set of parameters, and then you give it a test to run, like a k6 load test or something like that, and it will make the change.

B: It's a really interesting tool and a fantastic way to sort of get an idea of what's going on in the cluster around these two types of things. So that could be a fun thing for folks to look at.
A
Great
thank
you
to
moisturize
for
that
great
question
and
yeah
there's
been
a
bit
talk
about
the
slack
as
well.
So
there
was
a
comment
then,
and
if
you
send
cncf
on
youtube
your
email
for
monaco,
we
will
get.
You
started
on
the
slack
as
well
on
that
front,
but
yeah.
Those
are
the
questions
so
far.
So
if
there's
any
still,
we
do
have
a
few
minutes
to
take
them,
so
keep
them
coming.
A
If
there's
anything
that
pops
into
everyone's
head
did
inspiration
strike
and
did
you
remember
the
good
way
to
break
everything.
B: It did not, actually; I was too busy answering questions. So I think these definitely cover the primary scenarios that we see most often. I'm sure there are dozens of other ways to break the cluster, but these are definitely, you know, the most common ones that we see.
A: Yeah. And to Dumb Chicken: you can see the Slack channel on the stream itself, where it says "join our live chat" and "CNCF Slack". For the CNCF Slack itself, I think you can find the link from the CNCF website, for example, to begin with.

A: It has also been linked in this chat, so you can maybe find it from there, but if not, the CNCF website itself should have the link to the CNCF Slack, where you can then find the cloud-native-live channel. And then there's a comment from Jerk saying: we've had long reviews on the request and limit range and come down to 20 in our particular case.
B: That's great. Reviewing your CPU requests and limits is always a good idea, something you should definitely revisit on a regular basis. And, you know, it's a balance between performance and efficiency, or performance and stability and cost, right? Because we can always get a very stable cluster by just turning our requests and limits way up, always using the guaranteed QoS class on everything all the time, and it will be very stable and we won't have any interruptions, but we're probably over-provisioned considerably if we do this.

B: And so reviewing that over time, looking at your metrics over time, and then making tweaks that don't affect your latency, or that keep it within a reasonable range that you're shooting for, is super important to do. So I think that's great.
A: Great, everything's going smoothly over there then. And then there's another question about what the channel for the Slack is, and it is still on the stream, so you can see it; you should see it in this direction: "join our live chat" on CNCF Slack, cloud-native-live. So that is the Slack channel you should be joining. But obviously in here we actually see comments from YouTube, so Stephen, we do see your comments here already, live from there, as well as from LinkedIn or from the other places. But that is true:

A: You can join the Slack as well and join the conversation on that side too, and thank you so much for joining, and I think, on the thanks...

A: ...and great messages on that side: the Slack link, I think you can find that from the CNCF website itself, if I'm correct, and it should be there so that you can join.

A: That's great if everyone's interested in continuing the discussion there, and it's the same channel that we have every week for these Cloud Native Live shows. So if you hop in over there, you can join in next week as well and see the chat in action, and the links to all of the live shows are in that Slack channel too. So that's a very nice tip for everyone here. Again, we have three minutes left in our scheduled programming time.
B: Not anything new; just make sure you set your resource requests and limits. I will continue harping on that, probably for the rest of my career. And thank you so much to everybody for all of your questions and for listening; it's been great having you all here.
A: Perfect, thank you so much for joining. So now that we have two minutes left, I think it's a great time to start wrapping up. We've had a lot of great interaction; thank you so much to everyone. So thank you, everyone, for joining the latest episode of Cloud Native Live. It was great to have Andy here talking about building stability.

A: And we both probably also really loved the interaction and questions from the audience. As always, we bring you the latest of cloud native code every Wednesday, so you can join in next week, and the week after that as well. Next week we will have another great session, from Jason Morgan, talking about more amazing topics. So thanks again for joining us today, and see you next week.