Description
During this talk we will demonstrate how your applications can benefit from the Vertical Pod Autoscaler, improving the responsiveness and performance of your workloads in Kubernetes environments.
Website: https://www.redhat.com/de
Organized by @Microsoft @kubermatic7173 @SysEleven
Thanks to our sponsors @CapgeminiGlobal, @gardenio, @sysdig, @SUSE, @anynines, @redhat, nginx, serve-u
If your application is consuming more and more memory, the kubelet will terminate and restart your pod. On the other hand, you can also suffer poor workload performance: if your application is using more memory and CPU than expected, it will affect the other containers and pods running on the same node, because that node is allocating and running different applications that compete for the same resources.
So you also need to guard against this poor workload performance, which can come from wrong resource allocation. You can, for example, give your Java application eight gigs, but that is not an optimized setup. You need to deploy your applications with the proper requests and limits, matched to their actual resource consumption, in order to better utilize and optimize the resources in your Kubernetes clusters, and to avoid these and the many other problems you can have with a wrong requests-and-limits definition.
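For reference, requests and limits are declared per container in the pod spec. A minimal sketch, where the deployment name, image and values are illustrative and not from the talk:

```yaml
# Illustrative pod spec fragment: requests are what the scheduler
# reserves for the pod; limits are the hard cap enforced at runtime.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: java-app                # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: java-app
  template:
    metadata:
      labels:
        app: java-app
    spec:
      containers:
      - name: java-app
        image: example.com/java-app:latest   # placeholder image
        resources:
          requests:
            cpu: 100m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi
```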
So what's a possible solution for avoiding the different issues we described? Enter the Vertical Pod Autoscaler. The Vertical Pod Autoscaler, or VPA, is a Kubernetes tool in the CNCF, fully open source, that frees users from the necessity of setting up-to-date resource requests and limits manually for the containers in their pods. It will set the requests and limits automatically, based on observed metrics, and define the proper thresholds that allow correct scheduling.
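The VPA is configured through a VerticalPodAutoscaler custom resource applied with `kubectl apply -f`. A minimal sketch, where the object and target names are assumptions:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: java-app-vpa        # hypothetical name
spec:
  targetRef:                # which controller's pods to manage
    apiVersion: apps/v1
    kind: Deployment
    name: java-app          # hypothetical deployment
  updatePolicy:
    updateMode: "Auto"      # apply recommendations automatically
```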
The VPA will watch over the cluster resources, for example preventing pods from reserving more memory and CPU than they need, and for this reason optimizing the different resources in your Kubernetes clusters. The VPA will monitor the resources that the workloads are actually using, querying the API and gathering the Kubernetes metrics, and adjust the resource requirements, so that the spare capacity is available for the other workloads running in your Kubernetes clusters as well.
This is the VPA architecture, which has three major components. The recommender monitors current and past resource consumption and, based on these metrics gathered from the Kubernetes metrics pipeline, provides recommended values for the containers' CPU and memory requests. The updater, on the other hand, checks which of the managed pods have the correct values set, and if the recommendations are not applied to your pods, it will automatically kill those pods and recreate them with the updated requests, based on the recommended resources that Kubernetes is gathering in real time. Finally, the admission plugin sets the correct resource requests on new pods, for example pods just created or recreated by the updater we saw. This is a diagram that shows the VPA architecture.
We have the recommender, and also the updater in the middle, watching the pods through a VPA object, and we have the VPA admission controller that reacts, for example, when a new pod is created. On the other hand, the VPA controller is always pulling different metrics from the metrics server through the API server, gathering utilization and events in real time.
So it will monitor your application in real time and will adjust and provide recommendations, predicting the proper requests and limits to avoid, for example, situations where you are using more memory than expected, or situations where a pod is evicted or OOM-killed. Sometimes you want these VPA recommendations applied automatically, and sometimes not, so the VPA has three different modes to run in.
The first is the Auto mode, which automatically applies the recommended resources to the pods associated with the controller: the VPA terminates existing pods and creates new pods with the recommended resource requests and limits. On the other hand, we have the Initial mode, which acts more or less like Auto, but only on certain events, for example when you create a new pod; it will never touch existing, running pods. And last but not least is the Off mode, which only generates resource recommendations but will never touch a pod, neither a running one nor a newly created one.
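The three modes correspond to the `updateMode` field of the VPA object. A sketch of the fragment:

```yaml
# updatePolicy fragment of a VerticalPodAutoscaler object.
updatePolicy:
  # One of:
  #   "Auto"    - evict and recreate pods with the recommended values
  #   "Initial" - set values only when pods are (re)created
  #   "Off"     - only publish recommendations, never act on pods
  updateMode: "Off"
```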
In that mode it just looks at your different pods and suggests recommended values. But sometimes VPA recommendations can cause trouble. Imagine that your application is consuming more and more memory, and the VPA sets a recommendation that might exceed the available resources, for example the node size. If it is requesting more than the available capacity or quota allows, the VPA recommendation can leave pods stuck in the Pending state. This is where the Cluster Autoscaler comes in, combining the Cluster Autoscaler with the VPA. Combining these two, VPA plus Cluster Autoscaler, can be a very good combination in order to scale your different nodes up and down, based on pod capacity and on the utilization metrics. So you can combine the VPA with the Cluster Autoscaler in order to grow the pods and also the number of nodes in your different clusters. Hopefully you can see my screen. Perfect, so we have our Kubernetes cluster.
Our Kubernetes cluster here has different nodes, and to avoid typing manually I will run this demo script that effectively drives the demo. We will have two different scenarios: the first without VPA, so we can see what happens if my application uses more memory than its limits definition. Hopefully you can see the screen and the font is okay. Perfect, so I will deploy an application that will use a stress image.
You can see that I allocated 250 as the memory limit, and the application is requesting more than its limits. So what happened? It was OOM-killed. For what reason? Because my application is using more memory, and keeps trying to use more memory, the kubelet detects this and restarts my application, and my application is not in good shape, is it? If I describe this application and grep for the reason, the reason is OOMKilled, effectively because my application is using more memory than its limits.
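A sketch of what this first scenario can look like (the image, names and exact values are assumptions, not taken from the demo): a container whose workload tries to allocate more memory than its limit is OOM-killed by the kernel, and `kubectl describe pod` then shows `Reason: OOMKilled` in the container's last state.

```yaml
# Hypothetical reconstruction of the demo workload: a stress
# container asked to consume more memory than its 250Mi limit.
apiVersion: v1
kind: Pod
metadata:
  name: stress-demo              # hypothetical name
spec:
  containers:
  - name: stress
    image: polinux/stress        # assumed stress image
    resources:
      requests:
        memory: 100Mi
      limits:
        memory: 250Mi
    command: ["stress"]
    # --vm-bytes above the 250Mi limit triggers the OOM kill
    args: ["--vm", "1", "--vm-bytes", "300M", "--vm-hang", "1"]
```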
I did not calculate the limits properly, and so I end up in this state. So how can VPA help us? The first thing is that you need to deploy VPA into your Kubernetes clusters. I used an OpenShift cluster, and I used OLM, the Operator Lifecycle Manager, in order to deploy it. As you can see, we have the admission plugin, the recommender, the updater and the Vertical Pod Autoscaler operator.
I will create a namespace, and in this namespace we will deploy the exact same application, so we can check how VPA adjusts the different requests and limits automatically. We define the exact same application, same image, everything the same, but this time we will set 150 of allocated memory and a fraction of a core for CPU.
That is, a CPU limit of 200 millicores and a request of 100 millicores. The VPA will then just use the metrics available within the Kubernetes metrics pipeline, and you can check these metrics yourself using kubectl top pod, which shows the different metrics for the pods in this specific namespace. As you can check, for example, the consumption is close to the request we provided, almost one core, and a little bit below the memory expected, and we can use these metrics as well.
With these metrics in place, we need to create a VPA object. We have the controller, and we also have the recommender and the updater, but we need to create an object for the Vertical Pod Autoscaler itself. We can check it here: effectively we create a VerticalPodAutoscaler that targets my deployment in my specific namespace and that also has the minimum value that is allowed and the maximum value.
This is the maximum value that the VPA will allow to be set, because even if my application keeps requesting more and more memory and CPU, I will put a cap on it in order to prevent bad things in my cluster, such as running out of memory. My controlled resources will be CPU and memory. Then, after this is applied, the VPA will watch my application and check the different metrics, and if my application has a request and is, for example, using more memory than before, the VPA will look at these metrics and define different recommendations.
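The minimum and maximum bounds described here map to the `resourcePolicy` section of the VPA object. A sketch using the one-core / one-gigabyte cap mentioned in the talk (the names and the minimums are assumptions):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: stress-vpa              # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: stress                # hypothetical deployment
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
    - containerName: "*"        # apply to all containers
      minAllowed:
        cpu: 100m               # assumed minimum
        memory: 50Mi
      maxAllowed:
        cpu: 1                  # never recommend more than one core...
        memory: 1Gi             # ...or one gigabyte of memory
      controlledResources: ["cpu", "memory"]
```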
So, for example, with the minimum and the maximum values set, at most it will put one core and one gigabyte of memory, never more, because we are capped at the maximum level, and only for this application. Remember that you can have as many VPA objects as you want in a namespace, each controlling different objects. So if we check the VPA status, the VPA is looking at these metrics in order to adjust the different resource requests and limits, checking CPU and memory.
And then, if we check the VPA status, okay, we have here that it's recommending one CPU and roughly the memory that it's consuming at this moment. As we can see, we have the target, which defines this recommendation, and we also have the upper bound, which is the maximum value that we can have, in bytes.
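The target and upper bound live in the VPA object's status, visible via `kubectl describe vpa` or `kubectl get vpa -o yaml`. An illustrative status fragment (the numbers are made up, not the ones from the demo):

```yaml
# Illustrative shape of a VPA recommendation status.
status:
  recommendation:
    containerRecommendations:
    - containerName: stress
      lowerBound:          # minimum sensible request
        cpu: 100m
        memory: 262144k
      target:              # the recommended request
        cpu: 1
        memory: 262144k
      upperBound:          # maximum the VPA will recommend
        cpu: 1
        memory: 1Gi
```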
As we can see, the application is limited to 200, and now we can simulate how the VPA adjusts the different resources as the application requires more of them. How do we do that? By patching our application so that it requires more, for example more memory than its limits. In a real situation without VPA, this would end up OOM-killed, but the VPA will save the day.
It will adjust the different requests and limits automatically, based on the usage. So it checks the different Kubernetes metrics, and look: we have an event saying the pod resources were updated by the stress VPA, the container's memory request was evaluated, and the change was applied automatically. So the next time the pod is created, the limits are automatically increased: we have a memory value that is adjusted, and the requests and limits that are defined are a little bit higher.
For this reason my application survives and does not enter this OOM-killed, out-of-memory state. So the VPA saved the day. As you can see, the pod is running, we have my application here that we can check, and my application is safe, up and running, and we can verify that its requests and limits were effectively adjusted by the VPA.