A
Now, let's get going. Thank you for joining us. Everyone, welcome to today's CNCF live webinar, Dynamic Right-Sizing of Kubernetes for Cloud Cost Savings. I'm Libby Schultz and I'll be moderating today's webinar. I'm going to read our code of conduct and then I will hand over to Varsha Nayak, DevOps engineer at Tryg, and Chip Huang, technical product marketing manager at OCI. A few housekeeping items before we get started: during the webinar you are not able to talk as an attendee, but you can leave all your questions in our chat box. We will get to as many as we can at the end. This is an official webinar of CNCF and, as such, is subject to the CNCF code of conduct. Please do not add anything to the chat or questions that would be in violation of that code of conduct. Please be respectful of all of your fellow participants and presenters. Please also note that the recording and slides will be posted later today to the CNCF online programs page at community.cncf.io, under Online Programs, and they're also available via your registration link.

B
Hey, thanks Libby. My name is Chip Huang and I'm with Oracle Cloud Infrastructure. I'm joined by Varsha and Piotr from Tryg, one of the largest insurers in all of Scandinavia, to discuss a topic that's top of mind for many of you when it comes to running your Kubernetes applications in the cloud, and that is achieving cost optimization for a cluster without compromising application performance. That is why right-sizing your Kubernetes clusters is so important: when it's done right, you can effectively achieve both of those objectives. Next slide, please.

So when we're talking about right-sizing a Kubernetes cluster, we're really talking about two things. The first is allocating the right amount of resources for the cluster, and this means memory and CPU. It's important that pods are not resource-starved, so the application can run smoothly, but you also don't want to over-provision and waste resources. So by right-sizing a Kubernetes cluster, you are saving money because you're only paying for the resources you utilize.

The second aspect is really selecting the right type of hardware and node types. Not all applications perform the same: some require more CPU, others require more memory, or are I/O-intensive, or require specialized hardware in order to run effectively. So by providing the right node type and the right hardware to your application, you allow it to perform optimally.

The third aspect is really what happens when you right-size a Kubernetes cluster. If you right-size the cluster effectively, essentially your cluster can operate more smoothly as well as become more stable. A key aspect of right-sizing a Kubernetes cluster is really looking at the best way to scale the application. Depending on the application, that might mean using the Vertical Pod Autoscaler, the Horizontal Pod Autoscaler, or a combination of both, but you're also looking for the right metrics in order to be able to scale your application. So just by going through the exercise of right-sizing a Kubernetes cluster, you essentially make your applications more scalable.

It also allows them to run more efficiently. Next slide, please.

But when you right-size a Kubernetes cluster for the cloud, it does come with some challenges. The first of which: the workload in Kubernetes is dynamic, and the amount of resources you may need to allocate for your application to run effectively will change depending on the load on the application. And because of those changes, the way that you right-size your Kubernetes cluster needs to be dynamic as well, so you need to adjust along with the load on the application.

And before you are even able to start right-sizing your Kubernetes cluster, you really have to understand how your resources are being used by the applications on the cluster, and that means you need to find the right way to monitor your cluster. But what are the right tools, and what is the right methodology? These are some of the things that are kind of hard to determine, and right-sizing in the cloud in general is complicated.

First of all, you do need to understand how your application behaves, but even knowing how your apps behave, you still need to understand, for a given cloud provider, what hardware and what node types are available, so you can match them up correctly. And finally, when you're right-sizing a cluster, it can affect the performance of your application. So you have to keep that in mind when you're doing dynamic right-sizing of a cluster, so that the methodology you use does not interfere with the performance of the application.

C
Hello, this is Varsha Nayak, a DevOps engineer from Tryg. I also have my platform manager present here, Piotr Haikovski, again from Tryg; he's one of our panelists today. So let's get started. Actually, before we get started, let's talk a little about who we are. Tryg is Scandinavia's largest non-life insurance company. We are headquartered in Ballerup, Denmark. We have over 5.3 million customers and over 7,000 employees.

Where do we stand in terms of market position? In the whole of Scandinavia we hold top-three market positions, spread across Denmark, Norway and Sweden, and in Denmark we hold the highest position. We have a broad variety of insurance products made available to our customers, spread across various business sectors. For example, in the private sector we have accident insurance, home insurance, pet insurance, health insurance and various others.

We have a commercial sector, where we have insurance for small and medium-scale businesses: workers' liability insurance, property insurance, motor insurance and the like. And when it comes to the corporate sector, we also have group life insurance, along with property insurance, transport insurance and so on. So what I'm trying to stress is that we have a huge variety of insurance products made available across various different business sectors, which means we have a huge amount of data flowing in, in real time, both structured and unstructured. We have to collect it from various different sources, and many of these sources hold data in the terabytes.

We have to structure that data, model it as per the ACORD standard for insurance, centralize and streamline it, and make it available in one place in real time, so that we can feed our analytical and business intelligence services and also our STPs, straight-through processes. This is to ensure that we improve the customer experience and also speed up the insurance process itself.

So the agenda for this session: we'll first have a quick overview of how we do a deployment, and then we'll talk about the challenges that we initially faced. There were quite a few; I'll try to touch upon the major ones. Then we'll talk about the solutions, as in what we did to actually right-size our Kubernetes cluster. These are spread across two stages: one, we had to right-size the worker nodes involved in the Kubernetes cluster, and two, we leveraged various autoscaling techniques in order to optimize utilization and save cost. And finally, we would be glad to share the statistics and the results that we achieved before and after we did these optimizations, and a quick summary.

Thank you. Now, talking about the challenges that we faced with this architecture: we had a few CPU-hungry workloads, we had a few memory-hungry workloads, and we had a few performance-intensive workloads, and we understood that we could not put all of these application pods under one umbrella and have the same host node for all of these applications.

The second issue: we have a huge scale of deployment, and every deployment on every Kubernetes cluster comprises at least two thousand-odd pods. This basically means that we were stressing the underlying hosts quite a lot, and we hit a few edge cases. To get around them, we had to use a few customization scripts, and this we were able to achieve using something called cloud-init.

One example of how we used cloud-init: we wanted to change a few arguments in the kubelet service on all the worker nodes. For example, we wanted to increase the system-reserved resources of the node itself on every worker node, and this we were able to achieve using the cloud-init script; a sketch of such a snippet follows.
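
As a rough illustration of what this kind of customization can look like, here is a minimal cloud-init sketch that drops a systemd override raising the kubelet's system-reserved resources. The file path, the KUBELET_EXTRA_ARGS convention and the reservation values are assumptions for illustration; the exact bootstrap hook differs per provider and node image.

```yaml
#cloud-config
# Hypothetical sketch: raise the kubelet's system-reserved resources via a
# systemd drop-in. Paths, env-var convention and values are illustrative;
# real node images (OKE included) have their own bootstrap hooks.
write_files:
  - path: /etc/systemd/system/kubelet.service.d/20-system-reserved.conf
    content: |
      [Service]
      Environment="KUBELET_EXTRA_ARGS=--system-reserved=cpu=500m,memory=1Gi"
runcmd:
  - systemctl daemon-reload
  - systemctl restart kubelet
```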

Then, coming to the third point: we had a diversified workload, as in there were busy hours and there were idle times in our workload, and we had over-committed resources to cater to the busy hours alone. This meant that when the load was basically idle, we were still paying for the same amount of resources regardless, and that was costing us too much. And of course, like every other business, we also had budget constraints and we had to bring down the cost somehow.

So I keep stressing that we have a very large deployment and a huge scale of deployment. This basically means that for every Kubernetes cluster that we have in our production environment, we have approximately 5,000 vCPUs, approximately 13 terabytes of memory, 300-odd terabytes of storage, and 2,200 to 2,500 pods. And this is for one Kubernetes cluster and one production deployment, and we have at least four of them running at any given instance. So there was a dire need to optimize the cost and the resources here.

So apart from just selecting the host OS type or flavor, we also thought about the processors used by the compute instances. For example, at least in our project, we had a few Java applications which were amd64-based, and we had Kafka clusters also deployed as pods within the cluster, using the Strimzi operator, and we used amd64-based compute nodes for these. And we had a few other Java applications, like in the second box, for which arm64-based images were created, and we could easily deploy them onto the arm64 compute nodes. The advantage of this is that they are highly performant and also fairly cheap in comparison with the amd64 ones, and these arm64-based compute instances are available on most major cloud platforms, be it Azure, Google, Oracle of course, and Amazon.

So we could leverage that wherever possible. And then we had a special requirement for running a database as a pod inside the Kubernetes cluster. Now you might ask me why: this is because we wanted our applications to be able to talk to our database as much and as frequently as possible, without incurring much latency, and Oracle, of course, provides a VM type or VM shape which has a local disk attached to it, an NVMe-based SSD.

Furthermore, now that we have chosen the nodes to deploy on, we have to make sure that every application pod goes to the right, stipulated worker node, and to make sure this happens we use Kubernetes node affinity, node selectors, taints and tolerations; a sketch follows. And talking about arm64 again, we were able to build arm64-based images using the Docker multi-architecture builder called buildx, in case you're interested.
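
As a rough sketch of those scheduling constraints (all names, labels and images here are made up for illustration, not Tryg's actual manifests), a pod pinned to a tainted arm64 node pool might look like this:

```yaml
# Illustrative only: pin a pod to a dedicated, tainted arm64 node pool
# using node affinity plus a matching toleration.
apiVersion: v1
kind: Pod
metadata:
  name: kafka-broker-example         # hypothetical name
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: kubernetes.io/arch
                operator: In
                values: ["arm64"]
              - key: pool              # hypothetical node-pool label
                operator: In
                values: ["kafka"]
  tolerations:
    - key: dedicated                   # hypothetical taint on the pool
      operator: Equal
      value: kafka
      effect: NoSchedule
  containers:
    - name: broker
      image: registry.example.com/kafka:latest   # placeholder image
```

The multi-arch images themselves can be produced with something along the lines of `docker buildx build --platform linux/amd64,linux/arm64`, which builds and tags one image per architecture behind a single manifest.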

Furthermore, now that we've decided the flavor is such-and-such and we have this type of VM to select from, we also have to consider how we architect our Kubernetes cluster itself. From our learnings, we suggest discrete node pool planning, basically meaning that every node pool should serve just one purpose and one kind of workload. It will be easier to manage and also makes more sense.

Consider this: say I choose a very big node with a huge amount of memory and CPU, and I decide I will deploy all my pods onto this node. There is a disadvantage to this, which is that if we have block volumes attached to the pods on these nodes, there's a limitation on the number of volumes that can be attached to every compute instance, and this is true for all the cloud platforms, so we'll have to watch out for that.

Of course, not all cloud providers have this flexibility of selecting the memory-to-CPU ratio. Oracle provides it in the form of flex virtual machines, but a few other cloud providers, for example Azure, provide an exhaustive list of standard VM shapes and sizes to choose from, so that also helps in choosing the right size for your kind of workload.

Of course, we suggest having a limited set of node pools, so that it is easier for us to manage every Kubernetes cluster. All right, so now that we have decided what hosts to deploy our pods on, let's venture into autoscaling. Just before we start with autoscaling, I would like to cover a prerequisite for the scaling, which is called the metrics server.

The metrics server basically collects the container resource metrics from all the kubelets on the worker nodes and then sends them to the Kubernetes API server, and they then become available to all our autoscalers, be it horizontal, vertical or the cluster autoscaler. So that's kind of a prerequisite. It's also what backs the kubectl top command, which reports how much CPU and memory every pod is utilizing; you get to see all those stats once you install the metrics server.

Now, let's venture into the first kind of autoscaling, which is the Horizontal Pod Autoscaler. As the name suggests, it scales out the number of replicas of a particular controller horizontally: when the load is high, it tries to increase the number of pod replicas belonging to a particular Deployment or StatefulSet, and when the load decreases, it tries to scale down again. There are two ways you can do this.

One way is using CPU and memory: based on how much CPU or memory every pod in a deployment is using, it scales out or scales in. The other way is to use custom metrics. Custom metrics, as in: your application pod exports some metrics, which make sense as a signal to the Horizontal Pod Autoscaler, to a Prometheus server, which is installed maybe on the Kubernetes cluster itself or somewhere outside.

Then, using these metrics, the Prometheus adapter makes them available to the Horizontal Pod Autoscaler as a feedback loop, and the Horizontal Pod Autoscaler then decides if it has to scale out or scale in. So there are two ways of using the Horizontal Pod Autoscaler, and there's just one note you have to consider: if you're using the Horizontal Pod Autoscaler with CPU and memory, you will not be able to use the Vertical Pod Autoscaler. Both modes are sketched below.
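
As a sketch of the two modes (target names and numbers are illustrative, not Tryg's actual configuration):

```yaml
# Mode 1: scale on average CPU utilization (cannot be combined with VPA).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-cpu-hpa               # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                     # hypothetical target
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
# Mode 2: scale on a custom, app-exported metric served through the
# Prometheus adapter (this mode can coexist with VPA).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-custom-hpa            # hypothetical
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: work_queue_lag       # hypothetical app metric
        target:
          type: AverageValue
          averageValue: "100"
```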

Okay, before we talk about this graph, I'll just give a little introduction to the graph itself. This comes from a custom tool which derives all the metrics and stats from our OKE, Oracle Kubernetes Engine, cluster, and this is collected from our production environment.

I will be showing you many such graphs, and they are all collected from our production environment; this is in congruence with the deployment chart that we showed initially, with 2,000-odd pods and 300 terabytes of storage. Here you can see time on the x-axis and the number of pods on the y-axis, and over time you can see the number of pods varying, because the workload demands it.

You can see that at one point it went below 350, and at one point it even went beyond 400. So we see that this varies with time, as opposed to a line parallel to the x-axis, as it was earlier, before the Horizontal Pod Autoscaler was introduced. Now, with this, are we saving anything?

No, unfortunately, because only varying the number of pods in the cluster doesn't mean that we are changing anything with regard to nodes. The nodes are still static, and nodes are what we pay for. So, unfortunately, with just the Horizontal Pod Autoscaler we're not achieving cost savings. So let's bring in the cluster autoscaler. To explain this I'll take an example: let's consider node 1 and node 2, of the same size for our convenience.

Let's say each can accommodate a maximum of three pods of the same type, and both node 1 and node 2 are fully occupied. Now let's say there are two more pods trying to get scheduled on the same cluster. The cluster autoscaler senses this, starts, or provisions, a new node, and then deploys these two pods, pod 7 and pod 8, onto that node. This is how upscaling works in the cluster autoscaler.

For downscaling, again consider the same node 1, node 2 and node 3. Let's say node 2 and node 3 are not optimally used and have enough capacity between them to accommodate two more pods of the same type. The cluster autoscaler senses this, marks one of the nodes for deletion, and then tries to move the pods scheduled on that node over to node 2.

As you can see in this figure, it will then try to delete node 3. This is provided the constraints, like pod disruption budgets, are all respected; there are a few constraints under which the cluster autoscaler acts. But if there are no constraints, it will try to reschedule the pods based on which node can accommodate them, and then try to reduce the number of nodes.

On a few cloud platforms, like Oracle, we have to install the cluster autoscaler as an add-on ourselves, and this gives us the liberty to choose and fine-tune the cluster autoscaler for our needs. I have tried to highlight a few of the flags that we configured and tweaked to make sure they fit our needs better, so I'll quickly take a few examples here.

For example, you can see this scale-down-utilization-threshold, which tells the cluster autoscaler how low a node's utilization should be before it considers scaling down that particular node. And you can also specify how much time you give a particular node to get provisioned and come up to the Ready state; here we have kept it at 15 minutes. There are many such parameters that you can play around with, and you can find them on the GitHub repository; a sketch follows.
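
The flags mentioned correspond to real cluster-autoscaler options; a sketch of how they might appear in the add-on's container args (the image tag, provider string and values are illustrative, not Tryg's exact settings):

```yaml
# Fragment of a cluster-autoscaler Deployment spec; values illustrative.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.26.2  # example tag
    command:
      - ./cluster-autoscaler
      - --cloud-provider=oci                     # assumed value when run as an OKE add-on
      - --scale-down-utilization-threshold=0.5   # node becomes a scale-down candidate below 50% requested
      - --max-node-provision-time=15m            # how long a new node may take to reach Ready
      - --scale-down-unneeded-time=10m           # how long a node must be unneeded before removal
```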

Now, this graph shows what we achieve with the Horizontal Pod Autoscaler and the cluster autoscaler together. It's a similar graph provided by the same tool, but on the y-axis you now see the node count, the number of nodes in the node pool. You can see that initially the node count went up to 100, and then it gradually reduced and stabilized at around 60.

So that's how you can see the node count go up and down. And are we saving anything here, cost-wise? The answer is yes, because nodes are what we pay for, and if we have a smaller number of nodes, of course we are paying for fewer of them, as opposed to having a straight line parallel to the x-axis, where we would have to pay for all 100 of them at all times. So this is an advantage. But is this enough?

Across the Kubernetes cluster, we have a huge difference between the utilization and the requests, and we are paying for the requests, by the way. The Vertical Pod Autoscaler comes to our rescue here. Again, the Vertical Pod Autoscaler together with the Horizontal Pod Autoscaler can work only when the Horizontal Pod Autoscaler works on custom metrics, and not CPU or memory.

Now, let's consider the Vertical Pod Autoscaler. As opposed to the Horizontal Pod Autoscaler, which increases or decreases the number of replicas of a particular pod, it increases the size of the pod itself. When I say the size, it basically means the CPU and the memory requested by a pod are increased based on the actual pod utilization.

This is done at a regular interval, of course, which is configurable, and in steps, which are also configurable in the Vertical Pod Autoscaler. So in this diagram, what I try to show is: first, we start off with a minimum CPU and memory that is configured by the user for the Vertical Pod Autoscaler. You can also configure a limit. Sorry about that.

So, to restart with the Vertical Pod Autoscaler: I'm hoping we got through the slide where I discussed why we need VPA. We saw that the utilization and the requested resources were way off, and we have to bridge this gap, and VPA comes to our rescue.

VPA together with HPA can be used only when HPA is used with custom metrics, and not with CPU or memory. And VPA, as opposed to HPA, where the number of pod replicas gets scaled up and scaled down, tries to increase the size of the pod itself. When I say the size: the CPU and memory requested by the pod are increased based on the actual utilization of the pod, and this is done at regular intervals, which are configurable, and also in steps, which are also configurable.

Just for the sake of simplicity, I have chosen some random numbers to depict how this works. Initially you have to set a minimum and a maximum CPU and memory that VPA can play with, and then it will start with the minimum configured CPU and memory on the pod, and step up as and when it senses that the pod is utilizing more than this. A sketch of such a VPA object follows.
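
A sketch of what such a VPA object can look like, with illustrative bounds (not Tryg's production numbers):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa                   # hypothetical
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                     # hypothetical target
  updatePolicy:
    updateMode: Auto                 # evict pods and recreate them with updated requests
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:                  # floor the recommender starts from
          cpu: 250m
          memory: 512Mi
        maxAllowed:                  # ceiling the recommender may not exceed
          cpu: "4"
          memory: 8Gi
```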

Of course, there are many other flags that can be found in the GitHub repo, but these are the few we would like to highlight, and you can ignore the exact numbers that we have configured, because we had to do a few iterations to get to these numbers, and so will you: if you have to tailor the VPA to your workload, you'll have to run a few iterations before you get to the ideal values for your project.

So to start with, I will highlight a few flags here. One is this recommendation-margin-fraction, which, as we use it, is basically the amount by which the CPU or memory recommendation gets stepped up or down. We have chosen 30 percent, so every step would be a 30 percent increase in CPU and memory. There are also parameters like how long you want to retain the memory and CPU history before you discard it.

You can also set which namespaces you want VPA to act upon, and we have used Prometheus for our storage here, so we provide the Prometheus URL, and so on and so forth. This is for the VPA recommender component. We also have something called the VPA updater component, where you can provide settings like the interval at which you want the updater to run; we have set that to 10 minutes, you could choose yours. We can also provide the minimum number of replicas of every controller that are required
for the VPA to act on a particular controller at all. And if you see too many evictions of the pods and you want to reduce that, we for example use this eviction-rate-limit and also eviction-tolerance. This just tells the Vertical Pod Autoscaler that a certain percentage of the replicas must be running at all times when it considers evicting one of the pods in the controller. The flags we touched on are sketched below.
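
Those names map onto real flags of the VPA recommender and updater binaries; a sketch with illustrative values (the Prometheus URL and namespace are placeholders):

```yaml
# Recommender container args (sketch):
- --storage=prometheus
- --prometheus-address=http://prometheus.monitoring.svc:9090  # placeholder URL
- --history-length=8d                    # how much metric history to load
- --recommendation-margin-fraction=0.30  # ~30% margin applied to recommendations
- --vpa-object-namespace=my-namespace    # placeholder namespace
# Updater container args (sketch):
- --updater-interval=10m                 # how often the updater runs
- --min-replicas=2                       # controller must have at least this many replicas
- --eviction-rate-limit=10               # cap on pods evicted per second
- --eviction-tolerance=0.5               # fraction of replicas that may be evicted at once
```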

We see that the red line, which is the actual utilization of the resource, is very close to the blue line, which is the requested resource. You might point out that there is still a huge difference initially, at least in the initial few days, and this is because we have configured it to be such, entirely because our workload behaves this way.

We have huge historical data loads in the beginning, and we want to reduce the number of evictions caused by the Vertical Pod Autoscaler, so we have configured it this way; you can refrain from doing this, of course. And are we saving here? With all of this combined, we are saving a lot, because, firstly, our pods are optimized, our pods are scaling, and the nodes are also scaling, and the nodes are what we pay for, and that's also optimized, so we save hugely on the cost.

This table that you see on screen is for the same workload in two different production environments, one without any of these optimization techniques, the one at the bottom. You can see that we had used 4K vCPUs, and after optimization we've come down to 1.7K vCPUs; and with memory, we had used 12 terabytes and we came down to 6.5 terabytes, almost a 50 percent cost saving. Of course, you can find a few differences between the tables.

This is because we tried to get wiser, so to say: we increased the node pool planning, increased the number of node pools with discretely defined nodes, and we also changed the shapes here and there. That's the reason you see those differences, but the load is the same. You can see these DenseIO, A1 Flex and E4 Flex shapes; these are standard VMs provided for the Oracle Kubernetes cluster. DenseIO is the one with the locally attached NVMe SSD, A1 Flex are the arm64-based compute instances, and E4 Flex is the standard amd64-based compute instance.

So with this we come to the summary, and to summarize, we first need to explore our cloud providers.

We are not doing GitOps as yet, for the first question. Okay, let me get to the question: if you do a new deployment, the latest value will get overwritten by the value that was in the manifest, so do you do anything special to sync to Git? When I say deployments in production: we do fresh deployments from the GitLab CI/CD every time, so the values are in the GitLab repo, and once we update them there, that's when it goes to master and that's what gets deployed.

That's because we tried to start the VPA a little later, and we configured the resource request parameters to very high values initially, because initially, at least for our workload, that's when most of the loading and the busy time happens. We have a huge amount of historical data, and until we get to the stable CDC state we don't want too many evictions of the pods.

That would just delay the loading process, and that's the reason we deliberately leave a gap there and start applying or patching the VPA a little later during the deployment.

Next question: given the Horizontal Pod Autoscaler can't use memory or CPU in this setup, what metrics are you using to know when to scale out to multiple pods? Yeah, we use custom metrics. We have applications which export a particular kind of metric which is the deciding factor, because it basically indicates how much we are lagging in time. I won't go into the details, and it could differ in your own project as well, but this value should basically be the deciding factor for you to know whether your workload is heavy, or whether your applications are busy or not.

So if you have some metrics in your application, you could of course export those metrics to the Prometheus server and, via the Prometheus adapter, make them available to the Horizontal Pod Autoscaler.

Next: how does the cluster autoscaler handle it when there are one or two pods preventing it from removing a node? Is there any way to flag some pods as not important, or as reschedulable if needed? Yeah, this is one thing that we also faced with the cluster autoscaler. We had pod disruption budgets, and that meant that when we had one instance of a pod and the pod disruption budget was saying that I should have a minimum of one, the cluster autoscaler refrains.

It raises its hand and says: okay, now I can't do anything with this node, it will have to exist, because the pod is still there and there's no second replica of that pod. That is, so to say, a wrong configuration, because a pod disruption budget should go with at least two replicas, in my opinion; a sketch follows.
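
To make the failure mode concrete, a sketch (names and labels hypothetical): with a one-replica Deployment, this PDB permits zero voluntary evictions, so the cluster autoscaler can never drain the node the pod sits on.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb                   # hypothetical
spec:
  minAvailable: 1                    # with replicas: 1, no eviction is ever allowed
  selector:
    matchLabels:
      app: my-app                    # hypothetical pod label
```

Running the target at two or more replicas, or expressing the budget as `maxUnavailable: 1`, gives the autoscaler room to move the pod.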

And to get around this, I would say you will need a bit of manual intervention if you really want the cluster autoscaler to run, or you have to make sure you decide wisely when you configure pod disruption budgets. Next: do you have KEDA along with VPA, is it possible? I am not aware of KEDA, to be frank here. Piotr, do you have any inputs on this? Yeah.

D

C
Okay, so: does each app have to push metrics to Prometheus at some interval, or does Prometheus pull? Yeah, there are two ways Prometheus works, one is push and one is pull, and we are doing push, wherein we push the metrics to Prometheus. And there is configuration in Prometheus itself where you can say how frequently it should pull, or receive, the metrics from the various apps. That's a configuration in Prometheus: when you install Prometheus onto the cluster, you have those configuration parameters; a sketch follows.
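
For the pull side, that cadence lives in Prometheus's own configuration; a minimal sketch (the job name and discovery settings are illustrative):

```yaml
# prometheus.yml fragment (sketch): Prometheus scrapes each target on
# this schedule; in push mode, apps instead send metrics to a Pushgateway
# that Prometheus then scrapes the same way.
global:
  scrape_interval: 30s               # how frequently targets are scraped
scrape_configs:
  - job_name: my-app                 # hypothetical job
    kubernetes_sd_configs:
      - role: pod                    # discover application pods in-cluster
```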

Next: if the HPA and VPA methods are used together, what is the difference between their metrics? Okay, if HPA is used together with VPA, I'm presuming this is because HPA was used with custom metrics, and custom metrics basically means your application is trying to export some metrics very specific to your application. VPA, on the other hand, uses the actual resource utilization: when you run a kubectl top command, you actually get to see how much a pod is really utilizing, regardless of how you've configured the pod's resource requests.

So that is what the Vertical Pod Autoscaler considers, and for HPA you have the custom metrics, which is what your application is saying: I need a threshold to scale out and scale in the number of replicas of a pod. Hope that answered it, and I hope I didn't miss any questions.

Hope the session was of some help to you. If you have any further questions, you can of course leave them with the organizers and they will communicate them to us over mail. Thank you for your patient listening.

A
All right, well, excellent job. Thank you, Varsha and Chip, and thank you everyone for joining us. I think you know how to reach our speakers if you have any additional questions, but thanks for joining our CNCF live webinar today. As a reminder, everything will be online later today or early tomorrow. So just let us know if you have any questions, and we'll see you again next time. Thanks so much.