Cloud Native Computing Foundation (CNCF) Webinars, 12 Apr 2019

From YouTube: Webinar: How to Make K8s Autoscale Work: Novice to Pro to VIP with Real Multi-Tier MicroServices

Description

Assuring that your application performs to its Service Level Objectives (SLOs) is the end game, and Kubernetes provides Horizontal Pod Autoscaling (HPA) policies that let you define a set of conditions under which Kubernetes will automatically scale your services to help meet SLOs. Identifying the best values for these policies is not easy, and there is limited information on how to achieve results for real, complex applications. Policies that do not properly account for multiple factors will not only fail to assure performance but could also negatively impact other services. Effective policies need to consider how to choose the KPIs that best reflect SLOs and resources, how to use multiple metrics, how to determine the maximum number of replicas per service, and when vertical scaling works better than horizontal scaling. We will share our repeatable methodology, based on a Twitter-like app consisting of multiple services: an HTTP front end and multiple gRPC-accessed back ends, using Cassandra. We deployed Istio to load-balance traffic, which provided telemetry data for response-time-based KPIs, and used Locust for load generation. We will share lessons and best practices from an iterative approach that ensures SLOs while maximizing resource efficiency with HPA policies.

A

Okay, joins have stabilized, so we're going to go ahead and get started. I'd like to thank everyone who's joining us today. Welcome to today's CNCF webinar, How to Make Kubernetes Autoscale Work: Novice to Pro to VIP with Real Multi-Tier Microservices.

A

My name is Josh Berkus. I'm a CNCF ambassador and the Kubernetes community manager at Red Hat, and I will be moderating today's webinar. We'd like to welcome our presenters today, Ming Ding and Enlin Xu from Turbonomic, who are going to be presenting on autoscaling. Before we get started, let me explain a few things for those of you who haven't attended one of these webinars before. First, during the webinar you will not be able to speak as an attendee; you are muted.

A

There is a Q&A box at the bottom of the Zoom application. If you have questions, please drop them in there, and we will get to as many as we can at the end, or the presenters may decide to pick up some of the questions before then. Keep in mind that this session is being recorded and will be sent out afterwards with a link to the presentation, so also keep in mind that you're being recorded when you ask your questions.

A

I'm going to hand this over to Ming and Enlin to kick off today's presentation. So please tell us how to make autoscaling work.

B

Hi, I'm Enlin, and together with me is Ming. We are coming from Turbonomic. I just want to make sure: can you hear me okay?

A

Coming through loud and clear.

B

Great, thanks. As Josh mentioned, we are going to share our experiments on how to make Kubernetes autoscaling work. I don't know how many people have tried the different autoscaling technologies Kubernetes provides out of the box, or whether you found them easy or difficult to configure. In any case, we want to share our experiments on a testbed based on a real multi-tier, microservice-based application, and share some lessons we learned with everyone. Hopefully you will find it useful.

B

A little bit of introduction about ourselves: we understand this is definitely not a vendor pitch session, but we want to tell you who we are, and you can decide whether or not you want to trust us. Turbonomic was founded in 2009, and we now have more than 500 employees. We are a CNCF silver member, and at the upcoming KubeCon Europe in Barcelona in 2019, we are actually one of the platinum sponsors.

B

So if you happen to go there, I think it will be much easier to find us compared to the past, when we were silver sponsors. The whole idea of the software we provide is that software, not people, should manage your IT resources. Basically, we provide an abstraction and a common analytics engine on top of that abstraction to help you manage your IT resources in the most efficient and productive way and to improve your application performance.

B

It's just like what Kelsey Hightower put in his tweet: here's my application, run it for me, when and where I want it, securely. That's the end game, so our goal is to give you the experience of that end game instead of managing your data center manually all the time. Next slide, please. Container adoption is accelerating; what a surprise. I'm pretty sure the reason you joined is that you are at some stage of the maturity curve in adopting containers.

B

This is actually a survey our marketing team ran among more than 2,000 of our customers. The sample size of the survey is about 600 to 700 responses, and based on those responses, we expect to see 165% growth in the next 18 months. This is not surprising: container adoption has been accelerating over the past few years, and that's the same finding

B

we got from the survey. Among the respondents, 43% of their environments will be running containerized applications within 18 months, whereas we found that about 6% of their environments run containerized applications today, and 28% of those are mission-critical. That means application performance is really important to these people, because it directly impacts their business. Next slide, please. So where are they on the maturity curve?

B

We can see that people playing with different technologies are at different stages of adopting containers, starting with exploring how to containerize. You can find a more interesting trail map, the CNCF Trail Map, that shows these stages: you start by looking at how to containerize the application and which runtime to use, and you start to do some CI/CD, exploring how to package it, which image registry to use, and so on.

B

Later, when you have deployments at a certain scale, you start to think about which orchestration platform to use: Kubernetes versus Cloud Foundry versus Mesos, or Docker Swarm, for example. Based on some of the conversations I had with some of the biggest banks in the world, I want to focus on the 6% shown here.

B

A lot of people actually told me about a platform-first strategy, which is basically the new standard, especially for greenfield applications: every application has to run in a container. This is happening in parallel with several other decisions. Some people are looking at which container technology and which orchestrator to use, but at the same time, a lot of people are deciding that, going forward, all greenfield applications should be packaged in containers.

B

That tells you how promising the containerization trend we are seeing at this moment is. Next slide, please. So, back to the question: why containerization? Is it because Docker is cool? Docker is absolutely cool.

B

But Docker is not the inventor of containers; technologies like Solaris Zones and LXC appeared much earlier than Docker. Docker did bring a lot of convenience to people who want to use containers, and a good experience of using containers at different levels; it's much more user-friendly now. But the reason behind the containerization revolution is definitely not that it's so cool you have to adopt it. Is it because containers are lightweight? You already see the answer.

B

No. It's funny: a few years ago, when you started to educate people about containers, you always saw a comparison chart between containers and VMs, because those two units share a lot of similarities and also differ in various ways. And yes, with containers you don't have to install an operating system; that's the most visible way they are lighter than VMs. But is that why people want to use containers? The answer is no, because think about your huge data center.

B

You have a lot of resources there, and the overhead of an operating system is not big enough to motivate people to start adopting containers. So that's also not the main reason. Is it because containers are cost-efficient? As you can imagine, the answer is also no. Admittedly, you can run containers on bare metal, which may eliminate some of the virtualization tax you pay to virtualization vendors, but we also see a lot of people

B

run containers inside their virtual machines, which means they are still paying the virtualization tax. Whether this actually saves cost is another question that depends on how you architect and design the whole infrastructure, but cost is definitely not the reason either. So what's the reason? It's all about applications: how do I scale my application properly on the fly? How do I make my application more portable?

B

With the help of Kubernetes, which serves as an abstraction over different infrastructure options, you can do application deployment and lifecycle management on the same platform, regardless of where you run: on-prem, public cloud, or a mix across multiple clouds. You can run applications in a uniform way. You also want elasticity across the full stack: elasticity at the application-instance level, and elasticity at the cluster level, so that you can scale your resources up and down. Containerization

B

gives you the flexibility to have all these benefits, and once you restructure and refactor your application into microservices, putting them in containers is a no-brainer. That's the real motivation, the real reason behind the scenes why people start adopting containers. So what does the container itself do here? Basically, packaging and distribution. You can choose not to believe me, but in fact this is not coming from me. Next slide, please.

B

Yes, thank you. Sorry, I keep quoting Kelsey Hightower's tweets, but our vision is highly aligned with his. Back in December 2017, Kelsey Hightower tweeted that people will soon learn that containers only solve the software packaging and distribution problem; containers don't manage anything, they need to be managed. Let me do a quick summary of the previous few slides.

B

We can go to the next slide while I quickly go over the past few slides. People are adopting containers and deploying mission-critical applications into their containerized environments. But assuring the performance of these mission-critical containerized applications, so that they never violate any service level objective or service level agreement, is not easy. Containerization helps you with portability, scalability, and elasticity, but not with that.

B

Imagine that your application is now in smaller pieces, but a Java application is still a Java application. You still need to manage the JVM, and now you need to manage it in each microservice. Originally you may have had one VM that ran everything monolithically; now you have tons of containers, broken down into microservices so you can easily scale them up and deploy them in different places. On the other hand, managing the performance and efficiency of these containers becomes more complicated. Why?

B

Because you have much higher density and more layers to manage, as you can see from the layered stack here. That's the complexity. Having said that, upstream Kubernetes has also come up with autoscaling solutions for different layers of your stack: for example, vertical scaling of containers with the Vertical Pod Autoscaler, horizontal scaling of pods with the Horizontal Pod Autoscaler, and scaling your clusters up and down with the Cluster Autoscaler. With these different scaling options, how do you use them?

B

How do you actually manage them? They may be one of the solutions that help you manage this complexity. Without saying much more, I'm going to hand it over to Ming to continue and share all the experiments we have done using the autoscalers provided by Kubernetes. Ming, over to you.

C

Okay, thank you very much, Enlin. My name is Ming, and I'm a software engineer at Turbonomic. As Enlin just said, in a container world, assuring performance while maximizing resource utilization is a hard problem. Kubernetes does offer autoscaling solutions at different levels, but we feel there is limited information on how to make the best use of these solutions and apply them to real multi-tier microservice applications. That's the purpose of this webinar.

C

We did some experimentation, and we want to share with you our effort to answer all of the questions above and the lessons we learned along the way.

C

As many of you may know, Kubernetes offers autoscaling solutions at different levels. At the pod level, there is the HPA, the Horizontal Pod Autoscaler, which is the most mature and easiest solution to enable. The other solution at the pod level is the Vertical Pod Autoscaler (VPA), which is still evolving very quickly and is an add-on for Kubernetes; we'll talk about the pros and cons of these solutions in the next slide. At the node level, there is the Cluster Autoscaler, which is used to scale the nodes.

C

For example, it can decide when to scale out the nodes or scale them down, based on pending pods or node utilization in your cluster. So imagine you have all these options to choose from; how to use them in the best and most efficient way is definitely a non-trivial task. In this particular talk, we're going to focus on the Horizontal Pod Autoscaler, because it is the most mature solution, and I think most people are trying to use it and get familiar with it.
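To make the Cluster Autoscaler's node-level decision a little more concrete, here is a simplified Python sketch of its scale-up trigger and scale-down check. It assumes the documented default of the `--scale-down-utilization-threshold` flag (0.5) and treats a node's utilization as the larger of its CPU and memory *request* ratios. The real autoscaler also verifies that every pod on a candidate node can be rescheduled elsewhere, which this sketch deliberately leaves out; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Documented default of the Cluster Autoscaler --scale-down-utilization-threshold flag.
SCALE_DOWN_UTILIZATION_THRESHOLD = 0.5


@dataclass
class NodeUsage:
    """Summed pod *requests* on a node and the node's allocatable capacity (hypothetical type)."""
    requested_cpu_cores: float
    requested_mem_gib: float
    allocatable_cpu_cores: float
    allocatable_mem_gib: float


def node_utilization(node: NodeUsage) -> float:
    """Utilization as the larger of the CPU and memory request ratios."""
    cpu_ratio = node.requested_cpu_cores / node.allocatable_cpu_cores
    mem_ratio = node.requested_mem_gib / node.allocatable_mem_gib
    return max(cpu_ratio, mem_ratio)


def is_scale_down_candidate(node: NodeUsage,
                            threshold: float = SCALE_DOWN_UTILIZATION_THRESHOLD) -> bool:
    """Below the threshold a node is only a *candidate*; rescheduling checks are omitted."""
    return node_utilization(node) < threshold


def needs_scale_up(pending_unschedulable_pods: int) -> bool:
    """Scale-up is driven by pods that cannot be scheduled on any existing node."""
    return pending_unschedulable_pods > 0


if __name__ == "__main__":
    idle = NodeUsage(1.0, 4.0, 4.0, 16.0)       # 25% CPU, 25% memory requested
    busy_mem = NodeUsage(1.0, 12.0, 4.0, 16.0)  # 25% CPU, 75% memory requested
    print(is_scale_down_candidate(idle))        # True
    print(is_scale_down_candidate(busy_mem))    # False
```

Taking the maximum over resources means a node that is lightly used on CPU but heavily used on memory is not considered for removal.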

C

Just a quick word on the autoscaling solutions themselves; this is a very high-level overview of the HPA and VPA solutions. First, HPA: it is a native Kubernetes controller.

C

The Horizontal Pod Autoscaler periodically monitors metrics and decides whether to scale the number of replicas of your Deployment or ReplicationController (RC) up or down, using a threshold-based algorithm. The metrics being watched can include the usual CPU and memory, but it also supports, very nicely, custom metrics such as response time, transactions per second, and network traffic. These custom metrics can be supplied by third-party vendors and integrated nicely into the system, and external metrics are supported as well.
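The threshold-based algorithm mentioned above is documented for the Horizontal Pod Autoscaler as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change made while the ratio of current to desired value stays within a tolerance (0.1 by default). A minimal Python sketch of that rule for a single metric, clamped to the configured replica bounds; the function name and parameters are illustrative, not a Kubernetes API.

```python
import math

DEFAULT_TOLERANCE = 0.1  # default of --horizontal-pod-autoscaler-tolerance


def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = DEFAULT_TOLERANCE) -> int:
    """Sketch of the HPA scaling rule for a single metric.

    current_value is the per-pod average of the metric (for example CPU
    utilization as a percent of the request); target_value is its target.
    """
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        proposed = current_replicas
    else:
        # Multiply before dividing to keep integer inputs exact.
        proposed = math.ceil(current_replicas * current_value / target_value)
    # Respect the minReplicas / maxReplicas bounds from the HPA spec.
    return max(min_replicas, min(max_replicas, proposed))


if __name__ == "__main__":
    # 2 pods averaging 90% CPU against a 75% target: ceil(2 * 90 / 75) = ceil(2.4) = 3
    print(desired_replicas(2, 90, 75))
    # 4 pods at 78% CPU: ratio 1.04 is inside the 10% tolerance, so it stays at 4
    print(desired_replicas(4, 78, 75))
```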

C

On the VPA side, as I said, it is a fast-evolving solution in the community. The idea, again, is basically to monitor metrics, namely CPU and memory usage, and use those values to update and resize the CPU and memory requests for each pod in the deployment. Of course, the solution has some downsides for now.

C

At this moment, VPA is disruptive: in order to update the resource requests, a pod needs to be killed and restarted. I believe there are efforts in the community to enable in-place resource updates for pods, which could be very handy once implemented.

C

In order to evaluate the Kubernetes autoscaling solutions, we created a non-trivial testbed, because we believe that's the only way to verify the solutions and test how they currently perform. The application we created in-house is a real microservice-based application with multiple tiers: an HTTP front-end API service backed by multiple gRPC services.

C

The whole application is a Twitter-like application: basically it has multiple users who tweet, make friends, and so on, and everything is backed by a Cassandra database.

C

The deployment of the testbed is outlined in this slide, with the services we just talked about. Of course, we installed Istio and injected its sidecar proxy into the pods. Istio is mainly used to load-balance both the HTTP and gRPC traffic, and also for metrics collection and reporting. Cassandra is deployed as a StatefulSet.

C

This is the flow of our testbed, showing how we put everything together. The blue box illustrates the Twitter app we just talked about, and the orange box is the Istio service mesh deployment, which provides all the functionality we talked about and also, very importantly, visibility through the Prometheus dashboards. The green boxes are the metrics servers. These include the default metrics server, which collects CPU and memory usage, and also the custom metrics server,

C

which is very important for our experimentation because it collects custom metrics like response time and transactions. These metrics servers are accessed through the Kubernetes API aggregation layer, which basically unifies the endpoints. Finally, user simulation is achieved by launching a Locust cluster, which we use to simulate user workflows.

C

You can specify different patterns and different numbers of users; we will present our results a bit later. So with this, we have our testbed in place. The next thing is how to design our tests: how do we enable these solutions, and what steps should we go through?

C

Naturally, we thought the easiest step would be CPU-based HPA, because it is readily available. All you need is to install the metrics server, which supplies the needed metrics, and apply a few YAML files to specify the parameters. The difficult part for us is this: as many of you may have experienced when fine-tuning an application, you don't really know exactly which tier will be the bottleneck, which particular deployment to scale, or how many replicas to scale to.

C

And how do you specify your threshold? Of course, we didn't know in the first place either; we just made a guess. We thought a CPU target of 75% was reasonable: any higher number would probably cause performance degradation, and any lower number would probably cause resource wastage. We also didn't know what the maximum number of replicas should be, so we just chose 10 as a safe number to experiment with, and we applied this to most of our Twitter services.
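The CPU-based policy just described (a 75% CPU target and at most 10 replicas) maps onto the `autoscaling/v1` HorizontalPodAutoscaler fields `minReplicas`, `maxReplicas`, and `targetCPUUtilizationPercentage`. The sketch below builds such manifests as Python dictionaries and prints them as JSON, which `kubectl apply -f` also accepts. The Deployment names `api` and `tweet` are hypothetical stand-ins for the testbed's services, a minimum of one replica is assumed, and the CPU target only works if the pods declare CPU requests.

```python
import json


def cpu_hpa(deployment: str, target_cpu_percent: int = 75,
            min_replicas: int = 1, max_replicas: int = 10) -> dict:
    """Build an autoscaling/v1 HPA that scales a Deployment on average CPU utilization."""
    return {
        "apiVersion": "autoscaling/v1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": deployment,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Percent of the pods' CPU *request*; pods must declare CPU requests.
            "targetCPUUtilizationPercentage": target_cpu_percent,
        },
    }


if __name__ == "__main__":
    # Hypothetical deployment names standing in for the testbed's services.
    for name in ("api", "tweet"):
        print(json.dumps(cpu_hpa(name), indent=2))
```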

C

That includes the API, tweet, and user services. We ran the experiments, collected the metrics, and showed them in Grafana in a very nice way. As you can see here, we launched the workload from a Locust cluster, simulating 300 users making requests to the Twitter service at the same time. The workload starts at 7 o'clock and ends at 8 o'clock.

C

It is a constant workload. As we can see from the graphs, with all the metrics collected by Prometheus and shown on the dashboard, when the load suddenly arrives at 7 o'clock, the response time spikes to a very high number, which is not shown in the graph here, and the CPU utilization of the pods also starts to increase. I need to mention that initially every service has only one pod, as you can see here for API, friend, tweet, and user.

C

With the onset of the workload, we can see the CPU-based HPA kick in, and the number of pods for the API service and tweet service gradually spins up. It takes about 10 minutes for the whole thing to stabilize and for the response time to come down to an acceptable level, around 200 to 300 milliseconds. With HPA based on CPU utilization, the number of tweet and API pods stabilizes at 4.

C

This is encouraging: when the load stabilized, CPU-based HPA was able to maintain an average CPU utilization across the pods at around the target threshold of 75%.

C

The downside of this approach is that, with the onset of the workload, pods spin up a little slowly: it takes almost 10 minutes for the whole system to respond to the workload. This is somewhat undesirable, but it is explainable by the algorithm. The threshold is 75% and CPU utilization only goes up to 100%, so initially the number of pods can only increase by one at a time. That explains how the whole system behaves with the CPU-based approach.
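The slow ramp follows directly from the HPA formula. The toy Python model below assumes that per-pod CPU utilization is capped at 100% of the request (for example, when limits equal requests) and that the load needs a hypothetical 3 pods' worth of CPU. Each evaluation can then only propose ceil(n × 100 / 75) replicas, so the count climbs 1 → 2 → 3 → 4 one step at a time. The model ignores the HPA sync interval, pod startup time, and scale-up rate limits, all of which stretch the real ramp further.

```python
import math


def next_replicas(replicas: int, utilization: float, target: float,
                  max_replicas: int, tolerance: float = 0.1) -> int:
    """One HPA evaluation: ceil(replicas * utilization / target), within tolerance and bounds."""
    if abs(utilization / target - 1.0) <= tolerance:
        return replicas
    return max(1, min(max_replicas, math.ceil(replicas * utilization / target)))


def simulate_cpu_ramp(demand_pods: float, target_pct: float = 75.0,
                      max_replicas: int = 10, max_steps: int = 20) -> list:
    """Replica counts seen at successive HPA evaluations, starting from one pod.

    demand_pods is the total CPU the load needs, in units of one pod's CPU
    request (a hypothetical quantity). Per-pod utilization is capped at 100%.
    """
    replicas = 1
    history = [replicas]
    for _ in range(max_steps):
        utilization = min(100.0, 100.0 * demand_pods / replicas)
        new = next_replicas(replicas, utilization, target_pct, max_replicas)
        if new == replicas:
            break  # converged
        replicas = new
        history.append(replicas)
    return history


if __name__ == "__main__":
    print(simulate_cpu_ramp(3))   # [1, 2, 3, 4]
    print(simulate_cpu_ramp(20))  # [1, 2, 3, 4, 6, 8, 10], capped by maxReplicas
```

Because utilization cannot report how far demand exceeds capacity, the HPA only ever sees "100%", and each step adds roughly a third more pods instead of jumping straight to the needed count.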

C

So we asked ourselves: what can we do to improve this slow initial response to the workload increase? Naturally, we thought of monitoring response time, because we know that when the workload suddenly increases, the response time goes very high. If we use response time as a custom metric and base HPA on it, we might get a much faster response to the workload. So that's what we tried next.

C

But it turns out that enabling custom-metrics-based HPA requires a bit of work. First of all, we need to deploy the Istio metrics to collect the response time, that is, latency, and transactions.

C

You can collect whatever custom metrics you want. Then we installed the Kubernetes Prometheus Adapter, which I believe is the most popular open source solution for this particular use case, and we defined metrics collection rules for the adapter so that it is able to collect the metrics and expose them through custom metrics endpoints.
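A response-time metric exposed this way is typically an average derived from a latency histogram: the increase in the histogram's cumulative `_sum` counter divided by the increase in its `_count` counter over a window, which is what a PromQL expression of the form `rate(x_sum[1m]) / rate(x_count[1m])` computes. The Python sketch below does the same arithmetic on two scrapes. The Istio metric name in the comment (`istio_request_duration_seconds`) is an assumption based on Istio's Mixer-era telemetry, so check what your Istio version actually exports; counter resets and Prometheus's extrapolation are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HistogramScrape:
    """Cumulative counters of a latency histogram at one scrape.

    For example (metric name assumed, Istio-version dependent):
      istio_request_duration_seconds_sum   -> sum_seconds
      istio_request_duration_seconds_count -> count
    """
    sum_seconds: float
    count: int


def average_latency_ms(earlier: HistogramScrape,
                       later: HistogramScrape) -> Optional[float]:
    """Mean request latency between two scrapes, in milliseconds.

    Same idea as rate(x_sum[w]) / rate(x_count[w]) in PromQL.
    Returns None when no requests completed in the window.
    """
    delta_count = later.count - earlier.count
    if delta_count <= 0:
        return None  # no traffic (or a counter reset, which this sketch does not handle)
    delta_sum = later.sum_seconds - earlier.sum_seconds
    return 1000.0 * delta_sum / delta_count


if __name__ == "__main__":
    t0 = HistogramScrape(sum_seconds=10.0, count=100)
    t1 = HistogramScrape(sum_seconds=40.0, count=200)
    print(average_latency_ms(t0, t1))  # 300.0
```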

C

After enabling the custom metrics server, we created an HPA based on response time. This is only possible with the autoscaling/v2beta1 and later API versions, as you can see from the configuration file.

C

Basically, we specify the Object metric type and our average response time, which is collected by the Prometheus adapter. In fact, we did not know what maximum number of replicas to set initially, so we did a lot of tuning and experimentation. The next slide shows the result when we set the maximum number of replicas to 15 for the API and tweet services.
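For reference, an `autoscaling/v2beta1` HPA with a single `Object` metric has roughly the shape below, sketched as a Python dictionary printed as JSON. The metric name `avg_response_time_ms`, the Service named `api` that the metric is attached to, and the 300 ms target are hypothetical placeholders; the metric name must match whatever your Prometheus adapter rules actually expose. The maximum of 15 replicas mirrors the experiment described above. Newer clusters express the same policy with `autoscaling/v2`, whose field layout differs.

```python
import json


def response_time_hpa(deployment: str, service: str, metric_name: str,
                      target_value: str, max_replicas: int = 15) -> dict:
    """autoscaling/v2beta1 HPA driven by one Object metric attached to a Service."""
    return {
        "apiVersion": "autoscaling/v2beta1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-latency-hpa"},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                               "name": deployment},
            "minReplicas": 1,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Object",
                "object": {
                    # The Kubernetes object the custom metric describes.
                    "target": {"apiVersion": "v1", "kind": "Service", "name": service},
                    "metricName": metric_name,
                    # Quantity string; the HPA compares the metric value to this.
                    "targetValue": target_value,
                },
            }],
        },
    }


if __name__ == "__main__":
    # All names and the target value are hypothetical placeholders.
    print(json.dumps(response_time_hpa("api", "api", "avg_response_time_ms", "300"), indent=2))
```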

C

We see a very undesirable ping-pong effect: the number of replicas spins up to a large number very quickly, the response time then suddenly drops, which causes the pods to scale down again, and after that the response time climbs back to an undesirable level. This ping-pong effect is really undesirable.

C

We tuned the maximum number of replicas, trying values from 4 to 15, and the ping-pong effect was always the same.
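One way to see why a latency-only policy can oscillate is a toy model in which response time does not fall in proportion to added replicas. Here it is a hypothetical 2,000 ms whenever the service has fewer pods than it needs and 200 ms otherwise, against a 250 ms target and a maximum of 15 replicas. Feeding that into the HPA formula makes the replica count overshoot, drift down below the required count, and jump back to the maximum. The numbers are invented for illustration, and the model ignores the HPA's scale-down stabilization and timing, which slow this cycle down in a real cluster.

```python
import math


def latency_ms(replicas: int, needed: int, fast: int = 200, slow: int = 2000) -> int:
    """Toy step model: latency is low only once there is enough capacity."""
    return fast if replicas >= needed else slow


def simulate_latency_hpa(needed: int = 6, target_ms: int = 250,
                         max_replicas: int = 15, steps: int = 12) -> list:
    """Replica counts over successive HPA evaluations driven only by latency."""
    replicas = 1
    history = [replicas]
    for _ in range(steps):
        current = latency_ms(replicas, needed)
        if abs(current / target_ms - 1.0) > 0.1:  # outside the default 10% tolerance
            replicas = max(1, min(max_replicas,
                                  math.ceil(replicas * current / target_ms)))
        history.append(replicas)
    return history


if __name__ == "__main__":
    print(simulate_latency_hpa())
    # [1, 8, 7, 6, 5, 15, 12, 10, 8, 7, 6, 5, 15]
```

Once latency is back under target, every evaluation proposes about 80% of the current replicas, so the HPA keeps trimming pods until it undershoots and latency explodes again.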

C

This led us to think: what if we combine the CPU-based HPA and the response-time-based HPA? Combining them might let us not only maximize resource utilization but also respond very quickly to a sudden increase in load. So that is the next experiment we tried.
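When an HPA lists several metrics, Kubernetes computes a proposed replica count for each metric and uses the largest one. The Python sketch below applies that rule to a CPU proposal and a response-time proposal, with a maximum of 6 replicas; the sample readings and the 300 ms target are hypothetical. The CPU term keeps enough pods for steady-state utilization, while the latency term can push the count up quickly when a burst arrives.

```python
import math
from typing import Dict, Tuple


def propose(replicas: int, current: float, target: float, tolerance: float = 0.1) -> int:
    """Single-metric HPA proposal: ceil(replicas * current / target), with tolerance."""
    if abs(current / target - 1.0) <= tolerance:
        return replicas
    return math.ceil(replicas * current / target)


def combined_desired(replicas: int, readings: Dict[str, Tuple[float, float]],
                     min_replicas: int = 1, max_replicas: int = 6) -> int:
    """Take the largest proposal across metrics, then clamp to the replica bounds.

    readings maps a metric label to (current value, target value).
    """
    proposals = [propose(replicas, cur, tgt) for cur, tgt in readings.values()]
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    # Hypothetical burst: CPU saturated, latency far above a 300 ms target.
    burst = {"cpu_percent": (100.0, 75.0), "response_time_ms": (1500.0, 300.0)}
    print(combined_desired(1, burst))   # CPU proposes 2, latency proposes 5 -> 5
    # Hypothetical steady state: CPU near target, latency under target.
    steady = {"cpu_percent": (76.0, 75.0), "response_time_ms": (250.0, 300.0)}
    print(combined_desired(4, steady))  # CPU proposes 4 (within tolerance), latency 4 -> 4
```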

C

Because we had already set up the custom metrics server, it was just a matter of defining multiple metrics in the HPA configuration for it to monitor. Of course, we need to specify the target values, and we had to work those out.

C

It takes some time to understand the characteristics of the application, and we realized that we cannot specify an arbitrary maximum number of replicas.

C

Specifying a larger maximum number of replicas doesn't necessarily give a better result. The maximum should be based on the CPU-based HPA results: we knew from the previous experiment that when the whole system stabilized, the number of tweet and API replicas was around 4 to 6. That's why we specified a maximum of 6 here. We also took the response time from the stabilized environment and specified that as the threshold.

C

As you can see, through these iterative experiments we came to understand the characteristics of the application, so we are able to supply more reasonable numbers.
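Putting the two together, a combined policy in `autoscaling/v2beta1` lists a `Resource` metric for CPU and an `Object` metric for response time in the same `metrics` array, with `maxReplicas` taken from where the CPU-only run settled (6 here). As before, this is a sketch printed as JSON; the metric name, the Service name, and the response-time target are hypothetical placeholders, not the values used in the testbed.

```python
import json


def combined_hpa(deployment: str, service: str, metric_name: str,
                 response_time_target: str, cpu_target_percent: int = 75,
                 max_replicas: int = 6) -> dict:
    """autoscaling/v2beta1 HPA that scales on CPU utilization and a response-time metric."""
    return {
        "apiVersion": "autoscaling/v2beta1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-combined-hpa"},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment",
                               "name": deployment},
            "minReplicas": 1,
            # Upper bound informed by where the CPU-only experiment stabilized.
            "maxReplicas": max_replicas,
            "metrics": [
                {   # Average CPU utilization across the Deployment's pods.
                    "type": "Resource",
                    "resource": {"name": "cpu",
                                 "targetAverageUtilization": cpu_target_percent},
                },
                {   # Custom response-time metric served through the Prometheus adapter.
                    "type": "Object",
                    "object": {
                        "target": {"apiVersion": "v1", "kind": "Service", "name": service},
                        "metricName": metric_name,
                        "targetValue": response_time_target,
                    },
                },
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical names and target value.
    print(json.dumps(combined_hpa("tweet", "tweet", "avg_response_time_ms", "350"), indent=2))
```

Because the HPA acts on the largest per-metric proposal, the CPU metric sets a floor during steady load while the response-time metric handles bursts, and the tight `maxReplicas` keeps the latency term from overshooting.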

C

As you can imagine, with all of this implemented, we achieved the best results of the three experiments. First of all, the constant load starts at 17:00.

C

We see, of course, a sudden spike in response time, but because of the response-time-based HPA, the number of pods spins up very quickly. And because we didn't set a very large maximum number of replicas, it also stabilized at four, maximizing CPU utilization. We also see that the response time comes down much more quickly: within about three to four minutes it drops to around 300 to 400 milliseconds, which is within our predefined range.

C

These are very encouraging results. We then applied this particular configuration to different, varying workloads. When we changed the workload, basically by retuning the Locust cluster to change the number of incoming users, this combined CPU- and response-time-based approach achieved the expected results.

C

These are the iterative experiments we went through. Of course, we picked only a few of them.

C

We chose the experiments that best illustrate our point, because we actually ran many, many more rounds. It really is a very time-consuming exercise, covering everything from understanding the custom-metrics-based HPA settings and configuration, to tuning the number of replicas, to finding the bottleneck in multi-tier applications.

C

Now I want to talk about the takeaways from our experiments. As I just mentioned, the upstream Kubernetes HPA solution does work, but not before you spend a lot of time setting it up and tuning it. I've outlined many of the steps we went through, and every step is time-consuming. If you've never been through the whole exercise, it could be a bit daunting.

C

But after you have worked through it with a particular application, you get the basic idea, and it becomes much easier to continue tuning. Of course, it does require multiple experiments to determine the best set of configurations and thresholds. Overall, it is a threshold-based algorithm, and it's up to the user to find the best numbers to tune.

C

Another takeaway, as we've mentioned, is that assuring application performance is not an easy task, because what we see is only a symptom: slow response times, or dropping transaction throughput. We have many dimensions to explore. Should we use horizontal scaling, or is vertical scaling the better option? If we scale vertically, what capacity do we want to scale to? Do we have noisy neighbor issues?

C

Do we have visibility into the entire stack the application is running on? Is there any underlying infrastructure congestion we are not aware of? All of these combine to make it very difficult. And this is just one application; what about many, many microservices across your applications?

C

This is becoming a trend: many companies are decomposing their monolithic applications, or rewriting them from the ground up to be microservice-based. It could become very, very complicated, and implementing an HPA solution in an enterprise environment is going to be a very interesting exercise for everyone.

C

So that's pretty much what I wanted to talk about and share in this session. I wasn't sure if there were any questions.

B

Thanks, Ming. Actually, we received quite a few questions in the Q&A panel, and while Ming was talking I was typing answers there. Since we have enough time for Q&A, I will go over all the questions we received from the audience. But first, thank you so much, Ming, for sharing the experiments.

B

I hope people find this useful. Even though HPA sounds really promising, and it looks like you only need to configure min replicas, max replicas, and a target utilization, which seems easy enough, in reality it's not so easy. You have a lot of metrics to choose from, and which one you should actually scale the application on is really case by case for different types of applications: databases, web servers, machine learning applications.

B

Everything can be different, and please bear in mind that we are not claiming to have the best practice here. We are just sharing what we did with a Twitter-like web server, and how, even though the application is simple and HPA sounds simple, using HPA with this simple application is not at all trivial, as Ming said. The good news is that everything we used is open source.

B

You can find the testbed and all the YAML files and deploy the testbed yourself. You can deploy the Locust load generator the same way and play with it. You can also apply the same methodology to your own applications: use the same load simulator on your own application, play with HPA, and see whether it is really that easy to configure.

B

How do you configure min replicas, max replicas, and target utilization so that you achieve your SLO? So, for the questions: Josh, should I just read the questions aloud and share the answers with everybody?

A

I can read them to you, and that way people hear both the question and the answer. Let's start from the top. Anshel asks: is Cluster Autoscaler preferred over the autoscaling provided by cloud providers?

B

Yes. I already typed an answer while Mon was talking: in this talk we only covered HPA, not the Cluster Autoscaler. As I said, everything we are sharing is open source, and we will keep experimenting with VPA and the Cluster Autoscaler as well; hopefully we get another opportunity to speak and share more experience with those other scaling solutions. But on to Anshel's question, and thank you for asking it.

B

The Cluster Autoscaler is usually built on top of the native autoscaling provided by the cloud service providers. If you look at the Cluster Autoscaler for AWS, it is built on top of AWS Auto Scaling Groups (ASGs): the CloudFormation or Terraform files you already used have created the ASGs on AWS for you, and the Cluster Autoscaler leverages them. So, to your question, the Cluster Autoscaler uses the cloud provider's scaling groups rather than replacing them.

A

The second question, also from Anshel: are there best practices to follow when using the Cluster Autoscaler? We are currently using the Cluster Autoscaler, and some of the pods, such as Istio, get killed during scale-down.

B

This is possibly a generic problem whenever you scale on a one-dimensional threshold. As you can imagine, this is not a one-dimensional thing that you can specify on CPU alone. Even in the HPA examples we gave, we actually had to combine two dimensions.

B

I think the same thing applies to the CA, although we didn't have time to go over CA concepts. Suppose you scale your cluster up and down based only on CPU. The Kubernetes world is very complex: you may have different resources, and it may be a heterogeneous cluster where some nodes have GPU capabilities and some don't. A GPU node may happen to have very low CPU utilization, which meets the criteria to scale down your cluster.

B

Does that mean you should scale the cluster down? What if you actually want to run GPU workloads there, like machine learning jobs? Where would they go? These are the cases where you need a multi-dimensional system to help you decide, more comprehensively, which nodes are the best candidates for scale-down, if there is any opportunity at all, rather than relying on a single dimension.
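
The difference between a single-dimension and a multi-dimensional scale-down check can be illustrated with a hypothetical Python sketch. The `Node` type, the resource names, and the 50% threshold are assumptions for illustration only; the real Cluster Autoscaler applies its own rules, including whether the node's pods can be rescheduled elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Resource name -> sum of requests of the pods on this node.
    requested: dict = field(default_factory=dict)
    # Resource name -> allocatable capacity of the node.
    allocatable: dict = field(default_factory=dict)


def is_scale_down_candidate(node: Node, threshold: float = 0.5) -> bool:
    """A node qualifies only if every resource it offers is under-used."""
    for resource, capacity in node.allocatable.items():
        if capacity <= 0:
            continue
        if node.requested.get(resource, 0) / capacity >= threshold:
            return False
    return True


def is_scale_down_candidate_cpu_only(node: Node, threshold: float = 0.5) -> bool:
    """The single-dimension check that misses the busy GPU."""
    return node.requested.get("cpu", 0) / node.allocatable["cpu"] < threshold


gpu_node = Node("gpu-1",
                requested={"cpu": 0.8, "memory_gib": 4, "nvidia.com/gpu": 1},
                allocatable={"cpu": 8, "memory_gib": 32, "nvidia.com/gpu": 1})
print(is_scale_down_candidate_cpu_only(gpu_node))  # True: CPU is only 10% requested
print(is_scale_down_candidate(gpu_node))           # False: the GPU is fully requested
```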

A

Okay, the next question, from Patrick: does Helm support the autoscaling parameters right now, or do you need to use kubectl directly?

B

Yes, definitely. You can specify the min replicas, max replicas, target utilization, and everything else in your Helm template YAML, deploy it, and tweak these values.
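
As an illustration of what such a template parameterizes, here is a Python sketch that builds an `autoscaling/v2` HorizontalPodAutoscaler object from chart-style values. The value keys (`minReplicas`, `maxReplicas`, `targetCPUUtilizationPercentage`) are hypothetical names a chart might expose, and `autoscaling/v2` assumes a cluster on Kubernetes 1.23 or later; a real chart would express the same object in a templated YAML file.

```python
import json


def render_hpa(name: str, values: dict) -> dict:
    """Build an autoscaling/v2 HPA for a Deployment from hypothetical chart values."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": values["minReplicas"],
            "maxReplicas": values["maxReplicas"],
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": values["targetCPUUtilizationPercentage"],
                    },
                },
            }],
        },
    }


# Defaults as they might appear in a chart's values file, then a per-deploy override.
defaults = {"minReplicas": 2, "maxReplicas": 10, "targetCPUUtilizationPercentage": 60}
overrides = {"maxReplicas": 20}
manifest = render_hpa("frontend", {**defaults, **overrides})
print(json.dumps(manifest, indent=2))  # JSON manifests are accepted by `kubectl apply -f`
```

The override step mirrors what `helm install --set maxReplicas=20` would do for a chart that exposes a top-level `maxReplicas` key.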

A

Next question, from Anshel: for the HPA target utilization, is the utilization based on the requests or the limits of the pods?

C

Let me take this. Yes, these are all based on requests, and the same goes for VPA, which also works on requests, because in Kubernetes the request is your guaranteed resource. This is especially true for CPU: you set the CPU request, and the container runtime makes sure you get it. So yes, it's the request.
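
A small Python sketch of why requests are what matter: it computes CPU utilization the way the HPA's resource-metric calculation does, summing usage across pods and dividing by the summed requests, so limits never enter the result. The pod dictionaries and field names are made up for the example. The Kubernetes documentation also notes that the HPA takes no action on a resource metric when containers lack the corresponding request, which is why the sketch raises in that case.

```python
def cpu_utilization_percent(pods: list) -> int:
    """HPA-style CPU utilization: total usage over total requests.

    Limits are deliberately ignored; a pod without a CPU request makes the
    metric undefined, so this sketch raises instead of guessing.
    """
    total_usage = 0
    total_request = 0
    for pod in pods:
        request = pod.get("request_millicores")
        if not request:
            raise ValueError(f"pod {pod['name']} has no CPU request")
        total_usage += pod["usage_millicores"]
        total_request += request
    return total_usage * 100 // total_request


pods = [
    {"name": "web-1", "usage_millicores": 400, "request_millicores": 500, "limit_millicores": 2000},
    {"name": "web-2", "usage_millicores": 600, "request_millicores": 500, "limit_millicores": 2000},
]
print(cpu_utilization_percent(pods))  # 100: measured against the 500m requests, not the 2000m limits
```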

A

Thank you. One last question, again from Patrick: can a pod get evicted or moved because of vertical scaling? Is that configurable?

B

I think there's an ongoing effort to make VPA non-disruptive.

B

We can also share the effort being done upstream to make VPA less disruptive.

C

Eventually VPA is going to be non-disruptive, but for now I believe it is disruptive. Making it non-disruptive requires changes on the scheduler side, which need a lot of due diligence and work to make sure everything goes smoothly. I believe that will take a bit longer; exactly when it will be done, I'm not sure.
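
On the "is that configurable" part: the VPA object exposes an update mode. The Python sketch below builds a VerticalPodAutoscaler object (API group `autoscaling.k8s.io/v1`, provided by the VPA add-on) with `updatePolicy.updateMode`. As we understand the VPA documentation, `Off` only produces recommendations, `Initial` applies them only when pods are created, and `Recreate` and `Auto` may evict running pods to apply new requests. The helper names are hypothetical, and newer VPA releases may add modes not listed here.

```python
# Long-standing VPA update modes and whether they may evict running pods.
UPDATE_MODES = {"Off": False, "Initial": False, "Recreate": True, "Auto": True}


def render_vpa(name: str, update_mode: str = "Initial") -> dict:
    """Build a VerticalPodAutoscaler object for a Deployment (illustrative sketch)."""
    if update_mode not in UPDATE_MODES:
        raise ValueError(f"unsupported update mode: {update_mode}")
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "updatePolicy": {"updateMode": update_mode},
        },
    }


def may_evict(vpa: dict) -> bool:
    """True if this VPA is allowed to evict running pods to resize them."""
    return UPDATE_MODES[vpa["spec"]["updatePolicy"]["updateMode"]]


print(may_evict(render_vpa("cassandra", "Initial")))  # False
print(may_evict(render_vpa("cassandra", "Auto")))     # True
```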

A

Yes, we have some new questions. Are you ready for them? The first question, from Tommy: have you experienced issues with horizontal pod autoscaling in production on Kubernetes, particularly with multiple metrics?

C

The question is whether we experienced issues with production-grade HPA in Kubernetes, especially with multiple metrics. I'm not sure what you mean by production grade. The HPA implementation we used is simply the one from the community, and as far as our experiments go, everything works as designed. In terms of multiple metrics, you can supply any metrics that HPA supports: the default metrics server, your own custom metrics, and there are also many third-party metrics providers.

C

You can also, you know, combine several metrics, for example CPU together with QPS, and bind them together, as I showed in our last experiment. So I don't see any issues; at least I didn't see any in our experiments.
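
For multiple metrics, the Kubernetes documentation describes the HPA computing a replica proposal for each metric and using the largest one. A minimal Python sketch of that combination, leaving out the tolerance band and stabilization behavior, with the CPU and QPS figures invented for the example:

```python
import math


def desired_from_metrics(current_replicas: int, metrics: dict,
                         min_replicas: int, max_replicas: int) -> int:
    """metrics maps a metric name to (current average value, target value)."""
    proposals = {
        name: math.ceil(current_replicas * current / target)
        for name, (current, target) in metrics.items()
    }
    # The largest proposal wins, so the most constrained metric drives scaling.
    return max(min_replicas, min(max_replicas, max(proposals.values())))


metrics = {
    "cpu_utilization": (30, 60),     # CPU alone would suggest 2 replicas
    "requests_per_pod": (150, 100),  # QPS per pod suggests 6 replicas
}
print(desired_from_metrics(4, metrics, min_replicas=2, max_replicas=10))  # 6
```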

A

Okay, I'm going to answer his follow-up question myself, because he needs a link there.

A

So then we're going to move on to Alex's question regarding the Vertical Pod Autoscaler: once the problem of disruption is solved, the pod is no longer... I'm not quite following this question.

B

So I think, okay.

A

I'm just going to go on to the next one. Oh, yeah, go ahead.

B

I think Alex wants to know why we need both VPA and HPA, if I understand correctly: since HPA seems to work fine, why not just use HPA instead of VPA, even assuming VPA becomes non-disruptive? I hope I understood you correctly, Alex.

B

This is a great question. Think of eating an Oreo cookie: do you really need the biscuit every time you want another one, or do you just want the cream? For me, I just want the cream. So if you can give me more cream without the biscuit, that's exactly what I want.

B

VPA versus HPA is a similar concept. If I just want more CPU, give me more CPU; if I just want more memory, give me more memory. If CPU congestion is what degrades my performance, making a whole new replica also duplicates the memory allocation, which wastes memory. That's throwing in another biscuit when I only wanted the cream.

B

I just want the cream, right? So there are certain use cases for each. If you look at the documentation, I think you'll see that most VPA use cases are database applications, while HPA is for other applications, such as web servers.
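
The cookie-and-cream analogy comes down to simple arithmetic. In this Python sketch (the pod sizes are invented), a CPU-bound pod that needs twice the CPU is handled either by adding a replica, which also doubles the memory request, or by vertically resizing only its CPU request:

```python
def footprint(pod_requests: dict, replicas: int) -> dict:
    """Total requests reserved by `replicas` identical pods."""
    return {resource: amount * replicas for resource, amount in pod_requests.items()}


def resize(pod_requests: dict, resource: str, factor: int) -> dict:
    """Vertical scaling: grow a single resource request, leave the others alone."""
    resized = dict(pod_requests)
    resized[resource] = resized[resource] * factor
    return resized


pod = {"cpu_millicores": 500, "memory_mib": 1024}

horizontal = footprint(pod, replicas=2)
vertical = footprint(resize(pod, "cpu_millicores", 2), replicas=1)
print(horizontal)  # {'cpu_millicores': 1000, 'memory_mib': 2048}
print(vertical)    # {'cpu_millicores': 1000, 'memory_mib': 1024}
```

Both options deliver 1000m of CPU, but the horizontal one also reserves an extra 1024 MiB of memory the workload did not need.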

C

VPA, I guess, is designed primarily for stateful pods, while HPA is for stateless pods, where you don't care if one gets killed because you can spin up another one somewhere else.

C

Examples would be mission-critical, long-running services, such as monoliths. Another case: you can run Spark executors in a Kubernetes cluster, initially give them a very small amount of resources, and when the load comes, decide to scale them vertically. The same goes for scientific computing, for example MPI applications, or machine learning applications, where you don't want the executors to be killed: you want them to stay, but you want them to scale up and down.

C

Based on the load. I hope that's helpful.

A

Thank you very much. There was also a request in there for a link to the source code again, so if you could just paste that into the Q&A.

A

Kenneth is asking for a link to the source code, so if you can just go ahead and paste that in. Yeah.

B

Someone actually already copied and pasted the test bed link for Kenneth.

A

And that looks like it, yeah.

B

Josh, I just noticed that Tommy has a question related to ActiveMQ queues. I see you also sent a link about configuring autoscaling on custom metrics. Yeah.

B

That's the complicated part. Based on my experience, queue management is usually complicated, because a congested queue does not by itself mean the queue is the cause of the problem. For example, Kafka is similar to ActiveMQ: in Kafka you have different topics that belong to different consumers. If you see that the queue is very congested, that usually means there is a problem on the consumer side: it is not processing the tasks in the queue fast enough.

B

So when it comes to performance and you want to apply HPA to a queue, such as an ActiveMQ messaging application, you need to be extra careful, because usually the problem is on the consumer side: look for a bottleneck there that slows down the processing of queue messages. We are happy to discuss this with you further. I hope that helps.
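
One way to look at the consumer side, sketched in Python with hypothetical rates: estimate how many consumers are needed to keep up with the producers, and notice when adding replicas cannot help. The optional partition cap is an assumption that models Kafka-style consumer groups, where each partition is read by at most one consumer in the group; leave it as `None` if it does not apply to your broker.

```python
import math
from typing import Optional, Tuple


def consumers_needed(produce_rate: float, per_consumer_rate: float,
                     partitions: Optional[int] = None) -> Tuple[int, bool]:
    """Return (useful consumer count, whether scaling out alone is not enough).

    produce_rate and per_consumer_rate are in messages per second.
    """
    needed = math.ceil(produce_rate / per_consumer_rate)
    if partitions is None:
        return needed, False
    # In a Kafka consumer group each partition goes to at most one consumer,
    # so consumers beyond the partition count would sit idle.
    return min(needed, partitions), needed > partitions


print(consumers_needed(300, 100, partitions=8))   # (3, False)
print(consumers_needed(1200, 100, partitions=8))  # (8, True): fix the consumer or add partitions
```

When the second value is `True`, more consumer pods will not drain the backlog; the per-consumer processing rate, which is the consumer-side bottleneck described above, or the partitioning has to change.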

A

Okay, I think that's all the questions we have; I don't see any others in there. So thank you very much, Ming and Inland, for this presentation.

B

Thanks, everybody, for attending.

A

Thank you, everybody, for joining us today. The recording and the slides will be online later today, and we look forward to seeing everybody at a future CNCF webinar; there is one later this week, in fact. We're happy to share lots of information and technical expertise with you, and thanks again, Ming and Inland, for sharing yours.

B

You're welcome. Happy Tuesday, bye.