From YouTube: Knative tech talk - Build and Autoscaling
Description
This tech talk gives a brief introduction to the Build and Autoscaling projects in Knative.
Build is presented by Jason Hall from Google.
Autoscaling is presented by Joe Burnett, also from Google.
Jason Hall: The model that Knative Build uses is largely based on Google Cloud Build's model, mostly for historical reasons: that's what I had worked on since 2015 and at the time we founded Knative. That model is also somewhat similar to CircleCI's containerized build process, if you've used that, but they've all diverged since then. Build has received significant contributions from people at Google, Pivotal, Red Hat, and many others. It is a team effort, and we wouldn't have been able to do it without everybody.
Build's resource model is fairly light and fairly simple, especially compared to Serving and Eventing. We have three custom resources. One is Build: a Build optionally specifies some source, like a git repo to build from, and the steps to take on that source. Steps are required; they are steps to run in order, in containers, on the cluster. When the build controller sees a Build, it starts a pod, watches that pod, and then reports the status back through the Build's status. Build logs are exposed from the underlying pod.
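To make that concrete, here is a minimal sketch of a Build (the field layout follows the knative/build v1alpha1 API; the repo URL and step images are placeholders, not taken from the talk):

  apiVersion: build.knative.dev/v1alpha1
  kind: Build
  metadata:
    name: example-build
  spec:
    source:
      git:
        url: https://github.com/example/app.git   # optional source to fetch
        revision: master
    steps:                                         # required: run in order, each in its own container
    - name: test
      image: golang
      command: ["go"]
      args: ["test", "./..."]
    - name: build-and-push
      image: gcr.io/kaniko-project/executor
      args: ["--destination=gcr.io/example/app"]
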
So if your build, say a Maven build or something, emits logs, those will be available in the underlying pod (asterisk, mostly, for something we will talk about later). In addition to Build, we have another resource called BuildTemplate, which is basically a shareable, reusable, parameterized build process. The template specifies some steps and says: I will run steps A, B, C, where each of them can be parameterized in some way, and it is then instantiated with a Build. When you create the Build, you say: go and instantiate this template, filling in these parameters with these arguments.
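A hedged sketch of that template-and-instantiation relationship (shapes follow the knative/build v1alpha1 API; this is a simplified stand-in, not the published Kaniko template):

  apiVersion: build.knative.dev/v1alpha1
  kind: BuildTemplate
  metadata:
    name: kaniko
  spec:
    parameters:
    - name: IMAGE
      description: Where to push the built image
    steps:
    - name: build-and-push
      image: gcr.io/kaniko-project/executor
      args: ["--destination=${IMAGE}"]           # ${IMAGE} is filled in per Build
  ---
  apiVersion: build.knative.dev/v1alpha1
  kind: Build
  metadata:
    name: app-build
  spec:
    source:
      git:
        url: https://github.com/example/app.git  # placeholder
        revision: master
    template:
      name: kaniko
      arguments:
      - name: IMAGE
        value: gcr.io/example/app
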
There's a library of reusable build templates, the Knative build-templates repo. We have templates for Buildah, Kaniko, Bazel, BuildKit, and maybe half a dozen others. Related to BuildTemplate is ClusterBuildTemplate, which is just a cluster-scoped version of a BuildTemplate; it can be referenced from any namespace. So vanilla BuildTemplates are namespaced; ClusterBuildTemplates are across the whole cluster.
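On the consuming side, the only difference is the template kind; a sketch (names and namespace are placeholders):

  apiVersion: build.knative.dev/v1alpha1
  kind: Build
  metadata:
    name: uses-cluster-template
    namespace: team-a               # any namespace can reference the cluster-scoped template
  spec:
    template:
      kind: ClusterBuildTemplate    # instead of the default, namespaced BuildTemplate
      name: kaniko
      arguments:
      - name: IMAGE
        value: gcr.io/example/team-a-app
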
The intention there was for an operator with sufficient permissions to install a template across the whole cluster, and for everybody to use that same build template across the cluster. As I mentioned before, the build lifecycle is: when you create a Build resource, the build controller validates it using the admission webhook. It does a couple of simple things, like: did you specify steps? If you referenced a template, is that reference valid? Things like that.
The build controller translates that request into a pod. Basically, the steps you specify become init containers in the pod. If you specify a source, we prepend a container to that list of containers that knows how to fetch the source. We do a lot of crazy stuff to make credentials work.
You can specify git credentials that are SSH credentials, you can specify username-and-password git credentials, and you can specify Docker username-and-password credentials, which authorize requests to push images at the end of the build to a private registry, or to pull from a private registry. So the controller sees the Build, translates it into a pod, starts the pod in the same namespace, and watches that pod for updates as it progresses.
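The credential wiring roughly looks like this (a hedged sketch following knative/build's auth conventions: an annotated Secret attached to a ServiceAccount that the Build runs as; names and values are placeholders):

  apiVersion: v1
  kind: Secret
  metadata:
    name: github-basic-auth
    annotations:
      build.knative.dev/git-0: https://github.com   # which host this credential is for
  type: kubernetes.io/basic-auth
  stringData:
    username: my-user          # placeholder
    password: my-token         # placeholder
  ---
  apiVersion: v1
  kind: ServiceAccount
  metadata:
    name: build-bot
  secrets:
  - name: github-basic-auth
  ---
  apiVersion: build.knative.dev/v1alpha1
  kind: Build
  metadata:
    name: private-repo-build
  spec:
    serviceAccountName: build-bot   # the build pod uses these credentials to fetch the source
    source:
      git:
        url: https://github.com/example/private-repo.git
        revision: master
    steps:
    - name: list
      image: ubuntu
      command: ["ls"]
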
By way of illustration, this is how it's been used in Serving. A Serving Configuration can specify a build in its spec. That build can specify the source you want to build and how to build it; in this case, it's using a build template called Kaniko. The idea here is that the details of how Kaniko works and what it does are entirely hidden from the deployer, from the user. They don't really care, and they don't have to care, how Kaniko works.
As long as that template is installed in the namespace, or a ClusterBuildTemplate is installed on the cluster, they just have to say: use Kaniko, I don't care how, and push this image, my image. And then in the Configuration, the revision template says: use that same image (asterisk, which we'll talk about a little more later). When you create that Revision, the revision controller will first start the build and watch the build status while the build is ongoing.
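Roughly what that looked like in the pre-v1beta1 Serving API (a hedged sketch of the old v1alpha1 Configuration shape; names and images are placeholders):

  apiVersion: serving.knative.dev/v1alpha1
  kind: Configuration
  metadata:
    name: my-app
  spec:
    build:                        # the just-in-time build, run before the revision starts
      source:
        git:
          url: https://github.com/example/app.git
          revision: master
      template:
        name: kaniko              # how the image gets built is hidden behind the template
        arguments:
        - name: IMAGE
          value: gcr.io/example/app
    revisionTemplate:
      spec:
        container:
          image: gcr.io/example/app   # the same image the build pushes
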
So I think that meets the basic needs of a just-in-time build during a Serving deployment. Builds give you a sequential list of steps defined as containers. They let you build images; they actually let you do anything. If you want to specify whatever steps you want, steps that run unit tests or container image scans or update GitHub or whatever, you can do that. But primarily people just use them for building container images that are then run in that downstream Revision.
"Build" is a bit of a misnomer; builds can really do anything. They can run tests, for instance. They barely type their inputs: you can tell a Build that it's building a git repo or a Google Cloud Storage object, but other than that it doesn't know anything about what that means, and it doesn't report anything about its output. So it doesn't say "I built this image, with this digest, specifically"; it just says "I succeeded in building the image you asked me to build."
The distinction there is that it won't tell you exactly what it built; it will just say: you told me to build X, and I did it. That's a bit of a gap if it were to do anything else: if it was running unit tests, it wouldn't be able to tell you any sort of structured information about what those unit tests were and which ones passed or failed. So that's a bit limiting.
As an implementation detail, translating the steps into init containers in the pod made persistent logging very complicated. It turns out that init containers aren't great in Kubernetes: in some cases init container logs are dropped before all of the init containers are done running, which can be problematic, and even if they persist until the end of the pod, they're not persisted for long after the pod. And because the init containers run in serial before the pod's regular containers, there's no way to specify a logging sidecar, for instance, to collect those logs and persist them somewhere else. So you really only have a split second to check those logs before they disappear forever into the ether. There has also never been any automatic triggering of builds, or automatic deployment process in general, for Serving: builds must either be manually started, or started as part of a manually started Serving deployment.
The Knative Build Pipeline experiment had Tasks and TaskRuns: Tasks are like BuildTemplates, TaskRuns are like Builds. And then we created another layer of resources on top of that, called Pipeline and PipelineRun. Pipelines define many Tasks to run, possibly concurrently, with typed inputs and outputs passed between them, and PipelineRuns are the executing instantiations of a Pipeline, and so on.
At the end of the day, TaskRuns, much like Builds, start and watch pods. At first, TaskRuns created Builds, which started pods, and updates were bubbled back up; we eventually removed the Build intermediary there. PipelineResource is another resource that the Build Pipeline experiment added, which was there to be able to type the inputs and outputs. So a thing says: I rely on a git repo, and that git repo is now a PipelineResource.
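A hedged sketch of those resources in the Tekton v1alpha1 shape (a typed git input declared by a Task and bound by a TaskRun; URLs and images are placeholders):

  apiVersion: tekton.dev/v1alpha1
  kind: PipelineResource
  metadata:
    name: app-source
  spec:
    type: git                     # the typed input: a git repo
    params:
    - name: url
      value: https://github.com/example/app.git
  ---
  apiVersion: tekton.dev/v1alpha1
  kind: Task
  metadata:
    name: run-tests
  spec:
    inputs:
      resources:
      - name: source
        type: git                 # the Task declares what kind of input it expects
    steps:
    - name: test
      image: golang
      command: ["go"]
      args: ["test", "./..."]
  ---
  apiVersion: tekton.dev/v1alpha1
  kind: TaskRun
  metadata:
    name: run-tests-once
  spec:
    taskRef:
      name: run-tests
    inputs:
      resources:
      - name: source
        resourceRef:
          name: app-source        # bind the declared input to a concrete resource
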
(There are only so many words.) Yeah, it was a success. I think people were really happy with the level of expressiveness and flexibility that the Build Pipeline project gave them; the ability to report inputs and outputs like that was pretty compelling. We dropped our dependency on Knative Build and started pods directly.
We dropped the init containers for better persistent logging, which turned out to be really useful for usability. It was so successful that we moved that project out of Knative into another project called Tekton, which was donated to the Continuous Delivery Foundation and moved out to its own GitHub org. So the build-pipeline repo is no longer there; it just redirects to tektoncd/pipeline. Similar to how we have a knative/build-templates repo, there is a tektoncd/catalog repo.
But at the end of this experiment, and the success of Build Pipeline and Tekton Pipelines, we were faced with an uncomfortable situation. Knative Build still exists today; Tekton Pipelines exists today. They largely do the same thing, but they don't share much, if any, code. They share a little bit of knative/pkg code, they share some concepts, and they share some copied, forked code, but mostly they're separate efforts. Log persistence in Knative Build is still hard.
We still don't have typed inputs or typed outputs in Knative Build, and Tekton has all of these things. Aside from duplicated effort, this also led to a lot of user confusion. Users show up at Knative and say: I want to use Build. They use it for a little while, and then they say: this isn't really solving what I need. Someone eventually says: you should try Tekton. And they say: Tekton looks very similar to this.
Why do you have both? Why do I need to figure out how to use both, or which one of them to use? The Build working group spent a couple of months trying to resolve this split, trying to think of a good way to resolve it. We thought about having the Knative Build components depend on a Tekton installation, whether that's one installed when you installed Build, or one that we required you to have installed already. Both of those have downsides, pretty serious ones.
Operators shouldn't have to manage a matrix of compatible versions of Knative Build and Tekton Pipelines. We don't want the default Knative installation to require a Tekton installation, and we don't want the default installation of Knative to install Tekton for you; if so, which version? It's a gigantic headache. Another option was that Tekton would produce a library for creating and watching pods that Build would consume.
That also has some overhead, as far as who is responsible for adding features to that library. At the same time, we were going through this existential crisis of: what even is Build? What is Build for? Historically, like I said, Build was for just-in-time builds as part of a deployment process.
That's great, and it's a really good getting-started user workflow, but where do tests happen in that world? Where do more complicated things, integration tests especially, happen in that world? It's not a very clean story in that case, and really what we want, and what users want, is CI/CD. That is the best practice we should be pushing on our users: don't do all of this work as part of a deployment; do all of the work, and then, if it's successful, deploy. So that was a lot of the overarching discussion. We were having tactical discussions about how to resolve this split technically, and strategic discussions about what Build is good for. I think the result was: we should instead make CI/CD easy to adopt from the start for Knative, and see where we go from there.
Meanwhile, separately, on another thread, Knative Serving v1beta1 had a proposal to stop embedding builds in Serving, I think largely informed and spurred by the discussion we were having about whether just-in-time builds are a good thing or not. This is a slide from that proposal embedded into this slide, which makes it a meta-slide, but basically the result was: you know how I said that in a Serving Configuration you can specify a build? After v1beta1, you cannot.
This slide says: leave integration of build and serving to a separate orchestration concept, which vaguely could mean almost anything. It could mean Jenkins, it could mean Travis, it could mean a hosted service like Google Cloud Build, it could mean Tekton, it could mean a client. It could mean a lot of things, and likely it will mean a lot of things.
So where does this leave Build after v1beta1? Serving will no longer depend on Build. Something else could depend on it if it wants to, but I think the limitations of Build are such that, personally, I don't think it makes sense to take any new dependency on Build. And so Build is sort of this free-floating entity in Knative today.
Tekton is a separate, non-Knative project that is more mature and more powerful, and more people are contributing to it. It has all the shared history of Knative Build, plus more development since it left in February. And so just this Tuesday, Vincent Demeester from Red Hat proposed, in the build repo, to deprecate Knative Build in favor of Tekton Pipelines. This is a hot-off-the-presses proposal, so don't be surprised if this is the first you are hearing about it.
It was discussed some yesterday in the Build working group (the recording of which, I am realizing now, I haven't put up). It is going to be discussed at today's TOC meeting, in about an hour and 15 minutes, if you're curious and want to come to that, and it will be discussed at steering committee meetings and things like that as well.
Don't expect this to happen quickly if it gets ratified. If it gets accepted, it will still be a process of slowly deprecating it and responsibly guiding it into the ground. So what can you do now? You can go read this proposal and discuss it. If you really, really like Build, if you're using it today and really love it, and it solves a specific problem that nothing else would solve for you, please let us know.
That is why it is a proposal and not an edict: we would love your feedback. Is there anything that Tekton can't do that Build can, that you're using Build for? What is your ideal developer experience for deploying Knative apps? Is that something in the kn CLI? Is that something in CI/CD, or more? Please let us know; we welcome and need your feedback.
Joe Burnett: Can you see my screen? Yeah? Okay, great. So let me introduce myself. My name is Joe Burnett and my email is at Google. I'm the scaling working group lead. I've been working on App Engine and then Knative for a couple of years; well, on Knative since it started. I kind of just fell into autoscaling, because it was a thing to do when Knative was first getting started, and I've been working on it ever since. It's been a really fun and interesting problem space.
What I'm not going to touch on is the core algorithm of the Knative autoscaler and the metrics pipeline, because I think that deserves some focused discussion. So this is part one of two: Markus Thömmes with IBM (sorry, excuse me, with Red Hat; he moved companies) is going to talk on the core algorithm and the metrics pipeline in the next autoscaler talk in this series. Okay, so let's start.
Let's talk about autoscaling in general. Autoscaling is about balancing performance with cost. When I talk about performance, usually I'm talking about latency: how long does it take to give a response? Usually it means that the request is able to get the resources it needs to be processed as fast as possible, without being throttled or delayed. You can optimize for performance; that's just giving yourself enough elevation, provisioning enough resources that you can handle anything that is thrown at you.
Now, these are hard to mix, and serverless is about getting more of both of them. So what does it mean to be serverless? Well, serverless autoscaling needs to be very fast to scale up, it needs to be pretty fast to scale down, and when you're not using it, it shouldn't cost you anything; the resources should just be there when you need them. That's what kind of makes it this magical, sparkly serverless thing.
You just put your code out there, it has the resources it needs when it needs them, and you just pay for what you use. For any of you military helicopter enthusiasts, it's like flying nap-of-the-earth: having just the resources that you need, just in time. And it's a little bit harder than simpler autoscaling use cases.
So, routing and autoscaling are like two sides of the same coin: whenever you talk about one, you're going to talk about the other, and trade-offs in one affect the other. As far as routing goes, you could take a centralized approach, where you put all the knowledge in one place: you have the requests come to one place, it sees what requests are there, it makes precise autoscaling decisions, and it sends the work exactly where it needs to go.
E
An
example
of
this
is
the
open
whisk
project,
which
is
you
know,
does
exactly
this.
It's
very
efficient
at
low
and
rapidly
changing
load.
It's
good
for
a
function
framework.
However,
the
scale
is
limited
by
having
a
central
choke
point.
Everything
has
to
come
to
one
place
so
that
you
can
make
a
decision
in
there
and
then
send
it
in
to
where
the
work
is
gonna
happen.
Daler
end
of
the
spectrum
is
a
decentralized
routing
and
auto
scaling
system.
Now, this is kind of where Knative started, with Istio. The assumption is: okay, we don't have all the knowledge in one place. In particular, in mesh mode with Istio, requests just go directly to where they're needed; routing is very, very decentralized, so those scale limits are much, much higher. It's very efficient at high load; we know this from how we've used it inside of Google, and it's efficient.
It's best at relatively stable load, meaning if the requirements aren't changing dramatically, it works very well. And since you don't have everything right there to make an absolutely central autoscaling decision, you need some sort of feedback controller: you make a change, you observe the system, you make another change, you observe the system. So there's a feedback loop that you operate on, rather than the more precise scaling of a centralized system. Knative has some of both.
It started very much on the decentralized end of the spectrum, and we've been including some features of centralized routing as well. So what I'm going to do is walk you through how Knative scale-to-zero works, and you'll see how some of these parts fit, how we balance between decentralized mesh mode and a more centralized sort of queuing mechanism.
Before we do, I wanted to touch on the Knative entities, just to make sure everybody's familiar with them; this should just be a refresher. The Service: you give the Service a container, and you say, I want you to run this container in this way. Part of that is telling it if it has a concurrency limit, such as: this thing can only handle one request at a time. That's where you tell it; you say, hey, this is the limitation of my container.
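For reference, that knob is the containerConcurrency field on the Revision template; a sketch using the current v1 API shape for brevity (the image is a placeholder):

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: hello
  spec:
    template:
      spec:
        containerConcurrency: 1        # this container can only handle one request at a time
        containers:
        - image: gcr.io/example/hello  # placeholder
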
The Service creates some other entities: the Configuration, which is the canonical snapshot of code and configuration that you want to run, and an immutable list of Revisions, which are stamped out every time you change the Configuration. The Revisions are the things that actually run; they create pods by way of Deployments and actually run your code. And then the Route is the thing that references Revisions and says: okay, how do you want these requests to be delivered?
Do you want requests just to be sent to whichever Revision is most recent and healthy? Do you want them traffic-split? Do you want to do a blue-green deployment, and so on? These are the public Knative entities that I think you should be fairly familiar with.
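The traffic-splitting choices live in the Route, or in the Service's traffic block that generates it; a sketch in the current v1 shape (revision and image names are placeholders):

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: hello
  spec:
    template:
      spec:
        containers:
        - image: gcr.io/example/hello:v2   # placeholder
    traffic:
    - revisionName: hello-v1   # an older, pinned revision (hypothetical name)
      percent: 90
    - latestRevision: true     # whatever revision the template above produces
      percent: 10
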
For the purposes of decomposition, extensibility, and internal mechanics, we've created a couple of internal custom resources. One is called the PodAutoscaler, and I gave a KubeCon talk on this entity specifically, at the last KubeCon in Seattle in December last year, called "Scaling from 0 to Infinity", which maybe over-promised a bit; feel free to go take a look at that.
For the scale-to-zero mechanism and the way we handle endpoints, Victor has created the ServerlessService, SKS, which is a proper Kubernetes Service that is populated with whatever endpoints are currently capable of serving your code; that may be a running pod, or it may not be. You'll see how that works in just a minute. First, I want to show you what scale-to-zero looks like in Knative. When you first deploy a Revision, the Route creates some ingress stuff and creates the Istio routes.
The Revision also creates a ServerlessService, which creates a private service and a public service. The pods, when they become healthy, get put into the private set of endpoints, and the ServerlessService then copies those over into the public endpoints, so ingress can find those pods and send requests to them. The serving path is fairly straightforward. And because this is a mesh, if those pods want to send requests to other Revisions, they don't go back through ingress.
They just go directly to the other pod, because the service routing is programmed onto all of the proxies. All of these, the ingress and the pods, have sidecars, so they are all programmed by Istio to know where to send their requests. So this is the initial mode: very decentralized routing. Now, the first thing that we introduced is an autoscaler. There's this cluster-wide component, shown here, that observes the metrics coming from the pods.
It actually has a system to scrape Prometheus metrics from the pods and ask: okay, how many requests are you working on right now? It keeps track of the average concurrency, tries to maintain a desired target average concurrency, and scales up and down to achieve that. The actual mechanics of that are quite interesting, and that's what Markus is going to talk about when he comes to talk in this series.
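The target that the autoscaler tries to maintain is tunable per Revision through an annotation; a sketch (the annotation key is the real autoscaling.knative.dev one, the value and image are arbitrary placeholders):

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: hello
  spec:
    template:
      metadata:
        annotations:
          autoscaling.knative.dev/target: "10"   # aim for ~10 in-flight requests per pod on average
      spec:
        containers:
        - image: gcr.io/example/hello            # placeholder
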
Well, what happens when you stop receiving traffic? Oftentimes either you deploy another Revision and stop using this one, or maybe your Revision is something that is only called a couple of times a day. For whatever reason, you stop receiving traffic, and we need to come all the way back down to the ground. Well, the ServerlessService, this is the reason we have it: it takes a different set of endpoints, the activator's endpoints, and copies them into the public endpoints.
The activator knows how to look at the headers, figure out which Revision a request is meant for, and proxy the request to those pods when they become available. So the activator caches requests when there are no pods, when the deployment is scaled to zero, and waits for pods to be available before proxying the request. This is where we start to get a more centralized system: since all of the requests for the Revision come to this one place, it acts as a revision-level queue.
Each pod has a small queue on it, so that it can put work into a pending state while it's processing a request, but here we actually have a higher-level queue in the activator. To make the system work well, there are a couple of other things we do. Kubernetes will still be populating those private endpoints: whatever pods are available, it puts into the private endpoint set. The activator watches those private endpoints, so it is aware of how many pods are up and running.
It uses that to throttle the number of requests it sends to the downstream pods, so that it doesn't overwhelm them. This is important because you don't want to take all of the requests you've received and dump them on the first pod that shows up; you want to give that pod just as much work as it can handle, so that when new pods show up, you can give them work too. So in a way it is load-balanced a little bit.
It's not precise; it's not as though we're saying, okay, pod, you take this one, and you take this one, and you take this one, although that is something we have discussed in the Knative scaling working group. This is a really interesting area of discussion, how intelligent we can make this routing; we could do a lot with regard to using these resources efficiently from the activator.
The other additional piece of this system is that once the activator gets some requests, it gives a signal to the autoscaler to say: hey, I have this backlog of requests, and it is this big. The autoscaler can then make a decision to create as many pods as necessary. So, when you get your very first request at scale zero: suppose you're scaled to zero, and a request comes in and lands on the activator. It doesn't go anywhere, because the throttle is at zero, since there are no private endpoints.
Say I'm a request coming down: I get proxied to the activator, and I don't go anywhere because there are no private endpoints. A signal goes to the autoscaler that there's a backlog. The autoscaler says: oh well, I see that you have five requests, I'm going to make five pods, and it scales up. The pods get created, they get health-checked, and they get put into the private endpoints.
That is watched by the activator. The activator opens up the throttle and says: okay, five requests, you can go through. Those requests get proxied to the pods, the pods process them, they do whatever they do, and then metrics are returned to the autoscaler. If you continue to receive requests, maybe you get more on an ongoing basis, they still just kind of flow through here and keep going through.
Then, if you reach a certain threshold, once you're scaled above zero and you're up and running, getting some number of QPS, the ServerlessService starts copying the private endpoints into the public endpoints, taking the activator out of the serving path, and requests start going directly to the pods again. So you're kind of back where you started. That's the gist of how scale to and from zero works.
Now, there's more we can do here. The next thing I was going to talk about is what we're going to do next, so maybe I'll talk a little bit about some new work to provide a guaranteed burst capacity. Remember, when you first give the container to the Service, you say: hey, I want you to run this container, and I want you to do it in this way. One thing you could specify is, it would be possible to say:
I would also like you to make sure that at any given time you can handle an additional 1000 QPS. Since the ServerlessService has full control over where requests are routed, through this private and public endpoint mechanism, we can leave the activator in the path when we're below the desired target burst capacity, and accept a little bit of overhead from the extra hop through the activator, as well as a little bit more load on the activator from sending all the requests through it. So we mix in a little bit of centralized routing at low scale, or when we're below a certain threshold. I call this dual-mode routing, because we're switching back and forth between a decentralized and a centralized routing mechanism.
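In today's Knative Serving this idea surfaces as an annotation; a sketch of the knob as it exists now, rather than exactly what was proposed at the time of the talk (the value and image are placeholders):

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: hello
  spec:
    template:
      metadata:
        annotations:
          # keep the activator in the request path until the revision has roughly
          # this much spare concurrency to absorb a burst on its own
          autoscaling.knative.dev/targetBurstCapacity: "200"
      spec:
        containers:
        - image: gcr.io/example/hello   # placeholder
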
These are called serve mode and proxy mode. This can be very powerful. The ultimate goal, of course, is that you would just give a container to a Service, and a Revision would just run it. You would send it some requests and it would serve them, and the latency would be pretty stable; if you send it nothing, it just goes to zero; if you send it a whole lot, it scales up as high as you need it to, as long as you have cluster capacity.
The goal is for this thing to be kind of magical. That's the serverless autoscaling aspect of it. So that's the main piece I wanted to show in this tech talk; it sets up the environment of our autoscaling system. As I mentioned, target burst capacity is something Victor is working on, which is really cool. It generalizes scale to zero: it's really not just scale to zero, it's whenever you're below this threshold, use the activator as a buffer.
Another thing we're working on is cold-start latency. Greg Haynes from IBM has been spending a lot of time looking at why it takes so long for the first request to get serviced, which is about six seconds; that's much too slow. It's not really the very fast scale-up that we were promising, and there are a couple of reasons for it.
The kubelet and the container runtime interface, whatever you happen to be running, usually contribute a large chunk of latency just in starting the pod. Readiness probes take some time to get going; I think the smallest interval you can configure is one second, and Envoy has to start before you can send readiness probes through to your container. And then there's network programming: for example, the pod that starts may not be on the same node, so you have to wait for everybody to know how to get from here to there.
CPU scaling is nice because you don't really need to configure it; it's just a percentage, and it's one of the easiest things to use. But it doesn't scale to zero right now, and Knative needs to inject some of its knowledge about requests to enable scale to zero, because you'll never really use exactly zero CPU, and you can't get off the ground from zero. So you need Knative networking's awareness of requests in flight in order to scale to and from zero there.
So anyway, that's something that will be coming up, and there's a bunch of other stuff we're working on in the autoscaling space. There's a very lively discussion each week in the scaling working group, every Wednesday at 9:30 a.m. PST; Slack is always a good place to ask questions; and we've recently landed a 2019 roadmap outlining some areas we want to invest in. So if you want to know more, come ask and play around with it; if you want to work on it, come check out the roadmap and see what issues are available.
One of the engineers, Jacques Chester, has implemented a simulator which is a lot of fun to play with, and it's really powerful for understanding how the algorithm works; I think maybe that will be of more interest after Markus's talk next time. That's all I wanted to present. Do you have any questions?
Audience: [question inaudible]
Joe Burnett: Whether it's going to handle that in the most optimal way: we don't have plans right now to really dig deep into sophisticated algorithms. There are a lot of ways you could tackle it. You could use what's called a PID controller, which is a better mathematical model for describing these changes: it has a proportional piece that makes bigger changes in response to bigger error, an integral piece which understands how error has been accumulating over time, and a derivative piece, so it can see if things are curving in the other direction.
A PID controller would probably do better on a slow, continuous ramp, at recognizing that you have that kind of profile. You could also do machine learning to do predictive autoscaling. We haven't gotten into that, because we're still focused on really making the autoscaling system that we have rock-solid.
We're really still in an alpha state, and a lot of the ServerlessService work, making sure that we can handle traffic and not drop it on the floor, and generally making all these components a lot more robust, is more important right now than improving the autoscaler algorithm. But if you need it, you can do it yourself.
By that I don't just mean pull requests welcome; what I mean is that we have an escape hatch, and it's this PodAutoscaler here. If you're saying, okay, I know exactly what scale I should be using right now, I've got my own algorithm and I just want you to use this: the PodAutoscaler can be annotated with a class, and you can provide a different controller for that class. In fact, Knative comes with two controllers for the PodAutoscaler resource.
One will run the Knative autoscaling, and one will just turn around and create a generic Kubernetes HPA resource; that's actually how we support scaling on CPU. So you can go implement your own. The KubeCon talk I mentioned, "Scaling from 0 to Infinity", really walks you through what that would look like, and it even gives an example of an alternative controller. It's a pretty large component to replace, because you have to collect your own metrics and implement your own autoscaling algorithm.
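A sketch of that escape hatch via annotations (the annotation keys and class names are the real autoscaling.knative.dev ones; this example picks the HPA class scaling on CPU, and the image is a placeholder):

  apiVersion: serving.knative.dev/v1
  kind: Service
  metadata:
    name: cpu-scaled
  spec:
    template:
      metadata:
        annotations:
          # which controller reconciles this revision's PodAutoscaler:
          # kpa.autoscaling.knative.dev (the Knative default) or hpa.autoscaling.knative.dev
          autoscaling.knative.dev/class: hpa.autoscaling.knative.dev
          autoscaling.knative.dev/metric: cpu
          autoscaling.knative.dev/target: "80"   # target 80% CPU utilization
      spec:
        containers:
        - image: gcr.io/example/app              # placeholder
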
As Markus pulls those two apart, it will be easier to have an algorithm that operates over the existing Knative metrics; you'll be able to replace smaller and smaller pieces as we tease the system apart. Right now, you can still replace the PodAutoscaler controller and implement a predictive algorithm if you want. I think Ben Browning has actually recently worked on this and implemented a PodAutoscaler reconciler that autoscales on a workload queue, on a Kafka queue. So that's something.
Audience: [question inaudible]
Joe Burnett: Knative as a whole has a principle of being, let me see if I can get this right, decoupled on the top and pluggable on the bottom. Meaning: you can use Knative Serving independently, you can use Knative Build independently (but you shouldn't, because Jason said there's a better thing now), and you can use Knative Eventing independently, and they work, but they compose together. And within each of these projects, they're pluggable, so you can actually plug in different pieces.
Audience: [partly inaudible] The question is, and maybe you already answered it: the upstream HPA takes custom metrics, and they have a new, or fairly recent, API to take custom metrics. Are we pushing our metrics in there? Or is that what you're talking about with pluggable, that we can plug any metrics into the HPA or the KPA using the same API?
Joe Burnett: I can talk on both of those; let me actually answer the second one first. Yes, we're definitely planning on doing deeper integration with the v2 Kubernetes HPA. There's a v1, a v2beta1, and a v2beta2; they differ only a little bit, in what things you can specify, and v2beta2 is probably the best one.
We do plan to provide the Knative metrics, that is, the concurrency of each Revision, as a custom metric in the cluster. As part of this decoupling, Markus is actually implementing a custom metrics adapter, so that anything in the cluster can access our metrics in a standard Kubernetes way, including the HPA.
That's pretty powerful. Ultimately, if the Kubernetes autoscaler becomes as good as or better than ours, we can just throw ours away; there's not really a strict need for us to have our own autoscaler. The other thing it enables is custom metrics. Actually, this change just landed a couple of days ago in knative/serving: we are creating v2 objects now, v2beta1 objects.
If you have a metrics endpoint and a way to scrape it, you can tell the Service: by the way, you should be scaling on this metric name, and that will be plumbed through for you. There's somebody working on that, actually; he chats about it in Slack sometimes. So definitely, yeah, we plan to integrate with that. Now, on HPA scale-to-zero in upstream Kubernetes, I think it's a very...
The 2018 scaling roadmap enumerates a little more clearly our principles in designing the system, and those haven't really changed. Our goals are: make it fast; first of all, it has to be a solid, fast autoscaling system. Goal number two is make it light, meaning you should be able to just give us your pod and we'll do the thing for you, light on configuration. And the third thing is: make everything better.