From YouTube: Episode 12: Hitless Deploys
Description
Join Yuval Kohavi as he dives into how to perform rolling deployment upgrades without any downtime using Envoy and Kubernetes.
About us https://www.solo.io
Questions? https://slack.solo.io
Code Samples: https://github.com/solo-io/hoot
Suggest a topic to cover here: https://github.com/solo-io/hoot/issues/new?title=episode+suggestion:
All right, so welcome everybody to another episode of Hoot, where we are going to discuss hitless deploys using Envoy and Kubernetes. We're going to talk about why we want to do it and why it's problematic; we're going to have a little slideshow and then a little demo. And once I do the demo, I actually have some questions for our viewers about how you want to see demos going forward, so please stay for the demo so we can get that answered for you.
We do these episodes every two weeks, talking about cloud stuff, and without further ado, let's get started. So I prepared a little presentation here talking about hitless deploys, and again, if you have any questions, as per usual on Hoot, feel free to ask them in the chat and I will answer them live.
So let's start with our goal. What are we trying to accomplish, right? Our goal is to keep the user happy, and specifically, keep them happy when things are changing. So if we have a cloud setup with a load balancer sending traffic to an edge gateway (in our case, that's Envoy) that sends traffic to services, then every time we roll out a new service, or even when we upgrade the edge gateway Envoy itself, we want the user to keep getting 200 OK for their requests. We don't want the user to ever be sad; we never want a request to turn into a 500 instead of a 200, right?
And in our case, that's going to be an Envoy proxy that then sends requests to a Kubernetes service. So when we talk about making sure there are no hits — that everything keeps flowing — we can ask: what can each layer do? The user, or the app, can perform several retries
if a request fails, before showing an error to the user. The cloud load balancer can perform health checks on the edge gateway, to know which instances of the edge gateway it can use; and the edge gateway, in turn, can perform health checks and retries on the services, to know which instances of the service it can use. I'm going to go into a bit more detail, show a mistake — a common anti-pattern — and then what we can do to make it right. So let's follow a request.
A request comes in from the user: it reaches the cloud load balancer, and the cloud load balancer sends it to our edge gateway in our Kubernetes cluster. In our case we're talking about Envoy, so the cloud load balancer needs to know the health state of Envoy. The problem is that cloud load balancers are usually not Kubernetes-aware; they're usually designed to work with virtual machines or auto-scaling groups.
We can expose our edge gateway as a NodePort service and configure our cloud load balancer to send traffic to the Kubernetes NodePort. What's the problem with this approach? The problem is that the cloud load balancer, when it health-checks, does not see Envoy. It sees all these Kubernetes nodes and their NodePort, and each health request that gets to a node in Kubernetes can reach any of the Envoy instances, because they're distributed kind of randomly.
That's also not going to work, right — let me go back and remove that X — because when the load balancer sends a health request to node 1, it will get the health status of, randomly, either the Envoy
on node 1 or the Envoy on node 2. So it makes the health information useless, and again, that's because the cloud load balancer is not Kubernetes-aware. So essentially, if you're using a NodePort to expose your edge gateway to your cloud load balancer, you cannot just naively use health checks, because they will mix and match across the different pods and essentially become meaningless. So what do we do, then, if we're not going to do that? You have several options in GKE.
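As one illustration — a minimal sketch based on general Kubernetes behavior, not necessarily the exact GKE option used in the episode — setting externalTrafficPolicy: Local ties each node's health status to the Envoy pods actually running on it:

    # Sketch: expose the edge gateway so the cloud load balancer's health
    # checks correlate with the Envoy pods on each node. With
    # externalTrafficPolicy: Local, kube-proxy only routes to pods on the
    # local node, and Kubernetes allocates a healthCheckNodePort that
    # reports whether this node has a ready Envoy pod.
    apiVersion: v1
    kind: Service
    metadata:
      name: envoy-edge        # hypothetical name
    spec:
      type: LoadBalancer      # or NodePort, depending on your cloud setup
      externalTrafficPolicy: Local
      selector:
        app: envoy            # hypothetical label
      ports:
      - name: http
        port: 80
        targetPort: 8080      # hypothetical Envoy listener port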
So that solves the health-correlation problem with the cloud load balancer. Another thing you need to do is configure the health check filter in Envoy — and we'll show how to do this in the demo — so that Envoy replies to health check requests.
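For reference, a minimal sketch of that filter in Envoy's v3 API (the /health path matches the demo narration; the surrounding listener configuration is omitted):

    # Sketch: Envoy's HTTP health check filter. With pass_through_mode:
    # false, Envoy answers /health itself instead of forwarding it to an
    # upstream service.
    http_filters:
    - name: envoy.filters.http.health_check
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
        pass_through_mode: false
        headers:
        - name: ":path"
          string_match:
            exact: "/health"
    - name: envoy.filters.http.router
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router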
All right, so up until now — if there are any questions, feel free to ask. Is everybody seeing the presentation okay? Feel free to drop a line in the chat, say hi,
tell me if that makes sense, or if you want me to go a bit deeper or anything. All right, so that's kind of the short version of the hitless-deploy health checks and retries between the load balancer and the edge gateway. If we mess up these health checks, the load balancer might mistakenly think that one of the nodes is not healthy and will route less traffic to it.
Or the opposite: if the load balancer in this setup thinks that node 1 is not healthy and sends requests to node 2, they can still make it to the Envoy on node 1 that's coming down, and the user will still get 500s. That's why we need this strong correlation between the health checks performed by the load balancer and the Envoy pods. That's essentially the goal we're trying to achieve here. All right.
Something has to make Envoy aware of that, and Envoy will not be aware of it by default, because Envoy is kind of infrastructure-agnostic. It just does the data path, and it lets the control plane give it the details it needs from the infrastructure.
In addition, you've got to remember that in a distributed system, each component's view of the system state is eventually consistent. So if a pod tears down, it takes time for this information to propagate to Envoy. Even if you have a control plane that takes the Kubernetes information and sends it to Envoy, you still cannot rely on this happening instantaneously. And let me — I have this animation.
I gained some animation skills here to kind of help relate this. All right, so let's talk about a failed attempt at how we might do a rollout of another pod in this deployment. Let's say we have Envoy, and Envoy's view of the state of the cluster is that it has two endpoints, say 10.0.0.1 and 10.0.0.2. Now,
what we want to do is a rolling deploy: deploy another pod and remove one of these pods in our deployment. So now we deploy this v2 pod with IP 10.0.0.3, and we delete the previous pod as part of the rolling deployment. Now, because the system is eventually consistent, there's going to be a little bit of time until this information propagates to Envoy. It might be half a millisecond, but it might be more.
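For context, the rollout itself is just a standard Kubernetes rolling update; a minimal sketch (names, image, and probe values are hypothetical, not taken from the episode):

    # Sketch: Kubernetes starts the v2 pod, waits for it to become ready,
    # then terminates a v1 pod -- but, as described above, Envoy learns
    # about the removed endpoint only eventually.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: service-1              # hypothetical name
    spec:
      replicas: 2
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1              # start one new pod before removing an old one
          maxUnavailable: 0        # keep the full replica count serving
      selector:
        matchLabels:
          app: service-1
      template:
        metadata:
          labels:
            app: service-1
        spec:
          containers:
          - name: app
            image: example/service:v2   # hypothetical image
            readinessProbe:
              httpGet:
                path: /health
                port: 8080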
The idea is kind of like when you're building two parts and you want to put them together: you don't want it to be too tight of a fit, because then they might not work; you want a little bit of wiggle room. And that drain period is the wiggle room. So the pod, instead of terminating immediately, hangs around for a bit, fails health checks to let dependent components know that it's about to go down, and after that drain period is over, then it exits.
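A minimal sketch of giving a pod that kind of drain period in Kubernetes (the sleep duration and grace period are assumptions, not values from the episode):

    # Sketch: pod-level drain period. On termination, Kubernetes runs the
    # preStop hook first; the pod keeps running (and failing health checks,
    # if the app flips them on SIGTERM or on preStop) so that load
    # balancers and Envoy stop sending it traffic before the process exits.
    spec:
      terminationGracePeriodSeconds: 30   # must exceed the drain period
      containers:
      - name: app
        image: example/service:v2         # hypothetical image
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "15"]    # crude drain: hang around for a bit
        readinessProbe:
          httpGet:
            path: /health
            port: 8080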
That's the drain period from the pod's side. Now, a service mesh can automate some of this — it can actually add health checks that fail automatically — so that's also something to explore: if you don't want to rewrite all your application code everywhere, maybe a service mesh can help you, but, you know, do your own research on that part. All right, so let's talk about what we're going to see in this short demo.
I'm basically going to demonstrate Envoy talking to two services and then a deployment rolling out, and after that we'll review the Envoy configuration and see how we made it work with hitless deploys.
And so first, let me review this kind of service environment that I've created.
So what we're going to do here: first, we're going to start service 1, and service 1 returns 200 — just so we can tell the difference between it and the other services — and we're going to let
it run in a second terminal here. Someone asked to make the demos more realistic with Kubernetes — thank you for that — and I'll try to make the next ones more cluster-based, with YAMLs and Kubernetes and Kind.
So now what I want to show is: we're going to run hey and issue some requests. You can see service 1 is responding with 200 — everything is good, nice. All right, so now I'm going to just hit enter here, and we're going to see what happens when service 2 is deployed and service 1 is being torn down without any draining.
Then we do the same thing to service 2, but this time the new service has draining. You can see it's deploying, publishing the endpoints, and service 3 is now deployed. Let's revisit that metric — and you can see the metric did not increase. It's still the original hundred 500s; we did not get additional 500s when we deployed the new service. Now, all this code is in our GitHub repo — there's a link in the description, github.com/solo-io/hoot — so you can look and play around with it.
Before we look at the Envoy config, one more thing: when we talked about the load balancer part, I said Envoy can expose health checks on its own. So let's look at these health checks. It's configured to use /health, and you can see that it returns HTTP 200 OK. Now what we can do is fail it: I'm going to fail its health checks, and the way we do it is we go to Envoy's admin endpoint and, with a POST request (to /healthcheck/fail), flip the health status of Envoy itself.

All right — if there are any questions on what you just saw, please feel free to ask now, and in the meantime I will review the Envoy configuration that I used in this demo. It starts out pretty normal, and things get interesting right here. The first thing you can see that's different from a normal, vanilla config
is a retry policy defined on the virtual host, and what I'm telling it is to retry on all of these network errors. The common thing with these errors is that they're usually not application-specific; they usually come from some timeout or some network error — you know, connection refused, connection reset — and not from a bad request. So these are network errors, and I grabbed that list from the Envoy documentation; in the repo's README there's a link to the documentation where it appears. And essentially it will do five retries, and make up to five attempts to select a healthy host to retry to.
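A minimal sketch of that virtual-host-level retry policy (the exact retry_on list in the repo may differ; cluster and route names are hypothetical):

    # Sketch: retry on network-level errors, up to 5 times, and make up to
    # 5 attempts to pick a host that hasn't already failed this request.
    virtual_hosts:
    - name: service
      domains: ["*"]
      retry_policy:
        retry_on: "reset,connect-failure,refused-stream"  # network errors, not bad requests
        num_retries: 5
        retry_host_predicate:
        - name: envoy.retry_host_predicates.previous_hosts
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.retry.host.previous_hosts.v3.PreviousHostsPredicate
        host_selection_retry_max_attempts: 5
      routes:
      - match: { prefix: "/" }
        route: { cluster: some_service }                  # hypothetical cluster name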
And I see we have a question in the chat: how does pool ejection work? I'm going to get to that. Circuit breaking is something a little bit different, and we're not going to cover circuit breaking here; we are going to cover, in just one second, health checks and outlier detection, which are in the cluster definition down below.
Okay, so this is the first part: the retry on network errors, and that's defined in the virtual host, or at the route level, inside the HTTP connection manager configuration. That's one part. Then, in the cluster definition: because it's a small number of endpoints, I disabled panic mode — otherwise the demo wouldn't really work correctly — but we'll ignore that for now; if you want to learn more about panic mode, feel free to read about it in the Envoy docs. There are two sections — two
parts of the configuration that are interesting in a cluster: there's passive health checks and active health checks. Outlier detection is the passive health checks, and what that means is that Envoy observes requests going to a certain host and observes the responses, and if Envoy sees three consecutive 500 response codes, it considers that host not healthy. It's called passive because for this to work there needs to be traffic — there needs to be traffic from the user going to that upstream.
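A minimal sketch of that passive health checking on the cluster (the "three consecutive 500s" value follows the narration; the ejection timings and the panic-mode tweak mentioned earlier are illustrative assumptions):

    # Sketch: outlier detection (passive health checks) plus disabled
    # panic mode, as discussed in the demo.
    clusters:
    - name: some_service                  # hypothetical cluster name
      outlier_detection:
        consecutive_5xx: 3                # eject after 3 consecutive 5xx responses
        interval: 1s                      # how often ejection sweeps run
        base_ejection_time: 10s           # how long an ejected host stays out
        max_ejection_percent: 100         # allow ejecting every host if needed
      common_lb_config:
        healthy_panic_threshold:
          value: 0                        # disable panic mode for this small demo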
Right — and first I want to say that the first part is a generic part that all health checks have in common: unhealthy threshold and healthy threshold. Those are thresholds for how many failed or successful health check requests it takes before
we change the health state of an upstream IP address — of a pod. Then there's the interval at which to do health checks, and the interval at which to do health checks if there's no traffic — you might want a longer interval there, to reduce the load on the cluster; I set it low, again, for this demo to work more reliably. And timeout is basically the timeout for the health request.
For the HTTP health check itself, I only defined the path. So with this configuration, approximately every second — there's a little bit of jitter involved, but approximately every second — Envoy will send an HTTP request to the upstream at /health; if it gets a 200 it considers it a success, and if it gets a 500 it considers it a failure. And these are all tunable — the status codes, everything — you can review that in the Envoy documentation.
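A minimal sketch of that active health check (the one-second interval and /health path follow the narration; the thresholds and no-traffic interval are assumptions chosen for a demo):

    # Sketch: active health checks on the cluster.
    clusters:
    - name: some_service                  # hypothetical cluster name
      health_checks:
      - timeout: 1s                       # timeout for each health request
        interval: 1s                      # ~every second, with some jitter
        no_traffic_interval: 60s          # check less often when there's no user traffic
        unhealthy_threshold: 1            # failures before marking a host unhealthy
        healthy_threshold: 1              # successes before marking it healthy again
        http_health_check:
          path: /health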
So to summarize: there are two means of health checking.
All right, I know that's a lot of input to take in. I'll do one final review of this config and then we'll wrap it up. In this Envoy configuration we implemented two mechanisms — two or three — to increase reliability and enable hitless deploys. One is a retry policy on network errors, and the other is health checks. Here I defined both passive health checks, also known as outlier detection, and active health checks. You may not want both of them; you may want just one or the other.
It really depends on your use case. What this allows you to do, essentially — assuming your downstream pod... sorry, assuming your upstream pod knows to properly signal and propagate health information as you tear it down — is it allows Envoy to process this information and stop sending traffic to the pod before it's completely gone. Now, in the case that something happened — the pod crashed, or maybe you cannot change the drain period because you don't control the application pod...