KubeVirt Summit, 24 Feb 2021

From YouTube: Zero downtime KubeVirt updates

Description

KubeVirt has a very precise and resilient method for ensuring zero downtime updates occur.

In this session I'll cover the general strategy behind how we approach updating KubeVirt from a developer's perspective, as well as discuss future improvements to our update process.

Attendees will come away with an understanding of how KubeVirt's update process has been designed, how it is tested, and what future enhancements are coming soon.

Presenter: David Vossel, Senior Principal Software Engineer, Red Hat

A

Let's see, all right. Can somebody give me a confirmation that they can see my slides? Yes, excellent. All right, so let me get started here. My name is David Vossel, I'm one of the core contributors to the KubeVirt project, and I'm going to talk about how we handle updates for KubeVirt. All right, so back in, what I want to say was 2018...

A

I started hearing this phrase thrown around at conferences, and the phrase was "let's make Kubernetes boring." The intent behind this phrase was that operating Kubernetes should be mundane.

A

It needs to be mundane and unexciting. Everything should be so reliable and straightforward that performing maintenance events, like updating the entire cluster, really shouldn't be a big deal. It definitely shouldn't be something that we fear.

A

So when we started looking at how to design KubeVirt's update process, that mantra of "make it boring" really resonated with me. Updating should cause minimal to zero disruption.

A

Our components should be capable of operating in a degraded state as new versions are rolled out, and the whole update process needs to be observable, so you should be able to easily tell what is going on in your cluster.

A

So the angle here is KubeVirt updates. I want them to be so reliable and mundane that I'd be willing to update KubeVirt in production in the middle of business hours, and do that with confidence. The only way I'd be willing to do that is if we can make some sort of guarantee around zero downtime. So, to kick things off:

A

Let's talk about our commitment to zero downtime updates for KubeVirt and what exactly that means for the project.

A

So today we have two primary commitments when it comes to zero downtime updates. The first one is that our API is going to remain available throughout the entire update process. What I mean by this is that CRUD operations performed on KubeVirt objects like virtual machines are going to remain possible during the update, so people should be able to perform lifecycle events on virtual machines, like starting and stopping them, throughout the entire update. Now, how quickly those lifecycle events are processed could be delayed, but the big point to drive home here is that invoking these actions should not be impacted. Our API, by design, is going to be capable of responding even in a degraded state.

A

The second commitment is that VMI workloads will remain undisrupted throughout the entire update process. What I mean by this is that we're never going to require performing a destructive action on your virtual machine workloads as part of the update process. So if you're running a database, for example, in a KubeVirt VMI, that database is going to remain uninterrupted.

A

Okay, I did put a small caveat here that says "unless you tell us to." There is a situation where you might actually want to opt in to a shutdown of your VMI during an update, but we're never going to require that. I'll get into the reasoning for this towards the end of the presentation; I mainly put that bullet point in there for accuracy.

A

Okay, that's our commitment to zero downtime. All that said, we do expect some disruption to occur during the update, and that is primarily limited to anything that involves a persistent connection between our control plane components. The first one is in-flight live migrations: they'll probably get terminated. It really depends on timing, and the reason for this is that virt-handler is responsible for processing the live migration streams.

A

That's one of our components, and when we roll out new virt-handlers, any in-flight live migration streams are going to get closed, which is ultimately going to cause the migration to terminate. If you're using pre-copy live migration, this isn't really a big deal: your VMI is just going to keep running wherever it was running, and nothing's lost. Another form of disruption that we expect is that any console or VNC connections to active, running virtual machines are going to get reset.

A

So again, this is a persistent TCP connection that's being tunneled through our virt-api and virt-handler components. When we roll out new versions of these components, it's going to reset those connections, and practically there's not a lot we can do about that.

A

Okay, so what I want to do next is walk through the installation and update process and talk a little bit about how it all works. If you're installing KubeVirt for the first time, you're probably going to find some documentation on kubevirt.io about first installing the KubeVirt operator and then posting a KubeVirt custom resource to the cluster, which in turn kicks off the installation of KubeVirt. So what's going on here? We have a top-level controller, virt-operator, which is capable of orchestrating the KubeVirt install and update process. But virt-operator is not going to do any of that until you post the KubeVirt custom resource, which gives virt-operator the instructions it needs to perform the install or update. So, going into the process: you install the operator, and then you post that custom resource, which tells virt-operator how to install KubeVirt.

A

We have manifests for both virt-operator and the KubeVirt custom resource, and they're published in the asset list with every one of our KubeVirt releases. This is just a screenshot of what that looks like (I guess I'm using dark mode on my GitHub). This is the virt-operator YAML. It's going to contain the virt-operator deployment and anything that deployment needs to work, so I'm talking about service accounts, webhooks, CRDs, whatever. And then we have the KubeVirt custom resource, kubevirt-cr.yaml, and that's going to contain the very basic instructions that signal virt-operator to begin the installation process.
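
The install flow described above comes down to applying two release assets, operator manifest first. A minimal sketch, assuming the assets are named `kubevirt-operator.yaml` and `kubevirt-cr.yaml` and live under the usual GitHub release-download path; `release_manifest_urls` is a hypothetical helper, not part of any KubeVirt tooling:

```python
from typing import List

# Base path for KubeVirt release assets on GitHub (assumed URL layout).
RELEASE_BASE = "https://github.com/kubevirt/kubevirt/releases/download"


def release_manifest_urls(version: str) -> List[str]:
    """Return [operator manifest URL, KubeVirt CR manifest URL] for a release tag."""
    if not version.startswith("v"):
        raise ValueError("expected a release tag such as 'v0.38.1'")
    # The operator manifest ships the KubeVirt CRD, so it has to be applied first.
    assets = ["kubevirt-operator.yaml", "kubevirt-cr.yaml"]
    return [f"{RELEASE_BASE}/{version}/{asset}" for asset in assets]


if __name__ == "__main__":
    for url in release_manifest_urls("v0.38.1"):
        print(url)
```

Each URL would then be handed to `kubectl apply -f`, in list order.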

A

So when it comes to setting the instructions on the KubeVirt custom resource to declare what the install needs to look like, there are two approaches. I made up these terms, so you're not going to find them anywhere. Easy mode is what I'm going to refer to as a blank spec: you just post a KubeVirt custom resource with an empty spec, and that signals virt-operator to perform a default KubeVirt install using the exact same release as the operator that was installed. So if you install version 0.38 of the operator and post a KubeVirt custom resource that looks like this, you will get version 0.38 of KubeVirt.

A

Now, in advanced mode (again, I totally just made up this term), you can pin a KubeVirt custom resource to a specific KubeVirt release. One way of doing this is by setting the image tag. Let me hide this; I don't know if you can see this or if only I can. Okay, so you can set the image tag on the KubeVirt custom resource, and that's going to tell virt-operator to install a previous or a specific version of KubeVirt that might not be the same one as the operator.

A

So why do we have two modes? Flexibility is one reason: it allows people to use operator and KubeVirt versions that can kind of float independently of one another. But really the big one here is rollback. An operator can't transition from a KubeVirt release that's more recent than the currently deployed operator, so the operator has to be greater than or equal to both the current version of KubeVirt and the desired version of KubeVirt when performing the transition.
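
The two modes can be illustrated by building the custom resource as plain data. This is a sketch under assumptions: the `apiVersion` (`kubevirt.io/v1`; older releases shipped `kubevirt.io/v1alpha3`), the `kubevirt` name and namespace, and the example tag are illustrative, the pinning field is assumed to be `spec.imageTag`, and `kubevirt_cr` is a hypothetical helper:

```python
import copy
from typing import Any, Dict, Optional

# Skeleton of the KubeVirt CR; apiVersion, name and namespace are assumptions.
BASE_CR: Dict[str, Any] = {
    "apiVersion": "kubevirt.io/v1",
    "kind": "KubeVirt",
    "metadata": {"name": "kubevirt", "namespace": "kubevirt"},
    "spec": {},
}


def kubevirt_cr(image_tag: Optional[str] = None) -> Dict[str, Any]:
    """Build a KubeVirt CR.

    image_tag=None: "easy mode", a blank spec; the operator installs its own release.
    image_tag set:  "advanced mode", KubeVirt is pinned to that release's images.
    """
    cr = copy.deepcopy(BASE_CR)  # never mutate the shared skeleton
    if image_tag is not None:
        cr["spec"]["imageTag"] = image_tag
    return cr


easy = kubevirt_cr()
pinned = kubevirt_cr("v0.37.0")  # e.g. a rollback target
```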

A

So here's a rollback example: if I was on version 0.38 and I'm going to roll back to 0.37, I'd have to have at least version 0.38 of the operator installed. This has to do with permissions: the operator has to know about everything that's required for both releases and has to have permissions to do everything required for both, and we only do that in a rolling-forward fashion. So you can only roll back if you have the most recent version of the operator.
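
The rule that the operator must be at least as new as both the current and the desired KubeVirt version can be written as a small check. A sketch assuming plain numeric tags such as `v0.38.1` (pre-release suffixes are not handled); the function names are hypothetical:

```python
from typing import Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn 'v0.38.1' or '0.38' into a comparable tuple such as (0, 38, 1)."""
    parts = [int(p) for p in tag.lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))  # pad '0.38' to (0, 38, 0)
    return tuple(parts[:3])


def operator_can_transition(operator: str, current: str, desired: str) -> bool:
    """True if an operator at `operator` may move KubeVirt from `current` to `desired`.

    The operator has to be greater than or equal to both versions, which is why
    a rollback is only possible with the newer operator installed.
    """
    op = parse_version(operator)
    return op >= parse_version(current) and op >= parse_version(desired)
```

For the rollback example above, a 0.38 operator passes the check for 0.38 to 0.37, while a 0.37 operator does not.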

A

Okay, let's quickly take a look at the typical installation process. I'm going to show how you can follow that process by observing the status section of the KubeVirt custom resource. This is kind of teeing us up for observing an update in a minute. If you're installing KubeVirt from scratch, then you're probably going to want to do something like this: first post the operator manifests, then post the KubeVirt custom resource, and then wait for the KubeVirt custom resource's status to indicate that virt-operator has completed the install.

A

One way you can track the progress of the install is by observing changes to the conditions on the KubeVirt custom resource. That wait command here at the bottom tells us that the install has completed once a condition called Available is set to true. So let's take a closer look at what you can expect to see in the KubeVirt custom resource's status, both during and after the installation.

A

As soon as virt-operator begins working on the KubeVirt install, that's going to be indicated by a condition called Progressing. Here we can see that condition, and it gives us a nice reason for why the Progressing condition exists, indicating that a deployment is in progress.

A

We can also see the phase is Deploying. After we wait a bit and the installation is completed, we can observe some changes: the phase is going to go to Deployed.

A

The Progressing condition is going to be set to false, with a nice reason for us there, and the Available condition is now set to true, which is the thing that kubectl command was waiting on. Again, it gives us a great reason: all components are ready.

A

And lastly, if you just wanted to see all your pods in action, you could list all the pods in the namespace you installed KubeVirt into, and you'd see virt-api, virt-controller, virt-handler: all these things are running and ready.
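
The status checks walked through above can be expressed as a couple of predicates over the status section. A sketch assuming the status is available as a parsed dict (for example from `kubectl get kv kubevirt -n kubevirt -o json`); the condition types and phases are the ones shown on the slides, reasons are omitted, and the helper names are hypothetical:

```python
from typing import Any, Dict


def condition_is_true(status: Dict[str, Any], cond_type: str) -> bool:
    """Roughly what `kubectl wait --for condition=<type>` looks for."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status") == "True"
    return False


def install_finished(status: Dict[str, Any]) -> bool:
    """Done once the phase is Deployed, Available is True and Progressing is not."""
    return (
        status.get("phase") == "Deployed"
        and condition_is_true(status, "Available")
        and not condition_is_true(status, "Progressing")
    )


# Status shapes modeled on the slides.
during = {"phase": "Deploying",
          "conditions": [{"type": "Progressing", "status": "True"}]}
after = {"phase": "Deployed",
         "conditions": [{"type": "Available", "status": "True"},
                        {"type": "Progressing", "status": "False"}]}
```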

A

Okay, so now we have KubeVirt installed. Let's take a look at what it takes to trigger an update and observe it while it's in progress. If you took the easy mode for installing KubeVirt, where the operator and KubeVirt release versions always remain in sync, then triggering an update is as simple as applying a new KubeVirt operator manifest, which you can find in the asset list of our official KubeVirt releases (I had a screenshot of that earlier). So in this example, we post a new KubeVirt operator YAML.

A

Then we wait for the Progressing condition to get set in the KubeVirt CR. That indicates to us that the operator is now initiating the update process. Then finally, we observe that the Created condition is set to true, which signals to us that the update has completed. So we have an indication of when the update has started and when it has ended. If you're actually looking at the KubeVirt custom resource's status, you can see Progressing is going to be set to true. Again, we have a reason there; it's going to say update in progress.

A

That's telling us that we're transitioning to a new, or different, version of KubeVirt. We're going to see something new here: the observed version of KubeVirt does not match the target version in our KubeVirt custom resource status. The observed version tells us the last version that was installed, and the target version tells us where we're headed. So that tells us a transition is taking place, and what we're transitioning to.

A

If we wait some time, we're going to see that the Progressing condition is set to false, with a nice reason there. We're going to see that the Created condition, the one we were waiting on in our kubectl command, is set to true, and we're going to see that the observed version now matches the target version, so there's no transition taking place.
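
Tracking an update adds the version comparison on top of the conditions. A sketch assuming the status fields are named `observedKubeVirtVersion` and `targetKubeVirtVersion` (worth checking against the status of your own KubeVirt CR) and using the Created condition the speaker waits on; `update_state` is a hypothetical helper:

```python
from typing import Any, Dict


def _is_true(status: Dict[str, Any], cond_type: str) -> bool:
    """True if the named condition is present with status "True"."""
    return any(c.get("type") == cond_type and c.get("status") == "True"
               for c in status.get("conditions", []))


def update_state(status: Dict[str, Any]) -> str:
    """Classify a KubeVirt CR status as 'updating', 'complete' or 'unknown'."""
    observed = status.get("observedKubeVirtVersion")
    target = status.get("targetKubeVirtVersion")
    # A version mismatch or a True Progressing condition means a transition is underway.
    if observed != target or _is_true(status, "Progressing"):
        return "updating"
    # Versions match and nothing is progressing: done once Created is True.
    if _is_true(status, "Created"):
        return "complete"
    return "unknown"


in_progress = {"observedKubeVirtVersion": "v0.37.0",
               "targetKubeVirtVersion": "v0.38.0",
               "conditions": [{"type": "Progressing", "status": "True"}]}
done = {"observedKubeVirtVersion": "v0.38.0",
        "targetKubeVirtVersion": "v0.38.0",
        "conditions": [{"type": "Progressing", "status": "False"},
                       {"type": "Created", "status": "True"}]}
```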

A

Okay, so that's what it looks like from an administrative perspective to install and update KubeVirt. Now I want to talk about what virt-operator is actually doing behind the scenes and how it orchestrates the update. In order to guarantee minimal disruption, virt-operator performs the KubeVirt update in a very specific and controlled order, and there are several phases involved here.

A

The first phase involves installing all the prerequisites required to successfully launch the new, updated KubeVirt components. This includes things like installing new CRDs, meaning any new APIs and updates to existing ones. If there are new APIs that didn't exist previously, we're going to install a temporary validation webhook to block the use of these new APIs until our new components come online to actually service them. We're not going to allow people to use new APIs until we know all the infrastructure is online to serve them.
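
The temporary validation webhook from the prerequisites phase can be modeled as an admission gate. This is a toy model, not virt-operator's webhook: the API name `newfeature.example/v1` is hypothetical, and the rule that all three control-plane components must be ready is an assumption made for illustration:

```python
from dataclasses import dataclass, field
from typing import Set, Tuple

# Assumed set of components that must be updated before new APIs are served.
CONTROL_PLANE = frozenset({"virt-handler", "virt-controller", "virt-api"})


@dataclass
class TemporaryApiGate:
    """Toy admission gate: APIs new in the target release are rejected until every
    updated control-plane component reports ready."""
    new_apis: Set[str]
    ready: Set[str] = field(default_factory=set)

    def mark_ready(self, component: str) -> None:
        self.ready.add(component)

    def admit(self, api: str) -> Tuple[bool, str]:
        if api in self.new_apis and not CONTROL_PLANE <= self.ready:
            return False, f"{api} is blocked until the update completes"
        return True, ""
```

In the flow described in the talk the webhook is temporary and is removed during cleanup; the gate here only illustrates the blocking rule.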

A

In order to ensure zero disruption, we're also doing this thing where we merge the old RBAC permissions with the new RBAC permissions. The reasoning for this is that, while we're performing the rolling update of our components, both the old and new versions are briefly going to be running at the same time, and they both have to operate and serve requests correctly.

A

So we have to keep both sets of RBAC permissions in place for that to work. Lastly, we're going to install any other prerequisites, which can include things like new webhooks, service endpoints, Prometheus rules, and a few other ones I didn't even list here. The next phase, after our prerequisite phase, is rolling out virt-handler. virt-handler is our DaemonSet that lives on all the compute nodes.

A

We roll out virt-handler before the other components because any new VMI lifecycle features are usually coordinated by virt-handler. By rolling out virt-handler first, we can ensure that the logic to support any new lifecycle features is available before the other components come online and potentially try to use those features.

A

All right, next we roll out virt-controller, and after that virt-api. It's important that virt-api is the last component in this chain, because virt-api is what enables new user-facing features. By rolling out virt-api last, we can ensure that all the other components involved in these new features are available before anyone tries to use them. After virt-api, we're going to be certain that all of the old components (all the old deployments, all the old pods involved with our control plane) are down.

A

Then we can introduce any backwards-incompatible changes to our API. These would be things like introducing new versions of existing APIs and potentially deprecating old versions; anything that might confuse our old components, we would do here. Lastly, we're going to do cleanup. Remember, back in phase one I mentioned we have to merge both the old and new RBAC permissions into the cluster at the same time. This is the phase where we clean up those old RBAC permissions and any other temporary objects used during the update process.
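
The ordering of the phases can be captured as a list plus a rule that a phase only starts once everything before it has finished. The phase labels below are hypothetical names for the steps described in the talk, not identifiers used by virt-operator:

```python
from typing import Dict, List, Optional

# Hypothetical labels for the update phases, in the order described in the talk.
UPDATE_PHASES: List[str] = [
    "prerequisites",         # CRDs, temporary webhook, merged RBAC, other objects
    "virt-handler",          # node DaemonSet first: new lifecycle logic is in place early
    "virt-controller",
    "virt-api",              # last, so user-facing features appear only once backed
    "api-breaking-changes",  # new versions/deprecations after old pods are gone
    "cleanup",               # old RBAC, temporary objects, leftover components
]


def next_phase(done: Dict[str, bool]) -> Optional[str]:
    """Return the first unfinished phase; a later phase never starts early."""
    for phase in UPDATE_PHASES:
        if not done.get(phase, False):
            return phase
    return None
```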

A

This phase is also going to clean up any old components that maybe didn't have an equivalent in the version KubeVirt is transitioning to. So if there's anything left over, we're going to catch it here as well.
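
Merging permissions during the rolling update and trimming them at cleanup is, at the level of individual rules, a set union followed by a set difference. A simplified sketch that treats an RBAC rule as a hypothetical `(apiGroup, resource, verb)` tuple; real ClusterRoles carry more structure, and the example rules are made up:

```python
from typing import Set, Tuple

# A simplified RBAC rule: (apiGroup, resource, verb).
Rule = Tuple[str, str, str]


def merged_rules(old: Set[Rule], new: Set[Rule]) -> Set[Rule]:
    """While old and new components run side by side, grant the union of both."""
    return old | new


def rules_removed_at_cleanup(old: Set[Rule], new: Set[Rule]) -> Set[Rule]:
    """Rules only the old version needed; these are dropped in the cleanup phase."""
    return old - new


# Hypothetical example rules.
old = {("kubevirt.io", "virtualmachineinstances", "get"),
       ("kubevirt.io", "legacythings", "list")}
new = {("kubevirt.io", "virtualmachineinstances", "get"),
       ("kubevirt.io", "newthings", "list")}
```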

A

Okay, wow, time is creeping up on me here. So that's a quick overview of the install and update process as it is today. If you're pretty familiar with the KubeVirt architecture, you might have noticed I never mentioned anything about updating the components that live inside the VMI pods, the pods that actually run the guest virtual machines. We have a lot going on in there: there's a KubeVirt component called virt-launcher, and then we have libvirt and QEMU in there as well. So how are those components updated? The short answer is: they're not. As of today, we don't do anything. We don't touch your VMI workloads.

A

However, we recognize this is a problem. If there's a vulnerability in any of these components within the VMI pod, then we need a path for automating the update of these components. Back at the very beginning of the presentation, I made a disclaimer that we're not going to touch your workloads unless you tell us to, and here's where I explain that. I'm working on a new feature for virt-operator that lets us declare a strategy for updating VMI workloads. This is an opt-in feature that's configured globally on the KubeVirt custom resource.

A

By using this new workload update strategy API, you can tell us what methods to use to update your VMI workloads. Right now there are two methods you can use. The first one is non-disruptive: it's the LiveMigrate method, which is going to cause a VMI workload to migrate into an updated pod with all the new components. The next one is Evict, and most likely that's going to result in the VMI getting shut down.
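
Opting in, and the choice between the two methods, might look like the sketch below. It assumes the field names `workloadUpdateStrategy` and `workloadUpdateMethods`, which is how the feature appears in later KubeVirt releases as far as I know; the talk predates the merge, so check your release. `choose_update_action` is a simplified, hypothetical model of the behavior described, not KubeVirt code:

```python
from typing import Any, Dict, List, Optional

KNOWN_METHODS = {"LiveMigrate", "Evict"}


def workload_update_strategy(methods: List[str]) -> Dict[str, Any]:
    """KubeVirt CR spec fragment opting in to automated VMI workload updates."""
    unknown = set(methods) - KNOWN_METHODS
    if unknown:
        raise ValueError(f"unknown workload update methods: {sorted(unknown)}")
    return {"workloadUpdateStrategy": {"workloadUpdateMethods": list(methods)}}


def choose_update_action(methods: List[str], migratable: bool) -> Optional[str]:
    """Simplified model: how an outdated VMI would be moved onto updated components."""
    if "LiveMigrate" in methods and migratable:
        return "LiveMigrate"  # non-disruptive: the guest keeps running
    if "Evict" in methods:
        # Disruptive: the VMI shuts down; a VM with runStrategy Always restarts it
        # in a pod built from the updated components.
        return "Evict"
    return None  # opt-in only: with no methods configured the VMI is left alone
```

In this model a VMI that cannot be live migrated only gets updated when Evict is listed.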

A

So if the VMI is managed by a VM object with the run strategy set to Always, then the VMI will restart in an updated pod automatically. But that's a disruptive action, because it's going to actually interrupt the workload and restart it. So if you want your workloads to update in a non-disruptive manner, set the LiveMigrate method, and that will guarantee your workloads are only impacted by live migration. If you don't have live migration enabled in your environment for some reason, then Evict is really the only option you'd have if you want this behavior. This feature is pretty much done. I think we'll see it in the March release of KubeVirt, but we'll see. Reviews are welcome; there's the link to the pull request.

A

All right, so I'm going to close things out. When I said I want to make KubeVirt updates boring, one of the primary ways we can ensure it stays boring is by having a solid test suite that exercises the entire update process. Right now, on every PR, we have functional tests that verify updates work from the latest official KubeVirt release to whatever code is present in that pull request. This means that if there's anything in a pull request that breaks the update path from the previous KubeVirt release, it's not going to make it into our code base. So this test is actually gating code from being merged.

A

These tests also verify our commitment to zero downtime for your VMI workloads. In the test cases, we're doing things like starting the VMI workload, performing an update, and then validating that the VMIs are still running and that we can still lifecycle-manage them after the update has completed. Okay, I tried to cover a crazy amount of ground in 20 minutes. There was a lot more I wanted to say, but I think I'm out of time.

A

Hopefully this at least gives everyone a sense of how we approach updates in KubeVirt and some of the techniques we're using to provide our zero downtime guarantees.

A

So maybe I have a few minutes left; I'm not really sure. If there are any questions, I can help answer those.

A

Let's stop sharing.

B

So far there have been no questions; there's a comment in the chat, but you still have a few more minutes if you want. Everyone, feel free to ask questions. Remember, you can ask them here in the chat or in Slack. Sorry, go ahead.

A

If I have a few more minutes, then yeah, I'll keep talking. Maybe I did such a good job in my presentation that nobody had any questions because it was all perfect. So here's something I wanted to get into that I didn't get to. During the presentation I talked entirely about updating KubeVirt, but that's not the only update path you have when you're running KubeVirt; you also have to deal with cluster updates. If you're updating the cluster, how do we avoid disruption to our virtual machine workloads when we have to do things like, for example, update a cluster node? That is configured a little bit differently, but we have a way of doing it: we have something called the eviction strategy, and it exists on our VMI API.

A

If you set the eviction strategy to LiveMigrate, then when you are doing a maintenance event on a node during an entire cluster update, we're going to gracefully migrate VMIs off of that node to somewhere else in the cluster (hopefully you have enough capacity), and that's what allows us to keep VMIs available. We know virtual machines are stateful, whereas pods are kind of ephemeral.

A

Usually, anyway. I mean, pods can be stateful as well, but virtual machines certainly are usually stateful, so we want to preserve a running virtual machine even in the case of an entire cluster update. The eviction strategy lets us make virtual machines portable across nodes: with the LiveMigrate eviction strategy set, when a node is drained, we're automatically going to migrate our virtual machine workloads off of it, and when the node comes back up, it will be eligible to have new virtual machine instances run on it. So that's how we're handling and supporting the case of an entire cluster update.
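
Setting the eviction strategy, and what a drain then does to each VMI, can be sketched as follows. The VMI manifest is trimmed to the one relevant field (a real VMI also needs a domain spec), the `apiVersion` is an assumption, and `drain_plan` is a hypothetical, simplified model: VMIs with `evictionStrategy: LiveMigrate` are migrated off the node, others are simply evicted:

```python
from typing import Any, Dict, List


def vmi_with_live_migrate_eviction(name: str) -> Dict[str, Any]:
    """Trimmed VMI manifest; a real VMI also needs spec.domain and usually volumes."""
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachineInstance",
        "metadata": {"name": name},
        "spec": {"evictionStrategy": "LiveMigrate"},
    }


def drain_plan(vmis_on_node: List[Dict[str, Any]]) -> Dict[str, str]:
    """Simplified model of a node drain: LiveMigrate VMIs move, the rest are evicted."""
    plan: Dict[str, str] = {}
    for vmi in vmis_on_node:
        strategy = vmi.get("spec", {}).get("evictionStrategy")
        plan[vmi["metadata"]["name"]] = "migrate" if strategy == "LiveMigrate" else "evict"
    return plan
```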

A

Are there any questions? I'm looking at the chat here.

A

"How does the update process handle VMIs with host disks or container disks that do not support live migration? Also, does the change in IP mean existing connections to that VMI will break, including console?"

A

I think there are two questions in this one. So, how does the update process handle VMIs with host disks or container disks? Which update process are we talking about? I'll answer both. For the KubeVirt update process: again, we don't touch your VMI workloads unless you tell us to. So if you don't tell us to do anything in the KubeVirt CR, then by default nothing's going to happen to those VMIs; they're just going to keep running. If you wanted to migrate them, you're not going to be able to migrate with a host disk, something that's local to the specific node the VMI is running on; it's just not eligible for migration. So your only option, if you actually need to update the VMI components as I described for workload updates, is Evict, which is probably going to restart your virtual machine. The second part of that question was whether the change in IP breaks existing connections.

A

It's certainly going to break console. Console doesn't even go through the IP of the VMI; it kind of goes through a back channel, via the API.

A

Excuse me, the IP change. Well, if your IP changes, then yes, but it really depends on what network you're using. There are lots of options for setting up a virtual machine with different networks, and some of them will work successfully across a migration and some of them won't. If you're using the bridge binding with the pod network, that's one of the ones that won't work: you're going to get a new IP address, so it's probably not going to be successful if you're trying to maintain persistent connections.

A

But there are possibly other network or CNI plugins, or other ways of attaching network devices, that might be persistent through a migration; we'd need somebody from the network vertical to help us out a little bit here. Are there any other questions?

A

Okay, well, follow up on the Slack channel, especially with that networking question. You might be able to get more detailed information there from somebody who is more knowledgeable than I am about how network connections are persisted across migrations and what options you have available.