KubeVirt KubeVirt Summit, 24 Feb 2021

From YouTube: KubeVirt data protection and forensics forum

Description

Let's get together to discuss plans/ideas to extend KubeVirt's data protection and forensics functionality.

Presenters:
- Michael Henriksen, Red Hat, github: mhenriks
- Ryan Hallisey, NVIDIA, github: rthallisey, twitter: @rthallisey

A

You can take part in the conversation.

Having said that about logistics...

B

Let's get started. Mike, Ryan, it's all yours.

C

Thank you, Pep. So I'm Mike Henriksen. I work at Red Hat, and I'm going to be talking about data protection.

A

I'm Ryan Hallisey. I work at NVIDIA, and I'll be talking about forensics.

C

So I'm going to talk a little bit about data protection: what's available in KubeVirt now, what we're planning for the future, and what some of the challenges are. But, like Pep said, we're really interested in hearing from the community.

C

Kubernetes data protection is a rapidly growing topic of interest, and we want to know what you're doing for data protection now and what primitives KubeVirt can provide. To kick that off, we've got Ryan here to talk about his use case and how we're planning to extend some of the data protection primitives to help him do forensics, along with a bunch of other things you can do with these primitives.

C

So let's talk about what you're doing now, and maybe you can help us with some of the challenges. So, what is data protection? The purpose of data protection is to ensure that an application and its associated data can be restored quickly after any corruption or loss: basically, backup and restore. As for what's available now, there are offline virtual machine snapshots. I've done a terrible job of promoting this feature, so I will be going through the API and telling you how to use it.

C

We'll talk about online snapshots, which will be coming soon, in both their crash-consistent and application-consistent forms.

C

And, time and God willing, availability and Velero integration. Velero is the number one Kubernetes backup or data protection solution right now, so we'll talk a bit about how KubeVirt is going to integrate there.

C

Okay, so if you want to use KubeVirt snapshots, there are a couple of things you've got to do. First, in the KubeVirt realm, you have to enable the Snapshot feature gate. Snapshots are an alpha feature in KubeVirt, so turn on this feature gate. There should be documentation somewhere in the KubeVirt docs telling you how to do it.

C

Right now, you can only snapshot your virtual machines; VMIs and VMI replica sets are to come. And these are offline snapshots, so your virtual machine has to be halted. Then there are the storage requirements.
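As a rough illustration of the feature-gate step described above, the Python sketch below adds a gate name to a KubeVirt custom resource represented as a plain dictionary. The field path `spec.configuration.developerConfiguration.featureGates` and the gate name `Snapshot` are assumptions based on how recent KubeVirt releases are configured; the talk does not spell them out, and older installs may configure gates differently, so check the documentation for your version. The helper name `enable_feature_gate` is hypothetical.

```python
import copy


def enable_feature_gate(kubevirt_cr: dict, gate: str = "Snapshot") -> dict:
    """Return a copy of a KubeVirt CR dict with `gate` present in its feature gates.

    The field path used here is an assumption, not something stated in the talk.
    """
    cr = copy.deepcopy(kubevirt_cr)  # leave the caller's object untouched
    gates = (
        cr.setdefault("spec", {})
        .setdefault("configuration", {})
        .setdefault("developerConfiguration", {})
        .setdefault("featureGates", [])
    )
    if gate not in gates:  # idempotent: never add the same gate twice
        gates.append(gate)
    return cr


if __name__ == "__main__":
    original = {
        "apiVersion": "kubevirt.io/v1",
        "kind": "KubeVirt",
        "metadata": {"name": "kubevirt", "namespace": "kubevirt"},
    }
    patched = enable_feature_gate(original)
    print(patched["spec"]["configuration"]["developerConfiguration"]["featureGates"])
```

The function only builds the desired object; applying it to a cluster is left to whatever client or CLI you already use.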

C

The big thing, and one of the main drivers of this feature, is that we want to be Kubernetes native everywhere, so we are leveraging the Kubernetes volume snapshot API. Obviously, we run QEMU VMs, and there are all sorts of capabilities baked into that. If you're using qcow2 files, you can do snapshots and whatnot. We're not doing any of that. We are relying on Kubernetes, so you need the volume snapshot beta API deployed in your cluster: the CRDs. There is a volume snapshot common controller that must be deployed as well, and a webhook. These are things that SIG Storage maintains; you can get them on GitHub somewhere.

C

Because we're using the volume snapshot API for persistent volumes, the volumes in your VMs have to be DataVolumes or persistent volume claims, meaning that if you're only using container disks, [inaudible] be supported. We'll [inaudible] your VM configuration in the snapshots, but we're not snapshotting any of your data. And, of course, you need a CSI provisioner for your storage class that supports snapshots.

C

Okay, so those are the table stakes there. Next, if you want to snapshot a VM, again, make sure it's stopped and just submit a YAML that looks like this. Pretty easy. You can see the source is a virtual machine and it has a name. All right, so snapshot monitoring. You can see, buried in this YAML somewhere, there are a couple of conditions, Progressing and Ready.

C

So basically you can use kubectl wait, I think it's called, and wait for the Ready condition to know when your snapshot is complete.
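The slide with the YAML is not part of the transcript, so here is a minimal Python sketch of what such a snapshot request and readiness check could look like. The API group and version (`snapshot.kubevirt.io/v1alpha1`) and the `spec.source` layout are assumptions based on the alpha API of that period, and the helper names are hypothetical; only the kind of source (a virtual machine with a name) and the two condition names come from the talk.

```python
def vm_snapshot_manifest(vm_name: str, snapshot_name: str, namespace: str = "default") -> dict:
    """Build a VirtualMachineSnapshot object as a plain dict (assumed alpha API shape)."""
    return {
        "apiVersion": "snapshot.kubevirt.io/v1alpha1",  # assumption: alpha API group/version
        "kind": "VirtualMachineSnapshot",
        "metadata": {"name": snapshot_name, "namespace": namespace},
        "spec": {
            # The source is a virtual machine, identified by name.
            "source": {"apiGroup": "kubevirt.io", "kind": "VirtualMachine", "name": vm_name},
        },
    }


def condition_is_true(obj: dict, condition_type: str) -> bool:
    """Return True if status.conditions contains `condition_type` with status "True"."""
    conditions = (obj.get("status") or {}).get("conditions") or []
    return any(
        c.get("type") == condition_type and c.get("status") == "True" for c in conditions
    )


if __name__ == "__main__":
    snap = vm_snapshot_manifest("my-vm", "my-vm-snap")
    print(condition_is_true(snap, "Ready"))  # freshly built object has no status yet
    snap["status"] = {
        "conditions": [
            {"type": "Progressing", "status": "False"},
            {"type": "Ready", "status": "True"},
        ]
    }
    print(condition_is_true(snap, "Ready"))
```

In practice the second function would be applied to the object as read back from the cluster, polling until `Ready` is true, which is what waiting on the condition amounts to.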

C

One interesting thing I want to point out at the bottom is this virtual machine snapshot content name. Back here we introduced this virtual machine snapshot resource, but there's also this VM snapshot content reference. What is that? There's a virtual machine snapshot content resource, and it is similar to something in the CSI volume snapshot API, if you're familiar with it: there's VolumeSnapshot and VolumeSnapshotContent.

C

So a virtual machine snapshot is like the intention, the job: I want to snapshot this VM. The virtual machine snapshot content is the container that has the data, and references to any data, applicable to that snapshot. We can see in here that we're embedding the virtual machine spec, embedding any persistent volume claim specs, and the names of any volume snapshots that have been created.

C

So when you submit a virtual machine snapshot, a virtual machine snapshot content will get created, and then the controller will create the volume snapshots and basically glue everything together in this snapshot content, so that it is like a map of everything that is involved in a virtual machine snapshot.

C

Okay, so snapshots are cool, but they're kind of useless without a mechanism to restore. This is how you would restore a VM. There is a virtual machine restore resource, and you specify the target, which is a VM, and the name of the snapshot. Pretty straightforward.
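Again the slide itself is not in the transcript, so the following Python sketch only approximates the restore request just described: a target VM plus the name of the snapshot to restore from. The API version and the field names `spec.target` and `spec.virtualMachineSnapshotName` are assumptions based on the alpha API of that period, and `vm_restore_manifest` is a hypothetical helper.

```python
def vm_restore_manifest(
    vm_name: str, snapshot_name: str, restore_name: str, namespace: str = "default"
) -> dict:
    """Build a VirtualMachineRestore object as a plain dict (assumed alpha API shape)."""
    return {
        "apiVersion": "snapshot.kubevirt.io/v1alpha1",  # assumption: alpha API group/version
        "kind": "VirtualMachineRestore",
        "metadata": {"name": restore_name, "namespace": namespace},
        "spec": {
            # The target is the VM whose spec and volumes will be restored.
            "target": {"apiGroup": "kubevirt.io", "kind": "VirtualMachine", "name": vm_name},
            # The snapshot to restore from, referenced by name.
            "virtualMachineSnapshotName": snapshot_name,
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(vm_restore_manifest("my-vm", "my-vm-snap", "my-vm-restore"), indent=2))
```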

C

You can monitor this similarly to how you monitor a snapshot. What goes on in the background is that the restore controller will look at that virtual machine snapshot content resource, create PVCs from all the volume snapshots that are referenced there, wait for those all to complete, and then update the VM spec with the spec that is in the snapshot resource. While all that is happening, the Progressing condition will be false and Ready will be false, and then eventually it'll be true, once everything has been created and put together in the new virtual machine spec.

C

Cool, so offline snapshots, that's what we've got now. We're working on online snapshots, but I just want to go into a bit of the challenges and what we need to address to make that possible. This is an area where Kubernetes storage experts out there should definitely chime in if you have any tips, tricks, or any quirks or features that may be applicable.

C

One thing to consider is that Kubernetes CSI gives no consistency guarantees. You can create a volume snapshot at any time, and the snapshot will be created, but there's no guarantee that the snapshot is going to be useful at some time in the future. You may have just snapshotted at a time when your file system is in a state where it couldn't be restored.

C

That's why, with offline snapshots, we know that nothing is being used. We wait until any pods that are referencing a PVC are gone, and then we take snapshots when we know the volumes are not in use. So we really need to solve this consistency problem when we get to online snapshots.

C

Another thing in this realm: when you have a VM whose application is spread over multiple disks, you need a way to make sure that all the disks are consistent amongst themselves. There's work in the community to address that: there is a notion of a CSI volume group, which will be coming along very soon. I encourage you all to check out the Data Protection Working Group; they're doing some cool stuff.

C

So that's one of the challenges there: how do we get snapshots that are usable? For crash consistency, we really need to do an fsfreeze at the file system level to make sure that the file system can be restored. What fsfreeze does, basically, is flush any outstanding activity on the file system and block any future writes; I think things will just block until the snapshot is created. Then, of course, you unfreeze, which continues anything that was queued up.

C

So for crash consistency, at a minimum you have to fsfreeze and unfreeze. Again, the Data Protection Working Group knows about this, and there are a couple of different initiatives: there's a container notifier, and there's also the execution hook, which has been around for a while. But the thing is, fsfreeze and unfreeze typically need privileges. So if we can't get the kubelet to do it for us, we're kind of lucky in that we have virt-handler around, which is more privileged.
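The freeze, snapshot, unfreeze ordering described above can be sketched in Python as a small context manager. This is only an illustration of the sequencing, not KubeVirt's implementation: the `freeze`, `thaw`, and `take_snapshot` callables are hypothetical stand-ins for whatever actually freezes the guest file system and creates the volume snapshots.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def frozen(freeze: Callable[[], None], thaw: Callable[[], None]) -> Iterator[None]:
    """Freeze on entry and always thaw on exit, even if the body raises."""
    freeze()
    try:
        yield
    finally:
        thaw()  # never leave the file system frozen


def crash_consistent_snapshot(
    freeze: Callable[[], None],
    thaw: Callable[[], None],
    take_snapshot: Callable[[], T],
) -> T:
    """Take a snapshot while writes are blocked, then unblock them."""
    with frozen(freeze, thaw):
        return take_snapshot()


if __name__ == "__main__":
    events = []
    name = crash_consistent_snapshot(
        freeze=lambda: events.append("freeze"),
        thaw=lambda: events.append("thaw"),
        take_snapshot=lambda: events.append("snapshot") or "snap-1",
    )
    print(name, events)
```

Putting the unfreeze in a `finally` clause reflects the main design concern here: a failed snapshot must not leave guest writes blocked indefinitely.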

C

The next step after crash consistency is application consistency, and for that we need a way to quiesce and unquiesce applications. Because we're virtual machines, there's an extra level of work we have to do there. We really need to integrate with the QEMU guest agent to have code running in the VM do the proper flush tables, or whatever, to get the application into the right state.

C

So those are kind of a lot of challenges there, just around synchronization, which I'm sure you all kind of expected.

C

Cool. And the other initiative that we're going to be working on in parallel is Velero integration. Velero is the number one tool for data protection, for backup in Kubernetes, right now. Data protection, migration, and disaster recovery are kind of its three use cases. For data protection, it's popular with a lot of people; that's why we want to integrate with it. It has support for CSI snapshots.

C

It has a lot of support for cloud provider stuff, which I think is less important for us, but I'm interested to hear if you guys are running on the cloud and what you're doing for backups there.

C

Migration and disaster recovery are very similar use cases, but Velero has APIs to back up to object storage, and Restic integration. I don't know if you guys have used Restic before, but it's pretty cool. So even on a bare metal cluster you could bring up a MinIO S3 server and, using the Restic integration, back up your disks in a relatively efficient way.

C

By default, Velero is pretty good at backing up your entire namespace or your entire cluster, but it's not so great if you just want to back up, say, one VM. That's why we want to create a KubeVirt backup/restore plugin for Velero that will basically allow it to traverse the KubeVirt object graph. Velero doesn't know, when it sees a virtual machine spec, that it also has to back up any PVCs that are associated with it, or any other resources. So that's the planned integration there. I'd definitely love to hear what you guys are doing for backups now.
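To make the object-graph idea concrete, here is a small Python sketch that walks a VirtualMachine object (as a plain dict) and lists the storage objects a backup of that single VM would also need to include. It is not the Velero plugin, which is described here only as planned work; the volume field names (`persistentVolumeClaim.claimName`, `dataVolume.name`) are assumptions about the VM spec layout, and `referenced_storage` is a hypothetical helper.

```python
def referenced_storage(vm: dict) -> dict:
    """Collect the PVC and DataVolume names referenced by a VirtualMachine dict.

    Volumes of any other type (for example container disks) carry no
    persistent data to back up and are reported under "skipped".
    """
    template_spec = ((vm.get("spec") or {}).get("template") or {}).get("spec") or {}
    found = {"persistentVolumeClaims": [], "dataVolumes": [], "skipped": []}
    for volume in template_spec.get("volumes") or []:
        if "persistentVolumeClaim" in volume:
            found["persistentVolumeClaims"].append(volume["persistentVolumeClaim"]["claimName"])
        elif "dataVolume" in volume:
            found["dataVolumes"].append(volume["dataVolume"]["name"])
        else:
            found["skipped"].append(volume.get("name", "<unnamed>"))
    return found


if __name__ == "__main__":
    vm = {
        "kind": "VirtualMachine",
        "spec": {
            "template": {
                "spec": {
                    "volumes": [
                        {"name": "root", "dataVolume": {"name": "root-dv"}},
                        {"name": "data", "persistentVolumeClaim": {"claimName": "data-pvc"}},
                        {"name": "tools", "containerDisk": {"image": "example/tools"}},
                    ]
                }
            }
        },
    }
    print(referenced_storage(vm))
```

A real plugin would follow further references as well (the talk mentions "any other resources"), but the principle is the same: start from the VM and add everything it points to.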

C

How does this sound to you guys? Have you used Velero? Do you think this will work? I think it'll work, but any feedback would be great. So these are the primitives that we're planning to provide. Now I'll hand it over to Ryan to talk about his use case and how we're going to extend KubeVirt to work for him.

A

Okay, so forensics. Forensics is a use case of snapshots, and it expands on what Mike talked about with live virtual machine snapshot and restore; in a lot of ways that's what this is about.

A

Let's say that I like old video games, and I have one of my favorite video games of all time depicted here. For older games, nowadays those servers don't exist anymore. So let's say that I want to host a cluster on my own with this video game, so I can play with my friends.

A

Let's say this runs Kubernetes and it runs KubeVirt. I have virtual machines, I have GPUs, and I allow a bunch of people to connect as guests and run the game in those virtual machines, so that we can all play together. They'll have a GPU attached to their virtual machine and they'll be playing that game.

A

So in this scenario, you can imagine someone comes to our cluster. Let's say maybe they're really just standing there and not really doing anything. No problem, that's cool. But what happens if I get a notification on my phone from monitoring saying that my network has been breached? Well, that's not really a good thing.

A

I can imagine: wow, something weird is happening. There's a person playing the game, but it's a little unusual; they're not really doing anything, and someone's breaching my network. So what am I left to do? In this scenario, I really want to look into what's happening with this virtual machine. I want to look into ways that I can examine it further to see what's going on, so that I don't encounter this in the future.

A

So what would I look to do in this kind of use case? I have KubeVirt here, and I have my virtual machine, and I have [inaudible]. I want to make sure that my virtual machine can't cause any damage to any of the healthy workloads in my cluster. So when I find it, I'm going to look at suspending it, so I can do any sort of analysis later. Next.

A

Okay, there we go. So I mentioned suspend. What do I mean by that? I have virtual machines running in my cluster, running games.

A

I said suspend. There are two types of suspend I'm referring to, pause and save, and they're slightly different. Pause means the hypervisor is going to suspend the virtual machine and keep its state in RAM until it's resumed. Save means the domain is going to be suspended and its state is going to be stored in persistent storage. So one is ephemeral, and the other is a little more persistent. Right now, pause is something that's already supported in KubeVirt.

A

It's a condition that exists on a running virtual machine, and there's an API that you can call to actually pause the running virtual machine. So what does that mean? Next slide, please.

A

So I'm going to pause this virtual machine, and here's what I get. Pausing a virtual machine is going to keep the virtual machine running with all its devices and resources in place. Like I mentioned, we have a virtual machine, and it's got a GPU attached to it.

A

One of my guests is playing a game in that VM. I pause it, and everything's going to stay in place. All the resources for it will remain in place.

A

So at a high level, in this case suspending means I'm going to be able to suspend a misbehaving virtual machine and resume it later for analysis. So it's perfect; that solves my problem. I can keep playing my game with my friends and not have to worry about any sort of problems, and I can deal with it at a later date.

A

So what else do I get with suspend? I can suspend pass-through and mediated devices, so the physical GPUs or the mediated ones; I don't have to worry about that. But what does this not quite get me? I can't resume multiple virtual machines. What I mean by that is, if I were on a team administering this cluster, and one of my co-workers wanted to also look at this virtual machine and forensically analyze it, they can't do it if I'm looking at it.

A

They can't have their own copy without doing additional work. They could clone it if they wanted to, but they would have to do additional work. The same goes for resuming into a different namespace: you would have to migrate the virtual machine to a different namespace, let's say a namespace that's locked down on the network, to do some analysis. And then, finally, keeping a virtual machine suspended for longer than a week: like I mentioned, we're holding on to the resources, and GPUs are pretty valuable in my cluster.

A

If I have increased load over the weekend, I probably don't want to hold on to my paused virtual machine. So it works for maybe a temporary amount of time. It solves my problems, but maybe it's not something that I'd want to do long term. Next slide, please.

A

Okay, so I'm going to talk at a very high level about the future work. This builds on what Mike mentioned about live snapshots. Specifically, what I'm interested in describing in this case is doing an online snapshot with save. So what would this look like at a high level?

A

Well, when you create a snapshot of a virtual machine, virt-launcher will pause the virtual machine and we'll do a hot unplug of non-USB devices. That's important because this is similar to an offline migration; it's essentially like a virtual machine migration, so it runs the same code path, and the non-USB devices will need to be unplugged. Then we actually execute a save to a location on the PVC, and the result is that RAM, vRAM, and disk all end up in the PVC. Next slide, please.

A

And so the inverse will happen for restore. We're going to clone the PVC when we go to restore, recreate the VM with the cloned PVC, reattach the devices, and actually do the restoration on the VM: relocate the RAM and do that restore. And, like I mentioned, one of the things we can do with this is actually pick the namespace. Because it's a namespaced operation, we can restrict everything to a namespace, restrict the network in that namespace, and do our analysis in a safer place. Next slide, please.

A

Okay! So what does this get us now, when we talk about save? When we save, we power off the virtual machine and free up resources in the cluster. So, at a high level, we're able to suspend and we're able to restore.

A

We can only suspend mediated devices when we're talking about save, and not pass-through. I mentioned vRAM and mediated devices earlier; that's the one exception to this. And what else will we get out of this? Well, we can resume multiple virtual machines. We'll get this as part of the operation because of the cloning.

A

We can resume into a different namespace, because this is equivalent to offline virtual machine migration. And it makes it a little more appealing to keep our virtual machines suspended for longer than a week, because my valuable resources, those devices, those GPUs, aren't going to be consumed. Next slide, please. So, to look back and conclude the story: on my cluster, for the person that's misbehaving, I've paused their virtual machine or I've saved it, and they're no longer able to play. This is what they're left with: they can no longer play games with us in the cluster.

A

So that concludes my example, and we can start with any questions.

B

So if you have a question or ideas about data protection, please just say "I want to speak" in the chat and I will enable presenter mode for you so that you can speak out loud. Or you can ask your question via text in the chat or in Slack, and I will repeat it.

B

Okay, what do we got here? This is one from Alexander: are you going to store the memory, etc., in the disk PVC, or would you need extra PVCs?

A

So, with save, we actually talked about this in one of the community meetings. The first approach we were looking at is using the already attached PVC, so that RAM would end up in the same PVC, with the assumption that the user who attached it will have enough space to store RAM in there. There are other options where we could look at possibly adding new PVCs. There's a little design document we went through and discussed. Those are some other options, but the first one would be using the already attached storage.

C

Yeah, I think what I floated around is to add a volume type to virtual machines, like a memory volume type, that can be used as a sort of scratch space, if you will. If you want to do a snapshot or save that requires memory, we can dump it there and then take a volume snapshot of that to restore it later.

B

Okay. Are vGPUs considered mediated or pass-through devices for suspend and resume?

A

I'm talking about them as mediated devices.

B

Dan wants to know: what do you consider the greatest challenges in front of you folks for implementation?

C

Definitely the synchronization. I think one of the challenges is going to be going with the community versus forging our own way. I think the community is always going to be a little behind what we want to do. So for some of these synchronization problems that they're tackling, can we wait for the proper API and implementation to be deployed, or do we have to come up with something ourselves?

C

I think that's going to be an interesting balance, and then there are just the fundamental challenges of synchronization. What I mentioned before, especially: the issue where your application uses multiple disks and they have to be synchronized. There's nothing you can do for that right now, until we have volume groups. CSI supporting that helps us out there.

B

Okay. Jose wants to know: is anybody collecting use cases for data protection in KubeVirt and, if so, where?

C

It's a good question; that's one of the things I want. As I mentioned earlier, I think I did a terrible job of promoting the snapshot feature, and I hope people start using it more and there's more awareness out there. I'm not aware of any use cases that I can share right now.

B

Okay. Rob wants to know: should the snapshotting feature be storage provider agnostic? They're running RKE plus Longhorn and have had some issues with snapshotting.

C

So that would mean not using CSI snapshots, I assume. I think that's where the Velero and Restic integration I briefly talked about may come in, using Restic to do incremental backups of your PVCs as they change. Our plan is to stick with CSI snapshots for now, but I think with Restic integration and Velero, you could have a solution that doesn't use CSI.

C