From YouTube: Kubernetes UG VMware 20201001
Description
October 1, 2020 meeting of the Kubernetes VMware User Group - presentation on stateful application backup and disaster recovery for Kubernetes clusters running on the vSphere platform.
A
Hi, welcome to the October 1 meeting of the Kubernetes VMware User Group. Today on the agenda we're going to talk a little bit about storage and disaster recovery. Some of my usual co-hosts are likely to be joining us late today. Myles Gray is tied up with some activity related to the VMworld conference that's going on, but he thinks he's going to be joining us late using his cell, though he may be in listen-only mode. And Bryson Shepherd just Slacked me saying that he's tied up in another meeting for now, but it sounds like he might be joining us a little late. I'm hoping he can join us, because we wanted to talk a little bit about a recent incident he had where he was trying to move a Kubernetes cluster from one vCenter to another. It took a little while, but I guess he figured out enough to get it working, and I'm hoping he'll be able to share what it took to get that to work.

A little while ago we had on the agenda a topic by Gopala about what's new in the vSphere 7 Update 1 release. He's unable to make it today, but he'll join us at a future meeting. We still do have Dave Smith-Uchida - did I get your name right? I'll let you say it again if I didn't. (It's Japanese.) Okay - he's going to talk to us about the intersection of cloud native storage and data protection. So with that said, I'll let you get started, Dave.
B
Okay. So I have slides, but this is a very small group, so feel free to jump in if you have questions - or, you know, if you already know all this stuff or whatever, I'm happy to move in a more freeform direction as well. Yeah, here's Bryson. So, I'm Dave Smith-Uchida, and I'm currently styling myself as the Velero architect. I've moved over to the MAPBU.
A
And Dave, I'll jump in - I sometimes hear from newbies to this, so maybe a little intro on what Velero even is, just in case somebody hasn't heard of it. And also the acronym - we're horrible at acronyms at VMware. MAPBU is the Modern Applications Business Unit; this is the part of VMware that engineers Kubernetes, as well as a full build-run-manage of application things using open source technologies to run on top of Kubernetes.
B
Yes, and that's a very good explanation - I need to remember that one. And yeah, so Velero is our open source backup product for Kubernetes, and I'll go into that.
B
Okay, so there's kind of an open question - I've heard varying opinions on what kind of data protection you need for Kubernetes. Some people will say you don't need it, and other people will say yes, you do. So I tried to break down my view of what's going on and what kind of protection you may need. So, my understanding of Kubernetes...
B
It started out being pretty much a front-end scaler - to scale up and scale down compute and to handle what you would consider to be a stateless app. That may be either something that doesn't actually have any state, or it may be a front end for another service that already controls all of its state, has ways to keep track of all that stuff, and is already protected. So really, the thing that's running in Kubernetes is just compute, network, I/O, etc., that scales up and down as needed. In that case, if you just keep revision control on your YAMLs, you're fine, because that's your application: typically you can kill your application, reload it from the YAMLs, and it should come up the same way, and all of your actual state is stored somewhere else safely.
B
Then we move on to people storing things in a cloud service. So you bring yourself up in, say, AWS, and you put things in S3 and DynamoDB or whatever services the cloud provider is giving to you. Again, you want to have your revision control on the YAMLs, and you may or may not want backups and snapshots of the actual stuff in the cloud service.
B
Some
of
that
will
depend
on
things
like
scale.
You
know
as
you
as
you
move
into
extremely
large
scale.
Apps
like
take
a
like
consider
like
netflix,
the
idea
of
snapshotting
and
backing
up
all
of
netflix
gets
a
little
crazy
and
that's
not
really
going
to
work,
and
so
you
know
an
application
at
cloud.
Scale
is
usually
built
to
have
all
of
its
internal
replication,
redundancy,
etc,
so
that
it
will
not
lose
data
and
it
can
survive
all
kinds
of
bad
things.
B
So that's kind of my take on things with cloud services. Now, we've been starting to push this idea of running persistent volumes - running applications inside Kubernetes that store their state in persistent volumes managed by Kubernetes - and now the responsibility for who's going to protect the data has really moved back to the application.
B
So you need to do things like back up the data in these persistent volumes, and you need to back up your Kubernetes resources - there's state in Kubernetes now, like which volume was attached to which StatefulSet. All these kinds of things need to be handled, and that's where you really start looking at: hey, I want to do backup, restore, and remote replication of my application and its data. Another model we're seeing is people building applications that keep the application state in custom resources in Kubernetes.
B
So your runtime state, which may actually be really important to you, is no longer in your original YAMLs. Your original YAMLs tell you how to set things up and run the servers and so forth, but then the applications start keeping track of things in CRs - like where they are in terms of processing something that might be stored in another service, like a database or an object store - and now that information is actually really important.
A
Can I ask you a question? I'm aware of some of these Kubernetes plug-ins that maintain state too - there are a few CNIs, network plug-ins, that maintain state associated with the networking implementation. Have you ever seen any big listing of where these things keep their state? Because I'm thinking it'd do the world a service if somebody who is in the backup business somehow tried to consolidate that.
A
Some of those store their things in etcd, in which case an etcd backup might get it - although some of them store it in an etcd that is potentially not the same one used by Kubernetes - and some of them use databases. I don't know - it strikes me that you might want a general-purpose tool that could capture all of this stuff.
B
That's
something
I
mean
valero
will
capture
it
all
some
of
the
tricks
are
things
like.
I
think,
the
the
really
different
one.
One
of
the
things
I
see
is
really
different
between
the
way
kubernetes
works
and
the
way
we've
traditionally
used.
Vsphere
is
typically
your
application
that
runs
in
a
set
of
vms
doesn't
actually
interact
with
the
vc
or
control
plane
very
much.
It
doesn't
care,
it
says.
B
Kubernetes applications really intertwine the control plane with the application - and this is true for more than just vSphere. We start putting in things like volume IDs and volume handles, and these become globally unique IDs. Say, for example, you just take all your Kubernetes resources and replicate them: the volume handles are pointing at the same globally unique ID, and if you're not careful you will wind up with two applications pointing at the same underlying volume.
B
Well, I mean, it depends on how you look at it. I kind of view Kubernetes as being analogous to, say, an operating system, where I would want to control my processes. If I'm in Linux, I can control my processes; I can list them out.
B
In Kubernetes I can do the same thing, but with pods, and I can, for example, adjust my number of replicas on the fly by just slamming a value into the YAML and increasing the size of my StatefulSet, and let Kubernetes do the scaling for me. But...
B
Yeah, and things like the Kubernetes namespace - the Kubernetes namespace and control plane itself - are not a self-contained entity; they point out to external resources, which is different from a VM, because a VM is more self-contained. So, Velero is our backup application. It's designed to back up Kubernetes resources to object storage. We have a number of different object store plugins: we can talk to S3-compatible stores, including on-prem things like MinIO, and AWS S3.
B
We
have
plug-ins
from
google
cloud,
azure
a
whole
bunch
of
other
things
and
it's
possible
to
write
your
own.
It's
it's
pretty
open
to
plug
those
in
and
we've
in
fact
done.
Some
of
that
dell
emc
power
protect,
has
done
integration
with
valero
and
they've
written
an
object
store
plug-in
so
that
the
valero
data
goes
directly
to
power
protect
rather
than
through
some
kind
of
an
s3
compatible
api.
B
Valero
runs
on
every
kubernetes
platform,
so
you
know,
we've
got
it
running
on
vsphere
aws.
If
you
want
to
run
it
in
mini
cube
or
on-prem
red
hat,
open
shift,
it
runs
anywhere
everywhere.
It's
not
tied
to
the
vmware
infrastructure
at
all
or
any
particular
cloud
provider.
B
It's
part
of
tanzan
and
we
have
integrations
going
on
with
tens
of
mission
control.
Where
you
can
do
backup
from
mission
control.
You
can
control
your
different
clusters.
You
can
say
hey
back
this
one
up
and
it'll
trigger
valero,
as
I
mentioned,
we've
done
integrations
with
dell
emc
power
protect.
So
this
is
actually
the
core
of
the
power
protect,
kubernetes,
backup
and
red
hat
uses
it
in
their
openshift
as
their
backup
application.
So
it's
gotten
quite
a
bit
of
traction
in
the
market.
B
Yeah, and it's integrated - but yeah, all the source code is there. You can look into just about everything, and we're very committed to maintaining that open source stance with the product. We've even done that with the vSphere plug-in: we've kept as much open source there as we can.
B
Yes - and you can also get paid support for it through Tanzu, if you'd like that. So, basic capabilities for Velero: it can back up all your cluster resources - everything in the cluster, you can just say back up everything - you can back up individual namespaces, and you can select by label. The label could, for example, span all of your namespaces, and Velero will look for things with that label everywhere in the cluster. It gets installed inside the cluster; it is cluster-scoped.
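Those selection options map directly onto fields of Velero's Backup custom resource. A minimal sketch, assuming the standard velero.io/v1 API; the names and namespaces here are illustrative:

```yaml
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: app-backup            # illustrative name
  namespace: velero
spec:
  # Omit includedNamespaces (or use "*") to back up everything in the cluster.
  includedNamespaces:
    - app-frontend
    - app-database
  # Optional: select resources by label; a selector can effectively span
  # namespaces when includedNamespaces is left open.
  labelSelector:
    matchLabels:
      tier: database
```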
B
At the moment, we can back up persistent volume data either using something called Restic or using a snapshot plug-in - I'll go into a little more detail on exactly what Restic and the snapshot plug-ins do in a minute. We have things like execution hooks, which are just things that execute; you can use them for quiescing an application, or really anything you like. It has a backup scheduler - it's pretty basic, but it does the job.
B
You can control it from the command line or by writing custom resources - the command line basically just writes CRs, and then the server reacts to them - and we can restore into new clusters or namespaces, or into an existing cluster and namespace.
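Since the CLI just writes CRs, the basic scheduler mentioned above is itself only another custom resource the server reacts to. A sketch of a Schedule object, again assuming the standard velero.io/v1 API; the cron expression and names are illustrative:

```yaml
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: nightly-backup        # illustrative
  namespace: velero
spec:
  schedule: "0 2 * * *"       # cron syntax: every day at 02:00
  template:                   # an ordinary Backup spec, stamped out on each run
    includedNamespaces:
      - app-database
    ttl: 720h                 # keep each backup for 30 days
```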
B
So this is an overview of the Velero architecture, without any of the fancy stuff - plugins and all the rest. We have the Velero CLI, or kubectl, which can be used to control it. These write custom resources for Velero - things like Backup - and then the Velero server, which runs inside its own pod, reacts to these custom resources and orchestrates things: it talks to the API server and serializes things from there.
B
We also call plugins for various things. And then we have Restic, which is an open source file-system-to-S3 backup application. We've integrated it inside of Velero so that we can back up systems that don't have snapshotting capabilities. It's not really scale-out - it's distributed, controlled by these CRs. And actually this is kind of an interesting pattern, because I was talking earlier about backing up the application state.
B
On restore, we pull that tar file that we made down from the object store, look inside it, and start restoring objects - and there's a defined order for it. We do things like custom resource definitions first, then namespaces, storage classes, then the volumes - persistent volumes, then the persistent volume claims that are built out of those - and we work down towards things like pods. The goal is to get the storage in place before the pods get started.
B
So
I've
been
talking
about
rustic
versus
snapshot
plugins,
so
restick
is,
as
I
mentioned,
it's
a
file
system
to
object,
store,
backup
application
that
we
use.
It
is
storage
system
agnostic,
but
because
it's
storage
system
agnostic,
it
doesn't
do
things
like
take
snapshots
and
it's
basically
walking
across
the
file
system.
So
you
have
to
be
careful
depending
on
your
use
case,
like
one
of
our
our
sample
cases
is
a
log
file
backup
you
don't
need
to
quiesce
anything
because
you
know
pretty
much
rested,
grabs
the
file
and
the
file's
being
written
sequentially
anyway.
B
But if something is randomly writing in a file and we're walking across it sequentially, you can get pretty bad results, with no write ordering being applied. So in order to be safe with this, we need to actually quiesce the application, or you can do something like execute fsfreeze via an execution hook.
A
Can I ask you something, Dave? (Absolutely.) Those application hooks - do they also potentially do things like log truncation, or are they just for quiescing?
B
No, they're basically "run this command in this pod." So they're pretty open - they're pretty open-ended.
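The "run this command in this pod" shape, including the fsfreeze case mentioned above, is commonly expressed as pod annotations that Velero's backup hooks pick up. A sketch using the standard pre/post hook annotations; the container name, image, and mount path are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-database            # illustrative
  annotations:
    # Freeze the filesystem just before the volume is backed up...
    pre.hook.backup.velero.io/container: my-database
    pre.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--freeze", "/var/lib/data"]'
    # ...and thaw it as soon as the backup of this pod finishes.
    post.hook.backup.velero.io/container: my-database
    post.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--unfreeze", "/var/lib/data"]'
spec:
  containers:
    - name: my-database
      image: example/database:latest   # illustrative image
      volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-database-data
```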
B
So these are crash-consistent, because they're snapshots, but it's a per-volume thing: there are no volume groupings, so we're not getting cross-volume write-order fidelity yet. And Velero's view of the world is essentially the EBS view of the world, which is: hey, I told EBS - Amazon's block storage - to take a snapshot; EBS says, okay, here's your cookie, here's your snapshot ID; and then it does all the work internally to make sure your data is safe.
B
It will go ahead and move that data to S3, where it gets replicated all over the place, and the lifecycle of the snapshot becomes independent of the storage object. So if you lose the storage, or even if you delete the object, your snapshots are still good and you can clone new volumes from them. But this all happens internal to EBS.
A
So can I ask you - I think I know the answer, but just to confirm - the advantage of this volume snapshot integration is that for certain classes of stateful apps, like databases, you typically want to quiesce, and you want that quiesce to be as short as possible. You temporarily stop the application from doing its normal day job; if you have this snapshot assist, the snapshot freezes the state and the backup happens from the snapshot, so you can resume full operation pretty quickly. Is that it? (Yes.)
B
And typically the snapshot operation is much faster than walking the file system and copying the data, and it may be much faster to restore from, too. In EBS you can do an essentially instant restore, because when you say "create a volume from a snapshot," it comes back pretty quickly with "here's your volume" and then fills it in on demand in the background - versus...
B
We'll get to all of this with the vSphere plug-in, but yes, that was a challenge - that was some of the fun - because our plug-in is significantly more complex than, say, the EBS plug-in. EBS does a lot of the work for Velero, and the EBS plug-in is, I don't know, 500 lines; the vSphere plug-in - I haven't done a code count yet, but it's up there, because it has to handle a bunch of things that AWS would have handled for us. And then, typically, the volume snapshot...
B
We started referring to these as local snapshots versus durable snapshots. When I talk about the EBS snapshot, I think of it as a durable snapshot, because I can hold on to that cookie and rely on the storage system to do the right thing for me underneath - from the code's point of view, I don't have to care.
B
We're not sure yet. But right now, what happens is the plug-in for vSphere handles snapshotting, and it also moves the data into S3 storage - we've only got S3 APIs at the moment. It is doing full backups only; we're looking at what the right thing to do there is. There are some commercial solutions that integrate with us that handle incrementals properly.
B
They're multi-volume consistent - or at least they're close - so, you know, baby steps. Currently the release version is 1.0.2, and this handles vanilla Kubernetes: that is, Kubernetes installed in VMs running on vSphere, using the CSI driver and the FCD/CNS stuff underneath.
B
We're currently in the process of finishing up the 1.1.0 release, which will support Project Pacific - I'll go into the stuff we had to do in there to get that working - and this will support all flavors of Kubernetes on vSphere.
B
So what we're doing right now is, the plug-in lives on top of VADP, so CBT is available. For us it was less a matter of doing the changed block tracking - the APIs and all of this are traditional vSphere, so we didn't change anything about how CBT works on the vSphere side.
B
It
came
down
to
us
like
a
matter
of
how
much
resource
are
we
putting
into
the
repository,
because
in
order
to
handle
incrementals
properly,
you
need
a
fairly
sophisticated
format
that
lets
you
like
merge
down
incrementals
into
your
folds,
handles
things
like
deleting
incrementals
in
the
middle
of
your
sequence,
we're
looking
at
what,
where
to
put
the
resource
in
terms
of
the
open
source
plug-in
versus
the
commercial
offerings,
so
power
protect,
for
example,
takes
advantage
of
cpt
with
the
plug-in
and
does
incremental
backups
into
power
protect
just
like
any
other
vmdk.
A
For data protection - it turns out that at the storage layer of vSphere they built in an API, going back maybe a decade, to support these operations for backing up VMs, but it's still applicable even when you layer Kubernetes on top of VMs as part of your Kubernetes deployment. CBT stands for changed block tracking. The idea is that in practice most of these stateful apps, like databases, don't actually rewrite the whole database every day.
A
People tried to optimize this, so that if you could manage to make your backup essentially a delta against the previous day's backup, the backup would take up far less storage. There are actually ways to create what they call synthetic full backups out of this, so that even if you only captured the incremental changes, you could still very quickly reinstall the full image. There's even a potential optimization on the restore where, if you know you need to roll back, you can use changed block tracking to get the delta between today and yesterday and only overwrite the things that would take you back to yesterday.
B
Yeah - so as Steve was saying, things like fulls and incrementals: if you go back to the tape days, we used to do a weekly or monthly full backup, then daily incrementals, and maybe a weekly incremental against the full. In order to get back to a certain state, you'd start with your full backup and then start applying incrementals.
B
On top of that - and that's just lots of fun. The synthetic full backup is really what you want: the storage system has merged everything down for you, so that when you do a restore you don't have to apply full, incremental, incremental, incremental - you get everything back in one shot. That does require a lot more sophistication in the back end, though, because in the current implementation we basically treat S3 like tape: we write things into it fairly naively, and we don't have a sophisticated format in there. So that's on the list of things to do.
B
So this is the 1.0.2 architecture, which is simpler than the 1.1.0 architecture. In this architecture, say we're doing a backup: we tell Velero to do a backup, and it runs through, doing the stuff we were talking about before - serializing all of the resources from the API server. It comes along, finds a persistent volume, and calls us in the plug-in and says: hey, snapshot this.
B
So we go ahead and talk to vCenter and say: snapshot this FCD. That bubbles down to the host, and a typical vSphere snapshot is taken - and depending on what storage you're running on, you'll get different results.
B
On straight VMFS you'll get a redo log; on vVols you'll get a vVol snapshot. But from our point of view it's all the same, and it's compatible with all the storage in the same way that regular VM snapshots are - in fact it uses the same mechanisms for the VMDK snapshotting that a VM snapshot would. There are no changes in the on-disk format for FCDs, except a little metadata that we add. So then we say: okay, we have a snapshot, and we give back a snapshot ID.
Now, what we do in the background: we said okay, we took the snapshot, but now we have to upload the data someplace. We have a similar architecture for the rest of the app - we write a custom resource that says "hey, upload this disk" - and we have a scale-out data mover. There are multiple data movers running in the system; they do a little leader election on who should handle a given upload request, and then they connect to the snapshot data using VADP and NBD. NBD is network block device - an over-the-wire, TCP/IP-based protocol for reading the blocks out of a snapshot or a VMDK - so it kind of comes in from the side rather than through the VM. There's another technique called hot-add, which we aren't using at the moment, where you actually attach the snapshot to a VM and do reads and writes via the regular I/O path.
We don't really want to leave a lot of vSphere snapshots hanging around, so we don't - what we do is move the data out. In my parlance it's a durable snapshot at this point, because its lifecycle is completely independent of your storage. That is, unless you put your S3 server in the same datastore you're backing up - and please don't do that.
B
So that was the 1.0 architecture, and it handled the vanilla Kubernetes case. Then there's Kubernetes with vSphere - vSphere with Kubernetes, is that the correct term? I call it Project Pacific - and that has a bunch of changes in how Kubernetes works, and we had to accommodate that.
B
One of the things that changes in Project Pacific is the level of permissions and so forth that are available in the supervisor cluster. Project Pacific integrates Kubernetes more into the vSphere control plane, and we have this concept called a supervisor cluster.
B
The supervisor cluster is a special - if you like - Kubernetes cluster that has very tight integration with vSphere. It's designed to let you manage quotas on various groups, give the vSphere admin more insight into Kubernetes, and allow the Kubernetes developer or DevOps person to work with standard Kubernetes APIs. It's designed to handle things like multi-tenancy and isolation in various ways, and one of the things it does is restrict the permissions that are available.
A
Can I ask a question? (Absolutely.) Your slide says it installs Velero in the supervisor cluster. Is it fair to say that the typical mission, even though it's installed in the supervisor, is to back up the guests? And to go further: is there any reason to back up the supervisor cluster itself, or is that something that could just be recreated?
B
Well, it really depends on how you're using it. We do have the ability to run user workloads in the supervisor cluster - Paul was supposed to talk about Data Persistence Services; was that the new name for it? Those are going to run in the supervisor cluster, so we're setting ourselves up here to back up these things that live in the supervisor cluster, and there's some shared infrastructure between supervisor and guest.
B
I'll show you in the diagram - there are services in the supervisor cluster that serve the guest cluster, but there's also the ability to do things straight up in the supervisor cluster.
A
Okay, thanks. Just another time check - maybe you can fit in the rest.
B
I can move along - I've only got three or four more slides. So, the new pieces of the plug-in: we added this Velero app operator. It's an operator that you can install via vSphere, and it gives us certain capabilities that we needed.
B
This is the same mechanism we're using for Data Persistence Services, and eventually we're planning to open it up to a lot more applications running in Kubernetes on vSphere. There's this backup driver that got added in - a common piece of infrastructure between the guest and the supervisor - and we've moved the code that actually handles the snapshotting out of our plug-in proper into this backup driver. It also supports the paravirtualized architecture, guest to supervisor.
B
The guest cluster in Project Pacific is designed for really strong isolation, and one of the key points was that things running in the guest cluster should not talk to vCenter, period. What we've done is mediate this through the supervisor cluster's Kubernetes APIs, and there's this pattern called a paravirtualized driver, where a supervisor cluster resource backs a guest cluster resource. Take persistent volumes, for example.
B
In a vanilla or supervisor cluster, the persistent volume has a volume handle, and on vSphere that maps to an FCD ID, which uniquely identifies a disk - the VMDK. But that requires interaction with vCenter, because that's where the database of FCDs lives, and that's where the APIs for connecting an FCD to a VM are, so you can actually work with it live. In the guest cluster, rather than doing that, we built the paravirtualized model: the PV in the guest cluster has a volume handle which is actually a PVC - a persistent volume claim - in the supervisor cluster.
B
So when you go to allocate storage, the paravirtualized CSI driver first creates a PVC in the supervisor cluster, and then all the usual mechanisms kick in there - things like quotas happen - and when the PVC is allocated and bound, the guest cluster can bind that name into the PV, and then the usual Kubernetes things happen. But this way there's no actual communication between anything in the guest cluster and vCenter - at least no direct communication.
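The indirection described here can be pictured as a guest-cluster PV whose CSI volume handle names a supervisor-cluster PVC instead of an FCD ID. A heavily simplified sketch - the field values and namespace are hypothetical, and the real objects carry much more detail:

```yaml
# Guest cluster: the PV's handle is a supervisor PVC, not an FCD ID.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: guest-pv-1
spec:
  capacity:
    storage: 10Gi
  accessModes: ["ReadWriteOnce"]
  csi:
    driver: csi.vsphere.vmware.com     # paravirtualized CSI driver
    volumeHandle: guest-pvc-1          # name of the backing supervisor PVC
---
# Supervisor cluster: this PVC is what ultimately resolves to an FCD/VMDK,
# and only supervisor components ever talk to vCenter about it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: guest-pvc-1
  namespace: tenant-namespace          # hypothetical supervisor namespace
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```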
B
We also introduced the concept of a backup network here, based on some other work, and that lets us isolate the backup traffic from the management network, and it lets us nail down who can talk to vCenter from the supervisor cluster. The guest cluster does not have access to the backup network - at least, it shouldn't.
B
So here's the big, crazy diagram of the new architecture. Let's just walk through what happens in a guest cluster. In this setup, Velero is installed in the supervisor cluster via the app operator, which gets all the pieces and parts for Velero installed: the plug-in, the backup driver, the data movers.
B
Currently you have to install the data movers as VMs from OVFs yourself - we're working on fixing that - but you have a scale-out set of data movers there, and from the supervisor cluster things work very similarly to the way they worked in vanilla. In the guest cluster, you'll also install vanilla Velero, and you'll also install our plug-in. It will detect that it's running in a guest cluster, and it gets credentials from the supervisor cluster that allow it to write records into the supervisor cluster. So we go ahead.
B
A backup starts with writing a CR in the guest cluster. Velero does the usual thing, serializing the resources; when it comes to a persistent volume, it calls our plugin. The plugin now, rather than going directly to an API, writes a resource for the backup driver saying "snapshot this volume" in the guest cluster. The backup driver knows that it's paravirtualized; it takes that record and writes a record for the corresponding PVC in the supervisor cluster, for that backup, and then the backup driver in the supervisor cluster resolves the PVC down to an FCD ID.
B
Then it goes back to taking a vCenter snapshot, says "hey, I took a snapshot," things bubble back up to Velero, and in the supervisor cluster it starts doing the upload in the background. When the upload finishes, it removes the snapshot. So that's kind of a lot of ins and outs and round-and-rounds. Does that make sense - any questions on this?
B
So we do S3 multi-part uploads, and if the data mover crashes or you lose the network connection, we come back and try again - relentless forward progress is the Kubernetes catchphrase, right?
B
So this is a view of the backup network, which is customer-defined. You can just hook all your pods to the management network and everything works fine - but you're not supposed to do that.
B
We asked for that originally and got some pushback on it, but with some new changes in vSphere there's the ability to add a NIC that gets a vSphere NFC label on it, and that will put all of the backup and VADP traffic onto that NIC, which you can attach to a separate network.
B
But everybody's network situation is very different. So that's kind of the run-through on Velero for backup. You can also use Velero for migration, via the object store. If you use Restic - Restic is storage-system agnostic - you can back up to an object store and then restore that to pretty much any Kubernetes cluster, because it's not tied to any particular storage technology.
B
If you don't have shared storage, vSphere to vSphere, you can obviously use the plug-in - take it up to S3 and then back down - and we're looking at how to migrate clusters that actually have shared storage underneath. I think Bryson was doing some things with that, and we could discuss how successful that was and what craziness you hit.
A
Okay - so, Bryson, on Slack we've been having a chat about an incident you had where you were trying to move... I think you were trying to move a Kubernetes cluster, as opposed to moving a workload to a new cluster. Is that true? And you had some issues?
C
So I wasn't actually moving the cluster; it was moving the vCenter that was controlling the VMware cluster. But in doing so, we had to make changes to be able to point to the new vCenter, and we also had issues when that happened - not knowing who was actually doing the moving or how they did it - where the VMs were coming up with different UUIDs.
B
Yeah - you're using the in-tree driver at the moment? (Yes.)
A
And I think you told me on Slack that it came down to needing to update the VM UUID in a couple of places. Maybe you can describe how you did it, in case somebody else runs into this.
C
I mean, we're getting into talking more about PVs than we are about Velero and stuff, but (sorry, I'm on call and I just got an alert) it's about how the storage communicates.
C
So specifically: is the cloud config pointed at the right vCenter? People can move you to a different vCenter and not tell you. And then, once you know it's on the right one... sometimes if they move you to a new vCenter, they may not have set the permissions correctly, so you need to make sure the permissions are set up correctly. And then, even with the permissions set up correctly, we've encountered something weird lately with our PVs, where it seems like there's still some connection to the old vCenter, even after restarting the controller manager and the kubelet on all the nodes.
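For reference, with the in-tree vSphere cloud provider the vCenter a cluster points at is set in the cloud config file handed to the kubelet and controller manager, commonly /etc/kubernetes/vsphere.conf; a minimal sketch with hypothetical values looks roughly like this:

```ini
[Global]
user = "k8s-svc@vsphere.local"
password = "secret"
insecure-flag = "1"

[VirtualCenter "new-vcenter.example.com"]
datacenters = "DC1"

[Workspace]
server = "new-vcenter.example.com"
datacenter = "DC1"
default-datastore = "vsanDatastore"
```

After the file is updated to point at the new vCenter, the controller manager and kubelets have to re-read it, which is consistent with the restarts (and ultimately reboots) described in this discussion.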
C
The controller manager... we started restarting several things, and the thing we found that ultimately works is just rebooting each of the control plane nodes. So that's if there are differences with the permissions. But the other thing we ran into lately (it can happen in a couple of different situations) is essentially that the VM gets re-registered with vSphere and it gives it a new UUID, and the deceptive thing there is when you look at your node from the Kubernetes side.
C
If you look at that, there are actually two places in that node definition for the UUID, and in one of those places it automatically gets updated and in the other place it doesn't. So then you end up with the controller manager saying "hey, I can't find this VM" and "hey, I can't find a VM with this UUID". So the fix there was to correct that second place in the node definition, and then it was able to find it.
A
So where is this node definition located?
C
Oh, it's just a get nodes: kubectl get nodes. Okay.
C
Yeah, kubectl edit node and update the UUID... wait, I can't remember; you might have had to patch it. I can't remember.
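The exact fields aren't named here, but with the in-tree vSphere provider the two places a node's VM UUID usually shows up are spec.providerID and status.nodeInfo.systemUUID, so a quick way to compare them (the node name is hypothetical) is something like:

```shell
NODE=worker-01
# The UUID the vSphere cloud provider uses to look the VM up:
kubectl get node "$NODE" -o jsonpath='{.spec.providerID}'; echo
# The UUID reported by the guest OS on the node itself:
kubectl get node "$NODE" -o jsonpath='{.status.nodeInfo.systemUUID}'; echo
```

If the two disagree after a vCenter move, that matches the symptom described here. Whether kubectl edit, kubectl patch, or deleting the node and letting it re-register is required to fix the stale field can depend on the Kubernetes version, since some node fields are immutable once set.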
B
Yeah, it's kind of an interesting thing to happen there, where the VMs running Kubernetes were simply switched from one vCenter to another, and otherwise they're pretty much running the same thing, with all the same volumes attached and networking and all that. But yeah, there's definitely that interconnection between the Kubernetes control plane running inside the VMs and talking out to vSphere and vCenter. That's definitely an interesting case there.
A
C
What we found was the same thing: if we updated the storage class with the new topologies, and the tags were all set on the vCenter, it would report back that it couldn't see them until (and again, we restarted the controller manager and several different control plane services) yeah, we found that we just had to reboot the control plane nodes to ultimately get the connection restarted.
C
B
They stayed up the whole time, yep. So that explains, at least marginally, why the connections remained to the original vCenter.
A
I mean, I'm ignorant here, but can you fill me in on how you managed to do this, to move a VM across vCenters while it was still running? Is that moving the whole ESXi host to a new vCenter, or is it some form of that?
C
B
A
Cool, okay! Well, it strikes me (I'm conjecturing here) that maybe this is just something the team didn't think of when they were writing test cases for this. But we'll take this back, and it does make sense now that there is utility in being able to support this scenario, just to minimize outages and impacts when you move or update vCenters.
B
Yeah, I think what we're going to see is a lot of... you know, when you come in from the Kubernetes point of view, that's not necessarily the way you think about things with VMs. You know, coming out of the cloud world, you would never move an EC2 instance from one AWS account to another; you simply can't move them across regions, and other than that...
B
A
Okay, well, thanks, Bryson. I don't know if anybody on this call is going to have that use case, but these things get recorded, so you never know. We'll try to maybe backfill this even into the documentation to save the next person, and one thing we can do going forward is maybe make it not so difficult to support this kind of pathway.
C
A
Okay, well, thanks. We're at the top of the hour, but I've got a couple of housekeeping things before we go. The first one: this group maintains a YouTube playlist, and there was a slight delay, but the CNCF had their recent online conferences, Cloud Native China and KubeCon Europe, and the video recordings of those sessions have been posted.
A
The final thing, going forward: Kubernetes project-wide, they've had incidents with Zoom trolls and they're tightening down security, so we have to have a passcode on future meetings. I didn't want to do it today. The rule is actually that you either have a passcode or a waiting room, but a waiting room is kind of intensive for people who join late, because you've got to pay attention to the participant list and it's easy to have somebody floundering there when you didn't notice them. So I'm going to add a passcode.
A
So unless anybody's got any final items, I'm going to close this down till next time. And one thing that you're free to bring up: if anybody's got ideas for topics that they want covered in the next meeting, which will be the first week of November... we've got one carryover, where Gopala is going to return and cover the changes that have happened with vSphere 7 Update 1, but I think we'll likely have room for one or two other things, if anybody would like to suggest something, either now or write it in the notes document later. Also, Dave,
A
if I can get a link to your slide deck, I'll paste it in the notes document after the meeting ends.
A
B
A
Thank you. Okay, thanks. So, last call for any final remarks... Okay! Well, thank you, everybody, for attending, and I'll get the agenda notes doc updated and the recording posted within a few days.