Description
Kubernetes Storage Special Interest Group (SIG) Volume Populator Review Meeting - 10 November 2020
Meeting Notes/Agenda: -
Find out more about the Storage SIG here: https://github.com/kubernetes/community/tree/master/sig-storage
A
All right, so hello and welcome to the volume populators weekly meeting for the Kubernetes Storage SIG. This is a weekly meeting that we've been having for going into detail on volume populators.
A
This is the fourth meeting, and we've covered a lot of material in our first three meetings and made a lot of useful decisions. I don't have any agenda items for today, other than sort of review and answering questions for those who have missed the original meetings and just want to understand what the idea is. So, who here is new and hasn't been at one of these prior meetings?
B
Ben, I think this might be the first time that I'm attending this meeting as well. This is true.
A
Okay, welcome, John. Yeah, so, John, I know that you attend the Data Protection Working Group meetings, and I covered some of the volume populator stuff in those meetings, like, half a year ago. Early on, I was sort of showing what I was working on. So this is a continuation of that work.
A
I guess, before I launch into the review, are there any specific questions, or any areas where people want me to focus?
A
Nothing specific. Shane, you seem to have something in mind that you wanted me to cover. So where did you want me to focus? Just the motivation?
C
Just to, like, explain: if we don't have this currently, what does Becca Bender have to do?
C
Like, what's the benefit? I mean, for me, I think the one thing that I found very important for this one is the placement part. I mean, the WaitForFirstConsumer one, for sure. I don't think that's possible without this feature.
A
Well, yeah. So let me talk about the motivation, how I arrived at the idea, and what the alternatives I did consider were, because I think the alternatives are what will answer your question, when we think about other ways to accomplish this and how we got here. Okay, so people have been talking about data populators for various purposes since even before I got involved in this feature area.
A
There
were
various
potential
applications
for
data
populators,
but
the
way
that
I
got
interested
in
it
was
from
thinking
about
doing,
backup
and
restore.
I
actually,
I
guess
it
was
about.
A
year
ago
now
wrote
you
know,
devised
a
whole
prototype
api
design
and
implementation
for
kubernetes
backup
and
restore
where
I
invented
my
own
crds
for
backups
and
my
own
controllers,
and
I
started
making
changes
to
the
to
the
csi
rpcs
and
modifying
csi
plug-ins
to
implement
them
and,
as
all
the
pieces
were
coming
together.
A
The
place
where
I
got
stuck
was
when,
when
you
were
trying
to
take
a
backup
and
turn
it
back
into
a
pvc
when
you're
basically
doing
that
restore,
I
realized
that
to
implement
what
I
wanted,
we
would
need
to
change
the
core
kubernetes
api,
because
there's
a
there's,
an
emission
controller
in
kubernetes
that
validates
pvc
data
sources
and,
if
they're
not
either
another
pvc,
because
you're
doing
a
volume
clone
or
a
snapshot
because
you're
restoring
a
snapshot.
A
The admission controller just drops the data source field entirely, and makes it blank, if you put anything other than those two things in there.
So I started thinking: okay, what I'm going to have to do is propose the whole backup design, get everyone to agree that this is the right design, and then go to the Kubernetes API SIGs and ask them whether backups can be a new feature gate, right? And I started thinking, well, is there a way that we could address all these problems at once? And so the proposal I came forward with, back in 1.18, was the AnyVolumeDataSource feature gate, which basically said: okay, instead of having a different feature gate for every type of object that could be the data source of a PVC...
A
For volumes, right, I mean, there's a million CRDs out there that people have created for different purposes, and nearly all of them don't make sense as the source of a volume. So we needed a way to determine which things are valid and which aren't, and when they aren't valid, to provide feedback to the end user like you would normally get if you do something that's not going to work, so that you don't sit around waiting. Because, you know, in Kubernetes...
A
We needed a way to provide feedback to users about whether their PVC is going to bind or not. I'll go into more detail about exactly how that works in a minute, but with regard to the motivation, I mean, does that make sense to everyone? That, you know, we could have just said:
A
"We know what we're going to implement: backup and restore. We're going to create a backup CRD, we're going to add it to the official list of supported data sources with its own feature gate," and then gone off and said, you know, "this is how we're going to do backup and restore," where that was just one additional specific CRD that you're allowed to restore from. Is it clear why that's not a great way to do it?
A
So, Shane, you follow this part of it, right? That we don't want to have to go back to SIG API every time we want to add a new data source; we want to have a generic way of handling this. Yeah? Right, okay. Okay, so that was the API side, and sort of the user interface side: we need a way to allow other things to be data sources, but to still provide feedback to end users...
A
When,
when
the
you
know
it
doesn't
look
like
the
thing
that
they
provided
is
valid,
and
so
the
just
a
really
brief
recap
of
how
that's
supposed
to
work.
We
looked
at
doing
a
validating
web
hook.
That
would
reject
data
sources
that
were
not.
It
didn't
match
a
specific
list
of
registered
data
source
crds.
A
But
we
didn't
like
that
because
it
would
change
the
behavior
such
that
if
you
created
a
pvc
with
a
data
source
that
pointed
to
something
that
wasn't
registered,
yet
it
would
be
kicked
out
entirely,
and
so
the
actual
plan
is
to
implement
a
controller
that
will
simply
post
events
to
those
pvcs.
If
the
data
source
of
the
pvc
is
of
a
kind
that
doesn't
appear
to
be
a
registered
valid
data
source
and
we're
going
to
have
a
new
crd
called
volume
populator,
which
will
be
how
you
register
these
these,
these
valid
data
sources.
A
So then all that leaves is the implementation side. So now, assuming that all this goes through... well, actually, the feature gate's already in. So if you use the feature gate, you can invent a new CRD called VolumeBackup, if you will, and you can implement...
A
Well,
you
could
implement
the
the
way
that
that
you
restore
from
backups
in
a
variety
of
ways,
but
kubernetes
won't
stand
in
your
way.
Right,
you'll
be
able
to
create
a
pvc
with
the
data
source
and
any
controller
then
will
be
able
to
react
to
that
and
create
the
appropriate
pv
that
has
your
restored
backup.
A
Well, if you imagine that we invented some backup feature, you could just have another else-if down at the bottom of that check inside external-provisioner that says: if it's a backup, then ask the CSI driver to restore the backup using some new CSI RPCs that haven't been invented yet, right? That would be one way to do it: to just say, you know what, this is another CSI problem, all of the work of restoring backups is going to be the responsibility of the CSI driver, just like restoring snapshots, and that's it.
A
I would like to have a way of restoring backups that doesn't rely on every CSI driver understanding how to do the restoration. And this part of the design overlaps with some of the other use cases for data populators that we're aware of. So, just real quick, one of the other well-known use cases is: if I'm running some sort of virtualization engine on top of Kubernetes, like KubeVirt, for example, and I have an image repository for my VM images, and I'm trying to create a PVC...
A
That's
going
to
represent,
like
the
boot
volume
for
a
vm,
I'd,
really
like
to
be
able
to
get
that
vm
image.
From
my
image
repository
into
the
pvc,
so
that
I
can
then
boot
up
a
vm
from
it
like,
like
that,
that's
another
potential
use
case
for
data
populators
that
shouldn't
be
shouldn't,
have
anything
driver
specific
in
it
right
it's
just.
I
have
a
bunch
of
data.
I
need
an
empty
volume.
A
I
need
to
put
the
data
into
the
volume
in
a
reliable
way
so
that
I
can
use
it
for,
like
a
virtualization
use
case,
lots
of
backup
implementations
could
be
done
in
a
similar
way
where
all
you
need
is
to
create
an
empty
volume
of
the
appropriate
size,
and
then
some
controller
will
fill
that
volume
up
with
the
data.
A
So the challenge is how we manage that workflow.
A
So
you
know
if
we
just
allow
the
user
to
create
a
pvc
that
has
a
data
source
of
something
that
is,
that
is,
you
know,
not
a
snapshot,
not
a
pc,
so
the
external
provisioner
sidecar
doesn't
understand
it,
but
it's
like
either
a
backup
or
an
image
or
something,
and
if
the
behavior
is
that,
like
the
external
provisioner
sidecar
sees
that
request
and
it
goes
ahead
and
creates
an
empty
volume
and
binds
it
such
that
a
you
know
so
such
that
you
could
come
along
and
take
that
empty
volume
and
then
fill
it
up
with
the
data
later.
A
If
the
user
who
had
created
the
pvc
had
any
pods
attached
to
it,
like
those
pods
will
be
ready
to
run
and
they'll
they'll
start
running
immediately
and
and
but
you
haven't
actually
put
the
data
in
there
yet,
and
so
we
looked
at
this,
I
don't
know
a
year
or
two
ago.
I
don't
know
how
long
it's
benching
like.
How
can
we?
A
How
can
we
like
get
the
empty
volume,
but
not
let
anything
attach
to
it
until
it's
been
populated
and
we've
seen
various
proposals
you
know,
all
of
which
involve
changes
to
the
kubernetes
api
or
trying
to
hack
the
user's
workloads
so
so
yeah?
A
Let
me
sort
of
outline
the
alternatives
that
I'm
not
in
favor
of
first,
but
one
alternative
was,
like
just
add,
a
new
field
to
the
pvc
that
was
like
not
ready
to
bind
or
like
not
not
ready
for
use,
so
that,
like
you,
could
have
a
pvc
that
was
created
and
empty,
but
like
still
being
populated-
and
this
would
be
a
signal
to
cubelet,
that
would
say
like
don't,
don't
use
the
volume
yet
and
then
the
the
idea
would
be
that
the
populator
would
do
its
population
first
and
then
flip
that
flag
and
then
and
then
allow
things
to
proceed.
A
We could do something similar with PVCs, but again, that would be a Kubernetes API change. There's a lot of complexity around it: it shares some features with node taints and tolerations, but in other ways it doesn't, and so it's not clear if we should use that name or a different name, and it's not clear how much it overlaps with those taints and tolerations. So we had a lot of open questions around that approach, and around other approaches like it that, at least, I've rejected.
A
You have to have very good reasons for doing an API change like that, and it was going to be confusing, because the names were similar to existing concepts but the actual implementation would have differed in subtle ways. And so we were working on trying to figure out those subtle differences, and then we kind of decided that it was very hard, and, you know, nobody wanted to carry that implementation forward. Basically, it's not that we can't still do it.
A
It would be okay, but I would warn anyone trying to do that that it's not going to be a small effort, and that there are some issues around the details that haven't been hashed out yet. So, does that answer the question? Yes? Okay.
A
Okay, the other approach that we rejected was some sort of scheme where you don't modify the Kubernetes API, but you instead try to stick init containers into the pod. So, before the pod can attach to the volume, there's another init container that actually does the data population work, and in very simple use cases...
A
This
would
be
a
workable
scheme
because
if
you
only
ever
have
one
pod
connected
to
a
volume-
and
you
can
reliably
inject
it
in
a
in
a
knit
container
into
that
pod-
that
would
do
the
data
population
you
could,
you
could
have
a
workable
scheme.
The
the
problem
that
I
see
with
that
approach
is
one
it
you're
mucking
with
you,
end
user
objects
right,
the
end
user
created
a
pod.
He
might
have
his
own
init
containers,
and
here
we
are
like
adding
another
container
that
he's
not
expecting
to
see.
A
It
creates
a
strange
user
experience
when
we're
injecting
init
containers
into
into
customers
pods
that
they
don't
that
they
didn't
expect,
but
but
even
worse,
like
there
are
use
cases
where
you
have
like
multiple
pods
that
are
all
trying
to
share
volume,
and
it's
not
clear
like
which
pod
would
need
the
unit
container
to
do
the
population
or
additionally,
there's
use
cases
where
you
create
the
pvc.
But
then
you
like
don't
attach
any
pods
to
it
and
you
just
try
to
clone
it
or
take
a
snapshot
of
it.
And
it's
like
before.
A
You
know
if
you
were,
if
you're,
relying
on
nick
on
a
nick
containers
and
there's
no
data
until
the
first
pod
runs,
and
so,
if
you
take
a
snapshot
or
you
clone
the
volume
before
any
pods
have
run
you're
going
to
get
an
empty
volume.
So
there's
a
lot
of
downsides
to
trying
to
do
this
through
nic
containers.
A
What the populator controller will do instead is create a second PVC that looks exactly like the first PVC, except it has no data source, and then it will wait for that one to bind, because that will create an empty PVC that is otherwise identical to the PVC the user actually wanted. But because you've created a second PVC that the user doesn't know about, they don't see this happening; it's happening sort of in the background, or in another namespace.
A
So then, after this empty PVC is created (what I call "PVC prime" in most of my examples), you can create a populator pod, an ordinary pod, in this sort of hidden namespace that the user doesn't know about. It can just bind to the PVC and do whatever work it needs to do to populate the volume.
A
There's
no
api
magic
required.
This
is
just
an
ordinary,
empty
pvc,
an
ordinary
pod
in
a
in
a
sort
of
admin,
controlled
name
space
and
the
and
and
the
pod
does
whatever
it
needs
to
do
to
populate
the
volume,
and
you
know
it
could
they
could
just
attach
to
it
and
write
data
the
normal
way
it
could.
A
You know, an ordinary volume attachment to the PVC, and it runs until completion, and you can do all the things that you would normally do here: if the pod gets killed before it's done, you can restart it and resume the operation; you can handle nodes going down and nodes coming up, and the pod moves to another node and continues populating.
A
Once
we
reached
that
point,
then
what
the
trick
is
is
basically
to
rebind
the
pv
that
got
created
back
to
the
old
pvc
that
the
user
asked
for
and
in
most
cases
like
this
could
happen
very
very
fast
right.
If
population
is
a
is
a,
is
an
operation,
that's
implemented,
so
in
some
optimized
way
like
the
user
could
create
the
original
pvc,
the
controller
could
see
it
and
immediately
create
pvc.
Prime,
the
csi
controller
could
immediately
create
the
pv
for
pvc.
Prime,
that's
empty.
The
pod
could
get
started
up.
A
The
controller
could
then
see
that
the
pod
ran
to
completion,
know
that
that
pv
contains
the
correct
data
and
rebind
it
back
to
the
original
pvc
and
then,
from
the
end
user's
perspective,
like
it
looks
like
an
ordinary
volume
provisioning
operation
right
like
they
created
a
pvc,
they
waited
a
little
bit
a
pv
appeared
and
a
pv
bound
to
their
pvc
and
their
and
then
and
they're
done
right.
No
kubernetes
api
changes
are
required
for
this
approach.
A
It
just
works,
and
then
you
know
because
you're
rebinding,
the
the
pv
you
do
end
up
with
like
the
pvc
prime,
it's
a
it
ends
up
in
a
lost
state
because
it
had
a
pv
and
then
it
lost
its
pv
because
we
rebound
it
and
then
so.
You
have
to
clean
up
that
pvc
prime.
A
So,
and-
and
that's
that's
what
I
prototyped
so
there
is
a
hello
populator
out
there.
I've
provided
links
to
it
in
earlier
meetings.
If
you
want
to
go
look
at
it
that
basically
does
this
or
does
a
flavor
of
this.
I
probably
need
to
update
it
to
do
the
exactly
what
I
what
I
described
but
it,
but
it
basically
follows
this.
A
This
workflow,
and
so
the
the
remaining
work
is
to
to
basically
find
a
way
to
take
the
common
pieces
of
that
populator
logic,
which
is
like
watching
for
pvcs
noticing
when
a
ppc
gets
created
that
matches
the
data
source,
that
the
populator
is
responsible
for
creating
pvc
prime,
attaching
a
populator
pod
to
it,
waiting
for
the
pod
to
run
to
completion
and
then
rebinding
the
pv
back
to
the
original
pvc
and
then
cleaning
up
the
populator
pod
in
pvc.
Prime.
A
Somehow, we need a mechanism for registering a CRD that is a data source, taking all the data that's in it, sort of communicating it to the populator pod, and then waiting for it to complete. And if we can wrap that up into a reusable library, we could start having all kinds of data populators. But most importantly, and most dear to my heart, is restoring backups.
A
So
I
like
the
idea
of
like
having
a
backup
restorer
populator
that
knows
how
to
take
backup,
crds
and
basically
put
them
into
a
volume
using
this
mechanism.
So
from
the
user's
perspective,
it's
no
different
than
cloning
from
a
snapshot
they
just
cloned
from
a
backup
and
instead
of
the
external
provision
or
sidecar
doing
the
work.
This
other
thing
does
the
work
but
like
they
can't
tell
the
difference.
They
just
create
the
pvc
and
out
pops.
A
The
the
only
other
thing
that
sort
of
has
caused
a
little
bit
of
complications
and
we've
discussed
them
at
length
in
last
week
was
this
whole
scheme
of
creating
a
second
pod
and
a
second.
Pvc
has
implications
for
stuff
like
wait
for
first
consumer,
because.
A
Normally,
what
happens
with
if
with
like
an
empty
pvc,
if
you
have
wait
for
first
consumer
set
to
true
like
the
the
provisioner,
doesn't
actually
do
any
work
until
there
is
a
pod
that
has
been
scheduled
and
attached
to
that
volume.
So
so
what
we
didn't
want
to
have
happen
was
for
this
thing
to
go
ahead
and,
like
just
start,
creating
pvcs
on
like
in
the
wrong
location.
A
So
it
would
have
to
sort
of
you
know,
wait
until
there
is
a
pod
that
has
been
scheduled
for
the
original
pvc
in
order
to
create
a
populator
pod
on
the
same
node,
so
that
the
pvc
prime
gets
scheduled
to
the
right
place.
The
resulting
pv
gets
created
on
the
right
in
the
right
place.
If,
if,
if
there's
topology
and
it
matters,
and
then
when
we
rebind
that
pv
back
to
the
original
pvc,
it
looks
like
all
the
rules
have
been
respected
and
and
it
the
topology
matches
what
the
user
asked
for.
F
So let's just start with one: the PVC type. This can be block and filesystem, right? Does the populator have to match that?
A
Isn't
it?
Yes?
Yes,
so
so
you
know
the
pvc
prime
will
match
all
of
the
features
of
the
original
pvc,
including
the
storage
class.
The
the
volume
type
the
volume
access
modes,
like
all
of
those
things,
will
have
to
be
copy
and
pasted.
So
so,
yes,
if
it
is
a
file
system
volume,
the
populator
pod
will
have
to
attach
to
it
as
a
file
system
volume.
A
Similarly,
if
it
is
a
raw
block
volume,
the
populator
pod
will
have
to
attach
to
it
as
a
raw
block
volume
that
will
just
have
to
be.
You
know
an
if
then
statement
inside
the
populator
logic
that
creates
the
populator
pod.
To
basically
say:
is
it
a
vault?
Is
it
a
raw
block
volume
that
needs
a
volume
mount?
I'm
sorry.
Is
it
a?
A
Is
it
a
file
system
volume
that
needs
a
volume
mount
or
is
it
a
raw
block
volume
that
needs
a
device
path
and
and
and
then
and
then
that
information
will
also
have
to
be
communicated
into
the
populator
pod?
A
Somehow
so,
like
the
part
of
this
that
hasn't
been
designed,
yet
is
what
is
the
interface
between
the
sort
of
this
generic
logic
that
is
going
to
be
reusable
for
all
the
populators
and
the
specific
populator
implementation
that
like
creates
this
pod,
that
does
the
population
work,
because
a
bunch
of
information
has
to
get
communicated
like
you
know,
is
it?
Is
it
a
file
system
volume?
Is
it
a
raw
block
volume?
You
know
and
potentially
other
details.
So
it's
like
that
part.
A
I
haven't
figured
out
yet
but
like
it's
pretty
easy
to
make
like
a
you
know,
a
non-generalized
prototype
that
can
handle
a
specific
crd
and
then
the
trick
is
going
to
be.
How
do
we
generalize
it
to
support
any
crd
in
any
data,
any
any
populator
type,
so
that
we
pass
down
enough
information
that
it
can
know
what
it's
dealing
with
and
know
how
to
populate
it?
A
So, inevitably, when we do have a backup design, we're going to have to handle both of those, but it shouldn't be difficult to imagine how to write a populator pod that can handle both cases. It just has to know which case it's in and then do its thing.
A
So
any
other
questions.
A
Shane
did
I
did
I
answer
your
question
because
I
know
that
it's
been
confusing.
A
Okay, yeah. So the remaining challenge is designing where we draw the line between the reusable populator code and the specific implementation of a populator: what does that interface look like? We had talked in the previous meeting about whether we want to go with a full sidecar design, with a gRPC interface between the populator sidecar and the actual implementation, because that implies designing a whole gRPC interface for this thing, which would be fairly heavyweight.
A
I
was
hoping
that
we
could
get
away
with
just
like
a
library
that
you
can
import,
but
this
is
this
stuff
is
all
still
tbd,
like
the
exact
implementation
of
of
the
reusable
part,
and
what
the
interface
between
that
and
the
rest
of
the
code
and
the
populator
specific
code
looks
like,
and-
and
it's
not
going
to
be
a
small
amount
of
code.
I
think
because
it
ends
up
having
to
copy
a
lot
of
the
behavior
of
what
external
provisioner
does.
A
And,
and
and
the
only
other
thing
I'll
mention
that
we've
sort
of
covered
since
this
is
a
quick
review-
is
I
I
do
like
the
idea
of
cooperating
populator
implementation.
So
so,
just
because
you
define,
like
a
you,
know,
some
data
type
which
or
some
crd,
which
can
be
the
the
source
of
a
of
a
pvc,
doesn't
mean
that,
like
all
of
the
populate
all
of
the
data
population,
for
that
type
needs
to
be
done
by
a
single
controller.
A
You
know
through
some
policy
that
is
specific
to
that
data
type
and
and
the
example
I
use
there
is
like
for
some
types
of
backups.
A
But
we
may
want
to
do
that
in
a
way
that
it
only
handles
some
instances
of
those
crds
and
not
all
instances
of
those
crds,
perhaps
by
like
having
a
format
argument
that
specifies
the
format
of
the
backup
and
so
a
csi
plugin
could
say.
I
understand
this
format
in
that
format,
but
not
no
other
formats
and
then
and
then
the
external
provisioner
could
only
pass
down
requests
for
supported
formats
leave
other
ones
unhandled
and
then
some
other
controller
can
be
responsible
for
for
backstopping
restoration
of
formats
that
the
csi
plugin
doesn't
understand.
A
Potentially. That's just sort of a brainstorm kind of idea, because I don't want to end up in a situation where, for any given data source, it has to either always be a generic populator or always be done through CSI, because I think that's too rigid. I think there can be data sources where you have a generic populator that knows how to handle them.
A
But
if
a
csi
plug-in
has
a
better
way
to
do
it
like
we
should,
we
should
be
able
to
enable
that.
So
this
scheme-
I
guess
I
just
want
to
point
out
this
scheme-
would
allow
that
kind
of
cooperation
so
that
you
could
have
generic
implementations
and
optimized
implementations
and
they
could
work
in
tandem
with
each
other.
A
Well, yeah, that's a good question.
A
There is a prototype populator out there called hello-populator; I've shared the link to that before. So you're welcome to look at actual code that plays this game of creating PVC prime, creating a populator pod, waiting for the populator pod to run to completion, and then rebinding the PV to the old PVC.
A
That
code
is
available
sort
of
as
just
a
prototype.
It's
called
hello,
populator,
there's
also
a
prototype
of
well
the
proposal
for
a
volume,
populator
crd,
which
is
how
you
register
other
crds.
That
can
be
data
sources.
A
That
prototype
is
in
the
external
provisioner
repo
as
a
pr
it
that
that
is
still
proposed
as
a
as
a
validating
web
hook,
and
we've
already
decided,
we
don't
like
the
idea
of
doing
a
validating
web
hook,
so
I
need
to
rewrite
that
that
proposal-
and
that's
probably
the
the
next
thing
on
my
on
my
list
to
do
is-
is
rewrite
that
that
pr
to
to
turn
it
into
a
controller
that
just
posts,
events
and
then
and
then
I'm
gonna
go
back
and
take
the
hello
populator
and
start
to
reconstruct
it
and
split
it.
A
Apart
along
the
lines
I
described
so
that
it
could
become
reusable,
but
there's
nothing
to
stop
you
from
just
like
grabbing
it
and
cloning
it
and
experimenting
with
it
today.
It
doesn't
support
any
of
the
weight
for
first
consumer
by
the
way
that
was
the
other
sort
of
new
thing
that
we
realized
after
I
wrote.
A
It
seems
more
ambitious
but
yeah.
So
so,
there's
there's
code
out
there
and
I
am
working
on.
You
know
fleshing
out
the
remaining
pieces
and
I
just
I
haven't
been
able
to
put
as
much
time
into
it
in
the
last
month
as
I
would
have
liked,
because
I've
been
sidetracked
with
other
things,
but
but
I'm
committed
to
seeing
this
through
to
you
know
to
beta
you
know
getting
all
the
various
controller
pieces
in
getting
the
feature
gate
promoted
and
seeing
a
you
know,
a
reusable
workable
scheme
for
all
of
this.
A
The
other
you
know
I'll
just
mention
one
more
thing:
the
cap,
the
there
was
a
cap
that
has
merged
and
I
have
an
updated
proposal
to
the
cap
that
has
not
merged
for
120..
I
need
to
rewrite
the
cap
as
well,
but
they
have
asked
us
to
add
like
metrics.
For
this
feature,
that's
a
new
thing
that
I
guess,
they're
really
pushing
hard
across
all
of
kubernetes
is
like
everything
should
have
metrics.
A
You know, maybe even a measurement of population time: how long is it taking to populate volumes? The kind of metrics that could tell an ops person whether a particular populator was healthy or not. I think that's what they're looking for. I mean, I haven't spent a terrible amount of time thinking about metrics, but that's something that needs to be done before beta, I think.
B
May
I
suggest
that
in
the
meeting
log
we
record
the
related
caps.
A
There's only one KEP, and it's merged, and I have an update to that KEP, which is, you know, to move it from alpha to beta, and it's just that one KEP. There's stuff that's TBD in there, but yeah, I can provide a link to that PR, or the original KEP and the PR to the KEP. That would...
B
Be
good
okay,
and
do
we
have
a
any
use
case
documentation
in
the
examples.
A
One
is
the
backup
restore
use
case,
the
other
one
is
the
you
know,
image
for
vm
images
for
virtualization
use
case,
but
like
it's,
it's
not
hard
to
imagine
other
ones,
and
so,
if
you
know,
if
you
want
to
help
me
sort
of
come
up
with
some
good
ones
for
for
the
cab,
I'd
be
happy
to
add,
like
the
third
or
fourth
or
fifth,
one
to
really
illustrate
the
kinds
of
things
that
that
we
can
do
here.
B
There is a team that has already expressed challenges with restoring a volume backup and then bringing that back into a different namespace. I don't remember the details of the challenges that they had.
C
I
know
that
the
guy
I
think
mike
was
working
on
that
he
updated
that
cap.
That's
so
I
know.
A
Because,
yes,
being
able
to
move
various
things
across
name
spaces
has
been
a
problem.
You
know
moving
volumes
across
namespaces
snapshots.
A
Across namespaces. Or, assuming we had a backup, it would suffer from the same problem: you'd need a way to move backups across namespaces, or you would need, you know, some administrator-level person manually copying them across namespaces.
A
So
so
yes,
it's
a
problem
that
lots
of
people
are
encountering
and
we
need
to
need
a
better
answer.
I
think,
but
but
I
I
am
not
tackling
any
of
those
problems
in
in
this
in
this
working
group,
or
with
this
with
this.
F
Sorry, my connection kicked me out earlier, right when you were actually answering my question, so I'll have to listen to it in the YouTube recording. But my second question related to this is about the direction of the data flow, right?
F
So
from
what
I
heard
so
far,
it
seemed
like
when
we
have
an
empty
pvc
and
then
the
the
volume
populator
will
pump
data
to
that
pvc,
and
that
is
for
the
backup
scenario,
and
the
other
way
around
would
be
that
the
the
existing
pvc
already
there
and
they
want
to
deform
it,
the
the
reverse
direction
from
the
instead
of
just
the
volume
populator.
We
want
to
able
to
read
using
the
same
mechanism
to
beat
the
reverse
direction.
We
just
related
to
that
or
we
have
to
have
another.
You
know
volume,
calculator.
F
It
was
like
this
one
is
seemed
to
me
fit
with
my
restore
story
right
when
we,
when
he
went
to
a
new
pvc.
I
read
from
my
backup
and.
C
There's also a volume backup proposal in progress, I think, so we'll see if that comes out. Yeah, yeah.
A
In this area, the only other thing I'll mention, because I don't know if this is part of your question or if I just misheard it, is about a revert kind of workflow: we don't support reverts anywhere in Kubernetes. You can't revert to a snapshot today, and I'm not proposing that populators help you with reverting to backups either. The workflow is always, if you want to basically restore...
A
Yeah
yeah
so
yeah
when
you
do
a
restore
the
the
the
assumption,
at
least
with
in
this
design,
is
what
you
wanted
was
a
whole
new
volume.
If
what
you
wanted
was
to
take
your
existing
volume
and
have
it
sort
of
go
back
in
time
to
the
point
of
a
backup
or
a
snapshot,
we
don't
have
solutions
for
for
those
use
cases.
C
I think Alex is working on something for backup, a volume backup API.
A
A lot of people talk about "I'd like to revert to my backup" or "I'd like to revert to my snapshot," but no one ever has a compelling reason for why that's better than just creating a whole new volume. So yeah, I think we've decided not to address that problem, in either this area or the data protection working group, for the time being.
C
Okay, local snapshot, I think. Yeah.
A
So
I'm
gonna,
I'm
gonna.
My
action
item
I
think,
is
to
provide
links
to
the
cap
into
the
pr
for
the
cap
in
the
in
the
agenda,
doc,
which
I
will
do,
and
I
I
just
wanted
to
mention
that
this
is
a
weekly
meeting.
But,
like
we've
covered
most
of
the
topics,
we
need
to
cover
so
like
unless
people
have
new
agenda
items
for
next
week.
A
Well
I'll
leave
the
meeting
on
the
calendar
for
next
week
but
like
unless
there's
new
things
to
discuss
we'll,
probably
end
up
just
calling
that
meeting
at
the
beginning
and
we
won't
end
up
holding
it.
If
there's
nothing
on
the
agenda
until
until
I
have
more
progress
to
report
or
or
new
designs
to
review,
because
I
have
all
the
feedback
that
I
require
and
I'm
I
want
to.
You
know
I'm
happy
to
like
answer,
questions
and
stuff
or
have
discussions
in
these
meetings.
A
But
we've
had
all
the
discussions
that
I
need
to
have
so
that
the
rest
of
this
is
for.
A
Yeah, yeah, I mean, I would be okay with reducing the frequency. The challenge is, with the holidays coming up, which...