Description
Kubernetes Data Protection Workgroup Bi-Weekly Meeting - 12 January 2022
Meeting Notes/Agenda: -
Find out more about the K8s DP WG here:
https://github.com/kubernetes/community/tree/master/wg-data-protection
Moderator: Xiangqian Yu (Google)
B
All right, today is Wednesday, January 12, 2022. This is the Kubernetes Data Protection Working Group. Today there are two main agenda items: first, we're going to continue the discussion from the presentation Shyam gave some months back; after that, we will discuss a couple of open issues. Shyam, you are already online, so let me know whenever you are ready; I think she has already made you the presenter.
C
Now sharing my screen. This is something called a whiteboard, which is what... oh.
B
Okay, when I have.

C
So I just had some notes from the last meeting, just so that we can get warmed up. Initially we did discuss grouping volumes for replication as well, and we got to the point of being prescriptive about that.
C
So I just wanted to reiterate, or repeat, that we would replicate either PVCs or volume groups, and the grouping really comes from volume groups. That proposal is in flight, and I don't see us doing replication groups as something separate from volume groups. But does that conflict in any way, based on the discussions the last time we met?
C
So let me... no, not necessarily volume groups. When we talked about the orchestration, Ben was saying that instead of only considering PVCs we should probably, out of the gate, have something for volume groups. I was just trying to repeat the notes from last time: if you have volume groups, we expect the grouping of volumes to come from the volume group enhancement proposal, and not...
A
Right,
that's
another
proposal.
I
think
you
probably,
I
think
we
still
need.
We
need
to
have
individual
volume
replication,
so
I
think
we
will
need
both
anyway.
So
can
you
just
continue
with
whatever
you
discussed
in
the
last
meeting,
so
the
voting
group
replication?
That's
the
there's,
a
tab
that
I'm
working
on,
but
that
replication
part
is
kind
of
like
the
second
part
of
that
whole
thing
right,
because
we're
trying
to
address
the
group
snapshot
first,
exactly
correct,
so
yeah.
Why
don't
we
just
go
through
this
one?
First.
C
Okay, so let's skip to slide 14; yeah, slide 14.
C
Right. So we talked about the common goals in terms of pairing storage systems for mirroring, and talked about roles across these storage systems for individual volumes being primary and secondary, with replication of data happening from the primary to the secondary. Or, at least, the primary is writable and the secondary is the copy of data: a snapshot-based copy of data, or however the storage system decides to do it.
C
Then we talked about failover and failback and how to orchestrate that, and we did start talking about how the solution looks. So I just want to start off from there and continue forward. Before we talk any more: there are some caveats in what we're presenting, and there are some assumptions.
C
In other words, if we request a volume or an image to be mirrored or replicated, the setup can be done from the east storage cluster onto the west storage cluster. It doesn't have to be done via kube on west, with you then coming back on the east cluster and redoing the storage pairing as such.
C
The
final
assumption,
which
is
something
we
we
have,
but
we
don't
want
to
really
take
it
forward,
is
that
it
assumes
that
the
volumes
are
accessible
via
the
same
csi
volume
handle
across
these
two
peer
storage
systems.
So
we're
basically
talking
about
what
we've
done
to
set
the
stage
for
what
can
happen
down
the
line.
C
So what we added here is a CSI sidecar to watch and reconcile VolumeReplication CRs (custom resources). A VolumeReplication resource pretty much points to a PVC and hence, via the sidecar, talks to extended gRPCs on the storage provider to enable or disable volume replication for that particular volume. As I said, it's assumed that the storage is already set up, or peered, for replicating images; so that's out of scope of what volume replication does, very simply.
C
There are parameters that are vendor-specific and so on. The VolumeReplication itself points to a replication class and then points to the data source, in this case a PVC. It's a data source because later on, when volume groups come in, it can point to a volume group and say: I need volume replication for the entire volume group, and not per PVC as such. And it has a replication state, which flips between primary and secondary as the case may be.
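A rough sketch of the resource being described may help here. The API group, version, and field names below are assumptions for illustration only, not a settled API:

```yaml
# Hypothetical sketch of the VolumeReplication CR discussed above.
# Group/version and all field names are illustrative.
apiVersion: replication.storage.k8s.io/v1alpha1
kind: VolumeReplication
metadata:
  name: db-volume-replication
  namespace: app-ns
spec:
  # Cluster-scoped class carrying vendor-specific parameters
  volumeReplicationClass: fast-mirror
  # A PVC today; could later point at a volume group instead
  dataSource:
    kind: PersistentVolumeClaim
    name: db-data
  # Flips between Primary and Secondary as roles change
  replicationState: Primary
```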
C
Okay. So if we are going to create the secondary volume using kube resources, let's say a PVC, and the application only runs on one of these two clusters, then we will have, not necessarily orphaned PVCs, but PVCs that are no longer user-managed.
C
In
that
sense,
the
user
is
typically
managing
the
application
and
pvcs
on
the
current
primary
cluster
for
that
application,
and
now,
if
the
users
expected
to
go
around
and
to
the
secondary
cluster,
create
pdc's
and
then
provide
which
will
hence
create
images
on
the
volumes
on
the
secondary
cluster
and
hence
provide
the
information
back
and
forth
between
these
clusters,
the
the
responsibility
of
the
app
user
becomes
managing
pdc's
across
these
two
systems,
where
it's
in
use
in
one
side
and
not
in
using
the
other.
C
The
second
second
observation
or
need
there
was
the
storage
systems
are
already
appeared.
So
any
the
question
is
usually
the
the
data
that
the
data
planes
appeared
and
the
control
plane
is
not
but
going
to
csi.
If,
if
the
control
plane,
information
of
the
participating
clusters
is
available
with
the
csi
plug-in
it
can,
it
can
proxy.
Instead
of
the
user
to
create
the
required
volumes
on
the
secondary
end,
they
have
no
representation
in
kubernetes.
C
So
when,
when
a
primary
volume
is
garbage
collected
on
on
the
primary
kubernetes
instance
again,
the
csr
plugin
can
actually
proxy
for
the
control
plane
of
the
secondary
and
garbage
collected
on.
The
secondary
makes
it
much
easier
to
deal
with
makes
simpler
for
the
user
to
deal
with
pvcs
and
and
pods
using
those
pvcs
on
a
cluster
rather
than
you
know,
actually
having
to
handle
these
across
clusters.
C
It's a very basic definition here of a VolumeReplicationClass, with everything else stuffed into parameters. We're typically looking at replication schedules as one of the more common parameters here, with everything else being vendor-specific, such as what's the cluster ID that's peered and so on, so that whatever validation needs to be done can be done. But we just wanted to call the schedule out, so that not every user can use any schedule.
C
It's
kind
of
resource
controlled
in
a
particular
way,
because
the
volume
replication
class
would
become
a
cluster
scope,
resource
and
hence
can
be
quoted,
and,
and
so
every
user
does
not
end
up
asking
for
a
replication
schedule
of
two
minutes
or
five
minutes
or
a
minute
and
and
we
can
have
classes
of
users
who
are
allowed
to
request
replication
based
on
different
scheduling
intervals.
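As an illustration of that quota idea (hypothetical names and fields again), an admin might publish a small set of classes, each pinning a schedule, and grant different groups of users access to different ones:

```yaml
# Hypothetical VolumeReplicationClass sketches; all names and fields
# are illustrative. The schedule is called out so that access to
# aggressive intervals can be restricted per class of user.
apiVersion: replication.storage.k8s.io/v1alpha1
kind: VolumeReplicationClass
metadata:
  name: mirror-5m              # broadly available
spec:
  provisioner: example.csi.vendor.com
  parameters:
    schedulingInterval: 5m
---
apiVersion: replication.storage.k8s.io/v1alpha1
kind: VolumeReplicationClass
metadata:
  name: mirror-1m              # restricted to critical workloads
spec:
  provisioner: example.csi.vendor.com
  parameters:
    schedulingInterval: 1m
    clusterID: east-west-pair-01   # vendor-specific, for validation
```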
C
So when the failover has to happen, the user needs to create the VolumeReplication cluster resource on the secondary cluster, set it as primary, and use the same data source as they used on the east cluster, pointing back to the same PVC. Then, when they roll out the application, which includes the PVCs and pods, the volume replication resource would be able to recover from there.
C
That is correct, and we would actually want to remove that requirement. So there are two problems. Problem A: as a matter of fact, even for Ceph we had to do some level of handling and mapping across clusters to ensure that we mapped to the mirrored copy on the remote, or secondary, cluster. But barring that, it also means that a cluster-scoped resource like the PV has to be backed up and restored onto the secondary cluster, which a user cannot do by themselves.
C
So
it's
not
controlled
by
the
user
any
longer
the
cluster
admin
or
you
know,
has
to
step
in
to
backup
and
restore
the
pd.
So
that's
also
not
good,
so
we
don't
want
to
go
down
that
path,
but
that's
where
we
started.
So
that's
why
it's
represented
as
such.
What
we
instead
want
to
do
is
when
the
volume
replication
is
initially
created
on
the
east
cluster,
there
has
to
be
some
cookie
information
that
goes
from
east
to
west,
so
the
volume
replication
is
created
on
the
east
cluster
it
and
it
establishes
the
volume
for
replication.
C
It
spits
out
just
like
the
csi
volume
handle,
let's
just
call
it
a
csi
replication
handle
splits
out
the
csr
replication
handle,
which
is
what
we
use
to
create
the
volume
replication
resource
on
the
remote
end
and
create
a
pvc
from
volume
replication
as
the
data
source,
in
which
case
the
user
can
take
the
replication
handle
and
reuse
it.
On
the
other
cluster,
three
deploying
the
pvc
and
the
workload.
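A sketch of what that cookie/handle flow on the west cluster might look like. The handle field, the data-source wiring, and every name here are assumptions for illustration; whether this would actually use the PVC's data-source mechanism or something else is an open design question in the discussion:

```yaml
# Hypothetical west-cluster resources in the cookie/handle model.
# The replication handle came from the VolumeReplication on east.
apiVersion: replication.storage.k8s.io/v1alpha1
kind: VolumeReplication
metadata:
  name: db-volume-replication
  namespace: app-ns
spec:
  volumeReplicationClass: fast-mirror
  replicationState: Secondary
  replicationHandle: "rep-example-handle"   # cookie carried over from east
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
  namespace: app-ns
spec:
  storageClassName: mirrored-rbd
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 100Gi
  # Provisioning binds to the mirrored image instead of a fresh volume
  dataSourceRef:
    apiGroup: replication.storage.k8s.io
    kind: VolumeReplication
    name: db-volume-replication
```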
G
The current design involves some administrator backing up the PV, and you don't like that because you would rather have a self-service kind of model. Doesn't that get you back to the point where the user should be creating a PVC on the secondary, so it's self-service and not administrator-controlled?
G
Okay, okay. So they can create their own PVC and point it at some object which points back to the original cluster implicitly, and that PVC will pop up with the correct data from the mirror relationship. Right, okay. But you wait until after the disaster has happened and you want to do the failover, and then you go and create that PVC pointed at the relationship and set it all up.
C
Right
so
the
next
slide
is
just
about
ordering
in
the
pv
based
model
in
the
in
the
wall.
Rep
cookie
based
model
you'll
create
the
volume
replication
resource
first,
but
as
long
as
your
pvc
destination
is
updated
to
user
data,
source
volume,
replication
it'll
either
fail
to
get
provisioned
on
ease
on
west
till
the
volume
replication
resource
is
actually
created.
C
We have the constraint because, if you create the PVC without creating the PV with the required claimRef for the PVC, the PVC will just go through dynamic provisioning and not get the required data, because the storage system wouldn't know where to get the data from. So in the PV-based model the PV has to be restored first, which is another sort of not-good thing from a design perspective: to actually depend on PVs and CSI volume handles to reattach storage on the west cluster.
C
Okay,
so
these
are
things
that
I
added.
This
gives
here's,
where
we
talk
about
some
of
the
steps
that
the
user
has
to
do
when
they
deploy
the
application,
which
is
this
slide.
The
next
two
slides
will
be
about
when
they
fail
over
the
application
when
they
and
when
do
they
fail
back
the
application.
C
Some
of
it
is
a
repeat
so
try
to
see
if
we
can
go
through
this
faster
so
when
they
deploy
the
application
resources.
Initially,
they
deploy
the
pods
and
the
pvcs,
but
additionally,
they
create
volume,
replication
per
pvc
as
primary
and
yeah.
C
Of
course,
step
three
is
ensure
that
the
volume
replication
is
primary
and
an
out
of
order
step
four,
which
is
all
the
way
at
the
bottom,
is
to
protect
the
pvs,
protect
the
pv
cluster
data
from
the
yeast
cluster,
which,
again,
let
me
repeat,
the
user
cannot
do
because
it's
a
cluster
scoped
resource,
but
that's
where
we've,
if
it
was
a
volume,
replication
cookie,
the
user
would
have
been
able
to
protect
this.
C
This
cookie
at
present,
step
5,
is
keep
watching
for
any
new
pvcs
in,
in
terms
of
you
know,
stateful
sets
and
other
such
things,
but
that's
a
higher
level
orchestrator
issue
in
the
sense
as
a
user,
if
I'm
using
a
stateful
set-
and
I
I
scale
it
up-
I
get
new
pdcs,
which
means
I
need
to
create
newer
volume,
replication
resources
for
those
pvcs,
whereas
if
there
was
some
higher
level
orchestrator
which
grouped
this,
it
would
probably
automatically
start
protecting
or
replicating
those
pvcs
as
well,
but
as
a
user
as
it
stands.
C
Moving
on
the
next
slide,
which
is
about
failover
again
part
of
it,
is
a
repeat
but
we'll
just
go
through
this
quickly
step.
One
is
east
becomes
unavailable,
which
is
a
disaster,
and
so
now
we
need
to
move
it
to
west.
So
the
first
thing:
here's
where
the
ordering
is
important.
We
first
need
to
restore
the
pv
cluster
data
onto
the
best
cluster
so
that
the
claim
refs
point
back
to
the
pvcs
that
were
created
step.
Three.
C
We
create
the
pvc
as
primary,
which
tells
the
storage
that
the
corresponding
image
has
to
be
forced
force
promoter
to
primary,
because
there's
already
an
existing
primary,
but
we
don't
know
its
state
once
east
is
available.
Here's
where
we
start
talking
about
heal
and
fail
back
right.
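Putting the failover steps above together, a sketch of the west-side resource (with the same illustrative, non-final names as before) might be:

```yaml
# Hypothetical failover sketch on west: same data source as was used
# on east, but promoted. All names and fields are illustrative.
apiVersion: replication.storage.k8s.io/v1alpha1
kind: VolumeReplication
metadata:
  name: db-volume-replication   # mirrors the resource that existed on east
  namespace: app-ns
spec:
  volumeReplicationClass: fast-mirror
  dataSource:
    kind: PersistentVolumeClaim
    name: db-data               # same PVC name the app manifests use
  # Primary here, while east's copy is unreachable, implies a
  # force-promotion of the underlying image
  replicationState: Primary
```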
C
But
because
this
is
a
failover,
it
would
have
gotten
into
a
split
brain
issue
as
in
it
would
have
received.
It
may
have
received
some
ios,
which
are
out
of
sync
from
western,
so
well.
Storage
decides
whether
it
needs
to
reset
the
volume
from
a
last
known,
good
snapshot
or
just
continue
resyncing
from
where
the
current
primary
is.
C
When
we
need
to
relocate
or
fail
back
the
workload
to
the
east
cluster,
well,
we
can
start
with
ensuring
that
the
volume
replication
resource
is
primary.
Yes,
delete
all
the
application
resources,
which
basically
means
the
pvc
gets
deleted.
The
part
gets
deleted.
C
This
is
important
so
that
we
can
ensure
the
pvc
is
not
in
use
before
we
flip
the
volume
replica
volume,
replication
resources
also
change
to
secondary,
but
before
we
act
on
that
and
tell
storage
to
change
it
to
secondary,
we
need
to
be
sure
that
the
the
pods
are
not
in
there
are
no
parts
using
the
pvcs
and
that
the
pvc
is
deleted.
So
there
cannot
be
an
out
of
order,
part
creation
that
will
use
the
pvc
as
we
are
marking
it.
G
Okay, how about a planned failover?
C
So
question
I
have
on
planned:
failover
is:
does
the
secondary
start
off
from
a
snapshot
of
the
data
or
not.
G
I
mean
I,
ideally
it
has
the
the
latest
copy
right,
but
the
reason
I
bring
it
up
is
because,
in
my
experience
like
people
that
are
serious
about
disaster
protection
exercise,
their
disaster
recovery
plans
regularly,
even
when
there's
not
disasters,
because
you
never
want
to
be
in
a
situation
where
you
try
it
after
the
disaster
happens
and
find
out,
it
doesn't
work
right.
So
so
it's
important
as
far
as
I'm
concerned
that
there's
a
way
to
initiate
a
failover.
G
Just
when
there's
no
disaster
to
exercise
the
work
so
make
sure
that
it's
all
you
know
all
the
machine
is
working
basically,
and
so
so,
in
the
version
of
this
that
we've
implemented,
like
the
pvcs
just
hang
around,
we
don't
delete
them
and
create
them.
So
I'm
trying
to
understand
what
the
advantage
of
of
doing
it
this
way
is
where
you're,
creating
and
deleting
pvcs
every
time
you
want
to
fail
over
and
fail
back.
E
Okay, but is that realistic in disaster scenarios? Because imagine you have a network outage or something: the user may not even reach the cluster. And, I mean, previously you said you don't want the users to manage PVCs on the west cluster, because there was too much of a manual process; but now they have to deal with the apps on the east cluster.
C
In a regular case, I mean in a normal scenario, what will happen is this: if the app continues to run on east, there are a few things that will happen, right? Okay, the volume is still noted as primary, so technically the storage system would actually allow IOs to the storage. But who is actually using the app? I mean, where are the other applications that are using the capabilities of this application?
C
This
particular
application
requests
coming
from
that
can
be
in
cluster
or
out
of
cluster
in
cluster
apps
that
continue
to
run
would
continue
to
send
requests,
but
it
really
requires
some
level
of
traffic,
managed
global
traffic
management
or
rerouting
of
the
apps
external
addresses
to
the
west
cluster.
E
I
guess
the
issue
is
this:
you
know
like
once
I
guess
you
can
protect
the
new
primary
on
the
cluster
on
the
west
cluster
by
breaking
the
penal
relationship
right,
but
once
the
east
cluster
comes
back
up,
obviously
io
would
still
go
to
the
pvs
there.
This
wouldn't
affect
the
new
primary,
but
you
know
it
can
affect
the
old
primary
and
I
guess
I
don't
know
based
on
maybe
the
type
of
storage
backend
you
have,
that
can
have
implications
on
racing.
E
So
if
you
want
to
switch
back
to
making
the
east
the
new
primary,
they
can
have
implications
there
right,
but
that
can
also
have
implications
on.
E
C
Yeah
the
load
balancer
should
not.
I
mean
it
should
be
managed
to
not
send
traffic
east
we
pre-fail
over
the
west.
We
we,
I
think,
10
12,
slides
ahead.
We
said
that
we're
not
dealing
with
a
load
balancer
or
traffic
redirection
in
in
this
set
of
slides,
because
it's
sort
of
external
to
storage
as
such,
but
yes,
the
load
balancer,
will
have
to
divert
traffic
to
west
and
not
discontinue
using
east.
C
C
Right. So yes, in a DR scenario east is not reachable. If we go back one slide, right: this is the DR scenario. Even at line item five, where east is available, that's when we really can start garbage-collecting resources and healing east as secondary, and so on. At step four you actually have the application recovered on the west cluster.
C
So, for disaster recovery testing, we're not necessarily switching off the east cluster and failing over the workloads to the west cluster. If you are testing the application's ability to run and use the data, then the relocate can be performed as the disaster recovery...
C
As
the
as
the
dr
testing
during
the
dr
testing
phase,
because
you
don't
want
to
lose
any
data
during
vr
testing
phase,
you
need
to
ensure
that
there
are
no
further
ios
on
east
and
that
the
pods
and
pdc's
are
worn
down
before
you
actually
move
the
parts
and
pvcs
to
the
west
cluster
for
testing.
So
it's
a
real,
real.
G
Oh no, it shouldn't... oh yeah, that seems pretty clear. You know, DR testing can involve downtime, because you have to take things down, do the final copy of the data so that the secondary is fully in sync with the primary, and then bring up your app. That can be shrunk to as little as, you know, a couple of seconds of downtime if you're fast, but some amount of downtime is unavoidable.
G
I
I
like
I
like
the
way
you're
thinking
about
you
know
relocating.
Instead
of
failing
back
and
being
able
to
move
back
and
forth
pretty
freely,
I
I'm
still
scratching
my
head
about
the
pvc
creation
and
deletion
and
the
way
that
the
the
the
selection
of
the
location
of
the
secondary.
So
when
it
sounds
like
in
this
design,
when
I,
when
I'm
on
the
east
cluster-
and
I
create
my
pvc
and
it's
of
a
type,
that's
supposed
to
get
replicated
automatically
a
secondary,
will
also
get
created.
G
I
mean
in
all
of
those
scenarios
like
like
in
in
just
a
one
or
just
a
one-way.
You
know
peer-to-peer
replication.
You
could
have
multiple
options
for
places
to
replicate
to,
even
within
the
same
site
right
you
could
have
multiple
devices
or
multiple
pools
of
disks,
some
of
which
are
fast,
some
of
which
are
slow.
You
know
all
of
those
all
the
storage
class
type
decisions
where
you
know
you
would
like
to
give
people
control
over
where
their
data
ends
up
for
performance,
characteristics,
reliability,
characteristics,
etc.
G
You
you
want
to
make
those
decisions
on
both
the
primary
side
and
the
secondary
side
right.
So
so
I'm
just
I'm
trying
to
because
this
that
it's
implicit
here
and
the
pvc
is
created
after
the
fact.
So
by
the
time
the
pvc
is
created
on
the
secondary,
the
data
already
is
where
it
is,
there's
no
scheduling
decision
to
be
made
for
the
for
that
volume.
It's
just
give
me
the
volume
where
it
is
now,
so
the
scheduling
decision
was
made
all
the
way
up
front
when
you
created
the
original
thing.
C
Correct
right
so
so
this
is
where
I
see
the
replication
class
playing
in
where
I'm
gonna
have
to
extrapolate
a
scenario
here
from
what
you're
saying
so
you
could
technically
replicate
from
east
to
west,
where
the
volume
on
east
is
on
fast
pool
and
volume
on
west
is
on
either
slow
or
fast
pool.
You
know
that.
C
Choice,
so
there
are
so
I
I
would
look
at
it.
I
would
off
hand
look
at
it
as
volume
replication
class
parameter.
So
so,
when
you
establish
it,
it
says
it.
It
kind
of
says:
replicate
this
to
slow.
On
the
other
side
volume
replication
class,
a
and
volume
replication
class
b
is
replicated
too
fast.
On
the
other
side,.
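Concretely, that choice could be encoded in class parameters; the parameter and pool names below are illustrative, vendor-specific assumptions:

```yaml
# Hypothetical sketch: encoding the remote placement choice in the
# replication class, via vendor-specific parameters.
apiVersion: replication.storage.k8s.io/v1alpha1
kind: VolumeReplicationClass
metadata:
  name: class-a-remote-slow
spec:
  provisioner: example.csi.vendor.com
  parameters:
    remotePool: slow-hdd-pool    # secondary lands on the slow pool
---
apiVersion: replication.storage.k8s.io/v1alpha1
kind: VolumeReplicationClass
metadata:
  name: class-b-remote-fast
spec:
  provisioner: example.csi.vendor.com
  parameters:
    remotePool: fast-ssd-pool    # secondary lands on the fast pool
```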
G
Okay,
yeah,
I
mean
so.
This
is
a
that
that
could
work,
that
this
is
another
situation
where
we,
when
we
designed
something
similar,
we
opted
to
just
create
pvcs
on
both
sides
so
that
the
ordinary
storage
class
scheduling,
other
algorithms
could
run
on
both
sides
to
pick
the
locations
for
the
two
volumes
and
then
once
the
two
volumes
existed,
then
we
establish
the
relationship,
but
but
if
you
want
to
encode
it
all
on
the
primary
side
and
then
do
it
implicitly,
it
could
also
work.
E
I think the main difference is that, let's say in the Trident scenario, the application owner decides which cluster to fail over to. In this scenario it's the same thing, except instead of just deploying the application, you first deploy the VR, and based on where the VR gets deployed, that's where the application gets deployed. So it's one extra step: you have to deploy the VR first, and then, once the VR has set up the infrastructure for the application, the application can get scheduled.
G
So can we revisit how this plays with groups again? Because I've been thinking about all of this in terms of one PVC. You're saying: if we had a concept of a group, what would change?
C
Okay,
so,
okay,
I'd
like
sorry,
okay,
fine
I'll
talk
about
something
before
this
you're
talking
about
pvcs
on
both
ends
primary
and
secondary,
sorry,
yeah,
pvcs
on
both
ends
and
then
in
the
in
the
trident
model.
C
I
was
just
curious
there
as
to
how
do
you
deal
with
stateful
sets
or
how
do
you
deal
with?
I
mean
there
are
there
are
so
many
operators
out
there
right
now
who
create
pvcs
quite
dynamically?
I
mean
they
don't
even
expect
that
the,
for
example,
the
crunch
crunchy
db,
postgres
sql
operator.
G
At some point, yes. So the idea is the PVC can be created with no replication relationship, and it can just be an ordinary volume; and then, after the volume is created, you can decide "oh, I want to mirror that one," and you create the necessary objects. You create the secondary PVC and the necessary objects to tell Trident to do its thing, and then now you have the PV and the PVC.
G
So you basically have to wait. Now, if your app is a StatefulSet, it would get trickier, because in order for the secondary to have a relationship, you do have to create a different object before you create the secondary PVC, so that it knows not to create an empty PVC but to create a mirror PVC on the secondary; and a StatefulSet wouldn't.
C
Okay
and
now
I'll
segue
into
volume
groups-
okay,
the
way
I
see
volume
groups
playing
here
is
that,
instead
of
the
volume
replication
data
source
being
a
pvc,
if
the
volume
replication
data
source
is
a
volume
group,
the
one
that
gene
is
working
on
the
volume
group
specification
that
for
snapshots
that's
being
worked
on
right
and
right.
So
so
now
what
happens
is
newer
volumes?
Pvcs
could
be
created
added
to
the
volume
group
removed
from
the
volume
group.
C
What
not,
but
the
volume
replication
acts
on
the
volume
group,
just
like
a
snapshot,
would
act
on
the
volume
group
and
snapshot
all
pvcs.
So
it
would
again
from
again
assuming
that
the
control
plane
communication
happens
from
the
storage
yeast
storage
west,
create
appropriate,
mirror
replication
copies
on
west,
based
on
replication
class
parameters.
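The volume-group variant would then only change the data source of the per-PVC shape discussed earlier. This sketch assumes a VolumeGroup-style resource along the lines of the group snapshot work; all names remain illustrative:

```yaml
# Hypothetical sketch: one VolumeReplication protecting a whole
# volume group rather than a single PVC. Names are illustrative.
apiVersion: replication.storage.k8s.io/v1alpha1
kind: VolumeReplication
metadata:
  name: app-group-replication
  namespace: app-ns
spec:
  volumeReplicationClass: fast-mirror
  dataSource:
    kind: VolumeGroup          # instead of PersistentVolumeClaim
    name: app-volume-group     # PVCs can be added/removed over time
  replicationState: Primary
```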
C
And
and
when
you
go
to
when
you
go
to
west,
the
the
volume
replication
resource
is
created
again,
okay,
so
the
pv
model
works.
But
then
the
cookie
model
we'll
have
to
kind
of
see.
Okay,
we'll
add
we'll,
add
it.
It
will,
anyway
again
get
a
cookie
to
the
volume
group.
C
So
when
the
volume
replication
is
restored
on
are
created
on
west,
it
will
have
to
promote
or
demote
the
volumes
based
on
primary
or
secondary
roles
that
we
assigned
to
volume
replication
resource
on
west,
pointing
to
the
volume
group
which
will
come
in
when
the
app
is
deployed.
Just
like
the
pvc
comes
in
when
the
app
is
deployed.
A
So,
are
you
handling
that
dynamically
because
in
this
case,
sounds
like
you're
dynamically
trying
to
figure
out
the
you
know
what
to
do
on
the
on
the
remote
side
right?
So
if
it's,
if
the
group
member.
A
Well, so this is basically like in the current KEP for the volume group, where we do have APIs for you to add a volume to your group. So let's say the user does that; or are you saying that once they start this replication, they can't do that anymore? I guess I'm trying to figure out what the steps are.
C
That would be a good summary, yes.
C
As a label; so, we did not implement the cookie. We are transferring PVs across clusters, with the assumption that the CSI volume handle is immutable across these two clusters, and so it will be able to map back to the original volume. That's what we're doing currently, but that's not user-serviceable, simply because it's a cluster-scoped resource; and it's definitely not true for all storage providers that, across east and west or across different storage clusters, the volume handle is readily reusable.
C
So,
okay,
but
then
I
just
but
then
after
having
said
that
and
to
answer
you
not
answer
to
elaborate
talk
more
about
what
you
observed,
I'm
not
looking
at
I'm,
not
thinking
labels
and
such
I'm,
I'm
more
looking
at.
A
Yeah,
I'm
sorry
to
interrupt.
We
only
have
five
minutes
left
and
we
do
have
another
thing
on
the
agenda,
so
it
looks
like
we.
We
still
did
not
finish
this.
We
probably
should
bring
you
back
in
another
meeting.
C
Hey
sure
I
I
think
we
may
need
to
start
discussing
this
in
the
slack
more
and
and.
A
He cannot mute himself; okay, so let's see. What is it, double-muted? Maybe; I don't know what's up, I don't know what's wrong. Okay, so let me see, let me try to share that.
A
All
right,
so
basically
the
the
other
issue
here
is
this:
this
issue:
hey?
Who?
Who
added
this?
Can
you
can
you
talk?
Please
or
add
your
name?
I
don't
know
where.
H
So
I
we
discussed
this
issue
a
few
meetings
back
and
I
think,
like
griffith
was
probably,
if
I
just
wanted
to
check,
if
there's
any
update
here,
if
we
were
able
to
find
the
bug
like
why
the
back
off
is
not
happening.
I
think
we
like
a
few
meetings
back.
We
I
had
this
in
the
agenda
as
well.
A
Right,
I
think,
probably
because
of
holidays,
I
haven't
I've,
not
seen
any
update
on
grant.
I
just
I
just
pinned
him.
I
think
I
have
not
got
a
response
yet,
but
I
I
was,
I
will
see
if
he
can.
He
said
he
will
get
back
to
this.
You
know
you
see
here.
I
just
said
he's
hoping
to
come
looking
into
this
again,
so
I
assume
now
after
the
holiday.
He
should
have
time
to
look
into
this.
So
let
me
I'll
ping
him
again
make
sure
that
sure
thanks
so.
A
You
don't
have
this
first
issue
anymore
right,
the
first
issue,
which
is
the
the
object,
I
think.
A
All right, sure, sure; but the backup should be successful, right? Or, in your case, you're running into an issue where it cannot get successful.
A
Okay,
okay,
I
see
all
right:
okay,
yeah,
oh
I'll
ping,
I'll
ping,
him
again.
A
Thank
you.
Okay,
then
looks
like
that's
the
last
issue.
We
have
anyone
else.
Do
you
have
anything
else
you
want
to
talk
about.
F
Since we have a minute left, I just wanted to point out to Ben the model that Shyam presented, where we don't create the PVs on the other side. This talk was primarily about async replication, but when you apply something similar to sync replication, where the storage is stretched, so there are two Kubernetes clusters with a stretched storage backend, then the model is that you don't want to create PVs on the other side. For that case, then, it will... the PVC...
A
I have to join another meeting.
A
Continue
later,
yeah
sure:
okay!
Okay,
thank
you.
That's
a
good
discussion
today,
we'll
continue
at
some
later
time.
Thank
you.
Bye.