Description
Meeting of Kubernetes Storage Special-Interest-Group (SIG) Volume Snapshot Workgroup - 24 September 2018
Find out more about the Storage SIG here: https://github.com/kubernetes/community/tree/master/sig-storage
Moderator: Jing Xu (Google)
A: Also, since we have storage classes, similarly for each individual storage class you can specify how much storage you want to restrict and how many PVCs you can have with that storage class. So for snapshots I think we can have a similar concept. We can have quota per namespace, and we can have quota per volume snapshot class, so per namespace you can restrict the total number of volume snapshots that can exist in that namespace. I think that should be straightforward. But for requests, if we have requests for snapshots, then we have a bit of an issue here, because we don't know exactly how much storage a snapshot takes up. We only have a restore size, which is the size of the volume that you take the snapshot from. So whether we should similarly have a request for snapshots in quota or not, I...
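For context, the PVC-style quota pattern being referenced can be sketched roughly as below. The PVC and storage-class keys are existing ResourceQuota resource names; the snapshot-related keys are only assumed names to illustrate the proposal, not a shipped API.

```go
package quota

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// snapshotQuota sketches a per-namespace ResourceQuota that mirrors the
// existing PVC / storage-class quota pattern discussed above.
func snapshotQuota(namespace string) *corev1.ResourceQuota {
	return &corev1.ResourceQuota{
		ObjectMeta: metav1.ObjectMeta{Name: "storage-quota", Namespace: namespace},
		Spec: corev1.ResourceQuotaSpec{
			Hard: corev1.ResourceList{
				// Existing PVC quota keys (shipped resource names).
				"persistentvolumeclaims": resource.MustParse("10"),
				"requests.storage":       resource.MustParse("80Gi"),
				"gold.storageclass.storage.k8s.io/persistentvolumeclaims": resource.MustParse("5"),
				// Hypothetical snapshot analogues (names assumed for discussion only):
				// total snapshot count in the namespace, and a per-snapshot-class count.
				"count/volumesnapshots.snapshot.storage.k8s.io":           resource.MustParse("20"),
				"fast.volumesnapshotclass.snapshot.storage.k8s.io/count":  resource.MustParse("10"),
			},
		},
	}
}
```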
A: I think snapshots and storage can be separate quotas, because here we have the snapshot class, right, like you have the storage class. And for the system, I mean, if they want to set up quota they can have a rough idea: if they have 100 gigabytes of storage total, they may want 80 gigabytes for storage, for volumes, and 20 gigabytes for snapshots, and then they can have a quota for snapshots.
D: I think it makes sense to have requests for snapshots, because that gives the admin a way to say the upper bound as far as what percentage of their storage can be used for snapshots and what percentage can be used for, you know, data. And also, back to Ben's argument that you cannot calculate snapshot sizes correctly: I agree.
D: Based on the type of backend, snapshots may consume more space or less space, but one can say the same thing about volumes. Depending on whether volumes are thin provisioned, whether they're deduped, whether they're compressed, what you end up with on the storage backend is quite different from what you might think. So I think in that context it's okay to use the request for a snapshot as a way just to upper-bound what percentage of storage can be used for snapshots or not, yeah. So I...
A: So another thing we are also thinking: this is per namespace, per snapshot class. How about per volume? Whether we should provide a way for the user to specify that only this number of snapshots can be taken per volume, and if we have already taken, say, that many or more, you cannot take any more unless you delete some.
B: The problem you're going to have here is, if snapshots are not linked to the volume from which they came, you have no way to compute this number; the volume could be gone.
C: But it's also the way we handle volumes today, right? I mean, I think what they're trying to get at is some consistency. It's the way we already manage things in Kubernetes, and it's not perfect because, like Harlan said, it's not true to what's actually happening on the back end in the storage. We don't have a way of enforcing that, but it's at least a way for an admin to keep a user from taking an unlimited number of snapshots and to keep the size reasonable.
C: So, in typical storage outside of Kubernetes, if I'm an admin managing snapshots, do I typically restrict those today? I mean, how is it done outside of this? What do those use cases look like, and are we trying to mimic those same use cases, and would an admin want to manage their storage in a...
B: In a lot of cases, what ends up limiting usage is cost. If people have to pay for the bytes used, they'll sort of limit their own usage, you know, if they're paying for the storage itself. But if you're providing a service to someone else and you don't have a good way to bill based on actual usage, then quotas do provide ways to prevent bad actors.
E: There are two things that I've experienced in the past in terms of quotas and limits. Usually the limit in terms of a number of objects, like the number of snapshots, comes up due to device behaviors or API limitations or something like that, because for the most part the only thing that is really of concern is sucking up all of the storage resources on the back end. So it's a capacity thing, and using things like a count or number of snapshots doesn't really mean much.
A: So we basically just provide some mechanism, and then how the user uses it depends on their environment. But those two, one is the number and the other is the size, seem like the common ways to restrict. So I think, since we provide this for volumes and so far I don't hear any major issue related to that, we can follow the same pattern, so users can use it in a convenient way. Yeah.
A: One thing is that users may want to prevent themselves from, say, forgetting and taking too many snapshots of a particular volume, if we provide this. But that's maybe not that critical, yeah.
E: You know, I see both sides of the argument, but it does seem to me, I kind of agree with Ben: it would be okay to leave this one out for the first pass, at least. Because the other thing is the driver can still respond if it has a specific limitation and say, hey, I can only create 100 snapshots of a volume and you asked for a hundred and one, sorry, whether that's through CSI snapshots or something else, whatever. I mean, it's not like it's an insurmountable problem to solve.
A: Okay, so it seems like we are settled on the storage and snapshot quota, and we will follow the same pattern as PVCs and...
A: And I also added something to the list; let's go back to the deletion protection. Last time we talked about a few scenarios, but I think we forgot one more scenario: since we have two objects, VolumeSnapshot and VolumeSnapshotContent, similar to PVC and PV, a user might delete the VolumeSnapshotContent
A: while it still points to the VolumeSnapshot. For PVC, I think we added a feature to add a finalizer, so you cannot delete the PV, the volume, if it still points to a PVC; currently, if you delete it, it will kind of fail and the status of the PV shows Terminating, but it's not actually deleted until, I think, the PVC is deleted. So for VolumeSnapshot and VolumeSnapshotContent I think we probably need the same protection; otherwise the VolumeSnapshot would still be bound to this content after it is deleted.
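A minimal sketch of the finalizer-based protection described here, mirroring the PV/PVC protection behavior; the finalizer name and the binding check are illustrative assumptions, not the actual snapshot controller code.

```go
package protection

import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Hypothetical finalizer name; a real controller would choose its own.
const contentProtectionFinalizer = "snapshot.storage.kubernetes.io/volumesnapshotcontent-protection"

// hasFinalizer reports whether the object already carries the protection finalizer.
func hasFinalizer(obj metav1.Object) bool {
	for _, f := range obj.GetFinalizers() {
		if f == contentProtectionFinalizer {
			return true
		}
	}
	return false
}

// protect adds the finalizer, so a delete request only marks the object as
// Terminating (deletionTimestamp set) instead of removing it immediately.
func protect(obj metav1.Object) {
	if !hasFinalizer(obj) {
		obj.SetFinalizers(append(obj.GetFinalizers(), contentProtectionFinalizer))
	}
}

// release removes the finalizer once the content is no longer bound to a
// VolumeSnapshot, letting the pending deletion complete. stillBound is an
// illustrative stand-in for the controller's real binding check.
func release(obj metav1.Object, stillBound bool) {
	if stillBound {
		return
	}
	var kept []string
	for _, f := range obj.GetFinalizers() {
		if f != contentProtectionFinalizer {
			kept = append(kept, f)
		}
	}
	obj.SetFinalizers(kept)
}
```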
A: This is the next item; I called it retention policy, right. So right now, yes, we don't have any policy: when a user deletes the VolumeSnapshot API object, everything is deleted, including the VolumeSnapshotContent and the physical snapshot. And for volumes we do have a policy, right? You can...
A: Maybe Recycle is no longer supported, but we have Delete and Retain. So even though you delete the object, the physical volume can still be kept, in case you need it. So for snapshots we can have a similar policy. One is Delete, which means if you delete the API object, the snapshot, everything is deleted. But if you want to have a Retain policy, then only the API objects will be deleted and you keep the physical snapshot, yeah. I thought...
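A rough sketch of what such a Delete/Retain policy on the content object could look like; the type and constant names are assumptions for discussion rather than the shipped snapshot API.

```go
package policy

import "fmt"

// DeletionPolicy mirrors the PV reclaim-policy idea for snapshot contents.
// The names here are assumed for illustration only.
type DeletionPolicy string

const (
	// Delete: removing the VolumeSnapshot also removes the bound content
	// object and the physical snapshot on the storage backend.
	DeletionPolicyDelete DeletionPolicy = "Delete"
	// Retain: removing the VolumeSnapshot removes only the API object(s);
	// the physical snapshot is kept until an admin cleans it up.
	DeletionPolicyRetain DeletionPolicy = "Retain"
)

// onSnapshotDeleted sketches what a controller would do under each policy.
func onSnapshotDeleted(policy DeletionPolicy, snapshotHandle string) {
	switch policy {
	case DeletionPolicyDelete:
		fmt.Printf("deleting content object and physical snapshot %s\n", snapshotHandle)
	case DeletionPolicyRetain:
		fmt.Printf("deleting API object only; physical snapshot %s retained for the admin\n", snapshotHandle)
	}
}
```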
A: ...whether we can have that. Currently we don't have protections, so like I said in the first item, the deletion protection: while the VolumeSnapshot and the content are still bound to each other, if you just delete the API object, the content, there's no protection; the API object will be gone but the physical snapshot will still exist. We want to prevent that, so after we add the deletion protection for that, yes, I think we can make sure the content is still okay.
B: I guess my only beef is that the name makes a lot less sense here, calling it a reclaim, because even with a PV you obviously have a way to reuse it: you can delete everything and then go use that volume for something else. With a snapshot there is no way to actually reuse it. I can imagine reasons you would want to keep it around, but calling it a reclaim policy is a little bit funky, because you're not actually going to reuse it for anything; you're just leaving it there.
A: So, yes, for volumes that makes sense, because a volume will have data, so the first user could write some data to that volume, and you do want to have protection before this volume is used by another user. But actually, if we move down to the next item, I'm thinking of snapshots in a way that's quite different from volumes, because with snapshots you're not, like, writing data to it, right?
A: Even if, let's say, you created it from one namespace, you might want to use it in a different namespace. If we say the snapshot points to this one content and you can no longer use it for any other snapshots, it seems kind of restrictive for snapshots, and sharing seems like a good use case for snapshots, right, yeah.
B: So you can't recycle a snapshot, obviously, but you're saying you could, if you had one that was marked Delete, change it to Retain, and then you could do this namespace-swap trick, and then after it was in the new namespace you could change the reclaim policy back to Delete and it would be just like it was created in the new namespace, yeah. And that's exactly the way it would work today.
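A sketch of the namespace-swap trick being described, under the assumption that an admin manually re-creates a content object pointing at the same physical snapshot and pre-binds it to a VolumeSnapshot in the target namespace; the field names only loosely follow the v1alpha1 CRD and are illustrative.

```go
package transfer

import (
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
)

// transferSnapshotContent builds a new VolumeSnapshotContent-like object that
// references the existing physical snapshot on the backend and pre-binds it to
// a VolumeSnapshot that will be created in the target namespace. This is the
// manual step after the original content was flipped to Retain and the old
// VolumeSnapshot was deleted.
func transferSnapshotContent(snapshotHandle, csiDriver, targetNS, targetSnapshotName string) *unstructured.Unstructured {
	return &unstructured.Unstructured{Object: map[string]interface{}{
		"apiVersion": "snapshot.storage.k8s.io/v1alpha1",
		"kind":       "VolumeSnapshotContent",
		"metadata": map[string]interface{}{
			"name": targetSnapshotName + "-content",
		},
		"spec": map[string]interface{}{
			// Reference the existing physical snapshot on the backend.
			"csiVolumeSnapshotSource": map[string]interface{}{
				"driver":         csiDriver,
				"snapshotHandle": snapshotHandle,
			},
			// Pre-bind to the VolumeSnapshot in the new namespace.
			"volumeSnapshotRef": map[string]interface{}{
				"name":      targetSnapshotName,
				"namespace": targetNS,
			},
		},
	}}
}
```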
D: So would I be using the Retain policy here to enable the sharing? Does the Retain reclaim policy need to be the mechanism to enable sharing, or do we want to be able to support sharing without using the Retain reclaim policy? Because, I mean, users shouldn't have to delete their VolumeSnapshot in order to enable sharing; they may want to keep using a snapshot themselves and also share it. So I don't think this should be the mechanism that facilitates sharing.
B: And it sounds again like what we're proposing is to mirror what happened with PVs and PVCs, but at the beginning of this, Jing, you said you see it very differently than PVs and PVCs. So I don't see how it's different after we make the statement that it's going to have the same policies as PVs and PVCs.
B: Retention typically has a different connotation in the context of snapshots. You talk about snapshot retention with regard to, like, a snapshotting schedule that is automatically taking and automatically deleting snapshots, and the retention is how many you intend to keep before you delete the oldest one, right.
D: I've had this question for the past two or three years, and I think this is the right time to ask it. When we set the reclaim policy for a PV to Retain, that leaves the PV around, and the only way one can consume the data corresponding to that PV is to manually create another PV with the same content, right, and have a PVC bind to it? That is the only way somebody can make use of that volume again in Kubernetes?
C: I think so, Garrett; you'd have to clone that volume and create another one. The idea of the Retain policy was, if the company said you had to keep the data around for 12 months or something, you could put it in Retain and it would sit there, and it would be up to the administrator to clean it up. Does that answer your question?
D: Yes, it requires manual intervention by the admin, whether it's for auditing or for any other purpose. But if, let's say, somebody accidentally deletes the volume and the reclaim policy is Retain, the only way one can reuse that volume is to manually create a PV that has the same content, yes.
H: And the idea is that the default behavior is all automatic: there is automatic provisioning, there's automatic deletion. If you explicitly opt out of that, then you're opting into a manual process. And if you don't like that manual process, you can always change the policy back to automatic.
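For reference, the manual step being described for a retained volume looks roughly like this: re-create a PV that points at the same underlying volume and pre-bind it to a PVC so only that claim can use it. Driver, handle, and size are placeholders.

```go
package retainpv

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// rebuildRetainedPV sketches an admin re-creating a PV over the same
// underlying CSI volume and pre-binding it to a specific PVC.
func rebuildRetainedPV(volumeHandle, driver, ns, claimName string) *corev1.PersistentVolume {
	return &corev1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{Name: claimName + "-pv"},
		Spec: corev1.PersistentVolumeSpec{
			Capacity: corev1.ResourceList{
				corev1.ResourceStorage: resource.MustParse("10Gi"),
			},
			AccessModes:                   []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			PersistentVolumeReclaimPolicy: corev1.PersistentVolumeReclaimRetain,
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				CSI: &corev1.CSIPersistentVolumeSource{
					Driver:       driver,
					VolumeHandle: volumeHandle,
				},
			},
			// Pre-bind so only this PVC can claim the retained data.
			ClaimRef: &corev1.ObjectReference{Namespace: ns, Name: claimName},
		},
	}
}
```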
A: Okay, so related to this policy, I'm thinking about, for snapshots, like I said, sharing seems very useful. For volumes, in most cases you don't want to share a volume across different namespaces or users, because it contains the user's data. But for snapshots, you might have, let's say, some snapshots and you want to use them in different scenarios, like one for testing, one for development, in different namespaces.
A: In some cases the user might also want to share these snapshots. Right now for volumes we have this one-to-one binding between PVC and PV, and I'm wondering why, for snapshots, it has to be the same way if sharing is useful. For a volume snapshot, you have the physical snapshot and you have a snapshot content for it, and currently, if you want to use the same snapshot in different namespaces, similarly you need to manually create another snapshot content.
B: I disagree that the difference has anything to do with the data inside the volume versus the snapshot. I think you can make the same arguments about people wanting to share data in a volume as you can for a snapshot. I think the salient difference between them is the consumption model: most volumes are read-write-once, not read-write-many. For a read-many volume you could make the argument, sure, it could be shared across namespaces, with lots of people mounting and attaching to it.
A: My idea is, currently, if you want to use that snapshot in a different namespace, you need another content object and then a VolumeSnapshot object to bind to it. Why not just have one content object corresponding to that snapshot, and you can have some policies on this content object...
C: Why wouldn't we clone that one, make it so it can be cloned, and have it create a new volume, so that we don't have this chaining going on? I mean, I assume the assumption is you create a snapshot and you want to give it to someone else within that namespace to use, or possibly in a different namespace, and you don't necessarily want to have the same retention policies; you want to be able to update that snapshot based on what you're running. So we're...
C: And I guess I thought, though, that we would leave snapshots the way they were, in that snapshots weren't technically usable until they were promoted to a volume. If we fall under that same mechanism, then we can restrict the cloning to the volume piece, and it can be from a snapshot or just, you know, from a volume that didn't happen to be a snapshot.
E: The advantage of that approach, well, not other than what you were just saying, but the advantage is that you don't have to worry about the linked cloning or the links to the volume. You would just create a volume from a snapshot, transfer the volume to the other namespace, and there's your volume.
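The "create a volume from a snapshot" step mentioned here corresponds to the (then alpha) PVC dataSource field; a rough sketch, with placeholder names and size:

```go
package restore

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// pvcFromSnapshot builds a PVC whose dataSource points at an existing
// VolumeSnapshot in the same namespace, so the provisioner creates a new
// volume pre-populated from that snapshot.
func pvcFromSnapshot(ns, snapshotName string) *corev1.PersistentVolumeClaim {
	apiGroup := "snapshot.storage.k8s.io"
	return &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: snapshotName + "-restore", Namespace: ns},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("10Gi")},
			},
			DataSource: &corev1.TypedLocalObjectReference{
				APIGroup: &apiGroup,
				Kind:     "VolumeSnapshot",
				Name:     snapshotName,
			},
		},
	}
}
```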
A: Okay, these are just some thoughts about a sharing policy for snapshots, since they do have some differences from volumes, and they seem to have more common use cases related to sharing. So here I just have this idea, but it's kind of just for discussion; it's not something we definitely want to do right now, yes.
A: The idea here is just that we might, at some point, allow many-to-one, I mean, multiple VolumeSnapshots can bind to the same VolumeSnapshotContent, instead of you manually copying the content many times. I don't see any dangerous thing that might happen from this many-to-one binding. But for volumes we want one-to-one, because the data is writable and they don't want to share it in many cases. Well...
D: The fact that you can have, say, a many-to-one mapping is not the problem. The problem is really who you can share your snapshot with: only the intended people, not just everybody. And this problem applies to volumes, it applies to other contexts as well, so I think that's the bigger problem you need to address.
A: Yes, so my initial thought is the content can have some policy saying which namespaces you are allowed to share this snapshot with. The very basic policy only restricts which namespaces you can share with; if we want to provide more of a full set of sharing policies, that's probably out of scope, unless we want to have more of a sharing policy across namespaces that covers both snapshots and volumes. Well...
A: Yes, so we can just leave it for now; we don't have much time for discussing this in much detail, and we will probably have a separate discussion related to sharing and cloning volumes and everything. So let's go through whether we have other things that need to be discussed. For the creation retry policy, the design, we said we need more design. I...
D: So the clock starts whenever I create the VolumeSnapshot object; as soon as I do a kubectl create on the VolumeSnapshot object, that's when the clock starts. And then let's say I have 30 seconds to take a snapshot, and that 30 seconds is relative to the moment when the snapshot object was created.
B: I think the safest thing to do, given the nature of Kubernetes, is to not do a retry and to make the system record what time it actually did take the snapshot. Then, if that was too late for the requester, they can delete it and try again. You don't have to over-engineer it; just say, hey, we'll tell you when we actually did it and let you decide if it was soon enough, because there's no way to make it go faster.
B: Or you could also just say we're never going to retry; we're going to make that the user's problem. We'll just return failure: we didn't create a snapshot, sorry. Or: we created a snapshot and here was the timestamp, and if that's too late for you, sorry. In either case you've got to ask again.
D: I think one reason why these retry policies make sense is because, for example, some applications have, you know, service guarantees such that they only tolerate, let's say, 10 seconds of downtime. So down the road, when we want to support application-consistent snapshots, we need to be enforcing that, and I'm not sure whether a storage admin...
H: The reason we wanted to include it in the retry process was because traditionally Kubernetes does, you know, attempt to drive towards a certain state. Of course, snapshots kind of don't work well with the declarative model, but given that we would have a period of time, let's say 5 seconds, 20 seconds, 30 seconds, within which a snapshot can be taken: if you end up with a network error the first time you hit the API, you have another 25 seconds to recover from it.
H: It might be a nice user experience for Kubernetes to automatically retry during that period. Can we end up designing that nice user experience in a sane way that doesn't make life impossible for the storage vendor or for the other components involved? I think the question is: is the trade-off worth the design complexity?
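A sketch of the trade-off being discussed, assuming a hypothetical per-snapshot deadline measured from object creation; none of these field names exist in the API, they only illustrate the retry-window idea.

```go
package retrywindow

import "time"

// snapshotRequest captures the hypothetical inputs: when the VolumeSnapshot
// object was created, the window the user tolerates, and the time the backend
// reports it actually cut the snapshot (nil if it hasn't happened yet).
type snapshotRequest struct {
	objectCreated   time.Time     // metadata.creationTimestamp of the VolumeSnapshot
	deadline        time.Duration // e.g. a 30s window (hypothetical field)
	reportedCutTime *time.Time    // creation time reported by the backend, if any
}

// decide shows how a controller (or the user) could interpret the outcome:
// succeed, retry within the window, or give up.
func decide(req snapshotRequest, now time.Time) string {
	switch {
	case req.reportedCutTime != nil && req.reportedCutTime.Sub(req.objectCreated) <= req.deadline:
		return "success: snapshot cut within the requested window"
	case req.reportedCutTime != nil:
		return "failed: snapshot was cut, but too late for the requester"
	case now.Sub(req.objectCreated) < req.deadline:
		return "retry: still inside the window (e.g. after a transient network error)"
	default:
		return "failed: window elapsed without a snapshot"
	}
}
```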
A: And also we have a kind of related issue: if you take a snapshot while the PVC is still not bound, or the volume is still not ready, it will fail, right? So whether we should retry until the PVC is correctly bound. It seems like just a nice feature, but it's not very critical, yeah.
A: So, okay, we can leave it as just P2 for now; like the rest of these, it's also not, I think, very critical, just a nice-to-have feature, so we can discuss it in the next meeting. Any other issues at all we want to add to the meeting, any features you think we are missing?