Description
Meeting of Kubernetes Storage Special-Interest-Group (SIG) Volume Snapshot Workgroup - 08 October 2018
Find out more about the Storage SIG here: https://github.com/kubernetes/community/tree/master/sig-storage
Moderator: Jing Xu (Google)
A
So for the first two, the deletion protection and the retention policy, we should be able to implement those in this quarter, and we have started looking at them. As for resource quota, I think that is also quite an important feature, but currently we don't have support for resource quota for CRDs, and I have asked our team who is working on that feature.
A
And for topology-aware snapshots, we have started working on that and already have a PR, so we can talk about it in a little more detail shortly in this meeting. At the last meeting we already talked about these ideas: snapshot topology, and also revert and in-place restore, which we are still thinking about so far.
A
This quarter we would like to gather some feedback from users who can start using the snapshot alpha feature, so that we can learn more about how we should implement those functionalities and whether they are needed at all. It's not a very urgent feature that we need to implement now, so we want to have some feedback about those first. And for the last items, snapshots for groups of volumes, those are probably very useful features.
A
But again, for this quarter we want to get some feedback and see how we can provide those functions. For consistency groups, meaning a group of snapshots taken together, we will initiate some discussion on that, and then we can write something up. And for volume cloning, I think we will coordinate with the others who are working on clone.
B
Okay, so we talked about this at the last CSI meeting.
B
The reason we have these two separate ones is that we are thinking, for a volume plugin, it's possible that it only has one capability. For example, the volume doesn't have any topology constraints but the snapshot does, or the other way around: the volume has topology constraints, but the snapshot has no topology at all. So in this case we want to separate the capabilities, so that, for example, if we only have one...
C
Sorry, I don't know what's wrong with this headset. My basic question is: since snapshots can't be attached to nodes, why does accessibility matter? All you can do is create volumes from them.
B
Yeah, because even in the CSI spec, if you look at what we already discussed for create snapshot, there are two phases, right? The first phase is that the snapshot is cut, and the second phase is uploading. But not every storage system does the uploading. Right now it's mainly the cloud providers, like Google Cloud and AWS, where the snapshot is uploaded as part of the same create snapshot process.
B
That's why we have this topology; we do have something for it in the design, right? That's why we were discussing the statuses and how to tell that it is done. We talked about that for a long time, precisely because we have those two phases.
B
Okay, so on to the next one. This is just a naming change: we modified the accessibility constraints that were previously defined for volumes only, so now we added "volume" in front of it. Similarly, for the message in the create volume response, we just added "volume" in front of the accessibility constraints.
B
This is the topology requirement, basically just making changes to the definition so that it considers snapshots as well. Everywhere it said "volume", we also added "snapshot". I think it might make more sense if I just go to the create snapshot request, because that one...
B
And then we can look at it as a topology requirement. In the create snapshot response we also added topology. Basically, after the snapshot is provisioned, this field tells us from which region or zone we can access this snapshot. And then let's go look at...
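The request/response shape being described can be sketched, very roughly, in Python. The real CSI definitions are protobuf messages; the type and field names below are approximations of the spec, not the spec itself.

```python
# Rough model of the topology fields discussed above. Names approximate the
# CSI messages (Topology, accessibility requirements, accessible topology);
# they are illustrations, not the actual spec definitions.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Topology:
    # e.g. (("region", "us-central1"), ("zone", "us-central1-a"))
    segments: tuple  # tuple of (key, value) pairs so it stays hashable

@dataclass
class CreateSnapshotRequest:
    source_volume_id: str
    # Where the caller needs the snapshot to be accessible from.
    requisite: list = field(default_factory=list)

@dataclass
class Snapshot:
    snapshot_id: str
    # Reported by the plugin after provisioning: the regions/zones the
    # snapshot can actually be accessed from.
    accessible_topology: list = field(default_factory=list)

def accessible_from(snapshot: Snapshot, node_topology: Topology) -> bool:
    """True if the node's topology matches any segment set the snapshot reports."""
    node = dict(node_topology.segments)
    for topo in snapshot.accessible_topology:
        if all(node.get(k) == v for k, v in topo.segments):
            return True
    return False
```

A scheduler-like consumer would compare the returned `accessible_topology` against a node's labels before deciding where a volume restored from the snapshot can land.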
C
The thing that's on the tip of my mind is that I might want my snapshot in more than one place, because I'm using it as my disaster protection scheme, or I might be worried about an entire site being lost, so I need to make sure my snapshot is in at least two places so that I don't lose my data. That's not an accessibility concern, that's more of a redundancy concern, but it sounds like you could achieve it with this mechanism. Right?
D
So if the snapshot or the volume is capable of replication, you'll have an opaque parameter that says "turn replication on", and then these topology parameters are used in conjunction with that to say: I want it to be replicated to these two racks, regions, zones, whatever. Does that make sense?
D
I believe most systems are not going to allow accessibility without replication. So if you, for example, specify multiple zones and you don't turn on replication, then your volume is going to be accessible from a single zone, and the same thing with a snapshot. The snapshot or volume object that you get back tells you which zones or topology segments it's accessible from, so when you get the response you're going to see: oh, I can actually only access this from one. It's not until you provide that opaque parameter to say, "hey, I'm..."
C
But I'm worried about the case where you have a back-end that says: well, it's accessible from all the regions, because I have my networking set up such that I can move the data where it needs to be when you want it to be there, but I'm only going to keep one copy, because it's wasteful to have two or three or four copies. You can blur the line, because if all "accessible" means is that you can use it in that place...
Mm-hmm.
D
Yeah, I think that risk exists. The number of storage systems that I've seen do that is very low, and the fact that this is already part of the API... I think consistency is better than trying to optimize around that, and I'm...
B
So we actually have some comments here talking about that, about those two together, right? It actually talks about an opaque parameter in the create volume request: if that says this volume is replicated and is accessible from two zones, then based on that, the volume or snapshot should really be accessible from two zones. So here is one place where replication is already being mentioned.
D
To be clear, the specification does not require replication, and that is on purpose. We want to leave it open to let the storage system define whether they want to do accessibility via replication, or through some sort of network setup, or through some other setup. The way that the user discovers what is accessible or not is through the object that's returned, which gets marked, and the way that they request replication is through an opaque parameter.
D
Fine, and there's a question from Andrew in the chat. His question is whether the specification requires replication, and if it doesn't, how do you request it. The specification does not require replication, and you request it through an opaque parameter. All the create operations in CSI have key/value opaque parameters which are defined by the volume plug-in.
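As a rough illustration of how such a plugin-defined opaque parameter could interact with requested topology: the parameter names below (`replication`) are hypothetical, since CSI leaves these keys entirely to the storage plugin and the orchestrator just passes them through.

```python
# Sketch of a plugin-defined opaque parameter driving replication.
# The "replication" key is hypothetical; CSI does not define it.

def plan_accessible_zones(requested_zones, parameters):
    """Return the zones the created volume/snapshot really ends up accessible from.

    Mimics the behavior described in the meeting: if replication is not
    switched on via the opaque parameter, the backend keeps one copy, so
    only a single zone ends up accessible, regardless of what was requested.
    """
    if parameters.get("replication") == "true":
        return list(requested_zones)
    # No replication: the backend picks a single zone to place the one copy.
    return list(requested_zones)[:1]
```

The caller then discovers the actual accessibility from the returned object, as described above, rather than assuming the request was honored in full.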
D
The second part of his comment is: "it sounds like accessibility and replication should be completely orthogonal issues." Yes, agreed. So out of scope, then? Yes, replication should be out of scope, because essentially the CO doesn't really need to be aware of the fact that this thing is replicated under the covers. Accessibility, on the other hand, does need to be part of the spec, because the CO needs to take actions based on the fact that it is actually accessible from multiple locations, for example in order to do scheduling.
D
I think an argument can be made that multi-accessibility versus replication could result in different types of scheduling requirements, and if that turns out to be the case, we could potentially add some sort of replication parameter or indicator to the volume and snapshot response objects. But so far I haven't seen a concrete use case for that, and I would prefer to start with a minimal API and add things to it rather than go the other way.
D
One more comment from Andrew in the chat: "the comments imply that behavior; I think we should be explicit that it isn't guaranteed." I agree with that. If there is an implication, we should be clear that it's not required. Feel free to comment specifically in the PR if there's anything that is unclear, and we can take it from there.
A
Yes, okay. So besides the list we discussed so far, I think one more thing we want to add is snapshot preparation. I don't think it's a must-have feature, but I want to discuss it a little bit to get some feedback. I think we discussed something related to this before. Right now in our API we don't have any support for preparing the application before taking snapshots. That means we need to let users know it is not an application-consistent snapshot.
A
If you snapshot directly, you might not get data consistency. So the user must prepare the application manually before taking a snapshot: they can quiesce their application, or even freeze the filesystem, or, to be the safest, unmount the filesystem in some cases, and then they can ensure the snapshot taken through the API is consistent. It's also possible for us to provide some support.
A
For example, we could have the controller take care of, let's say, the filesystem freeze before taking snapshots, and also add some hook for application freeze. Right now the application is running in containers in a pod, and the kubelet is responsible for the lifecycle of the application and the pod. So in order to provide such a hook, the kubelet could watch them.
A
That hook would need to be specified in the pod spec, and the kubelet could be responsible for executing some commands to prepare the application before taking snapshots. It's definitely not an easy task to provide such support. So I want to hear, as a general idea, whether we should think about providing this, or whether it's not even necessary. The benefit I can see so far is...
D
The challenges are all the things that you mentioned. One more thing to keep in mind as we design this: the feedback that we have from SIG Architecture is to ensure that the way we do this is generic enough that it can be reused for potentially other types of features and other lifecycle hooks.
C
One of the big challenges with these kinds of filesystem-freezing things is controlling the timing: how long is it frozen for, how does it time out, how do you make sure that it's frozen for the shortest amount of time? We had, I think it was in the regular CSI meeting, a general discussion about deadlines and snapshots and how long you have to take them.
C
It gets even worse here, because now your application is actually halted until the snapshot gets taken, so you've got to have a way to put a time bound on it. If, for whatever reason, the snapshot can't get taken, you have to remember to unfreeze and fail the whole operation. We've got gRPC going on, and it really seems relatively poorly suited to doing things in a time-bounded way.
A
So in the API for volume snapshots, right now we do consider this timing. We have a ready flag for this; no, not the ready flag, actually the creation time. When the snapshot is cut, we record the cut time, and that indicates that the application can be resumed, so a controller should be able to detect when the application can be resumed. And if your create snapshot fails, then we have an error, and the error state will specify: okay, you failed to create the snapshot.
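The two-phase status just described, with the cut time recorded first, readiness only after upload, and an error state on failure, can be sketched as a toy model. The field names are approximations of the snapshot status being discussed, not its actual schema.

```python
import time

class SnapshotStatus:
    """Toy model of the two-phase snapshot status discussed above.

    creation_time is set once the snapshot is cut (from that point the
    application can be unfrozen and resumed), while ready flips only after
    the possibly much slower upload completes. Names are approximate.
    """
    def __init__(self):
        self.creation_time = None   # set when the snapshot is cut
        self.ready = False          # set when upload/post-processing is done
        self.error = None           # set if the create fails

    def mark_cut(self, when=None):
        self.creation_time = when if when is not None else time.time()

    def mark_ready(self):
        self.ready = True

    def mark_failed(self, message):
        self.error = message

    def can_resume_application(self):
        # The app may resume as soon as the snapshot is cut, or as soon as
        # the create has definitively failed; it need not wait for upload.
        return self.creation_time is not None or self.error is not None
```

A controller watching these fields would unfreeze the application on `can_resume_application()` rather than waiting for `ready`.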
C
But as the user of the application, I might say: I'm willing to hold my application for up to 30 seconds to take a consistent snapshot. If, for whatever reason, it's going to take longer than 30 seconds, I'm going to give up and let my application start running again, so the users on the other end don't get upset. So you need a mechanism to say: well, the snapshot is happening, it's just going too slowly, so I need to abort and get my application up and running, because I'm not willing to wait for it.
D
Part of whatever this lifecycle hook is going to be is that we're going to have to define what that period of time is: whether it's fixed or static, or something that's negotiated. And if it's negotiated, is it negotiated between the application and the storage system and Kubernetes? Who are the players involved? How do we do that?
D
So I think all of that needs to be thought through. But the natural evolution of this feature is that if we want to get to a point where we're able to take application-consistent snapshots, we need to have some sort of hook into the application; it's not going to work without it. I completely agree with the challenges that you laid out, though, and we'll have to think through those.
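The time-bounded freeze/cut/unfreeze flow the group is debating might look roughly like the sketch below. Everything in it is hypothetical: the hook callables, who owns them, and the timeout value are exactly the open design questions above.

```python
# Sketch of a time-bounded freeze -> cut snapshot -> unfreeze flow.
# The freeze/cut/unfreeze callables and the 30s default are assumptions,
# not part of any existing Kubernetes or CSI API.
import threading

def snapshot_with_freeze(freeze, cut_snapshot, unfreeze, timeout_s=30.0):
    """Freeze, try to cut a snapshot within timeout_s, and always unfreeze.

    Returns (ok, result). If the cut does not finish in time we give up and
    unfreeze so the application resumes; the snapshot attempt may still be
    running in the background and would need cleanup by the caller.
    """
    freeze()
    try:
        result = {}
        done = threading.Event()

        def worker():
            try:
                result["snapshot"] = cut_snapshot()
            except Exception as exc:  # surface failures to the caller
                result["error"] = exc
            done.set()

        threading.Thread(target=worker, daemon=True).start()
        if not done.wait(timeout_s):
            return False, "deadline exceeded; aborting so the app can resume"
        if "error" in result:
            return False, str(result["error"])
        return True, result["snapshot"]
    finally:
        unfreeze()  # never leave the filesystem/application frozen
```

The `finally` block is the crux of the concern raised above: whatever else happens, the application must be unfrozen within a bounded time.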
A
Multiple pods can use a volume, right? We would have to potentially prepare multiple pods at the same time and then freeze them and unfreeze them, those kinds of consistency issues. But right now we already have two kinds of timing to think about. One is for create snapshot: we also mentioned that for create snapshot we could have a period of time you are willing to wait while trying to take a snapshot, right? We don't have that feature yet, and then...
D
The underlying concern that Ben is raising is around whether we have a time bound, and I think this has come up on this call in previous meetings as well: whether we time-bound how long a snapshot can take. Previously this was discussed in the context of retries. What happens if the snapshot-taking process is taking a long time and is still in progress, but is beyond the deadline?
D
What do we do then? It's worth thinking through how we define that in the CSI spec. Maybe we can, because these calls are supposed to be synchronous, treat the termination of the connection from the caller side as an indication to the storage system. And this is just me throwing out ideas: yes, it could be an indication that "hey, I give up; I requested this snapshot, but you took too long, so do whatever you need to do to clean it up, I don't need it anymore."
D
Maybe we should make that explicit in the spec, that if a timeout happens... but the problem is the spec says these calls are supposed to be idempotent, and I believe create snapshot also says that. So then, ultimately, the caller would be responsible for doing the cleanup regardless. I think it's worth taking a note and thinking through how we can clarify the spec and what can be added to handle this. I think all of this will need to go in before GA.
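The idempotency point can be illustrated with a toy backend. Because create and delete don't change the outcome when repeated, a caller that timed out can safely retry or clean up afterwards. This is an illustration of the property, not the CSI API itself.

```python
# Toy backend showing why idempotent create/delete makes caller-driven
# cleanup after a timeout safe. Not the CSI API; names are illustrative.
class ToyBackend:
    def __init__(self):
        self.snapshots = {}

    def create_snapshot(self, name, source_volume):
        """Idempotent: repeating the call with the same name returns the
        existing snapshot instead of creating a duplicate."""
        if name not in self.snapshots:
            self.snapshots[name] = {"name": name, "source": source_volume}
        return self.snapshots[name]

    def delete_snapshot(self, name):
        """Idempotent: deleting a snapshot that never completed, or was
        already deleted, still succeeds, which is what cleanup after a
        timed-out create relies on."""
        self.snapshots.pop(name, None)
```

A caller whose create call timed out can thus either retry the create (getting the same snapshot if it did complete) or issue a delete (which succeeds either way).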
A
Okay, sure, so we'll think through all these things, especially the process in the CSI spec and whether we need something special, and also explain it in more detail. That was a good discussion. That's what I have so far, so I think next time we will discuss it in a bit more detail. It's probably not required for this quarter, but it's definitely worth starting the discussion as early as possible, since it may require big changes in the snapshot API and maybe also in the CSI spec. Yeah.