Description
Kubernetes Data Protection WG - Bi-Weekly Meeting - 30 November 2022
Meeting Notes/Agenda: -
Find out more about the DP WG here:
https://github.com/kubernetes/community/tree/master/wg-data-protection
Moderator: Xing Yang (VMware)
A
Hello, everyone. Today is November 30th, 2022. This is the Kubernetes Data Protection WG meeting. I think today's main topic is: Yvonne is going to give an update on CBT, and I know there are a few others, Prasad and Dave Carr, who also have some thoughts about this, so we can start talking about this. Yvonne, do you have anything to share? Okay. So let me... actually, I'll stop sharing and I'll make you host.
B
Yeah, thanks, Sam. I think, yeah... Prasad, you know, you put some time and effort into doing a prototype of the bitmap approach, right. Do you want to share the findings with the group? Any thoughts on that?
A
Do you have anything, like any link, to share? It'd be good, you know, if I had anything to show on the screen, rather than people just staring at this blank screen.
C
All right, I hope I'm sharing the right screen. Yeah, sure, yeah. So the thing we wanted to experiment with is to visualize what the bitmap would look like for the changed block response we get from EBS, and how we can serialize and send the response, and, at the client side, how we can deserialize and, you know, get the changed blocks' data back again, right. So we basically used mock data. So this is how we get the EBS response, the EBS changed block response, right.
C
It is based on real data, yeah. I had taken two EBS snapshots and, you know, used the EBS CLI.
C
The tokens are redacted here; they are not actual tokens. So this is how we get the response, and, yeah, we build the bitmap from it. So, again, this is just... I would not say it is the best way of doing this; we are just experimenting, so I believe this is okay.
C
So, from the response, we try to get the number of blocks, and since we can represent each block with one bit, the total bytes required would be the total number of blocks divided by eight; I mean, one byte can represent eight blocks, right, one bit per block. So, likewise, we can build the bit vector. We will iterate through all the changed blocks.
C
For each changed index, we will set the bit to one; otherwise, by default, it will be zero. And, yeah, we then serialize it, and the actual response will be sent as bytes. To deserialize, basically, from the bytes we got in the response, we try to build the bit vector again, so that we can then iterate over the bit vector and get a sense of which blocks have changed, right.
C
So, based on the index, we can then find the data: once we, you know, restore the volume, we have the metadata about which blocks have changed, and we can fetch the data from the volume. And to visualize it, yeah, so for the mock data, this is how the bit vector would look.
C
I don't think there is an option to unwrap this level. Yeah, so the response would be in bytes, so it will consume only the total number of blocks divided by eight bytes, and then, obviously, we can send this bytes response, deserialize it, and rebuild the bit vector to get the changed block information again. And, yeah, so Carl pointed out that this approach may not work, because we can encode only the indexes. This approach may not be extendable if we want to go into the data, if we want to adopt data paths in the future.
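A minimal sketch, in Go, of the serialize/deserialize round trip described above. The block indexes and sizes are hypothetical; this illustrates the bitmap encoding, not the actual experiment code:

```go
package main

import "fmt"

// toBitmap sets one bit per changed block index; one byte covers eight blocks.
func toBitmap(changed []int, totalBlocks int) []byte {
	buf := make([]byte, (totalBlocks+7)/8)
	for _, idx := range changed {
		buf[idx/8] |= 1 << (idx % 8)
	}
	return buf
}

// fromBitmap recovers the changed block indexes from the serialized bytes.
func fromBitmap(buf []byte, totalBlocks int) []int {
	var changed []int
	for idx := 0; idx < totalBlocks; idx++ {
		if buf[idx/8]&(1<<(idx%8)) != 0 {
			changed = append(changed, idx)
		}
	}
	return changed
}

func main() {
	// Hypothetical response: blocks 1, 3, and 9 changed out of 10 total,
	// so the payload is only ceil(10/8) = 2 bytes.
	payload := toBitmap([]int{1, 3, 9}, 10)
	fmt.Printf("10 blocks -> %d byte(s): %08b\n", len(payload), payload)
	fmt.Println("decoded:", fromBitmap(payload, 10)) // [1 3 9]
}
```

As noted above, this encodes only block indexes, which is why it may not extend to carrying actual block data later.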
B
Thanks, Prasad. Yeah, thanks for putting in the time to investigate this. Yeah, I agree with your assessment there. I think it does require the backup software to make some assumptions about...
B
...the bits, meaning, you know, I guess, at the end of the day, all the backup software can see is a string of bits or bytes, or whatever basic type we use there, but then it has to make some assumptions, because, in your test, right, the block indexes were one, two, three, four, five, six, seven, eight, nine, ten; but, as I remember, EBS has its own logical offsets. It might mean who knows what; only the EBS backend understands it, right.
B
So the backup software would be... we would be building the bitmap inside the aggregated API server, or controller, or whatever we choose, and then the backup software would be deconstructing it, reconstructing it, you know, trying to make meaning out of it.
B
On the other end, there are some assumptions there which, you know... as much as I think this is simple and great, I'm just not comfortable with the assumptions that have to be made on both ends. Yep, cool, okay. Anything else? Okay, yeah. So, in parallel, Jan and I have been exploring, you know, a possible alternative and, yet again, I...
B
...think I shared this thought with you, you know, last week, and it might be doable. The idea... I'm gonna share my screen, I guess. The idea is: instead of forcing the payloads down the wire, into the network and through the Kubernetes control plane, and crossing our fingers and hoping for the best, why can't we just write the data into a persistent volume and let the backup software decide when they want to mount it?
B
Show me the changed blocks between, again, this pair of snapshots; and the secret reference is really just, you know, a CSI requirement. So one interesting... the first interesting change that we want to propose is: instead of your maximum number of blocks, we ask the user to input the maximum size, in bytes, to be returned; the maximum size, in bytes, of the payload to be returned. Because I think we discussed this on Slack as well, right.
B
The number of blocks is really determined by the block size which, as we all know, may vary from provider to provider. You know, when a user says, hey, the maximum number of blocks that I'm willing to handle is 10,000...
B
in
a
way
like
it's
kind
of
meaningless
right,
because,
like
it,
ten
thousand
like
can
mean
something
else
from
another
provider's
perspective
when
their
box
size
is
512
kilobytes,
for
example,
in
EBS
another
provider
with
four
kilobyte
block
size,
you
know,
like
the
number
of
blocks
would
increase
like
significantly
so
instead
of
yeah
that
why
don't
we
just
enforce,
like
the
payload
total
payload
size
is
expressing
bytes.
You
know
anything
more
than
that,
like
we
were
just
not
want
to
receive
it
or
not
want
to
return
it
to
the
backup
software.
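A quick worked example of why a raw block count is ambiguous. The 10,000-block figure comes from the discussion above; the block sizes and arithmetic are illustrative:

```go
package main

import "fmt"

func main() {
	const blocks = 10_000
	// The same block count covers very different amounts of volume data
	// depending on the provider's block size (512 KiB vs. 4 KiB here).
	for _, blockSize := range []int64{512 * 1024, 4 * 1024} {
		covered := blocks * blockSize
		fmt.Printf("%d blocks x %d KiB = %d MiB covered\n",
			blocks, blockSize/1024, covered/(1024*1024))
	}
}
```

At 512 KiB per block, 10,000 blocks cover roughly 5,000 MiB; at 4 KiB, only about 39 MiB, which is why a byte limit is the more portable contract.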
B
So, timeout: pretty intuitive, I think. The main, the first magical ingredient here, I guess, is a property that allows the user to say: hey, don't send me the data over the network, because I don't want you to take down my Kubernetes control plane; instead, write it to this PVC. Provision this PVC, use a volume populator to inject, or write, all the data into the underlying volume, and then, when I'm ready, I'll spin up a pod to read it from there.
B
So, as we can tell, right, these properties are very familiar. I think we can go into more discussion about what kinds of configurable PVC properties we want to expose to the user, but some of the more common ones include the name and the namespace. So the user tells the CBT CSI driver, or the CSI sidecar: I want you to write the CBT metadata into this PVC, in this namespace, with this access mode, you know, ReadWriteOnce, stuff like that.
B
How the backup software chooses to access it, meaning, once this PVC is ready, only one pod can access this PVC. The reclaim policy: whether they want to delete it or retain it; you know, pretty standard API and configuration there. Resource requests for storage, you know, how big the PVC should be, and the storage class to use. And then, in return, so you can imagine, right...
B
...we can even do this without an aggregated API server; we just need a good old CRD controller. So users create this VolumeSnapshotDelta resource; underneath, it invokes the API, and then the CSI sidecar does its thing and, in return, updates the status with all the information of the relevant operation: you know, the number of blocks that we found, the block size that the provider covers.
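To make the shape of this proposal concrete, here is a minimal sketch of what such a resource could look like as Go API types. Every name and field below is hypothetical, inferred from the discussion, not an agreed API:

```go
// Hypothetical API types sketching the proposed resource; not an agreed API.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// VolumeSnapshotDeltaSpec asks for the changed blocks between a pair of
// snapshots and tells the sidecar where to write the resulting CBT metadata.
type VolumeSnapshotDeltaSpec struct {
	BaseVolumeSnapshotName   string `json:"baseVolumeSnapshotName"`
	TargetVolumeSnapshotName string `json:"targetVolumeSnapshotName"`

	// SecretName/SecretNamespace carry the CSI credential requirement.
	SecretName      string `json:"secretName,omitempty"`
	SecretNamespace string `json:"secretNamespace,omitempty"`

	// MaxSizeBytes caps the payload in bytes, rather than using a
	// provider-dependent maximum number of blocks.
	MaxSizeBytes   int64 `json:"maxSizeBytes,omitempty"`
	TimeoutSeconds int64 `json:"timeoutSeconds,omitempty"`

	// Where the volume populator should write the metadata.
	PVCName          string   `json:"pvcName"`
	PVCNamespace     string   `json:"pvcNamespace"`
	AccessModes      []string `json:"accessModes,omitempty"` // e.g. ReadWriteOnce
	StorageClassName string   `json:"storageClassName,omitempty"`
	RequestedStorage string   `json:"requestedStorage,omitempty"` // e.g. "1Gi"
}

// VolumeSnapshotDeltaStatus is updated by the CRD controller / CSI sidecar.
type VolumeSnapshotDeltaStatus struct {
	BlockSizeBytes int64  `json:"blockSizeBytes,omitempty"`
	NumBlocks      int64  `json:"numBlocks,omitempty"`
	Error          string `json:"error,omitempty"`
}

type VolumeSnapshotDelta struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   VolumeSnapshotDeltaSpec   `json:"spec"`
	Status VolumeSnapshotDeltaStatus `json:"status,omitempty"`
}
```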
B
Yeah, so we can go into... you know, it's one of those questions around: do we want to let the user decide the volume populator, or can our CSI sidecar decide the volume populator? The second slide deals with that. But let me first get people's...
B
You
know
feelings
about,
at
least
from
the
API
consumption
level.
Does
that
like
makes
sense,
you
know
if.
A
If this doesn't make sense... well, I just wonder, because creating a volume could be slow. So that's the only thing that, you know, jumps out in my mind right now. I don't know if it will add some performance penalty. Yeah.
B
There's a bit of a trade-off there, right, I think. Someone did the math on Slack or something: if it is, like, one terabyte of volume, I think the CBT metadata might be in the magnitude of hundreds...
A
It's the creation of the volume itself, right; it could take time to create a volume. Basically, every time we do this, we want to create a new volume, right. I'm just saying this will add some performance penalty.
B
Yeah,
you
know
like
it's
definitely
a
good
factor
to
to
consider,
but
there's
some
trade-offs
there
to
be
to
be
made
at
least
like
because,
like
the
main
like
I
guess,
like
you
know,
pushback
was
like
you
know.
We
don't
want
to.
We
don't
want
to
like
put
the
kubernetes
API
server
at
risk.
You
know,
as
we
flow
all
this
data
back
to
the
backup
software,
which
would
be
the
case
if
we
utilize
an
aggregated,
ABS
server
or
just
shove.
All
the
entries
into
yeah.
A
We probably just need to do some tests and see, like, how much... yeah.
B
Yeah, I think, at least from my perspective, this makes sense because, for one, you know, we as a community are familiar with PVCs; we know how provisioning works. We know how...
A
Yeah, but I think... I guess the reason for having CBT is to, you know, make it more efficient, right. But then, by using this approach, we're also adding some performance penalty here. I'm not saying this is the... I just want to say this is something that we need to consider, right. We need to look at the pros and cons of each approach. Yep, so.
E
So I think we're starting to design for the worst case, and that's kind of skewing everything at the moment. So one thing to remember is: CBT is an optimization. If you don't have CBT, everything works just fine; it's just slower, right.
E
Yeah, so, you know, even in cases where... so, for example, when you're using CBT, the worst case would be, say, every other block has changed. I'm not sure that CBT actually buys you anything in that situation, because reading every other block versus just reading every block... yeah, in theory it's half the I/Os, I guess; with SSDs it's not so bad, but you're not really saving that much. You know, there's a point where you just go: yeah, just read the whole volume.
E
It's not that much worse, because you may be reading in pretty big chunks. So I think we want to be careful about that. We can certainly put some limits and say, you know, if more than X percent of the disk has been changed, just read the whole thing; so that's one option. Mm-hmm. I think we should think about that and discuss it a bit, as to where the cutoff point is where CBT is no longer buying us anything, right.
B
I think that's a good point and, again, right, the main pushback that we got is still the data flowing through the network, through the Kubernetes control plane.
B
So,
like
you
know
it's
a
matter
of
like
the
so
it
sounds
like
yeah.
You
know
like
there
will
be
some.
So
if
we
go
down
the
path
of
like
trying
to
optimize
things
and
there
would
be
some,
then
we
have
to
be
explicit
about
like
the
supported
case
and
the
non-supported
case.
I
think
right.
E
No, because you simply come back and you say, for example, everything changed, right. That's one option: you return an extent that says everything changed, and then, you know, the backup software just deals with it like that.
B
So, yeah... so that, again, is the worst-case scenario, where everything changed, no? So how would we, regardless, still send the data back to the user?
E
That
that's
that's
not
what
I'm
trying
to
solve
there.
What
I'm
trying
to
say,
though,
is
that
we
don't
we
we
want
to
be.
We
want
to
be
solving
in
such
a
way
that
we
get
a
boost
most
of
the
time
if
we
are
doing
things
that
give
us
a
boost,
only
10
of
the
time,
but
we've
designed
for
like
the
worst
kit,
but
because,
because
we've
designed
for
the
worst
case,
we're
not
really
winning
yeah.
B
In a good-case scenario, I feel like this proposed solution will still work well.
D
It is all overhead, right; I mean, we have to take it into consideration as a factor of the overall data transfer for the backup. So it's a bit murky at this point how we evaluate it. But I agree with Dave, you know, that we should observe and make comments about the worst case. But, again, you know, it's targeted towards the average case. Maybe in the API spec there could be some thresholds about...
B
Yeah, I think, for the first step, it's just really text; you know, we're expecting just a text payload, so just write it to, like, a filesystem PVC.
B
Yeah, well, and then it'll be up to the backup software to decide how they want to consume it, right: do they want to consume a filesystem PVC or a block PVC?
B
From the CSI driver perspective, I think... would you agree that it should be a user-configurable property at the API level?
E
Those are pretty different code paths, right, because, you know, getting the file system is one thing, but then you're going to have completely different code to write to a file, up to a point, than when you are writing to the raw device and reading from it. Also, don't forget about the possibility of having Windows worker nodes, yeah. So that was something that's...
D
While we're at it: so this is then forcing the backup application to launch another pod to attach this PVC, right, so that it can be read. Yes, you're forcing a behavioral change in the backup application: whereas before you were just using APIs, now they have to spin up another pod, which itself takes time, besides, you know, the dynamic allocation, all these things. So, yeah, there's a lot of overhead, and maybe we could just classify what the overhead is, and then we can understand it better. I'm...
B
So those are good points, right. So, two things here, right: whether it is file system or block. Hopefully, because, with the volume populator API, there's nothing stopping us from having different kinds of volume populators to handle different types of population for different volume types; that's one thing. And, secondly, regarding the pod: you know, again, excellent point, right...
B
But
you
know
if,
if
we
like,
just
like
soon,
I
mean
like,
if
we
like
take
consider,
we
try
again
right,
try
not
to
like
step
into
that.
They
have
half
too
much.
But
once
like,
eventually
like
this
backup,
software
will
need
to
spin
up
a
part
to
do
to
to
to
to
get
real
meaning,
meaningful
operation.
Out
of
this
right,
it
would
need
to
spin
up
a
part
to
do
the
data
path
things,
and
this
way
it's
going
to
apply
like
all
the
CBT
metadata.
B
So, yeah, I feel like what we want to propose for the first step is: you have the sidecar container, which is both a controller as well as a volume populator. So the controller will respond to the VolumeSnapshotDelta request, and then, as the CBT payload flows back in, the volume populator within the sidecar...
B
...you know, a separate process right now... and then it would just go ahead and create a persistent volume claim, and then it would define the data source reference that points to a CR, or CRD, that will have all the information, obviously.
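A minimal sketch of the kind of PVC the sidecar might create, with a data source reference pointing at the hypothetical VolumeSnapshotDelta resource from the earlier sketch. This assumes the k8s.io/api of roughly this era (~v1.25), where PersistentVolumeClaimSpec.DataSourceRef is a *TypedLocalObjectReference; the names and API group are illustrative:

```go
package cbt

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// cbtMetadataPVC builds the claim the volume populator will fill with CBT
// metadata. Name, namespace, storage class, and API group are hypothetical.
func cbtMetadataPVC() *corev1.PersistentVolumeClaim {
	apiGroup := "cbt.example.k8s.io"
	storageClass := "standard"
	return &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "cbt-metadata", Namespace: "backup"},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			StorageClassName: &storageClass,
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("1Gi"),
				},
			},
			// Tells the populator which snapshot delta this PVC is for.
			DataSourceRef: &corev1.TypedLocalObjectReference{
				APIGroup: &apiGroup,
				Kind:     "VolumeSnapshotDelta",
				Name:     "my-delta",
			},
		},
	}
}
```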
B
So, of course, this CR doesn't have to be... it's not something that the backup software needs to be aware of, so it would be, you know, created along with the PVC, as would the changed-block output, however we want to name it; it would be something that the sidecar...
B
Yeah, yeah, I mean, you know, the good thing about the volume populator is... I mean, it's still the same underneath; at least for alpha, it's still the same code, around, like, you know, we need to write it into some sort of ephemeral storage.
B
Yeah, I'm sorry... so, yeah, I think, by putting it behind an API, there is an opportunity to implement different kinds of volume populators in the future. So nothing...
B
...to your concerns around different code paths, around whether this is a file system or a block PVC that gets provisioned. But at least, so, I... yeah.
B
The concern is around optimization, like whether this would slow it down significantly, I think. But at least, I think, there are a couple of things here that I personally like. It's the idea of, again, right, going back to that user experience: users, we, are familiar with how PVCs work; you know, we are familiar with the security around PVCs.
B
We
are
not
at
risk
of,
like
you,
know,
shuffling
Downs,
like
tons
of
traffic
through
the
control
plane
and
also
like
with
a
volume
populator
there's
like
flexibility
around
how
we
want
to
extend
like
things.
So,
if
we
put
things
into
ephemeral
storage
that
can
serve
as
a
cash,
you
know
when
user
resubmit,
like
a
volume,
populated
snapshot
again
with
the
same
base
and
Target.
We
know
oh
yeah,
we've
seen
this
before.
You
know
like
we're.
E
I feel like we've talked about adding a data path, yeah.
B
I guess so, yeah. I think, on the data paths... so, definitely, you know, we've talked about this multiple times, right; there are different implementations of data paths. So far, you know, I feel like the common, generic one is the one that is detailed in the data protection white paper, where we have some sort of data mover pod that does some mounting of the PVC, of the restored PVC.
E
So, if you go to, say, vSphere, then that means that it actually copies every block out of the snapshot into a new volume before you can even mount it, before you can even get access to the data. So, at that point, you've already blown away, you know, all of your advantages from CBT and everything else; you've doubled the amount of I/Os.
B
Mm-hmm. So, yeah, I think this approach doesn't stop the out-of-band data path that you folks want to do, right?
E
Well,
it
doesn't
add
anything
for
it,
though,
because
we
can't
use
so
if
we
use
this,
then
you
know
we're
basically
not
gaining
much
so
I
think
what
we'd
like
to
look
towards
is
eventually
having
a
network
data
path
that
is
common
that
sits
on
top
of
a
bunch
of
different
network
data
paths,
but
brings
them
together
into
a
common
into
a
common
protocol.
B
So, for the sake of discussion: instead of provisioning, you know, the PVC as a disk in the cloud, or a volume in the cloud, would it help if we folks just provisioned an ephemeral one, so it would just be local on the node, right?
F
Yeah, I mean, ultimately, that would be the ideal solution, right: we have a data path included along with this metadata. So, one question... sorry, I missed that part: what was the concern around how much traffic is okay for using the aggregated API server?
B
If a user has a super-buffed cluster control plane, then they may have some higher threshold; but, you know, if they're in some sort of confined, resource-restricted environment... Because, if you can imagine, even using EBS direct APIs as an example, I think it limits you to, like, maybe a couple of thousand blocks, and then the backup software will be like: okay, now give me more, give me more, give me more, because...
F
So, what we were discussing, and that was brought up earlier: say more than 10 or 20 percent of it has changed; it will say, okay, back up the whole volume. Would that address the concern of...
B
...overwhelming there? Yeah, so then that would be up to the backup software to decide, right; so then the implementation at the CSI sidecar becomes...
B
That
the
CSI
site
card
will
I,
don't
think
CSI
can
enforce.
That.
Can
it
because,
like
the
backup
software
optimally,
has
to
decide
okay,
how
much
my
threshold
is
well.
E
How much is... well, no, because it's the CSI driver returning the list of changed blocks, and so we could set a threshold in the CSI driver that says: we're not going to return more than X amount of data. If it's more than that, we simply return a single extent that says everything changed.
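A minimal sketch, in Go, of the driver-side fallback being described; the types and threshold are hypothetical, not an agreed API:

```go
package cbt

// Extent marks a changed byte range of the volume.
type Extent struct {
	Offset int64 // starting byte offset
	Length int64 // length in bytes
}

// capChangedExtents returns the extents as-is while the changed fraction is
// under the threshold; beyond it, CBT no longer buys much, so it collapses
// the result into one "everything changed" extent and lets the backup
// software fall back to a full read of the volume.
func capChangedExtents(extents []Extent, volumeSize int64, threshold float64) []Extent {
	var changed int64
	for _, e := range extents {
		changed += e.Length
	}
	if float64(changed) > threshold*float64(volumeSize) {
		return []Extent{{Offset: 0, Length: volumeSize}}
	}
	return extents
}
```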
E
Is it a security issue, though? I mean, this is a system performance issue, and if someone breaks it... I mean, people can break things all the time, and there are plenty of ways to break the system, and the...
A
We need to... so, if we have some way to say: maybe our sidecar, our CSI controller, can have this limit, so at least if they use our controllers, they cannot go beyond this limit, or something. If we can agree upon some limit, maybe we can talk to them again and see, yeah. So what is this upper limit, I guess? I'm not sure what limit is acceptable.
B
I think there are two things there, right. First of all, the most CSI can say, at least in my opinion, is: okay, I'm going to only return you X amount of data. It can't take the extra step, like what Shukrano was saying, and say, hey, you know, I'll do a full backup; you're better off doing it that way.
E
No, no... all we're returning is a list of extents, so all we have to do is just say an extent of everything, or an extent of everything from here on. Even, you know, if we know the size of the volume, we could even inject the record: we could say, yeah, you've returned us a thousand records; number 1,001 is going to be everything else, right. It adds one more record and then discards the rest that's coming back from the CSI driver.
E
We can't presuppose the intelligence of the backup software to...
D
Well, that's a VMware issue, though, right: VMware returns extents, but our API here is giving changed blocks. So that means we'd have to map the extents.
D
Changed block X plus so many bytes, yeah.
B
So, okay, so, yeah... so, I think... what was your question earlier?
F
What I was asking was: this discussion started because of the concern with the amount of traffic flowing through the control plane; but if we put some upper cap on that traffic, would that be an acceptable solution? Yes...
B
So, going back to... yeah, so, yeah, I guess this aligns with what was being asked earlier, I think. Yeah, I kind of floated this comment around in Slack as well. I think maybe it makes sense to say, hey, for alpha, you know, we, I don't know, support maybe up to 10 terabytes of volumes, or something like that.
B
Some
sort
of
arbitrary
threshold
like
for
Alpha
just
to
see
like
how
user
use
it
to
get
a
sense
of
like
you
know
when
the
problem
is
the
road.
What
it
really
looks
like
so
and
then
I
guess,
to
kind
of
to
your
point
like
say:
if
from
a
design
architecture
perspective,
say
if
we
go
in
and
say
hey,
you
know
what
like
we
did.
B
Our
math
like
we're
gonna
only
support
like
maybe
up
to
10
terabytes
of
volumes,
which
equates
to
a
couple
megabytes
of
metadata
or
worst
case
in
a
one
gigabyte
of
metadata,
the
other
Factor
there
and
fortunately,
unfortunately,
depending
on
how
we
look
at
it
is
a
a
lot
of
it
depends
on
block
size
right.
So
if
you
sell
like
one
gig
of
metadata,
if
the
block
size
vary
from
provider
to
provider,
then
you're
going
to
have
different
like
pagination
Behavior
different
request
response
pairs.
B
Then, yeah, further to the first part, right, there's the second part: the return path back to the user goes through the Kubernetes API server, and then you'd be like, okay, you know, so the block size, right; I don't know off the top of my head what the math is going to be, but one gig of metadata, it can be five...
B
...thousand blocks, it can be ten thousand blocks, depending on the block size, and all of these have implications for how many requests and responses are going to flow through the Kubernetes control plane, which, again, is the main concern. I don't think the architecture cares about how we implement it, at the end of the day, as long as we can take it out of the control plane's path, because we also don't want our component to be the one that's responsible for taking down people's servers, right.
E
There are... no, in the CSI driver, even so: if I want to convert, say, from 512K blocks to one-megabyte blocks, it's pretty easy; I can aggregate them together. Or, vice versa, if I want to say, oh, I'm going to return them in 4K blocks and I get a 512K block back, you just return, you know, the corresponding 4K blocks, 128 of them.
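A minimal sketch of that block-size conversion, with a hypothetical helper; splitting one changed 512 KiB block into 4 KiB blocks yields 128 child indexes, and aggregation would simply go the other way:

```go
package cbt

// changedIndexesAt re-expresses one changed block, recorded at the provider's
// native block size, as the equivalent indexes at the size the caller asked
// for. Splitting (e.g. 512 KiB -> 4 KiB) yields nativeSize/targetSize child
// indexes; assumes nativeSize is a multiple of targetSize.
func changedIndexesAt(idx, nativeSize, targetSize int64) []int64 {
	n := nativeSize / targetSize
	out := make([]int64, 0, n)
	first := idx * n
	for i := int64(0); i < n; i++ {
		out = append(out, first+i)
	}
	return out
}
```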
B
I think we just want it to be, again, generic enough to cover the 80 or 90 percent, not a specific provider aspect here. I feel like, also, again, right, just for the sake of discussion: it feels like we're taking on a lot more complex logic than "I just want to provision a volume and write it in there", you know. And listen, right now, that's my way of...
D
If I'm new to the scene and I get this, right: I want to just inherit efficient backup without investing a lot of time and logic into trying to understand the low-level infrastructure; just say "Kubernetes, feed me", right. That's the goal. So I don't know whether it makes very much sense without looking at a holistic solution.
D
I know we've parceled it out into the metadata of CBT, and then talked about the data blocks, but I'm just tossing this out there: does it make sense, in all situations, to treat this as a split-brain mechanism, or do we also have to look at the data path, to get the changed blocks, the actual blocks, as part of the solution?
E
I think it's a fair point, Carl. I think the reason we probably went down this, you know, bit-by-bit path is just that it is incremental, and getting the data path for the actual data... I mean, that is a large amount of stuff to move, and we'd wind up in a bunch of issues there. So that's probably, you know, why it hasn't really been on the table; but maybe it does need to be.
A
But we are... okay, we plan on adding this into the CSI spec, which is expected to be something for the control path, not for the data path. So I think we need to address this first; okay, we can't add too many new things in there.
A
So you're okay with this volume populator approach?
D
I don't totally understand it, but I'm okay. I do have to consider the data block movement, because, ultimately, if I'm actually...
B
Yeah, I mean, I agree with that, right: at the end of the day, yeah, we don't want to implement anything on the data path; but without thinking about how the user consumes this, if there's no feasible way, or at least no optimized way...
A
In the grand scheme of things, yeah, we don't have to, you know, describe how, you know, people will be using this; you know, after retrieving this, how they do the data part, right. But I'm just saying, we will not be providing a common API.
B
It could be that everything flows through the network, including the metadata plus the data blocks, so everything just gets put into a PVC, the metadata and the data blocks, eventually, from a backup software perspective, yeah. So, in the grand scheme of things, what is the total overhead there, I think? The other thing is also: if you want to talk about performance optimization, the network bandwidth isn't, right... it isn't without its overhead; maybe it's just less obvious until...
D
Yeah, but there are many, many networks, right. I mean, the device channel: if the vendor were to implement this with some spec, the vendor could use internal, dedicated device networks to move data, as opposed to whatever we have for the inter-process communication on the Kubernetes side, right.
A
Yeah, we only have one minute left, okay. So, how do we go from here? So now we have this new approach: does it make sense to move forward with the POC and see how that works? Or should we also come up with this upper limit and try to see if we can go back and talk to the API reviewers about this aggregated API?
B
I think, well, at least personally, I would like to give this a shot, to get some feedback from people like Jeff and David. I still think, one way or the other, something has to give; we can't...
A
Okay, so you want to do a POC with this. Maybe you can do a POC, and then, yeah, we can look at that next.