Kubernetes Data Protection WG - Bi-Weekly Meeting - 14 December 2022
Meeting Notes/Agenda: -
Find out more about the Data Protection WG here: https://github.com/kubernetes/community/tree/master/wg-data-protection
Moderator: Xiangqian Yu (Google)
E
Sorry about that, folks, I was kicked off for some unknown reason. Anyway, happy holidays. Today is being recorded, right? Yes. It's Wednesday, 14 December 2022, and this is probably the last working group meeting of the year. Today we have only one agenda item: Ivan and Prasad talking about CBT updates, following up from the last session, and then we'll open the conversation to the entire team.
D
Yeah, so Prasad, I'm just going to give two quick updates, and then after that...
D
...you can share whatever you have on your plate. So, yeah, hi everybody. Since the last working group meeting I've been trying to spend some time on a prototype of the CBT KEP using the volume populator mechanism. I haven't had a chance to finish it yet, but I just want to quickly share the status today.
D
Let me just switch over to my screen here; you should be able to see my Google slides. Yes, I've just been writing the code and trying out a few things, and I think I've gotten to a point where the API call can be very simple. The idea here, again, is that I continue to be open to feedback and discussion.
D
The idea is that for the backup software to initiate a CBT request, it essentially just creates the VolumeSnapshotDelta CR, defined by a CRD, because in this approach we don't need an aggregated API server. The backup software also manages a PVC.
D
In the last meeting we talked about letting the populator handle all of this PVC management. I've coded it up and tried a few things, and I think it's easier and simpler if the backup software has full control over the lifecycle of the volume and the disk where, eventually, all the CBT entries will be written. The general idea is still that the backup...
D
...software says: I want the delta between two snapshots, and, by the way, I'm going to pre-provision one volume, or a pool of volumes, and set the dataSourceRef to point to the VolumeSnapshotDelta CR that it created. Then, within the CBT populator, there is a controller that watches for the creation of the PVC, and it identifies that, hey...
D
...this is a dataSourceRef that I know how to deal with, because the CRD has been registered with me, and then it will go off and do its CBT computation and so on.
D
I'll come back to address the concerns we had last time about latencies and extra resource overhead in a few minutes. So the CBT volume populator, within the CSI driver on the left-hand side, issues an RPC call to the CSI plugin, gets back all the CBT payload, spins up an ephemeral pod, mounts the volume that was pre-provisioned by the backup software, and just writes...
D
...all the in-memory CBT entries to the volume. So now the backup software can just decide when it wants to spin up a data mover pod, or whatever other pod, to fetch those CBT entries.
D
So, the payload here comes back in JSON format, or whatever format, and instead of pushing it out onto the Kubernetes in-cluster network, we just write it into that volume that was previously provisioned and managed by the backup software.
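A minimal sketch of that write step, with the same caveats: the entry shape, file name, and mount path are invented for illustration, and the actual payload format was still being discussed.

```go
package main

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// ChangedBlockEntry is a hypothetical shape for one CBT record.
type ChangedBlockEntry struct {
	Offset uint64 `json:"offset"` // byte offset of the changed range
	Size   uint64 `json:"size"`   // length of the changed range in bytes
}

// writeDelta persists the in-memory entries onto the pre-provisioned volume
// mounted at mountPath, so a data mover pod can read them later without the
// payload ever crossing the Kubernetes control plane.
func writeDelta(mountPath string, entries []ChangedBlockEntry) error {
	data, err := json.Marshal(entries)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(mountPath, "cbt-delta.json"), data, 0o600)
}

func main() {
	_ = writeDelta("/mnt/cbt", []ChangedBlockEntry{{Offset: 0, Size: 4096}})
}
```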
D
So that's the general direction of the prototype. I haven't had a chance to really go into more benchmarking effort to see how long it takes to provision PVs and a volume, but based on my very, very preliminary testing in our internal lab with FCD, I could get a 10-gig volume in like two or three seconds. Of course...
D
...that's all cluster-dependent and environment-dependent, right, so it could be fast in one environment and a lot slower in another. But so far, in my preliminary testing, there hasn't been too much of an overhead in provisioning volumes up to around 10 gig, which I would imagine is a reasonable size for your CBT metadata. It'd be interesting to see whether that continues to scale...
D
...if we change the size of the volume. I guess one of the things that I like about letting the backup software pre-provision the PVC is that it can choose the size, the speed, whatever it wants, of the volume; you can pre-provision a pool of them if you want, etc. The assumption I made there is that backup software is generally good at provisioning volumes.
D
I guess if that's not a universal assumption, if there's backup software out there that's not good at provisioning volumes, then maybe that's a limitation of this prototype.
D
At least on our side, again, I can't speak for other providers, but we could do like 50 or 100 a second in parallel. That is us, right, so I can't speak for other providers.
D
Yeah, good point: nothing is free, right. With the aggregated API server there's stuff there too; there will be some sort of overhead one way or the other. I think it's a matter of stability over the network versus the stability of writing large amounts of data into volumes.
D
So yeah, I mean, again, this is one of those things where I'd be interested to hear how other folks feel about it.
D
It is the volume populator, essentially. So, the trade-off again, right. I know, Xiangqian, you haven't been around; I think the main pushback we got from SIG Architecture is about the amount of data that we push through the Kubernetes in-cluster network, going through the control plane.
D
One could argue that there will still be networking, but that happens between the provider and the CSI plugin, the kubelet essentially, when the data is transferred. At least it doesn't go through the Kubernetes control plane network. That's, I guess, the objective I had in mind when I came up with the volume populator approach.
D
So what do you think? I know you haven't had a chance to, yeah...
D
It's just the PVC API that initiates the volume populator, right. I'm still learning about the volume populator as well, these past couple of weeks, but the idea is that the volume populator watches the PVC resource, yes.
D
You know, the dataSourceRef is, I guess, the entry point, if you will, into the volume populator execution. Previously, to do a restore, we only had dataSource, and we were limited to a VolumeSnapshot or another PVC.
D
So the dataSourceRef is just an entry point into the volume populator.
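The dispatch just described boils down to a small check on the claim's dataSourceRef; a sketch, reusing the hypothetical group and kind from the earlier example:

```go
package main

import corev1 "k8s.io/api/core/v1"

// isOurs reports whether a watched PVC's dataSourceRef names the kind this
// populator registered for; only then does the populator act on the claim.
func isOurs(pvc *corev1.PersistentVolumeClaim) bool {
	ref := pvc.Spec.DataSourceRef
	return ref != nil &&
		ref.APIGroup != nil && *ref.APIGroup == "cbt.storage.k8s.io" &&
		ref.Kind == "VolumeSnapshotDelta"
}

func main() {}
```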
D
Yeah, essentially, right. It's up to the volume populator to decide how that quote-unquote restore works. I tend to think of the volume populator not so much as restoration but more as pre-populating the volume with data, so that the volume is not empty.
D
So whether it's restoration, or the traditional init-container kind of approach to pre-populate data, I don't think of it as a restore operation; it's more a population type of operation.
E
It might be concerning to the API reviewers if we want to take this approach, because of what the data source is here. It sounds like what the populator would be doing is actually creating a volume whose content is the VolumeSnapshotDelta computed from two snapshots; it's basically a restoration process, except we're storing the intermediate results of the process. It might be a little bit confusing there.
D
Yeah, so this volume here is pre-created by the backup software.
D
The idea, sorry... inside the code it's just: hey, go find this thing, find the volume snapshots, and then issue the gRPC call. So, in a sense, maybe the potentially confusing part is that this is not a data source per se, not literally a data source. It feels like the way for the volume populator to say: okay, this PVC will contain data from this particular thing.
B
I hadn't come across this idea that we could put the changed block data into a PV and then read it from the PV. So the CBT data is coming from the CSI provider, being provided to the volume populator, which then writes it into the disk, and it's basically going into a file there. So we could read the file by attaching it, sure.
D
Yeah, and around this information, at the end, the idea is: yes, this ephemeral pod writes it, and it would just asynchronously update the state, which you can store in some resource. Eventually, when it's all said and done, it would just say: okay, we're done; how many blocks did I see; the block size the provider told me was this amount; and how many bytes have I returned.
D
No, the CBT populator would have... okay, I see what you mean there. The CBT entries will still be in memory; somehow it needs to find a way to pass those in-memory...
D
...CBT entries to this guy, and this guy can write them to the volume there. I think, yeah...
D
Yes, yep. I think, inside the volume populator project, they have a hello-world populator, and what they did was put the data into an ephemeral file shared between the populator and the ephemeral pod, and the ephemeral pod is able to read that file, which sits on the CBT populator's ephemeral volume.
D
Yeah, the original idea, at least, is that the data mover pod would mount the CBT volume and then, at the same time, ideally it should also be able to mount the volume with the actual blocks, right. This is not taking into consideration the data-token approach that was brought up last meeting; it's just the current way of how the data mover pod would mount the volume to get to the actual volume blocks.
D
So at least that part of the data path, hopefully the backup software would know how to do, right. From the CSI perspective, that's kind of out of our scope.
D
I guess, yeah, I think that's definitely a legitimate concern, and it seems like multiple of you have brought this up. I'm at a point where I feel it would be nice, or not just nice but necessary, to get some benchmarking to see how bad it is. Again, it's just like the networking argument.
D
Right: what was presented to us is that it's always bad, but nobody knows how bad it is. With one approach or the other, it's up to us to come up with some actual numbers to convince ourselves, and convince others outside of this working group, that it's not that bad.
D
So, like I said, I haven't had a chance to fully benchmark the overhead, but at least on our side it didn't seem like that big of an overhead; it's something that, at least for our product, we can accommodate.
D
But I can't speak for other providers. On our side we could do like 50 to 100 easily in a second, in parallel.
D
Exactly, so it works for us; at this level it feels like it works for us. That's why I said I can only say it works for us, and by us I mean our products, also based on some of the internal benchmarking we have with FCD and so on. And hence, you know, this is still the prototype phase; we still have to have the discussion about how this can serve the community at large, not just one or two specific providers.
D
Okay, so before we go back to the resource overhead: from an API and workflow perspective, is it reasonable? Because if there's something fundamentally wrong with the API or the workflow itself, then we don't even need to argue about resource contention and overhead, right.
D
What makes... why is that a blocker?
E
I'm not sure about how... I was trying to imagine how backup software can make that orchestration happen, right. The dataSourceRef is basically designed... or, I'm not sure what the current status is, maybe it changed, but in the early days it was purely designed to serve as an external source for populating a volume.
D
I think you just use some sort of CR... in the example that I saw, it's just a CRD, right, and underneath, the implementation still came from a static file or a text string. I feel that kind of data source detail is an implementation concern.
D
I mean, I know which example you're referring to, but I felt that kind of data source detail... I guess the whole point of having an API is that the backup software doesn't need to know what is behind that CRD, right. You just look at it from the API perspective.
D
So I think... I haven't finished the implementation of this mounting part yet. Essentially, once it gets the CBT payload, initially it's going to be in memory, right, so there would have to be some sort of local ephemeral volume mounted to the populator, and that's where the in-memory CBT entries would get written. It's kind of like...
D
...what the volume populator example did, I think; and then, with the ephemeral pod, we just have to move it from one ephemeral volume to the other.
E
And you have to make sure the ephemeral pod is scheduled on the changed-block populator's node as well, and that it reads it... I don't know which ephemeral volume the populator was using... before the operating system actually recycles it.
D
Exactly, yeah, that's a very good point: you cannot have the two pods on two different nodes, regardless of what the ephemeral volume looks like underneath, on the node.
D
Then we might as well just go back to directly sending the CBT entries back to the backup software, right. If this part here has to be networking, if number three has to be over the network, okay, then the populator might as well directly push it back to the backup software; there's no...
E
So there's a difference over here, right. If it's pushing back to the backup software, then we're forcing backup softwares to implement the same APIs that the CBT populator in this diagram can talk to. By having our own service, we can standardize that push API, and we can expose the get API for the backup software.
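A tiny sketch of the trade-off being described, with invented interface names: in a pure push model every backup vendor has to serve the receiving API itself, whereas an intermediary service lets the group standardize one push API and expose one get API.

```go
package main

// ChangedBlock is the same hypothetical record shape used in earlier sketches.
type ChangedBlock struct{ Offset, Size uint64 }

// DeltaReceiver is what each backup vendor would have to implement if the
// populator pushed results directly to the backup software.
type DeltaReceiver interface {
	ReceiveDelta(baseSnapshot, targetSnapshot string, blocks []ChangedBlock) error
}

// DeltaService is the standardized alternative: the populator pushes into a
// service we own, and any backup software pulls from it with one get API.
type DeltaService interface {
	PushDelta(requestID string, blocks []ChangedBlock) error
	GetDelta(requestID string) ([]ChangedBlock, error)
}

func main() {}
```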
D
...if they want to do it, yeah.
E
It's just, you know, figuring out all this volume mounting, and the exchange of ownership after one pod is done and another pod takes over. I feel there will be quite a bit of orchestration that needs to happen, yeah.
D
I agree with you. If there's no other way to do number three without another volume, then I personally kind of don't like it, because, exactly like you said, there's too much orchestration that needs to happen internally, at least. One can argue that all of this is abstracted away from the backup software, that the backup software is insulated from all this complexity, but still I...
D
...think it's a bit more than I want to maintain, at least at this stage. So yeah, off the top of my head I don't know how to do this without spinning up another volume, and if it comes to that, I'd say forget it: just send it over the network directly back to the backup software.
E
One of the reasons why the community doesn't quite like the REST API idea is that it's not Kubernetes native; that's the main reason why I think they want to see CRDs instead of REST APIs. But for CRDs whose objects would themselves carry a lot of the payload, the limited amount of storage we can have in each CRD object makes it not a very proper structure to hold changed block tracking data.
D
I know; Sukana and I have been discussing this too. I feel like, at the end, we might just end up falling back to that initial design from, I don't know, five months ago, where we just exposed an external REST endpoint. Then SIG Architecture won't be concerned about it, and we won't have the volume pre-provisioning overhead, etc.
D
Of course, there are trade-offs there: like you said, it's not very Kubernetes native, and then we also have to deal with security. Not the end of the world, it's doable; we'd just have to manually write code to delegate authorization and authentication to the Kubernetes API server.
D
We'd need to manage TLS certs, you know. But it feels like we might just end up going back to that in the end, if the volume populator doesn't work and if we're not able to convince SIG Architecture that an aggregated API server would not be a potential bottleneck or resource problem.
D
So yeah, we've been chatting about this as well, and I think the main thing was... I don't know if I have it here... let's see, it's a bit slow here. The last feedback we got from Clayton was... where is it... okay, hopefully y'all can see my GitHub.
D
I think it's just here, right. His point is that we have to show the API bandwidth requirement does not materially change the P99 bandwidth profile of the API server. What do I mean by that? He also questioned, again, why this would be appropriate for exec and port-forward but not appropriate for ours.
D
So if we can prove this, right: if we roll this out into a Kubernetes cluster, test it with some reasonably sized volumes, pull the CBT entries, and it doesn't affect the P99 latency of the API server, then at least we have something to go back to them with and say: hey, it's not as bad. Now, tangibly, what does this mean? There's not much information here, so I reached out and said: hey, I'm just trying to chat...
D
...to have a good conversation here first before I ping him directly about this, to get some clarification. To my knowledge, for example, the Kubernetes API server has some SLO alerts that measure and predict the latencies of the API server, and if it goes beyond the...
D
...SLO, the alert is going to fire, and then the API Priority and Fairness mechanism is going to kick in, and the API server is just going to start throttling and rate-limiting some of the API calls. I think that's the main case here. So, as I mentioned, I also talk about Priority and Fairness, or Fairness and Priority, I can never remember...
D
...which comes first. I mentioned it in the KEP too, and I talked to Jeff about it as well, and he said he liked that we account for fairness and priority, but he also understands where Clayton is coming from. So I guess it's really up to us to prove that, if we roll this out as an aggregated API server, the SLO alert is not going to fire.
D
It varies from cluster to cluster. Where was it... I found it somewhere, if folks are interested, because I remember having to deal with it when I was working on OpenShift. Just for reference: right here, yeah, this thing here.
D
This is where you measure things like the burn rate, and it goes into some elaborate formula, computations and calculations; and when that alert fires, the APF is going to kick in, and then the users will see tons of warnings and errors in their Kubernetes control plane logs.
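For context, the core burn-rate arithmetic behind such alerts looks roughly like the sketch below; the real apiserver SLO rules are multi-window and considerably more elaborate.

```go
package main

import "fmt"

// burnRate is the observed fraction of bad (failed or too-slow) requests
// divided by the error budget the SLO allows. A burn rate of 1 consumes the
// budget exactly on schedule; alerts fire when it stays well above that,
// after which APF starts shedding load.
func burnRate(badFraction, sloTarget float64) float64 {
	return badFraction / (1 - sloTarget)
}

func main() {
	// Example: under a 99.9% SLO, 0.5% of requests being bad burns the
	// error budget about five times faster than allowed.
	fmt.Println(burnRate(0.005, 0.999)) // ≈ 5
}
```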
D
Yeah, because I remember being the person who put all this in; I spent like two months dealing with it, and it's not fun to do. I think this is the main thing: if we can somehow spin up a cluster and show that it doesn't terminate requests, because the API server has rules to terminate your requests. Imagine this kicking in while you're halfway through your backup operation, and the Kubernetes API says: oh, I'm going to start terminating your requests.
D
Yeah, I'm open to suggestions on whether we should continue to try to figure out if the volume populator will work. It feels like there are multiple fronts from which we can attack this. We can come up with some numbers and say: hey, the P99 is not that bad.
D
Or we can brush the dust off the initial proposal and say: hey, we're just going to expose an out-of-band REST endpoint, still within the Kubernetes cluster, I think, but one that doesn't go through the Kubernetes control plane.
E
I kind of agree with Clayton on his statement. Typically, you want to avoid data-path traffic going through the control plane, because the data path typically needs to be a bit more available than the control plane components. Your PV controller may have 99.9% availability, but your data path of writing data into the volume has to be five nines, six nines, seven nines, whatever it is.
E
Typically, within a cluster, allowing the data plane traffic through the control plane bottlenecks the availability of the system to the availability of the control plane. That's one thing. The other thing, the main thing we're talking about, is actually a network bandwidth issue. We can do throttling to make sure we don't breach it, but in general, if we can avoid it, maybe we should.
E
Right, we did some math on this before, right, Dave? And even for that, I think the real concern is the first one: when we first take a snapshot of a big volume and there's no previous snapshot, then effectively the changed blocks are basically the entire thing, right.
B
Yes, but we really have to look at things like returning extents. We have the bitmap idea, but in general we should be returning extents, like "block X to block Y changed". So your worst case is that every other block changed: that gives you extents that are all size one, one for every other block. If the entire disk changes, then you have one extent, zero to n.
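To make that worst-case arithmetic concrete, a small sketch with an invented extent shape; the 10 GiB volume and 4 KiB block size are arbitrary examples.

```go
package main

import "fmt"

// Extent says "NumBlocks blocks starting at StartBlock changed".
type Extent struct {
	StartBlock uint64
	NumBlocks  uint64
}

func main() {
	const blocks = 10 << 30 / 4096 // 10 GiB volume, 4 KiB blocks

	// Entire disk changed: a single extent covering block 0 to n.
	whole := []Extent{{StartBlock: 0, NumBlocks: blocks}}

	// Every other block changed: one single-block extent per changed block,
	// the worst case for extent count, and exactly the case where backing up
	// the whole volume is barely worse than using the delta.
	worstCount := blocks / 2

	fmt.Println(len(whole), worstCount) // 1 1310720
}
```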
B
Yeah, so I worry that we're trying to design for the worst case, when the worst case is really the one you can probably ignore: if something like every other block on the disk changed, just back the whole thing up; you're really not gaining that much by skipping half the blocks.
D
The... but...
B
There are other things you could do. You could constrain it: you could say, look, if there are more than, you know, 64k extents, just back the whole disk up. It's an optimization API.
D
So, yeah, it feels like... I guess, even if we park the worst-case scenario, even if the worst case is not that bad, I'm just trying to think from a performance perspective again, right, just rereading the comments there in the KEP. It's a matter of stability: if something goes bad, assuming everything goes through the Kubernetes control plane...
D
...and the Kubernetes API server's P99 degrades and the SLO alert fires, it just has to rate-limit and throttle the API calls, and that affects the control plane, which affects the rest of the cluster. So, regardless of the best case, whatever we do on the backup side...
B
There has to be a blanket server statement. I mean, you can say it's 1K, 10K, one megabyte, right; there's got to be a point where the API server handles it or it's broken. Otherwise it's kind of like saying you can't put anything through the API server because anything might overload it, because you might be running on somebody's wristwatch. There's always some minimal reasonable expectation.
D
It's not different from anything else, right, and hence they have the SLO metrics defined. So it's up to us to prove it: whether it's 5K or 10K, whatever numbers we plug in here, it's just speculation, whereas those metrics that I shared are concrete data. It's up to us to prove that it doesn't breach them.
D
I think those are all good points. I'm just being mindful of time here; we have one minute left. To quickly sum up... actually, there's just one last item I want to share; sorry to interrupt there.
D
I think Prasad has agreed to help us with some of the networking benchmarking; maybe we can talk about it offline, about how to put some concrete numbers together to back up our claims. At the end of the day, to some extent we ourselves here have concerns, but we...
D
...are also convinced, to some extent, that maybe this is the right approach; we just need to convince the bigger group at large. One last thing, and I know it's 10 o'clock, but I have brought this up a few times, alluded to it a few times: one of the KEP authors is currently not active on the KEP.
D
I don't know when he will be returning to it, so I need help: someone to co-author the KEP. I believe Prasad has kindly agreed to step up to a co-authorship role to help manage the KEP and move things forward, in case I'm on holiday or whatever. So, yeah, is that all right? Okay.
D
Okay, we'll be in touch. Yeah, happy New Year and happy holidays; we'll touch base again in the new year. Thanks, everybody. Thank you.