From YouTube: Kubernetes SIG Storage - Bi-Weekly Meeting 2023-07-27
Description
Kubernetes Storage Special-Interest-Group (SIG) Bi-Weekly Meeting - 27 July 2023
Meeting Notes/Agenda: -
Find out more about the Storage SIG here: https://github.com/kubernetes/community/tree/master/sig-storage
Moderator: Xing Yang (VMware)
A
Hello everyone, today is July 27th, 2023. This is the Kubernetes Storage SIG meeting. So today we are going to go over our 1.28 planning spreadsheet, and then I think there is one item we really want to go over. We have just passed the test freeze deadline on Tuesday, so the next deadline is August 8th; that's the docs complete and reviewed deadline.
A
Okay, I think you just need to update it; there is already a doc in place that was added earlier. Okay, so it can be done offline or not.
A
Yeah, thanks. Okay, yeah, please just share that in the chat. Thank you. Our next one is additional metrics; this is still work in progress, I don't have a new update on that.
A
And changed block tracking: we had a meeting yesterday in the data protection working group. We went over the changes; basically the team went to SIG Auth and got some recommendations from them. Shane?
G
Are you talking to me? Yeah, I am struggling to get my Bluetooth headphones to work. It's just...
G
So tell me which one it was again? Sorry, I was... oh.
G
Yesterday, yeah. I mean, we did another review of it yesterday in the data protection working group meeting, and the big update was that the developers working on it had talked to SIG Auth, and SIG Auth had proposed an entirely different authentication scheme which is way better and way simpler. I think we were all pretty happy with it, and we spent the meeting discussing a new way to do access control based on the new authentication mechanism, and I think we basically reached agreement on that, so it's good progress. I don't know where the CSI spec change for that stands; I think we should work on that soon. And I don't know where the KEP stands, but the overall design is looking ready to go, in my opinion.
I
The reason I was asking was because there has been some recent activity where folks have asked for that functionality to be enabled in beta, and I just wanted to make sure whether this was that feature or not.
I
Yeah, yes, great. So this would be available in beta starting...
A
Monday? No, because it's blocked on the ReferenceGrant; this feature is moving into the core code. Are we moving this one? Michelle, yeah.
D
So currently the feature depends on the Gateway API's ReferenceGrant, and before we move this feature to beta, we want to move this ReferenceGrant object into core Kubernetes, and that proposal is still under discussion. So until we can move ReferenceGrant into core Kubernetes and move that to beta, this feature will be blocked from moving to beta.
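For context, the dependency Michelle describes looks roughly like the following sketch, built with the k8s.io/api Go types. This is an illustrative example, not from the meeting: all object names and namespaces are made up, and the dataSourceRef namespace is only honored when the alpha CrossNamespaceVolumeDataSource feature gate is enabled and a Gateway API ReferenceGrant in the source namespace permits the reference.

```go
// Hedged sketch: a PVC in "dev" restoring from a VolumeSnapshot in "prod".
// For this to bind, namespace "prod" must contain a ReferenceGrant allowing
// PersistentVolumeClaims in "dev" to reference its VolumeSnapshots.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	apiGroup := "snapshot.storage.k8s.io"
	srcNS := "prod" // illustrative source namespace
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "restored-pvc", Namespace: "dev"},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			// Field type is ResourceRequirements in k8s.io/api as of v0.28.
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("10Gi")},
			},
			// The Namespace field is what requires the CrossNamespaceVolumeDataSource
			// feature gate plus a ReferenceGrant in srcNS.
			DataSourceRef: &corev1.TypedObjectReference{
				APIGroup:  &apiGroup,
				Kind:      "VolumeSnapshot",
				Name:      "nightly-snap",
				Namespace: &srcNS,
			},
		},
	}
	fmt.Printf("would create PVC %s/%s\n", pvc.Namespace, pvc.Name)
}
```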
I
Got it. And can the move of ReferenceGrant to beta and the move of the cross-namespace volume data source to beta happen in the same release, or would that have to be...
D
I think that's the best case scenario. Okay, I do think... maybe Takafumi, you can clarify, but I think we do need help pushing the ReferenceGrant work to both alpha and beta.
C
Okay, I think... yeah, that's the one! Please.
I
Got it, okay. I know that there's some interest internally to have this available sooner, so I'll see what we can do from the AWS side, and I'll circle back with you folks.
A
And the next one is enabling privileged containers on Windows to replace CSI proxy. So, do you have an update on this?
I
Yes, so we will likely pick this up in the near future. I know that there is some discussion that we need to conclude with Mauricio and others, so we are working on identifying who can work on this and what the next steps should be from our side, but we should be able to start contributing some work to this in the near future.
A
So this is done? Do we have any documentation or things like that to follow up on, or are we just complete?
A
So the code, the code is already merged. The blog, I think there is a placeholder for that; I don't think the deadline has come yet.
A
Okay, yeah, we still have some time for the blog. Yeah, next one: quality of service for volumes. Sunny, Sunny or Matt?
E
So I think there was one... there were three PRs, three issues, if I remember, and I believe that one of them, at least when we had checked last, nobody had been assigned to or had volunteered to pick up. So I don't know what the latest is, but if that's the case, then Colin from our team can help with that particular PR.
A
Thank you. Thank you, okay. So it looks like we are making good progress here. Let's see, next one: robust volume manager reconstruction. Jan?
A
Okay, yeah, if you can, because I think the docs ready-to-merge deadline is August 8th, right? So we still have a few days. Actually, if we can... yeah, if you can submit it, I think it will still be fine.
A
Thanks. And the next one is the volume expansion for StatefulSet, so... he's not here today. Yeah, do you know if there's any update on this one? This is...
A
It's at design anyway, right? Yeah, this is targeting a design meeting discussion. Okay, all right, thanks.
And the next one is non-graceful node shutdown. So for this one the code is merged, and then there's the e2e test; but we decided not to merge that because it depends on GCP. There is an integration test that is being reviewed, so we'll try to get that merged once the code freeze is lifted.
A
So that's all we have here. Let's see, we have a few things. Justin, do you want to go over the first item?
J
How should I share my screen? Oh...
J
So it's workload recovery. I'm just going to go into the background on the problem that we're facing, the controller that's the solution to the problem, and a demo of the behavior of the local static provisioner before and after the controller implementation.
J
So the problem was that, as I'm sure a lot of you know, pods using local PVs are always scheduled to the same node as the local PV they're using. But when nodes fail while having local PVs attached to them, the pods using those local PVs become stuck, since the scheduler is trying to run them on the node the local PV is referencing, but the node is deleted, so it can't do that. And the issue here is that the PV and the PVC aren't cleaned up when a node becomes unavailable, and so all those resources become stuck.
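For background on why those pods get stuck: a local PV carries a required node affinity that pins every consumer pod to one node. A minimal sketch with the k8s.io/api Go types (disk path, capacity, and names are illustrative, not from the demo):

```go
// Sketch of a local PV: the NodeAffinity block is what forces the scheduler
// to place consumer pods on exactly this node, hence the stuck-Pending pods
// once the node object is gone.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func localPV(node string) *corev1.PersistentVolume {
	return &corev1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{Name: "local-pv-example"},
		Spec: corev1.PersistentVolumeSpec{
			Capacity:    corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("100Gi")},
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				Local: &corev1.LocalVolumeSource{Path: "/mnt/disks/ssd0"},
			},
			NodeAffinity: &corev1.VolumeNodeAffinity{
				Required: &corev1.NodeSelector{
					NodeSelectorTerms: []corev1.NodeSelectorTerm{{
						MatchExpressions: []corev1.NodeSelectorRequirement{{
							Key:      "kubernetes.io/hostname",
							Operator: corev1.NodeSelectorOpIn,
							Values:   []string{node},
						}},
					}},
				},
			},
		},
	}
}
```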
J
So we had some user concern; there were some GitHub issues out for this, for recovering workloads from node deletion. And so the solution was a cleanup controller that's already been merged in the local static provisioner repo. The high-level flow is that when a node is deleted, we look to see if there was a local PV attached to the node, and if there was, then we start a timer for, say, a minute. The timer is user configurable, and if at the end of the timer the node is still gone, then we first delete the PVC that was associated with the local PV on that deleted node. The second step is that the PV now becomes released, since the PVC is deleted, and there's a second process always running.
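A minimal sketch of that flow, assuming a client-go clientset; this is a simplification of the real controller in the local static provisioner repo, and the helper and parameter names here are made up:

```go
// Sketch of the cleanup flow Justin describes: on node deletion, wait a
// configurable delay, re-check the node, then delete the PVC bound to each
// local PV pinned to that node. A second, always-running loop (not shown)
// later deletes the Released PVs.
package sketch

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func onNodeDeleted(ctx context.Context, cs kubernetes.Interface, node string, delay time.Duration) error {
	time.Sleep(delay) // user-configurable timer, e.g. one minute

	// If the node came back within the delay (the edge case discussed
	// below), its data may still be intact, so do nothing.
	if _, err := cs.CoreV1().Nodes().Get(ctx, node, metav1.GetOptions{}); err == nil {
		return nil
	} else if !apierrors.IsNotFound(err) {
		return err
	}

	pvs, err := cs.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, pv := range pvs.Items {
		if pv.Spec.Local == nil || pv.Spec.ClaimRef == nil || !pinnedToNode(&pv, node) {
			continue
		}
		// Step 1: delete the PVC so the StatefulSet controller can recreate
		// it; the PV then becomes Released and is cleaned up separately.
		err := cs.CoreV1().PersistentVolumeClaims(pv.Spec.ClaimRef.Namespace).
			Delete(ctx, pv.Spec.ClaimRef.Name, metav1.DeleteOptions{})
		if err != nil && !apierrors.IsNotFound(err) {
			return err
		}
	}
	return nil
}

// pinnedToNode checks the PV's required node affinity against the node's
// hostname label; simplified for the sketch.
func pinnedToNode(pv *corev1.PersistentVolume, node string) bool {
	if pv.Spec.NodeAffinity == nil || pv.Spec.NodeAffinity.Required == nil {
		return false
	}
	for _, term := range pv.Spec.NodeAffinity.Required.NodeSelectorTerms {
		for _, req := range term.MatchExpressions {
			if req.Key != "kubernetes.io/hostname" {
				continue
			}
			for _, v := range req.Values {
				if v == node {
					return true
				}
			}
		}
	}
	return false
}
```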
J
So the first and second bullet points here are that we assume that local data is lost when a node is deleted. We wait for some amount of time because there might be some edge cases in which a node triggers a deletion event but comes back immediately, and data is not lost; but most likely, when a node is lost for some time, the data is gone.
J
Yes. So yeah, the first bullet point here is that users already know that a trade-off for using ephemeral volumes is that data loss is possible, and so we assume that users already value workload uptime over keeping data, since their workload should already be resilient to data loss.
D
In order for someone to use this controller, there are a couple of different assumptions made about the workload and how the workload can handle data loss. But generally, this controller is designed to handle local storage in cloud environments, where it is inherently not durable storage. So if someone is using local PVs in a cloud environment, their workload already has to be resilient to the data just disappearing.
G
Okay,
but
but
normally
when
the
workload
would
restart,
it
would
restart
on
the
same
node
and
it
would
still
have
its
data
like
if
it
was
just
a
node
reboot.
This
would
be
a
special
case
where
the
note
workloaded
restart
and
this
data
would
be
vanished
and
I'm.
Just
wondering
is
there
anyone
for
whom
that
would
be
a
problem
and
if
the.
J
So here we have a cluster with three nodes, and we have the local static provisioner running, and we have a StatefulSet ss with one pod, ss-0, that's running. We can see that ss-0 is running on the node ending in fs48, and it's using the PVC ss-pvc, and that PVC is bound to the local PV ending in e4. So, to trigger the issue...
J
If we reset here, we have the same conditions as before: a cluster with three nodes, the local static provisioner running, and StatefulSet ss with pod ss-0. This time the pod is running on the node ending in 9rsp.
J
First, we can look at the logs of the cleanup controller. When the node was deleted, we started the timer, like I mentioned before, for resource deletion. The timer here was set to one minute, and after one minute passed the node was still gone, so we deleted the PVC that pointed to this deleted node. After that, the PV became released, and so the controller was able to delete the local PV that had node affinity to the deleted node as well, as you can see from this output up here.
J
Okay, when we watched the pod, we can see that the pod got deleted when the node was deleted and then went into this pending state, as before. But unlike before, it was able to get back up and running once the PVC was deleted. As we can see here, the PVC was deleted by the controller, and a new one was brought up and bound to a new local PV, ending in bd here, I think. And as we can see, the pod is running again; it's now scheduled to a different node, ending in wyty.
J
And so some things to know about customization: the controller only deletes PVs and PVCs belonging to specific storage classes. This is a command-line argument, so you can opt certain workloads in to this cleanup and opt others out. You can set the delay between node deletion and PVC deletion, in case you want to be conservative about not being too quick to delete resources when a node deletion event is triggered.
J
You
can
set
it
to
like
five
minutes
if
you
want
to,
and
you
can
also
set
the
delay
for
PV
deletion,
which
is
the
same
idea
and
some
code
pointers.
The
controller
was
already
merged
and
the
documentation
for
it
was
emerged
as
well
and
there's
an
example,
deployment
and
role-based
access
control
for
that
as
well.
So
you
can
use
it
now
if
you
would
like,
and
if
there
are
any
questions,
I'll
take
those.
If
not
thank
you
for
everybody's
time.
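To make the opt-in behavior concrete, the configuration Justin describes amounts to three knobs, sketched below with hypothetical flag names; the authoritative names and defaults are in the local static provisioner's documentation:

```go
// Hedged sketch of the cleanup controller's configuration surface; the flag
// names here are illustrative, not the controller's actual flags.
package sketch

import (
	"flag"
	"strings"
	"time"
)

type cleanupConfig struct {
	storageClasses []string      // only PVs/PVCs of these classes are cleaned up (opt-in per workload)
	pvcDelay       time.Duration // wait between node deletion and PVC deletion
	pvDelay        time.Duration // wait before deleting the Released PV
}

func loadConfig() *cleanupConfig {
	cfg := &cleanupConfig{}
	classes := flag.String("storage-classes", "local-ssd",
		"comma-separated storage classes eligible for cleanup")
	flag.DurationVar(&cfg.pvcDelay, "pvc-deletion-delay", time.Minute,
		"grace period after a node deletion event, e.g. 5m to be conservative")
	flag.DurationVar(&cfg.pvDelay, "pv-deletion-delay", time.Minute,
		"same idea, applied to the released PV")
	flag.Parse()
	cfg.storageClasses = strings.Split(*classes, ",")
	return cfg
}
```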
A
So in terms of documentation, you will be updating the release notes for this controller. That's...
D
It's going to be released in the local static provisioner, yeah, and so all the documentation is there, and the changelog will be updated. Okay.
A
Okay, sorry, I need to go back to this. Let's see, what's the next... I think the next one, yeah. So we have another one. Baptiste, you want to...
K
So yeah, I would like to implement some basic gRPC tracing on the CSI components, obviously hidden behind a feature flag, because we don't want to have it enabled everywhere. But yeah, I would like to know what the folks here think about it, and if I can proceed with some PRs.
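What Baptiste is proposing is roughly the pattern below, sketched with the official OpenTelemetry gRPC instrumentation module (otelgrpc); the flag name and dial helper are made up for illustration, not taken from any sidecar:

```go
// Hedged sketch: feature-gated OpenTelemetry tracing on a CSI gRPC client,
// the kind of change proposed for the sidecars and csi-lib-utils.
package sketch

import (
	"flag"

	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

var enableTracing = flag.Bool("enable-tracing", false,
	"emit OpenTelemetry traces for CSI gRPC calls (illustrative flag name)")

func dialCSI(endpoint string) (*grpc.ClientConn, error) {
	// CSI endpoints are local unix sockets, hence insecure credentials.
	opts := []grpc.DialOption{grpc.WithTransportCredentials(insecure.NewCredentials())}
	if *enableTracing {
		// One span per RPC: method name, duration, status code — the
		// "very basic instrumentation" discussed here.
		opts = append(opts, grpc.WithStatsHandler(otelgrpc.NewClientHandler()))
	}
	return grpc.Dial(endpoint, opts...)
}
```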
A
Somehow I saw this one: did we look at this one before? I somehow remember there is a similar one that we just reviewed.
K
Maybe, yes: in the CSI drivers, like for GCP and AWS.
A
Right, so this is... okay, not exactly the same. I see... I remember this one! Okay, it seems... yes, it seems to make sense to me. Michelle, Jan, any comments on this?
D
Yeah, seems good to me, I guess. The only question is, I'm not familiar with this library. Is this sort of like a... it's...
K
It's like the official OpenTelemetry library for gRPC instrumentation.
B
Because in the gRPC messages we send some secrets and credentials; I hope they are not going to leak. The...
K
Is it only on the... because I've already done something similar in a deployment on my side, on the AWS CSI driver, and there are no secrets whatsoever leaking through it. So, like, the...
K
Basically, I don't have an example right now, but if there is any error, you have the gRPC message with the error, and it's basically very basic instrumentation: how long the call took, and some metadata around which gRPC service was called, and things like that.
K
I'll find an example of the fields we have on our side and just put it in the issue, so you can have a look at it. Thanks.
I
One quick thing. So, first of all, thanks for sending the pull request for the AWS EBS CSI driver changes, Baptiste. We did see that request; we are actually evaluating right now what our position should be regarding such changes. I personally don't have any problem with the changes that you have, but I need to make sure that we are consistent in terms of a policy around what monitoring framework we are using within the driver code, and also that there are no concerns around adding OpenTelemetry within the EBS CSI driver. So we'll probably have a response for you officially on your pull request at some point in the future, but I just wanted to mention that here because we were talking about it right now.
K
So once the change is made on the csi-lib-utils side, I would have to go through the sidecars and update the code to just use the new function, and this part will be hidden behind the feature flag. On the CSI drivers' code itself it needs to be different, because csi-lib-utils is only responsible for the client part of the gRPC calls, and the CSI drivers use their own implementations for the server part.
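The driver-side half of that split would look roughly like this (again a hedged sketch under the same assumptions, not any driver's actual code), since each driver constructs its own gRPC server:

```go
// Hedged sketch: the server-side tracing hook a CSI driver would add itself,
// independent of the client-side support in csi-lib-utils.
package sketch

import (
	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"google.golang.org/grpc"
)

func newCSIServer(tracing bool) *grpc.Server {
	var opts []grpc.ServerOption
	if tracing { // same feature-flag idea as on the client side
		opts = append(opts, grpc.StatsHandler(otelgrpc.NewServerHandler()))
	}
	// The driver registers its Identity/Controller/Node services on this
	// server as usual.
	return grpc.NewServer(opts...)
}
```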
I
So is there a particular motivation for these changes on your side, Baptiste? Like, are you actually trying to do some performance profiling or something like that, for which you need this, or...?
K
So on my side, we are pushing in my company to have some more in-depth observability of the components we are running, to be able to know what's going on, and if there is a failure or something, to be able to debug what's going on more efficiently. And so, yeah, I'm going to do it either way; if I can't get it upstream, I'm going to have to fork it, so I'd rather upstream it, I think.
I
Got it. And I think basically what you're saying is the goal is to do it on both the upstream components and the CSI driver components, to get a complete picture. Is that...
K
Right, yeah. Well, one component is enough; it's better to have both the client and server observability, to have a more detailed view of what's going on, but only one is enough to get some basic observability, at least.
I
Yes, that's right! That's right! So I have made a note; we will follow up on this and see if we can do something from our side to help this along further. I'll need to speak to some folks internally to understand if there is somebody who can help out with this effort, so I'll provide an update in the next meeting, hopefully. Okay.
A
Thank you. Also, there's an issue opened for some AWS e2e tests. I don't have it here, but I will ping you on Slack for that, so if you can help...
I
One other thing that I wanted to update on: in the last SIG Storage meeting, I had promised that I would get an update on a couple of EFS CSI driver requests which had been sitting for a long time. I forget who it was who had asked, but I can provide an update here at this point.
So there were two pull requests. One was... I don't have the link handy, but let me see if I can put it up. One was PR 732, I believe, yeah, and the other one was PR 850. So for both of these, I followed up with the EFS CSI driver team, and they confirmed that they are both being actively worked on. It's just that... I think just after the last meeting, actually, there was an update for 732, so I think that is reflected in the issue. And for 850, there is some performance analysis that needs to happen, so they are in the process of trying to prioritize that work, and they will probably get to it in the near future. So both of these are things that should get worked on and addressed in the relatively near future.
I
Yeah, no worries, and if you can send me the specific issue that was opened, I can follow up on that, sure.