From YouTube: Kubernetes SIG Storage - Bi-Weekly Meeting 2023-08-24
Description
Kubernetes Storage Special-Interest-Group (SIG) Bi-Weekly Meeting - 24 August 2023
Meeting Notes/Agenda: -
Find out more about the Storage SIG here: https://github.com/kubernetes/community/tree/master/sig-storage
Moderator: Xing Yang (VMware)
A: Hello, everyone. Today is August 24th, 2023. This is the Kubernetes SIG Storage meeting. So now that 1.28 is released, we can wrap that up and start to do 1.29 planning.
A: So this is a tentative schedule, and there is a PR out, but it's not merged yet. It looks like enhancements freeze is October 5th, but code freeze... October 20th? October 31st. So yes, the time between enhancements freeze and code freeze is less than a month.
A
Okay,
so
so
we
have
copied
1.28
sprushy
to
1.29
to
get
started.
A
So
if
there
are
anything
new
that
you
want
to
add,
you
want
to
tracked
for
1.29
and
targeting
even
1.29.
Please
add
it
so
I'm
just
going
to
go
through
this
one
and
see
what
needs
to
be
added
or
removed.
A
One
question
is
it's
the
same
spreadsheet
so
basically
we
just
added
this
one
new
sheet,
which
is
copied
from
1.8.
A
Right
so
this
is
the
same
okay,
so
the
first
one
is
recovery
from
resize
failure.
So
this
is
a
in
1.28.
This
is
okay
down
right
yeah.
Do
we
still
have
some
remaining
work
for
this
inside
car.
C: Those are done and merged as well, so we are just waiting for the release. So all the work targeted for 1.28 is done. And for 1.29... I don't know if we are just doing 1.29 planning now, but in 1.29 I want to work on restoring the size all the way to the original size. That's the design that we didn't finish.
A
So
now
do
we
I
wonder
if
we
want
to?
Are
you
going
to
have
a
different
enhancement
issue?
It's
going
to
be
part
of
this
for.
C: We could... no, we shouldn't turn it to beta immediately. So, right.
A
I'll
just
say
that
precisor
research
PR
merged
eating
for
release
because
like
are
released
and
then
the
other
one
is,
you
will
be
waiting
in
1.29
will
work
a
decrease
in
the
size.
E: Okay, I think we had... we had locked down the sheet, I think, because...
E: You should have comment access, yeah, but I think before it was... we gave everybody in the SIG edit access.
F: Yeah, I think someone accidentally messed up the spreadsheet, so we locked it down. So if there are any changes you want to make, just put a comment, and then Shane, Michelle, myself, or Jan should be able to edit it.
A: Okay, awesome. So other than... I think this one, okay, I think I changed this one, so this should be fine. So... the API change just got merged in 1.28, and going into 1.29...
A: Let's just say it stays in Alpha. Okay, so the next one we crossed out, so we don't need to talk about that. Yeah, this one also, volume group snapshot, I think it's still... I don't think we have moved it to another status yet. So the next one is provisioning volumes from cross-namespace snapshots.
A: Does this stay the same, Michelle, or...?
E: I haven't checked the latest status of whether the KEP for the ReferenceGrant is ready to review. Okay.
A: I forgot the status; that KEP was not merged in 1.28, correct? Okay, and the ReferenceGrant...
E: The ReferenceGrant was still under discussion, okay, and there were supposed to be some updates to the KEP, but I don't know, I haven't checked to see if those updates were made yet. Okay.
A: All right, I'll just keep it at Design then. So the next one is CSI volume health; this is basically at the e2e test stage. I think I still need to ping this person, so no update yet. Someone is working on this one, but I'm not sure what the latest status on this is.
A: Oh, sorry, we should talk about this one. When do we need to get the CSI spec released? Because last time we talked about this, we wanted to wait and see if we can get this one in before we cut the release. And so... what about this one, quality of service? How long can we wait, I guess?
F
Could
we
discuss
this
at
last
Wednesday's
CSI
meeting
I
missed
that
one
we.
F: Let's put a tentative bullet point right there with a CSI spec date, right above the code freeze. Okay, we can add our own bullet point. Yes.
A: So, tentatively... what, September 30th, something like that?
A: Okay, yeah. So Ben, please review the CSI spec and...
A: ...also the KEP, okay.
I: Yeah, I will probably keep it in beta this release, because it was just enabled by default in 1.28. Okay.
A: So, okay... now, do we still need to track this? Okay, I'll leave this one for now and see. But then this should probably not be marked Done, or... what do we do with this one?
D: Let's say it stays... keep it beta.
A: So, okay, the next one is CSI migration for GCE. This one is... okay, so this is for the future, 1.31 or 1.30, and this one is... yeah, this one is done, because we have already deprecated it. But do we... well, do we also need to remove it? I guess we did the deprecation, right? We didn't remove it.
A
131,
maybe
we
just
we
would
do
the
same
thing
say
like
like
this
one
right
so
remove.
Let's
say.
A: This one, okay, so we don't know yet. I will check with Grant and see if they have any plan. Okay, so this one, always honor reclaim policy: I need to check with Deepak and see if he can start, because he's going to look at the details and see if he can.
A: So we'll add a note here saying...
A: Next is PV last phase transition time. This is Alpha.
A
D
D
A
So
this
is
the
end
state
is
of.
A: Oh, it is. Do you know if the KEP owner will do a design, or...?
C: Yeah, I am here. Okay, yeah, there's been an update on the KEP. I will have to check if the updated KEP looks good, and yeah. Okay.
A: You know, do we stay in Design for this one this quarter, or do we want to target Alpha?
A: So, not sure yet, right? So I'll check, yeah.
E: Beta, yeah. I can check with Matt if he thinks it's ready to go to GA. Okay, it's been beta for, I think, two releases, yeah.
E: Maybe add... you can add it back here, and I'll try to check on that.
A: Add it here. So the next one is non-graceful node shutdown; this is GA. We do have the integration test that we want to get merged, so maybe I will just change this one to "test".
A: Okay, once that one is merged, then we can cross it out. So... oh, just track this as a bug fix now, since...
A: So I think that's all we have here. Is there anything else?
A: Here you add... okay, yeah. So, oh, this one, okay, yeah.
A: Okay, I can copy it. I just found it a little weird.
B: So, sorry, I can describe it. Back in the earlier days of Kubernetes, before we had the non-graceful node shutdown feature, if you had a node that shut down ungracefully, you could have PVCs that got stuck there if the node never came back. So we added this six-minute timer that would force-detach your volumes after six minutes, to deal with that situation. But that feature was always a violation of the CSI spec, and it was just sort of a workaround for a gross situation.
G: So I remember we have some metrics related to this. We...
A
Do
have
yeah,
so
it's
one
that
is
tracking
the
the
timeout
related
to
post
detached
the
other
one
for
the.
If
you
add
a
taint
right
tainted
related
the
first
detached,
but
that
is
I
think
that
differentiation
was
just
added
in
1.28
before
that
there
is
also
a
matrix.
That's
just
just
chucking.
The
I
think
Donnie
checking
the
the
timeout
when
I
believe.
A
Okay
yeah,
so
it
is
there.
So
if
we
disable
this
option,
then
that
will
not
happen
anymore,
so
that
metric
will
just
be
no
longer
be
used
right
for
that.
E: But then we enhanced the metric also with... if it force-detached, the reason why it force-detached, yes.
B: Yeah, I mean, my thinking is the six-minute timer was sort of a blunt instrument from, you know, before we had a mechanism to do it quickly, and now that we have it, we would want to encourage people to use the new mechanism.
B
Obviously
changing
the
default
Behavior
in
the
future
would
require
a
release
note.
But
but
this
is
just
a
an
option
to
you
know:
allow
someone
to
disable
it
yeah.
B
Problems
in
some
of
our
installations,
that's
the
that's
the
reason
for
this
like
it
when
it
when
it
triggers
it
causes.
You
know,
kubernetes
to
violate
the
CSI
spec
and
some
CSI
drivers
don't
behave
well
when
you
violate
the
CSI
spec,
and
so
we
would.
We
would
prefer
to
have
a
knob
to
turn
this
off.
A
And
then
one
point
Thirty:
what
default
you
say
what?
What
was
that
we.
G: Yeah, I'm wondering, since we have the metrics, how often we see this, whether we'll get a sense of the frequency and the situations in which it happened. But yeah, we can check that later, yeah.
B: Yeah, I'm betting that in a lot of clusters both metrics would be zero, but then, depending on how people are managing their clusters, they might have one or the other triggering quite often. And also...
E: Also, before Jan's fix to only force-detach on unhealthy nodes, the force detach actually happened quite a lot.
E: Force detach happens also if unmount is just slow. It used to be that force detach would happen if unmount was slow, and it also happened because there were some bugs in kubelet, like kubelet reporting pods as failed too early for jobs, and that caused us to force-detach; and there were a couple of other reasons why force detach happened. Yeah, but once Jan's fix went in, I think that solved a lot of those issues.
A: All right, so coming back to the agenda doc, we have an item here. Do we have the author here?
H: Yes, we had... we created an e2e test with the idea to promote it to conformance once it's proven stable. Thank you very much to everybody who has reviewed it. We haven't had any other comments or reviews saying that it needs more work, so we believe it's good to merge. So if somebody can just give us an approve there, I'd really appreciate that; or if more review is needed, let us know. We want to get it onto the testgrid as soon as possible.
J: Okay, I'll start. So this presentation is about a proposal to combine all of the CSI sidecars into a single component.
J: There are some terms that most of the people in the meeting are familiar with, but I'll just explain a few of these terms anyway. K/K is kubernetes/kubernetes. There is the kubernetes-csi org in GitHub, with the CSI sidecars; a sidecar is another controller that watches objects in the API server and performs some action. And we have CSI drivers, which interact with these sidecars.
J
The
sidecars
make
calls
to
circuit
and
the
CSI
drivers
implement
the
methods
defined
in
the
CSI
spec
and
react
to
those
calls.
Then
I
may
also
talk
about
CP,
which
stands
for
control,
plane
and
MP,
which
is
node
pool.
The
agenda
is
first
I'll
talk
about
what
led
to
to
this
proposal.
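The sidecar/driver split described here can be sketched with a plain Go interface. This is a heavily simplified stand-in, not the real csi.proto types (the actual contract is generated from csi.proto and invoked over gRPC); the names below are invented for illustration:

```go
package main

import "fmt"

// identity is a hypothetical stand-in for one CSI spec service:
// the driver implements it, the sidecar only knows the interface.
type identity interface {
	GetPluginName() string
}

// hostpathDriver plays the role of a CSI driver implementing the spec.
type hostpathDriver struct{}

func (hostpathDriver) GetPluginName() string { return "hostpath.csi.k8s.io" }

// probe plays the role of a sidecar (think of a liveness probe) that
// makes calls against the interface without knowing the concrete driver.
func probe(d identity) string {
	return "driver is " + d.GetPluginName()
}

func main() {
	fmt.Println(probe(hostpathDriver{}))
}
```

The design point this mirrors is that sidecars stay generic: any driver that implements the spec works with any conforming sidecar.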
J: I also wanted to talk about the responsibilities of each actor in this system. There is the community, which we are part of; I listed some of the responsibilities that belong to SIG Storage and kubernetes-csi. Because this proposal is about CSI, I'll just highlight those: the current CSI maintainers own the CSI spec, the sidecars...
J
The
way
we
test
the
sidecars
against
kubernetes
and
a
lot
of
ripples
inside
the
kubernetes
CSI
org,
there
is
also
the
CSI
driver
Ultra,
which
has
a
driver
and
needs
to
integrate
it
with
sidecars
so
that
they
resource
solution
can
work
with
kubernetes
and
they
need
to
keep
up
with
releases
of
sidecars
of
CSI
2
and
of
kubernetes,
for
example.
There
is
this
new
feature
called
volume
group
snapshot,
which
is
a
change
in
the
external
snapshoter
component.
J
Sidecar,
they
may
also
provide
default
versions
of
the
sidecars
and
on
CSI
drivers
in
in
in
manifests.
This
usually
happens
if
they
have,
if
the
the
source
code
of
the
CSI
driver
is
published
Upstream
and
they
also
create
releases
as
they
see
the
release.
Guidance
is
different
to
kubernetes,
it's
not
tight,
so
they
may
release
it
faster
or
slower,
and
finally,
they
recommend
configuration
of
cycers
and
their
CSI
driver
for
the
control
plane
on
Notebook.
J
So
it
might
be
possible
that
a
CSI
driver
is
about
support
the
feature,
and
that
feature
mean
meant
could
mean
a
breaking
change
like
in
volume
snapshot
which
went
from
B6
from
before
to
B6,
and
there
were
new
changes
in
the
new
fields
in
the
crd,
so
the
class
administrator
needs
to
plan
beforehand
to
make
that
available
and
well.
This
is
what
happens
at
Google.
We
also.
We
also
because
we
are
classroom
administrators.
We
provide
a
managed
service.
J
We
coordinate
with
other
platform
teams
on
the
testing
of
the
integration
to
ensure
that
it
works
fine
across
tests
that
are
not
covered
Upstream
like
upgrade
tests.
Let's
say
that
we
have
workloads
running
in
a
cluster
and
we
upgrade
the
cluster.
Then
it
might
be
possible
that
the
nodes
get
recreated.
So
we
hope
that
the
CSI
driver
might
move
the
workloads
from
a
node.
That's
been
turned
down
to
or
turned
down
to
a
healthy
node.
J
Okay,
so
I'll
briefly
talk
about
how
we
build
components
so
on
the
left,
we
have
I
just
pasted
two
repos
liveness
Pro,
but
no
driver
register
which
are
sidecars
and
we
have
two
main
utility
libraries.
There
is
CSI
release
tools
which
cast
all
of
the
code
to
build
this.
These
projects,
it
builds
the
binary.
It
also
builds
a
multi-arc
image
and
it
also
has
lots
of
code
to
test
it
against
the
test.
Csi
driver,
which
is
the
hostpat
CSI
driver,
and
we
also
have
additional
go
modules.
J
Different
for
for
CSI
release
tools
and
the
go
modes
CSI
release
tools
is
just
a
collection
of
scripts.
So
it's
not
really
a
go
mode
and
that's
why
it's
included
in
the
color
bases
as
a
kit
sub
3.,
it's
a
trip
and
CSI
Liberties
is
a
go
module,
so
it's
a
dependency
in
go
mode.
J: The same vulnerability may be applicable to the driver too, so the driver might need to do the same thing: update the dependency. But if the vulnerability happened in a sidecar, then the author usually waits for a stable release of the sidecar.
J
So
for
the
from
the
driver
point
for
the
driver
Ultra,
they
just
need
to
wait
for
the
second.
If
there
are
CV
fixes
for
the
cluster
administrator,
this
usually
depends
too
so
Google.
We
have
automated
pipelines
that
build
the
sidecars
and
whenever
there
is
a
change
upstream-
and
we
do
this-
because
we
also
have
to
make
changes
that
are
beyond
what
happens-
Upstream
like
patching
the
code
version,
for
example,
to
to
a
newer
or
update
integral
runtime
to
a
newer
version.
If
there
is
a
vulnerability
in
gold.
J
One
question
is:
if
we
have
this
fix
that
happen
in
master,
what
happened
in
previous
releases
I
think
this
is
handling
our
best
of
four
places.
So
if
we
need
to
backboard
it
that
that's
that
may
be
done
by
by
the
maintainers,
but
as
I
mentioned
it's
a
best
of
basis
and
at
Google
we
do
this
manually
internally.
J: For the driver author, they may need to rebuild their driver if there is a Go runtime update; nothing related to release-tools.
J: They are using it in their driver. And for the cluster administrators, they may need to rebuild the sidecars; let's say there's a bump in the Go runtime internally, and they rebuild the sidecars. And we also have the same question: the fix happened in master, and what happens in previous releases? I think it's also handled on a best-effort basis, as usually the last three releases are maintained. This is also a gray area because of, let's say, volume snapshot, which went from v4 to v6.
J
There
is
also
there
may
be
also
changes
in
CSI
libutils,
so
this
is
a
go
mode,
and
that
means
that
someone
makes
if
someone
makes
a
change
in
that
in
that
project.
Maybe
all
of
the
sidecars
need
to
pull
it
using
the
the
latest
kitsha
percentage,
the
driver
Ultra.
If
they
are
not
using
libutils,
and
they
don't
do
they
don't
need
to
do
anything
and
similarly
the
cluster
administrator.
They
just
need
to
rebuild
it,
but
that
usually
happens
automatically
because
of
that
dependency
update
on
go
mode
on
the
cycles
and
we
still
have
the
same.
J
So
I
also
tried
to
collect
a
few
stats
about
pull
requests
that
were
created
against
all
of
our
Cycles.
We
have
a
column
for
updates
from
dependablet
manual
propagation
of
CSI
release
tools.
This
means
creating
a
pull
request
and
assigning
it
to
someone
in
the
community
that
can
review
it
and
same
thing
for
CSI
Libya
tools
and.
J
Something
that
I
mentioned
that
I
didn't
mention
is
the
cascading
problem,
which
happens
if
let's
say
that
there
are
many
cycles
and
all
of
them
have
at
dependency,
that's
vulnerable.
So
that
means
that
there
will
be
a
pull
request
created
against
all
of
the
cycles,
and
this
effort
needs
to
be
multiplied
times.
The
number
of
releases
that
the
community
supports.
J: So because this is sort of a gray area, the cluster administrator may need to backport the fixes to previous sidecar releases, and this can happen all the way down to the last release that's currently supported. I also have to mention here that Kubernetes is moving to an n-minus-3 support model pretty soon, so this might mean that we need to increase our maintenance window for components.
J
One
approach
that
I
think
that
could
that
that
thing
helped
Define
that
I
think
some
more
teams
were
using
was
to
reuse
the
same
sidecars
in
previous
releases.
J: Maybe I should read it. So before CSI, we had the concept of external provisioners. We had provisioners for GlusterFS, NFS, local storage, and all of them lived in the same repo. What happened is that the use cases and controllers in there ended up being independent from each other, and people would deploy only a few and not all of them. We decided to split them into their own repos, so that each project would have its own release cycle and maintainers; and for the CSI sidecars it facilitated fast development, and each operated like a microservice.
J
We
would
have
a
mono
repo
that
has
all
of
the
code
of
the
Cycles.
We
will
also
expose
a
binary,
a
single
entry
point.
This
will
be
a
new
new
file.
That
is
a
lot
like
Cube
controller
manager,
where
we
would
selectively
decide
which
sidecars
to
start
and
for
the
other
problems
about
CSI
release,
Stills
and
CSI
libutils.
We
can
include
them
in
the
Ripple,
so
CSI
release
tools
would
be
also
part
of
the
repo
and
it
will
be
the
source
of
Truth
for
other
consumers
of
the
repo
too.
J: There is one problem here, which is: if we move all of the code to be internal, how do we keep all of the people that are using csi-release-tools and csi-lib-utils in the same state? We don't want to break their stuff, right? So we can follow a strategy similar to what kubernetes/kubernetes does, which is to publish the staging folders to separate repos; this is done through a project called the Kubernetes publishing-bot.
J: For the control plane, it would be just one image with different flags. There would be a controllers flag, similar to kube-controller-manager, which would decide which sidecars to enable. There would be global flags, like leader election; there is also a newer feature, structured logging. All of that would be common for all of the sidecars, because there is a single entry point. And there would also be flags that need to be customized per sidecar, so maybe we could add prefixes.
J
So
if,
let's
say
a
sidecar
defines
a
timeout
which
is
different
to
our
to
the
timeouts
of
other
Cycles,
maybe
that
could
be
like
a
touch
or
Timeout,
on
provisional
timeout
and
so
on
and
for
the
notebook
it
will
be
very
similar.
We
would
add
controllers
no
driver
register
and
liveness
Pro.
J
The
first
point
is:
if
we
try
to
make
changes
in
CSI
release
tools,
how
many
pull
requests?
Would
that
mean
and
that's
the
number
of
CSI
released
to
changes
times
the
number
of
Cycles
in
the
new
model?
If
it's
a
mono
Ripple,
it
becomes
zero
because
sidecar
or
CSI
release
tools
is
part
of
the
Ripple
and
it's
Global
changes
apply
immediately
for
CSI
Liberties.
It's
the
same
thing
for
go
more
dependency
bumps.
A: So please review this, and we'll come back and talk about this again.