From YouTube: Kubernetes SIG Node 20230627
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
GMT20230627-170501_Recording_1374x1020.mp4
A
Well, hello, it's June 27, 2023! It's the SIG Node weekly meeting, welcome everybody. We have a lot of attendance today and not many topics, so let's jump right in. I think Mrunal says that he'll be five minutes late, so we will leave his topic for later. Let's go to the second topic. Nicole?
B
Actually, never mind the video here. Yes, hi, so I'm going to paste it here: I opened the pull request. By the way, this is my first meeting; I work at Microsoft, and I'm working on confidential pods on Azure Kubernetes, basically, and I opened this PR as an example, right? Of course, this doesn't need to be the solution, it's just a proposal.

So the problem we were running into in Azure is that we'd like to be able to run different runtime classes on the same Kubernetes cluster, and we've run into a problem: we have different snapshotters. Containerd now allows different snapshotters to be used per runtime class, and then we run into this problem where, when we run a confidential pod first using an image, and then you try to run another, non-confidential pod later using the same image, it fails to start. It thinks that the image is already present, but the image is only present in the snapshotter of the confidential pod, not in the other snapshotters like the overlayfs snapshotter.

Yeah, thanks for pasting the link. So one way that we found to address this issue is to add an annotation to the image reference when we are about to pull an image, or when we want to check whether an image is present or not. And this needs to come from, well, it doesn't need to come from the user: let's say it comes from the kubelet, it's automatic, it doesn't require users to annotate their pods. So we can go into more detail on the problems we're running into, and I tried to describe them in the pull request, but the request to the kubelet is basically to add an annotation, and this is sort of what the pull request does. So what I'd like to hear from you is: how do you feel about this?
C
No, this is fine. Using an annotation would be a good way, I believe, to align on a particular runtime handler. But what we'd like to do eventually is pass the policy for doing the ensure down to the container runtime, as opposed to having the kubelet do the ensure first and then run the pod, by the way.
C
Well, I don't know if you noticed, but the pod itself, for each container, will also ensure that the image is local, and, depending on whether you've got isolation across the various runtime engines and snapshotters, we may or may not be able to share some information, since they're both from the k8s.io namespace.
C
We might actually be able to allow certain things to be shared across snapshotters, right. On this topic, we would like to work with you a little bit more; maybe a call with Derek, Mrunal, myself and a few others, Peter Hunt I think on the image side as well, to make sure this is aligned. But I do like the idea of adding an annotation, at least for now, or aligning what's going to be used.
C
You know, tell us what runtime handler you're going to use, so we'll pull to that space and ensure in that space when you're requesting status on the images. Yeah, you can't just say, cool, please go pull this to that runtime handler; you also have to say it when you're getting the list of images and the status for all of them.
C
Well, yeah, the long-term plan is to move to policy handling for that kind of thing, because it's just not working very well to have the kubelet managing that policy without our understanding of it. And then we have two types of garbage collection, one for the image reference and the other for the snapshots. So it's a little crazy right now.
B
Yeah, sounds good. In terms of additional discussions and meetings, let me know when you'd like to do it; I'm happy to participate.
C
It's also going to be cross-related to a KEP we've got, called ensure pull policy, or ensure-pull-always, kind of. You know the image pull policies, where it's pull-always or pull-if-not-present; you've got the same kind of problem there: if it's there for one, is it there for both? We've already got some code going in for that, and it won't understand your annotations.
A
Right, about the annotations, hmm, I haven't looked at the PR: do we do anything special for the garbage collection that the kubelet will be doing, like recognizing that it needs to collect garbage for both runtime classes?
C
Do we want the kubelet to aggregate across all of the runtime handler namespaces within the k8s.io namespace? Maybe, maybe not, I don't know. When you're asking for an ensure, do we... that's a tricky question. We could just reply back to you, Sergey, with which snapshotter we are using, but again, that's only for containerd; it's not for CRI-O yet. They're still using the graph driver; they're not using this kind of concept yet.
B
Yeah, so in terms of the PR, I'm not touching anything related to garbage collection of images. I'm just adding the annotations in the EnsureImageExists path, and then this gets translated into an annotation on the pull and on the check for presence, right? That's really all I do, and it's an annotation for the runtime handler, right, so without actually mentioning the concept of snapshotter, which gets used on the containerd side based on the runtime handler.
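As a rough sketch of the kubelet-side idea being described, not the actual PR: the CRI ImageSpec already carries an Annotations map, so the kubelet could tag it with the pod's runtime handler. The annotation key and helper name below are made up for illustration.

```go
package kubeletimage

import (
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// runtimeHandlerAnnotation is a hypothetical key chosen for illustration;
// the real PR may use a different name.
const runtimeHandlerAnnotation = "runtime-handler.cri.kubernetes.io"

// imageSpecForPod builds the CRI ImageSpec used for both PullImage and
// ImageStatus calls, tagging it with the pod's runtime class handler so the
// runtime can resolve "is this image present?" against the right snapshotter.
func imageSpecForPod(image, runtimeHandler string) *runtimeapi.ImageSpec {
	spec := &runtimeapi.ImageSpec{
		Image:       image,
		Annotations: map[string]string{},
	}
	if runtimeHandler != "" {
		spec.Annotations[runtimeHandlerAnnotation] = runtimeHandler
	}
	return spec
}
```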
A
So we will eliminate the problem of a pod that cannot start because the image was mistakenly recognized as existing, but will we introduce another problem somewhere?
C
But then, when you actually run the container, we also ensure again one last time, because it could have been garbage collected right between asking the question and it happening; we check one more time. And that isn't a big deal if it's cached locally; it would only be a big deal in a case like this one, where they needed it to be in another snapshotter.
A
Yeah, but the question I'm trying to ask is: do we introduce another problem, with images being stuck forever and never garbage collected in some snapshotters? Fixing one problem and introducing another may not be ideal, and if we have any mitigation, that would be great.
D
Hey, something to think about: a strategy that could also be used, in situations where the image is stored for a different runtime handler than the default, is to use the CRI pinned field and then have the runtime handle the garbage collection of those images instead of having the kubelet do it. So it would basically be considered totally separate; the kubelet wouldn't consider it for eviction.
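A minimal sketch of that idea, assuming hypothetical kubelet-side wiring; the pinned field on the CRI Image message is real, the rest is illustrative.

```go
package kubeletimage

import runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"

// evictionCandidates drops images the runtime reports as pinned, leaving
// their garbage collection to the runtime instead of the kubelet's image GC.
func evictionCandidates(images []*runtimeapi.Image) []*runtimeapi.Image {
	var candidates []*runtimeapi.Image
	for _, img := range images {
		if img.Pinned {
			continue // the runtime owns this image's lifecycle
		}
		candidates = append(candidates, img)
	}
	return candidates
}
```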
D
Yeah, it's the only way for the CRI to tell the kubelet to tune its eviction. The other thing is, you know, we brought this up a couple of weeks ago, but in 1.29 we're going to be collecting use cases for more clever image eviction schemes, so this is a good thing to bring up now, so we can consider it as we go forward with figuring out different ways to tune that.
A
Can you comment on this issue? Mrunal mentioned that it's maybe related, and I think it's in CRI-O as well.
D
You know, so this is different. This does have to do with the CRI needing to know more about the image lifecycle than the kubelet is currently giving it, but more specifically, here's what this is for.
D
CRI-O is trying to add support for using sigstore signatures and verifying with those, and we're trying to make it namespace-aware. But because the kubelet only pulls an image once, you know, for the whole node, it's not aware of whether or not it can reuse it. If container A did have the signature, or was in a namespace with the correct signature policy, then it would be able to pull the image; but if container B wasn't, then it should not be able to, yet it currently is, relying on the fact that container A had already pulled it. So it is really related, in that currently the kubelet doesn't try to verify that an image should be pulled for each container that it's running; it just does it once for the node and then verifies that it's still present for the node. So it is relevant to the degree that currently the kubelet doesn't know enough, or isn't expressive enough, for all of the different image storing or pulling mechanisms that the CRIs are trying to work through right now.
D
Well, so yeah, I mean, that's digging more into the weeds, but generally the idea is verifying signatures on a per-container basis, as opposed to a per-node basis; so certain namespaces could have different providers for signatures, right. But right now it's only defined per node, so, well, the CRI is only going to pull an image once, and at that point it'll verify the signature for whatever container it's being pulled for, but it won't actually do it for each future container.
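To make that gap concrete, a small hypothetical sketch; none of these types exist in the kubelet, it only illustrates why a per-node "pulled once" check can't serve per-namespace signature policies.

```go
package kubeletimage

// SignaturePolicy is a stand-in for a per-namespace verification policy.
type SignaturePolicy struct {
	RequireSigstore bool
}

// canReuseCachedImage decides whether a container in namespace ns may reuse
// an image already present on the node. A purely per-node check would always
// return true once the image is local; a per-namespace policy needs to know
// under which namespace's policy the image was actually verified.
func canReuseCachedImage(policies map[string]SignaturePolicy, ns string, verifiedForNS map[string]bool) bool {
	if !policies[ns].RequireSigstore {
		return true
	}
	return verifiedForNS[ns]
}
```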
C
It
really,
it
really
depends
what
the
signature
is
pointing
to
right
on.
Who
should
be
doing
that
if
it's
yeah,
that
we
don't
have
a
way
to
reply
back
Sergey
to
you
know
to
couplet
right
now,
a
list
of
signatures
on
all
of
various
blobs
manifests
that
we
pulled
from
this
image
that
you've
requested
it's
not
in
the
status.
Yet
right.
C
But
I
guess
that's
sort
of
that's
the
question
Peter.
What
should
should
we
be
doing
this
kind
of
policy
management
to
make
sure
these
things
are
signed
down
in
the
container
run
times,
I?
Think
so,
or
do
we
keep
letting
kubl
it
or
keep
requiring
Kublai
to
do
the
management
of
those
types
of
things
and
artifacts
making
sure
it's
got
an
s-bomb
that
sort
of
thing
right.
D
Like, you know, currently the kubelet is the one that's aware and the CRI is, you know, reacting, but we're in the process of pulling things down to the CRI to be the one that's aware, like if we did move the entire pod creation process down, without having the individual steps be micromanaged by the kubelet.
A
So,
specifically,
for
this
PR
I
think
you
may
need
to
get
more
information
of
like
what
alternative
we
have.
Maybe
we
can
introduce
some
new
CRI
fields
and
it
will
solve
this
and
plus
trying
to
address
a
problem
with
garbage
collection
so
introducing
new
problem
while
solving
as
a
problem.
It's
not
ideal,
so
we
need
to
medicine.
Is
it?
Do
you
have
enough
action
items
from
this
discussion.
B
I'm trying to better understand the image lifetime issues. As far as I understand, the implication of this is that if you have, let's say, some image that was downloaded for two different runtime handlers, and then only one is being used and the other one isn't being used anymore, then neither would be collected; but once they're both gone, then both would eventually be garbage collected.
C
That's right. This requires that, when you do a pull or an ensure, you at least pass in the runtime handler, just specify it for the pod; and then internally, when we reply back to a list, we need to return the handler that was used. That is, if we decide that's the right place to handle the ensure, this kind of thing, and signatures and other stuff.
B
Yes, but what I'm trying to say is that when we report the list, for example, we don't do filtering. So if, on the list, we actually get the images regardless of which runtime handler has them, then I think garbage collection still works, but it has this quirk where you would only garbage collect the thing once all of them are gone, right? So what I'm trying to ask is: do we actually need to do that, or is it acceptable that they all go once the other users are gone, or do we actually need to garbage collect individual handlers?
D
Well,
you,
you
can
I
I,
guess
it
comes
to
like
how
long
you
want
the
duplicated
images
to
be
stored,
because,
like
you,
can
you
can
pretty
much
consider
it
like
two
containers
on
the
same
node
using
the
same
runtime
class
like
you
know,
that's
what
the
qubit
will
consider
it.
As
the
problem
is
the
underlying
storage
layer.
The
image
will
be
duplicated
between
the
two
spots.
So
it's
like.
Do
we
find
that
duplication
acceptable?
D
You
know
for
the
number
of
images
that
would
be
used
between
both
like
coming
to
think
of
it.
I
I
think
that
at
this
point
that
is
acceptable
versus
needing
to
teach
the
cubelet
how
to
be
aware
of
these
two
things,
because
this
is
kind
of
like
it's
a
niche
use
case.
I
mean
it's
not
super
Niche,
but
it's
like
it's
a
pretty
fine-tuned
use
case.
I,
don't
imagine
that
there
are
that
many
pods
or
nodes.
Even
that
will
be
running
both
confidential
computer.
D
You
know
containers
and
conventional
ones
using
the
same
exact
images
so
like
in
practice
the
duplication,
I,
don't
see
it
happening
that
much
even
though
it's
theoretically
possible
so
like
I,
think
I,
my
my
gut
feeling
is
it's
acceptable
to
have
that
duplication,
while
the
you
know,
while
both
either
one
of
the
containers
in
either
one
of
the
runtimes,
is
still
running
until
both
of
them
go
and
the
GC
starts
kicking
in
so
like
I
I
agree
with
you,
I
think
that
it,
it
does
seem
like
that's
acceptable
for
now.
A
Do
we
know
Mike,
do
you
know
if
you
delete
image
from
container
G,
we'll
delete
from
both
handers
or
it
will
just
delete
it
from
a
default
because
I
I?
My
comment
was
in
assumptions
that
containers
you
may
have
this
behavior
when
it
will
try
to
delete
only
one
and
you
will
get
stuck
in
a
situation.
When
can
energy
always
returns,
the
image
Google
tries
to
delete
it
and
it
always
tries
to
delete
it
from
default
and
it
never
got
deleted.
B
Oh,
so
just
just
an
introduction
that
I
just
just
briefly
so
sorry
for
for
speaking
over
people,
but
this
is
actually
a
point
of
discussion
and
something
that
I
sort
of
mentioned
on
the
pr
right.
We
have
a
choice
on
what
to
do
when
The
annotation
is
not
present
right.
So
when
the
notation
is
not
present
and
we
can
list
everything
regardless.
B
B
If
annotation
is
not
present,
we
go
to
the
default
one
time
Handler
instead
of
using
any
Handler,
and
in
that
case
we
run
into
into
this
problem
of
of
like
getting
images
stuck
because
we
never
specifying
the
the
right
runtime
Handler
for
that
image.
So
so
I
think
it's
like
the
default
action
when
navigation
is
not
present.
Sort
of
like
helps
addressing
this.
This
concern.
D
Yeah, I think that makes sense: establishing in the CRI that, if your image is duplicated between multiple handlers, then the kubelet asking to remove it should apply everywhere, because the kubelet is keeping track of which containers are using what until...
A
Let's discuss the rest in the PR, sounds good.
D
I actually think, and this might be a hotter take, but I actually think we don't need your PR as it stands, as long as containerd, or you know the CRI, is doing the correct thing with the removals. Because it was mentioned earlier that, at container runtime, if the image is not present, it gets pulled; at least CRI-O doesn't really have this behavior yet, so I mean.
D
Yeah, I know, that totally makes sense: if containerd wasn't being told to pull anywhere specifically, it would fall to the default, so yeah, that makes sense.
C
I'd like to at least add, you know, which runtime handler, even if we're not going to use it. I'd like to be able to see, when they request an image, Peter, for pulling, what the runtime handler is going to be for the pod that it's being run against. We've already added a couple of other annotations for the pod information that we're passing down. I think that's going to be a part of this policy thing anyway; going forward, we're going to need to know.
B
Yeah, and we don't need it only for the image pull; we also need it for the status, right, to check that the image exists. In that case, we don't have the pod spec there; we only have the image ref there.
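The asymmetry being described, sketched against the CRI v1 Go types; the image name, handler, and annotation key are made up, matching the earlier illustrative sketch.

```go
package kubeletimage

import runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"

// buildRequests shows the asymmetry: PullImageRequest carries the pod
// sandbox config (which knows the runtime handler), while ImageStatusRequest
// only carries the ImageSpec, so the spec itself must say which handler the
// presence check is for.
func buildRequests(sandbox *runtimeapi.PodSandboxConfig) (*runtimeapi.PullImageRequest, *runtimeapi.ImageStatusRequest) {
	spec := &runtimeapi.ImageSpec{
		Image: "registry.example.com/app:v1", // illustrative image
		Annotations: map[string]string{
			"runtime-handler.cri.kubernetes.io": "kata-cc", // hypothetical key and handler
		},
	}
	pull := &runtimeapi.PullImageRequest{
		Image:         spec,
		SandboxConfig: sandbox, // pod-level context is available on pull
	}
	status := &runtimeapi.ImageStatusRequest{
		Image: spec, // no pod sandbox here: the spec alone identifies the image
	}
	return pull, status
}
```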
D
I'm
pretty
sure
it
should
be
in
the
pot
sandbox
and
break
the
runtime
Handler.
Okay,
what
runtime
Handler
it.
D
So
the
situation
that
this
all
right,
there's
just
the
situation
that
your
PR
would
be
optimizing
for
is
for
if,
if
the
Cuba
didn't
know
like,
if
they
didn't
know
that
the
Pod
sandbox
was
going
to
be
run
in
a
separate
runtime
Handler
pulled
the
image,
but
then
it
actually
was
which
I
don't
think
so.
I
think
the
risk
is
like
only
for
the
duplicated
case,
where
the
Pod
is
first
run
in
the
default
runtime
Handler.
B
Yeah, and the reason is because, in the image status, we don't know what the sandbox is; the pod sandbox is not there, right? So it's not something that we can check.
D
I
see
so
I
would
say
when
I
was
saying,
based
on
the
hearing,
that
containerdy
was
pulling
the
image
for
the
container
create,
if
it
wasn't
present
in
the
runtime
handlers
that
it
was
you
know,
is
being
created
in.
B
Now, what we're getting is not context timeouts; it's actually that you're trying to create a snapshot, you're trying to prepare a snapshot, and the parent is something that doesn't exist, right? You believe that the image is present, and then you're trying to prepare a snapshot to be able to use it, right? It's a different error.
A
Yeah, we also may want to have the image list improved, so we know which runtime handler an image is situated on.
B
For my own understanding: are you suggesting that we should change containerd to pull the image? But it's not aware of the policy, right? The policy could say pull always, for example, or it could actually say never pull, which is the case when...
D
Right, no, I think you're right. It wouldn't be passed down, because the kubelet is the one triggering it based on that field. So actually I've flipped back again: your PR sounds useful for that reason. Yes, so thanks for workshopping this. I agree, because containerd, or you know the CRI, isn't aware of pull policy, and so the kubelet has to be able to tell it when to pull and when not to.
C
No, and I agree with you, Alexander. I think down the road we definitely need to get this pull policy passed in, along with the runtime handler, in a more formal way, with the expectation that the container runtime is going to be managing the images, collecting them and such, which would require the kubelet to also pass down the percentage-of-resource-use policy so they can do that collection, right, remove those images as well.
D
Yeah
I
think,
let's,
let's
I
would
I
would
vote.
We
table
that
conversation
for
129
when
we
like
begin
rethinking
what
what
image
eviction
looks
like
in
the
cubelet
which
we
had
talked
about
doing
already.
Okay,.
A
And
one
more
comment:
listen
if
you
can
validate
that
we
don't
break
any
backward
compatibility
issues.
So
if
customers
has
it's
not
same
snapshotter
for
different
runtimes,
it
should
continue
working,
as
is
we
shouldn't,
be
broken
just
break
into
scenarios.
G
Yeah, yes, sorry about that, I was a bit late earlier. So yeah, as you may or may not have heard, lately we've started running some node e2e tests on AWS; I've been working with Dims and others on that for some time.
G
There is an opportunity to take another look at how we do SIG Node e2e tests in particular, initially given that we have a fun requirement where we need to run a test scenario against many operating systems, and now multiple architectures. I see that the kOps project has a neat way of generating what's effectively a test grid of various permutations, so I had a look at how they've done it.
G
They've got this little Python script in the same folder as the prow jobs, and they basically define a test scenario running against, let's say, these OSes, with multiple options, and then they generate a prow job. So they've probably got hundreds of jobs generated like that, instead of by a human, and all those jobs drop out. I have a PR in there that implements that, so it'd be great if you could get it approved. I also want to take this opportunity to finish the KEP that was migrating to kubetest2, and then most of these legacy jobs are using bootstrap; we need to get rid of that too. So, in the big long file there, we've got the SIG Node grid, real quick.
G
I need to finish the PR, but this is kind of what I wanted to let the wider group know; that's what I'm working on. Initially, these will be CI jobs. Once the jobs are stable, we should probably have a conversation about picking which ones we want to use as permanent pre-submits.
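The generator idea in miniature, sketched in Go rather than kOps' actual Python script; the OS and architecture values and the job-name format are made up.

```go
package main

import "fmt"

// Enumerate the OS x architecture permutations and emit one job name per
// combination, instead of hand-maintaining hundreds of prow job stanzas.
func main() {
	oses := []string{"ubuntu2204", "cos-105", "fedora38"} // illustrative
	arches := []string{"amd64", "arm64"}                  // illustrative
	for _, osName := range oses {
		for _, arch := range arches {
			// A real generator would render a full prow job spec here.
			fmt.Printf("ci-node-e2e-aws-%s-%s\n", osName, arch)
		}
	}
}
```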
A
Okay, yeah, I think Ike was involved, and there was somebody else; I don't remember the name. Okay, so yeah, welcome to the CI work. We have a document linked in one of the past meetings. Dixie, I see you on the call; can you add a link to this document in the agenda?
A
You
yeah
and
probably
will
start
congratulations
into
PR
and
discuss
everything
on
GitHub
as
well.
Thank.
H
Hello. So, okay, it's about one issue which was opened a few weeks ago, and there's a fix; I'm basically trying to get design review and code review, and just trying to bring attention to it because it's a regression. A bit of context about this: a while ago we had a bug in which we noticed that devices were improperly handled on node reboot.
H
Basically, what happened was, on node reboot, the pods were admitted and, because of how the code worked, the workload was started without the devices actually allocated, which in some cases is very bad. And we fixed that. Unfortunately, for reasons that are explained in the PR, but which basically boil down to a few issues in the tests, we introduced a regression in which a kubelet restart causes pods, well, containers actually, but still, to be killed on kubelet restart, only for containers consuming devices, which is of course bad and which we of course want to fix.
H
So
this
is
it
and
basically
I'm
I
think
I
found
a
fixed
tank
to
a
conversation
with
Clayton
and
others.
Others
in
this
group
already
commented
on
slack.
Thank
you
for
that
and
I
think
I
found
a
way
forward,
which
should
make
your
own
app
in
should
address
all
the
questions
I'm
aware
of,
but
I
will
like
really
love
to
have
more
comments
and
and
review
and
Confirmation,
or
not
that
this
is
indeed
a
good
way
or
any
missing
items,
and
so
this
is
me
trying
asking
for
that.
I
Yeah, I just want to mention I spent a little bit of time looking at this and have a few comments; I'll send them out on the PR. But big thanks for working on this. I think overall it's definitely the right fix and looks good to me; there are a few small comments I'll leave on the PR.
A
Yeah
it's
yet
another
situation
on
kubility
Stars
shouldn't
be
that
dramatic,
I
think
another
problem
we
have
is
these
probes
right
when
probe
when
Kubota
starts,
it
makes
everything
come
ready
and
then
we
need
to
reprob
everything.
It's
much
smaller
problems,
though,
but
still
a
similar
aspect
of
it.
D
Yeah, so this one, you know, I just wanted to bring up some light complications that have come up about the cgroup driver thing, which are not insurmountable, but I just wanted to bring them to the group's attention and gather feedback. Basically, currently the idea is for the cgroup driver field to be static, in that the kubelet requests it once and then assumes it to be the case. But technically speaking, the CRI implementation can reboot underneath the kubelet and change that value.
D
So
I
just
wanted
to
make
people
aware
of
or
discuss
like
what
I'm
imagining
for.
It
is
adding
a
caveat
in
the
CRI
spec
that
says,
if
the.
If,
if
a
c
group
driver
is
changed,
then
either
the
CRI
should
ensure
that
there
are
no
containers
running
currently
like
it
hasn't
restored
into
containers
that
were
previously
running
on
the
last
run
or
the
node
like
has
rebooted
but
speak.
D
But
that
is
something
that
has
to
be
enforced
on
the
CRI
side,
because
the
qubit
is
not
going
to
be
continually
asking
for
this
field,
so
it
won't
be
able
to
learn
that
it's
changed
and
react
to
it.
So
so
you
know
we
definitely
have
precedent
of
asking
to
see
right
nicely
to
do
things,
but
I
just
wanted
to
double
check
that
that
approach
makes
sense
to
people
and
they
don't
have
any
concerns
or
anything.
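A minimal sketch of that one-shot query, against the CRI RuntimeConfig RPC this KEP adds; error handling is trimmed and the exact wiring is illustrative.

```go
package kubeletcm

import (
	"context"

	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// cgroupDriverFromCRI asks the runtime once, at kubelet startup, which cgroup
// driver it uses; after that the kubelet assumes the answer stays fixed,
// which is exactly why the CRI-side caveat discussed above is needed.
func cgroupDriverFromCRI(ctx context.Context, rs runtimeapi.RuntimeServiceClient) (string, error) {
	resp, err := rs.RuntimeConfig(ctx, &runtimeapi.RuntimeConfigRequest{})
	if err != nil {
		return "", err // e.g. an older runtime that doesn't implement RuntimeConfig
	}
	if resp.GetLinux().GetCgroupDriver() == runtimeapi.CgroupDriver_SYSTEMD {
		return "systemd", nil
	}
	return "cgroupfs", nil
}
```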
D
So the problematic behavior would be that, if the CRI changes the cgroup driver underneath the kubelet, the kubelet will not be aware and will continue to create whatever the original hierarchy is. So, if it's originally systemd, the kubelet will continue to create slices for a container that will now not be managed by systemd. I assume, I haven't tested it, but I would be surprised if that worked.
D
I
know
cryo,
like
manually
checks
the
paths
of
the
the
Pod
C
group
to
check
that
it
conforms
to
what
it
expects
to
be.
The
C
group
driver,
I,
don't
know
if
containerdy
does
the
same,
but
I
assume
that
the
pot,
the
container
Creations
or
pod
Creations
would
start
failing,
because
the
pause
C
group
and
the
container
c
groups
are
like
different
drivers.
D
So
the
expectation
that
it's
being
declared
at
the
the
CRI
spec
level
is
that
this
is
the
responsibility
of
the
CRI
implementation
to
handle
gracefully
and
a
couple
of
ways
that
that
could
happen
is
either.
It
ensures
a
reap,
or
basically
it
needs
to
ensure
that
there
are
no
running
containers
when
the
C
group
driver
switches,
because
the
cube,
because
the
Cuba
would
have
already
created
the
C
group
hierarchy.
D
If
there
are
already
containers-
and
so
we
basically
need
to
make
sure
that
there,
the
switch
doesn't
happen,
mid
run
of
a
node.
The
easiest
way
to
ensure
it
is
just
node
reboot.
It
like
it
needs
to
go
along
with
the
node
reboot,
which
would
be
possible
to
check,
but
might
be
a
little
bit
more
complicated
than
just
checking.
If
there
are
any
running
containers.
E
It has a list of running containers, and it knows which cgroups it created, so it can just...
D
But it doesn't know when the cgroup driver has been changed. I mean, it would know if it itself was restarted: if the kubelet restarts and learns a new cgroup driver, then, with the pods that it's able to find, that could happen. But I think it would be better if we just declared that the CRI should not change the cgroup driver if there are any running containers.
D
So if the CRI implementation restarts, and the node hasn't rebooted or the containers haven't been removed, and the cgroup driver has switched, then the CRI should error, or return a response to the kubelet, or something; that's an error condition that should be handled as abnormal, because it would otherwise cause some nefarious errors during container creation that might be confusing.
D
Yeah, so I put some notes in the CRI spec in this PR, which I'm taking over for Marcus because he's on vacation. So please take a look and let me know if it's clear enough; I'd be happy to fill it out more. Cool, it sounds like we're in agreement.
I
One question about the proposal as well. I'm not sure if it was covered somewhere else, and I don't know how big of an issue it is, but for different runtime classes: is there potentially a different cgroup driver per runtime class, or are you making the assumption that all classes will have the same cgroup driver?
D
We're not supporting that in Kubernetes. I think we previously briefly talked about it, but basically, because the kubelet is managing the hierarchy itself and is managing QoS classes, it would make it really complicated to try to manage it across different drivers. So far we've just not even tried to support that, even though containerd supports it, and I think that we should continue doing that.
D
Yeah, it would make the kubelet code a lot more complicated. You know, if we move to a world where the CRI is fully in control of the cgroup hierarchy, then maybe, but at that point it would be an implementation detail of the CRI. I don't think we should teach the kubelet how to multiplex the cgroup driver.
I
Makes sense, makes sense; I think that's fine. My only concern is if there's someone already doing some crazy stuff like that. I don't know, maybe someone's using, say, the systemd cgroup driver for the kubelet's pod-level cgroups and, for the container ones, maybe they're running inside some VM or something using cgroupfs. I don't know if this is something anyone does; I just want to make sure we don't break that. Personally I'm not aware of it, basically.
D
Well, and admittedly, there are some hacky situations, like some CRI-O code has, where systemd doesn't manage certain cgroup fields, so CRI-O will manage them on its own. But that said, we're not teaching the kubelet about that, so if that creates any weird situations, that's the CRI implementation's fault, and it's not something that we should make the kubelet aware of. It makes sense.
A
No, it's not about cgroups; it's about the previous issue, about the driver: you need to drain everything before switching.
D
Yeah, and I mean, we can't guarantee that as the kubelet, I think, but that's why I feel as though it should just be the CRI's responsibility to ensure that the state transitions make sense.
A
Okay then, happy rest of your day, and bye-bye; see you next week.