From YouTube: Kubernetes SIG Node 20230117
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
GMT20230117-180545_Recording_2010x1120.mp4
A
Welcome, everybody. Today I want to start with our typical PR trends analysis. We have 217 PRs right now; we aren't closing and merging them as fast as we would expect, and we've also started working on enhancements. I think it's the stage of the project when people are mostly looking at their reviews, writing enhancements, and doing PR reviews. But if you have cycles, please get on it and help us move the project along and keep our numbers healthy.
B
Yeah, so, yeah. Can we drill into that issue?
B
So this is for the local storage capacity isolation feature. It's Alpha currently, and there are some problems with it. There are a few issues open for it, like this one and others, and the question is: do we want to keep the feature around and graduate it to Beta, or deprecate it?
C
Is this something we can check with the kernel filesystem teams, to see if they can help out here?
D
So the feature... maybe I can add a little bit of context back. What do we want to do, given that we don't have an efficient way to track this? This is a kernel feature; I think maybe even 15 years ago I already sent this request to the Linux community.
D
So we don't have an efficient way to track disk usage, and it's been a long-standing to-do on the kernel side. They have disk quotas, project quotas; there was a second version of... I forgot. So when we first started Kubernetes, we also had this kind of problem, which is that we have to track disk usage in a really heavyweight way.
D
So luckily, right now a lot of the usage is on PD, remote storage, all those kinds of things. But on the local storage side, today we still have a bit of a problem. I mean, if a customer is using local storage, we cannot really efficiently track the image usage, the log usage, all those kinds of things, and we can't really efficiently charge the user for it.
D
So we've been promised by the kernel folks that they are going to give us, not a perfect way, but something better, which could be connected to per-instance accounting, like per process. That's why, after that third version or some existing implementation, we started this effort in the Kubernetes community.
D
I haven't looked at it, but I know that when we reviewed it in this community meeting and discussed the kernel implementation, it was still a little bit different from what we originally thought. But I agree with the main point: what is the real problem? Maybe this is the good way for us to send those requests to the kernel folks.
D
CPU and memory are no different: it's because of how we use those containers that we sent the kernel those requests. Things like memcg v2 and its userland handling were actually requested by us around 15 years ago or more. So that's why I think this could go a similar way.
B
So I wanted to start the process of maybe deprecating this feature, since there isn't really an efficient, cross-platform way of doing it. Are there any objections to doing that?
B
Yeah, I think, on the PR or issue that I created, I can summarize the status and we can have a discussion on it there.
C
Yeah, once you have an update we can discuss it here again. Okay.
F
Just a quick question on this feature: without it, the kubelet tracks file usage with something like du, right? Okay, got it. Because the other thing I would recommend we look into: I think we also need to support it on the container runtime side, because container runtimes also support reporting, over CRI, the overlay filesystem usage, basically, and they do this too. So I think we'd have to think about that as well.
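For context on the du-style tracking mentioned above: without filesystem quotas, usage has to be computed by walking the whole tree and summing per-file metadata on every refresh. A minimal Python sketch of that approach, illustrative only and not the kubelet's actual code:

```python
import os

def du_bytes(path: str) -> int:
    """Roughly what `du -s` reports for `path`: walk the tree and sum
    allocated blocks. Every call re-reads metadata for every file,
    which is why walk-based accounting is heavyweight on volumes with
    many files, unlike a project quota where the kernel keeps a
    running total for free."""
    total = os.lstat(path).st_blocks * 512  # st_blocks is in 512-byte units
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_blocks * 512
            except OSError:
                continue  # file vanished mid-walk; skip it
    return total
```

The try/except matters in practice: containers delete files while the walk is in progress, which is another reason this kind of scanning is fragile compared to kernel-side accounting.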
B
I'll update the issue with the collected issues from all the other issues on this topic, and we can have another discussion about it. Thanks.
A
Yeah, and this is one of those infinite Alphas and infinite Betas that we need to get rid of. The general mandate is: if we have something that never graduates past Alpha or Beta, it needs to be deprecated. I'm not saying that we need to force it in this specific case, but it's something we need to actually look into.
A
Okay, Vinay, I think I saw you on the call.
E
Yeah, I'm here. So on the in-place pod resize work, the update is that on the API PR, Tim Hockin is concerned about the gap between merging the API PR and then, a week later, following up with the implementation PR rebased onto the merged API PR. The concern is that if anything comes up that would require you to unwind the whole feature, the worst-case scenario...
E
Then
it
will
be
very
difficult,
especially
if
other
peers
interview
intervening
peers,
have
picked
up
on
the
API
changes
and
then
we're
gonna
have
to
run
through
resolving
Airy
conflicts.
I
think
it's
a
reasonable
argument.
I
don't
have
any.
My.
My
main
reason
to
separate
them
out
was
that
these
are
there's
a
big
change
and
it
would
be
easier
for
in
posterity.
If
someone
is
looking
at
it.
Just
look
at
the
API
changes.
It's
just
one
commit
there
and
then
the
implementation
is
a
separate
commit.
E
That was my main reason, and then a secondary, auxiliary reason is that it'll give me the time to bring in the periodic CI jobs, which we have disabled, and ensure that they're running clean and doing no-ops until the implementation merges, and that once the implementation goes in, they continue to remain green. It's not a big deal.
E
Point
stands
despite
these
two,
so
I
have
no
concerns
I'll,
always
we
can
always
add
the
periodic
CA
jobs
afterwards
and
then,
if
there
is
any
issues,
then
we
will
iterate
on
it
and
fix
it,
hopefully
fairly
quickly.
E
So at this point I think we want Derek and Tim to, you know, go in there. I just rebased the PR, updated it, and checked the pull request. The pull job for in-place resize ran multiple times; again, it's established a history of running green.
E
We just want both Tim and Derek to slap their LGTMs and approves on it, get this in, watch over it for the next week or so, and then bring the other CI jobs in. One of the changes that we discussed in the context of that API PR is a small name change to the resize policy, to structure it better so different policy types can be appended. Today we just have a resize policy, and under that we have the policies "no restart" (restart not required) and "restart required".
E
What
he
said
is
it
might
be
better
to
name
that
as
restart
policy,
so
that
it's
very
specific
on
what
it
is.
What's
the
meaning
and
then
not
required
and
restart
container
and
potentially
in
the
future,
have
something
like
a
restart
pod.
This
could
be
for
some
kind
of
a
VM
container
which
requires
you
to
run
through
the
init
processes.
Init
containers
once
again
to
provision
it,
and
there
is
no
other
way
to
do
this
other
than
have
Google
support
it.
So
that's
a
plausible
use
case.
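The rename discussed above can be sketched as data types. These names are taken from the discussion and are illustrative only, not the final Kubernetes API:

```python
from dataclasses import dataclass
from enum import Enum

class ResourceResizeRestartPolicy(str, Enum):
    """Per-resource restart behaviour for an in-place resize. The field
    name says "restart" explicitly, and the value set can grow, e.g. a
    RESTART_POD value for VM-style runtimes that must re-run init
    containers. Illustrative names from the meeting discussion."""
    NOT_REQUIRED = "NotRequired"            # resize in place, keep the container
    RESTART_CONTAINER = "RestartContainer"  # container must restart to apply
    RESTART_POD = "RestartPod"              # hypothetical future value

@dataclass
class ResizePolicy:
    resource_name: str                      # e.g. "cpu" or "memory"
    restart_policy: ResourceResizeRestartPolicy

# A container could then declare, per resource, what a resize implies:
cpu_policy = ResizePolicy("cpu", ResourceResizeRestartPolicy.NOT_REQUIRED)
mem_policy = ResizePolicy("memory", ResourceResizeRestartPolicy.RESTART_CONTAINER)
```

The point of the restructuring is extensibility: appending a new enum value later does not change the shape of the field, whereas the original flat naming mixed the "what" (restart) into the value strings.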
E
I saw your question: did you mean scheduler 1.20 with API server 1.29, or did you mean 1.19? You asked a question on the KEP update PR.
E
Okay, yeah. There is a version-skew document that I found, and they have very specific reasons to allow only certain versions, I think for the scheduler and the kubelet. Let me just post it here in the chat. For the scheduler and the API server: the scheduler must not exceed the API server version, but it can be behind by one. That's what they have in there.
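The quoted skew rule, that the scheduler is never newer than the API server and at most one minor version older, can be written down directly. A small sketch; the version numbers in the example are illustrative:

```python
def scheduler_skew_ok(scheduler_minor: int, apiserver_minor: int) -> bool:
    """The skew rule quoted from the version-skew document: the scheduler
    must not exceed the API server version, but may be behind by one
    minor version. Minor versions only (e.g. 26 for v1.26)."""
    return 0 <= apiserver_minor - scheduler_minor <= 1

# With API server v1.26: scheduler v1.26 and v1.25 are allowed,
# v1.24 is too old and v1.27 is too new.
assert scheduler_skew_ok(26, 26)
assert scheduler_skew_ok(25, 26)
assert not scheduler_skew_ok(24, 26)
assert not scheduler_skew_ok(27, 26)
```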
E
So
that's
the
reason
I
drove
the
table
to
see
you
know
what
combinations
make
sense
and
what
we
need
to
support,
and
this
is
going
to
be
a
follow-on
PR
again,
because
the
sphere
has
already
been
reviewed
and
approved
by
direct
I.
Don't
want
to
touch
even
to
change
a
comment
at
this
point
or
resetting
reviews.
The
only
changes
I've
been
doing.
That's
are
the
ones
that
are
forced
upon
me
by
rebasis.
E
Yeah, I'm hoping it will be done this week. I was hoping Derek would be here today so I could ping them as soon as Derek's concerns are addressed; I don't know if he still wants the API PR merged first and then just go on from there. Hopefully he'll see our notes; we'll put some notes in the meeting doc and I'll update it.
G
Yes, so here's just an update on the KEP. I've updated the KEP based on the feedback and comments from the past. I guess there were two main concerns, especially around the annotation-based bridge-gap solution proposed in the KEP. Now I've dropped that completely, and in the first implementation phase we go straight...
G
Try
to
kubernetes
API
with
the
changes,
so
no
no
button
location
but
annotations
would
be
involving
that
at
all.
And
the
other
kind
of
question
has
been
popping
up
many
many
times.
Is
that
isn't
there
a
mechanism
for
limiting
the
usage
of
some
classes
so,
for
example,
being
able
to
say
that
only
two
users
of
this
high
priority
class
on
this
note
can
be?
G
Yes, yeah, the scheduler would know about that. So I can show a short demo, if you want to see how it works in practice, what the user interface would look like in practice. I could, yeah.
A
I think we have time. So if you have it ready, let me know, yeah.
G
Okay, cool. So I have a one-node cluster running a pretty recent version of Kubernetes, and a CRI-O container runtime patched to support the QoS-class resources as well. Let's take a look at what the node looks like. Here we have enabled container-level QoS-class resources: say blockio, with three classes, high-priority, low-priority and normal, and also two dummy classes that don't do anything, just for my testing and demo purposes.
G
Then
there
is
automatic
rdt
class
rdt
resource
with
four
classes,
bronze
called
Silver
and
system
report,
and
we
have,
in
this
demo,
have
just
put
put
some
capacity
into
these
classes
to
demonstrate
that
so
we
have
kind
of
infinite
as
well.
It's
kind
of
universal
that
would
be
nice.
You
know
practical
in
real
life,
but
High
priorities,
capacity
and
low
priority
has
a
positive
one
and
normal,
like
two
users
can
be
at
the
same
product,
for
example,
and
then
in
the
north
status.
G
We
also
see
the
kind
of
usage
usage
of
this
of
this
pure's
resources
and
if
you
take
a
look
at
the
demo
pod,
we
would
have
a
simple
Port
here.
That
basically
does
nothing,
but
it
has
two
containers:
reverse
container,
put
a
request,
early,
the
class
code
and
local
plus
high
priority,
and
the
second
container
would
request
only
because
low
priority.
G
We
can
see
in
the
not
status
now
that
that
actually
block
our
class
high
priority
and
low
priority
are
both
kind
of
one
instance
is
reserved
from
these
classes,
and
our
data
classes
runs
on
gold
as
well
are
kind
of
so
the
gold
gold
plus
would
be
fully
fully
occupied
at
the
moment,
and
we
have
like
second
container
and
it
tries
to.
It
also
has
two
containers.
G
The first one requests only the RDT class bronze, and the second requests the RDT class gold and the blockio class low-priority. Now, if we try to run this one...
G
...looking at the status, we can see that it's pending, because the blockio class low-priority is not available.
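The pending behaviour shown in the demo amounts to per-class capacity accounting on the node: each class has a maximum number of concurrent users, and a container whose class is fully reserved cannot be placed. A toy model of that accounting, mirroring the demo's class names and capacities; this is a sketch, not the real scheduler:

```python
class ClassResource:
    """Toy per-node accounting for one QoS-class resource: each class
    has a capacity (max concurrent users) or None for unlimited."""

    def __init__(self, capacities):
        self.capacity = dict(capacities)              # class -> cap or None
        self.in_use = {name: 0 for name in capacities}

    def fits(self, cls: str) -> bool:
        cap = self.capacity[cls]
        return cap is None or self.in_use[cls] < cap

    def reserve(self, cls: str) -> bool:
        if not self.fits(cls):
            return False  # no capacity left: the container stays Pending
        self.in_use[cls] += 1
        return True

# Capacities roughly as in the demo: gold is scarce, normal allows two users.
rdt = ClassResource({"bronze": None, "gold": 1, "silver": None})
blkio = ClassResource({"high-priority": 1, "low-priority": 1, "normal": 2})

# First pod: container 1 takes rdt/gold + blkio/high-priority,
# container 2 takes blkio/low-priority.
assert rdt.reserve("gold") and blkio.reserve("high-priority")
assert blkio.reserve("low-priority")

# Second pod also wants blkio/low-priority: capacity exhausted, so Pending.
assert rdt.reserve("bronze")
assert not blkio.reserve("low-priority")
```

Deleting the first pod would decrement the in-use counts, after which the second pod's reservation succeeds, which is exactly the transition from Pending to Running seen in the demo.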
G
So
yeah
now
it
notice
running
and
then
well
the
node
status
kind
of
reflects
the
changed
allocation
of
resources
as
well
Oculus
resources
as
well.
G
Just to extend the ResourceQuota mechanism to support these QoS-class resources as well; I can quickly show that. You would have a quota here: let's say that for the container-level resources, the RDT class bronze allows two instances in this default namespace; then for blockio we allow the classes normal and low-priority but don't put any capacity limits there; and for the dummy resource we also put a capacity limit on one class.
G
You
get
get
the
status
of
the
resource
code.
We
can
see
that
there
are
now
some
some
quota
in
effect
in
in
the
default
namespace,
because
we,
as
we
specified
so
for
this
already
they
will
put
in
this
quote
also
a
kind
of
limit
clinical
values.
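The ResourceQuota extension described above boils down to counting class users per namespace against a limit, with classes that are allowed but uncapped, and classes that are not allowed at all. A toy sketch of that admission check; class names are from the demo, the API shape is illustrative:

```python
class ClassQuota:
    """Toy namespace quota for QoS-class resources: only listed classes
    are admitted, optionally capped at a number of users. Mirrors the
    demo's extended-ResourceQuota idea, not a real Kubernetes API."""

    def __init__(self, limits):
        # class -> max users in the namespace, or None for "allowed, no cap"
        self.limits = dict(limits)
        self.used = {name: 0 for name in limits}

    def admit(self, cls: str) -> bool:
        if cls not in self.limits:
            return False  # class not allowed in this namespace at all
        cap = self.limits[cls]
        if cap is not None and self.used[cls] >= cap:
            return False  # quota exhausted
        self.used[cls] += 1
        return True

# As in the demo: rdt/bronze capped at two instances in the namespace,
# blockio normal and low-priority allowed without a numeric cap.
quota = ClassQuota({"rdt/bronze": 2, "blkio/normal": None, "blkio/low-priority": None})
assert quota.admit("rdt/bronze") and quota.admit("rdt/bronze")
assert not quota.admit("rdt/bronze")   # third use rejected by quota
assert quota.admit("blkio/normal")
assert not quota.admit("rdt/gold")     # class not listed in the quota
```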
A
...being deleted. Can you comment on how it was configured? Did you configure the nodes first with specific resources, and then they were auto-discovered somehow, like which classes exist on which nodes?
G
Yeah, in the KEP, the management of these resources is handled in the runtime. So those resources are configured and managed by the runtime, and then the runtime reports, through the runtime status message to the kubelet, what is available on the node. The kubelet then updates the node status, and the scheduler gets the availability from there: what is available on which nodes.
G
Basically, it depends on the resource. In the KEP, it's part of the status information at the discovery stage of the resource. So some QoS-class resources may be immutable, meaning you cannot change them after the container or pod has been created, but it is possible to flag them as mutable as well. So, I guess, the main...
G
...reason for immutability is that in some cases, for example with VM-based runtimes, you cannot really change it after something has been allocated. But it depends on the kind of resource, and it's stated in the resource discovery.
A
Yeah, I need to read through the KEP, but I think today nodes have static CPUs and static memory, because of all the caching that you do in the scheduler. So there's no confusion: you cannot update node resources at runtime. So I wonder if this will be a new pattern with these resources.
G
Okay, yeah, you mean mutability of what is available. I've got, like, two mutabilities here. One is whether it's possible to update the resource assignment of running containers; that depends on the resource. But nothing prevents you, from the API point of view, from changing the available QoS-class resources on the node. From the API point of view, that is not restricted in any way.
I
And Sergey, actually, node resources are immutable artificially, and only for memory and CPU. If we think about extended resources, or device-plugin-provided resources, the capacities are updated based on the state, and the scheduler handles that properly.
A
Yeah, and I'm glad you all just thought about the pod in-place update. I think what Vinay's work is showing is that this is a requested feature, so it'll be interesting.
E
And to add some more context to this: I looked at this KEP as well, from the perspective of adding the capability for reporting. Let's say you have a network device, a network CNI, which is capable of offering different QoS levels for network traffic, or even bandwidth limits and requests (we have limited support today). But focusing just on QoS: saying traffic from pod A is high priority, or container A in a pod is high priority and container B is low priority; that I don't know...
E
If
we
can
do
it,
but
for
between
different
parts,
we
can
definitely
differentiate
and
implement
qos.
We
can
give
some
ports
high
priority
like
if
you
have
a
real-time
application
running
in
the
Pod,
then
that
would
require
low,
latency
Network
Pathways
and
that
we
can
make
that
happen
using
avpf.
E
This
feature.
From
that
perspective,
it
makes
it
easy
to
advertise
Qs
levels
in
the
network
traffic
and
then
different
values
that
you
can
have.
Let's
say:
Network
us
and
high
priority
or
expedited
or
best
effort.
You
can
have
these
classes
and
you
can
go
further
Beyond
it.
So
from
my
what
I
saw
I'm
a
huge
plus
one
for
this
feature.
E
We'll have to work out how. I think it will require a CNI extension to discover capabilities and report them, report the extended capabilities or something. Another thing I saw going on in the network SIG was the possibility of extending CRI: today we have set-up-pod-sandbox and tear-down-pod-sandbox, and the idea was explicit APIs for setting up or tearing down the network. I haven't figured out what the advantage would be for the kubelet to directly manage the network, but there might be some advanced use cases.
E
Let me just share it in the chat.
A
Okay, while we're sharing the link, let's go to the next item.
A
Now, if you have any quick updates or quick thoughts, please share; otherwise I will go forward.
A
Okay, then. The next item I wanted to share is Bartosz requesting to become a reviewer, and I'm fully supportive. If you have any concerns, please voice them here; otherwise I think it should be good to go. So, Derek, if you can... oh, Derek is not here. Oh, Dawn has already lgtm'd, so we'll wait for that; hopefully we'll have one more reviewer.
J
Thank you very much, first of all, to Kevin and to the people who voted for me. I appreciate this and I hope to do my best in this role. Thank you.
A
I'll try to remember your name, but yeah, I'm always matching it to the GitHub handle, and sometimes it's hard to switch.
A
Thank you. Sidecar update: we've been running a sidecar working group for more than a month now, and we've gotten to the point where we have a KEP that's almost fully fleshed out, and we want to move it to GitHub, hopefully tonight. If you have any comments on the Google Doc, you can leave them today or tomorrow; after that you'll have the opportunity to do it on GitHub. I can give more of an update once we have it on GitHub and it's completely fleshed out.
A
So maybe next time I can give a small explanation of it in the presentation. If you have comments right now, please comment there.
A
If
no
questions,
let's
go
to
the
next
item,
next
meeting
for
work
for
plugin
for
Google
plugin.
H
We will have the next meeting two weeks from now. Today we had the last session; basically, we presented our approach. There was feedback from the previous session, which we tried to cover: we presented a possible architecture for how to deal with that, and we are proceeding with it. In the next meeting we will go through more details about the KEP, fill it out with information, and we will continue from there.
A
You said the meeting was today; can you share the recording, so we'll have it published?
H
The recording is with my colleague Catherine; they might be offline already. Today, or at the latest tomorrow morning, we will send it to you. Yeah, thank you.
A
Okay, we've reached the end of the agenda today. Are there any other topics? If not, let's give everyone 20 minutes back to do more work. Thank you, everybody, bye-bye. Thank you. Bye.