From YouTube: Kubernetes SIG Node 20221108
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
A
This meeting is being recorded. Good morning, everyone. Today is November 8th, 2022, and this is our regular weekly SIG Node meeting. Welcome back, everyone. Today is also the 1.26 code freeze date, right? So I know everyone's busy with the code freeze and all those kinds of things. We also have a lot of proposals; some of them have already been processed, but they're still kept on the agenda as information updates to the community here. Right — so, David and Eric, do you want to start?
B
Sure, I can start. This item actually is not super critical for code freeze, so I was thinking maybe I can actually move it.
A
And also, some items are blocked by the release. Actually, I believe people are driving the process — this morning I also started to process all those kinds of things. So I think this is maybe okay for us too. Okay.
B
In our case, okay, cool — I just wanted to make sure that people had time with it and could bring anything up. So this is something that I've been working on with Eric. The backstory here: I just wanted to get folks' thoughts about this and see if this is a problem that other people face, whether it's something a KEP is needed for, or maybe this is kind of a smaller thing.
B
The backstory is that we have a script that periodically checks the health of the CRI on our nodes. Sometimes we have the case where the container runtime goes down while the kubelet can still be up, or it can go one way or another. So we have a script that periodically contacts the container runtime — the CRI — directly via crictl, and just does a "crictl pods" list; that's the health check that we have.
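
A minimal sketch (in Go, standing in for what is likely a shell script) of the style of check being described: shell out to crictl on a timer and treat a failed "crictl pods" as an unhealthy runtime. The one-minute interval comes from the discussion below; the timeout and logging are assumptions.

    package main

    import (
    	"context"
    	"log"
    	"os/exec"
    	"time"
    )

    // criHealthy returns true if "crictl pods" succeeds before the deadline.
    func criHealthy() bool {
    	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    	defer cancel()
    	// "crictl pods" lists pod sandboxes; it fails if the runtime is down.
    	return exec.CommandContext(ctx, "crictl", "pods").Run() == nil
    }

    func main() {
    	for range time.Tick(time.Minute) {
    		if !criHealthy() {
    			log.Println("container runtime health check failed")
    		}
    	}
    }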
B
So it's kind of like a health check that we have, and this requires us to run crictl every minute, right? The problem with this, we realized, is that crictl is actually a pretty heavy binary. We were investigating some cases in production where we had disk throttling and low disk IOPS, and, surprisingly enough, the loading of the crictl binary —
B
Just
because
it's
the
kind
of
a
big
chunky,
50
meg,
go
binary
actually
resulted
in
some
like
additional
disk
IHOP
usage,
which
is
kind
of
funny
that
the
health
check
itself
is
causing
kind
of
more
issues.
So
we
were
discussing
some
ways
to
resolve
this
and
we
were
thinking
you
know,
since
the
kublet
is
already
talking
to
the
container
runtime.
Maybe
it
makes
sense
from
the
Kubla
to
kind
of
have
a
health
help
the
endpoint
that
directly
checks
if
the
container
run
times
up.
B
So
this
way,
since
the
kublet's
already
running,
we
don't
need
to
load
another
binary
to
talk
to
container
runtime.
That's
kind
of
The
Proposal
here.
So
the
proposal
is
to
add
a
new
health
check
to
Kublai
that
directly
just
checks
if
the
container
runtime's
up
so
that
way,
we
don't
need
to
load
in
another
binary.
To
do
that.
So
I
was
wondering
if
other
folks
think
this
is
useful-
or
this
is
another
page
probably
face
or
how
you
health
check
the
CRI
in
general
yeah
and
to
get
some
parts.
A
Yeah, I also don't have any concerns about that, but I have to admit that until I read the code, I found the description of the implementation actually confusing — about the feature you're talking about.
A
Actually, I totally don't have any concern, and also — I'm not sure, but yeah, I agree with this — I wondered why we need another flag there. But I can see that maybe they want to use a flag to control it, something like version-level control here, because the older versions don't have that healthz endpoint.
B
Yeah, exactly. And the reason we were discussing a flag for that is because some folks like the existing health check, which didn't check that the container runtime was up — it just checks a few bits. So maybe some folks, you know, for their health checks, don't care about checking the container runtime; someone raised that as a concern. So that's kind of the issue: we need some way to differentiate whether folks care about that — or maybe the other question is whether it should just be built into the kubelet health check.
E
Oh, David, one question: how is this different from the generic PLEG? Because the generic PLEG already calls the CRI to at least list all pods, right? Frequently.
B
The difference is basically that this is just checking — I need to check exactly what call it's making — but it's just checking if the container runtime is up, and then it's marking the healthz endpoint directly. So the difference is that the kubelet actually exposes a healthz endpoint, right, that you can curl and get a response from, so that can be used by a script or something else.
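
A minimal sketch of consuming that endpoint from a script-like Go program. Port 10248 is the kubelet's default healthz port; whether the response would also reflect container runtime health is exactly what is being proposed here, so treat that part as an assumption.

    package main

    import (
    	"fmt"
    	"net/http"
    	"time"
    )

    // kubeletHealthy curls the kubelet's local healthz endpoint and
    // reports whether it answered 200 OK.
    func kubeletHealthy() (bool, error) {
    	client := &http.Client{Timeout: 5 * time.Second}
    	resp, err := client.Get("http://127.0.0.1:10248/healthz")
    	if err != nil {
    		return false, err
    	}
    	defer resp.Body.Close()
    	return resp.StatusCode == http.StatusOK, nil
    }

    func main() {
    	ok, err := kubeletHealthy()
    	fmt.Println("kubelet healthy:", ok, err)
    }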
B
Well, I mean, the main goal here is just to check that the container runtime is up. So the idea is, you know, if the CRI status call is responding, it's up.
G
Yeah, one question about this: I remember it's NPD that performs the health check, and we started to compare them. On the NPD side there is also code written to perform the health check, and it's just that that code today invokes crictl. But actually, if we just changed that part to CRI client code, then we wouldn't need to load the binary either, right? Why is that not considered?
G
Oh, okay — so, yeah, I think we use that in some other products. Basically, we use NPD — there's a health check plugin there, and it loads that health check plugin to check the kubelet's and containerd's healthiness and restart them. It's basically the same behavior as the script, but it was built before, for Kubernetes. And we use NPD and an NPD plugin there, and today we are also invoking crictl periodically from that health check code.
G
We could also change that to a CRI client instead, if we think loading another binary is an overhead.
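
A sketch of what "a CRI client instead" could look like: dial the CRI socket and call the Status RPC from k8s.io/cri-api rather than exec'ing crictl. The containerd socket path here is an assumption; the RuntimeReady and NetworkReady conditions it prints are the same ones mentioned later in the discussion.

    package main

    import (
    	"context"
    	"fmt"
    	"time"

    	"google.golang.org/grpc"
    	"google.golang.org/grpc/credentials/insecure"
    	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
    )

    func main() {
    	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    	defer cancel()

    	// Dial the CRI socket directly (containerd's default path assumed).
    	conn, err := grpc.DialContext(ctx, "unix:///run/containerd/containerd.sock",
    		grpc.WithTransportCredentials(insecure.NewCredentials()))
    	if err != nil {
    		panic(err)
    	}
    	defer conn.Close()

    	client := runtimeapi.NewRuntimeServiceClient(conn)
    	resp, err := client.Status(ctx, &runtimeapi.StatusRequest{})
    	if err != nil {
    		panic(err) // runtime unreachable: treat as unhealthy
    	}
    	// Conditions include RuntimeReady and NetworkReady.
    	for _, cond := range resp.GetStatus().GetConditions() {
    		fmt.Printf("%s=%v\n", cond.GetType(), cond.GetStatus())
    	}
    }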
G
It is — it's a big binary, yeah.
A
Yeah, yeah — and this is why, when I first saw this agenda, I pinged you. Because back then we built the plugin, and later NPD took over, so we wouldn't need to have those scripts — but it sounds like we never finished that work. That's the first thing, and that's why, when there was no KEP, I said: can you take a look? But on the other hand, quickly after pinging you offline and asking you to take a look, I quickly reminded myself —
A
Actually, maybe we do — I think this feature still has meaning, because not everyone deploys node-problem-detector, right? Even if people don't deploy node-problem-detector and they only have the container runtime plus Kubernetes, they still have a way to check. So that's why I also told myself: oh, maybe I need to think about it from the other direction.
A
I'm just sharing the context here, because some of us do have NPD, with the plugin already built, but it looks like we are shifting the standard config from the previous script-based check of kubelet and containerd healthiness over to NPD. On the other hand, containerd itself — or CRI-O itself — having that healthz endpoint may help others with this job, because they may not deploy NPD; otherwise they'd have the trouble of deploying NPD. So, just that.
G
Yeah, yeah — but another question I have related to this is: another option is that we can just let the container runtime expose its health endpoint itself, right? So that we can get rid of the dependency on the kubelet in between. I just wonder why those options are not considered instead.
B
That's a good point. I guess the question is: is it valuable to have a client connect remotely to the CRI and check whether the client was able to make that connection — because that's kind of what crictl is doing — versus checking whether the container runtime itself is up? I don't know if that difference matters, right? Like, is there a situation where the container runtime can be up but can't be reached by clients? I don't know; that's sort of the question.
F
I mean, the same thing can be said for some of our CNI services — I know there's a CNI check. And I noticed in here that it looks like you're going to check for runtime ready, image service ready, and network ready. So we should probably dig into this a little bit deeper, I think, Dave.
B
Okay, yeah — maybe this makes sense on the runtime directly; that would also make sense, I don't know. Currently, all the health checks are built around connecting to the CRI and then checking whether that connection is successful. But if the socket's there and the container runtime says it's ready, maybe that's good enough.
A
For this one, basically, it only solves one problem, right? If I publish this endpoint, it's not really saying containerd or CRI-O is healthy; it's basically saying it is alive — it's running, the process is up — and it's in a state where it can publish that endpoint. That's all. So I'm not sure: are we going to, eventually, once we have this healthz —
B
Okay, okay, I think — let me go back. I'm working with someone else on this, so let me gather some feedback. I think it's useful to maybe consider something on the container runtime side too. Let me bring this back into some internal discussion and then come back later. Thanks, all, for the discussion.
I
This topic is about pulling images at the same time. We already have options, like the registry QPS and burst, to set a limit, but I recently got an issue, and the issue says this doesn't work as expected — and I have a PR to fix it. But after really thinking about this feature, the feature seems not to make sense
I
in most scenarios, and I'd like to ask for some feedback from SIG Node. Also, I think we should have a new flag, like a node-level limit or something else, because the current logic is just the QPS and burst, and that only limits the rate at which image pulls start — it's not a node-level limit on parallel pulls, as Raven said. I think — I'm not sure if —
E
Yeah, so, Paco — if I remember correctly, in the issue you mentioned that somehow the users were under the impression that the limit is actually on the number of in-flight parallel image pulls. But that is not the case, right? The burst actually only limits the number of pulls that the kubelet sends to the runtime.
E
It doesn't care how many pulls are in flight already. So that's why we want to change it: instead of doing the QPS burst, we want to do a node-level limit on the number of in-flight pulls.
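
A small sketch of the distinction being drawn here: a QPS/burst limiter (like golang.org/x/time/rate) only gates how fast pulls start, while a semaphore caps how many are in flight at once. The numbers and names are illustrative, not the actual kubelet flags or implementation.

    package main

    import (
    	"context"
    	"fmt"
    	"sync"
    	"time"

    	"golang.org/x/time/rate"
    )

    func main() {
    	ctx := context.Background()

    	// QPS/burst style: at most one pull may *start* per second,
    	// regardless of how many long-running pulls are still going.
    	qps := rate.NewLimiter(rate.Limit(1), 1)

    	// Node-level style: at most 3 pulls *in flight* at any moment.
    	inFlight := make(chan struct{}, 3)

    	var wg sync.WaitGroup
    	for i := 0; i < 10; i++ {
    		wg.Add(1)
    		go func(n int) {
    			defer wg.Done()
    			_ = qps.Wait(ctx)      // gates start rate only
    			inFlight <- struct{}{} // gates concurrency
    			defer func() { <-inFlight }()
    			fmt.Println("pulling image", n)
    			time.Sleep(5 * time.Second) // stand-in for a multi-minute pull
    		}(i)
    	}
    	wg.Wait()
    }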
G
Oh — I heard an echo. I just wanted to add one point here, because we once did some internal calculation with this, trying to use this QPS to actually limit the concurrency. It's basically impossible, because an image pull takes that long — it's a long, long-running request, not just some one-time thing.
G
Just limiting the QPS won't work, because if an image pull takes two minutes or five minutes, just limiting how many pulls you can start every second doesn't help. We even did this kind of very complex calculation to try to see whether we could use this to achieve some kind of concurrency limit, and it was just impossible. So in the end we didn't use this at all. So, yeah.
A
Yeah, that's — that's complicated. We always wanted to tune the pull workers, right? But I also understand that kind of pushes it onto the user or developer. Oh yeah, okay — in-place vertical pod update can help with this one, but we thought people desperately wanted better things, and we could get some user exposure. I just always worry about it. This is my old experience from before we had that in-place vertical dynamic update.
A
Basically, it just causes more trouble for the customer — which in my case is the developer, right? The admin is also my user, and the admins are maybe happy, because basically I shifted that problem to every application owner. So when they deploy their services, they don't know how to configure their workload; it makes that even more challenging. So more people stopped using Guaranteed and instead used Burstable, which in turn made the admin's work much harder.
A
You can see the cascading problem, circling around like I say. We shifted the problem — because handling it from the admin mechanism is easier — and charged it to each application owner, because you charge it to the cgroup. Then their jobs would maybe get killed, all those kinds of things, throttled, all those kinds of things, so they had to configure their workload better. That was before we had this dynamic in-place update of pod resources.
A
So in the end they all changed to using something like Burstable, and once everybody is using Burstable, then you have the preemption and all that kind of logic. And, you know, the admins actually also have a problem, because you also have the error budget, right, for the services you host, so you then need to do that resource capacity planning better. I just want to share this with people here, since many of us here are developers, or maybe admins. So that's the potential problem, yeah.
C
Yeah, this sounds like another use case for using eBPF to detect that, okay, an image pull is starting, and immediately give whatever capacity you can to the pod, as long as they are within the allocated budget — the budgeting for it — and that will make things better. It's much more responsive. These are the deterministic events for which we can really use that technology.
A
But
we
need
here's
the
one
problem,
it's
right,
so
kubernetes
we
did
a
terrible
job
on
the
network
eye
opener
is
management.
A
A terrible job on network I/O — I mean, I'd actually say we did nothing, basically. And disk I/O we also did really poorly, but hopefully we can help do something at the disk I/O level. For network I/O, though, so far I don't see a good solution for Kubernetes. Basically, it's up to the Kubernetes vendors to think about how they are going to integrate with their backbone network offering and do this integration. So, just that.
C
I actually did a talk at the Open Source Summit in Austin, Texas, about bringing QoS to Kubernetes pods. I don't know — Tim also mentioned he was interested in it, but we don't have a clear set of requirements. Essentially, this was more from the point of view that different pods have different network needs. If you have a pod that's doing image processing, like file backup, versus a pod that's processing, you know, payment traffic, they both compete for the same network bandwidth.
I
Thanks for your feedback. I have done something related: I have opened a PR in containerd to add metrics for image-pull-related things — for example, how many image pulls are in progress, in one metric, and the duration of the image pulls, in another metric. If we have those metrics, I think users can at least know how to set the QPS or burst, using such a calculation.
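
A sketch of the two metrics described, using client_golang; the metric names here are hypothetical placeholders, not the names used in the actual containerd PR.

    package main

    import (
    	"net/http"
    	"time"

    	"github.com/prometheus/client_golang/prometheus"
    	"github.com/prometheus/client_golang/prometheus/promhttp"
    )

    var (
    	inProgress = prometheus.NewGauge(prometheus.GaugeOpts{
    		Name: "image_pulls_in_progress", // hypothetical name
    		Help: "Number of image pulls currently in flight.",
    	})
    	pullDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
    		Name:    "image_pull_duration_seconds", // hypothetical name
    		Help:    "Time taken to pull an image.",
    		Buckets: prometheus.ExponentialBuckets(1, 2, 10),
    	})
    )

    // pullImage wraps a real pull with the two instruments.
    func pullImage(img string) {
    	inProgress.Inc()
    	defer inProgress.Dec()
    	start := time.Now()
    	defer func() { pullDuration.Observe(time.Since(start).Seconds()) }()
    	_ = img // ... the actual pull would happen here ...
    }

    func main() {
    	prometheus.MustRegister(inProgress, pullDuration)
    	http.Handle("/metrics", promhttp.Handler())
    	_ = http.ListenAndServe(":9090", nil)
    }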
F
I think it's also important to note here that we had a lot of discussions at the contributor summit — you know, the summit discussions — around the concept of adding to the pod specification some image service information: kind of declarative information that would help explain to the container runtime the types of caching that need to happen, how fast this particular container image needs to be pulled, whether it
F
Should
be
kept
on,
you
know
on
store
in
the
node,
as
well
as
whether
or
not
you're
you
should
use
lazy.
You
know
image
pooling
services
for
that.
You
know,
for
this
particular
container.
F
Different
runtime
handlers
are
going
to
want
to
pull
the
images
into
their
own
VM,
for
example,
in
combination
containers,
so
I
think
it
it
sort
of
points
out
that
we
need.
We
need
some
more
declarative
information
in
the
plot
specifications
to
hand
down
to
who's
handling
these.
These
image
pools
in
this
additional
networks
and
and
pretty
soon
multi-networks
on
each
of
the
networking
devices
that
we
have
available
to
us.
So
we
we
need.
We
need
to
somehow
manage
this
stuff.
F
Three
and
like
vinay
was
saying
we
need
some
quality
of
service
information
in
those
pod
specs,
so
we
can
go
from
an
imperative
right,
API
to
more
declarative
around
this
space,
and
then
we
can
extend
plugins
that
some
of
the
intelligence
we're
working
on
you
know
to
make
some
decisions
on
how
to
manage
those
images.
You
know
in
a
in
a
common
way
across
the
entire
node.
C
Right
I
think
multinet
dripping
brings
in
a
very
interesting
that
effort
has
started
out.
I
was
in
one
of
those
contributor
submit
talks,
they're
talking
I,
think
the
use
case
of
using
SRI
makes
the
high-speed
Nicks
for
getting
bulk
data
that
might
and
that
separates
your
Port
traffic.
It
could.
Overall,
there
are,
there
are
a
few
different
design
patterns
that
we
could
look
at
here.
Qos
included,
hopefully
with
this
whole
involvement
to
you
know
better
Network
management
for
kubernetes.
A
I
have
to
check
the
time.
I
know
there.
People
have
so
so
Paco
I.
So
thanks
for
the
after
this
one,
so
I
basically
I
think
the
can.
We
just
looks
like
the
everyone
agree
about
the
limited,
the
number
of
the
parallel
poor,
at
least
at
this
moment
before
we
accept
and
I
I,
think
everyone
agree
about
the
net
know
the
level
of
the
limit
right
concurrent,
the
pool
at
this
moment.
A
So
can
you
summarize
what
we
discussed
here
and,
of
course,
your
current
appear
together
with
the
river
and
then
we
we
can
working
on
new?
What?
What
do
you
propose
here
and
continue
with
that?
New
okay.
A
Thanks — and thanks to those who will also volunteer to help. And the next one — I know we had you earlier.
C
I think that pretty much covers it. The issue that came up was after rebasing: a bunch of changes came in in the past day or so, and there were some code changes I needed to update with the rebase. The big change is essentially adding the new CI job. I still haven't had the chance to look into why that's failing.
C
David
Porter
had
helped
me
with
with
it,
and
we
found
that
some
of
the
config
parameters
weren't
right
and
then
that
it
went
past
that
now
the
API
server
is
not
coming
up.
So
I
need
to
look
into
it.
Hopefully,
I'll
find
time
tonight
or
tomorrow
to
look
into
that.
Yeah.
A
Next
one
I
think
the
sorry
about
the
Ian.
Sorry
I
made
honest.
Your
name
is
wrong,
but
can
you
talk
about
the
CPU
site.
H
Sure, yeah — the name is pronounced "Ian", yeah. I was talking with Tim and some other people about moving a CPU set library, which is used primarily within the CPU manager, into kubernetes utilities, and it's evolved into quite a little project. But I just wanted to post the PR in case anybody had some interest in, or opinion on, the deletion of the Int64 variants, or, you know, deleting the NoSort variants. I was originally planning to delete the NoSort variants of the API.
H
But
then
Tim
suggested
I
take
a
look
at
the
set
API,
which
has
a
list
and
unsorted
list
methods
and
to
mimic
that
for
this
Library.
So
that's
kind
of
the
direction
I
went
with,
but
just
wanted
to
kind
of
from
the
Spy
people
and
see
if
anybody
had
any
ideas
or
anything.
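
A toy illustration of the naming pattern being mimicked: the apimachinery-style sets expose List() (sorted) alongside UnsortedList() (no sorting cost), instead of keeping a separate "NoSort" variant. This stand-in IntSet is illustrative only, not the cpuset library's actual API.

    package main

    import (
    	"fmt"
    	"sort"
    )

    type IntSet map[int]struct{}

    // List returns the elements in sorted order.
    func (s IntSet) List() []int {
    	out := s.UnsortedList()
    	sort.Ints(out)
    	return out
    }

    // UnsortedList returns the elements in arbitrary map order,
    // skipping the sort when callers don't need it.
    func (s IntSet) UnsortedList() []int {
    	out := make([]int, 0, len(s))
    	for v := range s {
    		out = append(out, v)
    	}
    	return out
    }

    func main() {
    	s := IntSet{3: {}, 1: {}, 2: {}}
    	fmt.Println(s.List()) // [1 2 3]
    }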
D
I think the move itself made sense to me. I can review the PR and see if I have any thoughts, yeah.
A
Okay,
thanks,
that's
all
because
I
think
the
rest
staff
will
already
covered
and
any
other
topic
of
people
want
to
discuss.