From YouTube: Kubernetes SIG Node 20200804
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
B
Thanks. Yep, so Dawn, Derek, Seth, and I chatted about what we think some of the most important things are for the SIG, and one of the things that came up is that, as a first step, we'd like to have a better idea of how we're doing in terms of reviewing pull requests that come into the SIG.
B
Some of the other interesting bits: if I look at the workload, we've had a pretty steady workload over the past year. They do have this cool view... let's see, where are the reviewers?
B
This is the number of SIG reviewers, so you can actually see that we had a drop-off in the number of reviewers over the same period of time that we saw an increase in the number of open PRs. So that probably suggests that there were fewer people actively reviewing PRs during the early months of 2020.
B
The
other
links
I'd
like
to
point
out
these
probably
changed
since
I
looked
at
them
last
week,
but
here
are
some
hopefully
helpful
links
to
prs
that
need
approval
for
those
of
us
who
are
approvers
in
signode,
as
well
as
those
that
are
open
and
are
looking
for
review
and
are
tagged
with
node.
So
if
you're
looking
for
a
place
to
to
get
started
or
to
find
more
prs,
then
the
bot
auto
assigns
you
from
being
on
the
reviewers
list.
C
I got a random one; in DevStats you can also see how much people are contributing. Do we know how active the approvers are? I'm asking just to give more context: it is often the case that people get overloaded or they switch jobs, and people who are approvers in some places stay listed as approvers but are not actually as active anymore. Yeah, fun fact.
C
If
anyone
is
actually
interested
in
going
through
a
lot
of
owner
files
in
covert
and
kubernetes,
it's
like
joe,
better
still,
that
is
still
an
appropriate
and
he
is
still
a
requested
for
review
and
approval
and
a
lot
of
pr's,
even
though
he's
not
actually
he's
not
actively
contributing
to
the
brilliant
project.
Anymore,
yep.
B
It
doesn't
so,
I
think
there
were
25-ish
open
pr's
that
needed
approval
right
now
that
are
tagged
with
sig
node
and
all
of
those
had
a
an
approver
from
don
derek
myself
or
seth
assigned
and
looked
like
they
were
making
progress.
So
it
didn't
look
like
any.
B
At
least
any
of
the
prs
that
were
eligible
for
approval
right
now
were
blocked
on
us.
I
looked
at
the
dev
stats.
I
found
something
interesting,
which
is
that
it
didn't
seem
to
have
all
of
us.
I
could
find
myself,
I
could
find
derek,
I
could
find
seth,
but
I
couldn't
find
dawn
for
whatever
reason
and
I
couldn't
find
a
number
of
the
of
the
other
approvers
that
are
perhaps
less
active
today.
B
So
I
don't
know
if
we're
just
missing
stats
for
particular
people
or
what
the
what
the
deal
is,
but
I
know
don
approves
prs,
but
she
didn't
show
up
in
in
devstats,
so
I
can
try
and
it
does.
I
don't
know
if
it
does.
I
think
I
only
looked
in
kubernetes.
B
Kubernetes,
okay,
cool!
Well
thanks
for
the
the
points
what
I've
done
actually
so,
hopefully
we
can
make
this
sort
of
a
two-minute
or
shorter.
You
know
kind
of
piece
at
the
beginning
of
sig
meetings,
I've
added
a
sig
health
portion
of
the
meeting
notes
at
the
top.
B
Four,
I
think,
that's
so
there's
a
question
from
morgan.
If
there's
only
four
approvers
which
to
reiterate
our
dawn,
derek
myself
and
seth,
can
we
shadow
along
to
become
approvers
eventually.
D
Yeah, so there are actually probably still some other approvers who might be taking a temporary assignment elsewhere, and I'm not sure if they will be coming back in the near term, but they have a large historical knowledge of the code base. So we'd have to follow up with some of those individuals, and I think among the group of us here we can kind of take a health-check assessment on what their near-term assignments are.
D
The only caveat on that is that we want to make sure people have some historical knowledge of the project, to know past decisions. The most recent enhancements for Seth and David to get elevated kind of captured their long-term engagement. But for anyone who wants help getting on board with approving and gaining that historical context, I think all of us would love to help facilitate and accelerate that.
B
The other thing I'll point out is just that approver isn't generally binary for the SIG. If you're really experienced in a certain smaller area of the code base, we've definitely given out approver, for example, on the container manager, the topology and CPU pinning section of the code base, to people who've worked on that a lot. So I think that's also a good strategy for being able to unblock smaller, more focused PRs.
D
Yeah, so Kevin Klues from NVIDIA has approver rights on a number of sub-elements in the tree. So really, here we're just talking about top-level approval for the kubelet, cmd/kubelet, and probably test/e2e_node.
A
I just want to add that in the past we actually successfully grew a lot of people, first into reviewers and then into approvers. We've been doing this constantly. Also, I personally, kind of on purpose, tried to not use my approval power until recently; maybe a lot more lately compared to before, but over time, actually most of the time...
A
I
don't
need
to
perform
my
approval
power
because
we
do
have
a
many
of
the
approver
in
the
signal
the
community
in
the
past,
so
we
so
now
we're
starting.
I
will
use
the
starting
on
the
reviewer
because
most
of
the
approver
and
also
other
reviewer
now
we
can
actually
want
to
grow
more
on
the
both
reviewer
and
also
approval
just
to
share
with
the
team
here.
Yeah
and
there's
another
question
from
the
theme
and
didn't
want
to
see
the
regular
tragedy
meeting.
Yes,
we
are
agreeing.
I
think
this
is
exactly
we.
A
I asked David to take the first 15 minutes initially, because we wanted to talk about how we'd run this regularly: what's the procedure and the process for having the regular triage meeting, and we have to make the final decision here. So that's kind of today's initial topic. Yeah, Derek, do you want to...
D
Yeah, so I think calendaring is hard, and we had a number of activities that we're trying to get ramped up and didn't want to take away the momentum on any of them. I guess I would view it like so.
D
We have the CI health activity that we'll talk about afterwards, and then I had originally proposed the dedicated triage meeting, but then we thought, well, let's just make a standing 15 minutes in this meeting weekly to at least go through and report if anything has stalled, and not be so oriented towards new-feature discussion or issue discussion, that type of thing. At the moment, I think, it's just a matter of David, myself, and Seth getting our calendars right.
D
Yeah, I think that would be great, right, since we're all here. So it's good. We just want to have a standing 15-minute block to talk about health: again, issues that are stalled or needing attention.
E
Right, yeah, let's just do it that way. We'll ask people, if they're stuck, to come here and add it to the agenda, and then it goes in the initial 15 minutes. Usually, if we do it at the end, most of the time we don't get around to it and it falls off. That's...
D
Why
yeah,
even
even
just
getting
it
linked
in
here
as
a
thing
to
call
out,
took
a
look
at,
is
often
a
helpful
product,
far
more
than
an
email
for
me.
So.
A
So we want to take that 15 minutes at the beginning of the meeting and then quickly turn around, find the reviewer, and find what the blocker is in our review process or the approval process. Unless there's a technical issue that we need to discuss, that 15 minutes is mostly for getting people's attention, right.
A
Yeah,
I
just
don't
want
to
just
connect
the
set.
I
just
hope
we
can
set
expectation
correct
here
so
so
directly,
and
I
will
talk
about
more
because
we
last
time
we
last
week
we
meet
and
we
haven't
finalized
those
process
yeah.
But
this
week
we
we're
going
to
continue
that
that
and
discuss
because
last
week
we
put
more
effort
and
discussing
on
the
ci
project.
C
I've got one last question. It was mentioned that it would be okay with y'all for people to shadow approvers in order to become reviewers and approvers in different areas. How would you recommend that people actually go about that?
D
I
think
there's
been
a
few
folks
that
been
trying
to
mentor
via
slack
so
just
reaching
out
to
one
of
us.
One-On-One
is
actually
really
helpful.
I
think
we'd
be
happy
to
facilitate
that
one
of
the
things
like
if
I
was
to
be
completely
honest,
we
have
a
lot
of
folks
who
will
look
at
code
and
have
a
in
some
ways.
D
...say, actually, no. And so what I would find interesting is if there are folks who are reviewing PRs.
D
That's
like
the
key
skill
I
feel
like
it's
hard
to
grow
in
the
community,
but
I
feel
like
it's
a
skill
that
I
know
both
seth
dawn
and
david
and
others
who've
worked
in
here.
I
think
we
feel
like
is
really
needed
to
get
into
that
approval
right.
So
there's
a
lot
of
folks
who,
who
will
say,
hey,
I
lgtm
a
lot
of
these
pr's
and
I
did
all
these
reviews.
But
then,
when
I
look
in
the
details
of
the
review,
I
never
see
any
actual
constructive
criticism
beyond
generally
this.
D
This
seems
pretty
good
right,
so
I
think
what
I'm
looking
for
is
like
how
do
we
get
constructively,
provide
criticism
and
are
comfortable
saying
no
as
a
as
a
thing,
because
ultimately,
it's
our
job
as
top
level
approvers
to
ensure
that,
like
our
scope,
doesn't
go
and
brown
unbounded,
and
so
if,
if
folks
have
been
participating
in
entertainment
prs
and
they
want
to
talk
about
things
that
they
think
may
or
may
not
have
been
right
or
that
type
of
thing,
I
think
all
of
us
would
love
to
have
that
discussion.
A
Yeah, thanks, Derek, for sharing this one. I also wanted to add that in the past we did turn down some approvals, and sometimes, I personally feel, what I'm really proud of over the past six years is that I constructively turned down a lot of things, one project after another. When I look back, I do think about whether it could have been damaging to the community.
A
Could
then,
if
you
are
signal
the
reliability
and-
and
so
so,
that's
really
key
and
important
pieces
about
being
a
leader,
technical
leader
here,
and
I
it's
easy
to
agree
and
without
question
and
without
the
challenge.
So
that's
why
that's
the
kind
of
the
key
part
that
we
tried
to
looking
for
to
become
to
off
the
approval,
different
and
different
from
the
reviewer
yeah.
So
thanks
derrick
brought
this
up
yeah
because
it
actually
used
to
be.
We
have
this
many
discussing
before.
A
So,
let's
move
to
next
topic
and
the
circuit
and
victor:
do
you
want
to
talk
about
last
week
a
few
of
us
get
together
talk
about
the
ci
and
and
signal
the
reliability
project,
and
then
at
the
end
we
decided
to
name
it
is
as
the
signal
the
circle
project
and
and
the
second
and
the
and
the
victor.
Do
you
want
to
driving
the
discussion
here
and
share
the
doc
with.
F
Hi, I'm Sergey. I'll start, and Victor, jump in at any moment; we didn't sync up on who presents the document, but I think we can just figure it out as we go. So the SIG Node CI group is not something new; it's an effort that had already been started before. And when I say "we"... I joined quite late, but I've been helping...
F
We
said
with
test
fixes
and
stuff
like
that
for
a
while,
and
the
big
reason
for
that
is
that
tests
for
signal
specifically
started
degrading
and
just
natural
causes
of
degradation,
and
it's
nothing
that
something
bad
was
happening.
It's
it's
a
life
and
our
goal
for
this
group
is
to
be
proactive
and
proactively,
maintain
our
test
cases
and
keep
the
healthiness
of
code
base.
So
this
wouldn't
be
a
challenge
any
longer.
F
We want to improve test coverage, so more features will be covered by tests and we'll be more confident in contributions coming into SIG Node. We also want to fight the natural causes of test degradation, namely OS image support: whenever a new OS image comes along, we need to support it. We need to adjust to the changing nature of things, which images are being used widely...
F
What
tools
are
installed
on
the
nodes
and
stuff
like
that,
and
we
also
want
to
improve
test
roi,
so
we
want
to
make
sure
that
whatever
test
we
run,
we
know
why
we
run
this
test
and
we
have
a
clear
understanding
how
we
benefit
from
running
this
test
and
spending
both
like
machine
time
and
people
time,
maintaining
those
tests
active
and
healthy.
F
So
from
execution
standpoint,
we
want
to
restart
our
regular
meetups
right
now.
This
plan
is
to
do
it
monday
at
10
a.m.
So
it's
a
time
synchronized
with
east
coast,
and
europe
are
a
little
bit
mostly
with
europe,
not
asia.
We
want
to
triage
issues.
We
want
to
do
some
pull
request,
reviews
and
bring
to
attention
some
reviews
at
a
stale.
F
We want to both increase coverage where we need to increase it and cut tests that are not testing anything, that may just be burning CPU cycles without bringing actual verification. We also plan to work on some automation and tooling to help us, namely alerts when our release-blocking tests are failing, and some dashboards to better track pull requests and to understand the code coverage of our tests.
F
And,
finally,
we
want
to
improve
the
communication,
so
every
new
contribution
will
be
easy
and
we
want
to
make
sure
that
we
know
what
that's
doing.
We
have
an
easy
way
to
figure
it
out
easy
way
to
support
it
and
easy
way
to
onboard
new
members
to
this
effort
yeah.
So
I
think
the
actual
call
out
here
is
to
join
join
this
effort,
try
to
contribute
in
the
stability
of
signal
and
yeah.
We
welcome
everybody
victor
want
to
add
anything.
G
Yeah, sure, thank you, Sergey. That was a really good introduction to the new testing effort. I wanted to just say thank you to Ning and Sergey, and I think Rory and Quran also, and probably David, who did a lot of work on this document, which was really picking up some of the CI efforts.
G
We
had
started
early
on
and
sort
of
they
had
sort
of
held
off,
and
so
this
is
a
formal
effort
to
restart
that,
and
I
think
that's
great
one
of
the
things
I'm
not
quite
sure
of
you
know
when
we
look
at
this
is.
It
seems
like
this
is
a
formal
charter
thing
and
I
think
there
has
to
be
some
probably
review
an
approval
process
and
I'm
not
quite
sure
how
to
move
forward
on
that,
but
yeah.
This
is
this
is
good.
So
can
we
speak
to
that
for
a
minute,
dawn
and
derek?
A
Actually,
I've
been
involved
from
the
beginning
when
thing
and
the
victor
have
this
idea.
So
I
already
reviewed
the
dock
and
I
already
packed
it
to
the
last
week.
I
basically
yeah
there's
the
review
process
like
the
pro
like
before
we
want
to
make
this
a
formal
sub
project
and-
and
we
also
want
to
eliminate
elected.
A
I
think
we
already
elected
of
the
like
the
like
the
leading
lead
leader
and
drive
that
the
two
project,
and
but
derrick
and
also
derek,
is
looking
into
this
one
and
the
derek,
and
I
is
going
to
discuss
this
one
and
derek:
do
you
want
to
point
more.
D
There
is
some
eyes
and
t's:
we
have
to
dot
and
cross
to
go
through
the
formal
subproject
process,
but
this
is
really
covering
things
that
are
related
to
like
testing
of
the
cubit
sub
project
itself,
but
then
some
of
the
supporting
machining
around
it
that
we
can
get
the
the
paperwork
stuff
settled.
I
think
I'm
I'm
understating
how
happy
I
am
to
see
the
help
in
getting
this
brought
forward.
D
So
much
of
what's
been
identified
here
is
fantastic
and
I'm
hoping
that
this
can
prove
successful
and
can
provide
a
daily
readout
on
they're,
not
a
daily,
I'm
sorry,
a
weekly
readout
on
how
we're
tracking
those
in
the
world
sick
with
with
health.
On
this,
the
only
thing
I
was
thinking
about
after
this
is,
if
some
of
the
other
subprojects
that
were
listed,
there
could
get
better
integrated
in
this
process
so
but
don,
and
I
can
can
trace
that
down
afterwards.
But
I
think
this
is
long
overdue.
A
I
think
some
of
the
old
several
projects,
maybe
can
consolidate
or
maybe
deprecate
it.
So
we
can
we
direct
and
I
go
across
process
those
things
together
after
smoking.
So
then,
then
we
will
report
back
to
the
community.
A
If
we
do
things
and
but
yeah,
that's
the
sum
of
the
paperwork,
we're
going
to
pick
up
and
then
drive
through
offline,
and
so
maybe
we
also
need
to
see
that,
what's
the
how
often
the
sub
project
should
be
reported
back
to
the
signal
directly,
we
used
to
have
like
the
other
sub
project
regularly,
like
the
monthly
or
something
it
is
like
the
evening
by
weekly
report
back
to
the
signal.
Maybe
we
also
need
to
finalize
that
process
and
they
used
to
be
like
the
a
little
bit
negative
menu.
A
We
poke
up
the
people
say.
Oh,
please
come
back
to
the
signal
and
and
the
sound
is
just
nature
because
invaded
in
the
signal,
the
community
meeting,
because
it's
kind
of
daily
stuff,
like
container
runtime
interface,
used
to
be
every
week
there.
They
have
the
updated
status
and
because
everyone
pay
attention
so,
but
the
zombie
is
a
little
bit
like
the
ad
hoc,
like
the
node
problem
detector
and
I
have
to
manually
to
poke
engineers
say:
oh,
come
back
to
signal
at
least
like
the
monthly
to
report.
D
In terms of reporting back: I think, following how David presented the overall health metrics for the primary, I guess top-level, project of the SIG...
D
...I think it'd be good to pair that up with this group outputting its current status at the beginning of every SIG call, and we can adjust as things go, but that seems to work just fine for me. Some of the other ones, like CRI and such... well, they need to get out of this perpetual alpha state, and they're also seeing less churn, whereas this one is just: are we continually healthy?
G
One
comment
I
would
make
is,
I
think,
in
the
previous
meeting
about
the
sub
project
for
ci,
we
had
sort
of
said
that
the
group
would
meet
once
every
two
weeks
and
maybe
if
we're
going
to
you
know,
give
an
update
every
week,
we
might
need
to
change
that
to
weekly
yeah.
That's.
D
Just
the
thing,
so,
I'm
still
hoping
we
can
like
resurrect
some
of
the
supporting
tooling
we
had
like
if
we
regressed
on
a
performance
like
we
had
a
performance
dashboard.
That
was
being,
I
think,
don
you
had
an
intern
or
something
that
built
that
at
one
time.
Yes,
so.
G
...as part of the effort, okay, yeah. I think we maybe found that graph, where it was spinning up pods on the kubelet and seeing how long it took them to spin up and stuff like that. Is that the graph you're referring to?
A
Yes, that's the node-level performance test, built by my intern and then maintained by the Google node team in the past, but recently, for the last half year or almost a year, nobody has really mentioned that one. So that's right, yeah.
G
Yeah,
I
think,
morgan,
I
think,
was
the
one
that
sort
of
found
that,
and
you
know
we
were
asking
questions
like
who's
looking
at
this
thing
and
before
we
make
changes
to
test
code,
let's
understand
so
that's
good
and
that
that'll
be
good
to
get
that
revolved
and
you
know
start
looking
at
that
to
see
how
changes
improve
or
hopefully,
don't
decrease
performance.
So,
okay,
good.
I.
A
Just
want
to
share
like,
in
the
past
the
coup,
a
lot
of
the
kubernetes
resource
leakage
issue
and
a
docker
performance
related
issue
actually
is
founded
by
that
dashboard,
and
we
even
find
like
the
one
of
the
remember,
it's
nectar.
We
need
couple
release,
blocker
related
to
the
storage
and
the
the
walnut
mountain
and
I'm
unking,
also
founded
by
that
dash
dashboard
yeah.
So.
F
There
was
a
question
on
chat
from
jorge,
I
think,
asking
whether
we
can
have
a
doodle
for
meeting
time
meeting
times
are
hard
to
schedule
across
such
a
big
community.
So,
let's
victor,
let's
chat
about
like
meeting
time,
and
maybe
we
need
to
sign
a
doodle
and
collect
more
feedback
on
on
the
timing.
F
I
think
10
p.m
and
monday
worked
before.
We
need
to
check
whether
we
want
to
change
it.
G
You
mean
it
was,
I
think,
10
a.m,
pacific
time,
which
would
be
like
one
eastern
time
and
sure.
Maybe
we
can
start
with
that
and
we
can
start
out
and
see
if
that's
a
good
time
and
folks
can
give
us
feedback.
Otherwise,
we
won't
ever
have
a
meeting.
A
Maybe
we
have
the
form
and
ask
the
people
like
that
they
just
sign
on
and
they
put
it
up
extra
time
and
the
majority
when
that's
easier
and
who
wants
to
participate
and
promo
and
pro
we
can
propose
several
times
and
ask
people
to
vote
that'd
be
easier.
Otherwise,
arbitrary
is
hard
for
us
to
decide.
It.
A
Oh gosh. Sergey, can you try to do that? I noticed that my laptop just died; it's running too hot right now.
H
So I have put that on Google Drive, and the SIG Node doc has a link to it. If you can pull it up and share your screen, that also works; I'll just talk through it.
A
Circuit
can
you
share
because
my
my
laptop
is
frozen
yeah
yeah?
Can
you
help
to
share
those
the
slides.
H
Perfect,
okay,
so
I
think
david
and
tim
have
been
discussing
this.
I
was
looking
into
some
other
work
for
a
little
bit,
and
I
came
back
to
this.
If
I,
if
I
summarize
correctly,
it
looks
like
there
are
two
tracks:
two
train
trains
of
thought
that
are
going,
one
is
to
move,
revert
back
to
an
earlier
state
where
we
had
where
resources
allocated
the
kubelet
does
some
checkpoint.
H
Maintaining
this
is
kind
of
we
discussed
this
about
three
weeks
ago
as
a
potential
option
where
cpu
manager
is
already
doing
it,
and
other
train
is
that
we
have
preferred
resources,
a
new
field
in
the
api
port
spec,
and
that
would
enable
the
feature
where
you
have.
Okay.
This
is
the
minimum
resources
I
need
to
be
able
to
run,
and
this
is
my
ideal
resources.
So
I
think
it's
an
impact.
H
We
are
at
an
impasse
whether
we
should
go
with
that
or
whether
we
should
have
a
checkpoint.
As
far
as
tim's
point
of
view
is
concerned,
david,
is
that
accurate
summary.
H
Yeah, I had already read as much into it. So anyway, I kind of looked at that checkpointing code; I didn't get into deep details or try it out yet, but it seems like you create a structure around it, there's a backing file it saves to, and there seems to be some caching mechanism. Please let me know if I'm mistaken about any of this. Based on what I saw, I think I was looking at the following flow.
H
We previously did this in an earlier iteration, where resources allocated was in the status, but we did not have a way to recover it if it was lost. But let's say we go with the approach of having a checkpoint in the kubelet, similar to what the CPU manager is doing. Essentially, when a pod gets added, we store the pod, its name, its containers, and what its resources are at the time of admission, and that is your allocated value. Then we reflect that back into the status.
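To make that idea concrete, here is a minimal sketch of a node-local state file of admitted ("allocated") resources. The type names, the file name, and the JSON layout are hypothetical illustrations, not the kubelet's actual code; a real implementation would plug into the kubelet's existing checkpoint machinery (with checksums), the way the CPU manager state file does.

```go
package checkpoint

import (
	"encoding/json"
	"os"
	"path/filepath"
)

// allocatedResources maps a resource name ("cpu", "memory") to a quantity
// string such as "500m" or "256Mi" for one container.
type allocatedResources map[string]string

// podCheckpoint records, per pod UID and container name, the resources the
// pod was admitted with; this is the "allocated" value described above.
type podCheckpoint struct {
	Entries map[string]map[string]allocatedResources `json:"entries"` // podUID -> container -> resources
}

// save writes the checkpoint to a backing file using write-temp-then-rename,
// so a kubelet crash mid-write cannot leave a torn state file behind.
func (c *podCheckpoint) save(dir string) error {
	data, err := json.Marshal(c)
	if err != nil {
		return err
	}
	tmp := filepath.Join(dir, "pod_resources_state.tmp")
	if err := os.WriteFile(tmp, data, 0o600); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(dir, "pod_resources_state"))
}

// load reads the checkpoint back after a kubelet restart. A missing file is
// not an error; it simply means there is nothing to recover yet.
func load(dir string) (*podCheckpoint, error) {
	data, err := os.ReadFile(filepath.Join(dir, "pod_resources_state"))
	if os.IsNotExist(err) {
		return &podCheckpoint{Entries: map[string]map[string]allocatedResources{}}, nil
	}
	if err != nil {
		return nil, err
	}
	c := &podCheckpoint{}
	if err := json.Unmarshal(data, c); err != nil {
		return nil, err
	}
	return c, nil
}
```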
H
The container status contains two fields. One is resources, which is the actual running state of the container, and there is resources allocated, a ResourceList type, which is the reservation we agreed to based on the admission being successful. That would be persisted in the state file and reflected back into the status allocated field. It's a patch that happens in syncPod, so it's done immediately, as soon as we determine that this is possible.
H
The runtime manager has computePodActions and syncPod, where it looks, as the code does today, at the resources allocated field; it does the same checks, it goes through and breaks down the updates that it needs to do, orders them correctly and all that, and then calls the CRI UpdateContainerResources where necessary. The CRI ContainerStatus will respond with the updated resources, the actual configuration on the container set by the runtime, and that gets reflected into the pod status and into the update that goes to the API server.
H
So, to summarize, I think the changes that we are looking at here are: in the API pod spec, which is what's fundamental, the containers' resources become mutable, and we control it so that only CPU and memory are allowed to mutate at this time. We have a new object, resources allocated, of type ResourceList, in the container status, and this is recovered from the checkpointed state in case the status were lost, so it's something that we can recover. And then we have the resources field.
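As a rough picture of the shape being summarized here and in the next remark, the two status-side fields look roughly like this. This is an illustration of the proposal as discussed on the call, not the final Kubernetes API; the real types would use v1.ResourceList, and the field names are only meant to mirror the discussion.

```go
package podresize

// ResourceList maps a resource name ("cpu", "memory") to a quantity string
// such as "500m" or "128Mi"; the real API would use v1.ResourceList.
type ResourceList map[string]string

// ContainerResources mirrors the mutable spec fields under discussion: only
// CPU and memory requests/limits are allowed to change in place for now.
type ContainerResources struct {
	Requests ResourceList `json:"requests,omitempty"`
	Limits   ResourceList `json:"limits,omitempty"`
}

// ContainerStatus sketches the two status-side fields described in the call.
type ContainerStatus struct {
	Name string `json:"name"`

	// ResourcesAllocated is the reservation the kubelet agreed to at
	// admission (or at the last accepted resize), rebuilt from the
	// node-local checkpoint if the status is ever lost.
	ResourcesAllocated ResourceList `json:"resourcesAllocated,omitempty"`

	// Resources is the actual configuration on the running container as
	// reported back by the runtime through the CRI ContainerStatus call.
	Resources *ContainerResources `json:"resources,omitempty"`
}
```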
H
That's what's already there today: the resources field in the status, which is the actual running configuration on the container as reported by the runtime. The runtime today would report CPU requests, CPU limits, and memory limits; we get the memory requests from the state, so that's a little detail there. And about handling kubelet restart: when we get a new pod, we come into HandlePodAdditions and we use the checkpointed state.
H
If the pod is found, then we use the resources allocated value that was persisted. If it's not found, then we do the standard canAdmitPod and then admit or reject, as we do today, and the scheduler would have to use the max. I believe we could get away with using just the status, or resources allocated alone, but I'm not sure what's going to happen during the time the status is not available; maybe we would have to prevent the user from modifying the resources on a pod that is not yet running.
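A sketch of that HandlePodAdditions-style flow after a restart: reuse the checkpointed allocation when one exists, otherwise run normal admission against free capacity. The function and type names below are hypothetical placeholders, not the actual kubelet code, and the capacity bookkeeping is deliberately simplified.

```go
package podadmission

// resources maps a resource name to an amount (say, CPU in millicores and
// memory in bytes); purely illustrative, the kubelet uses resource.Quantity.
type resources map[string]int64

// admitOrRecover sketches admission on pod addition. checkpointed holds
// allocations recovered from the node-local state file; free is the node
// capacity not yet handed out. All names are hypothetical.
func admitOrRecover(podUID string, desired resources,
	checkpointed map[string]resources, free resources) (resources, bool) {

	if prev, ok := checkpointed[podUID]; ok {
		// The pod was admitted before the kubelet restart: honor the recorded
		// reservation instead of re-deciding from whatever the spec says now.
		return prev, true
	}
	for name, want := range desired {
		if want > free[name] {
			// A genuinely new pod that does not fit is rejected, as today.
			return nil, false
		}
	}
	// New pod admitted: its desired resources become its allocation; the
	// caller persists this to the checkpoint and patches it into
	// status.resourcesAllocated.
	for name, want := range desired {
		free[name] -= want
	}
	checkpointed[podUID] = desired
	return desired, true
}
```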
H
That's
not
running
yet,
but
that's
not
a
great
idea
to
me
because
in
our
earlier
prototype
one
of
the
things
that
the
jd.com
and
the
companies
that
took
our
prototype
and
worked
with
it
did
is
they
looked
to
see
what
pods
were
not
were
pending
because
of
resources
tried
reducing
it
while
it's
spending.
So
that's
something.
It's
a
use
case
that
we'll
have
to
say
goodbye
to.
If
we
did
just
resources
allocated,
we
can
go
with
this.
For
now
tim.
H
...there is a corner case where a malicious user can game the system and get more without paying for it. I have documented that in the thread we have going with Tim on the revisit of the spec in the KEP, so I don't know, David, if you had a chance to read through that.
B
I
didn't
read:
I
I
looked
at
it
briefly.
I
didn't
actually
try
and
reason
through
it
myself,
yeah,
but.
H
It's
a
fairly,
I
think
it's
not
a
big
concern
in
today's
kubernetes,
where
you
know
the
entire
cluster
is
owned
by
one.
So,
if
they're
doing
this
they're
only
hurting
themselves,
the
cluster
is
hurting
their
their
own
usage.
But
if
we
had
a
true
multi
something
we've
been
working
on
a
true
multi-tenant
kubernetes
system,
where
the
service
provider
has
tenants
using
a
single
cluster
and
you
need
strict
isolation,
then
this
would
become
a
concern.
A
Really, thank you for sharing this one. Yeah, Tim and I have discussed it, and I understand where he's coming from after he explained it to me, because his main concern is...
A
This
is
the
he
wanted
to
dedicate
of
the
powders
back
to
represent
the
user
configuration's
user
requirement.
Yeah.
That's
why
he
is
concerned
about
this
resource
located,
is
kind
of
the
generator
automated
by
system
and
mess
up
the
user
config.
A
I
I
buy
that
intention,
but
and
it's
like
the,
but
the
problem
is
today's
protospec.
Actually
I
already
have
a
lot
of
things
is
defaulting
by
the
system
and
the
generated
by
the
automated
by
the
system.
A
So
so
so
so
so
so
it's
kind
of
the
I
I
like
to
move
towards
that
direction,
but
on
other
hand
to
for
this
particular
using
that
this
particular
field
using
that
one,
I'm
not
sure,
because
I
want
to
see
the
all
the
other
complexity
generated,
but
I
want
to
ask
you
one
thing
why
we
absolutely
need
the
checkpoint.
I
think
we
talked
about
this
before,
but
can
you
help
me
refresh
the
memory
on
this?
H
I
think
the
the
concern
here
is:
let's
say
everything
is
stable.
Where
you
have
added.
Let's
say
you
have
a
a
pod
which
has
four
cpus
available
as
a
node
with
four
cpus
available,
and
then
you
have
one
port
that
uses
three
and
then
another
part,
that's
using
one.
You
have
these
two
parts.
So
now
the
kubelet
dies
and
during
the
time
cubelet
is
offline.
The
first
part
which
is
asked
for
3
goes
to
3.5,
and
then
you
come
back
up.
H
If
you
don't
have
a
way
to
recover,
know
what
it
was
admitted
at
the
resources
allocated.
That
is
essentially
where
it
is.
If
we
don't
know
this,
then
we'll
say:
okay,
p1
3.5,
I
have
capacity
I'll
admit
it.
P2
comes
and
it
asks
for
its
one
where
it
was
admitted
at
initially
we
say
we're
going
to
reject
it
because
we
don't
have
any
room.
So
that's.
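Plugging the numbers from that example into a toy calculation (a 4-CPU node, p1 admitted at 3 CPUs, p2 at 1, and p1's spec raised to 3.5 while the kubelet is down) shows why the recorded allocation matters. This is just the arithmetic from the scenario above, not kubelet code.

```go
package main

import "fmt"

func main() {
	const nodeCPU = 4.0 // node allocatable, in whole CPUs for simplicity

	// What the kubelet recorded at admission time (recoverable via checkpoint).
	allocated := map[string]float64{"p1": 3.0, "p2": 1.0}

	// What the pod specs say after the resize landed while the kubelet was down.
	spec := map[string]float64{"p1": 3.5, "p2": 1.0}

	// Without recovery, the restarted kubelet re-admits from spec: p1 at 3.5
	// fits (3.5 <= 4), but p2 at 1.0 is then rejected (4.5 > 4), even though
	// p2 was already running before the restart.
	fmt.Println("fits without checkpoint:", spec["p1"]+spec["p2"] <= nodeCPU) // false

	// With the checkpoint, both pods are re-admitted at their recorded
	// allocations (3 + 1 = 4), and p1's 3.5 stays a pending resize that is
	// only actuated if room actually frees up.
	fmt.Println("fits with checkpoint:", allocated["p1"]+allocated["p2"] <= nodeCPU) // true
}
```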
H
So that's right; this is the need, the reason why it needs to be persisted somewhere where you have a source of truth for what it was. By putting it in the pod spec, we had a single source of truth, the API server, which Tim is against. The other option is to use the checkpoint mechanism that's there in the kubelet today and achieve the same goal by moving this resources allocated field to the status. In fact, the naming of resources allocated...
H
If
you
recall
it,
was
initially
in
status,
and
at
that
time
we
didn't.
We
weren't
aware
of
the
need
for
handling
the
case
where
status
is
lost.
That
came
up
during
one
of
the
discussions,
and
then
we
moved
it
to
spec,
and
I
just
didn't
take
the
time
to
rename
it
to
something
more
appropriate,
because
resources
allocated
is
more
of
a
status
sounding
kind
of
thing.
Rather.
H
...than spec. Yeah, we had max in the earlier iteration of the KEP; we had max for resource quota as well, but, as David identified, we figured we didn't need it. There shouldn't be an extended period of time where... if you were to go with just the resources allocated, you would end up in a situation where you're giving out more than the resource quota allows for an extended period of time.
H
So there's... right, go ahead, sorry. Yeah: if we allow this... okay, I'm asking about resource quota as in the literal Kube ResourceQuota object, right. If two very successive changes... okay, I see the point; we'll have to revisit this, David.
B
Yep
so
derek,
but
you
can
think
about
it.
Two
different
ways.
One
is
that
if
you
enforce
quota
on
the
what
essentially,
what
the
user
has
asked
for,
and
only
that,
then
there
is
a
race
where,
for
example,
I
lower
the
lower
the
desired
resources
for
pod
a
and
then
increase
the
desired
resources
for
pod
b
and
those
are
not
rejected
by
the
quota
controller
or
by
quota,
because
you
did
one
and
then
the
other.
B
But
my
contention
is
that
it's
generally
a
benign
race
and
we
we
don't
expect
the
cubelet
to
reject
changes
to
resource
requests
that
are
downward.
We
ex
we
expect
those
to
be
admitted.
The
alternative
is
that
we
could
use
max.
The
only
thing
that
that
introduces
that's
a
little
bit
strange
is
that
then
you
could
have
the
cubelet
have
status
updates
be
rejected
because
it
would
violate
quota.
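The "use max" idea that keeps coming up (for the scheduler, and optionally for quota) is just a per-resource maximum of the desired and allocated values, so accounting never under-counts while a resize is pending. A tiny sketch with illustrative names, not the actual scheduler or quota code:

```go
package resizeaccounting

// maxResources returns, per resource, the larger of the desired (spec) value
// and the allocated value; this is the conservative figure the scheduler
// (and, if we chose to, resource quota) would account against while a
// resize has been requested but not yet actuated.
func maxResources(desired, allocated map[string]int64) map[string]int64 {
	out := make(map[string]int64, len(desired))
	for name, d := range desired {
		out[name] = d
	}
	for name, a := range allocated {
		if a > out[name] {
			out[name] = a
		}
	}
	return out
}
```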
D
I agree the kubelet will accept downward changes, but it can't always realize those downward changes.
B
Yes,
that's
only
for
limits
right,
so
requests
can
always
be
realized,
at
least
like
cpu
request
changes.
B
Of
that
cube
right
yep,
that
is
so.
I
I'll
also
say
that
right
now
we
allow
we
don't.
So
we
have
two
options.
We
can
either
try
and
serialize
runtime
changes
with
admission,
which
I
think
would
make
the
cubelet
hard
to
reason
about,
but
that
would
be
one
option
is
to
say:
okay,
actually,
the
cubelet
can't
determine
whether
or
the
cubic
can't
fully
admit
you
until
it's
ensured
that
it
can
actuate
this
resource
resize.
B
The
current
design
essentially
says
that
we'll
admit
you
at
a
certain
set
of
resources
and
use
the
it's
like.
There
are
two
there's
actually
two
benign
races.
One
is
a
race
between
propagating
requests
to
resources
allocated
like
there's
the
race
during
admission
that
I
talked
about
previously,
there's
also
one
during
actuation,
in
which
it's
possible
that
the
container
runtime
could
reject
or
just
be
unavailable
or
something
and
not
be
able
to
okay.
H
Okay, cool. So, David, there is a case where, let's say it's not one downward and one upward change, but two upward increases coming in quick succession, before the kubelet has acted on the first one; both are acceptable for the two pods, p1 and p2, and just looking at raw capacity, both are acceptable.
H
There
is
enough
capacity
for
that
and
your
p1
comes
and
the
kubelet
grants
it,
but
it
hasn't
had
a
chance
to
write
to
resources
allocated
yet
because
there's
some
delay
in
the
watch
due
and
a
second
request
for
p2
increase
comes
if
we're
strictly
looking
at
resources
allocated
we're
going
to
let
it
through,
whereas
we
should
really
not
be
right.
H
Maybe we can break it down. Well, it's simple to just use max.
D
Yeah, I know we're at time; I'm just trying to make sure I understand the iteration this is going through. So we're saying we want to put the allocation under status, and in the absence of that status the kubelet would still do local checkpointing?
H
Was
what
in
in
case,
we
lose
the
resources
allocated?
If
we
don't
know
what
it
is,
let's
say
it's
in
status,
but
we
lose
the
status
it
gets
wiped.
Then
we
don't
have
a
way
for
if
cubelet
were
to
restart,
we
don't
have
a
way
to
bring
it
back
to
the
state
of
pre-restart.
H
And
resources
allocated
was
lost
because
status
was
wiped,
so
there
is
a
couple
of
fairly
low
probability.
What
to
me
seems
like
low
probability,
ifs
there
that
are
going
in,
but
we
are.
I
think
the
principle
is
that
if
we
didn't
have
a
checkpointed
way
of
recovering
we're
violating
api
api
guidance.
H
Yeah, you mentioned that, and I looked into that checkpoint manager, so yeah. I am okay with either approach, this one or the preferred-resources one that David brought up, and with Tim I think we are kind of undecided; Tim was wondering how to break this impasse.
H
Should
we
get
a
sig
a
lead
from
signor
lee
to
come
and
explain
why
that
feature
is
so
useful
or
should
we
get
a
third
reviewer?
That's
the
questions.
Tim
is
asking
right
now
so,
but
I
just
wanted
to
present
this
approach
and
see
think
through
it
like
get
you
on
it
and
I've
not
prototyped
the
checkpoint
part
of
it.
So
I
don't.
H
I
can't
speak
very
confidently
right
now,
but
it
looks
like
it
might
work
it's
plausible,
but
you
you
look
at
the
approach
and
see
if
it
if
there's
any
holes,
that
you
can
poke.
Let's
get
this
right.
A
Here's
what
I
and
here's,
what
I'm
thinking
and
to
unblock
this
project,
because
it's
always
important
many
many
people
waiting
for
this
for
a
while
and
it
goes
through
actually
the
checkpoint
and
the
discussion
also
is
proposed
initially
also
and
and
also
we
are
clearly
told
community.
A
...that putting things in the spec makes them much harder to deprecate. Putting it into the checkpoint, implementing the checkpoint and putting it into the status, at least leaves a way out, because we currently clearly tell the community: please don't depend on the status. People violate that all the time, but at least we document it and clearly tell the community. So, to unblock this conversation, I'm totally okay with the checkpoint.
D
I'm
I'm
fine
with
the
checkpoint.
Just
we
need
to
ensure
we're
graceful
about
not
failing
in
the
absence
of
a
checkpoint,
because
that's
the
only
painful
things
we've
had
in
the
past:
around
checkpoints
but
yeah.
I
think
I
like
the
checkpoint,
probably
better
than
the
earlier
iteration,
I'm
fine
with
having
it
in
status.
I
think
it's
it's
a
nice
user
touch
to.
Let
me
know
what
the
keyboards
actually
acknowledge
and
it's
not
actually
live
usage.
So
it's
it's
low
right.
D
So
at
least
what
you
have
here,
I
think
it's
fine,
the
thing
that
bugs
me
a
little
bpa
is.
I
is
it's
right
now:
it's
cpu
and
memory,
but
in
the
end,
I'd
like
to
be
able
to
like
grow
it
to
be.
You
know,
other
counted,
resources
on
the
node
like
pids
or
similar
yeah,
and
so
I'm
just
looking
at
this
thinking
is
there
any
detriment
to
that
and
I
can't
think
of
any
so
in
general.
I
think
we've
got
it
here.
H
As
long
as
it's
one
of
the
supported
fields
in
the
resource
list,
we
should
be
able
to,
and
even
for
extended
resources
at
some
point
it
can
be
added.
I
just
didn't
see
a
need
for
it,
and
I
believe
this
question
was
asked
in
kubecon
and
piata
mentioned.
There
was
no
asks,
no,
nobody
was
asking
for
it,
so
they
were
not
doing
it.
H
I know I'm exceeding your time, but yeah, please think through this, look at the design; I'm going to go through more details, and if I see anything fishy I'm going to surface it immediately.
A
We can really use what we agreed on today; it could work, yeah, so we can unblock this. And Tim also pinged me a couple of times to say...
H
Yeah, I think what we're looking at doing right now is using a node-local checkpointing mechanism to persist the CPU and memory resources, including the CPU requests, so that if the kubelet were to restart, we have that information available and can converge to it, and we reflect that into the status, containerStatus.resourcesAllocated, the agreement that we have. Okay.
H
That's what we're going to provide in terms of requests, and the container status resources field continues to present the information that we query from the CRI.
H
So I'll look into the details of the checkpointing mechanism, the file format and things like that, over the next few weeks, and start working on it at least. But yeah, I think we can let Tim know we'll try this approach. This doesn't rule out having preferred resources at some point; we can always bring that in in the future if we need it, but...
H
...better we do it now rather than later, if that's the desired way to do it, because, you know, there are already three resource-related fields: the whole picture is spec.resources, status.resourcesAllocated, and then status.resources. We don't want one more; it kind of starts looking a little weird at that point.
A
It's okay. So, thank you, everyone, for attending today's meeting, and see you folks next week.