From YouTube: Kubernetes SIG Node 20220504
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
Cool, so hello, and welcome to the first meeting to kick off the SIG Node reliability project, mostly focused around improving the testability of the kubelet.
A
Yeah, we have quite a few people here today, which is exciting. I guess everybody here saw my emails, but let me go over and make a bit of an introduction about what I want to start with.
A
This is to start thinking about how we can improve the reliability of the kubelet, partially through testability and refactoring where necessary, to make it easier to land changes in the future. Where we are today, we have a lot of pretty scary regressions that land with basically every Kubernetes release while we land features. To avoid ending up in a place where we will eventually want to rewrite the kubelet, I'd rather we get the test coverage there today.
A
Yeah, so let's see if I can show my screen.
A
I cannot show my screen without restarting Zoom, so can somebody volunteer to share the document that I linked to in the email?
A
Okay, so let's just go on assuming that everybody can see the document; the link's in chat if people are interested. It's kind of a difficult thing to kick off, but what I want us to do today is to talk about the ways we think we can improve the kubelet, the areas where testing is missing, from people who have experience with different parts of the kubelet, and then sort of define some goals for what we want to try and achieve in 1.25.
A
Cool. So Sergey gave us some notes that are a pretty good starting point; they cover some of the problems we have with our existing basic unit test coverage and some of the e2e gaps that we have today, with some specific things we're missing. But to quickly summarize: basically, we have a lot of coverage... well.
A
The coverage we have today mostly focuses on the happy cases: we have a lot of tests that assume everything is fine, or that set everything up in a way that is guaranteed to be fine.
A
But we don't really test what happens when things in the kubelet start to fail. We don't have any tests that cover, say, what happens if the CRI fails to respond at a given time, and we've had bugs in the past where, say, you restart containerd at the wrong time and the kubelet will never fully reconcile something, for example. Those are the kinds of things we don't really have any way to even reproduce and test today.
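(A minimal sketch of the kind of failure injection being described: a test double that wraps a CRI runtime client and fails a configurable fraction of calls with a deadline error, so kubelet reconciliation under CRI flakiness could be reproduced in a test. The `flakyRuntime` type is a hypothetical illustration, not existing kubelet test code.)

```go
// Hypothetical test double: wraps a real CRI runtime client and makes a
// configurable fraction of calls fail as if the runtime never responded.
package critest

import (
	"context"
	"math/rand"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

type flakyRuntime struct {
	runtimeapi.RuntimeServiceClient // underlying real client, used for passthrough
	failureRate float64             // fraction of calls to fail, in [0, 1]
}

// ContainerStatus randomly simulates a runtime that did not respond in time.
func (f *flakyRuntime) ContainerStatus(ctx context.Context, req *runtimeapi.ContainerStatusRequest, opts ...grpc.CallOption) (*runtimeapi.ContainerStatusResponse, error) {
	if rand.Float64() < f.failureRate {
		return nil, status.Error(codes.DeadlineExceeded, "injected CRI timeout")
	}
	return f.RuntimeServiceClient.ContainerStatus(ctx, req, opts...)
}
```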
C
You know, a failure, you know, a timeout on the client side or on the server side, and how that would affect the overall solution. So yes, definitely, each of the APIs, especially the primary ones like pod run, container create, and start, we need to go over in much more detail.
C
How did we do it from the dockershim perspective? When we did the container runtimes, sometimes in the implementation there was a gap, a knowledge gap, between what the kubelet wanted or expected and what we were doing.
C
Another place I think is important to talk about here is the life cycle from release to release: the expectation that pods can still be running after you've upgraded either the kubelet or the container runtime, or the runtime engine, right, or a virtual machine, and even the CNI, right. The expectation for that needs to be covered in some test cases.
C
So I mean, that's why I'm here. I think we definitely need to work in these areas a little bit: this whole bigger life cycle, as well as the details of the APIs, what happens if there's a failure. And it doesn't just take the kubelet; it's the kubelet and cAdvisor, where we're still using cAdvisor for some part of the solution for metrics and monitoring.
B
So my one thought here is: maybe we could just kind of agree on what it means to be unreliable. I think, historically, I would look at this as: the kubelet is unreliable when it is violating a published invariant in the Kubernetes API.
B
So the longest-running issue I'm aware of, that we've had intermittent challenges with, is a pod either incorrectly or correctly appearing to toggle between a terminal and non-terminal state, which seems to be the most common. I'm trying to think through, I don't know who else is on the call historically here, but that in some cases has been a reliability issue on both the control plane side of Kubernetes as well as, I guess, the kubelet data plane side, for our community as a whole.
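(The invariant described here, that a pod must never leave a terminal phase once it reaches one, is easy to express as a test assertion. A minimal sketch, assuming a watcher has recorded the sequence of phases it observed for one pod; the helper below is illustrative, not existing test code.)

```go
// Asserts the published invariant that PodSucceeded and PodFailed are
// terminal: once observed, the phase must never change again.
package invariants

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func isTerminal(p corev1.PodPhase) bool {
	return p == corev1.PodSucceeded || p == corev1.PodFailed
}

// CheckPhaseTransitions walks the phases observed for one pod and returns
// an error on the first terminal -> non-terminal toggle.
func CheckPhaseTransitions(phases []corev1.PodPhase) error {
	for i := 1; i < len(phases); i++ {
		prev, cur := phases[i-1], phases[i]
		if isTerminal(prev) && prev != cur {
			return fmt.Errorf("invariant violated: pod left terminal phase %s for %s", prev, cur)
		}
	}
	return nil
}
```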
B
I think that, to me, is a key thing, just something that we could all agree on. Then there are subtler definitions of unreliability that I'm not sure how folks feel about, like when we mention what happens when a runtime restarts, or that type of thing.
B
It's clear that system deployers of Kubernetes in some cases have made mitigations to those issues, right? Even if it's at the systemd unit file level: if your runtime restarts, let's just restart the kubelet to be safe anyway, right? These types of tricks can happen, and in some cases that means we haven't been clear about our own invariant of expectation there. But I don't necessarily know if we violated one, right?
B
I think it's worth talking through. And then there's been a weird, squishy area in between, particularly around CNIs, where there are expectations with respect to when the kubelet should communicate back to the control plane or not about particular state transitions. I don't think we've ever documented any expectations around that, and I'm not aware of us actually violating anything with respect to what we've published.
B
But I'm wondering if in this group we're looking at ensuring, first, that we are meeting the invariants we advertise, or are we trying to also maybe expand the set of invariants we want to give confidence around? Like thinking about when the kubelet should advertise back to a control plane that a pod has an IP address. That's something we've never made any statement around, so I'm kind of just curious as a group here.
B
If we can do the common definition of reliability, though, these are the dimensions I could see us looking at. So I have an idea about that, but I'll let Clayton go first.
E
I'll keep this short. A couple of things I was thinking about as we went through some of the cleaning up of parts of the life cycle, where we'd fix something and it would expose a bug, or it would interact with another subsystem and then show up as another bug, that same long propagation.
E
At that time it was fairly easy to capture tests that would have caught the issue, but I think it's hard for us to keep a pipeline of that going and work through it. We could record it at the time, we'd discuss it, there were a lot of really good discussions, and then they would die out a little bit.
E
You know, three months, three weeks later, someone would write another test. So that might be a way to go, if we can start to build up some of those lists. And the second point, I saw it kind of get hit, but it's kind of Derek's question about what counts as unreliability: a lot of the latent bugs are very observable in a large system where certain invariants are violated, but super hard to pick out from a unit test or even a node e2e, where, once we knew what the problem was...
E
It was super obvious. Seeing the problem was one act of filtering out reports from users that are of subtle things, or something that's a small flake in a small part, because we put the kubelet into a stressed state and we're trying a bunch of different things.
E
I think there's a space, and I don't want to call it chaos testing, but there's a space for us to have a little bit more complex of a workload over a single node, without going all the way to a full system, or potentially clusters that start in a specific state, like a lot of our performance e2es. Those kinds of variances showed up a lot in pod shutdown, where we would never clean up pods.
E
And so I think we can do a little bit better in that middle ground: a little bit higher than node e2e and a little bit less than full e2e.
A
Yeah, so the way I was thinking about this is: for a lot of the kubelet today, we don't have documentation of what our expectations are, because there are no tests for a lot of things. So it's less about whether the kubelet fails in a lot of latent and unexpected ways today, and more about giving us confidence to know that when we make changes in the future we're not going to cause regressions. A lot of that is writing tests that essentially document those invariants, and then also introducing some of the failure testing you were talking about into places like gRPC interfaces and stuff.
E
Great, and I think the invariant is a great way to frame it. Describing the invariants that should hold in the system is actually surprisingly easy to capture.
E
Certainly on the OpenShift side we spent a long time assessing the sequence of transitions, and there are some really obvious ones that pop out. I think we can do a lot better, and this actually shows up in controllers and kube-scheduler invariants, where we documented them, built Kubernetes, and then never went back and really focused on: are we always satisfying our invariants? So I completely agree, Danielle.
B
One of the easy ways to fix a flaky e2e was to extend the timeout period, because the pressures of the infrastructure you were testing on were difficult. So I'm wondering if there's a strong desire to try to push timing invariants or timing guarantees here when we talk reliability, versus going behavior first and making timings secondary, which is where I would lean. But I know I've written tests that I had to later go back and push the timeout out a little further, because the pressures on people in a given moment make that the easiest fix.
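(One common behavior-first pattern, polling for the expected state instead of asserting after a fixed sleep, keeps timing secondary in the way just described. A minimal sketch using the wait helpers from apimachinery; the pod-phase lookup is illustrative.)

```go
// Poll for the behavior we care about instead of sleeping a fixed interval,
// so slower infrastructure only stretches the wait, not the result.
package e2eutil

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

func WaitForPodPhase(c kubernetes.Interface, ns, name string, phase corev1.PodPhase, timeout time.Duration) error {
	return wait.PollImmediate(2*time.Second, timeout, func() (bool, error) {
		pod, err := c.CoreV1().Pods(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, err // give up on hard API errors
		}
		return pod.Status.Phase == phase, nil
	})
}
```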
F
I think Clayton covered mostly what I wanted to say. I just want to mention one thing from the past.
F
We did the pod spec auditing, so for every pod-related field and its invariants we added tests. That's a couple of years ago; we haven't done that for the last many years. So I noticed that many new features add new fields or updates to the pod, and even new invariants, and we don't have specific tests, especially for the non-happy paths; we didn't add those tests. So I think maybe we can start from there.
F
When we did the pod auditing we added the first round of node e2e tests, but we are missing tons of things. Also, there are tons of deprecated features where we only simply removed the tests, but we didn't revisit them; we removed certain tests because they tested a deprecated field, but actually they covered more semantics. Another thing is, a couple of years ago when we did that pod spec auditing, we saw the next...
F
...one where I wanted to drill down to the node level, and we never did that; we just ran out of time and didn't continue. The third one is: we do have the stress tests, but the stress test itself, from when we added it, always gives us a lot of errors, because we are overstressing, so people treat it that way. We had the goal back then to make it a little bit more reliable, right.
F
So it's kind of how to make sure... because the node e2e has basically tried to remove the uncertainty, right, like trying to remove the scheduler, to make sure the node-level test is more predictable.
F
So, like, what's the input, what's the output. But obviously we didn't completely finish that one, because we still have the API server, we still have the control plane there, and we didn't guarantee the resources for those because of the node-level problem. So the stress test is never really run continuously.
F
We have a lot of problems there, actually. Especially, you mentioned the unhappy path earlier, right: you overload the node, and because we didn't test those, the stress tests are disabled and only run manually when people qualify a new Kubernetes version. So there are a lot of things to enhance there.
B
Maybe the last comment I'll add here is, in my experience we've also hit reliability issues when we lack a feature or any reasonable sense of defense on the node, and then the kubelet is blamed for that source of unreliability. So maybe just getting transparency among this group with respect to issues that I personally sometimes look at as best effort for the kubelet, but not guaranteed.
B
Like what the kubelet does in the face of exhaustion, I often view that as best effort, and I would want to get consensus on whether we all agree on that, because, like, the accounting was slow and we lacked core features for it. I/O contention I also view as best effort.
B
Maybe just also clarifying: are those the unhappy paths that we want to tackle in this group? Because they're often the root cause of an unreliability issue, particularly between kubelet and runtime communication, where I feel like we're defenseless right now, and we should just be honest about that in some ways. Whereas the measured failure rate of pod life cycle invariants being invalid is a different thing, I guess. Every one of us has had different experiences.
B
And I only state that because we have to kind of measure what we mean by reliable, right? Absent that measurement, it's hard to focus our problem. I definitely agree on the testing side that there are areas ripe for improvement, even in unit testing, though I have sometimes found unit testing to not be as valuable when it came to the kubelet. And I'm also not sure, as an unspoken but expressed feeling here: is it the externally measured view of reliability, or the internal confidence?
B
We want to gather as a community with respect to forward-looking changes, because I can definitely agree that there is confusion between, like, pod workers and the core kubelet iteration loop. Well, it works, and you could externally measure a dimension of reliability for it, but it is confusing to many new folks that come in. I think back, Dawn, do you remember when I asked, I said to Vish...
B
Oh, we should draw a map, or some document that says how everything in the kubelet works, and at some point there wasn't uniform agreement that you even needed that; it was like, oh, just look at the code, it's obvious. So is this more a symptom of the latter than the former, is what I'm also wondering, because there was definitely a bit of pushback. Yeah, okay.
A
We have the problem that when anything touches that bit of the code heavily, we end up shipping either regressions or behavior changes that we don't understand and that are then hard to triage. So improving both our coverage of how we test that, and also doing some level of refactoring to have better unit tests.
B
Yeah, one broad area I could see that we also haven't covered is that container runtimes are not always reliable in how they report back to the kubelet. That has definitely happened, and our reliability is only as good as the systems we invoke, I guess. So we should figure out where we want to focus our defense a little bit; like, cAdvisor metrics collection often stalls, right?
B
It's like a therapy session, right? When a runtime reports back context deadline exceeded, you're like, okay, well, the kubelet's defenseless here too. Or one list-containers call says something exists and the next one says it doesn't. All these things can happen, and so I just want to try to understand: are we trying to look at the total system here, or is it really to the point you were talking about, Danielle, which was confidence in the evolution of a core control loop, which I can totally get behind?
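(A sketch of the kind of defense being discussed: treating a DeadlineExceeded from the runtime as "state unknown" rather than "container gone", and retrying with backoff. The function and its retry policy are illustrative, not the kubelet's actual code path.)

```go
// On a CRI timeout the container state is unknown, not absent: retry with
// backoff and only act on a definitive answer from the runtime.
package runtimeclient

import (
	"context"
	"time"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func listWithRetry(ctx context.Context, rt runtimeapi.RuntimeServiceClient) ([]*runtimeapi.Container, error) {
	backoff := 500 * time.Millisecond
	for attempt := 0; attempt < 4; attempt++ {
		resp, err := rt.ListContainers(ctx, &runtimeapi.ListContainersRequest{})
		if err == nil {
			return resp.Containers, nil
		}
		if status.Code(err) != codes.DeadlineExceeded {
			return nil, err // a real error, surface it
		}
		time.Sleep(backoff) // timeout: state unknown, try again
		backoff *= 2
	}
	return nil, status.Error(codes.DeadlineExceeded, "runtime did not respond; keeping last known state")
}
```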
A
I will let David and Mike reply first.
G
I can go ahead. Yeah, I totally agree with a lot of the discussion here. I just wanted to add one point I think has been touched on, which is that we also need to step back. We have tons of jobs and different variants of tests running today, and I think a lot of it has been a little bit lost: what are we testing, and why are we testing it?
G
For example, what I'm trying to say is we have a lot of different variants: different containerd versions, different kernel versions. We have cgroup v1, we have cgroup v2, both of which we support. So I think we need to step back a little bit and think, okay, what are the valid configurations that we support on the OS level and on the containerd level, and then exactly what tests do we need to run?
G
For example, one of the big things coming forward is that a lot of the OS distros are moving to cgroup v2 by default, but as a community we probably still want to continue to support cgroup v1, right? So we need to think, okay, which test combinations do we actually care about, which containerd versions do we want to support, and what's the policy moving forward: okay, we always want to test this containerd version with this image, and so forth.
G
So I think that's also something that has been a little bit ad hoc, especially updating OS images and so forth, where I think we should define a little bit better what our policy is and what we want to test.
A
We actually have an open issue for that, with some discussion going on mostly between me and Dennis right now, I think. But yeah, making sure we have cgroup v1/v2 parity, and then making cgroup v2 the default, for example, is something that I hopefully want to try and get done early this cycle.
G
Yeah, exactly. For example, if you sort of multiply it out, you have kernel versions times containerd versions times systemd cgroup drivers times cgroup configurations; it's kind of a lot of stuff, so maybe it's not reasonable to test every single configuration. So maybe we should just say, okay, for cgroup v2, maybe we only support one driver, and we're going to test that, making some type of claims around what we really want to test and support.
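(The combinatorics being described fall naturally out of a table-driven test matrix, which also doubles as documentation of which configurations are claimed as supported. A minimal sketch; the configuration fields and the supported set are illustrative, not an agreed policy.)

```go
// Enumerates the node configurations the suite claims to support, so the
// supported matrix is explicit instead of ad hoc per CI job.
package matrix

import "testing"

type nodeConfig struct {
	cgroupVersion string // "v1" or "v2"
	cgroupDriver  string // "systemd" or "cgroupfs"
	containerd    string // containerd version under test
}

// Hypothetical supported set: one driver per cgroup version, two containerd lines.
var supported = []nodeConfig{
	{cgroupVersion: "v1", cgroupDriver: "cgroupfs", containerd: "1.5"},
	{cgroupVersion: "v2", cgroupDriver: "systemd", containerd: "1.6"},
}

func TestSupportedConfigurations(t *testing.T) {
	for _, cfg := range supported {
		cfg := cfg
		t.Run(cfg.cgroupVersion+"/"+cfg.cgroupDriver+"/containerd-"+cfg.containerd, func(t *testing.T) {
			// Each subtest would provision a node with cfg and run the
			// node e2e suite against it; elided here.
			t.Log("would exercise node e2e against", cfg)
		})
	}
}
```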
C
Backing up to the question Derek had, or the point he made, that it's the container runtime too, right: I agree, and part of it is the way we do parallelism and execution for instantiating pods and containers and pulling images, those sorts of things. We've got this direct control loop, where the kubelet's telling the container runtime what to do, and there's some expectation that these things happen synchronously without timeouts. But containerd will accept as many parallel requests as the kubelet asks it to, right up to running completely out of resources and you getting the context timeouts I was talking about earlier. So we have another model that we could provide, that might be better and faster than the status update loop that we currently have right now, the sync loop: we could probably just push you guys a stream.
C
Some events that'll notify you what we've actually changed in status, so you can have a better expectation of what's happening in the container runtime, beyond what you've asked for and what we've replied in the synchronous call, which becomes asynchronous when you get a timeout, okay, Derek? So maybe it's an architectural decision to go down that route, but we've got the events from the execution of the containers when they change status.
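(A rough sketch of what the event stream being proposed here could look like; this is a hypothetical illustration of the idea, not an existing CRI API.)

```go
// Hypothetical shape of a runtime-to-kubelet event stream: instead of the
// kubelet polling container status, the runtime pushes state changes.
package crievents

import "context"

type ContainerEventType int

const (
	ContainerCreated ContainerEventType = iota
	ContainerStarted
	ContainerStopped
	ContainerDeleted
)

type ContainerEvent struct {
	ContainerID string
	Type        ContainerEventType
	Timestamp   int64 // nanoseconds since epoch
}

// EventSource is an interface a runtime could expose; the kubelet would
// range over the channel and reconcile on each event instead of re-listing.
type EventSource interface {
	Subscribe(ctx context.Context) (<-chan ContainerEvent, error)
}
```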
C
Yeah, and to be fair, containerd is parallel in its execution much, much more than Docker was. So when you were on dockershim and then you moved to containerd, all of a sudden things changed a little bit: we started accepting more of the requests than you would have been able to get through before.
F
I think I'm the next one who has their hand up here. So I agree with Michael in general, but I think the one problem is how Kubernetes integrates with the container runtime, the whole integration test issue. We know this problem; this is why we did the CRI design and implementation, so we introduced the CRI tests. But I don't think that we are actively, let alone proactively, maintaining those tests.
F
So for all those, on the SIG Node side we do have the CRI tests; containerd even took over those CRI tests as pre-submits for some time. I haven't checked recently whether we still have that, but I didn't see many new tests added over time, and even at that time those tests...
F
I remember back then only one engineer worked with me, and what we did is definitely not comprehensive, so we really haven't covered the e2e. But on testing philosophy: to really have high reliability you need the mix, right, so you need integration tests, you need stress tests, and you also need the isolated environment for component tests.
F
I do think, on the container runtime side, we didn't keep our promise before. We thought every single new container runtime version would actually have the suite of CRI tests associated with it, and then we'd publish those things, kind of like the conformance tests we did for Kubernetes, right? Those kinds of things, I have to say, we didn't achieve.
F
We didn't even keep a performance dashboard; after a couple of years we gave up. So once in a while we receive a new container runtime version and maybe it's using more resources, with unexpected results. Those kinds of things we definitely can do better on. It's not like we never did it, and it's not a new effort; it's just that the previous effort we already made, we didn't actively maintain.
A
CRI-O and containerd do both run node e2es on every PR at least, and I think that helps catch a lot of stuff.
F
But last time I checked, when there's a new feature we haven't proactively added any tests, so the new version basically didn't capture problems. Even when we found a container runtime problem, there should be a regression test, right? That's just engineering practice we failed at: add the regression test there, right? We did that on the node, or at the Kubernetes level, when we had a refactor and then found a problem. But we didn't here, because I think people think about the container runtime...
F
...as another component, but actually, for the CRI and the container runtime's implementation of the CRI, that interface is owned by SIG Node, because this is our component and that's our responsibility. We can find compatibility issues and all those kinds of things. I just want to point that out, because we do have the effort.
E
This is a response, but one of the things we were talking about, reconciliation, plays into Derek's point a little bit. So I am trying to clear out time to go and document some of the flows that were touched, specifically around places where the kubelet invariants weren't handled because the control loops weren't accurate. Admission is a key part of that: the admission life cycle does not resiliently handle changes to the kubelet.
E
Well, so at least there is a draft KEP sitting in my editor that is at least halfway through trying to document some of those assumptions.
E
So that's something I'm trying to commit to get done, and I will certainly take feedback from folks around some of the other things I heard mentioned, like better documentation, better visualization of the flows, an opportunity for us to describe invariants and where we should be doing reconciliation but are not, getting to a point where we can have some productive discussion on places in the kubelet that are sources of issues. So this is more of a side thing; it's just on my plate.
A
I am excited to read it. So I guess now we're starting to run out of time; we have 15 minutes left. Do we want to quickly summarize the main areas we've identified and then prioritize what we actually want to do short term? Because, you know, it's open source, we have a limited amount of hours in the day and effort that we can actively spend, so let's sort of try and figure out what we want to do in the next release cycle versus in the next couple.
A
I will take that as sounding good to everyone. So, if I'm remembering correctly, and it's getting late for me, so maybe I'm not, we've basically identified the few areas that I kind of thought we would, which is to say: we want to document what exists today through our tests, but we also want to improve the testing that we have that has been underinvested in.
A
Some of that is things like the CRI tests and some of the contract testing around how we interact with CRIs, with a hopeful future case of potentially introducing some failure testing, like a gRPC proxy thing that will just randomly sleep or whatever; we can figure something out. Failure testing is always a bit dirty, but we'll figure it out. And then, as reviewers and approvers of parts of the kubelet, we start requiring that as we fix bugs and everything else in the future.
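(The "gRPC proxy thing that randomly sleeps" could be as small as a client interceptor installed between the kubelet and the runtime socket in a test build. A minimal sketch; the probability and delay bounds are made up for illustration.)

```go
// A gRPC unary client interceptor that randomly delays a fraction of CRI
// calls, approximating a slow or stalled runtime for failure testing.
package faultinject

import (
	"context"
	"math/rand"
	"time"

	"google.golang.org/grpc"
)

// RandomDelayInterceptor sleeps before roughly `probability` of calls.
func RandomDelayInterceptor(probability float64, maxDelay time.Duration) grpc.UnaryClientInterceptor {
	return func(ctx context.Context, method string, req, reply interface{},
		cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		if rand.Float64() < probability {
			time.Sleep(time.Duration(rand.Int63n(int64(maxDelay))))
		}
		return invoker(ctx, method, req, reply, cc, opts...)
	}
}
```

This would be wired up with grpc.WithUnaryInterceptor when dialing the runtime endpoint in a test harness, leaving the production dial path untouched.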
B
No, the one thing I was trying to think about how to ask is: there are definitely features in the kubelet that are what I would classify as cluster-to-workload-specific marriages of desired outcomes, and those are often the ones that are not as extensively tested as we'd like, and they might use certain exotic resources, not the 95 percent case of resources.
B
One thing I was curious about is if anyone has found reliability issues in those areas that we want to bring forward, that maybe none of us are aware of. Like if someone wanted to say, CPU isolation with a particular variant is just not working for me, or I'm seeing incorrect results with this particular workload, or entire features like huge pages with a particular page size are not working for me.
B
I'm just curious if there's anything in that realm or dimension that we should focus on, or maybe kubelet plus a particular device we're seeing issues with, just so that maybe we can step back and ask: is there anything we can do as a broader community to get more attention to that area?
B
Does anyone have any concerns or awareness of those things? I mean, as a vendor I know we're using those in particular areas, but I only see what we see; I don't know what others are seeing.
A
The only one I've seen reported on upstream Kubernetes recently is the one where, on cgroups v2 when using cgroupfs, something failed in cAdvisor. But I think David has his hand up and worked on that.
G
Yeah, I mean, to answer that slightly differently, just as a vendor also: I think one of the biggest issues we see is disk related, so slow IOPS, slow disks, causing all types of issues from the container runtime to the kubelet to other things. Those are hard to combat because we don't have full control over the disk, but I think...
G
We've been looking at different I/O schedulers, for example, and saying, okay, can we prioritize the kubelet and container runtime over user workloads? So if a user workload suddenly uses up all the IOPS, we want to reserve some for the system. That's something we're experimenting with, but perhaps doing more of that in the community, figuring out best practices around what we can do to provide more isolation to the kubelet and container runtime even under I/O pressure and disk pressure, I think that would be valuable.
A
I think that one would be super valuable in combination with documenting the permutations of stuff we support. Like: does the kubelet even explicitly require a given kernel version? Not really; if you have the features it kind of works, maybe, but we don't document exactly what we need, and when and where, especially moving forward with cgroups v2.
F
Those kinds of things we still could do. So, for example, we have kube-reserved and system-reserved; we could reserve bigger amounts and then try to see how effectively we can avoid the kernel OOMs, because that's really a big loss, right? So with kube-reserved we reserve some of the things; maybe Kubernetes could do something.
F
So you could adjust those kinds of values and see: if customers configure Kubernetes properly, can the kubelet take action and do the handling? Those tests actually can help once we switch to cgroup v2 and we have the better user-space OOM handling, which is in the KEP: we propose that user space will handle it, because we think we can do a better job if we protect the node agents correctly, right?
F
So that kind of reserving, not just for kernel threads but for the user-space agents, and what we can do there: I don't think we've put in much. So when I think about those e2e tests, I do think about them generating new ideas, new features. It's not that new, because we already did it in the past, but still: how do I make it generic enough to fully benefit the open source community?
F
That's a much harder request, because community users actually vary a lot, right, so we have to provide a more generic solution. On those kinds of things, I also want to go back to what Derek asked earlier: we do have customer cases once in a while where, say, the CPU manager may not work as expected. But I think a lot of the time customers go another way and work around the problem, so that feedback might not really bubble up...
F
...or get reported back to open source, because a lot of the solutions sit next to what we are trying to enhance in CPU management. The problem is customer cases are sometimes quite customized, and I think the open source solution actually needs to be more generic, because customers, after they understand CPU management, basically build some custom solution to work around those problems.
B
Yeah, and then maybe some other illuminating examples where we could publish expected invariants: we've encountered node reliability issues in the face of exec liveness or readiness probes, where there's almost a misunderstanding with respect to how much memory an exec probe actually consumes, from some of the underlying issues.
B
Like, I think with runc you couldn't launch something with less than a particular megabyte count, for example. So we've seen systems where probe usage basically becomes the dominant CPU and memory consumer on the node, and that's a surprise. And it's also a surprise because we haven't actually done a perfect job; I think we probably vary by runtime now on who gets charged for a probe. Does the workload get charged for the CPU consumption, or does it go under the system's charge?
B
Is an image pull confined to the pod sandbox that is trying to use that image, or is the CPU associated with pulling that image charged to the node as a whole? We've done things as a vendor to try to...
B
But a lot of those things, in my experience, are critical to actually keeping stuff reliable, and maybe we can do a better job of getting uniform best practice on some of that. Particularly, the kubelet is blamed when noisy neighbors cause problems, and rarely is the workload blamed. I know myself, with my Red Hat hat on, the more I've been able to focus the problem on the workload and away from the kubelet, the better.
A
Cool, it sounds like we have a big bucket of work we now know we need to do and agree needs to be done, which is progress. When the recording for this is available I'll go through and write up notes and start filing a couple of issues for things we want to think about, and then hopefully by next week's SIG Node meeting we can sort of prioritize those a little bit.
B
Cool, this is good to see; like, plus one.
D
Oh, I was just lurking here, but I heard the one issue that is near and dear to my heart getting mentioned without me even posting it. So, the liveness and readiness exec probes: you spawn a hundred probes that do liveness and readiness checks, even if they're just doing, you know...
B
Yeah, yeah. So, I mean, we've definitely done things to mitigate some of that.