From YouTube: Kubernetes SIG Node 20190806
B
Great. I had filed the issue and it's relatively straightforward. Basically, today eviction is done at pod granularity, but the first thing it does when checking requests versus actual utilization is: the requests are the sum of the requests of each individual container, and the actual utilization is the sum of the usage of the containers in that particular pod. Those two are compared, and based on that you decide whether the pod is a good candidate for eviction.
B
It might be going through the ListPodStats path, where it does a mix of CRI and cAdvisor, and all of that just to sum up each of the container requests. It seems like it would be quite a bit simpler if you just went ahead and looked at the pod cgroup. Having said that, I don't see anywhere that we do this today, so I'm not sure whether it's controversial from, say, a performance or caching standpoint. I don't want to just go ahead and submit a PR without understanding that.
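To make the per-pod comparison described above concrete, here is a minimal, self-contained Go sketch of the idea; the types and helper names are hypothetical stand-ins, not the kubelet's actual eviction code.

```go
package main

import "fmt"

// ContainerInfo is a hypothetical stand-in for the per-container data the
// eviction logic works from: the declared memory request and the observed
// usage, both in bytes.
type ContainerInfo struct {
	MemoryRequestBytes int64
	MemoryUsageBytes   int64
}

// PodInfo aggregates the containers of a single pod.
type PodInfo struct {
	Name       string
	Containers []ContainerInfo
}

// exceedsMemoryRequests mirrors the comparison described in the meeting:
// sum the container requests, sum the container usage, and treat a pod whose
// usage exceeds its requests as a better eviction candidate.
func exceedsMemoryRequests(p PodInfo) bool {
	var request, usage int64
	for _, c := range p.Containers {
		request += c.MemoryRequestBytes
		usage += c.MemoryUsageBytes
	}
	return usage > request
}

func main() {
	pod := PodInfo{
		Name: "example",
		Containers: []ContainerInfo{
			{MemoryRequestBytes: 100 << 20, MemoryUsageBytes: 150 << 20},
			{MemoryRequestBytes: 200 << 20, MemoryUsageBytes: 120 << 20},
		},
	}
	fmt.Printf("%s exceeds requests: %v\n", pod.Name, exceedsMemoryRequests(pod))
}
```

Reading the pod-level cgroup, as suggested in the discussion, would replace the usage summation here with a single pod-scoped stat read; the request summation would stay the same.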
A
I just updated the issue there and provided some of the legacy reasons why we didn't do that. In the original QoS implementation it was per container and also per resource. Then later we unified it, so it's no longer per container; it actually became per pod and for all the resources. But at that time we wanted to have something simple, and there were also other considerations.
A
Also, the next reason is that when we first designed the container runtime interface, there are certain things, like logging, that we want to charge to the pod instead of attributing them to the daemon, because those are non-deterministic resource usages. So we have been trying to really move in that direction. It's analogous to what you are working on with pod overhead: it used to be a fixed amount, kind of a default for CPU and memory.
A
And later we wanted to implement different container runtimes, and we believed the overhead of the different types of container runtime would vary a lot. So that's why, in the end, we didn't tie it to the pod. But in the SIG Node meeting we actually talked about it many times, and once we finish all this work — introducing the pod-level cgroup and unifying all those kinds of things, including pod overhead — it will be much clearer at the pod level.
B
I was looking at the sync function in the eviction manager — I just circled back. Well, if it's fixed, then everything is straightforward and quite easy. I thought it was still going through ListPodCPUAndMemoryStats, but I think I was wrong on that, so if that's not the case, that's great.
E
Just, I think in general, if it's not doing that for memory, it's a bug. I think it's possible there could be code in here that is summing things only for non-memory resources, so depending on where you've looked, maybe some of that is happening, but it should not be getting used for memory. Yeah.
I
So, a little bit of context around this KEP. In certain contexts Kubernetes is a bit overprotective around stateful pods. One of the examples is the case of a node that is shut down: currently the control plane cannot know whether the kubelet has crashed or the node is totally shut down. A while ago we added a call to the cloud provider interface that queries the cloud provider and returns whether the node is shut down. So this KEP tries to leverage that information from the cloud provider, make certain assumptions, and forcefully delete the pods so that they move to another node. The design relies on the node Lease object. Basically, whenever we see that a node is shut down, we try to grab a lock, which is the node Lease: we set the holder field to the control loop that will forcefully evict the pods. We do this to handle the case where the kubelet comes back up and starts the containers again, which in some cases would lead to data loss. So the idea is to add code in the control plane that grabs the lock, and code in the kubelet that, on startup, tries to update its Lease; if that succeeds, it means no one is holding its Lease, meaning no eviction is going on.
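As a rough illustration of the locking idea described here, the sketch below uses the real coordination/v1 Lease type, but the holder identity and helper are hypothetical; the actual acquire/renew mechanics proposed in the KEP are not shown.

```go
package main

import (
	"fmt"

	coordinationv1 "k8s.io/api/coordination/v1"
)

// shutdownEvictorIdentity is a hypothetical holder identity the control-plane
// eviction loop would write into the node Lease while it force-deletes pods.
const shutdownEvictorIdentity = "node-shutdown-evictor"

// evictionInProgress reports whether the node Lease is currently "locked" by
// the control plane, i.e. whether the kubelet should hold off starting
// containers until the lock is released.
func evictionInProgress(lease *coordinationv1.Lease) bool {
	holder := lease.Spec.HolderIdentity
	return holder != nil && *holder == shutdownEvictorIdentity
}

func main() {
	holder := shutdownEvictorIdentity
	lease := &coordinationv1.Lease{}
	lease.Spec.HolderIdentity = &holder

	if evictionInProgress(lease) {
		fmt.Println("lease held by eviction controller: do not start containers yet")
	} else {
		fmt.Println("lease free: safe to start containers")
	}
}
```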
A
That's dangerous. So the kubelet acquires the lease at startup, and if there is no shutdown but something else is just wrong in Kubernetes — because today the kubelet updates the lease periodically — if there is one time it cannot successfully renew the lease, then all the running pods are going to be killed, right? Yes.
I
What we were trying to do at first was another approach, like relying on taints and implementing some lock semantics around them, but it felt error-prone. That's why we took the approach of using the Lease object. But if we want to make this happen, we would still need some way to enforce, at the node level, that no container starts up.
E
I think I'm a little anxious about this one. If we were to do that — we have plenty of users that deploy workers that may lose connection to their control plane, and they expect that the workloads running on those workers will keep running even when they're not able to phone home. So if this is tied just to the lease and the kubelet not being able to renew its lease, we have plenty of users that would expect the workloads to continue to run for the time being, and I think that would be rather problematic.
E
You know, if there is no underlying IaaS underneath the kubelet, the pods aren't actually removed until the kubelet wakes up, can phone home, and acknowledges the delete, or the admin has figured out how to manually force that deletion and make it safe. But we can't just have the kubelet delete things because it lost its lease, because there are plenty of environments running bare-metal hosts that expect those workloads to keep running. Yes.
M
Yeah, and the cloud provider bits — we're requiring the cloud provider parts right now because that is the only way we know of right now to deterministically decide whether a node was intentionally shut down. Because, like we mentioned before, we can't evict pods if a node temporarily just lost a heartbeat for a minute or something, right? So the cloud provider check essentially gates that control loop where we start evicting stateful pods.
E
My original understanding when this was proposed was that in a cluster with a provider-enabled control plane, a running controller would distinguish a node shutdown from a network separation, because it has access to the underlying IaaS state, and it would label or taint the node saying "I'm shut down". Then, once it was shut down, I thought we were going to force evictions from that node, and then just work through the issue of what happens when the node is restarted and comes out of maintenance, whatever was happening on that node.
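A minimal Go sketch of the taint-based variant being described: the controller marks the node, and other components only act on nodes carrying that mark. The taint key shown is an assumption (the cloud node lifecycle controller uses a shutdown taint along these lines, but check the KEP for the exact key).

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// Assumed shutdown taint key; the real key set by the cloud-provider-aware
// controller may differ.
const shutdownTaintKey = "node.cloudprovider.kubernetes.io/shutdown"

// isShutDown reports whether the controller has marked this node as
// intentionally shut down.
func isShutDown(node *corev1.Node) bool {
	for _, t := range node.Spec.Taints {
		if t.Key == shutdownTaintKey {
			return true
		}
	}
	return false
}

func main() {
	node := &corev1.Node{}
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
		Key:    shutdownTaintKey,
		Effect: corev1.TaintEffectNoSchedule,
	})
	fmt.Println("node marked shut down:", isShutDown(node))
}
```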
M
The problem is that a down node doesn't result in the Node object being deleted, so the running pods on the node aren't deleted. Then you have issues with volumes that are mounted on that node that don't get detached and attached to another node. That's where the coordination with the kubelet needs to happen.
A
It's more about the state of the node and how it's running. For an intended shutdown — if it is intended — the admin can do all the usual things: cordon the node for shutdown, drain the node, all of that. The problem is the other side, where the node is just gone. In production there is a lot of customer behavior we don't know about, like a kernel panic, and today on a lot of providers that node will just come back up.
A
We can configure the kernel panic behavior and whatever else, and we want the node to come back with everything still serving. That serves the majority of customers — at least the batch workloads, because they are going to continue serving and continue finishing their work — but on the stateful side, we totally agree there is a problem. My question is that the proposal right now is just like a big hammer.
A
Maybe the node has no stateful pods at all, or maybe it has one stateful pod, but just because the lease acquisition fails temporarily we cannot get it, and we end up holding back the whole node — not just the stateful pods. And the proposal actually applies to the entire node lease lifecycle: even if you are not shut down, if the kubelet is just failing to renew its lease, the proposal as written basically kicks in.
A
Okay, and no matter when — because right now we cannot tell whether it is the kubelet first starting at boot-up time, or the kubelet just restarting, or whatever else happened — we apply the same rule, and then we basically end up with: any time this kubelet cannot acquire or renew its lease, all the pods on that node will be terminated. If we could limit it to only the boot time, we could talk about it more, I think, right?
K
So, as I understand it: a node shuts down and we start deleting pods; the StatefulSet will create the new pod, but due to a race condition the shut-down node comes back and starts recreating all the containers that it had before shutdown, and then you end up with these conflicting writers, yeah.
E
The thing is, Patrick, that would break the ordering-guarantee semantics around StatefulSets. That's where you have to allow for ordered start-up and ordered shut-down, and I think, in the end, that has to be enforced at the actual StatefulSet controller layer. I don't think we can get around that.
M
What I mean is, when the kubelet starts up and it has to do the lease acquiring, it only starts that process if it sees a taint on itself already — because the cloud provider would have set it — rather than having the kubelet always acquire the lease regardless of what happened before it started. — Well, we always acquire the lease, because the leases are the heartbeat.
E
So if I'm running a bare-metal environment today and my worker has lost connectivity to the control plane, and I've been running in this fashion for, I don't know, twelve releases, and we upgrade and say, "hey, your pods that used to keep running will no longer run, because when the kubelet restarted you didn't have the node-shutdown taint toleration" — aren't we kind of breaking backward compatibility there?
E
But basically, are you saying that if the kubelet couldn't acquire its lease it should tear down all workloads on that node, even though the kubelet in that case doesn't even know why? Or are we saying that only if the kubelet is configured with an associated cloud provider will it do this check on startup? And what happens when the kubelet is configured with an external cloud provider?
A
Even on GKE and GCE I don't want this behavior, because we do have customers who want their pods running after the node comes back. Or maybe there is a network partition — the kind of partition that is only between the control plane and the worker, where the service itself is still accessible. We don't want those kinds of things to just be forcefully evicted or shut down. And this is also a separate, somewhat controversial point: long term we actually want the kubelet to keep acting even when the API server has died — the kubelet would continue serving. Next we are talking about checkpointing and bringing the pods back on the node, so the node could keep serving even if there is a network partition between it and the control plane. So this proposal is kind of at odds with what we have been talking about there.
E
I think that's independent. Okay, I kind of want to reinforce what Dawn is saying here, which is: to me, it's a non-starter that a kubelet needs to stop workloads if it can't talk to the control plane. We have plenty of real-world applications where people are running applications on worker nodes, which might have their own dedicated routers that can access their content.
E
Even if you lose network connectivity to the control plane — real-world systems might come down because of a small network fault between worker and control plane, which has nothing to do with ingress to that worker from other parts of the world. So I'm pretty strongly against the idea of the kubelet just shutting things down because it lost access to the control plane, with such a broad hammer, right?
E
If we can tighten the hammer, maybe I'd feel differently, but even then I'm not sure and I'd have to reason through it a little bit. And I say this entirely empathetically, because I also get the counter customer complaints, which are: why isn't my workload being rescheduled? Because I want this PV to be attached to another node.
A
And please also take a look at what Patrick suggested earlier. What we can think about is potentially handling this at the cloud provider layer, or at the storage layer — something more specific instead of something broad like this one. Okay, thank you. We can come back to this — please come back if you have an alternative approach, and just update your KEP. So let's move to the next one. The next one, I think, is what you asked about regarding the code organization for kubeadm — what do we think about it? Oh, it's —
I
I'm the one that added it. We're trying to move kubeadm out of kubernetes/kubernetes, and right now the node end-to-end tests import the validation code from kubeadm. So our suggestion was to move the validators out of kubeadm into a separate repo. We wanted to check with you: what do you think about it, and does anyone have a strong opinion on this?
C
The kubeadm code — I'm not sure what it's for. Do you know what it's used for?
I
So I think it's mostly the Docker validator that is used, but I didn't get through all of the code. There are definitely some parts that are used — the Docker validator and maybe some others. So there are two possibilities, I think: one is to refactor the tests to not rely on these, and the second is to make it a shared library for validation and move it out of tree into a separate repo.
P
Okay. Last week we discussed this, and I think we stopped with the recommendation to take some time to look at the proposal. In particular, I wanted to focus on the discussion around the source of truth for the resources that are allocated — that the kubelet has agreed to. I have two links in there which analyze the cases where a single pod, and then multiple pods (two or more), get updated during a kubelet restart.
P
So the potential solution here is to rely on the resource version for sequencing — to determine which update came first and honor that. That way we do first-come, first-served even during a restart, and we can tolerate it. I think this was a concern David had regarding how we handle restarts — that one update doesn't get prioritized over another. So, if possible, could you please take a look at these two analyses and see if you can poke any holes in them?
P
If there are, maybe I missed something. The point is to make a decision on which one we want as the source of truth. I'm leaning towards node-local storage; I know there are people here who would not prefer that and would rather put it in the pod spec, and either way is fine with me. I just feel that node-local storage is simpler, and in a way we are already doing that when we come back up and determine what pods are running.
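To make the node-local-storage option concrete, here is a hedged Go sketch of what a kubelet-side checkpoint of allocated resources could look like; the file name, structs, and field layout are hypothetical illustrations, not the actual proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// AllocatedResources is a hypothetical record of the resources the kubelet
// has admitted ("agreed to") for one container, keyed by resource name,
// e.g. "cpu" -> "500m", "memory" -> "256Mi".
type AllocatedResources map[string]string

// PodAllocation is the per-pod checkpoint entry: container name -> allocation.
type PodAllocation map[string]AllocatedResources

// Checkpoint maps pod UID -> its allocation record.
type Checkpoint map[string]PodAllocation

// save writes the checkpoint to node-local storage so the kubelet can
// recover its source of truth for allocations after a restart.
func save(dir string, cp Checkpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "pod_allocations.json"), data, 0o600)
}

// load reads the checkpoint back; a missing file simply means no
// allocations have been recorded yet.
func load(dir string) (Checkpoint, error) {
	data, err := os.ReadFile(filepath.Join(dir, "pod_allocations.json"))
	if os.IsNotExist(err) {
		return Checkpoint{}, nil
	}
	if err != nil {
		return nil, err
	}
	cp := Checkpoint{}
	return cp, json.Unmarshal(data, &cp)
}

func main() {
	dir := os.TempDir()
	cp := Checkpoint{
		"pod-uid-1234": {"app": {"cpu": "500m", "memory": "256Mi"}},
	}
	if err := save(dir, cp); err != nil {
		panic(err)
	}
	back, _ := load(dir)
	fmt.Println("recovered allocations:", back)
}
```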
E
I have not had a chance — I mean, okay, having just been acquired by a huge company, I had to be distracted looking at some other things, so I will try to get to it this week, but I want to apologize. Previously I was able to get things upstream; I tried to get the easy ones in first, and for the hard ones I don't know if I have the mental capacity at this exact moment. Hopefully next week I can get back to it. Yeah.
P
If we can resolve this — and I don't know how many more issues we need to drill down into and identify in order to close this out and get it approved — it will help me. If it gets approved, then I can go back to management and say: you know what, I need budget, I need more people, and this is something we are driving. So it's coming to that at this point.
P
Okay, but with the new design I think we've minimized the amount of change that we need to make, so hopefully the risk is very well contained. But it's definitely worth taking a close look: I know this is a pretty invasive change to commit to at this point, so it's definitely worth examining every aspect of it, and if we can get to 80% confidence, or even something around that, then I'll feel good about going forward with the implementation, getting resources for this, and doing it.
P
So, David, did you have any other comments regarding the source of truth? Any thoughts from looking at the analysis? I know I posted the analysis for the two-pods case, but this is something we can look into: the resource version can be used for sequencing; we just need to fix the kubelet's caching of the resource version.
P
We would then be relying on status being the source of truth, and that's where we have a disconnect with the API conventions. That's why I did it that way before. I just don't know if we should rely on it, given that it's quite clearly stated in the API conventions doc that this can be lost and should be regenerated. Yeah, I think the —
C
— the requests and limits of containers. So I'm not suggesting changing that, but what I am suggesting is that it might be worth looking into, at admission time, when we're admitting pods that are already running, doing so based on the actual resources that have been allocated to the containers rather than the desired ones in the spec.
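A small Go sketch of the admission-time idea just described: when re-admitting a pod that is already running, count the previously recorded allocation against node capacity rather than the (possibly resized) spec. All names here are hypothetical illustrations.

```go
package main

import "fmt"

// Resources is a hypothetical requests record: resource name -> quantity string.
type Resources map[string]string

// effectiveRequests returns what admission should count against node capacity:
// the previously allocated resources for a pod that is already running, or the
// desired spec requests for a brand-new pod.
func effectiveRequests(specRequests, allocated Resources, alreadyRunning bool) Resources {
	if alreadyRunning && allocated != nil {
		return allocated
	}
	return specRequests
}

func main() {
	spec := Resources{"cpu": "2", "memory": "4Gi"}  // desired (after a resize request)
	alloc := Resources{"cpu": "1", "memory": "2Gi"} // what the kubelet actually granted

	fmt.Println(effectiveRequests(spec, alloc, true)) // re-admitting a running pod
	fmt.Println(effectiveRequests(spec, nil, false))  // admitting a new pod
}
```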
P
Okay, yeah. Please take the time to review the comments and the overall KEP before the next meeting. I know that much of the discussion since the last update has been going on in the comments, so if there are any questions, please ask on the public thread and I can answer any specific questions. I know it's a lot to dig through.