From YouTube: Kubernetes WG Batch Weekly Meeting 20221208
A
Okay, hello everyone. Today is December 8th, 2022. Welcome to today's edition of the Kubernetes Batch Working Group. I'm going to be moderating the meeting today. I'm Swati Sehgal, I work for Red Hat. I would like to let everyone know that this meeting is being recorded, and please keep the CNCF code of conduct in mind when you engage in this meeting. We have two items today. The first one is from Wilfred, about the YuniKorn scheduler. Wilfred, feel free to start off.
B
Okay, good morning. I'm Wilfred, I work for Cloudera and I'm the tech lead on the Apache YuniKorn project. I was asked by some of the working group people to present on what we do around gang scheduling, the all-or-nothing setup that we have. So I've created a short presentation, if I'm allowed to share my screen.
B
I've prepared a short presentation around what we do, how we've implemented gang scheduling, and some of the reasons behind what we did. I'll also talk about and compare that, in a number of ways, to what's currently available in Kubernetes. So, gang scheduling. If we look at what gang scheduling does: I've spoken about this before, in May at KubeCon in Europe.
B
We looked at gang scheduling, and what we said is: we need some way to schedule an application based on a larger request, either a one-pod or a multi-pod request. And we look at it specifically from a YuniKorn perspective.
B
We said we want to introduce that kind of scheduling based on an application object, an application-aware kind of scheduling. An application within YuniKorn is a loose definition of a set of pods, based purely on the fact that there is an annotation on the pod: an application ID. So it doesn't have to be a set of five of the same pods, and the pods don't even have to be submitted all together.
B
It could be one pod of one type and ten pods of another, but it could also be ten different types of pods. So that's the basis of what we did. The other thing that we wanted to do is be quota aware within YuniKorn.
B
We don't use the Kubernetes namespace quotas; we have quotas set up in a queue system, and that is purely hierarchical. We've got multiple queues within the YuniKorn scheduler, and within that hierarchy we set up a quota, and that quota applies to that specific queue. At the root we can have children of that root, and children of the children, etc. So the gang scheduling that we would set up had to fit in with that quota system.
B
We schedule a lot of Spark workloads as batch workloads, but we've also got workloads or services that we want to schedule, or people that start up their own jobs, and it needs to fit in with all these different kinds of workloads. So we don't want to prescribe and say you need to use a Job, or it needs to be a DaemonSet or whatever; any type of workload that people could create needs to be supported.
B
We also had the view that we want to have a minimal impact on the submission side. That means whatever we need to do has to fit in with existing applications like Spark, and with other objects and other ways of submitting things.
B
The third point was the cluster autoscaler: whatever we do has to fit in with the cluster autoscaler, without making any changes to the autoscaler itself.
B
YuniKorn runs as a scheduler against a wide spread of Kubernetes versions. We said we don't want to introduce new Kubernetes objects or an API, because that limits when people can start using it and which Kubernetes releases the end users can run.
B
Based on all of that, we started working on our gang setup and we wrote a design doc; it's available on the YuniKorn website. We have been running with this gang scheduling for, I think, about a year and a half now. We're running it in production and it's being used by a number of large Spark users. It's mainly Spark users that are using this.
B
Let's first take a step back, and then we'll take Spark as an example. What happens when we start scheduling a Spark application, and how does that work with either YuniKorn or the default scheduler? From a Spark application point of view, when we submit the application we create the driver pod. The driver pod gets scheduled, and after the driver pod gets scheduled, the driver pod itself will create a number of executors.
B
It also means that the driver does not define, at that point in time, how many executors it will use over the lifetime of the Spark application's run. Every driver creates a number of executors, every executor is a pod, and they just get scheduled one by one as needed, and when it finishes, that's the end of the Spark application. The number of executors and the configuration of the executors is included within the driver.
B
So what have we done? Let's first have a look at the gang specification. When we use gang scheduling, we specify the gang, the all-or-nothing setup, on the first pod that gets submitted. In the Spark case that would be the driver; we've also done a little bit with plain Kubernetes Jobs.
B
You specify the gang information on the first pod that gets submitted, and that's where we read it from. It's purely a simple annotation under the YuniKorn namespace, and the annotation consists of a number of elements that we use. Like I said before, we can define multiple sets of pods: that could be just one type for everything that runs within the application, but we can also have four or five or six different groups in there, distinguishing them by name.
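The annotation setup Wilfred describes looks roughly like the sketch below, based on the gang scheduling documentation on the YuniKorn website; the group names, sizes and resources here are illustrative, not taken from the talk:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver
  labels:
    applicationId: "spark-app-0001"   # ties all pods into one YuniKorn application
  annotations:
    # Which task group this pod itself belongs to.
    yunikorn.apache.org/task-group-name: "spark-driver"
    # The full gang definition, carried on the first pod that gets submitted.
    yunikorn.apache.org/task-groups: |-
      [{
        "name": "spark-driver",
        "minMember": 1,
        "minResource": {"cpu": "1", "memory": "2Gi"}
      }, {
        "name": "spark-executor",
        "minMember": 3,
        "minResource": {"cpu": "1", "memory": "2Gi"}
      }]
spec:
  schedulerName: yunikorn
  containers:
  - name: driver
    image: my-spark-image   # illustrative
```

Task group entries can also carry `nodeSelector`, `tolerations` and `affinity` fields, which is the later addition Wilfred mentions next.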
B
In the first design, the first setup that we had, we did not have the node selectors, tolerations and affinities included. Later on, when we started looking at the more advanced use cases, we saw that we had to get those in there to allow proper placement to just work. Based on the number of members that we require, and the number of groups that we've got, we create placeholders on Kubernetes.
B
If we go back to the Spark kind of setup: the only thing that gets submitted is the driver, so we don't know the number of executors, or what will be requested. And if we don't know that, we can't reserve the space, so we can't schedule this as a gang. But what we do know is: if this annotation is there and we've got the gang definition, we can create whatever we want based on these gang definitions.
B
All these placeholder pods will get scheduled: we put all these placeholder pods on the system and we start scheduling them, while we hold back the other pods. We'll dive into that in a little bit. Beside the pod specifications for the placeholders, we've also got a policy specification, again on the first pod of the application.
B
We have not just these task groups that we specified, but also a policy that we allow you to specify, because what we want to be able to say is: do not reserve these placeholder pods forever; time them out. If we haven't used them, or there's something else going on, we don't want them to sit there forever. We just want to make sure that we can time them out, and that timeout is something the customer knows: they know the behavior of the application.
B
We also have a gang scheduling style, because what we noticed was customers asking us, saying: okay, if you can't give me all the placeholder pods and it doesn't fit within the system, then I'm happy for the application to still run, but just as a normal scheduling cycle, so no placeholders.
B
We just schedule pod by pod and we'll see what we can do. So we've got two mechanisms: a hard and a soft one. The hard one is: if I can't get all the pods defined in my gang, I fail the application. That's the basis that we started with: give me ten pods of this size and then five pods of the other size.
B
If I don't get all these fifteen pods within a certain time, I fail the application. That was the starting point for what we did. But people then said: no, if we only get some of these pods, I still want to have the application run, but without any of the gang style involved. So we also do a soft style.
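The timeout and the hard/soft style described here are, per the YuniKorn gang scheduling docs, carried in one more annotation on the same first pod; the values below are illustrative:

```yaml
metadata:
  annotations:
    yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=60 gangSchedulingStyle=Soft"
```

With `gangSchedulingStyle=Hard` the application fails when the placeholders cannot all be scheduled before the timeout; with `Soft` the remaining pods fall back to normal pod-by-pod scheduling.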
B
Then the second part is using the gang specification. So we've set it up, we've got it on the first pod, we've defined all our task groups, but now we need to be able to say: use these placeholders that we've created, and use them during the scheduling cycle. And since we don't have the real pods at startup yet, we don't have to have them: we might have them, we might not.
B
How long have I got?
A
About three to four minutes left.
B
Okay, I'm as good as done here, so thank you. The next step is: we want to use this gang specification. On every single pod that we create we put another annotation, which defines for us which member, which placeholder, we use. And we've got some checks and balances on that.
B
We've got an opt-in kind of scheduling setup. If you don't use all the placeholders, we clean up; if you use more, we treat them as normal pods and we go on. What we also saw is that people sometimes have different requests in the gang setup than they have on the real pods.
B
So that difference is being accounted for during the scheduling cycle. The scheduling cycle looks a bit like: we hold the pod, we create placeholders, we replace the placeholders, or we release them in the end if they're not used. A real quick overview of what that looks like for the Spark setup: we've got a driver, and we specify the gang as one driver and three executors. That means that we create these four placeholder pods on the system.
B
These four placeholder pods get scheduled by the scheduler, and at that point we release the original driver pod that was created. That original driver pod will start creating the executors. So we've replaced the driver placeholder, and we schedule the executors that now come into play one by one, replacing the placeholders that are there, and in the end we've got the whole application up and running.
B
From the moment that we create the placeholders, we are using the quotas in the queue and in the system. And if that would not completely fit, so if you ask for, let's say, one driver and three executors, but it's more than the quota that's available, then we reject the application for you right away. So again, we've got some checks and balances around all of this, but this is a flexible way that we use to build and schedule gangs around Spark, or any other application that you want.
B
That's a quick overview. There are multiple examples, multiple ways of doing things, and all the documentation, like I said on the first slide, is on the YuniKorn website. Thank you.
A
Thanks, Wilfred. There's a question from Abhishek, and there's a question from Abdullah as well, but I'll start with the question from Abhishek, who's asking: do we lose the guarantees of running the app when quota is evaluated later, and is there a scenario for a partial job or a partial start?
B
Yes, so there are two things. If the whole gang doesn't fit within the quota, we reject the request and we say: sorry, you can't do this, this is out of quota, or too large for the quota that you've got. If the request fits within your quota, but at the point that you want to run it there's not enough quota available, then it depends on the hard or soft setup that we've got. If it's a hard setup for the gang, we fail the application.
B
If it's soft, then we try to schedule within the quota that's available. So yes, we've got those two different ways of scheduling things. And especially that last bit, where there's not enough quota available and we still go on and schedule, came from requests of the users that we had around this option. They really said: I do not want to fail the application, I just want to try to schedule within what is available at that point in time.
A
Perfect, thank you. Abdullah, feel free to ask your question.
C
Yeah, sorry, can you please go back to the gang definition on the first pod? So this works well for Spark, where you have the driver created first, so you can put that on the driver. But for a batch/v1 Job, you have to put it on all pods of that job, right?
B
Correct, yes. In that case, when we do create Jobs, we just put it on every single pod that is there, and it depends on which pod we see first, because you can't even rely on ordering: when you create a job spec, things flow through the job controller. We even see that when we create placeholders based on what we do here: we create pods in a certain order, but they go through all the event processing and everything that happens in the background.
B
We get them back in a completely different order, so we can't rely on any kind of ordering anywhere in that system. So yes, if you do a Job, you specify this on every single pod, because we don't know which one we're going to see first.
C
And the placeholder pods are actual pods? You create them after you check the quota for the gang, or before?
B
Yes.
C
They need to be scheduled in their place; you somehow match them, using, I guess, the second slide, correct?
B
Correct, because we could have five or six different kinds of task groups within that gang. So even if you submit an application and you later on submit a pod under that application, it could be part of the example group, but it could also be part of another group; in Spark we do that often with drivers and executors.
B
It could be part of the driver group or part of the executor group, or you could even say: no, this is something that I want to run outside of this guarantee, and you create a pod without any annotation, and then it just gets scheduled as one outside of that guarantee that you've created with the gang.
B
No, we allow you to go over it. In the spec here we say we've got a minimum member, and we treat it as a minimum member. So if you create pods that name the example group as their task group, but I've already scheduled two and there's no placeholder left, then I just treat them as normal pods. So we're flexible; you can go over that.
B
It's a minimum member, it's not a maximum member, so we are flexible, because we need to be able to do that. If you look at dynamic executors in Spark: you give it a minimum number of executors, and that doesn't mean that it uses that number; it could use more, but it could also use less. So we allow you to do both. That's the checks and balances with that opt-in. If you use fewer pods than the task group specifies, we have placeholders left over.
C
The problem, one problem, because this is something similar to what we've been discussing with the general idea of reservations: one thing that is going to be complicated is that you're basically trying to recreate the pod spec in the annotation. And I mean, you already have this problem.
C
Oh,
you
had
to
add,
not
affinity,
and
then
you
had
to
add
tense
and
Toleration
so
and
then
validation,
Etc,
so
I
would
caution
that
this
I
don't
know
how
this
will
like
evolve
in
the
future
in
your
in
your
case.
But
you
can't
always
scale
with
like
running
away
with
keep
pushing
things
into
annotations.
Annotations
are
not
really
an
API,
but
the
bigger
problem.
I
see
is
that
you
have
cases
where,
for
example,
Dynamic
resources
being
allocated
like
PVCs
on
the
Fly
using
stateful
sets
right.
C
The StatefulSet controller creates the PVC on the fly when it creates the pod, which you wouldn't know when you create the placeholder for that pod. So you wouldn't create a PVC for that pod, so you wouldn't be able to schedule them at all, so you wouldn't be able to provision storage for the future pods of the application. Correct?
B
Correct, so we leave that up to the real pod that gets created.
B
We handle that with our scheduling internally. If it turns out that it doesn't fit, then we use different nodes and we move on. So yes, we already do that in our current setup, because we've already noticed that sometimes, when people specify all of this information in the task group, they don't set up any node selectors or tolerations or affinities, and then later, when the real pod comes in, they do have them.
B
So we already see that; it's not just for PVCs, it already happens, and we handle it during our scheduling cycle. A placeholder might run on one node, and then the real pod gets scheduled on a different node later on. So we do that, that's handled; we saw that happen before. And it's not just for PVCs, it's with everything.
B
People are not always consistent. They say: oh, I just want to do the minimum, I want to make sure that it runs within the queue, so I only specify the minimum resources and do not specify any of the other things; and then later on that changes. We even saw people that said:
B
Oh, my placeholders are one CPU and one gig of memory; and then, when the real pod comes along, it asks for one CPU but one and a half gigs of memory, and in certain circumstances that doesn't fit on the node at that point in time. So again, you need to have fallback mechanisms during the scheduling, and that is all built into the scheduler. It handles all these edge cases with differences between the real pod and the placeholder that's been created; all that stuff is handled.
A
All right, thank you, Abdullah, and thanks, Wilfred. The presentation was really interesting, and the slides as well. I would really like you to continue talking on this, but we have another agenda item, so I'd like to move to that. Thanks again. So we have Kevin here, and he wants to discuss pending pods. Kevin, over to you.
E
Yeah, thank you. I'll go ahead and... oh, can I get permission to share?
A
Yeah, you should have it now.
E
Thank you. So this kind of came from a discussion I had on Slack. I was looking at the retriable job idea, trying to use it in Armada, and I realized that a lot of the cases we found in the Armada project were actually around handling pending pod issues, like invalid image names or a missing secret.
E
That is, mounting a secret to a volume, and a lot of other ones. So I wanted to see if there was a way to have that retriable job idea for pending pods. My end goal would ideally be that the Job API, a batch Job, can detect whether or not a job is stuck in pending due to a configuration error.
E
Can it transition to failed? A main use case for this is people scheduling large amounts of jobs. Batch users are usually not Kubernetes experts, so they might have configurations, like image pull secrets, that are invalid. A lot of the controllers do handle this; they might retry a job. And in Armada we actually have a controller that reads the container status, or the container reason, and we also read the events to know whether or not jobs or pods are failing.
E
So I have a potential idea that I want to run by this group. I'll skip to the bottom. I went through the common examples that I found for configuration errors, and these are mostly just contrived examples. I would say there are three groupings of cases that I've seen for pods going into pending. One is configuration errors, and those are usually well represented by a container status of waiting with a valid reason.
E
For example, if you have all capitals in your image name, you'll get this kind of state from your container status. Notice the conditions: the pod is scheduled, it's just not ready; Ready is false and ContainersReady is false, which isn't really a valid condition to code against, because there are business-as-usual cases where all these conditions are set.
E
ImagePullBackOff is one that I don't think we'll be able to target, because it's kind of a business-as-usual case. There's stuff like image pull secrets: if you have an invalid image pull secret, it will still get stuck in ImagePullBackOff, and the same if you have an image that doesn't exist. But I think it might be too difficult to try to predict all those cases.
E
So I don't know if we can handle that one with this approach. Some others: if your image pull policy is Never and your container image doesn't exist on the node, you'll get ErrImageNeverPull, but the container still sticks in pending.
E
If you have a missing config map, you'll get a reason that says CreateContainerConfigError. So these are all pretty well represented. The interesting one is actually if you're mounting a volume from a secret, or the other way around, I may have it mixed up. I noticed that the status in that case just says the reason is ContainerCreating.
E
That
knows
that
it
actually
failed
is
in
an
event,
so
that
is
kind
of
tricky
to
I,
mean
I,
know
I,
don't
think
we're
supposed
to
be
relying
on
events
to
understand
whether
or
not
a
pod
failed,
but
in
this
case
you
kind
of
have
to
and
the
other
one
that
or
so
the
there's
also
there's
also
the
case.
I
didn't
really
cover
here,
which
is,
if
you're,
your
pod
can't
get
scheduled.
I
think
that
is
covered
by
this
condition.
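The grouping Kevin walks through can be sketched as a small classifier over a pod's status; this is only a sketch, and the set of "configuration error" reasons below is illustrative rather than an official list:

```python
# Classify a pending pod as a configuration error, unschedulable, or
# plain waiting, using the container waiting reasons and pod conditions.

# Waiting reasons that usually indicate a user configuration error.
# Illustrative list, not an official Kubernetes taxonomy.
CONFIG_ERROR_REASONS = {
    "InvalidImageName",           # e.g. capital letters in the image name
    "ErrImageNeverPull",          # imagePullPolicy: Never and image absent
    "CreateContainerConfigError", # e.g. missing ConfigMap or Secret key
}

def classify_pending(pod: dict) -> str:
    """Return 'config-error', 'unschedulable', or 'waiting'."""
    status = pod.get("status", {})
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    if conditions.get("PodScheduled") == "False":
        return "unschedulable"
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        if waiting and waiting.get("reason") in CONFIG_ERROR_REASONS:
            return "config-error"
    # ImagePullBackOff and ContainerCreating land here on purpose: they can
    # be transient, or (as noted above) only explained by events.
    return "waiting"
```

As Kevin notes, the secret-volume case defeats this kind of check: the status only ever shows `ContainerCreating`, and the failure is visible only in events.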
E
PodScheduled is false. And this is an interesting case: if you have a missing volume for your container, your pod will be stuck in pending because it won't be able to get scheduled. So this is what my idea was.
E
A lot of these cases can be represented by a potential condition to add to the pod API, one that reflects things like an invalid image name. And then my hope would be that eventually we could use it with the retriable job KEP; I know they have a retry policy based off conditions. I would hope that maybe, if we have a pod condition that flags configuration errors, we could force jobs to fail.
E
But
this
was
at
least
kind
of
my
proposal
and
I
was
hope.
I
was
wondering
what
the
this
group
feels
about
going
forward
with
this,
obviously,
and
not
want
to
get
into
design
now,
but
I
know
there's
a
cap
in
other
stages,
but
I
want
to
know.
If
the
idea
is
good
and
I'll
open
the
floor
for
questions.
D
We'd have to add a condition, and I think in general, the fewer different conditions we need to add the better, and you're just proposing one, so that's good. But I think the other thing we need to think about is when, or how, pods would transition to failed, because they are currently pending, right? And I'm not sure where that responsibility should fall. I guess one option is that it falls within the job controller.
D
Another
option
is
the
fails.
It
fits
Within
the
cubelet,
to
fail
eventually
the
pod,
so
that
one
I'm
not
sure
where
is
the
best
location
or
what
what's
the
best
component
to
to
solve.
As
for
the
condition
I
think
cubelet
is
probably
the
best,
the
one
that
has
the
best
knowledge
to
to
do
that.
E
Yeah, I think it would be very clear that the condition will have to be added as part of the kubelet code. I've done some exploring of where that would be, and I think I found the place in the kubelet code where that condition would be added. So at least the condition, and the transitioning to failed... but I don't know how to start that conversation.
C
I think this is a kubelet thing; I would expect that the kubelet would be doing that, because you're talking about failing to start the pod, right? Think of the pod as a state machine: when it gets created, it's the API server that starts looking at it; after it was created and persisted, it's the scheduler that picks it up and moves it from unschedulable to schedulable; and then it's assigned to a node.
C
I'm just trying to describe how to think about this, and where the responsibility lies: which component is responsible for transitioning that pod from pending to failed. So I would expect that the kubelet should be doing that, and I would present this topic to SIG Node.
A
I have one question here, kind of on Abdullah's comment about who's responsible for identifying and maybe changing the state. I was thinking that why the pod has gone into a pending state comes into play here as well.
A
So
if
resources
that
are
there
aren't
enough
resources
say
on
the
cluster
and
the
Pod
is
pending,
it
is
kind
of
because
it
hasn't
found
a
node
suitable
to
be
placed
I
understand
that
volume,
provisioning
and
and
cases
kind
of
are
after
the
Pod
has
been
scheduled
on
a
node
and
node
has
taken
ownership
of
that
pod.
But
before
that
has
happened,
a
pod
could
stay
pending
because
there
aren't
enough
resources,
and
in
that
case,
cubelet
probably
wouldn't
be
the
right
place
to
transition
that
state
from
pending
to
failure.
C
For
that
case,
we
already
have
the
condition
right,
like
the
unscalable
condition.
The
schedule
already
does
that
now,
I
guess,
the
question
is
who
should
delete
it
is.
Is
that
the
idea
like
you
want
it
to
be
deleted,
or
is
it
fine
to
continue
to
exist?
There.
A
Yeah
I
think
what
what
is
being
said
or
I've
looked
at
a
very
high
level,
but
the
proposal
says
that
we
should
transition
for
the
pot
from
pending
to
failure.
So
it
becomes
evident
that
there's
some
error
and
we
need
to
either
take
some
action.
Maybe
provision
more
resources
make
sure
that
the
volumes
are
available
and
that
changes
from
case
to
case.
E
Yeah, I think there are a lot of cases; that's what I found when I was going through the examples. I didn't really have many examples of this unschedulable one, but I know that condition is at least well represented. But there are also the configuration cases that I've seen happen a lot, and those are ones where the pod is scheduled; it just gets stuck in pending, and the only way to take it out is to have an out-of-tree controller delete the pod.
E
So
at
least
that
is
for
this
for
I'd
understand.
There's
a
lot
of
there's
a
lot
more
cases
than
I've
covered
here.
D
I
think
that
that's
a
good
point
that
you
know
adding
the
condition
is
something
that
needs
to
be
like
you
lit,
but
this
this
cutter
already
does
so
that's
not
a
problem,
but
deleting
the
Pod
or
you
know,
failing
the
pot
could
be
done
by
an
external
component.
D
Maybe
that's
a
good
place
to
start
just
you
know
prototypes
to
have
an
external
component
to
the
deletion
and
then
have
it
as
a
proof
of
concept,
because
he
unpair
it
with
the
job,
the
job
failure
policy
API
to
see
how
how
useful
it
is.
I
think
that
the
major
question
is
how
we
Define
yeah,
how
we
translate,
or
when
do
we
transition
to
fail?
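Pairing a pending-pod condition with the existing Job pod failure policy API, as suggested here, could look something like this sketch; the condition type `StuckInPending` is invented for illustration and is not an existing Kubernetes condition:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  backoffLimit: 3
  podFailurePolicy:
    rules:
    - action: FailJob
      onPodConditions:
      - type: StuckInPending   # hypothetical condition set by some component
        status: "True"
  template:
    spec:
      restartPolicy: Never    # required when podFailurePolicy is used
      containers:
      - name: main
        image: registry.example.com/batch-task:latest   # illustrative
```

This only works once something actually fails the pod with that condition set, which is exactly the open question being discussed.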
D
We
need
we
need
a
timeout
at
the
very
least,
but
maybe
timeout
is
not
enough
and
if
we
think
of,
for
example,
unicorn
that
just
presented
the
pot
can
be
pending
and
in
I
don't
know
if
it
it
would
introduce
an
unescalable
a
condition,
but
if
it
does,
it
could
be
because
there
is
no
quota
and
you
know
it
doesn't
mean
that
the
pot
failed,
but
it
will
fail
later
so
I,
don't
know,
I,
don't
know
about
that
scenario.
For
example,.
E
Yeah, I guess I'll think about it a little bit more, and then I'll try to reach out to SIG Node. I do like the idea of the job controller picking this up through the pod failure policy, just as a proof of concept, because I think the problem we'll run into with the kubelet is that we'd need another state.
E
We'd need another state in order to say there's a configuration error and it failed. I think for containers there's waiting, running and terminated, and it might be a pretty large change to add another state transition here. At least that's where I was coming from. And then there's the issue I posted that started this conversation.
E
This
conversation
there
was
a
lot
of
I
think
there
was
more
of
a
request
to
have
this
as
an
out
of
tree
controller
rather
than
in
Cuba.
So
that
was
why
I
kind
of
went
with
at
least
the
condition.
That's
a
starting
point
and
then
starting
a
conversation
about
how
we
want
this
further
down
the
road
but
I
think
we're
over
by
a
few
minutes.
So
I
want
to.
A
Yeah,
just
one
last
comment
on
this:
if
you
want
to
bring
it
to
signor
and
you
plan
to
get
this
in
for
127
signal
is
going
to
have
their
planning
session
for
127
next
week.
So
I
think
it
would
probably
be
the
right
time
and
and
kind
of
you
bring
it
to
signal
and
that
gets
incorporated
or
at
least
discussed
in
the
next
meeting,
because
if
we
don't
do
that,
then
it
will
move
on
to
the
following
cycle.
A
Perfect. Thank you. Thanks, everyone. Sorry it went over time. I'll see you in two weeks. Bye.