From YouTube: Kubernetes WG Batch Weekly Meeting for 20221013
A: This meeting is being recorded. All right, I'll go ahead and start.

B: Let me just introduce... sorry, give me one second. Hi everyone. This is the Batch Working Group meeting for October 13th. As a disclaimer, this meeting is recorded and will be uploaded to YouTube, and please adhere to the Kubernetes code of conduct.
B: The first item on the agenda is, sorry, Kevin from G-Research; I think you'll be presenting your work on interactive jobs using Armada. But before that, I would like to remind everybody that KubeCon is coming up, with Batch Day on October 24th, which is not next week but the week after. I was also thinking that we could cancel the working group meeting that week, because most of us will be at KubeCon and meeting there, I hope.

B: During that week you can also meet Diana and other people from the community, and we can discuss the roadmap for the Batch Working Group. If we have a sketch of the roadmap document by that time, it would be a great opportunity to discuss it face to face.

B: Okay, so with that out of the way, I'm going to hand it over to Kevin. Kevin, do you want to present?
A: Okay, all right. I have a Google Doc that I added to the agenda. I would say that Armada is kind of an inspiration for this project.

A: At G-Research there is a lot going on, but I work in the open source division, mostly on Armada, which is a multi-cluster Kubernetes scheduler for running batch jobs. We have a lot of users who like to do different things, and one of the things I found they tend to want is the ability to run Jupyter notebooks on Armada and then access the notebook while the job is running in their cluster. Armada already does this: a job has a Pod, a Service and an Ingress, and our Armada controller basically starts all of these together, and then the user can use the browser to go to the notebook and run whatever they want inside their Python code. So I was trying to take a look at what the batch working group was doing here.
A: I was trying to do this using a Job, a Service and an Ingress, and then I think someone suggested the idea of creating a custom resource definition. So I have a little proposal here that I'll briefly walk through, and then I can give a quick demo. This will be a very short talk, but please feel free to ask questions if you have any.

A: In a previous job I was working at the National Institutes of Health, and one of the popular requests was from researchers using Jupyter notebooks, and also being able to run interactive visualization jobs. Even when I was writing some parallel code myself, it was nice to be able to get a cluster to run a debugging task when you needed a chance to debug something live. So this was always kind of a common use case.
A: Okay, so in general there are a lot of different use cases in HPC; I only listed a few here. I know that at NIH they have a huge section about accessing Jupyter and how they allow users to do that, and you can look at some other examples here; these are mostly use cases where people do allow users to use Jupyter inside their research clusters.

A: So for Armada, when we submit a quote-unquote job, it really is a Pod, an Ingress and a Service, and it goes to one of our Kubernetes worker clusters. The worker cluster is responsible for starting the Pod, the Ingress and the Service, and for tearing those down once the job is complete.
A: The Ingress is obviously used for exposing a URL to the user, and once the job is complete, all the objects are removed. So really why I'm here is to give a demo of what I did with this custom resource definition, and then I would like to talk about next steps for integrating with Kueue, if possible, because I'd like to see how this would work with Kueue as a queuing solution. So I'll just give the demo now.
A: Basically, this is my first time creating a custom resource definition, so this was also a fun little project for me to understand Kubebuilder and how to generate all this stuff. The specification for my CRD is a job template. Currently, for this demo, I don't really have the Ingress part working; I'm using kind, and trying to debug Ingress in kind is never my favorite, so I'm just going to do port forwarding for now. So I have a service, and I've already created the interactive job.

A: You can see I have this Jupyter one here; it's actually running a Job that is still running, the Pod is there, and then I have a Service. So in general, I'll just port-forward.
A: I don't have it very clean or nice right now, so it mostly shows you the login page for Jupyter, and there's a token you can read from the logs to get in. But I just wanted to show that, since I'm port-forwarding the service and this job is running, I have this notebook up and running.
A: What was I going to say... so the spec is just a job template, which allows you to run the job, and for now I have it hard-coded to create a NodePort service that listens on the same port as the container. Part of the controller's responsibility is that it starts a Job and a Service, and then once I remove the interactive job, it removes the other resources. So I can just delete it here.
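A minimal sketch of what a spec like this might look like as Kubebuilder-style Go types; the type and field names (InteractiveJob, JobTemplate, Port) are illustrative assumptions, not the actual code from the demo:

```go
// Hypothetical InteractiveJob CRD spec, roughly as described in the demo:
// an embedded batch Job template plus the container port that the controller
// exposes through a NodePort Service while the job is running.
package v1alpha1

import (
	batchv1 "k8s.io/api/batch/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// InteractiveJobSpec defines the desired state of an interactive job.
type InteractiveJobSpec struct {
	// JobTemplate is the Job to run (for example, a Jupyter notebook server).
	JobTemplate batchv1.JobTemplateSpec `json:"jobTemplate"`
	// Port is the container port exposed via a NodePort Service.
	Port int32 `json:"port"`
}

// InteractiveJob is the Schema for the interactivejobs API.
type InteractiveJob struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec InteractiveJobSpec `json:"spec,omitempty"`
}
```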
A: So then you can see here: Service... yep, everything was deleted. So I guess it's a pretty short demo, but really my main questions for this group are about... I guess what I wanted to show is that, working with research groups, a lot of times batch jobs aren't purely batch: people do want some kind of networking, especially for things like Jupyter. I noticed that a lot of Kueue is focused on the Job controller, and I wanted to know whether it's possible to integrate something like this into Kueue and how we could go about that. That's really the only question I have about it, just talking about what the steps would be, and then any questions you have.
B: I have two questions, but I'll leave it first to the community, if there are other questions.

C: Thank you, Kevin, for sharing this; this is really interesting. It sounds like part of the requirement is that you want to be able to express multiple resources holistically, not just the Job, so that the whole set of resources can be dispatched or removed from the system holistically. Is that the basic requirement for this workload?

C: And also, since these don't really run to completion, which is how I think of jobs in the general sense, it's really standing up a pod, and then there's explicit removal of these objects: when you delete, you want to delete all the objects, but they don't ever run to completion on their own. Is that correct?
A: Yeah, my thought for that is a timeout, either as part of the queue or as part of the job, but then I guess I need to see whether the timeout would also delete the CRD completely, so I don't know; I haven't really thought about that. But that's at least kind of what we thought about: at least the Pod will have some kind of active deadline that terminates it after a certain amount of time.
D: Yeah, I have a similar question, but I have another one as well, which is: you demoed with the notebook, and you were talking about Ingress, but actually you just did a port-forward. So in principle an interactive job could be just layer 4; it doesn't have to be HTTP forwarding, right? The Jupyter example implies Ingress, but for the generic case we're probably not talking about Ingress; it's more something else, like exposing the port directly at layer 4, not at the HTTP layer.
A: Yeah, in Armada we do have the Ingress stuff working; I'm just using kind here and I always find it troublesome to get Ingress working on a local Kubernetes.

D: It could be a generic job where you just get a shell directly on the remote pod, somehow using a port; it doesn't have to go through Ingress. I was also thinking, looking forward, of something like the Gateway API, where you have both layer 4 and layer 7 possible.
A: Yeah, I know there will be another talk about that. I don't know much about the Gateway API, but I'll take a look, thanks. In general, I think the final solution would be working with an Ingress; I just don't know exactly what kind of Ingress configuration I would want at this point, so for the demo I just skipped it.

A: In Armada we use the name of the, sorry, the name of the Pod, and that creates the host, and then whatever the Ingress is in Armada is what allows users internally to view the job. That's what is done with Armada, and that would obviously be the final solution here as well.
B: So the idea of queuing is mostly centered around resource management, right? And we're talking about hard resources, things you have limited capacity of, like CPU, memory, etc. Exposing a service is not necessarily a hard resource, unless we think we have a limited number of IPs that we want to manage. So in my mind, whether or not we create a service and expose it is not necessarily something that is going to be the concern of the queuing operator.

B: So in a sense, your integration with Kueue is kind of orthogonal to the requirement that this job, when it starts, needs to be fronted by a service. You can always create the service from the beginning; it doesn't matter, it's not taking up any resources, and we're not trying to manage anything related to it. Once the job actually starts, the service will become responsive. That's the only side effect.
B: Whether you create it beforehand or after the job starts, it's not really the concern of queuing or dynamic quota management. It is really the concern of whatever, in your case for example the Jupyter piece, call it the job operator, that creates the service once the job starts, or maybe even creates it before, and you have some sort of readiness check or whatever, so that once the service becomes responsive it means the job was scheduled and is now serving requests.

B: So that's one thing I wanted to clarify. The other thing is related to finishing: who is going to finish the job? The developer, or someone else, right? It's not like saying, okay, I logged out of this Jupyter session, that means the job has ended. And does Jupyter even give you that ability, basically, okay, I logged off, so the job should end?
A: That is something I've been thinking about, and I posted an issue on Kueue about it; I'm not fully clear on the nomenclature yet. But I was wondering...

A: Sorry, I'm not that familiar with how this is handled across the HPC community, but I have seen that typically, when people have interactive jobs, they have time limits on them. I guess these can be handled by the user creating the job with an active deadline, but I didn't know if we should have timeouts as part of a queue.
A: So, you know, I imagine there would be an interactive job queue which has a hard deadline of, say, 30 hours, and that at least forces the jobs to end. Because if you don't have any kind of time limit, you can have somebody using a GPU for an infinite amount of time for whatever they want to do, which is not a great case for sharing your resources among a group.
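The per-job time limit mentioned here maps onto an existing batch/v1 field, activeDeadlineSeconds, on the embedded Job template. A hedged sketch of how a controller or a queue-level default could cap it, assuming the batchv1 import from the earlier sketch; the helper name is made up:

```go
// applyDeadline caps the runtime of the embedded Job by setting
// activeDeadlineSeconds (a standard batch/v1 JobSpec field), for example to
// 30 hours for an "interactive" queue as discussed above. Sketch only.
func applyDeadline(tmpl *batchv1.JobTemplateSpec, maxSeconds int64) {
	if tmpl.Spec.ActiveDeadlineSeconds == nil || *tmpl.Spec.ActiveDeadlineSeconds > maxSeconds {
		tmpl.Spec.ActiveDeadlineSeconds = &maxSeconds
	}
}
```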
B: That's an interesting requirement. We're discussing preemption, and we'll talk about this in a minute; this can be seen as one form of preemption, except there's no new job coming in to preempt you, you're just getting preempted because of a deadline. So that's an interesting requirement, and it's not really specific to interactive jobs; it's a generic requirement around setting up deadlines.
B: One thing that you might want to look at is that with Job you can create an Indexed Job and associate it with a headless Service, and that exposes a deterministic address, an A record, for each index. So if you create a one-pod Indexed Job and create a headless Service for it, you're going to get a stable DNS name for that pod, which is going to be the job name dash zero, zero being the index.
B: So this could be a pattern that you might want to apply. It's not strictly necessary; if you have your own CRD, at the end of the day you can do whatever you want. But this could be another way of doing it without necessarily having a CRD: every time you create the job, you just create a headless Service for that job, and that's it. You do need to enforce that these jobs are created as Indexed Jobs.
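A minimal sketch of that pattern, with made-up names and image: a one-completion Indexed Job plus a headless Service whose name matches the pod template's subdomain, so the single pod gets a stable DNS name of the form notebook-0.notebook.&lt;namespace&gt;.svc:

```go
package example

import (
	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// notebookObjects builds an Indexed Job and a matching headless Service.
// With Indexed completion mode the pod's hostname is <job-name>-<index>,
// and the headless Service plus the matching subdomain makes it resolvable.
func notebookObjects() (*batchv1.Job, *corev1.Service) {
	mode := batchv1.IndexedCompletion
	one := int32(1)
	labels := map[string]string{"app": "notebook"}

	job := &batchv1.Job{
		ObjectMeta: metav1.ObjectMeta{Name: "notebook"},
		Spec: batchv1.JobSpec{
			CompletionMode: &mode,
			Completions:    &one,
			Parallelism:    &one,
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Subdomain:     "notebook", // must match the Service name
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "jupyter",
						Image: "jupyter/base-notebook",
						Ports: []corev1.ContainerPort{{ContainerPort: 8888}},
					}},
				},
			},
		},
	}

	svc := &corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "notebook"},
		Spec: corev1.ServiceSpec{
			ClusterIP: corev1.ClusterIPNone, // headless
			Selector:  labels,
			Ports:     []corev1.ServicePort{{Port: 8888}},
		},
	}
	return job, svc
}
```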
A: Yeah. And I guess to Ricardo's earlier point about logging into a node: I was thinking about that one for a little bit, but I figured it's at least kind of weird to SSH into a container. I think you actually have to add SSH capability to the container, and I didn't want to require that you modify containers in order to do that, but I guess it is possible.
F: Also, maybe you want to look at, well, I just forgot the name... the API that is not a replacement for Ingress but an alternative to it. Gateway, that's the word. You might want to look into that. But back to queuing: I wanted to mention that since you're using the Job API, if you just install Kueue on your cluster, those embedded Jobs will already be queued.
F: If that's not the desire, then that's where we need extra integration. Basically, you would want your CRD to also have a suspend field, so that whenever Kueue decides it is ready, it can flip that flag. That's kind of the mechanism.
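A hedged sketch of that mechanism, extending the hypothetical InteractiveJobSpec from the earlier sketch (same assumed package and imports): the CRD carries its own suspend flag, analogous to the batch/v1 Job's spec.suspend that Kueue flips today, and the controller mirrors it into the Job it creates.

```go
// Illustrative only; not the demo code and not a Kueue API.
type InteractiveJobSpec struct {
	JobTemplate batchv1.JobTemplateSpec `json:"jobTemplate"`
	Port        int32                   `json:"port"`
	// Suspend starts as true; a queuing controller such as Kueue would
	// set it to false once quota is available.
	Suspend *bool `json:"suspend,omitempty"`
}

// syncSuspend mirrors the CRD-level flag into the embedded Job; the built-in
// Job controller holds pods back while spec.suspend is true.
func syncSuspend(spec *InteractiveJobSpec, job *batchv1.Job) {
	job.Spec.Suspend = spec.Suspend
}
```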
A: You're saying that right now you could already create the Service and the Ingress, and then add your workload, the Job, with suspend set to true, and all of that would probably work as is. The only issue is cleaning up the Job and the Service after the job is complete, which is a manual process.
F: Okay, so it's more about whether you care about the Service being created from the beginning; that would be the difference. Although you could make it so that the CRD doesn't create the Service until the Job is marked as suspend equals false, which could be your control flag for the other two objects. So that's a possibility too.
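A sketch of that gating idea under the same assumptions (the InteractiveJob type from the earlier sketches, the k8s.io/api imports, and controller-runtime's client.Object): the Job is always created but stays suspended until admission, and the Service is withheld until the flag is flipped to false; owner references, not shown, would handle cleanup when the CRD is deleted.

```go
// desiredObjects decides which child objects should exist right now.
// job and svc are assumed to be already rendered from the CRD spec.
func desiredObjects(ij *InteractiveJob, job *batchv1.Job, svc *corev1.Service) []client.Object {
	job.Spec.Suspend = ij.Spec.Suspend // queue controller gates pod creation
	objs := []client.Object{job}
	// Create the networking side effect only once the workload is admitted.
	if ij.Spec.Suspend == nil || !*ij.Spec.Suspend {
		objs = append(objs, svc)
	}
	return objs
}
```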
F: That would be a fairly easy integration. So that's all good, and that all works when your job is a fixed size, when you just want one pod throughout the entirety of the interaction. But the question is what happens if, in the notebook, you can create extra workers or resources. I don't know if that's possible, but I'm guessing it is.
A: I guess the idea would be, when I think about normal HPC, that you could say: I want four nodes, four dedicated computers, in order to run an MPI job for debugging purposes within those four nodes. So I think that would be a requirement down the line for this: you would probably want to be able to use more than one. I get a little confused between pods and nodes.
F: Multiple MPI nodes, yes; usually those would be represented as multiple pods, but that's not really a problem for Kueue. If the number of workers is fixed from the beginning, that's not a problem. What is missing is more the dynamic scenario where you start with one pod, and then at some point you require a different number of pods, and then you scale down and scale up and so forth. That kind of interaction is not defined.

F: There are some people trying to build this into Kueue, but there are some things missing, so that's where more work might be needed. But a fixed size, even if it's multiple pods, is already supported.
A: Okay, I will play around more and see if I can just integrate this with Kueue directly, and then I'll think more about the service and Ingress lifecycle. But yeah, thanks everyone for your time.
C: Okay, one more question; it's really more for the community, and it's thinking forward. I think the concept of identifying, holistically, not just the pod-creating resource but other Kubernetes resources is a powerful concept, and I'm wondering, down the road, about being able to take advantage of that concept in Kueue. Do we see this as something we might want to expand on, this kind of requirement?

C: Is that something we might think through, whether it's powerful enough? This is an example of a use case where it would be nice to have: here are all the objects that I need to represent my job or my workload, queue it, and then, once it's ready to get dispatched based on whatever the requirements are, the resource demand, dispatch all the objects within that.
C: I was going to say, I'm just thinking of it as: right now Kueue is queuing the Job resource, and I'm wondering, longer term, as we think about queuing, whether taking this concept of holistically representing not just the one pod-creating Job object but multiple resources is powerful enough to start thinking through what we might do as far as queuing all these objects that represent a job.
B: Right, so Kueue doesn't really queue Jobs. Kueue queues something called a Workload, and it's an abstract concept that represents a collection of resources that you have limited capacity of and want to subject to quota. Whether those represent a job, two jobs, or a job and a deployment is really up to, let's call it, the workload controller.

B: The workload controller just tells Kueue how much resource it wants, and then, once these resources become available, Kueue tells it to start; and when you tell that workload controller to start, it will create the objects as whatever it wants. So in that sense, what you mentioned is kind of already supported; it's again up to the batch workload controller.
B: As I mentioned at the beginning: what is it that you want to control with the service? Is there a resource associated with the service that you want to control and subject to resource quota? There isn't, really. It could be that you don't want to create the service until your job is scheduled, but then you're not really associating it with a quota.
A: I mean, I guess, could ports and services be considered constrained? For Armada, we have a lot of users, and we can't have everybody open up a port and never clean it up; eventually you run out of ports to use if you just let people schedule them whenever.
B: So in Kubernetes, IP management happens at a much different layer, and it is not really a job-level resource; it's related to the pod.

B: That's one thing. And port forwarding, I don't think it's a pattern that the community generally encourages, doing port forwarding on the node; it has security implications and whatnot. You need to set up a proper gateway that forwards things to the backend pod, etc. I don't know if that's something you want to manage as a resource; it's too complicated.
A: I guess my point is that eventually, for a large enough research cluster, ports are something you want to have some control over, because you do want to make sure that once a job is complete, those ports are freed up again.
B: But again, if you set things up properly using Ingress and whatnot, a port is really associated with the virtual IP that you assigned to that service. It's not a limited resource; it gets cleaned up along with the service. It depends on the system.
F: Perfect, thank you. Maybe if I share the entire screen it works; let me try that. Okay, sharing that doesn't work, so... In Kueue we're working on a proposal for workload preemption. There are basically two scenarios where preemption might be needed in Kueue. First, there is the concept of borrowing quota.
F: If you set up quota for a cluster queue, and multiple cluster queues are set up to be in the same cohort, the same group, they can share quota: one cluster queue can borrow quota from another if that other cluster queue is not using it. Now, the problem is how to recover the quota. Currently, the only way is to wait for the workloads to finish, and this is clearly not enough for certain use cases.
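To make the borrowing mechanics concrete, here is a deliberately simplified model of cohorts and quota written from the description above; it is not the actual Kueue ClusterQueue API, just an illustration of the idea:

```go
package sketch

// ClusterQueue is a simplified stand-in for the real Kueue object.
type ClusterQueue struct {
	Name    string
	Cohort  string  // queues in the same cohort may lend spare quota to each other
	Nominal float64 // quota guaranteed to this queue (say, CPUs)
	Used    float64
}

// borrowable returns how much unused quota the rest of the cohort could lend
// to q right now. Recovering it later is the hard part: today the lender has
// to wait for the borrower's workloads to finish, which is what motivates
// the preemption proposal.
func borrowable(q ClusterQueue, cohort []ClusterQueue) float64 {
	var spare float64
	for _, other := range cohort {
		if other.Name != q.Name && other.Cohort == q.Cohort && other.Used < other.Nominal {
			spare += other.Nominal - other.Used
		}
	}
	return spare
}
```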
F: We basically want to be able to end workloads faster. The other use case is within a cluster queue: you could have a high-priority job coming in, and even though you're not borrowing quota from another cluster queue, you still might want to preempt a workload with lower priority. So those are the use cases; I think they should be relatively common concepts across batch schedulers.
F: In the proposal I was trying to be as comprehensive as possible and give users a good amount of knobs to influence preemption. One knob I was trying to give, and this is based on some discussion that was happening in the issue (if you go to issue 83 for Kueue), came from requests to be able to favor the preemption of certain workloads based on how long they have been running.
F: I came up with a maybe slightly too complex API that sets a cost curve based on time: how good or how bad it is to preempt a workload given how long it has been running. So this is my proposal. Most of the feedback I've gotten so far is that it's maybe too much to start with; maybe we can just drop it until there is more justification for it.
F: So if you think this is something you might be able to use, let us know; otherwise we will park it for now. The other knob is maybe a little less intuitive: basically, when your workload needs to preempt some other workloads, you simply say how long it's okay to wait for the other workloads to finish before you start preempting.
F: This could be useful if you have short-running jobs and you're okay with waiting for that period of time. So basically, that's the API. I have another proposal over here, also to give extra control.
F: What is it... workload sorting: how to sort the workloads that are already running to determine which ones are the best ones to preempt. Honestly, I think this is probably overkill at the moment, but I wanted to leave it here so that you more or less understand the current proposal for how workloads should be chosen: first sort by lowest priority, then maybe by lowest cost as proposed earlier, and then by lowest running time.
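A small sketch of that candidate ordering, with illustrative types that are not Kueue's: sort possible victims by lowest priority first, then lowest cost, then shortest running time.

```go
package sketch

import (
	"sort"
	"time"
)

// candidate is an illustrative stand-in for an admitted workload that is
// being considered as a preemption victim.
type candidate struct {
	Priority    int32
	Cost        float64
	RunningTime time.Duration
}

// orderVictims sorts candidates in the order described in the proposal.
func orderVictims(cs []candidate) {
	sort.Slice(cs, func(i, j int) bool {
		a, b := cs[i], cs[j]
		if a.Priority != b.Priority {
			return a.Priority < b.Priority // preempt lowest priority first
		}
		if a.Cost != b.Cost {
			return a.Cost < b.Cost // then lowest preemption cost
		}
		return a.RunningTime < b.RunningTime // then shortest running time
	})
}
```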
F: Yeah, I'll just finish up real quick. That is the API proposal; I'm still working on the actual algorithm, and I hope to have this completed for KubeCon. So if you have a chance to talk with me, we can discuss it, or you can of course already submit feedback on this PR. That's it. Yes, Diana?
C: Oh sorry, I didn't mean to interrupt you; I just had a quick question. I haven't read through this, but do you have the basic concept of expressing whether a workload is preemptible or not? Meaning that if there's borrowing, and you want your job to run and it's not preemptible, or you don't want it preempted, then that allocation of the quota would not be part of the preemptible type of resources that can get reclaimed.

C: In other words, I say: okay, for my workload I don't mind if I need to borrow; I borrow, and I know that I run the risk of getting preempted, and I can set my parameters for that.
F: That's a good point: there is no such switch. The most you can do, the knob, is at the cluster queue level instead. If the cluster queue usually has workloads with this kind of requirement, then you can simply disable borrowing for it, and I think this kind of thing is better left to the administrator. So setting it up in the cluster queue might be a better fit, because if you put it in the workload it might lead to abuse.
C: Of course, that's true, but I also think it's powerful that I know, if I submit this job and it gets accepted while meeting all the policies, that it's not going to be preempted, but that I also have this other job that I don't mind just running in the cluster. I'm the same user, I'm using the same quota, but I have this other job that I don't mind getting preempted, and being able to express that within the one job adds more flexibility.
F: I think you can achieve exactly that by setting up two cluster queues, where the first cluster queue has the real quota and the second cluster queue has zero quota and can only borrow from the first cluster queue and other cluster queues. That could be a way to set it up, giving all the control to the administrator instead of the user. Of course, the user has the chance to use one cluster queue or the other.
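In terms of the simplified model sketched earlier (again, not the real Kueue API), that setup would look roughly like one queue holding the team's guaranteed quota and a zero-quota queue in the same cohort that can only borrow, so anything submitted to the second queue is implicitly the work that may be preempted:

```go
// Continuing the simplified, non-Kueue-API model from the earlier sketch.
var (
	guaranteed = ClusterQueue{Name: "team-a", Cohort: "team-a", Nominal: 100}
	// Zero nominal quota: workloads here run only by borrowing spare quota
	// from the cohort, so they are the natural preemption victims.
	borrowOnly = ClusterQueue{Name: "team-a-borrow-only", Cohort: "team-a", Nominal: 0}
)
```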
F: Yes... I think I had a better thought while you were talking about this, but I forgot it.
E: Yeah, sure, I was just picking up on what we just discussed: there are two queues, one that can borrow and one that cannot, but you have, say, n jobs queued in the cluster queue that cannot borrow, meaning all those jobs should complete first. So how do you prioritize in that scenario?
F: The workload can have a priority, but otherwise it's mostly FIFO, first in, first out. Okay, got it. Yes, there is somebody working on this; I don't know if Kant is here, or Alex.
F: Sorry, I don't remember which one of you was working on a proposal for giving more knobs for sorting within a cluster queue. Because today we only have FIFO: either a strict FIFO, which is FIFO all the time, or a FIFO where, if a workload doesn't fit, it still tries to schedule other, smaller workloads.
F: But there was an idea of expanding these knobs to more sorting mechanisms; that's still ongoing.
B: All right, we're five minutes over time. Thank you, everybody, and I hope to see most of you at KubeCon in two weeks. Just as a reminder, I think we'll be canceling the working group meeting that week; I'll send an email and post a message on the channel as well.