From YouTube: Kubernetes WG Batch Weekly Meeting for 20221222
A
Yeah, hello everyone. Today is December 22nd; this is the Batch Working Group biweekly meeting. We have one item on the agenda for today. Before we go through that, I'd like to remind you that this meeting is recorded and will be uploaded to YouTube, so please make sure that you adhere to the Kubernetes code of conduct. So, yeah, as I was mentioning, there is one item on the agenda that I guess Aldo and Kante would like to discuss, which is the Kueue roadmap.
A
We have a PR, I think, discussing that, but I think Aldo prepared a short presentation. So I'm not sure, Aldo and Kante, did you have an agreement on how you're going to proceed with today's meeting?
B
Okay, can you hear me? Yeah, I'll show a little roadmap for Kueue in the next year, and then maybe, yeah, you know what I mean.
B
Yeah, I can see it. Okay.
B
We are coming to the end of 2022, so we have a rough plan about what we want to do in the next year. There are two parts to the roadmap: one is about the features we think we can finish in the next year, but that depends on the time and the priorities.
B
Hopefully we can get some feedback, like suggestions and feature demands, just to make sure that we are on the right track. Okay, so during the sharing, if anyone has any questions, speak up and we can talk about them in detail. Okay, I'll go through the list one by one.
A
I think Aldo put together a presentation; maybe it's easier to go through that, yeah.
C
Okay, so I already put a link in the meeting notes. Thank you, Kante, for bringing up the topic. I wanted to give an overview of what we have been doing in the last release.
C
So let me just start. We are planning a v0.3 release, and the estimate is mid-January, and we have two ongoing items. One is preemption. If you're familiar with the scheduler, there is preemption in kube-scheduler too, but that preemption is based on pods: when there is need for space, we preempt individual pods. In Kueue we are trying to do the same, but at the job level.
C
So this is atomic preemption, let's say, and there are two modes of preemption in Kueue according to our design: one is within your own resources, and one is among the resources that you share with the rest of the teams, which we call a cohort. If you open these slides from the link that I shared, you can open the issues, but for now please let's just stay here. Yeah, if you're interested, follow the links.
C
So that's one feature. The other feature we've been working on is a short-term form of all-or-nothing, which basically is just one-after-another scheduling: we only schedule one job, we wait for its pods to be running, and then we proceed with the next job.
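The one-after-another behavior described here can be sketched as a small simulation (job names and the readiness check are illustrative, not Kueue's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    pod_count: int
    running_pods: int = 0  # updated as (simulated) pods come up

def schedule_sequentially(queue, all_pods_running):
    """Admit jobs strictly one at a time: start a job, wait until every
    one of its pods is running, only then move on to the next job."""
    admitted_order = []
    for job in queue:
        # This job is admitted alone; in a real cluster its pods start now.
        while not all_pods_running(job):
            job.running_pods += 1  # simulate one more pod reaching Running
        admitted_order.append(job.name)
    return admitted_order

jobs = [Job("training-a", pod_count=3), Job("training-b", pod_count=2)]
result = schedule_sequentially(jobs, lambda j: j.running_pods >= j.pod_count)
```

Because every job blocks the whole queue until all of its pods run, this is simple but slow, which is why the feature is configurable.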
C
But this is a short-term solution, and it is configurable: you can disable it if it's not something that works for you, because it can be very slow. So that's the ongoing work. Then, thanks to our contributors in the community, we've completed two more important tasks: we now have a performance test that is easily reproducible, and we have end-to-end tests, thanks, I think, to Kevin Hannon.
C
That's just for you to have an idea of what we're focusing our efforts on. But of course, if you have your own ideas, please feel free to bring them up. So, can you go to the next slide, please?
C
So, yes, in some of these plans we have some developers allocated, but there will likely be minor tasks, or bigger tasks, where we would need help.
C
But let's first go through the priorities we already have some allocation for. So, cooperative preemption: this is a second step, or an improvement to the preemption we have, where we want to include information about jobs. The jobs can include information about when was the last time they did a checkpoint, and this is useful information for prioritizing what can be preempted: likely we want to preempt jobs that more recently did a checkpoint.
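The checkpoint-based prioritization described here could be sketched like this (the function, job names, and timestamps are hypothetical illustrations, not Kueue's design):

```python
def pick_preemption_victims(candidates, needed):
    """candidates: (job_name, last_checkpoint_time, quota_used) tuples.
    Preempt the most recently checkpointed jobs first, since they would
    lose the least work, until enough quota has been freed."""
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    victims, freed = [], 0
    for name, _checkpoint_time, quota in ordered:
        if freed >= needed:
            break
        victims.append(name)
        freed += quota
    return victims

candidates = [
    ("job-a", 100, 4),  # checkpointed long ago: would lose the most work
    ("job-b", 900, 4),  # checkpointed very recently
    ("job-c", 500, 4),
]
victims = pick_preemption_victims(candidates, needed=8)
```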
C
So
that's
that's
that
task
graduate
apis
to
Beta.
This
is
important
for
production
Readiness.
Once
we
we
move
into
a
beta
API,
we
we
will
promise
longer
longer
support
for
it
and
backwards
compatibility.
C
So that's an important step towards production readiness. Then, as I mentioned, we have a performance test, and we want to start paying attention to it and incorporate learnings from there to improve performance. There are easy tasks here, for example increasing the number of workers in several of the small controllers, which will give us better performance, and introducing parallelization, I mean CPU parallelization, and so on and so forth.
C
Those are easy tasks that can be done to improve performance, and more granular work can be done with profiling, which we want to add as well. And another thing we want to work on is some form of meta job. As you can see, the name is not established, that's kind of the idea, but the intention here is to have an aggregate job in which you can define multiple pod specs, for example to support launcher-worker paradigms.
C
We want the meta job to create multiple jobs, and then the meta job will be handled as a single unit by Kueue. This is the kind of thing that could possibly make it all the way down to Kubernetes core, but I think it's fair to start in a project which has more velocity, such as Kueue. And that would be the idea: we start this in Kueue, we experiment, we iterate, and then possibly, let's say in 2024 or later, we propose it to upstream. And then there are a few things we want to encourage, both in upstream and in Kueue.
C
In upstream, I don't know if Wei is here, but we worked on scheduling gates, scheduling readiness gates, which is a very useful building block to block scheduling. This is particularly important for Spark applications, because in Spark you could create a launcher pod, and then the launcher pod would create a lot of pods without asking for permission; it would just create the pods.
C
We want some control over that, and the solution we proposed, along with Wei, is to add this scheduling gates feature in kube-scheduler. So this is an important building block that will be very helpful for Kueue. And yes, it's currently in alpha; we want to encourage the beta release in 1.27 and the integration of Kueue with it. And this last item is where we would like help from the community.
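The gate semantics described here can be sketched as follows (the gate name and the helper are hypothetical; only the `schedulingGates` field name mirrors the Pod spec):

```python
def is_schedulable(pod_spec):
    # A pod is held back while its schedulingGates list is non-empty.
    return len(pod_spec.get("schedulingGates", [])) == 0

GATE = "example.com/queue-admission"  # hypothetical gate name

pod = {"schedulingGates": [{"name": GATE}]}
blocked_before = not is_schedulable(pod)

# The admitting controller removes its gate once quota is granted,
# and only then does the scheduler consider the pod.
pod["schedulingGates"] = [g for g in pod["schedulingGates"] if g["name"] != GATE]
schedulable_after = is_schedulable(pod)
```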
C
We want to start integrating Kueue with multiple existing frameworks, in particular MPIJob, TensorFlow job, Spark, as already mentioned, and Ray; Ray has some similarities with Spark in terms of how it works. So, yes, these are the frameworks that, as far as we know, are top of mind for multiple customers from multiple communities.
C
But of course, if you have your own, please bring it up. One important thing to note is that, to integrate with Kueue, we need a few hooks: we need a hook to basically suspend pod creation from the jobs. So that's kind of the first step; in each of these APIs we need to add this configuration, this field, and then Kueue can integrate with it. So that's kind of what we want to achieve in H1.
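The suspend hook described here can be sketched as a toy admission loop (the field name mirrors batch/v1 Job's `spec.suspend`; everything else is illustrative):

```python
def try_admit(job, remaining_quota):
    """Unsuspend the job if its request fits the remaining quota;
    return the quota left over after this decision."""
    if job["suspend"] and job["request"] <= remaining_quota:
        job["suspend"] = False  # the job controller may now create pods
        return remaining_quota - job["request"]
    return remaining_quota  # stays suspended: no pods are created

queue = [
    {"name": "big", "request": 8, "suspend": True},
    {"name": "small", "request": 2, "suspend": True},
]
quota = 9
for job in queue:
    quota = try_admit(job, quota)
```

A job created with the field set keeps all of its pods from being created until the queue flips it, which is exactly the hook each framework API needs to expose.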
C
But we have a few ideas, for example bulk provisioning. This is something we are already thinking of designing, but we don't have a complete proposal yet. The idea is that cluster autoscaler can provide an API where we can request a bulk of nodes, or a bulk of resources, and then the cluster autoscaler can respond.
C
So that's bulk provisioning. And then another idea is to work on budgets. As you might know, Kueue is basically a quota system, which means that at every point in time you cannot surpass a given amount of resources; budgets are more about what you consume over a period of time. So that's something we want to achieve. Next slide, please.
C
These ideas that I presented are coming from our view of things, but if we are missing something, we would like to know. If something else is top priority for you and you have the resources to work on it, you're more than welcome to come, discuss, and present your designs. But we already have some smaller or bigger features that we know are important but don't have enough resources to work on, and they are basically up for grabs. So one interesting task, or even design, is partial admission of jobs.
C
So you can say things like: my job ideally requires 10 workers, but I'm okay having only five. So depending on how much quota is available, or, once we have budgets, how much budget is available, you can start at a given size.
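The partial-admission idea can be sketched as a small function (the parameter names are made up for illustration):

```python
def partial_admission(desired, minimum, available):
    """Return the worker count to admit, or None if even `minimum` won't fit."""
    if available >= desired:
        return desired       # full admission
    if available >= minimum:
        return available     # shrink the job to what the quota allows
    return None              # keep the job queued

full = partial_admission(desired=10, minimum=5, available=12)
partial = partial_admission(desired=10, minimum=5, available=7)
blocked = partial_admission(desired=10, minimum=5, available=3)
```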
C
So that's one. pprof: this would be very useful for improving performance. All of the Kubernetes core components have a pprof endpoint, so maybe some ideas can be grabbed from there. And then there are some non-coding tasks, such as setting up the documentation.
C
We've been hearing from multiple of you, in the chats in Slack or from some user experience research we've been doing, that the documentation is not the best, so we would definitely welcome help there. And another point is a kubectl plugin: kubectl has some limits in terms of how useful it can be for CRDs; it's hard to surface some complex information, such as states. So a kubectl plugin would be helpful.
C
Another problem we have in Kueue is that most of the information about a job is not contained in the job; it's contained in a separate object, which is called the Workload. If you want to know why your job is not admitted yet, why it is not running yet, you have to look at a different object. So having that available in a single listing, through a kubectl plugin, would be super useful for ease of use. There is also a web-based dashboard, and Grafana samples.
C
We had a contributor share some Grafana samples with us. I'm not sure if you can have Grafana templates, if there's such a thing, but those would be useful, or a web-based dashboard to see all the information about the jobs.
C
So these are all up for grabs. Again, there are two things here. This is, again, our vision of things.
C
Kante is suggesting that we actually publish the roadmap in the repository, for all people to have a permanent view of it, and that's owned by the community, right? So if you have ideas: we will try to put most of what we have in these slides into the pull request.
C
But if there are some priorities for you, please join the discussion in the pull request. And at this point I'm opening up for questions, or handing back to Kante, if you have more things to add.
B
Totally; the presentation covered most of the features we plan for the roadmap. There's one point about multi-cluster support: some users from the community asked for multi-cluster in the past. We think it is far off for us, because we still have plenty of features to improve in the meantime, so I think, yes, it's a long-term feature.
C
Yes, I think, given all the features we already have for 2023, it might not be possible to start working on multi-cluster in 2023, but it's definitely something on our minds. And I don't know if we have folks from G-Research here, but we are in conversations about how to synchronize the efforts, because that's the area G-Research's Armada focuses on.
E
I was curious about Kubeflow. I've been noticing that there's a lot of progress going into trying to combine all those operators into a single training operator. Are we focused on integrating one by one with each of these operators? I guess I'm not going to be the one doing the Kubeflow integration, but I'm curious, because when I read the threads and stuff, it seems like there's a lot going on in the Kubeflow community around consolidation of those operators.
C
Yes, I think, given how the architecture of Kubeflow is today, with the Kubeflow training operator, it's relatively easy to do the integration for all of them at the same time, except for MPIJob, which is a separate project for now.
A
So I have some thoughts on this. I've been looking at all these APIs, MPIJob, TFJob, PyTorchJob, and my hope is that we don't really need to integrate with any of them directly, and we can mostly replicate that using the multi-job API. Long term, I think we can come up with a solution, basically a single API that would enable you to deploy all of these training workloads. But I don't want to discourage people from integrating if they wish to do so; that path is possible. I'm just commenting in general about that common training operator that Kubeflow has.
A
So there are multiple APIs, but the implementation is basically replicated; there are really minor differences between them. It's mostly just making it convenient to set some environment variables that these different libraries use. And so, yeah, my hope is that we will have that multi-job.
C
Okay. Everything, in general, is a question of timelines, because some people might want support for TFJob in 2023, and once meta job lands, it's probably going to be at least 2024 before it's possible to migrate to that API.
C
So we will probably need a midterm solution, which is direct integration.
C
But again, it's up to the community: right now, what's the highest priority for the community?
E
It seems to me, in general, the problem that I'm dealing with also is all these different CRDs. You have Spark, Ray, and I know I've looked at Dask and a few others, and their pattern is very similar, but there's also a lot of, I guess, business logic built into some of those operators, sorry, some of those CRDs. So I do like the idea of having kind of a meta job or multi-job API, but also, I think, for the short term...
D
I was just going to go back to the inquiry about Armada and just point out that Kevin and I are here if someone has questions about Armada. Now you have an audience, or if you want to wait till the presentation next time, that's fine too.
D
It's probably a little too much to get into. I didn't catch who was asking about multi-cluster, but feel free to reach out and check out armadaproject.io, because that's definitely the space we're playing in; we've got a lot going on there.
E
So, one question about the scheduling gates: it says encourage in the 2023 plan. It is already in alpha for 1.26, so what more items are you interested in for that one?
C
So, first, graduation to beta, because that means enabled by default, and the integration: I mean, the API is there, but Kueue is not using it yet. And I think the primary case, or the first case, which will need this feature is Spark.
C
So probably the Spark integration would be the first user of it.
A
So we will have this documented in the repo, I guess. Please feel free to check our open issues as well, suggest any enhancements, and communicate your priorities to us too. Yeah, it should be an exciting H1.