Description
This session explores resource management of AI/ML workloads in Kubernetes-based environments and shows how two-level resource management using the Multi-Cluster App Dispatcher addresses the challenges.

Presenters:
Diana Arroyo, Software Engineer @ IBM Research
Alaa Youssef, Manager, Container Cloud Platform @ IBM Research
A: All right, everybody, I'd like to thank everyone who is joining us today. Welcome to today's CNCF webinar, Kubernetes-Native Two-Level Resource Management for AI/ML Workloads. My name is Danielle; I work for Red Hat as a technical marketing manager, and I'm also a CNCF ambassador. I will be moderating today's webinar. We would like to welcome our presenters today: Diana Arroyo, Software Engineer at IBM Research, and Alaa Youssef, Manager, Container Cloud Platform at IBM Research.
A: A few housekeeping items before we start. As an attendee you are not able to talk, but there is a Q&A box at the bottom of your screen, so please feel free to drop your questions in there and we will get to as many as we can at the end of this official CNCF webinar.
A: This webinar is subject to the CNCF code of conduct, so please do not add anything to the chat or Q&A that would be a violation of the code of conduct, and please be respectful of all of your fellow participants and presenters. Please also note that the recording and slide deck will be posted later today on the CNCF webinar page at www.cncf.io/webinars; you can find them there later today. And with that, I will hand it over to Diana and Alaa to kick off today's presentation.
B: Okay, thank you all for joining today. I'm happy to take you through this presentation together with my colleague Diana: Kubernetes-native two-level resource management for AI and machine learning workloads. Here is our agenda today.
B: We will start with the shortcomings of the Kubernetes scheduler, which we will discuss in the first point. After that, we will show you our proposal to address these shortcomings and the desired capabilities: the Multi-Cluster App Dispatcher, an open source project that we want to share with you. We will tell you a little bit about it and about how it works, show you a short, cool demo, and then close with a call to action, which you can guess. So, why is the Kubernetes scheduler not enough?
B: Actually, we had a recent CNCF blog post with the same title about a couple of months ago, and in that blog post we tried to motivate the need for this second-level resource manager that we are going to talk about today. So, what are some of the characteristics of the AI workloads that we are targeting here?
B: You all know that the use of the Kubernetes platform for running AI and machine learning workloads is on the rise, and these workloads typically have multiple concurrent learners or executors. For example, if you're using Spark, you have Spark executors; if you're doing deep learning, you have deep learning learners. Typically, they need to run concurrently, in a distributed learning fashion.
B
They
have
some
collocation
or
a
thin
affinity
constraints.
B: They may have specific hardware requirements, such as using GPUs, with a specific number of GPUs per learner. We're also seeing nowadays an increase in massively parallel jobs, where a big number of short-running tasks need to be executed, like array jobs. And these jobs are resource-hungry: if you give a job all the resources to run the thousand tasks it's composed of, it is willing to take all those resources and run the thousand tasks in parallel.
B: If you give it fewer resources, many of these jobs have nice elastic features: if you give it only enough resources to run 100 tasks at a time, then it will do 100 tasks at a time until it completes. It will take longer to complete, but the total amount of consumed resources is going to be the same. And why would you want to do that? Well, there are two reasons.
B
First,
I
mean
the
resources
are
not
infinite,
but
second,
you
may
want
to
regulate
the
flow,
so
you
allow
for
parallel
jobs
to
proceed,
especially
with
the
increase
in
the
use
of
interactive
mode
like,
for
example,
data
scientists
sitting
in
front
of
a
jupiter
notebook
and
running
some
of
those
machine
learning
jobs
and
want
to
see
results
interactively.
So
they
are
not
all
batch
jobs
that
that
you
know
can
be
run
later
and
and
then,
when
done
somebody's
waiting
for
the
result.
B: So in this kind of environment you have jobs and tasks, and there are advantages and disadvantages to managing at the job level versus at the task level. Typically, a task maps to a Kubernetes pod, and a job is composed of multiple of those pods, or tasks.
B: Now the question becomes: at which level should you be specifying things like priorities, or classes of service (gold, silver, bronze, for example), or quotas, such as the maximum quota on the memory or GPU resource for a particular task, job, user, or organization? At which level should you set these things? And at which level do you do queuing of these jobs or tasks, allocate resources, and do preemption?
B: These are all questions that don't have one answer, but we are trying to argue here that the right level of management is the job level. You need to do a lot of these functions at the job level, not at the individual task level; obviously, each task will inherit from its job. Think about what happens to your scheduler, the different controllers in your Kubernetes environment, and your etcd.
B: Imagine you have thousands and thousands of pods, belonging to thousands of jobs, that are pending and need to be scheduled.
B
It
basically
filters
out
the
nodes
that
that
don't
fit
that
that
pod
or
or
for
any
other
constraints
shouldn't
be
used
for
that
pod
and
and
then
for
the
remaining
nodes
in
in
the
cluster.
It's
going
to
prioritize
them
and
and
rank
them
according
to
priority
functions
and
get
the
the
top
candidate
to
put
the
pod
on
right,
and
then
it
binds
the
pot.
So
in
this
example,
here
you
have
a
series
of
pods
that
are
arriving
and
and
queued
at
the
scheduler
to
schedule.
B: Now imagine that you have thousands of these jobs, and all those pods are waiting for the scheduler to schedule them. You obviously have limited capacity in your cluster, and the scheduler will keep trying to schedule them and failing, over and over, until once in a while it's able to let in one of the pods. In this example, this is what happened.
B
The
pods
got
scheduled
this
way
and
you
ended
with
two
of
the
three
jobs
not
being
totally
placed
in
the
cluster
and
and
now.
If,
if
these
jobs
require
to
be
the
whole
job
to
be
placed
in
order
for
the
job
to
proceed,
to
do
the
learning
or
whatever
it
needs
to
do,
then,
then
you
have
those
two
jobs:
the
the
purple
and
the
red
here.
In
this
example,
some
of
the
pods
occupying
space
that
is
useless
until
the
remaining
parts
can
be
scheduled
and
you
have
partial
deadlocks.
B
So
these
partial
deadlocks,
obviously,
if
you
have
a
gang
scheduler,
can
help
in
solving
them.
But
just
remember
that
when
you
have
thousands
of
pods,
you
could
have
avoided
that
all
together.
If
you
had
a
second
level
controller
that
is
able
to
look
at
the
cluster
from
a
you
know,
high
granularity
level
and
let
in
to
the
scheduler,
only
jobs
that
are
likely
to
be
placed.
B: It's probably something you want to do at a somewhat longer time scale. And, as I said, some of these jobs are resource-hungry, and no matter how much you scale the cluster, they're going to eat it up. So, practically speaking, at every point in time there is a limit on the available resources, and even if that limit is not permanent and is going to change, you want to use the available resources efficiently.
B
So
you
know
the
the
the
the
common
saying
says:
the
sky
is
the
limit,
but
remember
that
it
doesn't
say
that
the
cloud
is
the
limit
right.
So
the
cloud
has
limits
in
terms
of
available
resources
at
every
point
in
time.
B
In
addition
to
that,
we
are
seeing
the
rise
of
you
know:
different
patterns,
like
multi-cluster
patterns,
where
organizations
own
you
know
tens
of
clusters,
tens
of
kubernetes
clusters
and
these
clusters
are
basically
easier
to
manage
than
than
a
huge
big
big
cluster
and
and
that's
contributing
to
the
rise
of
this
multi-cluster
pattern
that
we
pattern
that
we
are
seeing
and
in
in
in
the
hybrid
cloud
world
where,
by
default,
you
have
you
know
some
on-prem
resources
on
prem
clusters
and
some
public
cloud
and
multiple
public
cloud
clusters.
B
By
definition,
you
have
a
big
number
of
clusters
that
you
need
to
choose
from
where
to
place
this
next
job,
that
I
need
to
run
static
assignment
of
users
or
certain
apps
to
a
particular
cluster
is
feasible,
but
is
not
the
efficient
way
to
use
all
your
available
clusters.
B
So
you
may
have,
in
addition
to
on-prem
and
public
clusters,
you
may
have
edge
clusters
as
well,
which
increases
the
the
number
of
clusters
you
know
tremendously
and
and
and
in
that
case
right,
the
static
allocation
of
a
particular
job
or
user.
To
so
a
certain
cluster
is
not
the
best
way
to
go
bursting
scenarios.
B
People
have,
you
know,
seen
them
and
talked
about
them
for
a
while,
but
you
want
to
be
able
to
do
that
in
a
more
automatic
way
right
where
job
is
submitted
and
then,
if
there
is
no
resources
available
in
my
on-prem
cluster
and
it's
automatically
routed
to
a
public
cluster
that
I
have
a
subscription.
B
So,
in
addition
to
what
I
mentioned,
there
are
some
desired
capabilities
that
we're
looking
for.
So
I
alluded
to
the
need
of
having
a
some
form
of
cueing
at
the
job
level
before
admitting
the
pods
to
the
scheduler
to
to
do
the
binding
to
specific
nodes,
and
if
you
are
able
to
do
that,
that
cueing
and
dispatching
to
a
specific
cluster
before
it
reaches
a
scheduler
in
that
cluster.
B
To
do
the
fine,
grained
placement
of
the
pods
and
binding
them
to
specific
nodes,
then
you
have,
you
know,
achieved
a
lot,
but
in
addition
also,
this
is
a
good
point
of
control
where
you
can
specify
priorities
and
classes
of
service
and
have
multiple
queues
for
different
priorities
or
different
classes
of
service,
where
the
jobs
can
wait
to
be
dispatched
to
the
appropriate
cluster
based
on
an
available
capacity.
B
Enforcement
of
of
quotas
like
hierarchical
quota
management,
for
example.
If
you
want
to
say
this
is
the
quota
in
terms
of
resources
that
this
particular
organization
or
department
or
user
is
allowed
to
use
in
my
system,
then
you
can
do
you
can
do
that
at
this
at
this
control
point
in
in
an
easy
global
way,
and
you
can
think
about
all
sorts
of
you
know.
Soft
and
hard
quotas
and
and
of
course,
quotas
that
span
multiple
resource
dimensions.
B
Also,
you
may
be
able
to
do
things
like
you
know:
preempting
low
priority
jobs
in
order
to
admit
higher
priority
jobs
or
jobs
that
borrowed
right
from
the
soft
quota
of
of
another
organization,
but
now
that
organization
has
its
its
jobs
coming
in
and
they
have
the
right
to
go
in
and
and
preempt
those
jobs
that
were
borrowing
above
their
quota.
B
So
you
can
do
these
all
these
controls
at
that
second
level
resource
manager,
because
it
basically
has
a
unified
view
of
the
jobs
belonging
to
those
different,
ai
and
and
machine
learning
frameworks.
B
So
what
is
our
solution
to
all
the
shortcomings
that
I
mentioned
and
the
desired
capabilities
that
I
also
talked
about?
Is
the
mcat
or
multi-cluster
app
dispatcher,
which
is
an
open
source
project
that
we
are
going
to
tell
you
all
about
it
and
and
how
it
works.
So
I'm
going
to
hand
it
over
to
my
colleague
diana
and
she
will
take
it
from
here.
Tell
you
a
little
bit
about
how
it
works
and
then
show
you
a
a
a
cool
demo,
so
diana
you
want
to
share.
Or
do
you
want
me
to
flip
the
charts.
C: No, go ahead; if you can release the sharing, I'll take it.
C: Yes, okay, great. Thank you, Alaa. As Alaa mentioned, we've developed a controller that tries to address some of these capabilities, and we have an initial framework. I'll talk to you a bit about what it does now and then about some things we're looking at adding in the roadmap. Essentially, as mentioned, we would like to have a way to dispatch either within one cluster or to multiple clusters, plus an ability for queuing.
C
In
this
example,
this
high-level
picture
here
we're
just
kind
of
showing
how,
in
a
multi-cluster
environment,
you
would
submit
a
job
and
just
what
we
call
a
dispatching
cluster
and
I'll,
give
some
more
details
about
that
where
you
essentially
submit
all
the
components
of
a
job
and
there's
a
the
mk
controller,
as
you
mentioned
before,
would
do
the
initial
evaluation,
whether
that
job
is
runnable
and
then
determining
whether
it's
runnable
or
not,
dispatch
it
to
other
clusters
to
actually
be
realized
and
for
the
binding
to
happen
within
the
cluster.
C: So let me dig a little bit deeper into how this actually works.
C: As I mentioned, there's the MCAD controller, the Multi-Cluster App Dispatcher, and with it we operate on a custom resource definition called AppWrapper. An AppWrapper is essentially any and all of the Kubernetes resources that you create, particularly the compute-consuming ones, for instance Deployments, Pods, and StatefulSets: any of the resources that you define as part of your complete job.
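As a concrete illustration, here is a minimal sketch of what such a wrapper can look like. The exact API group and field names vary across MCAD releases, so treat the `mcad.ibm.com/v1beta1` version, the `GenericItems` list, the `generictemplate` field, and the image name below as assumptions; the project's own examples are the authoritative reference.

```yaml
# Hypothetical minimal AppWrapper: the whole job (here a single
# Deployment) is wrapped as one unit that MCAD queues and dispatches
# holistically. Field names are illustrative, not authoritative.
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: my-training-job
  namespace: default
spec:
  resources:
    GenericItems:
    - replicas: 1
      generictemplate:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: learner
        spec:
          replicas: 3
          selector:
            matchLabels: {app: learner}
          template:
            metadata:
              labels: {app: learner}
            spec:
              containers:
              - name: learner
                image: my-registry/learner:latest   # illustrative image
                resources:
                  requests: {cpu: "1", memory: 2Gi}
```

The point of the wrapping is that MCAD sees the Deployment (and anything else listed alongside it) as one admission unit, rather than letting its pods trickle into the scheduler individually.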
C
What
it
does
that
operates
on
the
app
wrapper
it
takes,
the
app
wrapper
actually
investigates
it,
inspects
it
and
determines
whether
that
job,
all
the
resources
for
that
job
can
be
runnable
and
it
does
it
in
a
holistic
manner,
meaning
it
looks
at
all
the
items
that
you've
listed
in
this
wrapper
and
evaluates
whether
it
can
be
run
or
not.
C
Obviously,
if
they're,
not
runnable,
we
put
it
in
the
queue
to
and
re-evaluate
over
time
to
make
sure
that
we
can
dispatch
it
once
resources
become
available,
and
then
another
thing
is
all
I
mentioned
earlier
was:
is
that
we
support
preemption
and
re-queuing,
and
with
that
aspect
we
actually
allow
app
wrappers
to
have
dispatching
priorities.
You
can
define
those
and
I'll
show
how
that
actually
works
in
the
yaml,
as
well
as
a
demo
that
that
we
have.
C
We
have
two
configurations
that
we
run.
Actually
we
started
out
with
the
standalone
version,
meaning
you
know.
This
is
just
running
within
one
kubernetes
cluster
and
we
found
that
worked
really
well
and
it
was
a
nice
extension
to
actually
move
it
to
multiple
clusters.
C
Okay,
so
a
little
bit
more
deeper
into
what's
happening
behind
the
scenes,
so
this
big
box
right
here
is
the
the
mcat
controller
and
what
it
does
is
that
it
it
runs
with
inside
the
cluster.
C
It
actually
determines
this
available
capacity
of
the
cluster,
so
it
finds
out
what's
actually
running
what
how
much
resources
are
available
from
by
using
the
normal
kubernetes
model
and
it
tracks
what's
available
over
time,
and
in
addition
to
that,
as
you
bring
in
app
wrappers
into
the
system,
you
submit
them,
the
app
wrappers
will
go
into
a
queue
again.
It's
all
wrapped
in
the
app
wrapper
wraps
all
the
resources
that
you're
trying
to
realize
into
that
cluster,
and
so
we
put
them
into
a
queue
and
this
queue
can
be
fifo.
C
If
you
don't
set
priorities,
that's
how
it
behaves.
If
you
do
set
priorities
on
these
app
wrappers,
then
the
queue
will
be.
The
queue
will
recognize
that
and
recognize
the
put
the
ones
of
the
higher
priorities
at
the
beginning,
and
then
the
goal
here
for
this
mk
controller
is
to
evaluate
taking
the
available
capacity
looking
at
what's
in
the
queue
pulling
jobs
off
the
queue
determining
their
run,
ability,
as
I
mentioned
before,
and
if
they're
runnable
dispatch
them
and
actually
create
the
objects
inside
kubernetes.
C
Here
I'm
showing
just
the
modifications
that
we
have
made
so
that
we
can
support
multiple
clusters
in
this
picture.
Here
you
see
these
big
blue
boxes
here
and
they're
actually
separate
complete
clusters.
So
this
is
a
cl.
These
two
clusters
on
the
right
are
what
we
call
excuse
me
agent
clusters
and
when
you
dispatch
mcad
into
that
cluster,
you
dispatch
it
in
the
configuration
of
an
agent,
and
then
we
have
a
another
cluster
that
acts
as
the
dispatcher
cluster.
C
I
mean
this
is
where
jobs
get
submitted
right
and
what
happens
is
that
the
agent
custer's
agent
controller
collects
the
available
state
or
the
available
capacity
for
the
individual
cluster
and
makes
it
available
to
the
dispatcher
cluster.
The
dispatcher
cluster
or
the
dispatcher
controller
here
keeps
track
of
the
available
resources
on
the
different
cluster.
C
Here's
an
example
of
I
thought
it
would
be
useful
to
have
like
a
use
case,
so
you
can
kind
of
see
how
this
works.
Let
me
see
if
I
can
minimize
this
get
this
out
of
the
way,
so
this
is
kind
of
covering
it
up.
So
I
apologize,
but
this
is
an
app
wrapper
here,
but
before
we
actually
create
the
app
wrapper,
as
I
mentioned
before,
you
bring
these
mcad
controllers
up
on
the
various
system.
C
So
this
is
an
example.
Here
I
have
we
had.
We
worked
with
a
team
who
had
a
bunch
of
resources
that
they
were
trying
to
create
that
represented
one
job,
and
this
is
actually
really
a
subset
of
them
that
I
put
in
here,
but
in
their
job
that
they
submitted.
They
had
multiple
services.
They
had
a
name
space,
they
were
creating
a
network
policy
at
pvc.
C
They
had
a
couple
of
deployments
that
were
creating
one
of
them
just
had
one
pod
and
another
deployment
had
multiple
pods,
so
what
they
did
is
they
were
creating
those
objects
already
and
they
took
our
app
wrapper
and
they
just
wrapped
all
of
the
objects
and
that
they
were
already
creating
inside
the
app
wrapper
and
I'll
show
how
those
are
how
those
actually
look
inside
of
our
crd
and
they
submitted
into
the
dispatcher
cluster,
the
dispatcher
cluster
mcat,
the
impact
controller
on
the
dispatcher
poster
picks
it
up
inspects
all
the
objects
inside
of
it
determines
if
it's
runnable
and
where
it's
runnable
and
then
dispatches
them
to
the
appropriate
cluster.
C: So that's what we have as of today. I wanted to give you some insight into the things we're working on now as part of our roadmap. This is the current work that's happening now: here is the box that you just saw in the previous chart, with the MCAD controller, the queue, and the available capacity, where the object gets submitted.
C
What
we
have
here-
and
this
is
just
an
example
here-
is
that
you
may
want
to
define
quota
not
just
like
at
a
namespace
level,
which
is
already
available
in
kubernetes
now,
but
maybe
some
abstract
administrative
definition.
C
You
also
may
want
to
set
quotas
on
projects,
so
you
could
have
multiple
projects
that
you
would
set
quotas
on
those,
and
we
also
want
to
support
soft
and
hard
constraints
on
these
quotas,
so
various
aspects
in
enhancing
the
whole
quota
management
capability,
not
just
like
in
a
news,
namespace
level,
but
even
more
abstract.
C
So
this
is
the
work
we're
doing
now.
Just
wanted
to
give
you
some
insight
about
how
we
would
be
able
to
take
advantage
of
some
quota
management
evaluation
where,
yes,
we
would
pull
it
into
the
controller
to
determine
available
resources,
but
we
would
also
evaluate
whether
that
job
is
runnable
based
on
quota
management,
in
the
sense
that
you
know
we
may
have
enough
resources,
but
there
may
be
hard
limits
that
you
want
to
restrict
on
some
of
these
jobs,
so
we
would
cue
it
up
and
then
dispatch
it
once
there's
enough
quota,
that's
available.
C: So that's some insight on what we have and on how we're moving forward with enabling more and more capabilities in the evaluation of these higher-level jobs.
C: This first arrow right here shows that you would express the job you're going to submit as an AppWrapper. I'm going to skip these two for a moment and go to the last one. As I mentioned, you put all your objects inside of your AppWrapper: you wrap them under the items stanza, where you list them. If you saw on the other chart, there were seven or eight different Kubernetes objects that were created.
C: They are all listed under items, so you just have a full list of things that you add there. Then, as I mentioned before, you can set priorities on these jobs as well. This is a dispatching priority: it determines whether this job can dispatch, giving it higher priority over other jobs. And then, finally, some insight on where we might be able to express quota information.
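As a sketch of the dispatching priority just described: a hedged example, assuming a numeric `priority` field at the top of the AppWrapper `spec`, as in the project's published samples (the exact field name may differ by release).

```yaml
# Illustrative: a higher number means this AppWrapper is considered
# for dispatch ahead of lower-priority AppWrappers in the queue.
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: high-priority-job
spec:
  priority: 1000      # dispatching priority, evaluated before admission
  resources:
    GenericItems: []  # the job's wrapped Kubernetes objects go here
```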
C: A job could be assigned a specific quota name, and we would evaluate it based on the quota tree that gets built. This is, again, as I mentioned, part of the work that we're working on now.
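Since this quota-tree work was described as still in progress, the following is purely a hypothetical sketch of how a job might reference a node in such a quota tree; none of these field names are confirmed by the source.

```yaml
# Hypothetical only: associates the AppWrapper with a named node in an
# administrator-defined quota tree, outside the namespace quota model.
apiVersion: mcad.ibm.com/v1beta1
kind: AppWrapper
metadata:
  name: quota-managed-job
  labels:
    quota_context: team-a   # hypothetical quota-tree node name
spec:
  resources:
    GenericItems: []
```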
C: Okay, I thought it would also be helpful to get some insight on how this works by showing a demo. Let me jump over to the demo I have here. I recorded it, so I'll talk through it. It's a very simple demo, but I wanted to show you how this works: I'll show you examples of queuing and then of preemption with priority.
C: The dispatcher controller is on a separate cluster, and the green and the blue boxes are also separate clusters. I've made them very, very tiny, because I wanted to be able to fill up a cluster and show you the queuing: each one of these clusters only has one node, and every node, I think, has about eight CPUs available.
C
They
only
get
to
play
created
on
the
agent
clusters,
so
the
first
job
I'm
going
to
submit
is
job.
One
I'll
show
you
the
contents
here.
I
kind
of
showed
you
some
some
of
that
in
the
charts
that
I'm
submitting
in
app
wrapper,
and
then
I
list
the
items
here.
I'm
only.
I
only
have
one
item
in
here.
C
It's
a
staple
set
and
again
you
would
put
as
many
items
as
you
need
to
represent
your
job
and
then,
of
course,
since
we're
evaluating
whether
it's
runnable
or
not,
here's
the
cpu
that
we're
actually
allocating
and
in
this
example,
we're
having
three
replicas.
So
it's
a
three
replicas
with
that
cpu
and
memory
limit.
C
So
if
you
create
one
of
those
jobs,
it's
going
to
fill
up,
I
about
a
little
bit
more
than
half,
so
we
submitted
it
and
it's
actually
dispatched
to
the
blue
cluster
and
it's
actually
running
and
then
the
next
job
that
we'll
submit
here
is
job
number
two,
and
it's
going
to
we're
going
to
submit
this
one,
and
what
happens
here
also
is
that
when
we
submit-
and
the
policy
that
we
have
running
here
is
just
a
random
policy,
we
select,
we
just
randomly
pick
a
cluster,
and
if
it
fails
that
cluster,
it
will
take
it'll
go
into
back
off
mode
and
try
again
in
about
20
seconds,
and
so
this
is
what
you're
seeing
here
is
that
it
tried
the
first
cluster.
C
If
you
just
submit
these
staple
sets
without
the
app
wrapper
there
won't
be
any
pending
pods
on
any
of
these
clusters.
It'll
just
be
the
app
wrapper,
that's
pending,
and
so
what
that
means
is
the
scheduler
is
not
trying
to
do
any
work
where
it
can't
fit
all
of
the
pods.
C: Next we're going to create job three. It's the same size, but since it has a higher priority, we evaluate what's actually already running and determine that everything else is lower priority. So we preempt a lower-priority job, actually put it back in the queue, and allow the higher-priority job to be dispatched.
C
And
this
is
you
want
to
use
this
kind
of
environment.
When
you
have
jobs
that
you
know
takes
time
stamps
it
takes
takes
tracks
the
the
movement
of
the
job,
where
you
can
restart
it
very
easily,
so
that
job
got
preempted.
I
think
it
was
job
number
two
got
preempted
and
then,
of
course,
when
resources
become
free
available
again,
which
I'll
show
here
next
I'll
free
up
job
number
three
and
then
job
number
two
will
get
redispatched
and
again.
C
And
I
think
I'm
going
to
stop
that
now,
because
we're
running
close
to
being
done
here.
I
want
to
give
time
for
questions
as
you
can
see
that
it's
we
got
re-dispatched
so
I'll.
Stop
this
recording
and
I
think
that's
it.
The
last
chart
I
think
I
had
was
really
just
a
call
to
action
where
you
know
feel
free
to
give
this
a
try
out.
That'd
be
great
again.
Here's
the
links
and
the
name
on
the
operator
hub,
give
your
feedback,
we'd,
love
it
and
even
be
even
great.
A: Awesome, thanks Diana and Alaa; that was a really great presentation and an awesome demo. We now have some time for questions. If you have any question you would like to ask, please drop it in the Q&A tab at the bottom of your screen and we will get to as many as we can in the time we have.
A: The first question is about how the priority of a job is set.
C
Yeah,
so
the
priority
of
the
job
is
set
at
the
as
part
of
the
app
wrap
respect.
So
all
you
need
to
do
is
essentially
just
add
that
stanza
the
priority
stands
inside
the
app
wrapper.
Let
me
see
here
I
can
show
you
again
the.
C
Example,
yamo.
So
when
you
submit
your
app
wrapper,
if
you
want
to
take
advantage
of
the
dispatching
priorities,
you
would
just
add
this
stanza
and
set
a
priority
that
you
want
that
job
to
get
assigned
to.
So
this
is
all
configurable
at
the
submission
level
right.
So
whenever
you
submit
a
job,
you
set
the
priority
that
you
would
like
to
have
for
the
job
to
have.
C: One of the things we're also doing, to address all kinds of complexities, is starvation handling. We initially started out with this user-defined dispatching priority, but as a cluster administrator you may want to handle starvation: even when a job has low priority, you still want to get a few such jobs in. So we're currently developing a system priority that will take the user-defined priority into account, but over time those priorities will change. We have that as part of our MCAD roadmap as well.
A
Cool
and
another
question
just
came
up
from
our
grenade
box.
What
is
the
role
of
the
aiml
in
a
resource
management.
B: I can take that one. In terms of what we've shown today, we are doing resource management for AI and ML workloads, so it's the opposite of what the question is about. But of course, there is also the use of AI and machine learning in the resource management function itself.
B: The benefit there is obvious, and we have other, orthogonal work in that direction: for example, using AI to decide on the scaling of elastic applications (vertical or horizontal scaling), and using reinforcement learning to work around and avoid failures. But those are different, orthogonal pieces of work; the MCAD controller is not really using AI to control resource management for the jobs.
A: Okay, I think all the questions were addressed today. Any other question?
A
All
right
thanks
diana
and
aura,
for
the
great
presentation
and
acquaintance
for
reputation
once
again
and
all
right,
so
so,
and
everybody
thanks
for
joining
us
today
again
and
the
webinar
recording
and
slice
that
will
be
online
later
today,
and
we
are
looking
forward
to
seeing
you
at
the
future
cnc
webinar,
as
well
as
kubecon
and
clown
navy
called
the
next
month
in
november.
Please
register
and
we
will
meet
soon
again
and
have
a
good
rest
of
day.
Thank
you.