From YouTube: Centaurus Monthly TSC Meeting 11/30/2021
A
Okay, so we had two agenda items. One agenda item was that Prashant was going to provide his overview of distributed cloud, you know, from a use-case and business perspective, and where this whole edge computing and distributed cloud direction is going. But it looks like he's not here.
A
So the second agenda item: there's been a lot of work done as part of the Alnair AI project, so we thought maybe it's a good time to checkpoint and we invited Zabo to present where we are and what's been done. I think she presented about two or three months ago too, actually, so it will be useful for the TSC.
C
Oh okay, yeah, I'm still updating the slides, but I can start and we can just discuss. Okay, let me see what's going on. Okay.
C
Yeah, just a little recap before people forget what Alnair is. It is an elastic, intelligent AI platform within Centaurus. We're currently based on Kubernetes and focused on improving the efficiency of running training jobs. For AI jobs there are mainly two types:
C
The
training
and
the
other
industry
serving
workload
is
under
planning
as
well,
and
we
have
three
or
four
main
components:
the
the
fourth
one
I
haven't
typed
in
yet
a
little
update
from
a
previous
update,
I
think,
is
late
drawn
the
its
meeting.
We
add
a
couple
of
modules
in
the
main
framework.
One
is
one:
is
the
unified
training
framework
before
we
when
we
prototype,
we
only
support
the
elastic
horowitz,
which
is
one
type
of
distributed,
training
jobs.
But
now
we
redesign
the
training
framework.
C
It's a unified framework that can support both elastic and non-elastic jobs. So all the jobs are managed with one simple YAML file, and you just use a framework key to indicate the type of training job you want to run. Under the hood we have different branches to handle the different frameworks, and with that we have a global view of all the jobs.
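As a rough illustration of the "one YAML file with a framework key" idea described above: the API group, kind, and field names below are invented for this sketch and are not Alnair's actual schema.

```yaml
# Hypothetical unified training-job spec; the "framework" key selects
# which branch (e.g. elastic Horovod vs. a non-elastic job) handles it.
apiVersion: ai.centaurus.io/v1alpha1   # made-up group/version
kind: TrainingJob
metadata:
  name: mnist-demo
spec:
  framework: horovod-elastic   # or e.g. pytorch, tensorflow
  minReplicas: 2               # lower bound for elastic scaling
  maxReplicas: 8               # upper bound for elastic scaling
  template:
    spec:
      containers:
        - name: trainer
          image: example.com/mnist-train:latest
          resources:
            limits:
              nvidia.com/gpu: 1
```

Because every job type goes through the same spec, a single controller can keep the global view of all jobs that the paragraph above mentions.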
C
That makes it easier for us to do the dynamic size control, like scaling up or scaling down. Another component we added since the previous update is the scheduler, which was in the original architecture but hadn't been designed in the first phase; in the second phase we implemented the scheduler.
One main thing for the AI workload is that it requires a group of pods working together, especially in distributed training. So to avoid resource starvation, like some of the pods launching while the rest stay pending, we need this kind of co-scheduling plugin that launches all the pods or nothing. This was already done on the Kubernetes community side; we just trimmed down the code and integrated it into our Alnair project.
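The all-or-nothing behavior described here matches the coscheduling plugin in the upstream kubernetes-sigs/scheduler-plugins project. A minimal sketch, assuming that project's PodGroup CRD; the exact API version, label key, and scheduler name vary between releases, so treat these as placeholders:

```yaml
# PodGroup declaring that 4 worker pods must be schedulable together.
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: PodGroup
metadata:
  name: dist-train
spec:
  minMember: 4        # launch all 4 pods or none (avoids partial starts)
---
# Each worker pod joins the group via a label.
apiVersion: v1
kind: Pod
metadata:
  name: worker-0
  labels:
    pod-group.scheduling.sigs.k8s.io: dist-train   # label key may differ by release
spec:
  schedulerName: scheduler-plugins-scheduler       # name depends on the install
  containers:
    - name: trainer
      image: example.com/train:latest
```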
Another special thing about this scheduler is that it is data-driven. We have a component I haven't quite finished, which I said is the fourth one:
C
A profiler component that can profile the resource usage live, in real time. So when we calculate the score during scheduling, we know for every node how much of each GPU has been used. So if some GPU's utilization is low, we can pack more jobs onto that one GPU card. Yeah, those are the two main things in the scheduler. Another main achievement is the GPU sharing. Beforehand we evaluated MPS, which is provided by NVIDIA to share a GPU by packing many processes together.
C
But
there
are
interference
between
the
process,
can
cause
unknown
crash
and
another
way
is
to
use
the
the
cuda
interceptor
way,
which
means
when
the
application
calls
the
cuda
api
driver.
We
can
intercept
within
the
user
container
and
then
to
monitor
how
much
memory
and
the
computer
resources
used
by
that
container
and
if
it
used
more
than
is
requested,
we
can
reject
that
request.
C
It's relatively easy to intercept the CUDA memory-allocation API calls, but the kernel launch, which is the side related to compute, is a little more difficult. We are still working on that.
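The accounting-and-reject logic the interceptor enforces can be sketched generically. This is a toy Python stand-in, not the real component: where Alnair wraps CUDA memory-allocation calls inside the container, this sketch wraps a plain allocation function, and all names here are invented for illustration.

```python
class QuotaExceeded(Exception):
    """Raised when a container tries to allocate past its declared request."""


class MemoryInterceptor:
    """Toy stand-in for the CUDA memory interceptor described above.

    Tracks bytes "allocated" by a container and rejects any request that
    would push usage past the container's declared GPU-memory limit.
    """

    def __init__(self, limit_bytes):
        self.limit = limit_bytes
        self.used = 0

    def alloc(self, nbytes):
        # In the real interceptor this check wraps the CUDA allocation call.
        if self.used + nbytes > self.limit:
            raise QuotaExceeded(
                f"request of {nbytes} B exceeds limit ({self.used}/{self.limit} used)"
            )
        self.used += nbytes
        return object()  # placeholder for a device pointer

    def free(self, nbytes):
        self.used = max(0, self.used - nbytes)


# Example: a container that requested 1 GiB of GPU memory.
gib = 1 << 30
interceptor = MemoryInterceptor(limit_bytes=gib)
interceptor.alloc(512 << 20)           # fine: 512 MiB of 1 GiB
try:
    interceptor.alloc(768 << 20)       # would exceed 1 GiB: rejected
    rejected = False
except QuotaExceeded:
    rejected = True
```

Intercepting kernel launches is harder, as the transcript notes, because compute usage is not a single countable quantity the way allocated bytes are.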
So that's the progress for these three components. And the fourth one, the profiler:
C
We haven't added too many additional features to the profiler; we just added another database. We use a NoSQL database to store all the profiling data, so that later on, when we try to study and learn from the history, we can pull the historical data easily from the database. That's the progress in the profiler. So yeah, we'll still try to complete the whole architecture within the scope, but another thing we are looking at is the testing framework.
C
So
by
next
release
we
like
to
see
the
owner
can
run
daily
and
the
benchmark
can
give
us
some
performance
results
so
and
that's
an
overall,
updated
effort.
Anybody
has
any
questions
for
the
detail.
We
can
dive
in.
A
Zabo, you may want to talk about the collaboration work, the open-source thing we're doing with UW via the Symphony project as well, the GPU utilization. Oh.
A
Yeah, so just to give you folks some background: we're actually doing a lot of interesting work with the UW, University of Washington, folks. These folks have developed a framework called Symphony, an AI inferencing framework. It was a project called Nexus before, and now they've added a lot of new functionality to make it even better.
A
So Nexus was the previous project, Symphony is the new version of it, and there's also another project called Clockwork, from folks in Germany, I think, trying to address the same issue. But this is a pretty hard problem, because there's been a lot of emphasis on optimizing
A
the training, the model-training side, but inferencing is a big problem as well. You see, as the models get bigger and bigger, how do you have a subset of models running across a cluster of multiple GPUs, and then dynamically dispatch the incoming traffic based on the QoS, you know, the patterns and the deadlines and all that? That's what this project is all about.
A
Currently it's not really open; it is sort of open source, but we're trying to work with them, and there's some approval that needs to happen. It's going to be Apache 2.0, and the goal is to eventually integrate it as part of our projects, under the overall Centaurus umbrella. Basically, this will be our inferencing capability within the Centaurus project. So that's pretty interesting.
A
We will do another detailed overview as we go along, but that's where we are. So essentially the bottom line is that we will have the project open-sourced as Apache 2.0 and incorporate it as part of the Centaurus project.
A
Yes, yes, everything is... yeah, Nexus is working, but they moved away from Nexus to the Symphony project, which is the evolution of it, much, much better: the GPU utilization project called Symphony. I think you were aware of that.
D
Okay, just curious: in the paper they mentioned that in Nexus they are actually doing some very deep-level optimization. For example, they were trying to combine... first, they break a model down into different layers and then combine the execution of...
C
Yeah, I haven't read the details of why they need to break the model to achieve better results; I'm not quite sure about that part in Nexus. But for Symphony, they claim it's mainly a centralized load balancer, so it's basically about how to dispatch.
D
I mean, say we executed inference on the same model for about 100 requests, all using the same model to do the inference; we'd batch the requests together onto the same node. But they are trying to do even finer-grained batching or parallelism: they break the model down into different layers, different...
D
Yeah, those kinds of subsets, and then combine only the same layer from different models. But because what a user submits is just a single program for a model, like Python code, if you want to break it into different layers, logically it's doable, but from a coding perspective, especially from a cloud provider's perspective, it's a little difficult to do that.
A
...send your incoming request to the appropriate GPU, depending on the QoS and the deadlines and all that; that's the key capability of Symphony. And then the other thing was, there were a couple of things in the Nexus project; I think the emphasis of Symphony was actually what I just described, but there were a couple of very good features in Nexus that they haven't ported yet, so they need to.
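As a toy illustration of the "dispatch based on QoS and deadlines" idea just described: this is my sketch, not Symphony's actual algorithm, and every name here is invented. It shows an earliest-deadline-first queue that pops the most urgent batch of requests for one GPU to run.

```python
import heapq


class EDFDispatcher:
    """Toy earliest-deadline-first dispatcher for inference requests.

    Requests carry an absolute deadline; dispatch() pops the most urgent
    batch of up to `max_batch` requests for a single GPU to execute.
    """

    def __init__(self, max_batch=8):
        self.max_batch = max_batch
        self._heap = []  # entries: (deadline, seq, request_id)
        self._seq = 0    # tie-breaker so the heap never compares request ids

    def submit(self, request_id, deadline):
        heapq.heappush(self._heap, (deadline, self._seq, request_id))
        self._seq += 1

    def dispatch(self):
        batch = []
        while self._heap and len(batch) < self.max_batch:
            _, _, rid = heapq.heappop(self._heap)
            batch.append(rid)
        return batch


d = EDFDispatcher(max_batch=2)
d.submit("a", deadline=30)
d.submit("b", deadline=10)
d.submit("c", deadline=20)
first = d.dispatch()   # the two most urgent requests: ["b", "c"]
second = d.dispatch()  # the remaining request: ["a"]
```

A real system would also fold in batching efficiency and model placement, which is what makes the problem hard, as noted above.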
A
I think there was a concept of what they called the delta-model thing: if you have different variations or customizations of a model, they do the optimization at that level, so as opposed to treating each model differently, they just work at the delta level. So that feature... there's some work there. That's why we proposed this collaboration: once we understand what the Symphony project is, then we can add to it, and Zabo's team is already doing a lot of GPU optimization work.
A
The scheduling they use is time-slice-based scheduling, so maybe we can add more value there. So that's the purpose of our collaboration, basically, and then this will be part of open source. Yeah, their students will work with us, Professor Arvind's students, so that's the goal. This is not going to happen now; maybe around February or March. Right now we're trying to understand the code base.
A
We're getting ourselves familiar with the old Nexus code and the new Symphony code, and then we'll see what value we can add. There's a lot of work that needs to be done, so right now it's not in shape to be released publicly as Apache 2.0 and all that. That's why we're doing this work with them.
C
And the training side is not all done; we are still building the unified training job.
C
An intern is writing the code; I think he will do the final check-in this Wednesday. And for the scheduler, since we added this GPU manager plugin thing, the scheduler has to handle the fractional GPUs, which is a little different from a regular scheduling job. So yeah, there.
C
Lastly, here we just give out the whole card, like a distribution. And for the sharing: before, when we talked about it, we wanted to share the GPU during training, but after some practice and testing we think it may not be that necessary, because these days the models are big, so when people use distributed training they already ask for like 4, 8, or 16 cards. So sometimes, you know, maybe it's in inference that there would be a sharing scenario, a serving scenario.
D
For the inference scenario: in fact, several years ago, the GPU sharing project that we did, which landed in production, was also designed for inference. Only after that, an HQ guy told us there are a lot of small models in training, so it's possible for training too, but today I don't know if that's still that valuable.
A
Okay, good. So I think that's pretty much it, actually. So, Stefan, anybody, do you guys have anything to share? We still have a lot of time, but other than that, I think we're pretty much done.
A
Yeah, okay, so I think we can end this meeting then. In that case, most likely, because next month is going to be a holiday season, I mean this December, our next meeting will most probably be around a January timeframe. Yeah.
D
Exactly, yeah. It looks like we don't have many topics recently, so we will skip next month's meeting then, yeah.
A
We will push it to January. And I think there's a lot of work being done on the edge cloud side; a lot of that work we're going to be collaborating on with Stefan's team as well, so we'll give an overview as we go along, actually: the whole programming model on the edge cloud and all that. So yeah. Okay.
D
We are working on a paper, and we are using the IEEE LaTeX template, with some code snippets. Maybe I can use a couple of minutes here to see what template you guys are using. The embedded code there doesn't look that nice, so we're seeing what the right template is that we should use.
D
Are you guys able to see the screen? Yeah? So you can see, this is the IEEE LaTeX template that we got, and for the embedded code, the listing part, the font looks very weird. So just a quick check: what's the usual setup you guys are using for code listings?
E
Okay, that's a good question. I usually use listings... no, not listings; I usually use an adjusted setup: I changed to a different font, basically, and then formatted it. I can send you some examples if you like. I can't remember the exact configuration off the top of my head, but I can send you some examples from papers and how I configured those. Okay.
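For reference, a common baseline for code listings in IEEE-style LaTeX papers is the listings package with a small fixed-width font. This is a generic sketch, not necessarily the configuration discussed here:

```latex
% Preamble: a minimal listings setup for code in a two-column IEEE paper.
\usepackage{listings}
\usepackage{xcolor}
\lstset{
  basicstyle=\ttfamily\footnotesize, % fixed-width, small enough for columns
  keywordstyle=\bfseries,
  commentstyle=\color{gray},
  numbers=left, numberstyle=\tiny,
  breaklines=true,                   % wrap long lines in narrow columns
  frame=single
}
% In the body:
% \begin{lstlisting}[language=Python, caption={Example snippet}]
% def f(x):
%     return x + 1
% \end{lstlisting}
```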
A
Okay, great. Thanks, everybody, and have great, happy holidays and a happy new year to everybody. Yeah.