Description
Meeting notes and Agenda:
https://docs.google.com/document/d/1ALxPqeHbEc0QOIzJ3rWWPpwRMRlYDzCv0mu2mR4odR8/edit#
A: Okay, yeah, sorry for the delay. With that, welcome to today's session. As we discussed in SIG Node, we are working on a new KEP for a resource plugin manager. We already opened a fork for the KEP, currently under my GitHub. Mario, if you can put the link in the chat, I'd appreciate it. So we have some initial structure already inside.
A: Yeah, so currently our observation about the kubelet state is that we have several managers inside the kubelet engaged in actual resource management activities.
A: In this illustration you see the topology manager, memory manager, CPU manager, and device manager, and we know that since 1.26 we will get a fifth one, which will be dynamic resource allocation. What becomes obvious in that illustration is that if you want to extend the CPU manager with some new vendor-specific logic, it's quite a difficult situation, and we as a vendor are interested in proposing several kinds of extensions, and most other vendors would like to do that too.
A
So
that's
why
we
are
thinking
some
sort
of
plugable
concept
would
be
very
good.
We
see
plugable
Concepts
in
cubeletal
already
in
the
situation
with
Device
Manager.
Also,
the
array,
Dynamic
resource
or
location,
introduces
the
concept
of
drivers
which
can
be
plugged
from
from
other
vendors
other
providers
of
such
implementations.
So
basically
what
we
were
thinking.
Why
not
have
a
central
point
for
all
this?
A: Basically, the idea is, if it's possible, to have just one resource manager which is responsible for handling the different pluggable concepts. It can be device plugins; it can also be dynamic resource allocation drivers.
A: So our goal is to refactor the current state of the kubelet so that we basically have a single resource manager which can handle all these pluggable concepts inside, minimizing the complexity inside the kubelet by having just a single entry point for that work, and also allowing extension points for vendors: not only for devices, but also extension points for other resources which don't fit in the device field. And as for the commonality with previous approaches:
A: There are some common points. The protocol can basically leverage a lot of the mechanisms already available in the kubelet. It can function similarly to the device manager or the dynamic resource allocation manager, where you have a classical registration step. So basically there is a socket available on the file system, and resource plugins, similarly to device plugins, are daemon sets which connect to that socket, and they get registered.
A: After the registration, the lifetime loop of the kubelet kicks off, and we basically build a protocol which more or less invokes certain events on the daemon sets. They can be admission events, allocation events, or, if we have to delete something, the removal of containers or pods.
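The registration-then-lifecycle flow described here can be sketched in Go. This is a minimal illustration, not the KEP's actual API: in the real design these calls would be gRPC methods over the kubelet's unix socket, and the interface and method names below are assumptions made for the sketch.

```go
package main

import "fmt"

// ResourcePlugin models the lifecycle calls the kubelet would invoke on a
// registered plugin (admission, allocation, removal). Names are illustrative.
type ResourcePlugin interface {
	Admit(podUID string) error
	Allocate(podUID, container string) error
	Remove(podUID string) error
}

// registry tracks plugins after their registration step.
type registry struct {
	plugins map[string]ResourcePlugin
}

func newRegistry() *registry {
	return &registry{plugins: map[string]ResourcePlugin{}}
}

// Register is what a daemon-set plugin would trigger by dialing the
// kubelet's registration socket.
func (r *registry) Register(name string, p ResourcePlugin) {
	r.plugins[name] = p
}

// AdmitPod fans the admission event out to every registered plugin; the
// first failure rejects the pod.
func (r *registry) AdmitPod(podUID string) error {
	for name, p := range r.plugins {
		if err := p.Admit(podUID); err != nil {
			return fmt.Errorf("plugin %s rejected pod %s: %w", name, podUID, err)
		}
	}
	return nil
}

// fakePlugin is a stand-in plugin used to illustrate the flow.
type fakePlugin struct{ admitted []string }

func (f *fakePlugin) Admit(podUID string) error {
	f.admitted = append(f.admitted, podUID)
	return nil
}
func (f *fakePlugin) Allocate(podUID, container string) error { return nil }
func (f *fakePlugin) Remove(podUID string) error              { return nil }
```

The allocation and removal events would fan out the same way as admission in this sketch.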
A: So what we could do is build, again, a single point of contact: a resource manager which can host those plugins, and the existing pluggable managers today, like the device manager (this illustration does not show dynamic resource allocation), become children of this central resource management component. Basically, if we have to handle device plugins, the calls will be forwarded to the device management component, and dynamic resource allocation calls can likewise be forwarded to the dynamic resource allocation manager.
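The forwarding idea, one resource manager routing calls to the existing managers as children, might look roughly like this. The type and method names are invented for this sketch; they are not the actual kubelet types.

```go
package main

import "fmt"

// childManager stands in for an existing manager nested under the single
// resource manager, e.g. the device manager or the DRA manager.
type childManager interface {
	HandleAllocate(claim string) (string, error)
}

type deviceManager struct{}

func (deviceManager) HandleAllocate(claim string) (string, error) {
	return "device:" + claim, nil
}

type draManager struct{}

func (draManager) HandleAllocate(claim string) (string, error) {
	return "dra:" + claim, nil
}

// resourceManager is the single entry point; it only routes calls to the
// appropriate child rather than reimplementing their logic.
type resourceManager struct {
	children map[string]childManager // keyed by plugin kind
}

func (m *resourceManager) Allocate(kind, claim string) (string, error) {
	child, ok := m.children[kind]
	if !ok {
		return "", fmt.Errorf("no child manager for kind %q", kind)
	}
	return child.HandleAllocate(claim)
}
```

In the first step described above, only the routing lives in the resource manager; the children keep their current behavior.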
A: One good approach is to offer a default plugin which implements the CPU manager, memory manager, and topology manager by reusing the existing code base. Those components will actually still reside in public as source code, but the managers will be instantiated in the plugin, so we don't need to duplicate code inside a separate plugin.
C: A question: who maintains state information across these plugins? Today, if I'm not mistaken, we have some type of checkpointing that tracks decisions that were made and things that are running. Are you delegating that to these plugins, or are you assuming that the kubelet continues to maintain the checkpoint?
A: The resource manager plugins, for example this kind of legacy plugin, will still maintain their own state. The CPU manager will track which CPUs are allocated and whether there are CPUs available. So there will be some sort of state handling inside the plugins.
C: But on the former picture, I'm just trying to... it seems like the first picture you showed is your ideal picture, so I don't know if you're trying to get to that state, or if you're saying the next picture you showed, which looks much different, is perfectly fine too.
A
Two
to
the
two
pictures:
the
contrast
is
that
we
have
this
device
manager
and
you
will
get
the
the
basically
at
the
current
points
inside
cubelet,
there
are
two
managers
which
have
plug-in
Concepts.
A
This
is
the
device
manager
and
there
is
the
dynamic
resource
of
location
and
if
you
read
through
the
link,
I
hope
Mario
provided
it.
What
we
are
thinking
is
some
sort
of
abstraction
or
extension
to
Dynamic,
resource
or
location
that
it
can
handle
those
cases
yeah.
It
would
be
really
good
if
we
can
basically
have
a
central
abstraction
which
can
handle
Dynamic
resource
on
location,
can
handle
device
management
and
can
handle
standard
resource
plugins.
A: This is more or less the vision we are after, and the difference here is that we start with a small step: instead of implementing the functionality for device management completely inside this resource manager, we just delegate to the current existing device manager and make it a child of this resource manager, nesting it inside, and we do the same for dynamic resource allocation.
A: It's not all on this illustration, but basically in the first step the big difference is that we want to reuse those components. So the two paths, for the device plugins and for the dynamic resource allocation drivers, will be handled by just forwarding the calls, passing the needed lifecycle events to those two children managers. In the ideal state, this gets merged inside the resource manager.
D: Yeah, it's related to this specific point. We're in the middle of iterating on a PR right now where at least the DRA manager is going to be moving outside of the container manager and directly inside the top-level kubelet, at least for the time being. That's where it's going to be moving, because all the other managers right now are called during the pod admission loop, and we have a somewhat artificial restriction in the way DRA can do certain things right now, because it's sitting inside the container manager. To do away with those, we're temporarily pulling it up to the kubelet rather than having it nested inside the container manager, because we need to be able to handle transient failures.
D: If we happen to call out to the various device drivers that are connecting to it, right now there's no way to handle transient failures in the pod admission loop: if that call out to the plugin fails, you get a pod admission error. So we could pull it back into the container manager, and subsequently the resource manager as you're describing, but as a prerequisite to that we would have to add some mechanism to handle transient failures in the pod admission loop.
D: I mean, in the device manager case, for example, if the device manager calls out to one of the device plugins and that call fails, because maybe that device plugin has died or is unavailable for whatever reason, at present that will just appear as a pod admission failure, which is weird, but it is the way that it manifests at the moment. And with DRA, the possibility for these calls to fail is much higher, and so when we had it in this loop, just like we do with all the other managers...
D: Sorry, my headphones just came out. So when we had it in this loop, alongside all of the other managers, we saw it failing quite often with these pod admission errors, and so we had to change the place it's called from so it's not in that loop. We'd prefer for it to be in the pod admission loop; it's just that we need to make the pod admission loop more robust to these types of failures if they can happen more frequently.
A: In general, the point is very valid. If we have things relying on plugins, having some sort of robust mechanism for handling failures of plugins is really required, and I agree on that.
C: Just to make sure I have an understanding here: are you proposing that the kubelet is aware of one resource manager plugin in this picture, and that in-tree we have some default implementation of that which maps existing function? Or are you implying that the kubelet would be aware of many resource manager plugins? Is it one-to-n or one-to-one?
C: You know, the kubelet only advertises the resource when the plugin has registered with the kubelet, and it can only schedule the resource when the plugin is registered, and there's like a heartbeating type of thing. Is that still good for things like CPU and memory?
C: I'm assuming the kubelet shouldn't depend on the presence of that plugin to be able to schedule CPU and memory, correct? But is there a chicken-and-egg situation here that y'all have thought through? Today, at least as far as I know, I configure the kubelet, it's one binary, and from the moment the kubelet starts it can do what these resource managers do today. I'm assuming there are some latency-type issues that happen here. Like, are these plugins deployed as daemon sets themselves, which then need to be launched or managed by the kubelet but can't have plugin decisions made on them? What are your thoughts on that space generally?
C: Just asking: today, traditionally the kubelet wouldn't be able to schedule a GPU, let's say, until a GPU plugin had registered, right? And then the kubelet advertised the GPU, and then the pod says: I want a GPU, and the kubelet says: okay, I can do this. What I'm wondering is: absent a plugin, should the kubelet schedule any pod that requests CPU or memory? And then, secondary to that, is the expectation that these plugins are themselves managed and deployed as daemon sets? I'm just curious if there's a... yeah.
C
Might
it
doesn't
seem
right
to
me,
what
do
you
mean
I
mean
I,
would
assume
I
get
I,
get
what
I
ask
for,
but
maybe
the
macro
question
are
you
imagining?
The
cubelet?
Is
the
manager
of
these
plugins?
Are
these
plugins?
Yes,
pods.
A: Yeah, basically the kubelet becomes the manager of these plugins, and you might have a small core inside the resource manager which can handle the case while you don't have a plugin in the system, so that you just get a best-effort container or pod, basically. Another option for how to deal with that is similar to CoreDNS: basically, you start a pod through kubeadm facilities with some default plugin.
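The small built-in core described here, admitting only best-effort pods until a plugin registers, can be sketched as a tiny decision function. This is purely an illustration of the idea as discussed; the names and the exact policy are assumptions, not a kubelet API.

```go
package main

// admitResult models the decision for a pod when the resource manager may
// or may not yet have a plugin registered.
type admitResult struct {
	Admitted bool
	QOSClass string
}

// admitWithFallback sketches the fallback core: with a plugin registered,
// admission is delegated; without one, only pods with no specific resource
// requests (best effort) are admitted by the built-in core.
func admitWithFallback(pluginRegistered, requestsResources bool) admitResult {
	if pluginRegistered {
		return admitResult{Admitted: true, QOSClass: "plugin-managed"}
	}
	if requestsResources {
		// No plugin can satisfy the request yet, so reject.
		return admitResult{Admitted: false}
	}
	return admitResult{Admitted: true, QOSClass: "best-effort"}
}
```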
A: But in any case, I think we will need some small core piece which at least supports the worst case, the best-effort class of resources.
C: All right, so on that one I'm not sure how I feel about it. I was just mostly trying to figure out if I had the expectation right here, which is: in the same way that many deployers in the world manage their CNI as a daemon set, and it has some elevated priority class, and deployment of the CNI implies some readiness state about that node, you don't have anything right now around deployment of these managers that could...
A: That's also one option. We did not prototype such a solution, but if we identified it as needed, we can take a look at how to implement that.
C
One
of
one
of
the
use
cases
I
know
as
a
vendor
is
I
have
to
be
able
to
classify
components
that
are
quote
unquote.
Management
on
the
Node
versus,
say,
workload
and
things
that
are
needed
to
support,
like
the
the
running
of
that
node
may
need
to
get
moved
to
particular
CPUs
in
the
particular
set
right
and
stuff
classically.
For
me,
as
a
vendor
would
mean
that,
like
I
want
my
cni
on
cpu01
I
want
DNS
plugin
on
cpu01
I
want
my
metrics
collector.
C
You
know
carved
away
and
I
want
to
use
the
other
40
odd
CPUs
to
run
my
5G
workload
or
that
type
of
thing.
So
what
I
was
trying
to
think
through
is
in
this
world.
C
How
do
I
get
to
a
spot
where
I
could
deploy
in
that
fashion,
like
at
some
level,
the
qubit
would
want
to
make
the
plug-in
that
the
sides
how
these
things
should
run
also
be
able
to
be
supporting
itself
being
boxed
away
right
like
and
maybe
I
would
want
to
make
sure
that
this
plugin
is
running
before
the
other.
Like
things
are
deployed
and
then
I
want
to
make
sure
I.
Actually,
don't
know
what
to
do
if
this
Damon
is
down
around
running
the
other
workloads
like.
C
Is
there
a
dependency
concept
here
because,
like
in
my
head,
when
I
think
about
like,
like
a
5G
workload
and
Swati
and
Francesco-
probably
wear
this
now
like
to
me?
It's
like
I've,
I've,
boxed
away
my
work,
my
management
components
into
a
very
small
budget,
and
then
my
workload
components
get
the
rest
of
the
node
and
they
want
the
best
alignments
and
stuff
that
they
can
get.
I
appreciate
that
y'all
are
also
interested
in
exploring
other
unique
capabilities
around
your
chipsets
and
stuff,
but
that
basic
use
case
I
still
think
is
probably
Universal.
C: So I was just trying to think through how we would get to that, particularly if these things are pods themselves. I don't have the answer in my head; I was just kind of vocalizing the things I think about when I look at this. Swati, Francesco, do you see similar issues or have thoughts?
F: Yeah, I think Francesco and I were discussing this earlier; the chicken-and-egg problem is definitely evident in this case. We'd have to figure out how we bootstrap the system up front. Maybe the way CNI is doing it is something that we adopt, but I think it would have to be explicitly stated.
A
Right,
I
I
think
you
will
need
some
sort
of
bootstrapping
component
agree,
most
probably
ideally
in
something
through
Cube
admin
or
something
which
which
deploys
basically
at
least
the
default
plugin
right
so
and
and
I
I
like
the
idea
for
the
radio
state
it
it
at
least
gives
a
clear
picture.
If
the
note
is
is
usable
or
not
so
I
I,
don't
know
what
do
you
think
the
other
people
think
about
that?
But
it's
not
not
bad
idea.
Yeah.
B: Francesco here; there is another case, related to the direction I mentioned, which is dear to me, and I think it's important for this forum: it's also about how we manage the bootstrap, because if the plugin which manages CPU goes into a crash loop, what next? I mean, is the node lost? Can the node recover somehow? I just want to make sure that the failure scenario and the restart scenario are mentioned.
A: Then I think you might be winning something if you think back to the kubelet today. If something happens with the kubelet and the container manager, or let's say the CPU manager or topology manager crashes, you most probably have to reboot the whole thing. So you might be winning some benefits from pod liveness tests and stuff like that; I hope those can improve this situation a little bit compared to what we have today.
E: One thing to keep in mind while we look at this picture is that I'm looking at this similarly to the way we look at the Linux kernel and drivers. So basically, what we're talking about here is a CPU driver; what we're potentially talking about is a power driver, a memory driver, or a topology manager piece which is kind of external to those drivers.
C: Yeah, so I just think that for Francesco and Swati the bootstrapping concern is probably very real; it would need to be able to box literally the first pods that are running on these nodes. I'm not sure, in practice...
C: Maybe Francesco didn't get a chance to answer about what we have observed with individual managers within a kubelet going unhealthy, and the impact of that versus just, you know, restarting that kubelet. But I do imagine there's a sea of complexity that we would all uncover here around error scenarios. Maybe, Francesco or Swati, you can speak to what you have seen, if anything, there.
B: I just want to mention that it's totally true that a bug in the CPU manager, for example, may cause the kubelet to crash; there is no discussion about that. The thing is, this remains true, and moving the complexity, I mean moving the CPU management and the memory management, for example, into a pluggable component, makes the system complexity a bit bigger. We gain something, but we lose something.
B
So
yes,
a
bargain
memory
management
say
you
can
crash
the
cubelet
and
the
note
goes
in
a
ready
state,
but
at
the
same
time,
with
the
moving
that
on
a
plugin,
if
the
plugin
crashes,
then
we
have
more
I
think
we
have
a
wider
vulnerability
surface
I'm,
not
saying
it's
bad
I'm,
just
saying
it's
different
and
I'm
I
think
this
area
should
be
explored
and
and
I'm
done.
E: Francesco, one thing to note: I don't think this is increasing complexity, because the kubelet is so complex as it is, and as we're looking at more and more things going into the kubelet to address specific needs, this is actually simplifying our process going forward and simplifying the kubelet internally, moving that complexity into the regular plugins. So if someone wants a very specific plugin that handles CPU resources in a particular way, and it's very specific to their case, the community no longer has to support it; you just have it in the plugin directory.
C: I agree there's a difference between, like, community complexity and deployment complexity. I'm hard-pressed not to see more boxes and think more complexity and more error cases. So I agree there's community relief here; I agree there's probably a differentiation that can be offered to users and vendors that would be appetizing.
C
I
struggled
I
think
that
this
will
be
as
easy
to
deploy
as
present
State,
just
because
there's
clearly
more
boxes
and
more
more
error
cases
to
think
through
so
but
I
I
agree
Marlo
that
there's
definitely
a
potentially
a
community
relief
valve
here,
but
I'm,
not
certain
I'm
commenced.
The
deployment
complexity
is,
is
easier
and
so
I
guess
what
I
look
at
this
here
is
one
I
agree
that
we
can't
drop
existing
function.
There's
people
in
the
world
depending
on
this.
C
So
if,
if
we
move
existing
function,
that
the
community
has
taken
on
into
one
of
these
plugins
like
that,
that
plug-in
needs
to
be
as
rock
solid
as
the
existing
function
today
is
right,
and
so
from
that
spirit,
I,
don't
know
if
that
plug-in
needs
to
run
through
a
socket
or
can
just
be
launched
inside
the
existing
cubelet
today,
so
that
people
who
don't
need
the
additional
capability
don't
take
on
the
additional
complexity,
so
I
think
there's
things
we
can
think
about
there
in
the
same
way
that
we
ran
the
the
CRI
shim
for
Docker
inside
the
cable.
C: I was just trying to keep in mind that we can't make the lives of people who are successful today more complex; we have to be cognizant of the complexity we're asking them to take on with respect to their deployment postures. There are probably things we could do, just to vocalize it here, like being able to run a plugin in-process as well, and not regress anybody, but have a new contract for those who want to deploy externally.
C: So yeah, I agree, Kevin. It could be one of those things we keep in the back of our mind.
G: To change topics: I want to reiterate on this checkpointing; I didn't get the answer on where checkpointing will be happening. And also, similar to checkpointing: configuration. How do you think it will live in the ideal picture? Is it something that we configure for the kubelet? Is it something that we configure for a specific resource plugin, and how will such configuration be versioned?
G: Who controls what is configured? Is it the system admin that controls everything and the kubelet has no say in that, or is it the opposite?
A: Let me start with the state, or the checkpointing. If you think about checkpointing, I recall, for example, the CPU manager checkpointing which cores are allocated in this kind of CPU manager state file. This can still happen: the plugin which implements the standard CPU manager, the default one, can still do the checkpointing and then store the file as before, basically under /var/lib/kubelet, so that when something happens and you need to start from the checkpoint, this should be able to work as before.
A
Right,
at
least
from
from
my
observation.
So
far,
I
I
saw
this
guy
kind
of
checkpointing
there,
maybe
there's
some
something
more
in
device,
plugins
and
and
the
array,
but.
G: And what about configuration? Are the managers configured from the kubelet, or is there a separate configuration for every manager? What is the ideal picture, like who is in control?
A
The
control
of
demon
sets,
so
it's
usually
you
you
can
have
most
probably
also
some
cluster
light
privileges
like
what
what
plugins
can
you
install
and
stuff?
There
could
be
some
configuration
from
administrator,
what
what
is
allowed
and
what
not
to
add
as
a
plugin
but
yeah,
adding
a
plugin
is
similar
to
to
how
you
install
a
device
plugin
today.
So.
G: I mean the configuration of the plugin itself. What do you have today as the equivalent configuration? I believe on one of the previous slides there was a situation where the kubelet config would be propagated down to the plugins, and I'm not sure if that's just to satisfy current scenarios and support backward compatibility, or if it's something that...
A
Our
prototype,
we
did
such
kind
of
solution
how
we
can
share
the
cubelet
configuration
to
a
plugin
so
yeah,
basically
after
registration,
we
can
send
it
to
to
the
plugin.
So
if
you
want
to
configure
kubernetes
before
it's,
it's
also
doable,
you
can
keep
the
old
cubelet
configuration
and
pass
it
to
the
underlying
plugins,
but
nothing
stops
US
also
to
configure
on
plugin
level.
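The config-sharing step, the kubelet handing its configuration to a plugin right after the registration handshake, can be sketched as below. The struct is a tiny illustrative subset; the real KubeletConfiguration has far more fields, and the names here are assumptions for the sketch.

```go
package main

// kubeletConfigSubset models the few fields a resource plugin might care
// about, e.g. the reserved CPUs mentioned later in the discussion.
type kubeletConfigSubset struct {
	ReservedCPUs   string // e.g. "0-1", cores kept for system/runtime
	TopologyPolicy string // e.g. "single-numa-node"
}

// pluginSession models a registered plugin that receives the kubelet
// configuration right after registration.
type pluginSession struct {
	received *kubeletConfigSubset
}

// SendConfig hands the plugin its own snapshot of the configuration, so
// later kubelet-side changes don't mutate what the plugin saw.
func (p *pluginSession) SendConfig(cfg kubeletConfigSubset) {
	c := cfg
	p.received = &c
}
```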
A: You might switch completely to putting that as part of the plugin spec, the YAML spec.
A: We were speaking about this: many use cases usually rely on the system resources being separated, like having separate cores for the runtime and stuff like that, basically the reserved CPUs that are passed nowadays. I think we see this use case in a lot of spaces, definitely also in the telco space. So somehow we need to pass what the reserved CPUs and so on will be for the kubelet and the runtime.
A
So
this
this
has
to
get
to
the
plugin
other
configurations
in
in
terms
of
plugin
configurations.
Let's
think
about
CPUs
the
Water
people
would
might
would
would
could
have
some
knowledge,
for
example,
that
they
they
have
a
workload
which
cannot
tolerate
having
siblical
threats.
A
You
need
full
physical,
core
allocations,
so
it's
one
option
what
you
can
configure,
also
for
the
plugin
yeah
there
or
further,
like
huge
page
configurations
stuff
like
that-
definitely
fits
inside
very,
very
well.
G
Okay,
another
question
I
had
is:
have
you
thought
of
possible
abuse
for
these
plugins
like
I
I,
remember
somebody
who
proposed
to
do
plugins
for
Port
admission
and
those
are
plugins
for
pod
admission,
so
people
can
register
CPU
like
fake
CPU
managers.
That
will
just
check
on
some
other
parameters
and
won't
admit
something
because
of
it
doesn't
want
it
here.
A
No
we're
very,
very
good
point:
it's
one
classical
thing
how
we
can
protect
it.
We
do
some
sort
of
handshaking,
so
the
TLs,
the
certification
basically
checks,
if,
if
the
plugin
are
actual
plugins
and
the
not
something
strange
so
I
think
also
in
in
the
kept
description,
what
Marlo
has
we
were
thinking?
A
It
would
be
really
great
if
we
have
some
sort
of
computer
Community,
the
repo
of
such
kind
of
approved,
plugins
or
yeah,
things
which
are
we
know
that
are
certified
and
and
so
on
or
yeah
basically
cannot
break
your
systems
if
you
put
them
related.
C
Do
the
plugins
actually
change
any
state
about
any
running
container
on
the
Node,
or
is
it
still
ultimately
the
keyboard?
That
is
the
one
communicating
to
the
runtime
service
to
say,
please
make
the
state
change.
A: We would like to cover both situations. We will have plugins which can run completely in unprivileged user mode, where you rely on the existing CRI interface; basically, the resources that are to be allocated have to be fulfilled through the CRI interface by the runtime service. But other customers, other users, want more, so we are also thinking about privileged plugins which can do more kinds of tuning that are not covered by runtimes.
A: We are basically inside the loop of defining containers, updating containers, and deleting containers, so we can avoid conflicts; of course, some plugins will have a privileged kind of mode, more or less.
G: Yeah, I think that question, and I don't know if I'm putting words in your mouth, is also about what kind of abuse can happen.
G: Yeah, if we allow that much privilege. And abuse not in terms of hackers who hack into the system; it's more about people trying to use this plugin system for something it's not intended to be used for. And then, if it will become the generic pod admission plugin system, we may want to design it as such, rather than making people try to hack around it and then hit some limitations or side effects that we didn't intend.
E: I mean, what I'm hoping will help is to have a plugins group. Just like the scheduler plugins, have a kubelet plugins group where there are blessed plugins, right? And then if vendors want their particular plugins, they can have their own blessing on them.
C: Excellent. Related to this: the existing efforts that people were exploring for topology-aware scheduling, where would you see those efforts being redirected, modified, or adjusted in light of what you'd be proposing here?
C
No
I'm
talking
about
the
efforts
to
make
the
cluster
scheduler
the
node
topology
aware
and
the
endpoint
added
to
the
cube.
What
they
say
tell
me
the
placement
decisions
that
were
made
and
then
the
scheduler
trying
to
yeah.
C: I'm not sure if we're speaking past each other, though. Just that there is an existing activity that folks were exploring to address, let's just say, the high-performance case, right, which is to reduce the amount of scheduling or pod admission errors that can occur, with an existing plugin on the node side to understand...
C
If
there's
even
a
slot
that
could
fit
this
this
pot
or
not,
are
you
imagining
that
the
cubelet
is
still
aware
of
all
these
things,
or
are
you
imagining
that
these
plugins
would
need
to
emit
state
that
the
cluster
scheduler
could
could
rationalize
or
reconcile,
and
maybe
Francesco
or
Swati
or
know
Kevin
I?
Think
you're
working
on
the
space
could
continue
more
detail.
E: To get at what I was saying, Francesco: it's beyond that. It's not just the location; it's also what the frequencies are, or whether the cores can even handle the workload landing on them. So what I'm asking is to consider that topology is one of many attributes we may care about with the CPUs.
D
In
kind
of
the
you
know
the
the
grander
vision
for
how
you
would
do
better
alignment,
but
I
think
Derek,
question
Derek's
questions
just
more
about
if
we
wanted
to
continue
supporting
this
topology
and
we're
scheduling
that's
already
in
place
that
makes
use
of
resources
that
are
advertised
by
these
existing
components.
How
would
we
do
that
in
this
yeah.
C
Or
we
just
do
we
cease
that
work
or
what
would
be
is
that
work
that
people
were
tracking,
that
we
should
inform
us
it's
far
more
or
less
grandiose,
like
I
appreciate
that
with
the
sustainability
efforts,
there's
more
interesting
right
power
management
issues
that
people
can
explore
and
I'm
not
making
a
value
judgment
in
any
way,
shape
or
form.
Just
kind
of
wondering
like.
A: I think what Mario mentioned is still also interesting: dynamic resource allocation is something which now becomes exposed to the scheduler. The question is whether we can fulfill the topology requirements with another kind of mechanism, maybe through some extension of dynamic resource allocation and stuff like that. But yeah, it's also a question of performance, whether we will get the performance requirements that you have with such topology-aware scheduling.
C
Another
thought
to
This
was:
was
there
anything
explored
around
making,
just
in
time
execution
of
these
plugins
being
a
preferred
model
versus
a
long
running
pod
running
as
a
demon
set
on
a
node
that
takes
some
CPU
or
memory
budget?
C
If
you
had
a
way
to
just
in
time,
launch
this
plugin
and
ask
it
this
question
and
then
sleep
it
again.
Is
that
ultimately
preferred.
C: Yeah, this is why I was kind of asking whether daemon sets are intended to be the management model that these things are deployed as, because they don't need to be, right? And with a pod comes this sense of permanence, and, particularly, the majority of the time they're not going to be doing anything, right? They're just going to be sitting there doing nothing. So I was curious if we had any appetite for maybe exploring a just-in-time type of invocation of the plugin, and then, you know, shutting the plugin down. Gotcha.
C: I was imagining some type of socket-based activation, and these plugins could be deployed into some directory, almost like CNI plugins, right? Not a whole daemon set, but just dropping something into some /etc folder somewhere or some /var folder somewhere. Do I need a long-running pod is what I was just trying to think through, and I'm not saying yes or no either way, but I could see benefits of not needing it. So I was just curious if it was something that was explored or not.
E
For
some
things
like
power
management,
for
instance,
you
still
need
to
monitor
the
course
and
so
you're
still
going
to
need
to
be
able
to
change
the
frequencies
ad
hoc,
because
part
of
what's
going
on
with
our
kubernetes
power
manager,
is
we're
having
to
wait
for
the
poll
from
the
API
servers
know
in
the
Pod
setup,
which
is
a
little
silly
considering.
We
just
want
the
information
of
what's
going
on
on
the
node
right,
so
those
will
have
to
be
long
running
and
those
will
fit
nicely
into
this
plugin
model.
C: Yeah, so for things like the built-in resources, I was just trying to think through whether that was really needed or not. I could see long-running if there's some metric that they export that would be useful; that's just a thought on that. I get a lot of pushback from our users that there are too many out-of-the-box pods running on every node, and this would potentially expand that same list of out-of-the-box pods that some of our high-performance users get upset about. So, just curious on people's perspectives.
D: Yeah, I won't be able to make it Thursday, but obviously continue without me.
A: Not too much; we had a little bit of an illustration of basically how the gRPC side works, but this is nothing really huge. It's similar to the device plugin mechanisms: you have a gRPC socket, basically, which is created by the kubelet, the daemon sets register against it, and the state machine is actually handled by the plugins.
A
They
can
be
only
read
only
if
you
don't
want
to
to
have
privileges
root
privileges
in
the
plugins.
We
in
that
case,
as
we
discussed,
can
come
back
to
the
runtime
and
do
the
allocations.
Then
we
depend
what
the
runtime
supports
and
in
the
other
cases
where
plugins
needs
to
do
more,
then,
then
you
could
get
privileged
kind
of
plugins
right.
C
So
I
know
we're
at
the
time
I'll
be
there
on
Thursday
I,
think
just
speaking
for
myself,
and
maybe
others
could
could
help
iterate
to
get
here,
I'm
very
open
to
us
supporting
a
plug-in
ecosystem.
What
I
want
to
make
sure
we
do
is
don't
by
virtue
of
doing
that,
make
existing
deployers
lives
more
complex
for
existing
function.
C
So
if,
if
maybe
some
of
the
feedback
that
was
given
in
today's
discussion,
which
would
be
those
who
don't
worry
about
error
complexity
with
the
distribution
can,
maybe
we
can
make
some
updates
to
the
enhancement
or
get
feedback
on
it.
C
That
says,
maybe
the
built-in
thing
just
stays
built
in
and
cubelet
managed
without
necessarily
cue
extracted,
maybe
if
you'll
have
time
to
think
on
that
a
little
bit
more
that'd
be
good,
but
I
worry
about
the
air
complexity
of
people
running
things
today,
as
this
rolls
out,
but
I'm
very
open
to
you
know
people
innovating
and
doing
new
things
here.
So
I
was
just
maybe
we
can
find
some
marriage
of
those
two
tensions.
That
would
be
great.
C: All right, all the best. I'll talk to you all on Thursday, and thanks for setting this up.