From YouTube: Kubernetes Resource Management WG 20180117
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A: All right, well, welcome everyone to the January 17th meeting of the Resource Management Workgroup. I apologize for not being able to hold last week's meeting. Just as a reminder, we are going to move to a bi-weekly cadence after this meeting, so you will see updates to the invites to reflect that later. Today we have a number of items on the agenda. Before we turn to the agenda items, I just want to call out: are there any particular features or topics not captured here that we want to track for Kubernetes 1.10 that need to be discussed?
A: I can't hear whoever that was yet, but if so, feel free to speak up about those, so we can give proper priority to the near-term items. With that, if there are no immediate... I can hear you, Jeremy. If there are no immediate concerns, we can switch to the agenda. So, first on the agenda were questions around Allocate RPC calls.
B: So in this PR, what we are doing now is making the Allocate call to the device plugin at each container creation; previously it was made once for the pod, for all of its containers. So the main point of argument or discussion here is that there is an opinion from some folks that, even on a container restart, we should not use the cached state from within the kubelet, and we should instead make an Allocate call for each container.
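For context, here is a minimal sketch of the per-container Allocate shape under discussion, written as Go types that mirror the layout the device plugin API took in v1beta1. This is a simplified sketch, not the exact generated gRPC code; whether the call happens once at pod admission or at each container start is exactly the point being debated.

```go
package deviceplugin

import "context"

// AllocateRequest carries one entry per container, rather than a
// single pod-wide request as in the earlier design.
type AllocateRequest struct {
	ContainerRequests []*ContainerAllocateRequest
}

// ContainerAllocateRequest lists the device IDs the kubelet has
// assigned to one container.
type ContainerAllocateRequest struct {
	DevicesIDs []string
}

// AllocateResponse returns per-container runtime configuration
// (env vars, mounts, device nodes), in the same order as the request.
type AllocateResponse struct {
	ContainerResponses []*ContainerAllocateResponse
}

type ContainerAllocateResponse struct {
	Envs    map[string]string
	Mounts  []*Mount
	Devices []*DeviceSpec
}

type Mount struct {
	ContainerPath string
	HostPath      string
	ReadOnly      bool
}

type DeviceSpec struct {
	ContainerPath string
	HostPath      string
	Permissions   string
}

// DevicePluginServer is the interface a plugin implements.
type DevicePluginServer interface {
	Allocate(ctx context.Context, req *AllocateRequest) (*AllocateResponse, error)
}
```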
D: I agree with the maintainers' opinion that we want to use cached state to handle a potential device plugin failure. I think it has been a philosophy that a Kubernetes control plane failure shouldn't affect pods running in the normal pattern once a pod is allocated to a node, and I think the SIG Node team has really put a lot of effort into making sure the kubelet can be restarted or replaced, and we should do the same for the device plugin.
E: Sort of the argument that I have for running Allocate at the beginning of each container start is that, for example, in the case of NVIDIA GPUs, we would be able to clean up if the container, when it restarted, put the GPU in a bad state or left, for example, memory leaks, etc. In terms of infrastructure, I think it's a better model if we're able to scrub the GPU memory. Now, in terms of caching, I'm not completely against caching.
B: Am I correct that there are two arguments: one is the improved restart time, and the second argument, the primary one, is that an already-running container should not depend on the device plugin? Let's say a container which was already running somehow restarts; if the device plugin is not there, the restart would fail if we make an Allocate call each time. So that is the primary reason.
B: On another point, can I just come in? So, another point is that if there is a container failure and that's causing a problem, so that the pod has become faulty, then when this pod is relaunched, I think the devices will get reset on pod recreation. I'm saying that if a container keeps failing in a pod, the user just deletes that pod and recreates the pod, right? Yeah, right.
B: So, as I said earlier, deletion and pod recreation will fix this; it will reset things. I mean, it's not that there is no way out: on a container restart alone the devices are not getting reset, but if a pod gets deleted and the pod is recreated, then, device-wise, this will reset it, right? But...
E: I don't see... I mean, that seems like a very weird model. If your container were in continuous restarts in a CPU environment and you got weird errors because Linux didn't do its job properly, are you expected to... I mean, can you see the parallel I'm trying to draw? Are you expected to have the user intervene to remove the pod and recreate it because weird errors happened? I mean, it's the role of the infrastructure, or Linux in this case, so...
E: If you're running GPUs, or if you're running devices, it makes sense that your device plugin, or your device runtime, because it's running device-specific operations to set up your device, will be as important as your CRI. If you're not running devices, then it's not as important, because you're not using the device. But...
D: That's an option, I think, and this already requires the NVIDIA-specific container runtime. But again, I feel like multi-tenancy has not been a Kubernetes focus from the very beginning. I know people are working on that, but I still think we are quite far away from firmly supporting it, and in the meantime I think handling security at the pod level is perhaps good enough.
E: Why is it that we don't want to have the same behavior as the CRI? Is security not something that's important to us or to the Kubernetes community? I mean, the behavior seems sane; it's the exact same behavior as the CRI, and in general it seems pretty simple to say that the device plugin, or at least how we decided it last year and how we implemented it in the device plugin system, is there to run device-specific operations for container-level isolation.
E: It's not a tenancy discussion, it's a GPU discussion. What's being suggested here is basically a different model than what we've been implementing for the better part of two years now, and I don't understand why this strategy change is actually happening now and wasn't mentioned a year ago.
D: Actually, I don't really think this is a tenancy issue, but I want to better understand why we need things like container-level security isolation here: for example, why we have to reset a device even though we just reallocate the device within the same pod, only to a different container. I think if we can just reset the device, that is, make the Allocate call when we reallocate a device to a different pod, that would be a good enough model for the container case.
E: It's not just a security issue, it's a reproducibility issue. At least the way we understand it, containers are the lowest level. Your container should always run with the same state; you should always be able to run your container with the same state. That's why we think isolation belongs at the container level, and that's what we're doing elsewhere in Docker and Docker Swarm. That's what we're doing in other systems too, and I'm not exactly sure why it should be pod-level in Kubernetes.
D: On restarting a pod: I think restarting the pod and its containers is a very common use case, and if every time we restart a container we need to make this Allocate call, that introduces a potential failure point, because the device plugin has to be available at that moment. I think this makes the failure-handling scenario very complicated, and also the device plugin will most likely be deployed as a DaemonSet, and the way its lifecycle is handled...
E: I mean, on the CRI side, just to answer that other argument about restarts: I think that if you restart and your container just keeps crashing because we couldn't clean the state, or all the memory wasn't freed, then how does it help that your container just restarted but crashes again? But...
E: GPUs are not at the same state as CPUs; we have hardware issues. For example, if you have two processes using the same GPU and one process faults on the GPU, the other process will crash too. We're still pinning down hardware isolation; we're still building the hardware and kernel layers. But that's the state GPUs are at right now.
E: Say your process crashes inside your... you have a container that uses a GPU, and your process claims memory; if it's TensorFlow, for example, it's going to claim the better part of the whole memory of the GPU, or at least, if not the whole memory, it usually claims a lot of the GPU memory. Then it crashes, and we couldn't clean up at that point.
E
If
you
continue
restarts,
then
just
tension
flow
is
going
to
try
to
reclaim
memory
that
isn't
available
and
and
and
then
it
crashes
again
and
restart
crashes,
again,
restart
crashes
again
and
I
think
it's
probably
better
to
just
have
the
same
model
s
your
eye.
That
says,
if
you're
not
able
to
have
your
infrastructure
clean
up
and
make
sure
your
GPU
is
available,
then
just
exactly
as
your
eye
and
just
return
your
neighbor
and
maybe
try
again
later
so.
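As a rough illustration of the fail-fast model being argued for here: on each Allocate call, try to return the device to a clean state and report an error (so the kubelet can retry later) rather than hand out a bad device. This is a sketch against the types above; scrubGPU and isHealthy stand in for hypothetical vendor-specific calls.

```go
package deviceplugin

import (
	"context"
	"fmt"
)

type plugin struct{}

// scrubGPU and isHealthy are hypothetical placeholders for
// vendor-specific reset and health-probe operations.
func scrubGPU(id string) error { return nil }
func isHealthy(id string) bool { return true }

// Allocate scrubs each requested device before handing it out and
// surfaces an error instead of allocating a device in a bad state.
func (p *plugin) Allocate(ctx context.Context, req *AllocateRequest) (*AllocateResponse, error) {
	resp := &AllocateResponse{}
	for _, cr := range req.ContainerRequests {
		for _, id := range cr.DevicesIDs {
			if err := scrubGPU(id); err != nil {
				return nil, fmt.Errorf("scrubbing device %s: %v", id, err)
			}
			if !isHealthy(id) {
				// Mirror the CRI model: report unavailability rather
				// than letting the workload crash-loop on a bad device.
				return nil, fmt.Errorf("device %s unhealthy, retry later", id)
			}
		}
		resp.ContainerResponses = append(resp.ContainerResponses, &ContainerAllocateResponse{})
	}
	return resp, nil
}
```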
E: The next one should be pretty straightforward. The next item on the agenda is the PR that adds annotations to the device plugin API. I think it's pretty straightforward: the idea is that we would like to be able to add annotations, CRI annotations, on a container. The prototypical use case is that for CRI-O, with annotations, you can call a hook, and in that case that would allow us to support the NVIDIA runtime.
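At the API level the change is small: the per-container allocate response grows a pass-through annotations map (this is the shape the field took in the v1beta1 API). The kubelet does not interpret the values; it forwards them over the CRI, where a runtime such as CRI-O can match on a key to trigger an OCI prestart hook. A standalone sketch, with the other response fields elided and a made-up annotation key:

```go
package annotations

// ContainerAllocateResponse, reduced to the new field: a pass-through
// Annotations map alongside the existing envs, mounts, and devices.
type ContainerAllocateResponse struct {
	Annotations map[string]string
}

// A CRI-O hooks configuration could match on an annotation key like
// this (the key is hypothetical) to run the NVIDIA prestart hook.
var example = ContainerAllocateResponse{
	Annotations: map[string]string{
		"example.vendor.com/prestart-hook": "nvidia",
	},
}
```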
D: I'm OK with extending the device plugin API to add container annotation support, but I also feel that, in the longer term, it's not a good way to invoke the runC hook. So, as I said, if you think this is just to unblock your use case, then perhaps we should also write a high-level document that describes the use case, why we want to use the runC prestart hook, and circulate that document with SIG Node and the CRI people, in the hope that the prestart hook can be supported at the CRI level.
E: It's just that we've been talking about that with Vish since, I think, November, and he mentioned that CRI support was going to take some time, and I agree it might take some time, so I think annotations are a first step. It would prove that CRI-O works with the NVIDIA runtime (we tested that internally), and it would basically be able to support our use case until we actually standardize runC prestart hooks into the CRI.
E: I mean, basically, I think the idea of bringing that up in this meeting was: if there is anyone that has any concerns about adding annotations to the device plugin API, this would be the place to talk about it. If not, we're just looking for, basically, people saying "looks good to me." What's...
G: That's one use case. You could also use the annotation for indicating which resource is needed in a particular FPGA, for example. So, as we discussed in previous meetings, we could have an admission controller which annotates the pod spec, and those annotations could then be used with the device plugin.
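A minimal sketch of the flow G describes, assuming a hypothetical admission step and annotation key; a real implementation would be a mutating admission webhook with its own key scheme.

```go
package main

import corev1 "k8s.io/api/core/v1"

// annotateForFPGA records, at admission time, which FPGA function a pod
// needs, so the device plugin (or runtime) can program the device
// accordingly when the pod is allocated. Both the function argument and
// the annotation key are hypothetical.
func annotateForFPGA(pod *corev1.Pod, function string) {
	if pod.Annotations == nil {
		pod.Annotations = map[string]string{}
	}
	pod.Annotations["fpga.example.com/function"] = function
}
```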
E: No, it doesn't sound like... it sounds like a sane idea to send annotations to the device plugin, but I'm still new to the idea; I'm still trying to understand the use case as you were saying. I mean, can you formalize that into an issue, or even just a paragraph?
G: Yeah, we can do it.
A: We would like to believe that device plugins don't need to do anything container-runtime specific, but that's not always going to be the case, and I think the proposal you have here is reasonably fair. I have no objections to it, other than I think there was some commentary on how you graduate these annotations. I guess, to me, that's really outside the concern of the kubelet per se, and more a concern of the device plugin and how it chooses to integrate with the runtime. So, yes.
D: And I also think that if we add this to the API, we can already use it for early experiments. Also, even for passing annotations through the CRI, the comment explicitly mentions that annotations are not, hopefully, going to affect the container runtime.
A: I think it's outside of Kubernetes' purview to tell container runtimes how they should or shouldn't use annotations. To me, it just seems like this is another useful vehicle: the kubelet doesn't need to know what's inside the envelope, but the envelope is allowed to be sent, right? And this seems, you know, completely reasonable to me in that regard. So I can comment on the PR with a +1 on this; I honestly think it's none of the business of the kubelet to know what's inside that envelope.
D: I think there is already a PR up, and looking at it, it seems fine; it basically just follows the same model as other resources like hugepages. And because we don't allow overcommit, we currently only support requests where, for a given resource name, the requests equal the limits, and if we keep the changes consistent with that, it should just work.
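For reference, on the requesting side a device-plugin resource is consumed like other extended resources: integer quantities only, and requests must equal limits (so specifying limits alone is enough). The resource name below is a placeholder.

```go
package main

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// A container requesting one device-plugin-managed extended resource.
// For extended resources, requests must equal limits, so setting Limits
// implies the matching request.
var container = corev1.Container{
	Name:  "device-user",
	Image: "example/image",
	Resources: corev1.ResourceRequirements{
		Limits: corev1.ResourceList{
			"example.com/device": resource.MustParse("1"),
		},
	},
}
```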
D: Even the API extensions we are discussing are compatible changes, so I don't think they will affect much, but I do think there are some changes I'm hoping we can communicate about for graduating the device plugin API to beta, like switching to a general kubelet plugin discovery model that uses probes instead of registration. I think that would be a major one. I also want to sync up on the state of the other device plugin implementations we're tracking, other than GPUs. I don't know whether we have made any progress on, say, the Solarflare high-performance NIC, and also the FPGA folks here, if they are working on this. Could you guys just brief us on the progress on those device plugins?
D: I don't think this is a hard blocker, but it's definitely something that would be nice to have. The other part was handling kubelet restarts, which I think is a nice thing to have, but I also don't think it's really a hard blocker, though people may have other opinions, you know.
A: Do you folks mind if we just spend another minute talking about that Allocate topic a little bit? (Sure.) So, today, my understanding is that we call Allocate during kubelet admission of a pod, correct? And so, at a 10,000-foot view: we've talked in the long term about wanting to be able to support, you know, locality-based scheduling concerns, where I want the CPU I get pinned to to be the one closest to my GPU. What would be the impact of calling Allocate on every container start, versus letting us have some centralized planning step during kubelet admission that says: this is where we define your CPU, this is where we define your GPU, and so on?
E: My general idea is that GPUs are not going to move; or at least, currently, we don't even support hot-plugging GPUs, because with certain motherboards that are out in the field, we've seen boards that might have a few problems and just don't support hot-plugging your GPU at the hardware level. So the general idea was that it would probably be a separate call, another call instead of Allocate, that would just return GPU affinity to a CPU. I'm still thinking about it, but in general it would be that, or just, like, a matrix that says this.
B: Yeah, and the kubelet may have information about the locality in the device structures in its local cache, because with ListAndWatch the device plugin will keep advertising device ID attributes to it. And when, let's say, a NUMA manager tries to get the topology details from the device plugin manager, it can respond back from its cache.
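A sketch of the caching idea B describes. In the v1beta1 API a Device carried only an ID and a health state; the Attributes map below is a hypothetical extension for topology hints, and deviceCache stands in for the kubelet device manager's state.

```go
package deviceplugin

// Device is what ListAndWatch advertises to the kubelet. ID and Health
// exist today; Attributes is a hypothetical extension carrying topology
// hints such as the NUMA node or PCIe root the device hangs off.
type Device struct {
	ID         string
	Health     string            // "Healthy" or "Unhealthy"
	Attributes map[string]string // hypothetical, e.g. {"numaNode": "0"}
}

// deviceCache mirrors the kubelet-side cache described above: a NUMA
// manager can query topology details without a round trip to the plugin.
type deviceCache map[string]Device

func (c deviceCache) NUMANode(deviceID string) (string, bool) {
	d, ok := c[deviceID]
	if !ok {
		return "", false
	}
	node, ok := d.Attributes["numaNode"]
	return node, ok
}
```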
A: I just want to make sure that I wasn't... I'm very easily confused, so I wanted to make sure that I was not confused: that if we call Allocate more than once over the life of a container, it has little to zero impact on whether or not we can support locality-based decisions at admission on which device and which... which...
D: I see. Currently the model, I think, is that the scheduler and the kubelet admission handler will take the device properties into account. But I think that, currently, whether we make a single or multiple Allocate calls is most relevant to whether we're going to reset the device, because the device plugin may want to use this call to reset the device. I think the kubelet currently decides which devices to allocate to a particular container, and I don't think that pathway will change.
F: A comment, if I may: currently we are talking about GPUs and similar devices, which are not hot-pluggable; please keep that in mind. But there are some accelerators which can be used over USB, and USB devices can be plugged and unplugged at any moment, so they are enumerated a bit differently. We need to keep that in mind, yeah.
D: Those are definitely use cases we should support, and the device plugin does support them; I agree with that. But how to better support dynamic resource provisioning is, in my understanding, perhaps outside the scope of the device plugin; maybe it fits better into the resource class discussion.
A: I guess we've got five minutes left. If there are other topics people want to discuss quickly, now is the time to raise them; otherwise we can adjourn. I'll post the recording and just remind everyone again that we're gonna move to a biweekly cadence, and I guess the takeaway here was that, you know, the first topic, the one on Allocate, will go back to SIG Node and we'll have a chance to digest it.
E: Actually, just a quick question on that: what are the approval requirements for the second PR?
A: Right. I mean, Renaud, I will go and, you know, if what we discussed here maps to what I see when I look at the PR, I will express my approval, and then give Dawn a moment to yea or nay; otherwise I'll just, you know, tag it for moving ahead. Okay! Well, thanks, everyone, and I will talk to you on Slack and then in two weeks.