From YouTube: Kubernetes SIG Apps 20221128
A: Good morning, good evening, good afternoon, depending on where you are. Today is November 28th, and this is another of our SIG Apps bi-weekly calls. My name is Maciej and I'll be your host. Today I have one announcement: the 1.26 release is due next Tuesday. If I'm looking correctly, and based on the emails that I saw earlier today, there have been no problems, so it looks like we are on schedule to release on time on December 6th.
B: Sure. Hi, I'm Drew, I'm from Shopify, and I've been working with Peter from Google to basically propose these improvements to stateful sets. Apologies, my Wi-Fi kind of dropped and I'm recovering it, so there might be a few hiccups. I'm going to be talking about adding zonal rollouts for stateful sets, and general zone awareness for PDBs as well.
B: One key thing we rely on when we deploy is the readiness probe, which essentially is a command which checks if a pod is caught up, caught up enough rather, on the replication of data. I think different databases and things have different kinds of commands which are kind of similar. And one key thing when you deploy this as a stateful set is: if three or more pods are down at random at any point, we run the risk of data loss and unavailable partitions, because it could be pods from three different zones, and then we could lose all three replicas for something.
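(For reference, the readiness-probe mechanism described above is standard pod-spec API. Below is a minimal sketch; the image and the catch-up script are hypothetical placeholders, not Shopify's actual configuration.)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kafka-0
spec:
  containers:
    - name: kafka
      image: example/kafka:latest            # hypothetical image
      readinessProbe:
        exec:
          # Hypothetical check: exits 0 only once this broker has caught up
          # on partition replication, so the pod reports Ready only then.
          command: ["/bin/sh", "-c", "/opt/checks/caught-up.sh"]
        periodSeconds: 10
        failureThreshold: 3
```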
C: So, is your data loss because you don't... what's the durability that you configure your Kafka with? Is that the reason you would lose data?
B: Oh yeah, no, I'm just saying that we have a replication factor of three for basically every partition, which is the lowest denominator at which Kafka replicates things. So basically, if there's ever a disruption and more than three pods are down at the same time, you technically may be at risk of losing the data.
C: Well, okay. So it makes sense if you're saying that you basically have a very long time between the data being committed to Kafka and being synchronized with the storage media. So if all three go down, you lose partitions and they're unrecoverable, because, you know, Kafka is configurable to allow you to write much faster than the disk moves. Oh, I see.
B: I think... yeah, I think I've worded it a little incorrectly, then. I think it's more about unavailability than necessarily data loss. It's a loss in the sense that, if you're trying to reach and write to a certain partition for which there are no replicas available, then you're losing data in that regard. But yeah, other than that, Kafka is pretty resilient in the way you describe, where it does frequently flush to the disk and write whatever is available.
B: Yeah, sorry, I should have been clearer. So the issue we were facing is that for some of our largest Kafka clusters, which can be in the order of two to three hundred brokers, or pods, it can take in the order of 20 hours, or even more than that, to deploy, because we have to rely on the default strategy to roll out one pod at a time.
B
These
pods
cannot
be
a
bit
at
random,
as
I
mentioned.
If
you
set
that
to
something
like
maximum
25,
you
could
accidentally
take
down
all
three
replicas
for
something.
That's
the
three-part
restriction.
I
was
talking
about,
and
so
we
want
these
deployments
to
be
faster,
not
just
from
the
perspective
of
Ops
that
you
know,
I
would
like
Motorola
to
happen
within
working
hours,
but
also
at
least
for
the
kafka's
use
case.
B
We
we
prefer
shorter
disruptions
because
of
clients
who
connect
and
consume
from
Kafka
the
longer
a
rollout
might
take
the
more
time
they
might
perform,
rebalances,
where
basically,
they
do
no
work.
B
So
the
key
Insight
we
realize
is
that
we
can
disrupt
all
pods
in
a
given
Zone
and
still
have
the
other
two
replicas
for
data
being
available,
and
so
we
can
basically
update
our
stateful
stats
by
updating
all
parts
of
the
zone
at
once,
and
it
doesn't
have
to
be
all
pods
I
think
the
AWS
use
case
was
they
were
going
exponentially.
They
would
start
one
part
at
a
time,
then
two
and
four
and
so
on
to
whatever
marks
and
available
is.
B
But
the
point
is
we:
can
these
kind
of
apps
can
tolerate
a
lot
of
disruption
in
a
given
Zone
and
we
can
leverage
that
for
faster
deploys,
be
it
for
your
app
or
even
node
upgrades,
and
things
like
that.
B
So
one
thing:
a
lot
of
people
when
we
talk
about
this
thing
is
like
bot
disruption.
Budgets
could
help
with
this.
They
don't
because
they're,
not
limited
deploys
are
not
limited
by
pdbs
when
you're
doing
rolling
upgrades,
but
regardless
of
that,
we
also
want
pdbs
to
be
aware
of
zones
as
well.
B
So
the
way
we
solve
this
essentially
is
by
writing
our
own
controller
or
operator.
We
call
it
the
zonal,
deploy
controller,
it
uses
the
popular
controller,
runtime
Library
and
basically,
we
now
deploy
our
SQL
set
with
the
on
delete
strategy
and
on
relevant
updates,
which
for
us
is
mostly
a
change
in
revision
or
a
change
in
anode
certain
annotations.
The
controller
takes
over
the
deploy.
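(The OnDelete strategy itself is existing StatefulSet API: it tells the built-in controller not to roll pods on updates, leaving deletions to an external controller like the one described. A minimal sketch, with illustrative names:)

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 300
  updateStrategy:
    type: OnDelete          # pods only move to the new revision when something deletes them
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: example/kafka:latest   # hypothetical image
```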
B
We
update
all
pods
in
the
same
Zone.
At
the
same
time,
you
could
also
configure
it
to
be
like
200
time
or
something
if
you
want
and
as
I
mentioned
before,
we
heavily
rely
on
the
Readiness
probe
for
the
stateful
sector.
Tell
us
when
the
Pod
is
up
and
running
and
if
pods
are
unhealthy
after
resetting
a
Zone,
then
the
controller
stalls
and
doesn't
make
any
progress.
B
We
also
have
a
pause
field
which
I
think
mirrors.
What
deploys
have
for
some
reason
we
can
pause,
deploys
in
kubernetes,
but
not
stateful
sets.
So
ideally
we
can
add
that
too.
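(The Deployment field being referred to is spec.paused; a minimal sketch:)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  paused: true              # existing Deployment API; StatefulSet has no equivalent today
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx
```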
B
Our
client
disruptions
are
minimized
for
the
shutter
duration
of
time,
as
I
mentioned,
there's
a
certain
thing
called
consumer
groups
and
if
any
kind
of
outage
happens
to
what
they're
looking
for
they
basically
stop
the
world
rebalance
and
you're,
basically,
they're
not
consuming
anything
in
that
time,
so
the
shorter
the
disruption
window,
the
better
it
is
for
them
and
the
way
we
designed
this
controller,
it
was
felt
like
it
was
a
negative
feature
and
it
just
worked
without
deployed
pipelines
as
well,
and
we
designed
it
to
be
generic
and
for
now
we're
only
using
it
to
deploy
a
Kafka.
B
But
there's
another
team
which
uses
elasticsearch,
which
can
basically
use
the
exact
same
solution.
B
So
I
guess
what
we're
really
proposing
is
to
Upstream.
This
kind
of
zonal
awareness
for
staple
sets
just
like
Kafka
there's
a
lot
of
other
apps
which
can
tolerate
disruptions
in
many
pods.
If
limited
the
same,
Zone
I
think
kubernetes
would
benefit
by
offering
these
kind
of
features
and
I
think
it
should
be
native
to
kubernetes.
B
So
the
two
things
we
think
that
will
have
high
impacts
is
the
ability
to
roll
out
pods
by
zone
for
a
stateful
set
and
to
allow
pdbs
to
have
a
budget
of
disruption
per
Zone
as
well.
So
basically,
rollouts
and
pdbs
should
be
a
zoner
topology
player
and
again.
The
goal
for
both
these
features
is
to
essentially
speed
up
deployment
times
without
having
an
impact
on
our
any
kind
of
availability.
B
So
we
kind
of
prototype
what
this
might
look
like.
There's
a
picture
on
the
right
there
of
what
it
might
be
where
the
main
thing
is.
Maybe
you
can
come
up
with
a
new
update
strategy
type
like
zonal
update,
where
you
can
define
a
key.
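(A rough sketch of what such a strategy could look like. The ZonalUpdate type and its fields are hypothetical, reconstructed from the description of the slide; they do not exist in the Kubernetes API today.)

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  updateStrategy:
    type: ZonalUpdate                            # hypothetical strategy type
    zonalUpdate:
      topologyKey: topology.kubernetes.io/zone   # the key that defines the rollout groups
  # selector and pod template as usual
```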
B: We can have a discussion on that. And the reason that's important is, for example, if you deploy a bad change and you need to roll back, or maybe have a fix-forward change or something like that, it's useful to update the same pods which are down right now, rather than starting from some other place. And all pods of a zone should be updated before moving on to the next one. Apps can use the readiness probe to really control how fast the deploy is happening and to see if they're caught up; for us, each pod reports its own health, but honestly you could be reporting the general health of the entire cluster as well, or whatever. I think that's where people can customize how the rollout strategy would interact with their app a bit. And if pods are unhealthy after resetting a zone, then the controller should just stop, as I mentioned before.
B
These
are
some
other
examples
we
came
up
with
the
image
on
the
top
left
is
the
one
I
just
showed,
which
is
you
can
have
a
new
update
strategy
called
zonal
update
the
one
below
that
bottom
bottom
left
is.
You
could
keep
rolling
update
but
perhaps
update
the
partition
type
to
not
only
support
something
like
an
ordinal,
but
maybe
a
label
selector,
and
then
you
can
have
the
partition
key,
be
something
like
the
hosting
and
that's
actually
incorrect.
B
That
needs
to
be
the
topology
key
there
or
you
could
have
something
we
have
on
the
right,
which
is
again.
You
have
a
new
update
strategy.
Like
Journal
update,
you
can
Define
how
what
is
the
max
of
unavailability
per
Zone
like
20
or
10
pods,
and
you
can
Define
what
order
it
might
be
or
what
key
it
might
be,
for
example,
alphabetical
here
or
any
mix
and
match
of
these.
This
is
the
part
where
we
don't
know
what
it
really
should
look
like
here
and
people
with
more
experience.
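(Hypothetical sketches of the two alternatives just described; none of these fields exist in the StatefulSet API today.)

```yaml
# Bottom-left variant: keep RollingUpdate, but let the partition be expressed
# through a topology/label key instead of only an integer ordinal.
updateStrategy:
  type: RollingUpdate
  rollingUpdate:
    partitionKey: topology.kubernetes.io/zone    # hypothetical
---
# Right-hand variant: a dedicated zonal strategy with per-zone limits and ordering.
updateStrategy:
  type: ZonalUpdate                              # hypothetical
  zonalUpdate:
    topologyKey: topology.kubernetes.io/zone
    maxUnavailablePerZone: 20%                   # e.g. a percentage or an absolute pod count
    zoneOrder: Alphabetical                      # hypothetical ordering knob
```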
B: But we just wanted to give a couple of different ideas we had for this, yeah. And it's not just us: I think there are other people on the call. There's Mariana from AWS, who implemented essentially the same controller we did at Shopify to speed up deploys for, I think, Prometheus. And then we have Peter from Google, I think, who is working on a feature to migrate stateful sets, where having zonal pod disruption budgets would also be very useful for speeding up those migrations. And we actually do do that: when we do cross-cluster migrations for Kafka, we are trying to take zone into account as well. And I don't think I mentioned it, but it's not just Kafka; there are a lot of stateful apps out there which would benefit from this.
C: Okay, so that would be good. It would be good to get something in the open source community that people can iterate on and give feedback on before proposing to, like, take in a KEP. That gives us an opportunity to kind of iterate on that API and lock it down before you try to upstream it or build it into an alpha feature, even.
C: So, getting some, you know, in-the-wild feedback on an API that's being used across, like, Shopify, AWS, and Google would be a strong motivator to upstream something. Then my other thing would be: very many users use Kubernetes on top of a cloud provider, and the failure domains there... most failure domains there are almost always specified in terms of regions and zones, right? But many users also use it on top of do-it-yourself data centers, right, and their failure domains are not necessarily going to be described in the same way. So, in terms of the topology spread constraints which we have already introduced in the pod spec for being able to spread out: instead of calling it zonal, if we want to do this, we need to be able to handle arbitrary failure domains in terms of the topology spread, right?
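(For comparison, this is the existing topologySpreadConstraints pod-spec API being referred to; topologyKey is an arbitrary node label, so the same mechanism can describe zones, racks, power domains, and so on.)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example
  labels:
    app: kafka
spec:
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone   # could equally be a rack or power-domain label
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: kafka
  containers:
    - name: app
      image: registry.k8s.io/pause:3.9
```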
C
Well,
how
do
we
make
sure
that
we
can
generalize
this
to
the
use
case
of
somebody
who,
instead
of
going
across
Zone,
just
going
across
power
domains,
Network
domains,
racks
so
forth,
and
so
on,
but
yeah
other
than
that
it
seems
like
cool
like
I
mean
I
I,
don't
I
have
no
like
opposition
to
the
to
it
in
general
practice,
but
those
those
would
be
the
two
things
like
getting
some
more
real
world
hit
on
like
a
a
common
Epi
that
multiple
people
are
using,
as
opposed
to
like
we
have
one
from
AWS
and
one
from
Shopify
and
we're
gonna
try
to
select
best
to
read
with
like
help
push
us
in
the
right
direction.
C
So
we
know
we're
doing
the
right
thing
and
being
able
to
handle
arbitrary
topologies
would
be
more
inclusive
of
users
who
are
not
running
it.
On
top
of
the
cloud
which
you
know
over
time,
that
maybe
may
not
be
as
many
people,
we
expect
more
and
more
workloads
to
move
out
of
do-it-yourself
data
centers,
but
today,
there's
still
a
lot
to
run
on
it.
B
So
just
one
thing
with
that:
I
think
a
common
pattern
we've
seen
is
it's
not
just
us
writing
these
operators,
but
for
a
lot
of
stickable
sets,
people
would
just
deploy
them
in
general,
using
operators,
and
this
is
kind
of
baked
in
there.
They
have
a
lot
of
logic.
How
do
you
recommend
We
Gather
these
use
cases
I
suppose,
because
we
have
one
from
AWS,
we
can
come
up
with
two
from
Shopify.
C
Do
I
go
about
that?
What
would
be
awesome
is
if
like,
if
we
have,
if
there
are
multiple
people
that
are
all
working
so
okay,
in
the
same
way
that
you
said
this
is
great,
and
the
reason
this
is
great
is
because
we
believe
that
this
controller
that
we
have
will
work
for
multiple
workloads.
All
doing
the
same
thing
right
I
would
assume
that
the
same
thing
is
true
about
what
AWS
is
building.
C
If
we
could
convert
like
I
mean
we
could
form
a
working
group,
if
that
would
help
I
I'm
I'm,
very
I'm,
open
to
any
suggestion
about
how
we
can
help
gather
our
community
around
it
and
convert
it
to
one
open
source
thing.
I
would
really
love
to
do
that
prior
to
start
to
baking
it
into
the
API
like
outright.
A: In parallel to the zonal rollout: the pausing of the workload controllers has been mentioned a couple of times in the past during the batch working group meetings, just like it is currently implemented for Deployments, and for CronJobs and Jobs. Currently there is also a request to have the identical functionality for the remaining controllers.
C
The
feedback
that
we
were
given
by
the
architecture
Sig
was
that
the
pause,
primitive
and
deployment
wasn't
declarative
and
was
not
compatible
with
GitHub
style,
workflows
and
automated
orchestration
at
a
higher
level,
which
is
why
we
didn't
like,
because
we
looked
at
including
it
in
the
rest
of
the
workload
Primitives
resources
prior
to
B18.
C
But
there
was
push
back
more
broadly
and
I.
Don't
that's
not
to
say
that,
like
we
can't
revisit
it,
we
can
always
revisit
it,
but
it
wasn't
done
like
capriciously
or
because
we
thought
that
it
was
like,
like
let's
rewind
it
without
this
feature
and
add
it
later,
it
was,
let's
not
add
it,
but
maybe
it's
time
to
revisit
that
position.
E: I have a quick question, if it's a good moment. It wasn't apparent: what are the changes to PDBs required by this proposal? It seems like none. Could you please clarify that? Because you...
F: ...mentioned? Yes. I think for PDBs, the workaround, or I guess the solution we want... the problem we want to solve is that for many apps there is a discrepancy between the tolerable unavailability cross-zone and intra-zone. So you may be able to take down, to have more pod disruption in, a single zone, but if you do have a significant disruption event in one zone, you don't want your budget to be affected across the other zones.
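(A hypothetical sketch of what a per-zone budget could look like; the topologyKey field below does not exist in today's policy/v1 PDB API and is only meant to make the idea concrete.)

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka-per-zone
spec:
  maxUnavailable: 20%
  topologyKey: topology.kubernetes.io/zone   # hypothetical: account the budget per zone
  selector:
    matchLabels:
      app: kafka
```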
F: I think the challenge that Drew was pointing out was that, with the replication factor of three in Kafka, you can't take down more than two. Like, assuming, you know, you have partitions sharded fairly uniformly, you may have a case where you can't take down more than two pods, even if you have a Kafka cluster that's very large, and it's really dependent on the scale of your zone replication.
D: And in our case at AWS, EKS has this managed node groups feature, and during node group upgrades, when you need to, for example, update your OS with security patches, it uses PDBs so that, in case the pods don't become healthy after a group of nodes is updated, it can halt the node group upgrade.
D
So
in
our
our
case,
why
we
created
the
Zona
where
a
PDP
was
because
we
wanted
to
have
a
way
to
safely
start
shopping
date,
multiple
nodes
at
the
same
time,
and
have
these
safe
word
in
case
something
goes
badly.
We
can
pause
the
rollout.
D
What
it
did
was
like
you
saw,
we
created
a
new
admission
web
hook
so
with
that.
That's
basically
watched
for
the
pods
status,
States,
a
status
between
between
the
zones
and
then
basically
using
that
information
allows
or
not
to
request
to
go
to
the
eviction
API.
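(Eviction requests arrive as CREATEs on the pods/eviction subresource, which admission webhooks can match; a minimal sketch of such a registration, with hypothetical names and the CA bundle details elided.)

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: zonal-eviction-guard            # hypothetical name
webhooks:
  - name: evictions.example.com         # hypothetical
    admissionReviewVersions: ["v1"]
    sideEffects: None
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods/eviction"]    # evictions are CREATEs on this subresource
    clientConfig:
      service:
        name: zonal-eviction-guard      # hypothetical service backing the webhook
        namespace: kube-system
        path: /validate
```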
D
So
in
our
case,
we
kind
of
create
a
new
object
and
I'm
not
sure
on
the.
If
and
to
be
honest,
we
copied
a
lot
of
the
pdb
code,
but
we
had
to
like
create
a
new
like
the
statuses
you
need
to
to
keep
monitoring
how
many
parts
we
have
per
easy,
how
many
different
options
we
have
per
Z
kind
of
for
the
status
that
we
used
was
pretty
different
from
from
the
what
we
have
in
the
PDP
one.
D
C: I would say this: I can at least understand... I can understand the use case for making stateful sets, or other workload orchestrators, aware of topology constraints during pod termination, right. But if you're going to do it... like, I don't know what that looks like for PDBs. We've just been offered a couple of one-off APIs that are very targeted towards zones, but I can't see how you could extend that to work with an arbitrary topology constraint.
C: PDBs are designed to work on label selectors, right? Like... yeah, I mean, is that even compatible? Would label selection be compatible with topology constraints? Because you're saying, specifically, 'these pods are grouped together, and I only want X number of them to be disrupted.'
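(Today's PDB shape, for reference: the budget is defined purely over a label selector, with no notion of topology grouping.)

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: kafka
spec:
  maxUnavailable: 2
  selector:
    matchLabels:
      app: kafka
```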
C: Then, that is kind of the way, like, the zonal constraints are specified for scheduling; you have 'preferred'...
C
Right
I
could
see
that
but
yeah
I,
just
like
I'm
I'm,
not
opposed
to
the
idea
but
I'm
not
seeing
no
I
haven't
seen
so
far
today.
Anybody
offer
like
this
is
what
we
think
an
API
would
look
like
and
handle
those
cases
which
I
mean
sounds
like
a
good
project
to
work
on.
If
we
can
come
up
with
how
that
works.
E: You know, my main concern about using the admission API to reject the eviction calls is that eviction essentially comes in from two sources, scheduler preemption and drain, right? So essentially, if you reject an eviction, and it's actually the scheduler who drives the preemption, that may lead to unexpected results, because the scheduler actually wants to preempt that pod, because it needs to find room for a high-priority one, but it's protected by your own custom PDB logic. But ultimately the scheduler needs to find room for the high-priority pod. So I think that could be a little bit concerning.
C
Wouldn't
I
wouldn't
do
that
in
the
general
case
right
and
then,
but
that
is
the
kind
of
thing
like
if
we're
going
to
solve
something
that
allows
pdb
to
be
a
bit
more
expressive
in
terms
of
like
preference
toward
Pi
termination.
It
does
affect
scheduling,
it
affects
eviction,
it
affects
draining
so
I'm,
not
saying
that's
not
in
it's
possible
to
do
that
potentially,
but
we
have
to
be
thoughtful.
F: I just want to throw a thought out there. So, I know with PDBs today, if you have overlapping PDBs, eviction is not allowed. Like, a potential option is allowing overlapping PDBs and kind of relaxing that constraint. I don't know how complicated that is, though, but that is a...
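(The restriction being referred to: when a pod matches more than one PDB, the eviction API refuses to evict it, since it cannot decide which budget to charge. For example, a pod labeled both app=kafka and tier=gold matches both budgets below and cannot be evicted today.)

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-by-app
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      app: kafka
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pdb-by-tier
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      tier: gold
```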
E: Fantastic segue: I have the follow-up item. I have the requested proposal, and I think I'm going to link it again: a request for review for supporting multiple, overlapping PDBs. It was assigned to SIG Apps a couple of weeks ago; I proposed it roughly a month ago. So I would love to get feedback on that proposal. Okay.
C: If you allow them to be too aggressive, then it's hard for people who are providing Kubernetes as a service to respect those PDBs when they're doing managed node upgrades, right; it can basically take forever. But again, I mean, like, yeah, if the community really wants it, I'm not super opposed to it, personally.
F: Back to the topic of the stateful set updates: one thought that occurred to me was that today, with stateful set updates, your updates are declarative. So you say: okay, I want to be able to update to this partition, X number of replicas updated out of N.
F
If
we're
doing
this
along,
say,
topology
aware
boundaries,
the
state
that
you
update
to
is
dependent
on
the
current
state
of
a
staple
set
and
the
current
state
of
those
pods
were
which
zone
they're
scheduled
to
I
think
for
say
a
staple
set
that
has
PVCs
that
are
assigned
to
specific
topologies
that
doesn't
matter
as
much,
because
those
topologies
are
fairly
static.
F
But
we
can
get
into
some
weird
scenarios
where
maybe
you're
rescheduling
and
you're
rescheduling
two
different
topology
group,
so
you're
not
necessarily
deterministically
doing
an
update
but
I
kind
of
wanted
to
get
like
as
a
general
principle.
The
city
I
have
70
thoughts
about
like
risks
there
in
terms
of
using
pod
state
for
these
updates.
I.
C: I guess here's my question. So, one thing... okay, so it depends on what your PVC is backed by, in terms of whether it actually is zonally locked or not, right. So if you're using, like, Elastic File Store in AWS, then you can move it all over the place; if you're using EBS... and then with Google, it depends on which version of PD you're using, right, like they do have multiple PDs. So the storage topology is a much longer conversation, which has been going on between SIG Apps, Scheduling, and Storage for a while.
C: Well, I guess I'd ask: why is it not declarative? It's clearly non-deterministic, right, which is definitely a change. But Deployments, when they're updating ReplicaSets, that behavior is non-deterministic at intermediate states, right? Like, eventually you converge toward having X number of replicas, but at any given point, regardless of what you specified as max surge or max unavailable... like, if you specify a max surge of five, you may end up at some point surging to six, just due to network partitions, right?
C
If
you
specify
A
Min
unavailable
two
or
three
you,
you
may
end
up
with
four
again,
just
because
of
network
partitions,
so
like
for
most
of
the
controllers,
the
behavior
in
turn,
it's
never
been
completely
deterministic
like
a
state
machine,
you're
guaranteed
to
go
from
X
Y
to
Z.
It
just
converges
toward
the
declarative.
The
users
declared
state
so
like
yeah
to
me
it
seems
declarative,
but
I'm
definitely
open
to
hearing
why
it's
not.
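(The Deployment analogy from above: maxSurge and maxUnavailable bound the rollout, but intermediate states are only eventually convergent.)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 5          # may transiently be exceeded, e.g. under network partitions
      maxUnavailable: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx
```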
F: Yeah, I guess the non-declarative nature I'm thinking of is, like: the zonal spread may adjust, and that may have effects on the application.
F
So,
like
maybe
from
the
like
staple
set
perspective,
you
know
we're
still
declaring
what
the
eventual
outcome
is
going
to
be.
But,
like
Drew
pointed
out,
you
know
there
are
some.
Maybe
some
rebalancing
risks
that
you
have
if
update
takes
too
long
and
that
sort
of
thing
so
there's
these
side
effects
that
may
result
if
we're
not
necessarily
rescheduling
to
the
same
Zone,
that
a
pod
was
updated
from
foreign.
C: I don't necessarily think that would change, if you mean... because really, what you're looking at in the proposal is basically: what we want to do is control termination so that we can burst it more rapidly across a particular failure domain, right? And because, in the case where you're on a cloud and you have zonal storage, you're still always going to get rescheduled to the same zone. That's just the way the scheduler is going to look at it: they're not going to put you on a node where you can't get the storage that you want.
F: Yeah, I think... unless you have, like, a capacity constraint during an update. Like, maybe you have a very aggressive autoscaler that scales down once resources are not being consumed.
C
Like
if
you
use,
if
you
use
cluster
Auto
scalar
on
AWS
with
EBS,
just
as
a
for
instance,
then
you
scale
down
to
zero
and
you
have
staple
sets
that
are
provisioned
and
looking
to
try
to
mount
volumes
in
the
zone
where,
where
it's
already
been
scaled
down
to
zero,
you,
you
can
get
unschedulable
pods,
that
just
kind
of
hang
and
break
so
that
that
is
already
a
thing.
H: I have a question. Okay, I just wanted to clarify the expectations about the next steps for the stateful set update strategy proposal. Specifically, I heard that we want to solidify the API in public, but I'm wondering what the bar is, like, what we're looking for specifically before we would accept a KEP as the forum for discussion on what that API looks like. Because what I'm hearing is that there are already a bunch of different companies who have either open source or closed source implementations of this, and they have some API that they're using; and whether or not they're able to share the code and have, like, competing, essentially out-of-tree, solutions to this, they can probably share the API.
C: It's really up to the contributors, right? Like, if you guys want to move forward with a KEP right now, we'll review it, and if we think we can be confident in it, and you're willing to contribute the code to make it run, we will shepherd it through. I'm just suggesting what you might want, if you have working code right now, right...
C
One
thing
that,
as
in
the
past,
help
motivate
contributions
to
move
around
rapidly
Intrigue
with
a
large
degree
of
success
is
to
do
it
as
an
open
source
thing
and
start
a
working
group
and
then
just
merge
it
and
after,
if
you
don't
want
to
do
that,
I
mean
like.
If
you
want
to
open
a
cap,
that's
fine
too,.
B: My remaining question with that is: I've not been part of the process where you start with open source contributions, like, I'm not used to that part. Like, we can, for example... I'm sure it's very easy for me to add what I need to Mariana's open source project, and then it's a team of two companies who, you know, have this one open source solution, and maybe we can use it for the other Elasticsearch use case for us, right. But then how do we reach out to other people?
B
How
do
we
say
Hey?
Listen,
this
exists.
Maybe
you
want
to
use
this
right
because
we're
not
advertisers
for
these
features.
We
don't
I,
don't
know
if
there's
a
forum
to
talk
about
this
or
or
people
engage
about
this,
so
my
concern
is
it'll
just
stall
at
that
point,
where,
like
it's
just
us,
we
figured
to
figure
this
out
and
then,
where
do
we
go
from
there?
We.
C: Specifically on, like, topology-aware updates: I would love to see that evolve to also respect the existing constraints. We could involve SIG Scheduling, because I'm sure they would be interested as well, and see if we can get some contributors from there; that would also kind of blend in, so that would be one avenue. If you look at, for instance, snapshots, right: like, that's a feature that has broad utilization across the community, and it's not in-tree at all, right? It is...
C: You know, if you feel like... like, I am hearing that people are saying 'maybe we have enough data.' If you feel like there is enough data from various users, you can also go and open up a KEP. My one pushback would be, again, from what I've seen presented today: the data you have seems very cloud-provider focused, and even the type of topology that you're talking about respecting is native to cloud providers, right, which isn't representative of the entire community of users.
C
So
just
looking
at
the
API,
you
have
I'm
like
that's.
That
may
not
be
ready
and
the
first
feedback
you're,
probably
going
to
get
from
the
broader
Community,
is
that
well.
This
is
very
different
from
the
topology
constraints
that
we
use
for
zonal
spreading,
which
are
aware
of
or
potentially
support,
arbitrary
failure
domains,
and
this
is
very
focused
on
cloud
provider
zones.
C
In
order
to
use
this
feature,
especially
when
the
scheduling
constraints
that
we
use
for
anti-affinity
and
affinity,
don't
work
that
way.
B: So, just two things on that: can we not use the same idea of just having, letting people specify a generic label, like we do for affinity and anti-affinity? Can we not just adopt that?
C: Bring that back in-house, use it at AWS, use it at Shopify, demonstrate that you've got the right thing, and then that's a strong motivator for, like... well, and it's Google too. I mean, you have two out of, like, three of the largest public clouds already using it, on top of, you know, some major tech companies like Shopify already using it. It doesn't make sense to not take in a KEP at that point, right?
C: That's kind of my two cents. Now, the other thing would be, like: if we find it's too difficult to do it out of tree, because if you want to use stateful set as a primitive, if you're not trying to fork stateful set and you want it to be a rolling update mechanism that's embedded in the controller itself, it may prove that, like, okay, the only way to do this is an alpha feature inside of stateful set, in which case, you know, we do a KEP and do it that way too.
C: That's also not a bad path. But it seems like we were able to do this out of tree already, at least in several cases. Having one method that, you know, is like 'this is great, it works for all three of us,' as opposed to three different things, and then saying 'well, this is what we would like to take as an API and offer to the public in general to utilize,' would be a very strong case to me.
A: Okay, thank you.
A: A different venue that you might want to reach out to: Michelle from SIG Storage reached out to us last time; there is a link to the Data on Kubernetes community. They are specifically talking about running storage-related applications on Kube, so it may be worth syncing with them, reaching out to them as a group, and gathering their additional feedback.
A: Maybe they will be able to either confirm your current approaches, or add additional use cases that you might not have thought about before, or basically give more info.
A: Okay, cool. The two other items I see, from Raul and Ilya: both of them have been brought up a couple of times before. I remember, Ken, that you were requested to look at the PVC recreation from last time; I'm not sure if you had any chance, but it would be nice for you to have a look. And probably similar questions for Ilya, about the multiple PDBs on a pod.
C: Yeah, we... I think... look, I think I was aware of the multiple PDB one; the... it's the PVC recreation, or PVC deletion and recreation.
A: Thanks a lot. Does anyone else have any other topics that they want to bring up with the group?