From YouTube: SIG Cloud Provider 2023-07-05
Description
Meeting Agenda: https://docs.google.com/document/d/1OZE-ub-v6B8y-GuaWejL-vU_f9jsjBbrim4LtTfxssw/edit#bookmark=id.qkklje7sz781
A
All right, welcome everyone. Today is Wednesday, July 5th, 2023, and this is the Kubernetes SIG Cloud Provider meeting. SIG Cloud Provider is a sub-project of Kubernetes SIGs, and as such we follow their meeting guidance, which essentially says: please treat each other as you would expect to be treated, which is to say, please be kind to each other. Also, please raise your hand if you'd like to talk, and I will call on you.

A
So I would ask everyone to add yourselves to the attendee list on the agenda here. Yeah, Bridget, go ahead.

B
Mine! That was actually mine. It was just positioned perfectly to look like it was yours. So exciting and challenging. Awesome, all right, moving on.
C
Yes, I'm here. Yes.

A
Yeah, do you have any updates for us, Kirsten?

C
No, I mean, since the last meeting there's this discussion to add AWS, I mean, to have, like, the three core providers for testing in the upstream path, and we plan to work on that.

A
Awesome, that's great to hear, and welcome to the group, Kirsten. Thank you. Bridget, you still had your hand up. Did you have something you wanted to add? Or Azure is next anyway.
B
Yeah, I tried to lower my hand, if it's still up; sorry about that. I don't know, I'm apparently having a lot of Zoom challenges today.

B
But no, I don't have anything specific for provider Azure today. I put some agenda items in about other topics, but I don't have anything for Azure today. It looks like we haven't had a release since the last time we discussed it, probably because of the various holidays.

A
Cool. GCP?
E
Yeah, just adding a note about the ongoing effort to try to get the cloud controller manager running by default in Kubernetes.

A
Yeah, thanks, Andrew. I don't think we have anyone here from Huawei Cloud. IBM Cloud? I think we've gone through everyone. OpenStack?

A
vSphere? Okay, all right, so we'll go down to the regular agenda items. The first item is carried over from the last meeting, and this was from Bridget: we need to fill out the annual report. Yeah.
A
Okay, so the next topic is about the requirements to graduate the DisableCloudProviders and DisableKubeletCloudCredentialProviders feature gates to beta. I don't know who added this one.

A
I think we were talking about it last time. I have reached out to Dims and Antonio and Ben the Elder about coming to one of our meetings to talk about some of the next steps we need to take to be able to get that enhancement across the line.

A
I don't think Antonio could make it this week, but I think he's going to come next week, or at the next meeting, to try and help us out. But we probably need to coordinate a little bit more and figure out what we want to ask and everything. I think right now we had just talked about having a general conversation with them.
E
Is it just a matter of getting the templates updated and getting the KEP in good shape? Because it's a pretty old KEP, right?

B
I think there are docs requirements; it got rejected because we weren't in compliance with the current KEP standards for docs for moving to beta.

E
Yeah, I'm debating whether it would be worthwhile, instead of trying to get the current KEP in shape, to just start from scratch from the template and then backfill the information. Because I haven't looked at the KEP for a while, but I don't remember when that KEP was written, and the template has changed quite a bit since then anyway. Yeah, okay, we can chat about it more later, but.
B
Andrew, do you want to start with that action item, or do you want Elmiko and me to just start with it? What's your thinking? I mean, I'm not trying to be a blocker and volunteer for more stuff, but I have written KEPs and progressed stuff in KEPs over the last several releases, so I know the process has changed a little. But yeah, I'm just wondering if you want to start with that or if you want us to start with that.

E
Yeah, I guess I was wondering... I guess it'd be dependent on what the blockers are. If it's just a matter of getting the KEP in good shape, then yeah, it'd be great if

E
whoever has cycles can kind of take it. If there were specific blockers or requirements, not just things missing in the PR, but some prereq, some work that has to happen to pass PRR, then it probably makes sense to do that first, but.

E
We'll have more clarity on that when we have a chat with Dims and Antonio.
A
Okay, yeah, I mean, I don't necessarily know if I would have time to do that, but I could certainly take a look at the template.

A
Okay, so it sounds like we probably need a little more information there, and then we'll probably need to redraft it. Okay. So we talked about this a little last time: Nick's role in the SIG. I don't know, Kirsten, maybe you have more insight on this, or... I'm not sure what we have to say here.

C
I mean, he's been on paternity leave for the last month or so. I could talk to him; I messaged him. So I can check with him when he's back, which is most likely next month.

E
When he's back, let's just maybe have him come to whichever meeting.

A
Okay, all right, sounds good. All right, so next we've got a topic from Joel here: gracefully releasing the leader election lease on shutdown. Go ahead, Joel.
F
Yeah, so I don't know how much people know about the leader election code in kube, but there's this option called ReleaseOnCancel. What ReleaseOnCancel does is, basically, when the pod gets the signal to shut down, it releases the lease. Typically the defaults are pretty short, so if you don't bother with this, then, you know, you're only waiting 10 seconds or so. But we do expose the knobs to go and configure those, and we make those longer in OpenShift.

F
We have some pretty long leader election leases because of some math that David Eads did at some point, and it means that, actually, what we were seeing is that in the worst case, during an upgrade, you can end up with about two minutes of downtime based on these things. Now, for the CCM, the issue there is that you've got nodes going in and out of load balancers.

F
While this is down, you know, we were looking at upgrades and stuff adding new nodes, so it would be better if we could do the release on cancel. So I wondered if anyone has done any research into it and worked out whether it may or may not be safe; and if that hasn't been done, does anyone object to us looking into that in 1.28, to see if it's something we can enable safely?

F
Just the regular leader election between multiple copies of the same CCM, so yeah, the standard version.

F
The kube controller manager is working towards that. They found some issues with certain controllers that weren't shutting down gracefully, so they haven't enabled it yet. Basically, to enable it, you have to make sure that all of your controllers pass through a context and will shut down correctly when that context is cancelled.
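For those following along, here is a minimal sketch, using client-go's leader election package, of the option being discussed; the lease durations are illustrative stand-ins for the "long lease" knobs mentioned above, not OpenShift's actual configuration.

```go
package sketch

import (
	"context"
	"time"

	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func run(ctx context.Context, lock resourcelock.Interface) {
	leaderelection.RunOrDie(ctx, leaderelection.LeaderElectionConfig{
		Lock: lock,
		// ReleaseOnCancel releases the lease when ctx is cancelled
		// (e.g. on SIGTERM), so a successor can acquire it immediately
		// instead of waiting out a long LeaseDuration.
		ReleaseOnCancel: true,
		LeaseDuration:   137 * time.Second, // illustrative long lease
		RenewDeadline:   107 * time.Second,
		RetryPeriod:     26 * time.Second,
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				// Controllers started here must honor ctx cancellation;
				// releasing the lease early is only safe if they do.
			},
			OnStoppedLeading: func() {
				// With ReleaseOnCancel, the lease has already been
				// released by the time leadership is reported lost.
			},
		},
	})
}
```

As the discussion notes, the safety of turning this on hinges entirely on every controller actually stopping when the context is cancelled.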
E
Does that mean the KCM has the same one-minute-or-so issue?

F
In OpenShift we actually keep the KCM at the default, so it's much shorter, because of this known issue. But I was asked to ask, you know: could we do this for the CCMs? Because then we can leave it on the longer leases.

F
One of the reasons for the long leases is, obviously, if you've got lots of components in your cluster and they're all regularly updating and retrying leases every three seconds, it's quite a lot of load on the API server, so we're trying to mitigate that a little bit.
E
Okay. You could probably just do it in kubernetes/kubernetes; the cloud provider repo is mainly, like, a staging mirror of that anyway. But yeah, I think we'll get more visibility in the main repo.

F
My understanding for the KCM is they're just going to flick the switch on it. I've seen other controllers expose this as a command-line flag, so that end users choose when they're configuring it, but I think we don't expose it in the CCM libraries, and the KCM, I don't believe, does right now. Okay.
F
Cool. I actually have two more topics, which are related, but I added the last one after Walter's, right at the bottom. So regular attendees know I've been banging on about this Azure 3499 issue for a little while. We actually started seeing the same behaviors on AWS recently, so Alexander has joined us today, who has been working on a lot of improvements to how disruption is handled with load balancers and CCMs, and the configuration that we make.

F
One of the changes that he made has to do with reducing the number of times that an externalTrafficPolicy: Local service reconfigures the set of nodes that are attached to the load balancer. When that fix went in, it actually broke AWS. Previously, nodes would be reconfigured when the node went ready or unready, and so that's fine: a node goes unready,

F
AWS removes it from the load balancer. With the change that Alexander put in, now, for the externalTrafficPolicy: Local service, that node set is only reconciled when a node is added, or when there's a special taint or a special label you can use to exclude it. And I think something to do with provider IDs has just gone in as well, but that's by the by.
F
That took us to, like, 15 to 20 seconds of disruption, reliably, when we introduced the rebase onto the 1.27 branch. So I've proposed a fix for AWS, and basically what it is: if a user has done nothing for the health check, they have specified neither the port, nor the protocol, nor the path, move over to the same health checking that GCP does, which is to use the kube-proxy endpoint; and if they have specified anything, retain the old behavior.
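In pseudocode terms, the fallback Joel describes might look like the following sketch; the type and function names here are hypothetical, invented for illustration, and the real AWS provider reads these settings from service annotations.

```go
package sketch

// healthCheckSpec is a hypothetical stand-in for the user's health
// check settings; empty fields mean the user did not set them.
type healthCheckSpec struct {
	Port, Protocol, Path string
}

// chooseHealthCheck keeps the old behavior whenever the user expressed
// any opinion at all, and only otherwise switches to the kube-proxy
// style check that GCP uses.
func chooseHealthCheck(user healthCheckSpec) healthCheckSpec {
	if user == (healthCheckSpec{}) {
		// No user opinion: probe kube-proxy's healthz endpoint.
		return healthCheckSpec{Port: "10256", Protocol: "HTTP", Path: "/healthz"}
	}
	return user // any opinion at all: old behavior, unchanged
}
```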
F
So, in theory, this shouldn't be a breaking change. With the testing that I then did on this, we were getting zero seconds of disruption in our testing, zero to one seconds on the e2e runs we did. So I think this actually improves on what we had prior to the changes that went in, and I believe this is related to another KEP that Alexander has been working on, 3836, which recommends this kind of path forward as well. So this is the PR for AWS which I wanted to bring up, KFAN.

F
If there's anything you can do to help move that along, that would be appreciated. And I did also want to ask Bridget as well: there is a PR to do the same thing on Azure. Last I heard, there was some internal discussion about whether it would be a breaking change or not, and I haven't seen anything since. Yeah.
B
I did talk to Pengfei last week, and I'm trying to get an update surfaced over on this issue; it just isn't there yet. So I'm going to chase that and make sure it lands. Time zones are what they are, and there was a very inconvenient U.S. holiday, which you may know something about, something about us and some colonialism, and that got in the way of this week being at all useful in terms of cross-globe collaboration.
E
Sorry, I wasn't fully following all of that, but I'm trying to wrap my head around the problem. So the problem is, basically: we made a change where we don't consider the node status when the service controller evaluates the nodes for the load balancer, and that causes a problem only for traffic policy Cluster, because we don't actually check the health check.

F
There are a couple of ways you can check this. For ETP Local there's one endpoint, and Alex, do chime in if I'm getting this wrong. There's one endpoint; but then, if you're wanting to do a Cluster policy, if you have kube-proxy running, and it is running on the node, it can route to any backend pod within the cluster. So all you need to do is check the node's ability to route. So what do you need to check?
F
Now, what we've seen, with the combination of that and the graceful node shutdown feature, is that when you go to shut down the node, kube-proxy gets the signal to shut down. You configure a graceful termination period on the pod; it will then start failing the healthz check, but it is still able to route that traffic to the backend.

F
By failing the healthz check, it still has the ability to route that traffic, but it's saying it doesn't. After 10 or 15 seconds it gets removed from the load balancer's healthy list, stops getting new connections, and then you don't drop any connections. Whereas when you check the node ports, the node port will always route down to the pod's health check, and it doesn't matter, for an ETP Cluster service, whether that's on the node you're on or a different node, so that will always report healthy.
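The GCP-style check being referenced amounts to probing kube-proxy's healthz endpoint from the load balancer. A hedged sketch of that style of probe, assuming kube-proxy's default healthz port of 10256 (the port and timeout here are assumptions based on defaults, not provider code):

```go
package sketch

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// probeNode asks kube-proxy on the given node whether it can still
// route load-balancer traffic.
func probeNode(nodeIP string) bool {
	client := &http.Client{Timeout: 5 * time.Second}
	url := fmt.Sprintf("http://%s/healthz", net.JoinHostPort(nodeIP, "10256"))
	resp, err := client.Get(url)
	if err != nil {
		return false // unreachable: stop sending new connections
	}
	defer resp.Body.Close()
	// kube-proxy answers 200 while it can program traffic, and starts
	// failing this check during graceful node shutdown even though it
	// can still route, which is what lets the load balancer drain the
	// node before connections are actually dropped.
	return resp.StatusCode == http.StatusOK
}
```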
F
Cool. This also then leads on to the next one, and I'm sorry to skip Walter's topic. Part of the changes that Alex has been making, which I've read through the KEPs for and agree with, and he also did a great talk a few KubeCons ago, which I recommend you watch

F
if you want to understand this better: we no longer have this check of ready or not ready. One of the changes that has gone in is that the service controller will look at the set of nodes that exist in the cluster, and, in an attempt to reduce the number of cloud provider calls, I believe it now creates a cache of, you know,
F
"these are the nodes I sent last time to the cloud provider", and if, according to its set of predicates, that list doesn't change, it doesn't go and tell the cloud provider to keep updating the list. So it is the one making the decision: should this node be in the cluster's load balancer or should it not? And the predicates, for ETP Local... there's a new feature gate, which I believe is on by default now, called StableLoadBalancerNodeSet.

F
It only looks at the taint, the label, the provider ID, and something else; it doesn't look at the readiness of the node. So on Azure and GCP this is fine: it calculates a set of nodes, and whenever that changes it sends it to UpdateLoadBalancerHosts, and they just pass through that list.
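An illustrative reading (not the actual service controller source) of the stable predicates described here: membership in the load-balancer node set depends only on stable properties of the node, and readiness is deliberately not consulted.

```go
package sketch

import v1 "k8s.io/api/core/v1"

// includeNode decides load-balancer membership from stable node
// properties only.
func includeNode(node *v1.Node) bool {
	if _, excluded := node.Labels["node.kubernetes.io/exclude-from-external-load-balancers"]; excluded {
		return false // explicitly excluded by label
	}
	for _, t := range node.Spec.Taints {
		if t.Key == "node.cloudprovider.kubernetes.io/uninitialized" {
			return false // cloud provider has not initialized the node yet
		}
	}
	// Note what is absent: node.Status.Conditions is never read, so a
	// node flapping between Ready and NotReady no longer churns the set.
	return true
}
```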
B
Actually, when you mention it being on by default: do you mean because any beta thing is on by default? Because that changed, and now things aren't on by default unless you explicitly make them on. Anyway, when you said it's on by default, I was like: oh, I don't know what stage this is at, but if it's beta, that's no longer the default, to be on by default. Okay.
F
I've just checked the code: it is on by default. Fantastic, okay. So it's on by default. Basically, the CCM no longer checks for readiness, and it will only send when it sees the node set update, so basically when a node is added or removed. Now, if you look in the Azure code, unfortunately the Azure code is doing its own checks again: is this node ready, is this node not? So it has this cache of

F
"should I exclude this from the load balancer", for various reasons, one of which is the node readiness. And basically what we're seeing is that when the CCM sees, oh, here are some new nodes, it goes: my node set is different. It calls update load balancer; Azure's cache says these nodes aren't ready, so don't add them to the load balancer. And then eventually those nodes turn ready, but the CCM doesn't see that change, because it's not looking at the readiness, so it never says "go and update the load balancer". Azure's cache internally says: yep,
F
these are ready to go, but it never updates. So we've been reliably reproducing this for, like, three days. I've been looking at this: every time I create a cluster and add the worker nodes, they just don't get added. There are a few ways you can trigger it. So if you do want a node to be re-added, you can restart the CCM.

F
You can add another node, because that will trigger it to update the load balancer list, but the new node won't get added, only the old ones that are now ready. Or you can wait for the full resync, which I think takes 10 hours. So again, I've written up an issue; this is related to the previous one. I think, if we can get those two PRs merged that fix the health checks, we can get rid of the node readiness checks in Azure. I think it needs a little bit of re-architecture there.
F
So I definitely want some input from the maintainers of Azure on that, but I wanted to bring it up here today: one, to put some visibility on it, and also in case there are any other cloud providers who happen to be using the node readiness as a condition when adding their nodes to the load balancer. So, basically: don't do that anymore. Rely on the hosts that the CCM gives you when it calls update load balancer, and just apply that, and everything is jolly from that point.
F
Yeah, I think I've written some details on the issue. I've gone into as much detail as I can, going through the code, of where I see the issues. The way that I was reliably reproducing it is to create a cluster and create a single ETP Local service; with that, it was reproducing reliably. As I say, all the issues are written up in 4230, so yeah.

F
Hopefully we can come up with a fix, but I think the first thing to do is, for AWS and Azure, all the pulls that I mentioned in the last topic.
E
Joel, I think you're ahead of everyone here, because it's going to be, like, maybe a year or two before a lot of managed services are rolling out 1.27. So you're probably catching this issue ahead of the crew.

F
4230, the one I was just mentioning, we have also reproduced on 1.26, but it's not present in 1.25.
E
What's the reason for not looking at the node readiness? Was it just causing too much churn?

D
No, there were other bugs as well with regards to that. For example, a node can flap between NotReady and Ready on its readiness conditions, while the application, the pod running on that node, is healthy. So essentially there isn't really a problem with the node, but for whatever reason the node object is flapping state. Well, in that case the node gets removed from the load balancer, and if it's ETP Local, then ingress for that pod is essentially cut, right?

D
So there's ingress downtime there. That was the original issue, and it's why this fix was implemented in 1.26, yeah.
F
I think one of the reasons it was added, you know, is because of these health checks, and health checks are hard, right? But I think we're getting there with that. I think that's what's written up in 3836, or whatever the KEP number was, I can't remember now.

F
You know, there are better ways we can do this that mean the disruption is getting to zero, and, you know, I've been testing on AWS and Azure recently. OpenShift does a bunch of statistical analysis on the load balancer disruption when we do upgrades, right; that's what we care about, our customers don't want their pods disrupted. And we've had some serious spikes from, you know, GCP, with a few bugs there, but AWS and Azure have consistently been
F
you know, four to ten seconds of downtime. With the changes that we're working on here, we're getting reliably zero downtime, which is just what we want, right? That's the goal. So yeah, I think the work that has been done here is good. We just need to get it in and make sure we don't break people, and, I think, get the word out to get people using these new health checks and the graceful node shutdown stuff, so that we can get people good disruption metrics.
B
Thank you so much for working on this, Joel. And I kind of wonder, when you mention we don't want to break people: one of the things that probably all the managed services will need to do is make sure we have a path for people that is either well documented or backwards compatible. You know what I mean? Nobody wants to surprise their customers with "oops, your clusters are broken". So yeah, even if the change is a good change, we can't make a surprising thing happen.
F
Yeah, I totally get that, and I think so. There are a few variables in a cluster, right? One of them is you may not have kube-proxy. I think the kube-proxy health check is implemented for most CNIs, but I know, for example, OVN-Kubernetes doesn't have kube-proxy, but they mimic the kube-proxy health check endpoints, so you can still do this style of health check.

F
So as long as people have got that, and it's working, we can make this work. But you're right, we don't want to do anything in a backwards-incompatible way, and I think that's why both the Azure and the AWS PRs that we put up make sure that if a user has any opinion at all about how their health check should look, we just keep the behavior as it was before. It's only if they have zero opinion that we swap it underneath them, and hopefully they just go along fine.
E
Yeah, it almost sounds like we're trying to do all this sophisticated orchestration due to a lack of proper health checks. So it makes me wonder... I think the change you're proposing tactically makes sense, based on the existing behavior

E
that's out there, and it's not trying to break people. But it also makes me wonder if there needs to be a discussion with SIG Network on enhancements, not just to kube-proxy but to the service spec, on whether we want a better health checking API. So one thing I can think of: instead of using kube-proxy's single healthz endpoint, kube-proxy could have a dedicated port, and then it would have a path with the service namespace and name for the actual health check of that service, and the behavior of that endpoint could change based on whether it's a Local or Cluster traffic policy. But it sounds like,
E
even if we had that, or if we had the assumption that all nodes have that health check, then we could also make a lot of simplifications to the service controller, and it would be a lot safer to just completely ignore node readiness, and probably a lot of other conditions too, right?
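To make the shape of that idea concrete, here is an entirely hypothetical sketch; no such per-service endpoint exists in kube-proxy today, and the port and path scheme below are invented purely to illustrate the proposal.

```go
package sketch

import "fmt"

// serviceHealthURL shows the proposed shape: a per-service health path
// keyed by namespace and name, whose semantics could differ between
// Local and Cluster traffic policies. Hypothetical, not a real API.
func serviceHealthURL(nodeIP, namespace, name string) string {
	return fmt.Sprintf("http://%s:10256/healthz/%s/%s", nodeIP, namespace, name)
}
```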
D
What you're saying is something that Tim Hockin and I have been discussing for a while now, since we started working on this. SIG Network is the primary SIG, I would say, that has been involved in all of these discussions for about a year now, and it's been mainly Tim. But yeah, it's known, so to speak, at least from the point of view of SIG Network, I'd say.

F
Yeah, I was going to say: the message from SIG Network doesn't seem to have made it to SIG Cloud Provider, in a way. I think that's why we're seeing these bugs coming through, right? The changes have gone into the core cloud provider library, and then we've had updates to the providers, and that's when we've gone: oh, this doesn't work anymore.
D
We've been pinging people, at least pinging SIG Cloud Provider and such, on both the KEPs and the PRs that have gone in implementing them, but yeah, there hasn't been a lot of traction, so to speak. And I guess maybe that's normal, since we're not really joining meetings and coordinating personally on things; if you just get a ping on GitHub, it's easily overlooked.

F
It's also the fact that there are so many different cloud providers, right? Like, implementations need to be updated for this. But yeah, I think if we bring more awareness into this group, we can keep better track of it and make sure that the different implementations are getting the updates they need.
A
Okay, and I was just going to see if I could... was there a link in chat that I should click on here?

F
Yeah, that's cool. I'll copy some links into the right places in the agenda.

A
So I guess this is the KEP we should be looking at, though, that talks about this behavior. Basically... was that the story around this KEP?
D
So it's interesting: there were two KEPs. This one deals with health checking from the point of view of kube-proxy and how it should handle graceful shutdowns of nodes. The other KEP, let me... 3458. You've posted it, much quicker than I am; there you go.

D
That was the KEP that implemented, or led to, the change that we've been discussing concerning readiness for the node object, the ready state of the node, and why it was removed from the KCCM's service controller.
A
Okay. So, Joel, do you feel you've had a good discussion here? I think the follow-up is to get some reviews on these two pulls that you had, but it also sounds like there's kind of a bigger question here as well that we just need to keep talking about. I guess... is there an action item on the larger topic of, like, the general behavior?

F
You know, bringing it up here hopefully flags it to the other maintainers, but I think it's something we need to be aware of. And yeah, obviously, there's giving Bridget some homework to go and find out some stuff on that, those PRs for me, and then obviously the AWS one as well.

A
Thank you for bringing such a well-researched topic, with the history, here. That's greatly appreciated.
D
I join that meeting as often as I can, so I'm a bit more involved there, but I could start doing it here as well, I guess. Yeah, that'd...

B
But so, I'm not at that one as much as I could be either. But yeah, Alex, also, if you want to: right there, where we have the action item about a reviewer on the related KEPs, would you mind commenting or putting a link to the exact specific ones that we're talking about, just so that when we follow up we have the exact links?

F
It's something I have an interest in, so I'm happy to follow it and try to bring stuff back if needs be.

F
I just want to get to the zero-second disruption.
A
Okay, all right. So the last topic we have here, from Bridget, is about the upcoming KubeCons. Take it away, Bridget.

B
Yeah, absolutely, thanks. So I know that it's July and these things seem like they're a million years away, but sadly they're not, and the deadlines are coming up to put our maintainer track topics in. And I did chat with our colleagues who are going to be at the KubeCon in Shanghai, and got an outline of what Pengfei would be willing to give there.
B
Because there hasn't been an in-person one in China for some time, but there is going to be one this year, and maintainer track is available, so we could... And my question, basically, is: how do people feel about a maintainer track session in Shanghai on this topic? And then, would anyone else be in Shanghai and want to participate?

B
And then the related question, of course, is: who plans on participating in the maintainer track in Chicago? Who plans on being there? It's, you know, in the U.S., so maybe travel is easier for some folks, etc. So that's just kind of the overall question. It looks like Elmiko is putting in, in real time, that he thinks that's a good idea. Again, I had a nice talk with Pengfei.
B
That was basically: we need to... and I sent him the video of Michael and Joel and me in Amsterdam and said, a talk like this, with some technical depth, with the topics that are of interest to you. And it can't be Azure specific, though of course you can use examples, because this is intended to be instructional for people who are watching the videos or participating live at KubeCon Shanghai. And he took it, ran with it, and came up with a few topics there. So, basically, yeah.
E
That's the hope, but I don't know yet. So usually, yeah... like, travel policy right now is kind of in flux. For KubeCon Europe it was kind of prioritized for folks that are actually local to Europe, so I'm more hopeful about this KubeCon, but I haven't heard any details, so I don't know.

B
Okay. I know what we did last time was, I think, Andrew and Nick put in the maintainer track proposal, and then we ended up swapping out the speakers based on who could be there, and they were generally okay with that. Do we...
B
Do we want Elmiko and me to put in at least a placeholder for Chicago, with people we think will be there? And then maybe... I would love to have people who haven't given the maintainer track update, or at least not lately, give it if possible. And this is not me saying Elmiko and I can't be up there constantly; we certainly can, but it would be super if, say, Andrew could be up there.

F
My travel budget is not looking good. I will certainly ask, but...
B
Right, yeah, okay. Well, we can see who it is going to be possible to put on it, but it would be super if we had, you know, Andrew, etc. All right. Basically, I wanted to talk about that, and then I will sync with Elmiko separately and get the proposals hammered out and turned in, and to Pengfei, because I think having us give a SIG Cloud Provider update in China would be really superb.

A
For sure. Okay, cool, that brings us to the end of the agenda. Anything else people want to talk about?
B
Walter is the main person, so we'll definitely move that topic about the cloud provider test accounts. But what he wrote about Azure sticking out: I was just looking at that, trying to look for more information. So I know that there was a...

B
There was a whole thing about the infrastructure that was donated and all this stuff, and that might be in the details, but in case Walter watches this or, you know, reads anything later: I want to make sure we actually look into that.
A
Yeah, I don't have any further information on that. I'm guessing that he was calling it out because, yeah, vSphere can be run on top of something else, and, you know, the GCE costs, I don't know how those get calculated, but...

B
Maybe just see if he wants to either come to the next one or sync with us between meetings, just to make sure we follow up on this, because it sounds important.

A
Okay, well, with that, thanks everyone for coming out. I will work on getting the video uploaded, and hopefully it'll be up soon. So yeah, have a good one, until next time.