From YouTube: 2020-10-01 CAPZ Office Hours
A: All right, fine. Hello, everybody, welcome. It's October 1st, 2020. This is the Kubernetes Cluster API for Azure meeting. We are being recorded, and this is a Kubernetes SIGs meeting, so please be on your best behavior. Basically, let's try not to talk over each other, and maybe use the raised-hand feature, although if it's just the four of us, it's probably not that urgent.
A: And please add your name to the attendees list here, so we have a record. There are no new members or attendees as far as I can tell, but if somebody wants to say something random or unrelated, this would be a great time.
C: Yeah, I added this at the top. Just wanted to mention that the 0.3.10 CAPI release just dropped, and I think we had e2e signal on the CAPZ PR that Nader had opened.
C: Yeah, great. And then we'll follow up with the CAPZ release, probably tomorrow or early next week, depending on whether there are any PRs waiting. Cool.
A: Yeah, in my testing I've been rebasing off of that, so that'd be great when it's final. Want to move on to Nader?
D: So this is the one where, in the last meeting two weeks ago, we said we would increase the timeout and see how it would look over a period of two weeks, to see if the flakiness is any less. I think it's been less, but in the last couple of days it has been failing again consistently, and in that PR with 0.3.10 I had to increase the timeout. It kept failing until I increased that timeout again.
D: So if you don't mind clicking on the testgrid link, it will show the pattern. I haven't investigated yet. I mean, it's been failing on timeouts, but I haven't looked into what happened. I thought the link here in the description would take you to the actual grid of the test.
D: I would have thought that with the change to delete that Cecile made, things should have been faster. I mean, tests in general are faster, but this upgrade one fails while waiting for the upgrade to happen, so it's kind of not related to this specifically.
D: Looking into it again, this one, the upgrade one, the KCP upgrade in the Cluster API test, has just been timing out consistently. So last meeting we said we'd bump the timer where it waits for the upgrade to happen and see if that's going to fix things. It fixed things a little bit, but not 100%.
D: Yeah, because all the times it failed, you find it saying it's still finding the old ones, they're not gone yet. That's the part that usually fails: the last part of the upgrade, where the test checks that the old ones are already gone. I'm just bringing it up.
E: Also, the delete is, you know, definitely sub-optimal, right? We delete one thing, wait, try the next, do the next, wait, next, wait, when in fact we could probably take a slightly different approach and just do the delete with no wait, and just try it across everything.
E: Even if we get an error, then just hit the reconcile loop again. Just say, hey, let's time out for 15 seconds or something like that, try to do the delete again with no wait, and just keep cycling through like that. The problem is we'll eat up Azure API calls, but we might be able to find a nice balance.
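A minimal sketch of the "no wait, cycle again" delete loop being proposed here. The type and function names are illustrative only, not the actual CAPZ reconciler:

```go
package main

import (
	"fmt"
	"time"
)

// deleter issues a non-blocking delete for one Azure resource and
// reports whether Azure has finished removing it.
type deleter interface {
	Delete() error // fire the delete, do not wait for completion
	Gone() bool
}

// reconcileDelete fires every delete on each pass and asks for a
// requeue instead of blocking. It returns 0 once everything is gone.
func reconcileDelete(resources []deleter) time.Duration {
	pending := 0
	for _, r := range resources {
		if r.Gone() {
			continue
		}
		pending++
		_ = r.Delete() // an "in use" error is fine; the next pass retries
	}
	if pending == 0 {
		return 0
	}
	return 15 * time.Second // hit the reconcile loop again shortly
}

// fakeResource pretends to need two passes before Azure reports it gone.
type fakeResource struct{ deletes int }

func (f *fakeResource) Delete() error { f.deletes++; return nil }
func (f *fakeResource) Gone() bool    { return f.deletes >= 2 }

func main() {
	nic, vm := &fakeResource{}, &fakeResource{}
	for pass := 1; reconcileDelete([]deleter{nic, vm}) != 0; pass++ {
		fmt.Println("pass", pass, "requeued")
	}
}
```

The trade-off discussed next, dependency ordering and extra Azure API calls, is exactly what this pattern has to pay for.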
C: I actually looked at it when I was looking at deletion optimization. It's really hard to do that, because all of them depend on each other. What we're deleting is the network interface, the VM, and the disk, and that's it. And the VM depends on the network interface, and the disk depends on the VM, otherwise it's left behind. So you can't really delete one before the other is completely deleted, and that's what we're doing right now. So I feel like just adding a "try to delete everything"...
E: ...I don't know, 128, and then it takes a big jump to 256 or something. So it could very easily, like... Sometimes it just works beautifully, and then other times we get this really nasty exponential backoff. We can set the upper limit on that backoff, or we could change the backoff to be linear. It really depends. We can do some things.
E: We could instrument it, we could log it. Like, what do we want to do? Do we want to just throw it into logs, low-tech, or do we want to actually start adding Prometheus metrics or something?
C: Yeah, and then there's also the thing that Brian showed at the office hours, using Jaeger. I don't know if that's also an option.
C: Sounds good. I can try to get a local repro in the meantime, at least try to do that.
D: On Azure, I guess, unlike a different thing, on Azure if I'm running other stuff... For our case, at least, ideally I would want Prometheus running somewhere else, like in a different instance, and then I'm sending to it, and then everybody can install it wherever they want, kind of thing.
E: Yeah, so what if we had a log outputter for metrics? So if you wanted to, like, OpenTelemetry supports different outputters. Prometheus usually scrapes, right: you have an endpoint that Prometheus goes out to and says, hey, tell me your metrics, and then Prometheus collects them and processes them.
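To make the scrape model concrete: a pull-based endpoint is just plain text in the Prometheus exposition format served over HTTP. This is a stdlib-only sketch, with a made-up `capz_reconcile_total` counter name, not the project's real metrics or the Prometheus client library:

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// reconcileTotal is a hypothetical counter the controller would bump.
var reconcileTotal atomic.Int64

// renderMetrics returns the counter in the Prometheus text
// exposition format that a scraper expects to find at /metrics.
func renderMetrics() string {
	return fmt.Sprintf(
		"# TYPE capz_reconcile_total counter\ncapz_reconcile_total %d\n",
		reconcileTotal.Load())
}

// metricsHandler is the endpoint Prometheus would come and scrape.
func metricsHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, renderMetrics())
}

func main() {
	reconcileTotal.Add(3)
	http.HandleFunc("/metrics", metricsHandler)
	fmt.Print(renderMetrics())
	// In a real controller you would now serve it:
	// http.ListenAndServe(":8080", nil)
}
```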
E: The reason I said App Insights is because that allows us to push those, and it's a managed service that runs inside of Azure that collects them, right. But I do like the Prom thing better, because then that gives us the ability to just say, hey, it's here; you don't have to have access to any, you know, possibly permissioned resource or something. Is that kind of what you're pushing towards, right, Cecile?
E: Yeah, I think it would be really weird to try to expose the Prometheus endpoint outside of the cluster to, say, some other Prometheus service that's going to be coming in to gather things up. Because if it doesn't gather them up on a timely basis, and the controller ends up getting destroyed before it could get the metrics out, it would be difficult to...
E: I was working on the encryption secret stuff, but I think we're waiting on Nader, and I don't think there's much out there right now. So unless there's something more pressing, I'm happy to do this. I'd love to.
E: Me. And based on the secret stuff, do we want to push that out to the next milestone?
A: Right, cool. Are we done with this topic, or...?
A: It basically just boils down to... and I'm just looking for some fresh ideas, because I think I see a way to get it done, but I'm not at all happy about it. It basically boils down to: there's just a callback method and some other little scaffolding you set up inside the e2e tests, and then the callback gets called when we're dumping and collecting logs from the clusters we create. And then inside that, obviously, you want to connect to the node and basically do a systemctl.
A: I looked at Bastion for a while, but it's not clear to me that it's available outside the portal, or that there's an SDK wrapping it yet. David's shaking his head: no, don't use Bastion, or it is purely UI? Yeah, that's what it looked like to me. They kind of waved their hands about multiple connections and stuff, but I guess they're actually saying you'd want to open that many instances of the portal, so that really doesn't help us.
A: That's a little funky to set up. I don't want to write the jumping, the tunneling, code in Go, so I was falling back on just shelling out to ssh and doing, like, -J to jump through, which will work. But obviously there are some other issues involving, you know, known_hosts, key checking, and all that you have to massage around, which is a pain in the butt.
A: And then the most problematic thing is that the context you get in the callback is just: here's a machine, here's a VM, go ahead and scrape the logs off of it. So the only reasonable way I see to do that is to make further az CLI or API calls to say, okay, what resource group am I in? Do I have a load balancer or public IP? Well, go in through that. And, yes, Cecile, stop me before I go farther off the road.
C: Okay, so when we first discussed this in the CAPI issue, the consensus was to go with a DaemonSet that would run and collect the logs on the nodes, just like we do for the conformance tests. The downside of that is that you can only get the logs from nodes that have actually joined the cluster. So if a node fails to join, then you can't get the logs, which is not great. But the idea when we started this was: okay, this is best-effort, it's better than nothing.
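The DaemonSet approach described here looks roughly like the following. This is a hypothetical minimal manifest; the names, image, and log path are illustrative, not what the conformance jobs actually deploy:

```yaml
# Runs on every node that successfully joined the cluster and tails a
# host log. Nodes that never join never run it, which is the stated
# downside of this best-effort approach.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
      - name: collector
        image: busybox
        command: ["sh", "-c", "tail -F /var/log/syslog"]
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log
```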
C: So we'll start with that. And I'm wondering if the framework that Fabrizio wrote actually makes any assumptions about that, or does it just not specify how at all?
A: It doesn't. No, you just have a single callback method, and then at that point it's up to you. They have some useful code for setting up what logs you would want to scrape, but, like, the Docker implementation just sort of makes a direct connection. Amazon uses their... you know, they have kind of a bastion-like service that lets you get into any host directly, which is nice. That would be nice to use.
C: So the other thing is, we do have the boot diagnostics enabled now, and you can...
C: ...get the boot logs. Well, we want cloud-init mostly for now, yeah, at first, but we also want kubelet afterwards. But that's a good, easy way to get cloud-init logs without needing to, like, SSH.
C: And you can do az vm boot-something. I forget the command, but yeah.
E: What if we just mount a share on there, and part of the script, just in the exit handler, writes files to a share, and then we just grab them off the share?
E: Like a share mount, so, like, an NFS share or something like that. We set up a share mounted to the VM during the test runs, and then have the exit handler in the CSE end up dropping those. Or apply a new extension, a script extension or something like that, and have it drop the files at the end onto NFS...
E: ...storage. And we could have an NFS disk, and then it's just, like, the test-run UUID or something like that as a folder, and then machine names or something, I don't know.
A: Yeah, that would work. It kind of feels simpler to have them all mount some persistent volume and then have some other pod go in there and just pull the files off of it. But maybe I'm not thinking it through. Do you think that's simpler? Or do you think just setting up the NFS and all that as a precondition for all the tests sounds like a pain? Maybe it wouldn't be that hard.
A: Well, it's kind of horrible here, because tunneling through the... because finding the master is not a definitive thing. You know, you're kind of making a lot of assumptions that tunneling through this one IP address will get you through a host that knows how to get to the other one. And, yeah.
A: Yeah, I've got the commands running through exec.Command, but it's ugly. And you kind of have to... well, you have to run two commands, right? Because the -o, you know, "don't check if I'm in known_hosts, reduce strict checking," only applies to the first host, not the second one. So you have to kind of pave the way by making an initial command that does that and then does an ssh command on the second host, so it adds it to known_hosts there.
A: So you can go all the way through the second time. I don't think there's a better approach for that, but with Go, you could do whatever.
A: Thanks very much, plenty of ideas. Do we have anything else we want to say today for CAPZ stuff?
C: Okay, all right. So this is technically due tomorrow, yeah. So how do we want to do this? Because I know CAPI is done with, like, the big v1alpha3 releases, but I think we should still do another one, because we're not going to have v1alpha4 right away. Or does anyone think we shouldn't do that, that we should just start with the v1alpha4 types right now, at the same time, and vendor the main branch from...?
C: Okay, that sounds good to me. I think we should maybe try to get the ones we didn't get done into this one, and not try to, like, overstuff it with new stuff.
C: So we're still, like, on the way to slowing down and trying to get the things that we really want done, instead of trying to get new, bigger features in.
C: Okay, so I'll just move everything that's open, and then we can...
C: Yeah, so this is what's there right now. So let's just go through the list. The enhancement proposal: David, do we want to move that to next, or is it still relevant right now?
E: Or would you rather wait? Before we started talking about metrics, I was actually going to bring up writing this up, so I can start working on the refactoring for this. Part of what will be needed for the reconciliation, the cloud resource reconciliation, would be, you know, breaking out all of the reconcile services that we have, so that we can replace an interface there with one implementation or another. So, basically, all of our reconcilers.
E: I was intending on starting, or at least I was thinking about starting, to get that code structured in such a way that we could replace one with another, just based on, like, some config settings for the controller. But before we do that, we probably want to write up what the design really should be, then have everybody say yea or nay, change this stuff. So the sooner we get the proposal out there, the sooner that work can start.
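The "replace one implementation with another based on config settings" idea is the classic interface-plus-factory shape. A tiny sketch with made-up names, not the actual CAPZ service types:

```go
package main

import "fmt"

// ServiceReconciler is the interface each broken-out reconcile
// service would satisfy; the names here are illustrative only.
type ServiceReconciler interface {
	Reconcile() error
}

// vmService stands in for the implementation that talks to Azure.
type vmService struct{}

func (vmService) Reconcile() error { return nil }

// noopService stands in for an alternate implementation.
type noopService struct{}

func (noopService) Reconcile() error { return nil }

// newVMReconciler picks an implementation from controller config,
// so callers only ever see the interface.
func newVMReconciler(useNoop bool) ServiceReconciler {
	if useNoop {
		return noopService{}
	}
	return vmService{}
}

func main() {
	r := newVMReconciler(false)
	fmt.Printf("%T\n", r) // prints main.vmService
}
```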
C: So I will leave it in here. Sounds good. Okay, private clusters: I am definitely working on that, I'm working on it right now. I don't have a PR yet; it's a really big change.
C: It's backwards compatibility. We can talk about it a bit more, but basically, before, the load balancer spec was all done in the controller; it was all hard-coded.
C: There was no user-configuration part of it, and because I'm adding a whole new spec, if you weren't using that spec before, you won't have it defined. So if you have an existing cluster, that spec will all be empty, and so it's like: how do we reconcile the load balancer of clusters that pre-existed without these new features or user settings?
D: Status: I kind of worked on it for a little bit and had a chat with David, but got distracted by a bunch of other issues, like smaller things and some VMware stuff. But I was planning on working on this for the next few days, and not picking up any other stuff until I have at least, like, a work-in-progress PR, just to have the conversation started.
C: Failure domains for AzureMachinePools. That's a PR that's been open for a while. I'm going to move it to next, because even though we have the PR, I think we've pushed it several milestones, and I think...
D: Is that something that we need to get in, like, do we need...? I reached out to Jose, who's, like, the person, and he said he's just too busy. I was asking if he needs help finishing up his stuff, and he has a couple of PRs and said he wants to work on them, but he just has too much to do with his other stuff.
C: Yes, we should add that to this milestone, right? Yeah. Is there an issue for that already? Okay. Internal load balancer created for public cluster: I'm actually fixing that as part of private clusters, because it's all related. "Does not currently handle the separate route tables," where it's... yeah, where are we on that? It's...
C: Okay, the network describer interface. I think I'll leave it in here, and if X doesn't get to it, I'll take it. I think that's pretty easy to do. Improvements in e2e: is this done? I...
D: Wasn't that related to the provider or something? It is, that was something that, yeah.
C: Oh, okay, let's just leave it for now.
C: I mean, I can leave it, I think, or what should I do? Yeah, I'll leave it. Oh, this one we might want to move to next, David: it's the secure sending of bootstrap data. Yes, next! Okay, do you also want to be unassigned? All right, yeah.
C: What's the command for that? I don't know, I'll just do this. Okay, and then, oh, this we want to keep for sure.
C: Okay, yeah. Do you want to summarize the, like, action items from the meeting, the metrics and everything, and then we'll keep it in here?
E: No, that's an existing one.
E: Because what the CAPI test there does is it scales up, scales down, and then upgrades. So machine pool upgrade, I think, has been troublesome for a little bit, like it just doesn't apply your model.
E: That is true, and to be honest, I would be thrilled to go fix that.
C: Are there any other issues that are top of mind that people think should be going into this milestone, anything important?
C: Oh, I think this is definitely happening. I mean, yeah. The PR, I think, is...
D: I know we don't have enough emojis, for sure, that's clear. I think it feels like there's nothing I have any issues with, except, because, like, there used to be more people.
E: Oh, okay, yeah. I think we're trying to be inclusive also to folks who perhaps hadn't come before, like the New Zealand folks that are on a sister team to us. They haven't been coming to meetings. I think we were hoping that maybe they would, at least I was, but it doesn't seem to have worked out.
C: I think there are also, like, phases and periods where more things happen in Cluster API versus more things happening in the providers, and I think this is one of those times where more things are happening in core Cluster API, with, like, v1alpha4 coming up and the last release. So I think that's where everyone is kind of putting their focus right now, me included, and so I think there's less happening right now. It's more boring in the infrastructure providers, which is fine, yeah.
D: Yeah, that's fair. It will pick up when we have the new types to be able to perform stuff on. There'll be a lot more work then, too.
C: Yeah, that's a good idea, yeah. And we should maybe also promote it, the meeting, like, internally, like you said, David. You know, some people might not be aware, or might not remember, that this is happening now. Yeah.
E: Well, that's something that we've been talking a bit about, like how do we write about the work that we're doing and start to speak out to the community and more users? Because I think we're at that place where, you know, we have some pretty solid functionality that people could take advantage of. Cool. One outreach thing coming up is in November at GopherCon: we're going to be doing a workshop, and we're going to be focused on GitHub and, like, git flow and Go.
E: You know, building Go code in GitHub. And one of the things that we're going to work on is Cluster API git flow at GopherCon. So, just a heads-up: if anybody's interested, they're welcome to participate, and I'll send you some details if you all want.
E: Yeah, it'll be fun, it'll be fun if anybody wants to do it. And GopherCon is a really great event. If you haven't gone before, it's really fun. So if anybody's interested, they're welcome.
A: Oh yeah, sure. I guess that's all we got. Thanks, everybody, for coming. We'll see you in two weeks. I will post the recording in the document. Okay, thank you.