From YouTube: Kubernetes sig-aws 20170714
Description
Recording of kubernetes sig-aws meeting held 2017-07-14
C: You know, any nodes, anything running, that could access the storage... We kind of tracked it down when we found out that we were installing the clusters without, you know, any cluster ID tags (they needed the kubernetes cluster tag), and we had thought that being isolated to its own VPC would handle that. But it appears it's not, and certainly looking at the code, it looks like at best you can get isolation of an availability zone per user, possibly even one cluster per user.
C: Without, you know, possibly having issues like this. So I kind of wanted to bring up that the cluster ID seems pretty critical if you're ever going to be running more than one cluster in AWS, and it kind of seems to make sense to make it a required field, period, in Kubernetes: if you run it in AWS, you need to have this, even if it's only a single cluster.
A: Yeah, I can speak to some of the history here. Originally the tag didn't exist at all, and it turns out that you need this tag to differentiate different clusters in the same account. Certainly without the tags, things don't work very well; for example, not tagging your subnets will also cause problems with some providers. So tagging is definitely highly, highly recommended. Yeah.
A: Yeah, and so I think the only reason that we would not make it mandatory is because, like, if we were to start from zero today, I think we would make it mandatory across the board. The issue is that making it mandatory in theory breaks anyone's cluster that doesn't have those tags set, even if really they should have them set, yeah.
A: So that's sort of what we've done in the past where we've made breaking changes: have a flag which people can set. We can possibly debate the exact sequence, whether it defaults true or false and all those things over the releases, and the points of the deprecation policy. Basically, you know, start off with it being easy to fix, and then ratchet it down at some point.
A: But I think that the deprecation flag is a good idea, if anyone has any... So, in other words, the proposal would be: we make the cluster name tag required, and maybe some other tags, in 1.8, let's say. If you have the tag, no problem; if you don't have the tag, your cluster will fail to start, but the workaround is simply to add a flag.
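The gating proposal just described can be sketched roughly as follows. This is an illustrative Python sketch, not the real kube-controller-manager code; the function name `check_cluster_tags` and the `allow_untagged` escape hatch are invented for the example, though the `kubernetes.io/cluster/` tag prefix is the real one discussed later in the meeting.

```python
# Hypothetical sketch of the proposal: fail startup when the cluster ID tag
# is missing, unless an explicit legacy opt-out flag is set. The flag would
# be deprecated from the start, giving users a release or two to add tags.

CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"

def check_cluster_tags(instance_tags, allow_untagged=False):
    """Return the cluster name found in the instance tags, or raise.

    instance_tags: dict of tag key -> value, as returned by the cloud API.
    allow_untagged: the proposed (deprecated-from-the-start) escape hatch.
    """
    clusters = [k[len(CLUSTER_TAG_PREFIX):]
                for k in instance_tags
                if k.startswith(CLUSTER_TAG_PREFIX)]
    if clusters:
        return clusters[0]
    if allow_untagged:
        return None  # legacy mode: continue without cluster isolation
    raise RuntimeError(
        "no %s<name> tag found; tag your resources or set the legacy flag"
        % CLUSTER_TAG_PREFIX)
```

With the tag present, startup proceeds; without it, the cluster fails fast unless the operator consciously opts into the old behavior.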
D: Speaking of legacy, I have the 1.7 source checked out and was just looking at the tagging in the AWS provider, and it's got this const, TagNameKubernetesClusterLegacy, whose value is "KubernetesCluster". So yeah, I think in the QuickStart we're already doing one level of legacy by naming the tag wrong; I just now realized that this is a problem. So, yes, there's...
A: There are two different tags there, yeah. The original one was KubernetesCluster, I believe; yeah, okay, capital C. The problem with that is that it doesn't allow for subnets, for example, to be shared: you can only have one tag with a given name. So the new ones are kubernetes.io/cluster/&lt;cluster-name&gt;, and then you can say equals shared, or equals owned, to have some notion of ownership. But the idea is that you can have two clusters that are sharing a subnet, the level three...
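A minimal sketch of the two tagging schemes just described. The legacy tag name and the `kubernetes.io/cluster/` prefix match what the speakers describe from the 1.7 AWS provider; the helper function itself is illustrative, not the provider's actual code.

```python
# The legacy scheme uses a single fixed key, so a resource can only ever
# belong to one cluster. The newer scheme puts the cluster name in the key
# itself, so several clusters can tag the same subnet, each marking it
# "owned" or "shared".

LEGACY_TAG = "KubernetesCluster"          # one value only: cannot be shared
CLUSTER_TAG_PREFIX = "kubernetes.io/cluster/"

def clusters_using(tags):
    """Map cluster name -> ownership ("owned" or "shared") for a resource.

    Only new-style tags are considered; the legacy key carries no
    per-cluster ownership information.
    """
    return {k[len(CLUSTER_TAG_PREFIX):]: v
            for k, v in tags.items()
            if k.startswith(CLUSTER_TAG_PREFIX)}

# Two clusters sharing one subnet, which the legacy key cannot express:
subnet_tags = {
    "kubernetes.io/cluster/team-a": "owned",
    "kubernetes.io/cluster/team-b": "shared",
}
```

Here `clusters_using(subnet_tags)` reports both clusters, which is exactly the sharing case the single legacy key made impossible.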
A: The level two legacy is that I've been told Terraform can't currently create tag keys with slashes in them, or something like that, something dynamic; there's some issue now with dynamic Terraform tags. That means that people are not entirely happy about that new form either. So the beat goes on.
A: But if you pass a flag to, I guess, kube-controller-manager (I don't know if we need to enforce it across the board, but just kube-controller-manager), then it will... although I don't know if that's possible. If you pass the flag, it will ignore that check and allow you to continue for a limited period of time, presumably 1.8 and maybe 1.9.
C: I think that's what Justin is suggesting: by default, the flag would require the cluster tag, so the first time you upgrade and try to run it, it's going to error out, and there would be a flag that you can set to run it in the old mode. But you're going to have to, I think, acknowledge the issue, because...
C: It would be significantly more work to set the cluster tag, in my mind, than to set the option on whatever daemon would need it, because you have to go find, you know, your load balancers, your nodes, your persistent volumes, whatever else your cluster has used and created, and tag everything.
A: I think you raise a good point, and I don't know how hard... I mean, I guess the instance would be the only one we would error out on. The issue is, if there's someone out there who, for whatever reason, can't tag their instance with the kubernetes cluster tag, we would basically have no out unless we gave them the flag. But I agree; I suspect it wouldn't be too hard to have the flag placed on the instance. That's...
A: I think that makes sense, and I think also that we say that if you have to use this flag, let us know why you're using this flag, because it is, or will be, deprecated immediately from the start. So we'll start the clock on turning off that flag, and I think that's reasonable.
A: That's it; I think there's pretty sure to be more, but I think technically there is a deprecation policy, so I don't know if this would actually be a breach of our deprecation policy. It's odd! It's an odd one. We can ask whoever the release manager is, or we can ask the architecture people, or... I don't even know who we would ask. I can.
C: You know, there's been some discussion, lots of different ideas, but nothing has headed towards a conclusion yet, and I just kind of... I think we might need to have the overall working group or SIG, or whatever they're going to end up calling it, kind of decide how to move forward on it, and make sure we get enough people with buy-in from the cloud providers. I don't think... yeah...
C: Sure, yeah, we can start it. So the issue is that, at least for AWS, when an instance is stopped in AWS, it is removed from Kubernetes, so any pods that were running on it get rescheduled and all that kind of stuff. And then, if you restart the instance, obviously it comes back up, joins the cluster, and has all of its data on it. In other cloud providers, there are some where, when you stop the instance, it just becomes not available, and it stays in Kubernetes.
A: It would... yeah, yeah. I mean, the kubelet should treat that like a hot plug, which apparently happens in the real world, that someone will plug in more RAM or something, so the kubelet should be able to update the node status and, in theory, everything should just work. I'm sure, or I suspect, it won't just work out of the box, but I don't think that's a guiding concern, because that would just be a straightforward bug, I think, or new behavior that we need to accommodate.
A: I think what would be interesting is volumes; local volumes, I think, are the more problematic ones, or the ones worth thinking about. I guess, like, if you have a local volume on a node and I stopped it, do I want to bring it back with the same node ID, so that the same local volumes persist? Whatever the logic is there, I think, will be a guiding one. I don't know what other cases...
A: I think that's a good point. I think, though, there's a general... you know, the idea that the nodes on AWS are also very ephemeral, right? And I'd love to see us integrate, or continue our integration, with the cluster autoscaler more, so that the notion of nodes doesn't really matter as much anymore, in my mind. But I guess the initial question is: why do people stop nodes?
E: Say the region, or an availability zone, is no longer available; what would happen then? I think they get terminated... But, looking at it, I only ever look at the Kubernetes screen; I don't ever look at AWS, right, because I trust the Kubernetes screen. So if something now happens to the cloud and those nodes go down, what does my Kubernetes screen show?
F: Isn't there a difference between an instance that has been stopped by the user, which is different from there being an issue in AWS where a zone is not responding, right? So I would imagine that within Kubernetes, the AWS controller, if it loses contact with an AZ for whatever reason, is not going to suddenly mark all of those nodes as deleted.
A: There are, so there are three states, I guess. An instance can be terminated, right, which is when AWS or the user shut it down forever. There is stopped, which is a sort of temporary suspension; you're not charged for it, nothing is actually running, it just sort of remembers...
A: ...the configuration. AWS remembers the configuration and can restart it, and that's why I believe we can change instance types. And then the third case is that something has gone wrong with the control plane and you can't reach an AZ. A number of years ago, I remember seeing that AWS instances would just disappear from the describe-instances list in that situation, and I believe it's fixed. What would currently happen is that those nodes would be deleted, because AWS would say, "I have no knowledge of these nodes."
A: The mitigation against that is that if more than some fraction of your nodes disappear at once, I think we don't evict, we don't delete the nodes. I mean, that's what I think happens, but honestly, I think we actually saw this about two years ago, in Europe somewhere, as well. It's certainly a scary edge case. I mean, your nodes on AWS, or the cloud, are supposed to be ephemeral, so they should... I don't know, that's where, I think, the sort of... yeah, this might...
E: I mean, I'll tell you how I got to it. I was doing a demo of GlusterFS on top of AWS, and I was trying to show how, if I take one AZ down, the pods are still running, and they all said "Ready," and everybody was like, "yeah, what's the problem?" So I was like, no, no, I shut down a whole bunch of nodes; trust me, I did.
C: That was actually what kind of spawned it. So we had a customer that had that kind of thing here. They had a system that they were trying to use to monitor the nodes in Kubernetes, querying Kubernetes, and they had nodes go down (they would get stopped or whatever) and they disappeared. So they could never monitor when a node had an issue, right, because from Kubernetes' perspective it just stopped existing.
A: It might be a general cluster health thing. We have node-problem-detector now; maybe it's like a cluster problem detector, which is, you know: a third of your nodes are gone, we don't have any coverage in this AZ, or you only run a single AZ, or every single pod is on one node, for example. Those sorts of things, warnings that every single pod is on one node.
C: It was something they tried to do. It didn't work out very well, but when digging into it and looking at what was going on, it kind of brought up this question of: OK, we've got these inconsistencies across cloud providers, so now what? In order to come up with any consistency, we need to figure out what a node is, and what determines a node, and all of this, before we can start saying...
A: Yes; NotReady is entirely done by the kubelet and the kubelet heartbeats. It's entirely a Kubernetes concept. But what we have in addition on AWS, and on all clouds, yeah, most likely, is a list of instances, and if the instance disappears, or if it is not running on AWS in the describe-instances list, then we will delete the node. That is the node controller, in kube-controller-manager.
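The reconciliation just described, together with the mass-disappearance mitigation mentioned a little earlier, can be sketched like this. This is a rough illustrative sketch, not the actual node controller code; the function name and the 0.5 threshold are assumptions for the example.

```python
# Sketch of the node controller's cloud reconciliation: compare the nodes
# Kubernetes knows about against what the cloud's describe-instances call
# reports as running, and delete nodes the cloud no longer reports.

def nodes_to_delete(known_nodes, running_instances, max_fraction=0.5):
    """Return the node names to remove from the cluster.

    known_nodes: node names currently registered in Kubernetes.
    running_instances: instance names the cloud reports as running.
    max_fraction: safety valve (illustrative value): if more than this
        fraction of nodes vanished at once, assume a cloud API problem
        rather than real terminations, and delete nothing.
    """
    running = set(running_instances)
    missing = [n for n in known_nodes if n not in running]
    if known_nodes and len(missing) / len(known_nodes) > max_fraction:
        return []  # too many disappeared simultaneously: do not delete
    return missing
```

A single missing instance gets its node deleted; most of an AZ vanishing from the API at once trips the guard and leaves the nodes alone.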
D: It's not necessarily an old/new split either. I mean, there are situations where, if I was running locally or something like that, and it was on-prem, and I wanted to shut down machines to save power or something like that, then I would imagine a node being stopped is a totally legitimate state. True.
D: Would it make sense... I mean, I'm looking at the code right now, and I'm seeing there's only three states for a node phase: pending, running, and terminated. Would it make sense for Kubernetes to learn a concept of a stopped node, and then you guys would just clearly do that, and other cloud providers that don't have a concept of stopped would put it straight into terminated? It's...
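The idea on the table could look something like the following. This is a hypothetical sketch of the proposed behavior, not the real node lifecycle API: the enum and the `node_action` decision function are invented, and "keep-unavailable" stands in for whatever keeping the Node object while marking it NotReady would actually look like.

```python
# Today a provider can effectively only report "exists" or "gone", so a
# stopped EC2 instance is handled like a terminated one and its Node object
# disappears. A distinct "stopped" state would let AWS keep the Node object
# (marked unavailable), while providers with no stop concept keep mapping
# everything non-running to terminated.

from enum import Enum

class CloudInstanceState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    STOPPED = "stopped"        # the proposed new state
    TERMINATED = "terminated"

def node_action(state):
    """What the node controller would do with a node in this state."""
    if state is CloudInstanceState.TERMINATED:
        return "delete"              # gone forever: remove the Node object
    if state is CloudInstanceState.STOPPED:
        return "keep-unavailable"    # keep the Node, mark it NotReady
    return "keep"                    # pending/running: nothing to do
```

A provider without a stop concept would simply never report `STOPPED`, so it falls through to the existing keep/delete behavior.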
A: So you're saying that Kubernetes... no, that's interesting, right, because...
D: Yeah, it's painful. It's kind of a contrived scenario, a little bit, but it's still probably worth expressing, especially because, if we're in a situation where we need to reconcile the behavior of different cloud providers and try to force them into the same bucket when it's really not the same thing, this would be a nice way of slowly fixing that. Yeah, I'll start on that.
D: Okay, if we have time for another topic, one thing I wanted to bring up real quick was that, with the 1.7.1 release that just came out, kubeadm now learns a node-name flag, which is important. We were having issues with 1.7 on our QuickStart, where nodes would join trying to advertise their short node name, without the fully qualified DNS name, and some other part of Kubernetes... the details are escaping me right now.
D: But some other part of Kubernetes was anticipating that the pre-allocated node was going to have the long FQDN, and they weren't agreeing with each other. To disambiguate it for Amazon's case, because I guess this was hitting everybody that was trying to use AWS, kubeadm, and 1.7, I think they added a node-name field where you could just specify what you wanted to name the node, for kubeadm, both for init and for joining. So we're going to be doing that on Heptio's end.
A: Kops has done that before, but what we had literally this morning is that kube-proxy also has a similar flag and also thinks about the same thing, and there's a PR which is going to make it a serious problem if they don't match, so we're also doing that in kube-proxy. This is a long-standing issue, where the root of it is that it used to be that the node name had to be resolvable from the master, and was how the master reached the kubelets; that is now fixed with some flags.
A: You can set the resolution order for how the master, or the API server (I think it's a parameter for how the API server talks to the kubelets), reaches the nodes, and if you prioritize the internal IPs, then basically everything works as you would imagine it should have always worked. The node name doesn't really matter anymore, other than that they have to match, yeah. But I would love to see a better node name.
D: See, I think what we're going to do now is use the FQDN everywhere for our node names and just hope that works; cross our fingers.
A: I would love at some point to get the node name to be the instance ID, but I've also had feedback that people like it being the longer FQDN, so they can map it to an internal IP. Even though it's the internal IP and not the external IP, people like that as well. There are other problems here, which is, if you have a custom domain name, or DHCP domain names in your VPC, I think that's correct...
A: You get into all sorts of problems, which is another reason why it's so frustrating, or so complicated, to deal with this. But I think, for now, we've got all the problems fixed and we'll probably just live with it. Yep, yeah, I'd love at some point to get a better node name, but I don't think it's going to happen anytime soon.
A: Though, yeah, it was two hundred... it was hitting a pagination limit, or actually a limit on the number of filters you can have, but anyway, those bugs should be fixed. It certainly hasn't had the coverage that we would like to have on it yet, so any feedback will be very welcome...