From YouTube: Kubernetes SIG Network 2018-01-25
Description
Kubernetes SIG Network meeting from Jan 25 2018
A: Yeah, so I missed the last meeting. I don't know if you guys had a chance to go through this or talk about it at all; I'm just kind of catching up on it now.
C: The kubeadm ones, unfortunately, are all based on a project called kubernetes-anywhere, which started out as just something someone wanted, but it became kind of the de facto thing for cross-cloud testing. I've been trying to add some fixes, but there are not a lot of owners in there, and the one owner that I did know who works on this... So I can't test things out very easily and have the high degree of certainty needed.
C: I mean, it was always like an instance of kubernetes-anywhere, like a SHA, was brought over into test-infra, and if you have tests, right, yes, itself... and the PR that added that got closed. So now I'm kind of stuck in the same process. It got closed because kube-deploy, or cluster API, was supposed to be the way forward.
D: I can say more on that. On our end, we've been working on migrating all the ingress jobs to pull from a dedicated project pool with a lot of quota, and that kind of migration spawned a couple of unforeseen issues, but we're aware of them and we know how to fix them. It's just a matter of fixing it, so those should be completely fixed, hopefully by the end of today. Okay.
B: I mean, the concern I have is that there might be a lot of noise in some of these results because it's flaky. We don't know whether it's flaky because of our problem or because of somebody else's, and then even if it is somebody else's problem, it'll still be showing as flaky for probably at least a day, yeah.
B: So it's kind of like it's a little hard, from these results here, to tease out what we really do need to pay attention to consistently, I mean. Obviously you can go through the list once, at one time, and be like, okay, this test's not our problem, that one's not our problem, but then how do we know that the next time we go through, like a couple of days later? Yeah, I don't know, it just seems like there's kind of some...
B: Maybe this is just me not quite understanding the presentation of the tool here. It's like, given that... so if we go back to the... yeah, you're in the right spot, go back there, and you'll notice the DNS "should provide DNS for pods" one. That's the one that built most recently, and it failed, looks like this morning. How do we actually get to the test results for that specific test?
A: That's a good point: where did that... where would I...
C: ...put that? Because, if so, I think that's the change that I made. I noted the same thing that Dan did: it'll be, say, a sig-storage test that failed, and for a given job, if that happens, the whole thing is marked as flaky. So I had used some of the regex that I saw to filter down to just the sig-network stuff, but that apparently doesn't have an effect on the whole suite if you're just filtering, because it just changes what's displayed, I guess.
F: I'm afraid I'm not... I mean, if we go back to that one that we... when you click on it... never mind, I see it. Yeah, there's one, never mind.
B: Also, do we have any idea of which tests are more important to concentrate on? I mean, is there a set of tests that we should be looking at every single day, or every other day, to make sure that we actually work to get them fixed, and then some others that we don't care quite as much about? Because clearly we have limited manpower, or person power, as it is.
B: That's something, some feedback that we could send to the tool people: if there's a way to sort the tests by importance, by what the SIG thinks is important, yeah, so they show up on top, as opposed to having to scroll through the list to find those. Then at least you can take a look at the top five tests or something and be like, you know, hey, our most important tests are failing, and that's a lot easier to see more quickly.
B: Does that mean... that's you? All right. So there's a listed issue there, and one of the things that we had been looking into was running the pod data plane on a separate network interface. Now, this doesn't really have anything to do with multiple networks in Kubernetes at the moment. It's more about having the masters and nodes be able to contact each other on one network, one IP network, and having the pod data plane run on a different one, and it turns out that there are some...
B: It's not particularly easy to do that, and there are a lot of dependencies in the code where, for example, if you're... I think it's the node IP, where the host's... I think it's the node IP, yeah. So when you start a pod, the pod gets a pod IP, but the pod also has a node IP, and some things in Kubernetes use that node IP where they probably need some other kind of address if the pod's data plane is on a different network. So I'm probably not describing it that well.
B: We were proposing one for the control plane, I believe, because there's enough stuff that uses the node's current address for data-plane-type things, like kube-proxy, for example. So by specifying a new address for the control plane, e.g. nodes talking to the master or the master talking to nodes, that was kind of a way to not have to modify a ton of Kubernetes.
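[For context on where that single address lives today: a node publishes its addresses in the Node object's status, and that InternalIP is what gets treated as "the" node address, e.g. by kube-proxy as mentioned above and by the apiserver when it reaches out to the kubelet. A minimal, hypothetical sketch (node name and addresses made up) of why a second, data-plane-only interface has nowhere to go in this model:

    apiVersion: v1
    kind: Node
    metadata:
      name: worker-1                 # hypothetical node name
    status:
      addresses:
      - type: InternalIP
        address: 10.0.0.5            # management network; this is the address
                                     # node-address consumers use today
      - type: Hostname
        address: worker-1
      # A second NIC carrying pod data-plane traffic (say 192.168.5.2) has no
      # dedicated field here, which is the gap being discussed above.
]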
F: For example, with... go ahead, Mike. Yes, at IBM we always, well, usually, use nodes with two interfaces, and we care about what goes on... I'm trying to remember what I've done about this, and I don't really remember the specifics. Yeah, I don't quite understand why it's Kubernetes' issue at all. I mean, the node address is used for, you know, platform communications, and the pod addresses are used for data, and it's up to the routing rules, which are not something Kubernetes manages anyway, to...
B: Usually that's going to be, I think, whatever the host's IP address is, or whatever you've set the node IP to for the node. But the node IP is also used, I think, when the master wants to directly talk to the node, and there are a couple of cases when that happens, because there are two ways that nodes and masters can communicate. The master... sorry, the node can actually ping the master, and by this I mean the API server essentially, keep the connection open, and register itself.
B: You know, your private network. But neither of these is exactly right, because you can think of both; in the split control-plane/data-plane case, both of these addresses could be private. Neither one of them needs to be publicly accessible, like no external address, but yet you want certain traffic going over one or the other. So...
B: Yeah, I mean, the issue happens here when... well, anyway. What this has made clear to me is that we need to be more specific in the issue about where this boundary is crossed or violated. So let's start there. I'll go do that and then update that issue, and then if anybody else on the call is interested in this, that issue is there; you can continue to discuss.
H: So I wrote this down, and this has been a problem since, like, the birth of Kubernetes. Basically, pod readiness is defined as all of its containers being ready, and whether its containers are ready or not is based on the feedback from the runtime and the readiness probe. So pod readiness is essentially determined solely by the kubelet. On the other hand, services use service selectors.
H: That means services have, like, implicit backends, and that basically further encouraged the workload APIs to ignore services in their decision-making. The workloads generally, like Deployment or DaemonSet, only look at the status of the pods they're managing. So this creates a gap between the service lifecycle and the pod lifecycle.
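[To make the kubelet-only readiness loop just described concrete, here is a minimal, hypothetical pod spec (name, image, and probe are placeholders): the kubelet runs the readiness probe, sets the pod's Ready condition from the container states, and the endpoints controller adds or removes the pod from matching Services based on that condition alone; nothing outside the kubelet feeds into it.

    apiVersion: v1
    kind: Pod
    metadata:
      name: web                    # hypothetical example
      labels:
        app: web                   # matched by some Service's selector
    spec:
      containers:
      - name: web
        image: nginx               # placeholder image
        readinessProbe:            # the only readiness input besides runtime state
          httpGet:
            path: /healthz
            port: 80
          periodSeconds: 5
    # The kubelet alone evaluates this and flips the pod's Ready condition;
    # load balancers, network programming, etc. are never consulted.
]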
H: So that's why, because this is sort of a big API change in Kubernetes. If we want to fix this properly, to make the workloads more service- and network-aware, we have to introduce extra state somewhere. Either we delegate the workload decision-making to some external thing, or we need to allow external feedback into the pod status. There's basically no way around it; there are only two ways forward, and both of these options are sort of a big change to the Kubernetes API.
H: And since this is mostly impacting the networking folks, I first wanted to basically send out this problem statement and the survey and see whether folks are feeling the pain of this problem, and then how much support, how much determination, we can get from the community to push this API change forward.
H: Like, the Deployment controller would have an extension saying, when you do this, you have to talk to somebody else and get more feedback about the pod; or the pod status itself needs to allow external feedback. Right now the feedback loop stays within Kubernetes, which means basically the kubelet updates it; only the kubelet updates it.
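[As a hedged sketch of the second option, allowing external feedback into the pod status (roughly the shape Kubernetes later adopted as pod readiness gates, not something decided in this meeting), the pod spec could name extra conditions that an outside controller must set before the pod counts as Ready; the condition type below is hypothetical.

    apiVersion: v1
    kind: Pod
    metadata:
      name: web                                 # hypothetical example
    spec:
      readinessGates:
      - conditionType: example.com/lb-attached  # hypothetical condition owned by an external controller
      containers:
      - name: web
        image: nginx                            # placeholder image
    status:
      conditions:
      - type: example.com/lb-attached           # written by the external controller through the
        status: "True"                          # status API, not the kubelet; readiness now waits on it
]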
F: So one of the things that I find troubling as I think through this, start to think through this, as I said, it seems to me that what matters is everything that points at the pods needs to track pods coming up and coming down, including all sorts of load balancers. And I remember Kelsey Hightower, a couple of KubeCons ago, emphasizing how services are training wheels, and you don't have to use them, and you can use whatever you want to load-balance.
F: So that's really an open set of things that need to be tracking pods coming up and going down, and whenever you have an open set, well, there's got to be some responsibility on the members. These things that are tracking pods have got to be updated to let Kubernetes know that they've tracked it. So whenever you're talking about making a change in an open ecosystem, you can't expect it to get done in any particular amount of time.
H: Yes, yes, so that basically brings us down to the proposed solutions. You know, do we want a very disruptive proposal that basically forces everybody to consider these cases, or do we want more of an open-ended solution, where you can say, okay, if you care, then you consider it, and if you don't care, you just stay with whatever you're doing. Well...
F: There are two sides of caring, right? One is, in some sense, the consumers of the pod's service; they want to know when the pod really is ready and when it really is not. And the other is the intermediate providers of load balancing; they are the ones that are in this unbounded set of things that we're never going to find them all, and that would all need to get updated. But we can't require that they all do it, because they never will.
F: The point is that there are consumers that go indirectly through load balancers, and load balancers take some time, and policy enforcers, and various intermediaries. Right, there are consumers that go through intermediaries, and so the pod isn't really ready when it's coming up, or when it starts to go out of service; as it goes down, those intermediaries have to change their handling of the traffic.
H: That could be one case, that can be one case, yes. But there are other cases: like, for instance, pods come and go, and then you have network policy in place or not in place, and then you have, like, security gaps, or the iptables rules are not in place, and then you don't know how to debug it, and what...
H: So at first I wanted to ask the folks here: do you see this as a very severe problem, or is it a livable problem, okay, it can be solved or not solved? Or is it something that must be solved in Kubernetes, or should it be solved in some higher construct, some kind of service mesh built on top of Kubernetes? Well...
H: If I may add something, it's also sometimes a problem, when there is a problem, of reporting what kind of problem occurred and letting the user know that a pod is not able to come up for a specific reason that the kubelet may not be aware of, like, you know, some failure in the networking backend. It would be very useful to be able to provide status updates for this reason.
H: So we have a bunch of users who run into similar problems that we can't get a proper root cause for, because, let's say, there's no logging, no snapshotting of, like, the iptables state at the time the problem showed up, or it's an intermittent problem, or whatnot. So we keep facing this kind of mismatch between the pod lifecycle and the network or service lifecycle, this kind of mismatch there.
F: I'll also add that I've seen colleagues in IBM basically saying, you know, I get these occasional 500 errors, or, yeah, missed... 500s or request drops, you know, problems that could be explained by this. I don't know whether they actually were, but clearly this kind of thing is going to give interruptions in service, and I have colleagues who were reporting interruptions in service.
A: Yeah, so I think that was the last thing on our agenda today. So unless there are any last-minute topics, I'll give everybody twelve minutes back.