From YouTube: Kubernetes SIG API Machinery 20180314
Description
March 14, 2018 API Machinery SIG meeting recording
So the background is that I'm interested in using the Kubernetes API machinery to build distributed systems in general. I think the idea of focusing on level-based rather than edge-based management, as these guys say, and the idea of watches, is a good way to build distributed systems in a way that's simple and reliable. I'm trying to do that, and to support it I'm studying the performance of the machinery, so I have set up a simple test.
It's a microbenchmark. I took the example aggregated API server, which implements objects called Flunders for some reason, and the example controller, and modified it to work with Flunders; all it does with Flunders is basically log timing information. I wanted to study the latency from the time that the client does an operation on a Flunder to the time that all these controllers, which I call loggers because all they do is log the time, receive the information. I want to see the latency, and then the costs in terms of CPU, memory, and network along the way. So I've been doing some experiments, and there's a pointer in that issue to a Google Docs site where I'm compiling the results of those experiments.
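A minimal sketch of what such a "logger" might look like, assuming current client-go and the sample-apiserver's wardle.k8s.io/v1alpha1 Flunder resource (the group/version and the kubeconfig path are assumptions, not the speaker's actual code): a dynamic shared informer that records the wall-clock time each notification arrives.

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// The Flunder resource served by the aggregated API server (assumed GVR).
	flunders := schema.GroupVersionResource{
		Group: "wardle.k8s.io", Version: "v1alpha1", Resource: "flunders",
	}

	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 0)
	informer := factory.ForResource(flunders).Informer()

	// All the "logger" does: record when each notification is received,
	// i.e. the receive end of the latency measurement.
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { fmt.Println("add received at", time.Now()) },
		UpdateFunc: func(_, obj interface{}) { fmt.Println("update received at", time.Now()) },
		DeleteFunc: func(obj interface{}) { fmt.Println("delete received at", time.Now()) },
	})

	stop := make(chan struct{})
	factory.Start(stop)
	select {} // run until killed
}
```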
I'm finding that in the normal configuration, these loggers connect to the main API server, which in turn proxies their watches to the aggregated API server. I'm starting with a very simple configuration where there's just one of each, and each has its own etcd server. That's because I'm using the sample API server config, which has a pod with an aggregated server and its own etcd server in it, and I'm deploying with kubespray, which puts up one API server and one etcd server anyway.
In that configuration I find I can't use up all the CPU and network, whereas if instead I make the loggers connect directly to the aggregated API server, which is relatively simple by just hacking their clientset object, then I can use up the CPU on the machine that is running the aggregated API server.
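A rough sketch of that "hack," under the assumption that the loggers use client-go: copy the rest.Config and point its Host straight at the aggregated server's own address and serving CA, so watches bypass the aggregation proxy. The address and CA file here are placeholders.

```go
package directclient

import (
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// directClient builds a client that talks straight to the aggregated API
// server at addr, bypassing the main API server's aggregation proxy.
func directClient(kubeconfig, addr, caFile string) (dynamic.Interface, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
	if err != nil {
		return nil, err
	}
	direct := rest.CopyConfig(cfg)
	// e.g. "https://10.0.0.42:443": the aggregated server itself,
	// not the kube-apiserver that normally proxies to it.
	direct.Host = addr
	// The aggregated server presents its own serving cert, so the
	// trust roots usually have to change along with the host.
	direct.TLSClientConfig = rest.TLSClientConfig{CAFile: caFile}
	return dynamic.NewForConfig(direct)
}
```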
Yeah, I would imagine that's true. A quick question: so we've got the main API server, which presumably (I need to validate this) is doing a watch against the aggregated API server, and the aggregated API server is then also doing a watch against its storage layer.

No, no, the main API server is simply proxying each watch, so the aggregated server sees a bunch of watches on Flunder objects.

Right, but that API server is... I don't remember the Flunder implementation.
C
C
C
It's
probably
worth
one
of
us
validating
I
have
done
some
with
V
turned
up
to
ten,
so
I
think
we
can
see
in
pretty
great
detail
what
happens.
I'm,
not
sure,
with
our
shared
logs
from
that.
But
I
can
do
that.
If
you
like
I
I,
think
we
just
need
someone
to
check
I'm,
just
trying,
partly
I'm
just
going
through
the
deeds
I'll
see
if
I
can
trigger
any
thoughts
in
my
brain
as
to
what
might
be
going
on
all
right.
Well, I'll remind you that in the two cases I'm comparing, the aggregated API server is getting watch operations in both cases, so it's serving the watch operations, presumably the same way, in both cases.

Yeah, I mean the interesting thing is that, unless I misremember how the API server implements watch, it is essentially just an open request which gets replied to when there is data available. So it's not exactly... I mean, it's all push based rather than poll based.

Yes, that is the implementation of a watch.
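To make that concrete: a watch is one long-lived GET (for Flunders, something like /apis/wardle.k8s.io/v1alpha1/flunders?watch=true&resourceVersion=...) whose response body is streamed one event at a time, with nothing sent in between. A minimal sketch using the dynamic client, with the wardle group/version assumed as above:

```go
package watchsketch

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

var flunders = schema.GroupVersionResource{
	Group: "wardle.k8s.io", Version: "v1alpha1", Resource: "flunders",
}

// watchFlunders issues the long-lived watch request and then blocks,
// printing each event as the server pushes it down the stream.
func watchFlunders(ctx context.Context, client dynamic.Interface, rv string) error {
	w, err := client.Resource(flunders).Namespace("default").Watch(ctx,
		metav1.ListOptions{ResourceVersion: rv})
	if err != nil {
		return err
	}
	defer w.Stop()
	for ev := range w.ResultChan() { // nothing arrives until there is data
		fmt.Println(ev.Type, "received at", time.Now())
	}
	return nil
}
```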
Seems like we'd be much better off with HTTP/2? Oh well, yes, it does. All right, then: the Go code does use HTTP/2 when you're using TLS, unless you take steps to disable it. So I should have explicitly qualified my remark: HTTP/1 uses the chunked transfer encoding, but in this case the actual traffic is using HTTP/2, which doesn't use chunked encoding.
It doesn't have chunked encoding, but it has this framing concept in HTTP/2, and so it just sends a series of frames back. Everybody has something to say, yes.

So the weird part of that is that if it's all push based, I'm having a hard time thinking what could be going wrong between the central API server and the aggregated API server such that you're hitting a limit, and the only thing I can think of is: is there some sort of delay in how long it takes before it...
So the claim here is that if you do a watch directly against the aggregated API server, you can consume as much resources as you need. To be clear, this is not about one watch; this is about hundreds of watches.

Oh sure, but the key difference is between directly watching against the aggregated API server and going through the central kube API server, which aggregates it, and that somehow aggregating the watches is preventing the system from scaling well.
It's just like there's a throughput limit. So what's the failure symptom when the proxy case hits the wall?

Well, what goes wrong is that watches fall behind and they terminate early, and when the informer or reflector says "pick up where you left off," the server says "oh, that's way long ago, I can't do that," and so it has to do a list again, which costs even more bandwidth, and that...
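Roughly what that failure mode looks like from the client side, as a hedged sketch (the reflector's real logic also handles the expiry arriving as a watch event): resume from the last resourceVersion, get told the history is gone, and fall back to a full relist, which is where the extra bandwidth goes. The GVR and lastRV names are illustrative.

```go
package relistsketch

import (
	"context"
	"fmt"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
)

// resumeOrRelist tries to resume a watch from lastRV; on 410 Gone it falls
// back to a full list, the expensive path described above.
func resumeOrRelist(ctx context.Context, client dynamic.Interface, lastRV string) error {
	gvr := schema.GroupVersionResource{
		Group: "wardle.k8s.io", Version: "v1alpha1", Resource: "flunders",
	}
	w, err := client.Resource(gvr).Namespace("default").Watch(ctx,
		metav1.ListOptions{ResourceVersion: lastRV})
	if apierrors.IsResourceExpired(err) || apierrors.IsGone(err) {
		// The server has compacted past lastRV ("that's way long ago"):
		// re-transfer every object, then resume from the fresh version.
		list, lerr := client.Resource(gvr).Namespace("default").
			List(ctx, metav1.ListOptions{})
		if lerr != nil {
			return lerr
		}
		fmt.Println("relisted", len(list.Items), "objects; resume from",
			list.GetResourceVersion())
		return nil
	}
	if err != nil {
		return err
	}
	defer w.Stop()
	// ... consume w.ResultChan() as usual ...
	return nil
}
```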
That's the trick: it's not CPU bound, and it's not hitting the network capacity.

I'll tell you my primary suspicion: it's the fact that this proxying is done over one TCP connection. The HTTP/2 mechanism is doing multiplexing and demultiplexing over one stream, and that's logically a serial operation. So there's going to be a bottleneck on the aggregated API server, multiplexing all those responses back down into one TCP connection, and likewise there's going to be a bottleneck demultiplexing them all back into the various watch requests on the main server side. Sounds...
The problem is that, I mean, if he's right, and I think he probably is, then what we're really talking about is essentially a lock contention problem.

Oh no, I wouldn't call it a lock contention problem; the lock is just an implementation detail. The problem is that there's this logical necessity to multiplex a bunch of concurrent response streams into one TCP stream, right, and that is inherently a serialization thing, and it doesn't matter whether you implement it with a lock. Yeah.
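A toy illustration of that point (not apiserver code): however many concurrent streams produce frames, frames destined for one TCP connection must be written one at a time. The channel below plays the role of the lock; swap in a mutex and nothing changes, because the serialization is inherent.

```go
package main

import (
	"fmt"
	"sync"
)

func main() {
	frames := make(chan string)
	var wg sync.WaitGroup

	// 100 "watch streams," each producing response frames concurrently.
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func(stream int) {
			defer wg.Done()
			for j := 0; j < 3; j++ {
				frames <- fmt.Sprintf("stream %d frame %d", stream, j)
			}
		}(i)
	}

	go func() { wg.Wait(); close(frames) }()

	// One writer: the single TCP connection. Every frame passes through
	// here sequentially, no matter how parallel the producers are.
	for f := range frames {
		fmt.Println("write:", f)
	}
}
```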
...actions, right, which shouldn't require... it's a config change, not a code change, yeah. It's Fischers... it's Flunders and Fischers, Flunders and Fischers. Let's see, yeah, that would be a very quick way to confirm it.

Well, we'd have to put the Flunders and the Fischers in different servers, though. All right, otherwise the main server is going to use one connection for both of them. I thought it was... which one... the aggregated API server was a...
Not perfect, but... I mean, I think I like the way Daniel put it better than the way I put it: if that doesn't double your throughput, then we're on the wrong path. Yeah. And if it does double your throughput, then we haven't proven it, but we've added some pretty strong weight to the theory. Exactly, yeah.
I have achieved a lot: I got different failures now. My problem was that I wasn't able to be very selective, so I disabled HTTP/2 basically everywhere, and that produced bad performance of a very different sort. And if you look through that document, you can see the reports from the experiments in this case.
That was my problem: I couldn't get a lot of connections once I had HTTP/2 disabled on both of them. Now, maybe I just botched something; I'll try again. Also, in the Go documentation for HTTP/2 there are different environment variables that are documented, so I can try that too. So I can go back and try again and see if I can disable just the one that I want.
You know, I don't really want to disable it on the main one entirely; I could disable it on the aggregated one, yeah. What I'd like to do is disable it on the aggregated server in its role as a server, but not in its role as an etcd client. Well, I'll try again; I don't know if I can control it that finely. Yeah.
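For reference, plain Go does allow exactly that server-side-only control: setting a server's TLSNextProto to a non-nil empty map disables HTTP/2 negotiation for that server while leaving the process's outbound clients alone, and GODEBUG=http2server=0 / GODEBUG=http2client=0 are the documented process-wide switches. Whether the apiserver surfaces this knob is a separate question; this is a sketch of the Go mechanism only.

```go
package main

import (
	"crypto/tls"
	"log"
	"net/http"
)

func main() {
	srv := &http.Server{
		Addr:    ":8443",
		Handler: http.DefaultServeMux,
		// A non-nil, empty map tells net/http not to negotiate "h2" via
		// ALPN, so TLS clients fall back to HTTP/1.1 against this server.
		TLSNextProto: map[string]func(*http.Server, *tls.Conn, http.Handler){},
	}
	// Process-wide alternatives, for contrast:
	//   GODEBUG=http2server=0  disables HTTP/2 for all servers in the process
	//   GODEBUG=http2client=0  disables it for all clients in the process
	log.Fatal(srv.ListenAndServeTLS("cert.pem", "key.pem"))
}
```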
...keeping with a connection? Well, that's right: it does figure out the endpoints; it goes to the endpoints, yeah.

So yeah, as I just heard: if you run two of the back-end aggregated API servers behind the same service, or actually, yeah, you can run multiple of them, the aggregator should multiplex across them, and so if you run two, you should see double your throughput. In fact, if you don't, let me know, because that's a bug, because I made...
Let me just double check: so you're telling me that if I put two aggregated API servers behind the same service, the normal main API server is going to open connections to both and load-balance requests across those two connections? Correct, yeah.

Okay, well, I'll try that. That sounds like the easiest thing to do; I'll try that first, and...
It doesn't totally confirm it, because then you have to ask whether the load balancer is introducing a bottleneck. I think that the comparison between what I've done and what we just outlined with two aggregated servers is a pretty direct one and a pretty good test. So I think I'll try that next, yeah.
Thank you. Yes, well, I'm glad that I have people to work with. As I said, I think it's a good technological approach; I just want to prove it out and help make it actually good in terms of the actual implementation, yeah.