From YouTube: 2017-03-23 Kubernetes SIG Scaling - Weekly Meeting
Description
Public meeting recording of the Kubernetes Scalability SIG.
See comments for Zoom chat log.
A
They've gone through, in the latest version of etcd, to make it nice and seamless to integrate in-process. So I don't know whether or not we want to have a configuration or a flag to potentially have an embedded version of etcd in the API server, for people that want that configuration. That allows you to get rid of the caches, because everything's in memory, so the queries will be fast. You also get rid of the serialization over the wire, so you don't pay that cost, for people that have that configuration.
B
I think that, yes, I agree that some caches we may now be able to get rid of, but it's not all of them. So, for example, the watch cache, I think, will still have to be there, because basically, if we are serving watches, then in etcd we obviously have serialized data and we are deserializing it, unless we would plug typed watches directly into etcd, which won't be the case.
B
If we don't have this cache, then it would mean that we would be deserializing the same data over and over again. So it's not that we will be able to remove every cache that we have. That said, I think it can potentially have some advantages, but I wouldn't like to be at the point where this is the only mode. We can consider making it an option, but it should be just an option.
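The trade-off just described, that without a cache the server ends up deserializing the same stored bytes once per watcher, can be sketched as a memoization layer. This is an illustrative stand-alone sketch, not the actual apiserver watch-cache code; the `Pod` type and the key-plus-resource-version cache keying are assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Pod is a stand-in for a decoded API object (hypothetical, for illustration).
type Pod struct {
	Name            string `json:"name"`
	ResourceVersion string `json:"resourceVersion"`
}

// decodeCache memoizes deserialized objects by key and resource version, so
// serving the same stored bytes to many watchers decodes them only once.
type decodeCache struct {
	mu      sync.Mutex
	entries map[string]*Pod
	decodes int // counts actual JSON decodes, to show the saving
}

func newDecodeCache() *decodeCache {
	return &decodeCache{entries: map[string]*Pod{}}
}

// Get returns the decoded object for (key, rv), deserializing raw only on a miss.
func (c *decodeCache) Get(key, rv string, raw []byte) (*Pod, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	ck := key + "@" + rv
	if p, ok := c.entries[ck]; ok {
		return p, nil
	}
	p := &Pod{}
	if err := json.Unmarshal(raw, p); err != nil {
		return nil, err
	}
	c.decodes++
	c.entries[ck] = p
	return p, nil
}

func main() {
	raw := []byte(`{"name":"web-0","resourceVersion":"42"}`)
	c := newDecodeCache()
	// Ten watchers ask for the same object at the same resource version.
	for i := 0; i < 10; i++ {
		p, err := c.Get("pods/default/web-0", "42", raw)
		if err != nil || p.Name != "web-0" {
			panic("unexpected decode result")
		}
	}
	fmt.Println("decodes:", c.decodes) // prints "decodes: 1"
}
```

Dropping the cache here would turn that one decode into one per watcher per event, which is the repeated-deserialization cost being discussed.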
A
Yeah, I agree, I agree. I'm not saying it's the one true path; there's obviously a trade-off there. But if you wanted a simplified deployment, this makes things a little bit easier; if you're trying to manage the number of bits that you're trying to deploy, it makes it easier. There are also some slight performance advantages there too.
A
We actually, you know, full disclosure, we actually did this in OpenShift, and you can provide the same endpoints that you normally would provide. What we did is we brought it up on localhost, so that way, if you're on the machine, you can do all the etcdctl operations, but you have to be on localhost.
C
Interesting, okay. I think it's worth looking at. I think, you know, it expands the test matrix yet again, but that's the nature of these things.
C
If we were going to go to the sort of hyperkube model, where it's just one process that brings stuff up, I think that starts to make some sense. But as long as we still have separate binaries, and we have to do bootkube-type things, you know, I'm not sure how big of a gain it ends up being.
B
We've seen some of that on some number of nodes already, but it was mostly because of the API QPS being starved, and the kubelet not being able to send the update to the node status itself. But that was a few weeks ago. So it obviously depends on the load.
B
So if you generate some very high load on the kubelet, for example if you create a lot of secrets or config maps for the pods that are running on that kubelet, and the kubelet is periodically re-fetching that data, that may potentially affect its ability to update the node status. So it very highly depends on the load that you generate and on the pods that are running on that node, and on whether this is really the problem that you are observing, because it can potentially be something else.
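A back-of-the-envelope sketch of the load pattern just described, a kubelet periodically re-fetching the secrets and config maps its pods reference; all the numbers here (pods per node, objects per pod, sync period) are assumptions for illustration, not Kubernetes defaults:

```go
package main

import "fmt"

// refetchQPS estimates the steady-state API request rate generated by one
// kubelet that re-fetches every referenced object once per sync period.
// Purely illustrative arithmetic; real kubelet caching behavior differs.
func refetchQPS(podsPerNode, objectsPerPod int, syncPeriodSeconds float64) float64 {
	return float64(podsPerNode*objectsPerPod) / syncPeriodSeconds
}

func main() {
	// Assumed: 100 pods on the node, 2 secrets/config maps each, 60s period.
	perNode := refetchQPS(100, 2, 60)
	fmt.Printf("per node: %.2f QPS\n", perNode)
	// Multiplied across a large cluster, this background load alone can
	// crowd out the kubelet's own node-status updates at the API server.
	fmt.Printf("1000 nodes: %.0f QPS\n", perNode*1000)
}
```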
D
So is this NotReady condition a normal part of the node lifecycle? I don't actually know how it works. I mean, I guess my assumption is that it comes from the kubelet not being able to talk to the API server. Sorry, can you repeat? "Not ready": what is that a measurement of? I don't even know how that's measured, whether by the kubelet or by the API server, once a kubelet is running.
B
We were seeing that, especially when we were having a lot of config maps and a lot of secrets and stuff like that on a given node. But it wasn't that the kubelet was ready or unready for a long time; it was just flapping between ready and unready. So it might be something completely different from what you are observing.
A
So you brought up a point there that I think is worthwhile. It wasn't the original topic, but it was the question of whether our QPS limits are still really fictitious, given all the changes we've made over history. I actually have a tool right now that I'm working on to rip through and do a ridiculous number of queries, and I'm a little afraid of how hard it could hit a loaded system. So the QPS estimates: are they in scope for, like, 1.7, to start to update them?
A
I don't know, maybe it's a reasonable expectation to have, but it's totally fanciful. We're making up... we're trading one magic number for a different magic number that's slightly variable. Yeah, I think it's reasonable to up the QPS limits to some value, but it needs to be qualified, right? That's the problem.
A
The biggest limitation on QPS was CPU; originally it was CPU. So back when we put these limiters in, it was way back in the 1.0 time-frame days. That was the original limiter, so maybe we bumped them back then, I think in 1.1, and then we've kind of left them there across multiple releases. And in the 1.2 time frame, you know, a 300-node cluster could pretty much completely...
A
We could probably easily write a benchmarking tool, which is not a bad idea honestly, to basically hammer the API server without a full end-to-end test, because it's just client stuff. You could load etcd up with fake data and just constantly have a client querying the heck out of it, and see what's the threshold by which we exceed some measure of okay-ness. We'd have to specify that: like, if we exceed four cores, then we came in over budget, or something to that effect.
B
Oh sorry, oh listen! So I'm saying that this benchmark-the-API-server idea: I've had it on my to-do list, on my backlog, for more than two quarters now, and it was always deprioritized. Oh yeah, I definitely agree that we should do it at some point. Maybe I will have time to look at it; please contribute to it.
B
Marcus
on
vacation
now
I
was
pretty
busy
11.6
also
so
like
I
I
think
that,
except
from
like
what
we
were
discussing
two
weeks
of
all
I,
think
I
think
that
mark
send
it
like
wider
to
the
whole
DK
death.
I.
Think,
sorry,
oh
sorry,
not
a
kubernetes
net
and
I
think
there
weren't
any
like
any
concerns
about
it.
Sometime
look
so
I
think
that
we
like
within
a
like.
We
should
be
back
in
a
month
I,
so
how
next
week
is
actually
Cuban
so
probably
into
like
make
a
final
decision.
A
Yeah, because these types of things, not only the formal efforts, which are super important for the long term, but these other benchmarks are also good. Because right now we kind of ship something out in the field, you know, and it's got governors in place, and users don't understand why things time out, and they write issues. It would be nice to have it more formalized, honestly.
A
Ideally, we would have some type of document for this scalability SIG that outlines: here are the knobs, these are their values, and here's the history behind them, so that somebody can go through it and new people that come on board could understand it. We've never actually done that. I think it's probably worthwhile for someone to take an action item to do that. I am swamped, though.
D
Yeah, I mean, the only other thing was rolling back to the node issue we were talking about. I don't know if we want to measure the not-ready stuff also, as a measurement of cluster failure, in this end-to-end or in another end-to-end, but it seems like it might be something it would be good to have a test for, maybe attached to the scale tests that we already have.
D
The other thing is the story of disks. I don't know if we really do anything there. There are still stores on disks that could be disruptive, but I don't know: is there any plan upstream for stressing disks out? That's the first question. And then, second of all, do we think we should actually have a metric for the not-ready stuff? I think we are, or we should, measure that, maybe in Density or something like that, as part of the e2e tests, to see if nodes go not-ready.
B
Yet
strips
the
I
think
that
we
are,
or
maybe
we
are
only
like,
I'm
sure
we
are
testing
something
about
not
readiness,
but
it
might
be
only
at
the
end
of
the
test
potentially
which
made
so.
It
means
that,
with
my
ID
Wharf
like
doing
something
Mordor
regarding
like
disk
I
think
that,
yes,
we
definitely
want
to
do
at
some
point,
but
we
are
still
more
more
focusing
on
a
little
bit.
I
would
call
it
stateless
work
out
then
stateful,
but
yes,
I
D,
like
I
whatever.
A
Yeah, so the problem with this, and you probably want to talk to the node team, is that there are no fences for disks. So if you started to do disk-stressing things, where you have multiple writers, you know, whether you're actually exercising local disk versus attached volumes, there are two parts right there.
A
There
are
no
Governors
that
exist
for
disk
and
there
are
no
Governors
for
network,
so
you
can
kill
it
in
two
ways
right,
so
you
can
kill
the
local
disk
by
having
too
many
things
writing
concurrently,
you
kill
the
network
and
basically
denial
service
attack,
the
control
plane
or
having
too
many
TVs.
That
exercise
and
stress
that
way
too.