From YouTube: 2016-07-14 Kubernetes SIG Scaling - Weekly Meeting
Description
Public meeting recording of the Kubernetes Scalability SIG.
Check comments for meeting chat log.
A: So, just one admin note: I recorded last week's meeting, and we didn't really have a process for how we're going to post it. I was talking to Sarah about it, and what I'm going to do is use the group credentials that we have for managing Zoom to set up a YouTube channel. I'll upload the recordings there and post a link here.
A: So that's some busywork for me to work on for the next week. One follow-up from last week: David did a presentation that probably dragged on a little long, just because we hadn't really polished everything up about doing Prometheus-based sharing of usage information. I think David's planning on coming back with a more succinct version, based on some of the feedback we got as well.
A
So
that's
pro
tentatively
for
next
week,
maybe
the
week
after
and
then
we
have
a
new
person
on
our
team.
Who
was
just
that
still
sleeping?
Was
it
go
for
comp
overcoming
and
it's
a
Mike
Venezia's
was
the
Cooper
Nettie's?
Can
the
Cooper
Nettie's
lead
at
viacom,
so
he's
joined
the
samsung
team
and
we're
back
to
I
was
thought
I
saw
I'm
cho
em
up
on
the
unbeaten
delist
linear,
a
true
we're
kind
of
renewing
some
work
we
were
doing
around
at
CD.
Performance
testing
is
multiple
configurations.
Primarily.
B: I did have a couple of topics. Since [unclear] is on: are there numbers that you're aware of for node performance at scale, in terms of the number of pods per node? That issue is kind of dragging on, because it will affect density, the one about pods per node versus the total number of pods per node.
C
Well,
I,
don't
know
if
it's
tested
anywhere
like
I
know
they
tests
like
on
smaller
machines.
It
should
be
okay,
I
did
I
think
they
test
that
I
put
100
or
200
something
but
like
when
you
are
moving
to
like
32
cosmos
in
chains
with
which
should
support
320
or
something
then
I,
don't
think
anyone
ever
tested
it.
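The scaling concern above, tested densities of 100–200 pods versus what a larger machine could in principle host, can be sketched with a toy estimate. The ~10-pods-per-core ratio (which makes a 32-core node come out at 320) and the cap are illustrative assumptions, not Kubernetes defaults:

```python
def estimated_pod_capacity(cores: int, pods_per_core: int = 10,
                           max_pods_cap: int = 500) -> int:
    """Toy estimate of how many pods a node might host.

    pods_per_core and max_pods_cap are hypothetical knobs for illustration;
    real limits come from kubelet configuration and resource pressure.
    """
    return min(cores * pods_per_core, max_pods_cap)


# A 32-core machine lands well above the commonly tested 100-200 range:
print(estimated_pod_capacity(32))  # 320
```

The point of the sketch is only that the commonly tested densities do not cover what larger machines would naturally be asked to run.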
A: I'm going to turn back and make some more notes here.
F: So I talked about this [unclear], and we have had a little discussion with [unclear], and it sounds like it's just not ready for 1.4.
D
But
it's
like
significantly
better
and
in
particular
I
will
also
like
I'm
generally
testing
like
higher
throughput
in
like
Al
Gore,
2000,
no
casters
or
the
whole
system
like
whole
system
like
and
to
end
system,
with
not
many
controllers,
like
basically
only
replication
controller,
and
it
seems
that
we
are
ready
to
come
clean
crease.
The
QPS
limits,
like
obviously
the
best
case.
The
best
thing
that
we
would
be
able
to
do
is
to
have
some
back
pressure,
but
that's
something
that
is
not
going
to
happen
like
soon,
at
least
in
balloon,
1s
bandwidth
that
that.
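The QPS limits being discussed are client-side throttles of the token-bucket kind (a sustained rate plus a burst allowance); back pressure would instead move that decision to the server. A minimal sketch of such a limiter, with an injectable clock so it can be exercised deterministically; the names are illustrative and this is not the actual Kubernetes client throttling code:

```python
import time


class TokenBucket:
    """Illustrative QPS/burst throttle: allow up to `burst` calls at once,
    then refill at `qps` tokens per second."""

    def __init__(self, qps: float, burst: int, now=time.monotonic):
        self.qps = qps
        self.burst = burst
        self.now = now            # injectable clock for testing
        self.tokens = float(burst)
        self.last = now()

    def allow(self) -> bool:
        """Consume one token if available; otherwise the call is throttled."""
        t = self.now()
        self.tokens = min(float(self.burst),
                          self.tokens + (t - self.last) * self.qps)
        self.last = t
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

Raising the "QPS limit" in this model means raising `qps` (and usually `burst`), which is a client-side guess about what the apiserver can absorb, exactly the thing server-side back pressure would make unnecessary.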
B
Was
something
that
was
on
my
to-do
list
to
start
looking
into
so
I
once
I
get
rid
of
my
ridiculously
large
backlog
of
emails,
also
by
which,
if
we
can
start
segregating
or
labeling
a
little
better,
that
will
be
awesome
sauce,
because,
right
now
I
have
a
drink
from
the
firehose
of
cooper,
Nettie's
emails
to
regularly
I,
don't
know
that
other
people
also
feel
that
burn
so
to
speak.
Well,.
B: I guess the only other thing I can report out is from testing we were doing. It probably doesn't affect anyone who's not using device mapper, but it's something that we noticed when we started going really high on node densities. We're doing densities above 100, at 250 pods per node, and when we do that we see the memory utilization of the node start to grow really high. We've already reported it to the appropriate people.
B: No, it's cAdvisor [unclear] inside the kubelet. The memory utilization for our 16-gig machine with 250 pods doing nothing was 5 gigabytes, and that memory was pretty much just to track those pods. According to the heap profiler, most of the memory was spent in disk monitoring.
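The numbers above imply a substantial per-pod cost on the node agents: 5 GiB across 250 idle pods works out to roughly 20 MiB of tracking overhead per pod. A quick back-of-the-envelope check:

```python
def per_pod_overhead_mib(total_used_gib: float, pods: int) -> float:
    """Average node-agent memory attributable to each pod, in MiB."""
    return total_used_gib * 1024.0 / pods


# 16 GiB machine, 250 idle pods, 5 GiB used by kubelet/cAdvisor:
print(round(per_pod_overhead_mib(5, 250), 2))  # 20.48
```

At that rate, monitoring overhead alone consumes nearly a third of the machine before any workload memory is counted, which is why the issue was worth reporting.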
E: Somewhere in catching up on my backlog of tickets, I think I saw a cAdvisor roadmap ticket living out there somewhere. Maybe that's got some information on this issue.
D: Definitely we can't go to beta if we don't improve performance, but as far as I know, that's not the only thing we should do to get to beta. So I'm not sure the plan is that it should go to beta. But I don't know if anyone is doing anything else other than improving performance, right?
E
This
is
one
of
those
questions
were
like
I
guess.
This
had
been
started
before
the
whole
features.
Repo
and
I'm
still
unclear
whether
that's
really
the
marching
order
the
community
is
taking,
but
I'm
still
confused
about.
There's
a
really
clear
document
about
API
is
moving
from
alpha
to
beta
2
GA.
It's
unclear
to
me
like
what
other
gates
or
criteria
are
needed,
and
so
it
sounds
like
if
Google's
really
got
is
really
interested
in
seeing
affinity
on
by
default.
E: No, I agree. In my sweep of my ridiculously large backlog, I do remember running into this stuff, but I don't recall seeing an umbrella issue that ties this all together. That's sort of what I'm asking for, I guess, or some kind of document where the performance criteria are listed.
A: Hey, could I try to spit back what I'm hearing here and see if I got it right, Wojtek? So the affinity/anti-affinity feature causes a pretty big performance hit, and you're working on throughput improvements. The goal is to get the affinity/anti-affinity features into 1.4, but not to have performance go backwards. Yes? Okay.
C: Yeah, I have one thing, well, a question for everyone. I will be working on testing controllers soon, and for the controllers, which means pretty much all the features that we have, I'd like to define SLOs: whatever the number of pods in a service or something, some type of scaling.
E: This is why I'm curious. Density, for example, involves the use of the replication controller. You could define, say, the time from asking it to "please create 300 pods" until you see 300 pods ready; you don't care whether they went Running or Ready [unclear], right? Is that sort of an example of an SLO for a controller?
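The candidate SLO above ("time from asking for 300 pods to seeing 300 ready") can be sketched as a polling measurement. `get_ready_count` is a hypothetical stand-in for whatever API query reports ready pods; the clock and sleep are injectable so the sketch is testable:

```python
import time


def time_to_ready(get_ready_count, desired: int, timeout: float = 600.0,
                  poll: float = 1.0, now=time.monotonic,
                  sleep=time.sleep) -> float:
    """Return seconds elapsed until `desired` pods report ready.

    `get_ready_count` is a hypothetical callable standing in for a real
    cluster query (e.g. counting ready pods behind a controller).
    """
    start = now()
    while True:
        if get_ready_count() >= desired:
            return now() - start
        if now() - start >= timeout:
            raise TimeoutError(
                f"only {get_ready_count()}/{desired} pods ready")
        sleep(poll)
```

A real measurement would also have to pin down the details C raises next: what counts as "ready", and what state the cluster is in when the clock starts.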
C
Yeah,
that's
an
example
for
that
I
like
the
internal
solo
power
controller,
but
like
if
you
but
I,
also
interested
in
like
that,
the
SLO
for
the
world
for
the
time,
but
it's
also,
I
also
knee
into
type
to
define
like
what
does
it
mean
like
what
is
take
the
system
so
like
if
it's
the
first
control
replication
controller
or
do
you
want
in
the
yeah?
I
can
so
like
not
only
DD
like
response
times
and
throughput
and
all
the
other
like
normal
stuff,
but
I
also
the
load
in
the
environmental
which
we
are
testing.
B: Well, I mean, because in Kubernetes we have segregated the controllers from the scheduler, which is different from other systems where they smash them together, we're always going to have these weird round trips back and forth. So I'm happy to collaborate on a document; I guess the question will be, like...
E: Feel free. The way I can think of it: there's some action they should take within some latency window. You could probably describe each controller's responsibility in terms of "do one thing, and do it within a certain amount of time"; that gives us some loose upper bound on the system performance. Yep.
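The "do one thing within a window" framing composes: if each controller in a chain keeps its own latency budget, a loose end-to-end upper bound is just the sum of the per-stage budgets (ignoring queueing, retries, and overlap). A trivial sketch, with made-up stage names and budgets rather than real Kubernetes SLOs:

```python
def end_to_end_budget_s(stage_budgets_s: dict) -> float:
    """Loose upper bound on pipeline latency: sum of per-stage budgets.

    Pessimistic but simple: it ignores queueing, retries, and any overlap
    between stages.
    """
    return sum(stage_budgets_s.values())


# Hypothetical budgets in seconds (illustrative, not real SLOs):
budgets = {"replication_controller": 5.0,
           "scheduler": 5.0,
           "kubelet_start": 10.0}
print(end_to_end_budget_s(budgets))  # 20.0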
E: In terms of the actual constructs that are interesting, I might start with whatever constructs the official add-ons are now scheduled with, and I legitimately don't know whether that's replication controllers, replica sets, deployments, or pet sets; it's migrated among those from release to release. That would probably be a good place to make sure you've got guarantees first.
E: I think there's something about that in the end-to-end tests: I think they install an e2e image puller that should preload every node with all the images necessary. So if you could say that, given certain conditions, with the nodes preloaded, you can try to take out some of the environmental conditions. Cool.
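The prepulling idea, removing image-pull latency as an environmental variable by loading every needed image before the test starts, amounts to a per-node set difference. A minimal sketch (the image names are hypothetical):

```python
def images_to_pull(required, present):
    """Images a node still needs before a test can start with image-pull
    latency taken out of the measurement."""
    return sorted(set(required) - set(present))


# Hypothetical image lists for one node:
print(images_to_pull(["pause:3.0", "nginx:1.11", "busybox:1.24"],
                     ["pause:3.0"]))  # ['busybox:1.24', 'nginx:1.11']
```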
E: Yeah, totally fair. It sounds like you want to remove environmental conditions first, via the e2e image puller and mandating some prerequisites on the cluster state under test. And then, since I'm sort of under the impression that eventually the controller manager is going to turn into this thing where each of the controllers starts to float into its own process, scheduled elsewhere within the cluster, it makes more sense to me to find the controllers that are most meaningful to users. And so the first thing you're going to encounter is the add-ons that are scheduled into the cluster, right?
G: So, just one more question. Currently, what I see in the code is that [unclear] isn't cached [unclear]. Is there any reason why it's not cached? Because if I have a lot of third-party resources, and I do a list or something, it's going to call etcd directly [unclear], and that's going to be consuming.
A: Well, we're out of time for today. Good discussion. I was busy trying to make notes, but some of my notes are a little bit half-done here; I see some folks in there trying to help get them up to date. So just a quick call to take a look at the notes and help get them updated, and I think we're done for today.