From YouTube: Kubernetes Sig Docs 20181016
Description
Meeting notes: https://docs.google.com/document/d/1Ds87eRiNZeXwRBEbFr6Z7ukjbTow5RQcNZLaSvWWQsE/
The Kubernetes special interest group for documentation (SIG Docs) meets weekly to discuss improving Kubernetes documentation. This video is the meeting for 16 October 2018.
https://github.com/kubernetes/website
A: And so I don't, I hope, have to deal with uploading, and we are now recording. This is the weekly meeting for SIG Docs, October 16, 2018, and this is Jennifer Rondeau hosting, because Zach Corleissen is out sick today. So let's get started. Do we have anybody new on the call? I'm not looking at the full list at the moment, but it doesn't look as though we do. We do have new reviewers and approvers. I think, Jim, I saw you on the call, right?
B: Dominik, we'll... so last week we asked, like, you know, what topic people want to see next, and it sounded like high availability was the winner, and Dominik has the presentation prepared. Go ahead and take it away, Dominik. Okay.
C: Thank you. So today, continuing with modeling, we want to talk about my favorite topic, actually: high availability. And once again, just as last time, this is interactive, so please don't be shy, ask questions at any time, and I may ask a question or two during the presentation. So first off, when we talk about high availability we want to set the stage, and there are usually two concepts surrounding responsive systems: one is scalability and one is reliability.
C: So scalability is generally defined as responsiveness in the presence of load on the system, and reliability is defined as responsiveness in the presence of failure of system components. So in this presentation, for this talk, we want to focus on reliability only, that is, the ability of the system to be responsive in the presence of failure. So the first fundamental distinction that I want to make is that the reliability of Kubernetes does not imply the reliability of applications hosted on Kubernetes. Kubernetes may be able to sustain node failure or pod failure.
C: That does not immediately translate into your application being able to withstand that failure. A simple example: if I publish my workload, or deploy my workload, as bare pods and the node fails, then Kubernetes will keep being responsive, but my application will not be responsive. So in that case I would have to deploy my workloads as replica sets or deployments in order to take advantage of Kubernetes high availability and have Kubernetes reschedule my application workload that just failed. However, that still doesn't protect me if my application has inherent bottlenecks or inherent single points of failure.
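
To make that distinction concrete, here is a minimal sketch, assuming the Kubernetes Go API types (k8s.io/api, k8s.io/apimachinery) and hypothetical names such as "web" and nginx:1.15: it builds the Deployment you would submit instead of a bare Pod, with three replicas so the controller can reschedule pods lost to a node failure.

```go
package main

import (
	"encoding/json"
	"fmt"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	// A Deployment rather than a bare Pod: the controller keeps 3 replicas
	// running and reschedules them when a node fails. A bare Pod would simply
	// be gone together with its node.
	labels := map[string]string{"app": "web"} // hypothetical label
	dep := appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: "web", Namespace: "default"},
		Spec: appsv1.DeploymentSpec{
			Replicas: int32Ptr(3),
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{Name: "web", Image: "nginx:1.15"}},
				},
			},
		},
	}
	out, _ := json.MarshalIndent(dep, "", "  ")
	fmt.Println(string(out))
}
```

Submitting an object like this gives Kubernetes something it can reschedule after a node failure; it still does nothing about bottlenecks or single points of failure inside the application itself.
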
C: So we need to make this distinction. So first, let's look at the high-level architecture that is important in this conversation. We have two sets of components: we have the master components and the node components. The master components host the kube-controller-manager, a process that hosts all core controllers; the kube-scheduler, the process that hosts the scheduler; the kube-apiserver, the API server; and an etcd node. Whereas on the node components we have the usual suspects, the kubelet and the container runtime. Of course we have other components like kube-proxy and cAdvisor, but for this discussion we can leave them aside.
C: Now, if we want to talk about high availability, and actually that is also true for scalability, the one ring to rule them all is redundant deployment. So in order to withstand the outage of a component, we deploy multiple components, or rather the same component: we actually deploy the same component redundantly. So in the case of Kubernetes that would result in this architecture: we have multiple nodes and we have multiple masters. And because a node cannot connect to a master directly, because the master is the one that may fail,
C: we have to add an indirection in that case, and that is the load balancer, so that the node connects to the load balancer and the load balancer will then forward traffic to an available master.
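
As a rough illustration only of what that indirection does (a real cluster puts HAProxy, keepalived, or a cloud load balancer here, with proper TLS against the cluster CA), a toy forwarder in Go could probe each API server's /healthz endpoint and send the request to the first healthy one; the addresses below are hypothetical.

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

// Hypothetical API server endpoints sitting behind this forwarder.
var apiServers = []string{
	"https://10.0.0.10:6443",
	"https://10.0.0.11:6443",
	"https://10.0.0.12:6443",
}

// healthy reports whether an API server still answers its /healthz endpoint.
// A real probe would verify the server against the cluster CA; that is
// omitted in this sketch.
func healthy(base string) bool {
	client := &http.Client{Timeout: 2 * time.Second}
	resp, err := client.Get(base + "/healthz")
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	return resp.StatusCode == http.StatusOK
}

func main() {
	// Forward each incoming request to the first master that is still healthy:
	// the nodes only ever see this one stable address, which survives the
	// failure of any single master.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		for _, s := range apiServers {
			if !healthy(s) {
				continue
			}
			target, err := url.Parse(s)
			if err != nil {
				continue
			}
			httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
			return
		}
		http.Error(w, "no healthy master", http.StatusBadGateway)
	})
	log.Fatal(http.ListenAndServe(":6443", nil))
}
```

The kubelets and kube-proxies then point at this one address instead of at any individual master.
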
C: Now, a quick show of hands. As I said, this is all about reliability, but since we have a load balancer in there, does this setup also increase the scalability of the master components, or does it not have an effect on the master components?
C: Since there is a load balancer in the picture, and a load balancer is a common component when we talk about scalability: who believes that the load balancer can help us scale the master components if our cluster grows, or does the load balancer not have an effect on the scalability of our master components?
C: However, in the lower half of the picture you see that in this case the load balancer chooses another API server, one that is not co-located with the current etcd leader but with an etcd follower. Therefore the etcd follower will not answer the request directly, but will redirect that request to the leader and then reply on the leader's behalf. So our bottleneck, no matter how far we scale our API servers, is etcd.
A: Well, you know what they say about guessing; I'm really not sure of the answer. So you can set up a highly available cluster either with stacked masters, so that the control plane node is running on the same machine as the etcd node, or you can set it up with a separate etcd cluster. Does that make a difference here?
C: Not much. So you're right, there are multiple deployment scenarios; however, it doesn't make much of a difference, because the etcd cluster will always be your bottleneck. You cannot scale etcd horizontally; you have to scale the individual nodes vertically, so give it a bigger node, give it a bigger machine, give it faster hard drives. You can scale up, but you cannot scale out, right? However, now that we are deep into the topic, what I'm showing here is actually an oversimplification. It only always holds true for write requests,
C: anything that is state-modifying. For reads, you do have the choice to start Kubernetes with either quorum reads or not. If quorum read is true, any follower will always redirect read requests to the leader. However, if it's not true, the follower may respond to requests straight from its local cache, and then you may actually get stale information, so you may read old data. And if you do that rapidly in a row and the load balancer picks random etcd followers, the same query may actually lead to different results over time.
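
What Dominik describes here corresponds to the API server's old --etcd-quorum-read flag; the underlying etcd behavior is easy to see with the etcd Go client, where a plain Get is a linearizable (quorum) read and WithSerializable() lets the contacted member answer from its own, possibly stale, data. The endpoint and key below are hypothetical, the client import path differs between etcd versions, and this is a sketch of the etcd semantics, not of how the API server itself issues reads.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

func main() {
	// Hypothetical member address; in a stacked control plane this would be
	// one of the etcd members sitting next to the API servers.
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"https://10.0.0.10:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	key := "/registry/pods/default/web" // hypothetical key

	// Linearizable read (the default): the request is confirmed through the
	// raft leader, so it can never return stale data.
	quorum, err := cli.Get(ctx, key)
	if err != nil {
		log.Fatal(err)
	}

	// Serializable read: the contacted member answers from its local state,
	// which may lag behind the leader, so repeated reads through a load
	// balancer can briefly disagree with each other.
	local, err := cli.Get(ctx, key, clientv3.WithSerializable())
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("quorum revision:", quorum.Header.Revision)
	fmt.Println("local revision: ", local.Header.Revision)
}
```

Run in quick succession against different followers, the serializable read is the one that can return different revisions for the same key, which is exactly the jitter described above.
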
C: We have the kube-controller-manager and the kube-scheduler, just as before, but in this case you also see that Kubernetes needs to select master components... I'm sorry, needs to select leader components. The kube-controller-manager and the kube-scheduler work on global shared state that is in etcd. If you have multiple kube-controller-managers active at the same time, or multiple kube-schedulers active at the same time, they may actually be competing.
C: This is not true for the node components. The nodes, or the kubelets, actually work on inherently partitioned data, since a pod is assigned to exactly one node, and there will be no competing updates in this case. Since the master components are working on global state, there may be competing updates, so Kubernetes has a leader election for the kube-controller-manager and the kube-scheduler. As you see here with the kube-controller-manager and the kube-scheduler, the leaders can actually end up on separate hosts.
C: However, etcd guarantees you that at any point in time there is only one of them. The leader information is stored as a Kubernetes Endpoints object in the kube-system namespace, and there is an annotation that carries the holder identity, and the holder identity points to the kube-controller-manager or kube-scheduler that is the current leader. Kubernetes implements leader election on top of Endpoints, and thus on top of etcd, and etcd guarantees the consistency of it. And this is simply a depiction of how the leader election works; it is a fairly standard leader election.
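
A small sketch of how to look at that record, assuming a recent client-go and a kubeconfig in the default location: the leader election record (holder identity, lease duration, renew time) is stored in the control-plane.alpha.kubernetes.io/leader annotation on the kube-scheduler and kube-controller-manager Endpoints objects in kube-system; newer releases keep the same record in Lease objects instead.

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the kubeconfig from its default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// Each of these Endpoints objects carries the current leader election
	// record (holder identity, lease duration, renew time) as an annotation.
	for _, name := range []string{"kube-scheduler", "kube-controller-manager"} {
		ep, err := client.CoreV1().Endpoints("kube-system").Get(
			context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%s: %s\n", name,
			ep.Annotations["control-plane.alpha.kubernetes.io/leader"])
	}
}
```

The holderIdentity field in that JSON record names whichever instance currently holds the lease.
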
C: It will host the scheduler and it will keep trying to renew the lease. As long as it still has the lease, it is still the leader, and once it loses the lease it actually does a hard kill of the process; for example, the kube-controller-manager hard-kills itself, and then operating system mechanisms like, for example, systemd restart the process, and it starts again and enters this loop again.
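
That acquire/renew/lose-and-exit loop is the same one client-go exposes to any component through its leaderelection package. A minimal sketch, assuming a recent client-go, a hypothetical lock name my-controller, and the hostname as identity (the durations mirror the defaults the core components use):

```go
package main

import (
	"context"
	"log"
	"os"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/tools/leaderelection"
	"k8s.io/client-go/tools/leaderelection/resourcelock"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	id, _ := os.Hostname() // identity recorded as the lease holder

	// The lock is a Lease object in kube-system; older setups used the
	// Endpoints annotation described above.
	lock := &resourcelock.LeaseLock{
		LeaseMeta:  metav1.ObjectMeta{Name: "my-controller", Namespace: "kube-system"},
		Client:     client.CoordinationV1(),
		LockConfig: resourcelock.ResourceLockConfig{Identity: id},
	}

	leaderelection.RunOrDie(context.Background(), leaderelection.LeaderElectionConfig{
		Lock:          lock,
		LeaseDuration: 15 * time.Second, // how long an acquired lease stays valid
		RenewDeadline: 10 * time.Second, // give up leading if renewal takes longer
		RetryPeriod:   2 * time.Second,  // how often to try to acquire or renew
		Callbacks: leaderelection.LeaderCallbacks{
			OnStartedLeading: func(ctx context.Context) {
				log.Println("became leader, starting reconciliation loops")
				<-ctx.Done() // do leader-only work until leadership is lost
			},
			OnStoppedLeading: func() {
				// Mirrors the hard-exit behavior described above: stop
				// leader-only work immediately once the lease is lost.
				log.Fatal("lost leadership, exiting")
			},
		},
	})
}
```

RunOrDie blocks: leader-only work happens inside OnStartedLeading, and OnStoppedLeading exits the process so that a supervisor such as systemd restarts it and it re-enters the loop, just as described above.
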
C: An interesting side note is that etcd guarantees you that at any point in time there is at most one leader controller, either at most one leader kube-controller-manager or at most one leader kube-scheduler. However, that does not prevent the individual components from falsely believing that they are still the leader component while they are not. That is the situation where, at one point in time, you can actually have two competing leaders, and this is also referred to as split brain. Typically, split brain is an unavoidable condition.
C: However, you can guard against it, to begin with, for example, with fencing tokens. However, Kubernetes does not apply any of those guarding mechanisms. So for a brief period of time you may have two competing masters. That may lead to situations where you have a replica set that specifies three replicas; at time T you have zero, two of the masters are online, each believing itself to be the leader, resulting in six replicas of the pod. However, down the road the reconciliation loop will take care of that.
C: This is an interesting side note, since Kubernetes actually doesn't give you any guarantees, you know, with, for example, replica sets, that it will only ever create up to three pods. Basically, it only gives you the guarantee that over its lifetime it does its best to keep it at a steady state of three pods. So while high availability actually helps to keep that guarantee, there may be situations where you see jitter. Now, let me see... we do have, yeah, unfortunately, this one.
C: Not much. If you look again at this picture, you take the load balancer away or you take etcd away, and the entire architecture crumbles on you. So the load balancer needs to be a highly available load balancer; Kubernetes does not provide that, it just requires the presence of it. And etcd is the one that actually provides the consistency of the data, and in this case also of the leader. And yes, Kubernetes does have its own leader election, but, as I said, it doesn't guard against split-brain situations.
C: It is, what's the right word, you could argue this is suboptimal, and in a situation that requires consistency that would actually be unacceptable. However, due to its reconciling nature, Kubernetes can actually deal with these situations down the road, but you see a certain jitter, the jitter that we talked about, for example, with the replica sets. So actually, out of the box, Kubernetes itself doesn't provide much.
C: It stands on the shoulders of giants, that is, the load balancer and etcd, and it uses their capabilities sufficiently, however not to the maximum, yeah. But since it doesn't give you any guarantees besides "I do my best to try to reconcile the current state with the desired state", it is actually a fair architecture. But in and of itself there is not much that Kubernetes adds to the high availability, yeah.
C: Since the API server is, you could say, a proxy on top of etcd, there is no need to elect a leader for the API server. Whatever API server the load balancer chooses to forward the request to will do just fine. It will then eventually forward it, especially if it's a write request; etcd will forward that request to the current etcd leader, no matter which one was selected.
C: Custom controllers or custom schedulers do not cleanly fit into this view. So the various controllers are a concept, or an extension point: you have a replica set, you have a deployment, you can have anything else; they're an extension point. Kubernetes makes a difference in how it hosts core controllers and how it hosts custom controllers, and therefore you will run into these problems, because custom controllers cannot take advantage of the entire already existing machinery.
D: Now I remember now. I didn't get to that between last week and this week, so...
A: Yeah, we've been kind of random, we've been unsystematic in how we use the projects in the website repo, so I think the idea is to start figuring out more system there and to think of it in terms of, like, subprojects of the master project that lives in the website repo. That's my recollection anyway, yeah.
A: And Steve, I'm happy to take a look at things if you want to rubber-duck or brainstorm, since Zach and I have been talking about the larger content cleanup. Yeah, okay, that also sounds good.
F: If the group has any input on what exactly we should do with it: thus far we've mostly just been kind of having, you know, ad-hoc conversations on Slack about various things, and we've responded to a couple of small action items, but I guess I'm here to get input from the group, not just about logistics and how our subgroup should run, but also what our priorities should be, if anybody has any action items that they'd like to request, and so on and so forth. So I'm just opening it up, opening it up to you guys.
A: Anybody? Our group has shrunk, I think, significantly here, but Jim, Lana, got anything? I'm sorry, I was multitasking here.
A: Like most of the agendas headed that direction, which is fine with me, and I'll give it a little bit of thought too. Not because I want to be part of the working group; I don't, I need to sit on my hands here. I'm interested, but too many other things. But I think it might also be useful for those of us who aren't planning to be part of the group to ponder a bit more how some of this stuff is already laid out.
A: I think the kinds of tooling and infra issues that we have been thrashing around as a larger group, to make sure that those are clearly defined for your group. Look, I mean, some of that is sort of legacy Hugo-migration pain, and that's largely cleaned up. That's why I need to go away and think about it and can't just brainstorm on the spot. But does that make sense to the rest of you?
F: Okay, yeah, because I think, you know, it's one of those things where we have all these different platforms and sources of information and nothing yet that really centralizes that information. So I think maybe in the next week or so I can give that a little bit of thought. You know, I wonder if maybe, like, issue tags could be the primary means of coordinating this information, or maybe we should set up a separate project, so on and so forth, but yeah.
A: Somebody else needs to wrangle this one, but I was curious about it, because historically the Travis build has done other things from what this particular PR was suggesting that I do. And if we had a place, like a way of tracking related issues, so that PR wranglers who aren't necessarily going to step into those PRs and wrangle them, but who need some context to figure out how to sort of point them at people... yeah. I guess that's really a long-winded +10 for your issue labeling, Luke.
F: That's something that Karen Bradshaw and, I think, a couple of others have expressed interest in. Yeah, I mean, basically I do think that that very much falls under our umbrella, which at this point is basically everything: anything that's not writing and thinking about content kind of falls under our umbrella. So yes, yeah, that's the long-winded answer. Okay.
F: Yeah, I would love to, I don't know, designate a point person on that, or at least maybe, you know, get some kind of a Google Doc or something from somebody, because, you know, I don't even know where to begin comprehending the scope of the problem or understanding the trade-offs and challenges that have been considered thus far. Let's have an information source on that, yeah.
A: And another, another person to bring into that conversation is... because he's done the most work recently to rewrite stuff and sort of corral and wrangle it. And there's still, I know, around the bit that I know the best, and I don't know it well: I only know the issues on the kubeadm docs. For 1.9.6, SIG Cluster Lifecycle generated them, and for 1.10 I'm not sure what happened, when the 1.11 things were...
A: I am still not sure what happened. For 1.12, for sure, SIG Docs generated them, and it appears that we've taken that over, and that's now part of the generated docs responsibilities and part of the scripts. But that story alone suggests that we need to pull more bits together in terms of the story we tell to other... to other things, also.