From YouTube: Kubernetes Community Meeting 20180315
Description
We have PUBLIC and RECORDED weekly video meetings every Thursday at 10am US Pacific Time.
Notes: https://docs.google.com/document/d/1VQDIAB0OqiSjIHI8AWMvSdceWhnz56jNpZrLs6o7NJY
A

C
So let's say you set up a new Kubernetes cluster, you run it out in production, and you want to monitor it and make sure everything's healthy. You're probably going to set up something like metrics collection, so you know that your cluster is not over- or underutilized, and health checks, so you know endpoints are actually responding correctly and everything is coming back as a-okay.
C
That could all happen, but I'm here to unfortunately tell you that your cluster can still be broken. We found this out the hard way by running clusters in production at Comcast, and we're running those globally around the world, not just in America. Basically, your cluster can still be broken in that if a developer goes to create and deploy something, their pod can get stuck in the creating state, it could get stuck in the terminating state, it could be unable to mount disks, or it could be unable to release its IP allocations with the CNI.
C
A lot of stuff could be completely broken. So one thing we found out would turn up some of these issues, even though we had all these other types of monitoring, was running kops validate. kops is an awesome project, and we use it here at Comcast to do all the baseline cluster bootstrapping, but kops validate was really good at showing these errors, because it checks a lot of really meaningful things inside of the Kubernetes cluster. So what we did is kind of take that idea of kops validate and turn it into a monitoring endpoint (excuse me, I have a bit of a cold, by the way, hence the cough drop). We turned it into a monitoring endpoint that very simply tells you your cluster is up or there's something wrong with it, and if there's something wrong, we give you basically some explanation about what we've detected.
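As a rough sketch, a monitoring endpoint of that shape might return a payload like the one below; the field names here are illustrative, not the project's actual schema:

```python
import json

def cluster_status(errors):
    """Build a simple health-endpoint payload: OK when no checks
    have failed, otherwise OK=False plus the detected problems."""
    return {"OK": not errors, "Errors": errors}

# A healthy cluster reports OK with no errors.
healthy = cluster_status([])
assert json.loads(json.dumps(healthy)) == {"OK": True, "Errors": []}

# A broken cluster reports OK=False and an explanation of what was detected.
broken = cluster_status(["pod stuck in Terminating on node-3"])
assert broken["OK"] is False
```

The appeal of this shape is that an external prober only needs to check one boolean, while a human reading the alert still gets the explanation.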
So Kuberhealthy verifies that a pod can be deployed to every node in the cluster. We do that through a daemonset, optionally with a small EBS or persistent volume attached to it, and then we verify that that daemonset can be torn down appropriately.
C
All the pods can be terminated and can all be cleaned up appropriately, and everything comes back to normal. We also look at the kube-system namespace to make sure that your pods aren't restarting what we would call too often, which for us right now is, I think, three times in ten minutes or something like that, but that will obviously be tunable.
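As a sketch, the "restarting too often" rule (three restarts within ten minutes, both numbers tunable) could be checked like this; the thresholds are only the ones mentioned above:

```python
def restarting_too_often(restart_times, window_seconds=600, max_restarts=3):
    """Return True if max_restarts or more restarts fall inside any
    sliding window of window_seconds (e.g. 3 restarts in 10 minutes)."""
    times = sorted(restart_times)
    for i in range(len(times)):
        # Count restarts in the window starting at times[i].
        in_window = [t for t in times if times[i] <= t < times[i] + window_seconds]
        if len(in_window) >= max_restarts:
            return True
    return False

assert restarting_too_often([0, 100, 200])        # 3 restarts in ~3 minutes
assert not restarting_too_often([0, 700, 1400])   # spread out, never 3 in 10 min
```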
We also want to make sure that pods are all in the ready phase when you would expect them to be in your cluster. So when a node has been online for more than 10 minutes, I would expect that all of the daemonsets and things running on that node, well, everything in the kube-system namespace on that node, should be in the ready phase of the pod lifecycle. And then we also check component statuses to make sure those are always online, because those are obviously pretty important when you have etcd and the scheduler and things like that.
So here's a diagram that looks more complicated than it actually is.
C
So that's what the struct under it looks like. Our lambda checks that JSON blob; if that JSON blob goes down or starts showing an error message, then our lambda will figure that out. We want to open source this lambda with the project as well, not necessarily with our deployment tools that deploy the lambda, but the lambda itself, so that you have the option of using it. And then, of course, it can send alerts to any integration that you need; in this situation, that was PagerDuty.
C
This one sends to Slack. So we're using it in production at Comcast; we have this in four different regions around the world and at least six clusters. I consider it alpha right now. Before we actually recommend this for general availability, we want to make sure 100% that all checks are actionable, which is one of the main goals of the project, but we want to keep working on that.
C
We want to add some more checks, like service testing and checking kube-dns, and then we need to document it a lot better for public consumption. So, finally, if you have any questions (it was a super quick presentation, probably for the best, because I may kick the bucket soon), you can email me at Comcast, or find me on Slack; I'm just Eric Greer. You actually can't have spaces in your Slack username, so it comes out like ericgreer. Or throw me an email. I'm also planning on reaching out to request this for sandbox consideration.
A
And if I actually read the first bullet point of the notes, it says the release team is in a meeting right now, so the full status is in the Kubernetes community meeting notes, but due to some security releases and scalability testing issues, we pushed the release from March 21st back to March 26th and plan to lift code freeze by end of day Monday, assuming everything goes to plan. It looks like they're also looking for a 1.11 release lead, so take a look at the role description.
A
B
So I just wrote down some notes beforehand; they're in the document for anybody who wants to follow them, and I also just posted them into chat if you want to follow my notes as I go through them. SIG Auth has been pretty busy over the 1.10 release. I'm going to go through some highlights of things that I think are significant from the 1.10 release, and then talk a little bit about some things that we want to be doing in the future. So first up is pod security policies.
B
For anyone who doesn't know what pod security policies are: in Kubernetes, an individual pod has a lot of ways of actually getting around some of the container isolation that you'd expect when you run a regular Docker container. You can run a privileged pod, you can run a pod with hostNetwork or host PID, and you can run a pod that mounts arbitrary volumes into your container. While this is very helpful for things like node agents, it's a very, very obvious hole when you think about multi-tenant use cases.
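To make the hole concrete, here is a toy admission check in the spirit of a pod security policy; the policy and pod fields are simplified stand-ins, not the real PodSecurityPolicy schema:

```python
def admit(pod, policy):
    """Reject pods that request privileges the policy does not allow."""
    if pod.get("privileged") and not policy["allow_privileged"]:
        return False, "privileged containers are not allowed"
    if pod.get("hostNetwork") and not policy["allow_host_network"]:
        return False, "hostNetwork is not allowed"
    return True, "ok"

# A restrictive policy suitable for a multi-tenant namespace.
restricted = {"allow_privileged": False, "allow_host_network": False}

ok, _ = admit({"privileged": False}, restricted)
assert ok
ok, reason = admit({"privileged": True}, restricted)
assert not ok and "privileged" in reason
```

A node agent would instead be admitted under a permissive policy bound only to its own service account.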
B
So this is now in policy/v1beta1. We're looking for user feedback as we try to push this towards GA, and we would like to roll this out in the same way that we have previously rolled out things like RBAC, pushing it to the point where users come to expect this as a default way of enabling security in the cluster. The second feature I'll talk about is advanced auditing. The API server actually has many auditing features
B
inside of it that allow you to inspect who did what: look at what type of requests came in, who was issuing them, what the response was, and even look at the exact body of the request and its result. This is still going to be beta in 1.10, but we are looking to push it to stable soon. A majority of the issues that were fixed over 1.10 involved scalability issues.
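As a rough illustration of the kind of question audit logs let you answer ("who did what"), here is a sketch that filters audit-style events by verb and resource; the event fields are simplified stand-ins for the real audit event schema:

```python
def who_did(events, verb, resource):
    """Return the set of users who issued requests with the given
    verb against the given resource type."""
    return {e["user"] for e in events
            if e["verb"] == verb and e["resource"] == resource}

events = [
    {"user": "alice", "verb": "delete", "resource": "pods",    "code": 200},
    {"user": "bob",   "verb": "get",    "resource": "secrets", "code": 403},
    {"user": "alice", "verb": "get",    "resource": "secrets", "code": 200},
]

# Everyone who tried to read secrets, whether or not they were allowed to.
assert who_did(events, "get", "secrets") == {"bob", "alice"}
```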
B
Google has been doing a great job in terms of testing it out, putting it through its paces, finding interesting bugs, and just running this at scale, so that has been the primary focus of the beta efforts, and over the next couple of releases we will be looking to push this to stable. A big shout-out to Mik, who has been doing a great job just keeping a record of the features that are going into every release.
B
If you look at issue number 58083, that's a good example of him maintaining what the feature status is, and if you ever have a question about what the features look like, you can find those issues and they're very detailed. Talking a little bit about some of the alpha features that were introduced
B
this release: probably the most interesting feature that we've been working on, in association with the container identity working group, is something called the TokenRequest API. In Kubernetes, when you deploy a container, or deploy a deployment that spawns out a bunch of containers, those containers get a service account that can talk to the Kubernetes API.
B
The problem with service accounts is that, one, they're stored in secrets, so anything that can read secrets can now grab those credentials and escalate to the most privileged service account in a particular namespace. In addition, when you deploy something like a deployment and you get three containers, or pods, out of that, all of those pods have the exact same service account; you are unable to differentiate between any particular one.
B
The TokenRequest API is a way of dynamically creating service account tokens that are not stored in secrets. You send a request to the API server and say, "I would like a token for this particular service or service account," and the API server will dynamically sign that and give it back to you, and it's no longer stored in a secret. These tokens have explicit expiries, and they will be bindable to specific pods, so they will actually contain information
B
that says this credential is not just for this named service account, but is actually for that specific pod running on that node. If that pod is ever deleted, that credential will completely expire immediately and no longer be valid. In addition, you'll be able to create tokens with audiences for things other than the API server. That's sort of a technical way of saying you'll be able to create tokens that will not be valid for the API server, but will be valid for an external service.
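The pod binding and audience scoping described above can be sketched like this; it is a toy model of the semantics, not the actual TokenRequest implementation:

```python
import time

def validate(token, my_audience, live_pods, now=None):
    """A token is only good if its audience matches the service
    checking it, it has not expired, and its bound pod still exists."""
    now = time.time() if now is None else now
    if my_audience not in token["audiences"]:
        return False  # e.g. a Vault-only token presented to the API server
    if now >= token["expires_at"]:
        return False  # explicit expiry
    if token["bound_pod"] not in live_pods:
        return False  # the bound pod was deleted, so the token dies with it
    return True

token = {"audiences": ["vault"], "expires_at": 1000, "bound_pod": "web-1"}
assert validate(token, "vault", {"web-1"}, now=500)
assert not validate(token, "kubernetes-api", {"web-1"}, now=500)  # wrong audience
assert not validate(token, "vault", set(), now=500)               # pod deleted
```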
B
So a great example of this would be something like HashiCorp's Vault Kubernetes plugin. When you use HashiCorp's Vault, what it allows you to do is take a service account token and exchange it with Vault for a secret, and this is a great way of basically plugging into an external secret store. The problem with that is that, when you hand over that token, that token is also valid for the API server. That's a bad pattern from our perspective, and we want to create tokens that are only valid for specific use
B
cases, like a token that is only valid for Vault. So this feature is in alpha right now; over the next few releases we'll be pushing it to beta and getting it so the kubelet has the ability to automatically request these and insert them into your pods. I think the future of this looks like something where Kubernetes fits much better with integrating with external secret stores, and we can start to move away from the existing secrets, which have issues around ACLs.
B
The next one is client-go external credential providers. We have a lot of ways of integrating with external IdPs, but we do not have a good client-side story for that today at all; client-side custom code has to be compiled into client-go. As of 1.10, we are introducing the ability for people to write client-side plugins to do things like login and credential rotation for kubectl, and so on and so forth.
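A client-side credential plugin is essentially an executable that prints a credential for the client to consume. As a sketch of the idea (the exact `ExecCredential` wire format belongs to client-go; treat the fields here as illustrative):

```python
import json

def exec_credential(token):
    """Emit the JSON a client-side credential plugin would print to
    stdout so kubectl/client-go can pick up the bearer token."""
    return json.dumps({
        "apiVersion": "client.authentication.k8s.io/v1beta1",
        "kind": "ExecCredential",
        "status": {"token": token},
    })

# The plugin fetches/rotates a token however it likes, then prints this.
out = json.loads(exec_credential("my-rotated-token"))
assert out["kind"] == "ExecCredential"
assert out["status"]["token"] == "my-rotated-token"
```

The plugin binary is named in the kubeconfig, and the client re-runs it whenever it needs fresh credentials.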
B
A really interesting use case is TLS bootstrapping, because this will be a client-go feature and not just a kubectl feature; the kubelet will also allow this. So you could consider a world where the kubelet could create an initial CSR using its own AWS instance identity, for example, by writing a custom plugin, and this helps us get past some of the bootstrapping issues that we'd seen internally by not being able to have those kinds of integrations.
B
The same goes for protocols that we don't integrate with, like LDAP, and so on and so forth. Encryption at rest also got an external KMS integration. Encryption at rest is the mechanism for the Kubernetes API server to encrypt resources before it stores them in etcd. A really obvious use case of this would be encrypting secrets: I want to encrypt secrets before I store them in etcd. This allows you to do that.
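A KMS-backed setup like that generally has an envelope-encryption shape: each resource is encrypted with a data key, and the external KMS wraps that data key. A toy sketch of the shape (the XOR keystream stands in for a real cipher, and `MockKMS` is invented for illustration; none of this is the actual provider interface):

```python
import hashlib
import os

def keystream(key, n):
    """Derive n pseudo-random bytes from key (stand-in for a real cipher)."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def xor(data, key):
    """Symmetric toy encryption: applying it twice recovers the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

class MockKMS:
    """Stands in for the external KMS that wraps/unwraps data keys."""
    def __init__(self):
        self.kek = os.urandom(32)  # key-encryption key never leaves the KMS
    def wrap(self, dek):
        return xor(dek, self.kek)
    def unwrap(self, wrapped):
        return xor(wrapped, self.kek)

kms = MockKMS()
secret = b"db-password=hunter2"
dek = os.urandom(32)                        # per-resource data encryption key
stored = (kms.wrap(dek), xor(secret, dek))  # what would land in etcd
wrapped_dek, ciphertext = stored
assert xor(ciphertext, kms.unwrap(wrapped_dek)) == secret
```

The point of the envelope: etcd only ever sees wrapped keys and ciphertext, so reading etcd alone no longer yields plaintext secrets.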
B
So take this with a grain of salt, but there has been some initial exploration of a bug bounty for Kubernetes. This is not specifically owned by SIG Auth; in fact, it will need owners from far more SIGs than just us, but it was discussed in SIG Auth, and I think it is interesting for people who are interested in that kind of thing.
B
It is very easy for us to say, "if you find a vulnerability, it's only valid if you turn all these security features on," but we also want to make this translate to actual clusters that are running. So follow up on that in the forums; other people may have more information on that particular item. But that's it for my SIG Auth update, and if anyone has any questions, I'm happy to field them.
A
E
Hello, can you hear me? (Yep.) So, what's going on in SIG Instrumentation. The first important update is that we have a new SIG lead: Frederic Branczyk from CoreOS agreed to become a SIG lead, replacing Fabian Reinartz, also from CoreOS. I would like to thank Fabian for leading the SIG for a year and a half and for all his great contributions.
E
It's implemented in an adapter pattern, so we are providing only the API, and if you want to have a real implementation integrated with your system, for example Sysdig, Datadog, or any other monitoring system, you need to implement this adapter. I am aware of the work to provide an integration with Stackdriver; I'm not sure about Prometheus and other systems.
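The adapter pattern being described can be sketched as a small interface that each monitoring backend implements; the class and method names below are invented for illustration, not the actual custom metrics API types:

```python
from abc import ABC, abstractmethod

class CustomMetricsProvider(ABC):
    """The API side: consumers only ever talk to this interface."""
    @abstractmethod
    def get_metric(self, namespace, name, metric):
        ...

class InMemoryAdapter(CustomMetricsProvider):
    """A trivial adapter; a real one would query Prometheus, Sysdig, etc."""
    def __init__(self, data):
        self.data = data
    def get_metric(self, namespace, name, metric):
        return self.data[(namespace, name, metric)]

adapter = InMemoryAdapter({("default", "web", "requests_per_second"): 42.0})
assert adapter.get_metric("default", "web", "requests_per_second") == 42.0
```

Consumers such as the horizontal pod autoscaler depend only on the interface, so swapping monitoring backends means swapping adapters, not consumers.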
E
First is to focus on stabilizing things, so we want to graduate the APIs: we have to GA the master metrics API, the custom metrics API, and the external metrics API. We also want to stabilize metrics-server a bit and deprecate Heapster. Apart from that, there is a plan to define a historical metrics API, so that components interested in using historical data to perform some actions, like, let's say, the scheduler or the vertical pod autoscaler, can consume that data from multiple systems.
E
There is also an idea to reach agreement on a logging architecture and vision. Three years ago, or maybe two and a half years ago, there was a monitoring architecture vision proposed and agreed within the community. This was a really great achievement of SIG Instrumentation, because it was a foundation for many design decisions made in the past, and it will be a foundation for such decisions in the future.
E
We started discussions about exposing kubelet health status. This would be very useful for monitoring the state of the kubelet, but it is different from the metrics endpoint, so we need to figure out how to do this. We also have on our plate some, let's say, organizational work: we would like to move all SIG Instrumentation projects to a new home.
A
Alright, it doesn't look like we have any questions, so yeah, thank you very much. (Thank you.) So next up we have some announcements. Registration for the Kubernetes contributor summit, which takes place just before KubeCon, is now open, so if you are interested in attending, please check the link in the meeting notes. It is a separate registration from KubeCon itself, so you have to register for both if you're interested. Next week we have office hours; we're looking for volunteer developers to help answer the questions that people are asking, and there's a link for that as well,
A
with more information in the meeting notes. The videos from the Helm summit have been posted; there are also links to those, they're all on YouTube, and there's a link to the YouTube playlist. And finally, for this week we have some shout-outs. Shout-outs are the section of the Kubernetes community meeting where we acknowledge awesome work done during the past week by various members of the community.
A
D
Are you referring to the note I put in the comment stream? (Yes, I am.) Yes. So I think we may be able to resource this out of the SIG Scale group, but I thought it would be worth mentioning. One of the things, and this was mentioned briefly previously, is that we really are looking to add someone officially to the release team to try to track large-scale regressions.
D
At the moment, those tests take so long to run that if we make them blocking, it'll reduce velocity, so we really need a human being to be on top of making sure that when those regressions happen, they're driving back the work that needs to happen earlier in the release cycle, so we don't have a slip at the last minute. So I would just call this a call for volunteers; contact me on Slack or offline somehow if you're interested in that.
A
D
Maybe I should add one more comment there, which is: this is probably a good role for someone who is looking for a way to make a newer contribution to the project, because it's really a little bit more of a coordination role. So I'd say you don't have to be a scalability expert to make a big contribution here. Thanks.