Cloud Native Computing Foundation KCD UK 2021, 19 Sep 2021

From YouTube: What I Learnt Fixing 50+ Broken Kubernetes Clusters — David Flanagan 2.1.4

Description

Is your idea of fun sitting in front of a camera, live streaming to the internet, debugging and fixing a broken Kubernetes cluster? Doubtful.

What if these Kubernetes clusters were intentionally broken by members of the Kubernetes community, tasked with making your chances of fixing said clusters as slim as possible?

Join us today to learn the key methods, tools, and takeaways David has learnt fixing over 50 Kubernetes clusters, live on his series: Klustered

A

Right, so our next speaker probably needs very little introduction. Is he? Are we bringing him on to the stage?

A

Is it happening? It's happening. So, probably the most prolific live streamer in the cloud native ecosystem. His media empire includes rawkode.live, LGTM on Cloud Native TV and, of course, Klustered. It's David Flanagan. How are you, buddy?

B

I'm doing really well. It's nice to be here. How are you?

A

I'm good, I'm good! So have you recently... have you got a little one, a new little one?

B

You're so close. I have a three-year-old and I have a minus-two-day-old, so someone due within the next couple of days.

A

Oh, I knew it was. I knew it was either coming or just happened. So yeah.

B

It's close, I mean I may have to just leave any minute right now to be fair, but hopefully.

A

We'll get the talk out of the way quickly, but your reputation clearly precedes you, because Dan Papandrea turned up from New York in the middle of the night last night.

B

Really? I don't know what's happening.

A

And he's not here today, so he missed his opportunity. Oh.

B

I'm sure he's got some tweets scheduled and a handle down. Yeah.

A

Almost certainly. Right, I will pass you the floor. Have at it.

B

All right, let me see if I can work this thing. I'm going to click the share screen button, because I'm hoping we're going to be able to play a little video. Oh, thanks then.

B

I believe you can. This talk is called What I Learnt Fixing 50+ Broken Kubernetes Clusters, so this is a little bit of a knowledge share from my series, Klustered, which I run on my rawkode.live YouTube channel, where we break clusters and then try to fix them live in front of an audience.

B

Now, a little bit about me. I'm a senior developer advocate focused on cloud native and Kubernetes, and I work for a company called Equinix Metal. I am also a CNCF Ambassador and an InfluxAce. I am the host of the official Kubernetes Office Hours, co-chair of Cloud Native TV, as well as host of LGTM, as Matt said, and my YouTube channel that I devote far, far too much time to is available at rawkode.live.

B

Now Klustered is really good fun. We are, I think, 25 to 28 episodes in right now, depending on whether you count teams, solos and the newcomer edition, and the general idea is really, really simple. I reach out to friends within the Kubernetes community. These are Kubernetes contributors and end users, and I give them a freshly baked Kubernetes cluster and tell them to break it, whichever way they can.

B

They don't give me or the other members of the Klustered episode any information on what has been broken. We go live, we share our screen and we try to work through all of the broken bits, identifying symptoms and looking for cause and effect to see if we can get that cluster back online. On paper, it's really simple: all you have to do is upgrade the Klustered pod from image v1 to image v2.

B

Of course, one of the things I've realized during the 20-plus episodes is that people are particularly mean. Very, very mean. So I've got a little video to start. It's just a couple of minutes long and I want to give you a taste of what Klustered is. Hopefully you can hear the sound.

B

Hopefully it's going to play. Yeah, we still have no cluster DNS. kube-dns has endpoints. Ah, okay, that's after eight. Yeah, let's see if Guy wants to come in and give us a last bit of advice before we wrap this up. Are we close?

D

You've not looked at the configuration of CoreDNS. You've looked at the pods, but not how the pods are configured to run the DNS.

B

We have no DNS. Keeps coming. All right, did we miss something?

B

Let's... I'm going to jump onto cluster 14 and grab the ConfigMap and see what's different.

B

Yeah, this is the exact same file. So he told us to look in the CoreDNS config, and then there was nothing wrong with the CoreDNS config. That's just cruel. He's saying there's a few pixels.

B

If this is a whitespace error in the CoreDNS config, I will be mighty frustrated.

C

It might not be a whitespace error. Damn.

B

He's mocking us now, because he said our jumping back and forward showed a slight difference. This number seems arbitrary. Oh.

B

I'm assuming there's a weird bug, and I'm sure he's found it, because he works at Skyscanner and they've got a level of scale that most people don't have. But something to do with this number going so high is maybe causing it to load an old ConfigMap in the pods or something. I don't really know.

B

But the number being so high worries me. We're now at the stage where I have to go and pick up my daughter, so I'm just going to bring Guy in to tell us what wonderful magic

C

what we should do.

B

This is... yeah. All right, Guy, lay it on us.

D

So when you were jumping back and forth between the working config and the non-working config, if you look really hard at the c in the kubernetes config... Yeah, it's not a c. No, it's a character which looks a lot like a c. So the kubernetes plugin became authoritative for "letter that looks like c" followed by luster.local.

B

All right, so that was particularly mean. That was a Unicode issue in the CoreDNS configuration. Let me... yeah, well, there we go. So that was the first time I ever changed the rules on Klustered, and Unicode changes are now banned.
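The look-alike character described above is hard to spot by eye but easy to spot mechanically. A minimal sketch in Python, assuming the config has already been dumped to text; the function name `find_non_ascii` is hypothetical, and the Cyrillic letter mentioned in the usage note is only an example, since the talk does not say which character was actually used:

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, code point, Unicode name) for every non-ASCII character."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # unicodedata.name() raises for unnamed characters unless a default is given.
                name = unicodedata.name(ch, "UNKNOWN")
                findings.append((line_no, col, f"U+{ord(ch):04X}", name))
    return findings
```

Fed a Corefile in which the c of cluster.local has been swapped for the Cyrillic letter U+0441, this reports the line and column of the substitution along with the character's Unicode name. A clean, ASCII-only config returns an empty list.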

B

What I love about that clip and that episode specifically (excuse my language in it, of course) is that Guy had me questioning primitives, complete basics.

B

Why was I questioning the resource version going above an i16 and trying to overflow back to older generations? Of course that would just not happen, because it's obviously going to be an i32 as a minimum. But you get in this position when you're debugging, and I think that's really important: debugging these things is really difficult, it's going to challenge all sorts of assumptions, and you're going to make wild guesses

B

that will be wildly inaccurate. But getting things wrong is normal. If we look at Albert Einstein here: failure is success in progress. It is through all these mistakes that we learn the things that we need to learn.

B

So I'm going to take you on my path of failure from Klustered: all the assumptions that I made, the things that I got wrong and the things that I learned. The first thing, and probably the most prolific attack surface that people use on Klustered to break these clusters, is the Linux system itself. Now this is just a really short video, just to demonstrate.

D

Let's see some nodes, man. Let's see... permission denied.

B

Let's check ed you've hit your first snag. There we go.

D

Okay, make it +x. Yeah, I need my autocomplete stuff. I'm lost. You didn't install fish while you're here too? I can, I can.

B

Okay, it's an interesting start. Yeah, do an ls -la on /usr/bin/chmod, or probably just /bin/chmod.

B

Oh father, you don't even have tab completion.

D

Yeah, it's killing me but hey, we'll make it work.

B

If you don't, I think that's okay.

D

Yeah, exactly. So just look at the whole /bin. Is there anything which is executable at all?

D

ls is, obviously.

B

All right, so take a look at the screenshot. This is the teams edition. This was Red Hat versus Talos Systems, and Jiffy on the Red Hat team and his other colleagues removed the executable permission from chmod. Not only that, all the files and binaries that we see in white here also had the executable permission removed.

B

That's chattr, chmod, kubeadm, kubectl, oc (which they were using as an alias to kubectl), scp so they couldn't pull in other files, and even Perl. What I love about that is that it is really, really simple, a mistake that we could all make easily, and not something that everyone is fully aware of how to fix. So kubectl and kubeadm, you know, you

B

need those to be able to debug the cluster. Removing chmod and chattr is just plain cruel, and even removing the executable attribute from Perl. Perl allows you to actually execute syscalls on files really easily. That command is a lifesaver. You should keep a note of it.

B

However, what we learned in that episode is that you can actually use the dynamic linker directly to execute binaries on the machine that don't have the executable bit set. In this case, we're using the dynamic linker to re-enable the executable bit on the chmod binary itself, with the chmod binary. I think that's a wonderful tip and a great debugging thing. Something else I learned during Klustered is Linux file attributes. I'm familiar with chmod, the 777s, the 604s, etc., but there are extended attributes on all files.

B

In fact, you can make files immutable. I can't remember if it was Dan Finneran or Jason DeTiberus, but one or both of them at some point made the etcd write-ahead log immutable, breaking etcd. So these things are really, really cool, but not if you don't know they exist.
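The dynamic linker trick from the chmod story can be sketched as follows. This is a minimal Python illustration, not what was typed on stream: the loader path is an assumption (the usual location on x86-64 glibc systems, different on other architectures and libcs), the helper names are hypothetical, and the trick only applies to dynamically linked binaries.

```python
import os
import stat

# Assumption: x86-64 glibc loader path. Check the real one on the node first.
LOADER = "/lib64/ld-linux-x86-64.so.2"


def is_executable(path):
    """Return True if any execute bit (user, group or other) is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))


def loader_command(binary, *args, loader=LOADER):
    """Build an argv that runs `binary` through the dynamic linker.

    The loader is itself executable and maps the target binary into memory,
    so the target does not need its own execute bit.
    """
    return [loader, binary, *args]


def repair_chmod_command(chmod_path="/bin/chmod"):
    """Build the argv that uses chmod, via the loader, to restore chmod's own +x."""
    return loader_command(chmod_path, "+x", chmod_path)
```

On a broken node the resulting argv would be passed to something like `subprocess.run`, or simply typed at the shell as the loader path followed by `/bin/chmod +x /bin/chmod`. Immutable files are a separate problem: the attribute is set and cleared with `chattr +i` and `chattr -i`, and ordinary permission changes do not remove it.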

B

The next biggest attack surface we see on Klustered is networking. Beyond Linux, people always like to try and mess with the way the systems within the Kubernetes cluster communicate with one another, and what we've discovered with networking is that there is life after iptables.

B

You may be familiar with ebtables and nftables. Something I wasn't aware of until very recently, on an episode of Klustered, is the concept of traffic control: a tc command for manipulating the packets on the device through quality of service rules. And, of course, we've got eBPF and XDP, which are being leveraged heavily by the Cilium project.

B

Now what I didn't know about iptables before Klustered is that you can actually apply drop rules using the statistic module to apply a randomization effect to the dropped packets, causing what appear to be intermittent errors at the iptables level. Particularly sneaky. Another common attack surface we've seen is people changing the DNS policy on the pods.
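A random drop rule of the kind described above typically combines `-m statistic --mode random --probability <p>` with `-j DROP`. The following Python sketch scans rules in `iptables-save` format for that combination. The function name and the sample rules in the usage note are hypothetical. Filtering on the DROP target matters because kube-proxy in iptables mode uses the same statistic match legitimately to spread traffic across service endpoints.

```python
import shlex


def find_random_drops(rules_text):
    """Return (chain, probability) for each appended rule that drops packets at random."""
    hits = []
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        # Only "-A <chain> ..." lines are rules; skip table headers, policies and comments.
        if len(tokens) < 2 or tokens[0] != "-A":
            continue
        pairs = list(zip(tokens, tokens[1:]))
        uses_statistic = ("-m", "statistic") in pairs
        drops = ("-j", "DROP") in pairs
        if uses_statistic and drops:
            probability = None
            if "--probability" in tokens[:-1]:
                probability = float(tokens[tokens.index("--probability") + 1])
            hits.append((tokens[1], probability))
    return hits
```

Given a dump containing `-A INPUT -p tcp --dport 6443 -m statistic --mode random --probability 0.30000000000 -j DROP`, the function reports the INPUT chain with a probability of 0.3, while ordinary DROP rules and kube-proxy's load-balancing rules are ignored.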

B

There is a DNS policy in Kubernetes called Default. It is by no means the default. It does not use cluster DNS whatsoever, and in fact ClusterFirst is the default DNS policy. So when you look at a pod spec and you see dnsPolicy: Default,

B

you probably think that's all right, but it's a warning sign that you probably want to fix something. Something else we've seen is network policies with Cilium. Cilium can use standard Kubernetes network policies and apply those rules for you directly, but Cilium has extended that API with its own Cilium network policies as well, which add an extra layer of abstraction and obfuscation to what's actually happening within your network. Cilium also has cluster-wide network policies that will modify the traffic shaping across all the nodes in your cluster.

B

All of these things are very tricky and delicate and need to be handled with care.
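Going back to the DNS policy point, a quick way to find affected pods is to filter the pod list on `spec.dnsPolicy`. A minimal Python sketch, assuming the input is the parsed JSON from `kubectl get pods --all-namespaces -o json`; the function name is hypothetical:

```python
def pods_with_node_dns(pod_list):
    """Return "namespace/name" for every pod whose dnsPolicy is "Default".

    With dnsPolicy "Default" a pod inherits the node's resolver configuration
    instead of using cluster DNS, so in-cluster service names will not resolve.
    """
    flagged = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        spec = pod.get("spec", {})
        if spec.get("dnsPolicy") == "Default":
            flagged.append(f'{meta.get("namespace", "default")}/{meta.get("name", "?")}')
    return flagged
```

Not every hit is sabotage: some system pods are deliberately configured this way (the DNS server pods themselves commonly are), so the output is a list of things to look at rather than a list of faults.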

B

There are some tips for working with networking. I'm not going to stand here and say everyone should use Cilium. Of course, you've got your own needs that have to be met. But Cilium is a wonderful CNI implementation and it ships with something called Hubble. Hubble gives you a visualization and user interface into all of the network policies across your cluster and service communication, showing you in real time packets that are being dropped or accepted, allowing you to trace all these requests through your system.

B

It is a super powerful tool and I encourage everyone to check it out. Even if you're not using Hubble, Cilium provides an editor for modifying and working with these network policies as well. You can go to editor.cilium.io and start to build through a visual interface, or drop the YAML into the box, to visualize the network policies that you need in your system. This is a really quick way to bring in those network policies, and I've seen time and time again

B

that teams are reluctant to do this because it's a dangerous and daunting task. Nobody wants to bring down their cluster networking.

B

Now next is etcd. This is the biggest, scariest one that we ever see. I don't think anyone who has ever appeared on Klustered has been happy when etcd is unhappy. Nobody is actually that familiar with debugging etcd itself, and nor should you be, but there are a few things you need to be aware of. Number one: we've seen many people on Klustered attack etcd by modifying the quota, filling disks or writing arbitrary junk data into etcd itself.

B

All of these end up with an etcd alarm. Even when you fix the problem by cleaning up the space, removing the junk keys or increasing the size of the quota, that alarm stays in place, even if you restart etcd and all sorts of things. It was only through trial and error and pain that we realized you actually need to run alarm disarm for etcd to become healthy and happy once again.
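The alarm step can be sketched as a small helper around `etcdctl alarm list` and `etcdctl alarm disarm`. This is an illustrative Python sketch only: the endpoint and certificate paths are assumptions (they match a typical kubeadm layout and will differ elsewhere), the function names are hypothetical, and the parser assumes `alarm list` prints entries containing `alarm:<NAME>`.

```python
import re


def etcdctl(*args, endpoint="https://127.0.0.1:2379", pki="/etc/kubernetes/pki/etcd"):
    """Build an etcdctl argv. Endpoint and PKI paths are assumed defaults."""
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={pki}/ca.crt",
        f"--cert={pki}/server.crt",
        f"--key={pki}/server.key",
        *args,
    ]


def parse_alarms(alarm_list_output):
    """Extract alarm names (for example NOSPACE) from `etcdctl alarm list` output."""
    return re.findall(r"alarm:(\w+)", alarm_list_output)


def recovery_plan(alarm_list_output):
    """Return the commands still to run once the underlying cause has been fixed."""
    if not parse_alarms(alarm_list_output):
        return []
    # Disarm, then list again to confirm that no alarms remain.
    return [etcdctl("alarm", "disarm"), etcdctl("alarm", "list")]
```

Disarming only clears the flag. If the space problem has not actually been resolved first, etcd is likely to raise the alarm again.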

B

Another thing we've seen very recently, this was just last week on Klustered, is that Marko, a member of the Kubernetes Office Hours, decided to enable encryption very nicely on a Klustered cluster.

B

However, he only partially encrypted the secrets and config maps within the system. Oh dear. What this meant is that we couldn't pull or query any default-namespace config maps or secrets. There's a really cool trick to working with the encryption system in Kubernetes, and that is to add multiple keys and multiple providers. In fact, the way that you turn on encryption is to enable the identity provider, which is the default unencrypted provider,

B

add in your key, and then literally get all the secrets in all namespaces as JSON and do a kubectl replace to re-encrypt the values as they go in. Encryption at rest in etcd and Kubernetes is applied when modifying or writing to etcd. The way to fix the partially encrypted problem is just to add both providers: add a key and an identity.

B

In fact, the way that you do key rotation in a Kubernetes configuration is also to add multiple keys and then remove the old key after you've done that get and replace.
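The "both providers" fix can be sketched as a generated EncryptionConfiguration. This Python example is illustrative rather than the configuration used on the episode: the choice of the `aescbc` provider, the key name and the helper names are assumptions. It emits JSON, which is also valid YAML, so the output can be used as the file that the API server's `--encryption-provider-config` flag points at.

```python
import base64
import json
import os


def new_key(name):
    """Create a named key entry holding 32 random bytes, base64 encoded."""
    return {"name": name, "secret": base64.b64encode(os.urandom(32)).decode("ascii")}


def encryption_config(keys, resources=("secrets", "configmaps")):
    """Build an EncryptionConfiguration with an encrypting provider plus identity.

    The first provider is used for writes; every provider is tried for reads.
    Keeping identity last lets the API server still read objects that were
    stored unencrypted, which is what rescues a partially encrypted cluster.
    """
    return {
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "EncryptionConfiguration",
        "resources": [
            {
                "resources": list(resources),
                "providers": [
                    {"aescbc": {"keys": list(keys)}},
                    {"identity": {}},
                ],
            }
        ],
    }


def render(config):
    """Serialize the configuration as indented JSON."""
    return json.dumps(config, indent=2)
```

For rotation, the new key goes first in the `keys` list and the old key stays behind it until every object has been rewritten, for example with `kubectl get secrets --all-namespaces -o json | kubectl replace -f -`. Only then is the old key removed.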

B

All right, so where next? That's a really good question, but there's something I want to address first. There are a lot of places that you can attack a Kubernetes cluster, and I've built this word map just to show roughly what's going on. We've seen people attack the CRI and the CSI. We've seen people attack the controller managers, even to the extent of recompiling the controller manager and publishing their own image, or recompiling a kubelet and publishing their own image, making things immutable, applying policies via Kyverno, OPA, etc.

B

People get really, really creative. The surface that you need to know in order to be successful in running Kubernetes is ever expanding and dangerous.

B

And this isn't really a talk about failures specifically. I don't know if you've been paying attention to the quotes as we move through, but on all of these failure things, where I'm talking about the next area we want to address, there's a quote that tells you that failure is just a part of the knowledge cycle. The way that we learn is by dealing with these problems, and I think that is crucial.

B

So what I'm going to encourage everyone to do is to remove hero culture. We don't need hero developers. What we need is the confidence and the ability to say "I don't know" within our teams, within our organization and, if you're bold enough, in public. I get in front of an audience every week with Klustered and spend most of my time going "I have no idea what I'm doing", and I think I'm pretty good at this. We have a duty and an honor to set a precedent for new people entering this industry.

B

The hero developer is no longer a thing. So my journey next is taking me towards eBPF. I have noticed through all of these Klustered episodes that eBPF probably has the answer to every problem that we've ever dealt with, and I'm going to use one slide just to try and show you what I mean by this. eBPF is really performant, as the bytecode is compiled in the kernel, and it exposes tracing probes, allowing us to understand what is actually happening at the kernel level.

B

There are some really great projects in this space for Kubernetes. Cilium we've already covered. Falco by Sysdig is amazing for getting into the audit log and the events happening within the Kubernetes system. We have Inspektor Gadget by the Kinvolk team, and Pixie by Pixie Labs offers completely autonomous, uninstrumented observability into your Kubernetes cluster. You should definitely check it out.

B

Inspektor Gadget exposes all of the snoops that I've listed here in a Kubernetes context. execsnoop is a BPF program that will tell you every time a new executable is run on a machine.

B

iosnoop and opensnoop tell you when files are opened, written to or read from, and then there are all the TCP snoops as well, which are going to give you visibility into all the packets within your system. eBPF is a superpower and it's what I'm really keen and excited to be learning next. If you want to learn more yourself, there are some links here. I'll publish a link to the slides on the Slack channel momentarily, but check out the BCC examples at github.com.

B

And what comes next, I'm not entirely sure yet, but I'm looking forward to failing a lot more. Thank you for your time.

C

Hi David, thank you, that was excellent. I do wonder how you put yourself through live fixing of Kubernetes just continuously. It looks like you have fun doing it.

B

It is a whole lot of fun once you get over the awkward bit: oh, I'm just going to be completely confused, I'm going to have no idea what I'm doing. But it's through collaboration and pairing. I'm not doing it alone. I have a guest with me. We have teams. We talk about the symptoms. We try and understand the problems, and it's through those conversations that we are actually able to share so much knowledge with the broader Kubernetes community.

B

So it can be painful at times, but it's one of the most rewarding things. I've ever done.

C

Fantastic. Well, we're running slightly over, so I will say thank you very much. If there are any questions and you're hanging out in the Slack for a bit, then folks can hopefully post some questions into Slack and you can take a look. So we are taking a quick break: time for folks to grab coffees or water.

C

Get yourselves ready for our next session, we'll be back in around about five minutes time. Thanks everybody.
