A
Hello, and welcome to "Kubernetes Misconfigurations: What They Are and How to Identify Them" with Fairwinds. If you're here for a different talk, that's not what we're doing! This is what we're doing. I'm excited.
B
We were joking about this when we were getting started, so: good morning, good afternoon, good evening, wherever you are. Thank you for joining this webinar.
B
A
Let's dive in. Before we begin, let's introduce each other. Robert, why don't you introduce yourself?
C
A
Great. And you also have your hands in the open source things as well, which we're going to be talking a lot about today. Ivan?
C
B
A
All right, and my name is Kendall Miller. I am technically an evangelist here at Fairwinds, and I'm excited to be with you all. We're going to be diving into a number of different things related to misconfigurations, but first, let me tell you why we're the ones talking about it. This is not a hard Fairwinds pitch, but to give you background on why we're here talking about this: Fairwinds has been in the Kubernetes space for a long time. We've been in business over seven years now. Seven and a half-ish. Wow.
A
It's been a long time, and we've been working with Kubernetes for almost the entirety of that time, helping organizations use Kubernetes correctly and get over the hurdles of using what is, for most people, a new paradigm that's complicated to learn. Today we develop open source software, as well as a SaaS product, that specifically addresses misconfigurations. Organizations out there are worried: when I move to Kubernetes, am I going to mess everything up? How do I do this correctly? It's a new paradigm. I know how to think about the old paradigms.
A
How do I get used to this new one? That's what Fairwinds does. We build open source in that space, and we're going to be talking a lot about some of that open source today. We also build a SaaS platform that addresses this. So misconfigurations are literally the water we swim in; it's what we do all day long at Fairwinds, and that's why we're addressing this topic today for y'all. So: proper configuration counts. Let's dive into some of the "why" about this, first of all.
A
What's different? Well, before we do that, let's talk about some of the common misconfigurations that we see. One of the first things here: only 35% of organizations have correctly configured 90% of their workloads with liveness and readiness probes. Now, we can dig into this specifically in just a second, but I want to begin with the fact that huge percentages of organizations leave things off, partly because Kubernetes is just a fundamentally different paradigm. Right? Liveness and readiness probes in Kubernetes are a different way of thinking about things than previously.
A
You know, there was a time in your career when you could tell a machine went down because it was in your closet, or in the server room in the back, or something. You might have had liveness and readiness probes for that kind of thing too, but it's a whole new paradigm. So let's start with what's fundamentally different about Kubernetes, and then let's get into some of the specifics of this particular common misconfiguration. Ivan, you want to kick us off? Why is Kubernetes so fundamentally different?
B
I think what relates to this is that with Kubernetes, things are typically a lot more ephemeral than your server-room, back-of-the-closet, under-the-desk scenario. We've always had some kind of monitoring, some kind of thing that in the Kubernetes world we're calling liveness and readiness probes. What I think is the appreciable difference here is that containers are coming and going and scaling and moving around a lot more than applications used to, and that exacerbates the problem and makes the probes much more important.
A
And in the Kubernetes world, aren't we declaring the state that we want, in a way that's different from previous models of actually writing code to go create the state that we want? I mean, isn't that one of the big fundamental shifts here: we're just defining what the end state is and letting Kubernetes figure out all the configuration to get us there, rather than having to say, "hey, go out and build me a box, build me a box"?
A
B
We are. And while it happens often very quickly, and it feels very similar to that scripted or configuration-managed way of implementing something that was more imperative, it is the declarative control loop in Kubernetes that keeps things as close to the desired state as it's possible for Kubernetes to do. That's what helps things heal when something breaks in your app, or you lose a Kubernetes node, et cetera. Kubernetes sees that there's a difference between desired and actual state, and works to fix it.
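That desired-state idea is easy to see in a manifest. Here's a minimal sketch (the name and image are placeholders): you declare that three replicas should exist, and the control loop keeps reconciling reality toward that.

```yaml
# Hypothetical Deployment manifest: declares the end state ("3 replicas of
# my-app:1.0 should exist"), and Kubernetes continuously reconciles actual
# state toward it, e.g. by replacing pods lost when a node dies.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-app:1.0
```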
A
Great. So with that in mind: I like to use the analogy that moving to Kubernetes is like moving from Windows to Linux. It's not that Linux is really, really hard to use. It's that if you've never used Linux, it's a whole new paradigm. Once you get used to it, you will adjust, and it will start to feel normal; but if all you've ever used is this other thing, the new thing feels very different. That's part of why people are so afraid of misconfigurations. They just know:
A
"I'm going to mess things up, so how do I avoid that?" So now, Ivan, you touched on this almost in passing, but let's talk specifically about the liveness and readiness probes. How do they relate to what you were just talking about in the new world of Kubernetes?
B
Yeah, so liveness and readiness probes are two kinds of connections that Kubernetes can make to your application. It uses them to restart containers or pods that have hung or stopped responding, or, in the case of readiness, to decide whether traffic should be sent to a pod at all.
B
So when we're defining these, we also need something for Kubernetes to talk to on the application side. If your application doesn't listen on some HTTP port, for example, you can also exec a command in the container. But one important aspect of this is that whatever you are querying for these probes, you want it to be relatively efficient. These get queried pretty often, potentially once a second for a single pod, and you're probably running (hopefully running) more than one pod for your application.
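As a concrete sketch of what Ivan describes, here's a container spec fragment with both probes defined. The `/healthz` and `/ready` paths are placeholders; your application's endpoints will differ.

```yaml
# Hypothetical container spec fragment; endpoint paths are illustrative.
containers:
  - name: my-app
    image: my-app:1.0
    ports:
      - containerPort: 8080
    livenessProbe:            # failing repeatedly => restart the container
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10       # queried often, so keep the handler cheap
    readinessProbe:           # failing => stop routing Service traffic here
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
```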
B
A
Yeah, so this falls into the "this is a new kind of paradigm" bucket. This is one of those configurations that people do get wrong. Correctly configured, you are going to have liveness and readiness probes on more than 90% of your workloads.
A
There may be a situation where you don't care, but "correctly configured" by our definition here means the vast majority of your workloads should probably have liveness and readiness probes. If the workload matters, you want to know that it's running. Anything I'm missing there, Robert, before I move on?
C
No, I would just say the big thing to keep in mind is that the reason most organizations, or a lot of organizations, have not configured a lot of these is that it is optional. So, you know, your development teams will be able to successfully...
C
...you know, for some value of "successful," be able to successfully deploy a workload without these things set. But that workload is likely to experience downtime if they're not set; it's likely to have issues that don't get caught. So you really do need some kind of proactive approach to check for these things and make sure every individual team is doing it.
A
Yeah. So, I mean, we have a couple more slides like this, and we're going to get into some more specifics here in a little bit, but just to give you a feeling: the reason we have these statistics in here is to show you that organizations struggle with this. Only 42% of organizations today manage to lock down most of their workloads, and 54% are leaving over half of their workloads open to privilege escalation, and thus security holes.
A
It's these kinds of statistics that show us that people struggle with configuration in Kubernetes, because it's so different. And if you're using Kubernetes, it's vital for your organization to get it right.
A
I feel like that sounds obvious, but, you know, it's the same way I've seen people move to the cloud and never implement things like autoscaling, when the whole promise of the cloud is that you can scale up and scale down. If you're using the cloud and you're not using something simple like autoscaling (okay, "simple" is hand-wavy; it can be complicated), you're not using one of the greatest promises of the cloud. You're getting it wrong, or you're...
A
...you know, you're messing up. It's probably similar when you're using Kubernetes. It just is different, because it's cloud native, at scale, in whatever cloud you're running it in, and it's very easy to mess up these common things. Kubernetes is a different paradigm, but you can get it right, and we're going to show you tools to help you get it right. Okay, I think I've belabored that point long enough.
A
Okay, here we go: security. There are a number of common misconfigurations that we see in security. Let's talk about over-permissioned containers, and then we'll dig into some broader things. We'll go deep on one specific issue in each of these areas to start, and then talk about the broader common kinds of things that we see. Robert, you want to start with over-permissioned containers? What is an over-permissioned container?
A
What is the problem with that? And then go from there.
C
Yeah. So again, the defaults for Kubernetes are not always the most secure way to run a container. There are a lot of things that Kubernetes will allow you to do by default that you don't necessarily need to do. For instance, running a container as root: Kubernetes by default will allow a container that runs as root, but you can specify in your configuration...
C
..."I never want this container to run as root," and that's a great way to tighten the security of that workload, because for the most part it probably doesn't need to run as root, unless it's doing something very specific or it's designed in a very specific way to need root access.
C
Most likely you can run your application perfectly fine without root. This goes for several other configuration options that are available in the Kubernetes security context: whether that container runs as privileged, what capabilities have been added to that container, and so on. These are all things that, for a workload that is misbehaving or that gets compromised by an attacker...
C
...these permissions could be used to escalate: to get access to the underlying node, to get extra permissions on that node, and potentially, you know, spread the attack throughout the cluster instead of being restricted to that one single container.
C
So it's super important to tighten the security of a workload as much as possible: to make sure it's adhering to the principle of least privilege, and that it doesn't have permissions to do things it doesn't need to do.
A
Yeah. I mean, I've said before in other webinars, I feel like the average response from a person who's not tuned into this is to think, "nobody's going to break out of a container and get access to other things," right? It just sounds far-fetched. Except, in the world we live in, we know of lots of people who make a career out of breaking out of containers. That is a thing. People say, "hey, I escaped a container in this situation."
A
"I escaped a container in that situation." One of the ones that I saw was, I think, "I escaped a container running on a cluster in a mainframe" or something. You know, that one was more just for street cred, because it's not something we're going to run into a whole lot in regular life. Ivan, anything to add specifically on over-permissioned containers?
B
Just a bit of an underscore; Robert covered it all. But this is harder, I think, to put time and effort into implementing correctly than, for example, the thing we just talked about, readiness and liveness probes. If you want to limit the Linux capabilities, which essentially govern what kernel calls it is possible for your container to make, it takes effort to minimize those and then run your application through its typical QA testing...
B
...if you want to make sure that all the things your app needs to do are still possible for it to do while it's running. So this is an easy one to ignore, because it's hard to do and you've got to set that time aside. It's a bigger gap, for sure, that we see. But when you take the time to do the first few, like "don't run as root" and "have your file system be read-only in your container," those types of things, that's an awesome start.
B
Please, please start there, and then move on to the other stuff when you can make time for it.
A
Well, so let's spend a minute talking about other security issues that we commonly see in Kubernetes. This one's not unique to Kubernetes, but it's something people forget about: there's some amount of "I can deploy a workload with a known vulnerability."
A
"It's in a container. It's going to be fine; it's not going to have access to anything." You know, I think of the famous Log4j example from recently. First of all, you want to stop known CVEs that are running in your workloads or in your containers from being deployed into a production environment. You want to stop that from happening, period.
A
If it does happen, and you've at least kept your container pretty locked down, that limits some of the attack radius it can have, right? So when we think about security, there are the big-picture things, and you work down to the granular level. Everything in security and operations is a trade-off, but there are huge trade-offs to be made, huge mistakes to be made. This may be more complicated to do, but it really limits the blast radius of an attack if you do it well.
C
A
Just broadly, talk about some of the common security issues that we see. Over-permissioned containers we've mentioned, and I just mentioned deploying known CVEs or vulnerabilities into your cluster. What are some other common misconfigurations that we see regularly?
B
They're probably still broadly in this space, but: allowing access to things like hostPath mounts from your container. Now your container is mounting a directory that's on the node itself, and depending on what you allow access to, that can be a risk.
B
Similarly, access to the host network. It's pretty common that we see containers being allowed to access the host network instead of the isolated network space in the container. That means everything the host can see on the network, traffic-wise, that container can see, which can be handy for certain things (that we won't mention) that need to run that way.
B
But it also is a risk. Similarly, host IPC, having access to that inter-process communication, or host PID as well, means the container can see processes that are running on the host, not limited to the processes that run in the container. So even if the container is not running as root, that gives you some visibility into what else is running on the host, which is, you know, intel for an attacker, for example.
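The host-level settings Ivan lists all live in the pod spec. A sketch of what to flag in review (these default to off and should usually stay off; values below are shown set to the risky state):

```yaml
# Hypothetical pod spec showing the risky host-access settings to watch for.
spec:
  hostNetwork: true      # shares the node's network namespace
  hostPID: true          # can see every process on the node
  hostIPC: true          # shares the node's IPC namespace
  containers:
    - name: my-app
      image: my-app:1.0
      volumeMounts:
        - name: node-dir
          mountPath: /host
  volumes:
    - name: node-dir
      hostPath:
        path: /etc       # mounting node directories is a common escape vector
```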
C
Yeah, Ivan brings up a really good point there. The reason these security options are available is that sometimes they are necessary. If you're running a workload that does, say, network telemetry, it probably does need access to the host network to be able to do its job. That's an example where you would want to create an exemption for some of these rules and say: okay, this particular workload does get access to this particular security feature.
C
The issue is that the vast majority of workloads, especially the ones you're building internally at your company, don't need access to these things. They're probably just API servers, or they're serving a website, something like that. They don't need deep telemetry into the network operations going on inside the cluster. So I think it's important to note that sometimes these options are appropriate, but only in very rare and isolated cases.
A
Anything else to add, just in the broad bucket of security misconfigurations that we see commonly? Robert, what are the other ones that come to mind?
C
The other broad category that we haven't really mentioned is at the control plane level. If you're managing your own control plane, meaning you're not on something like EKS or GKE, that would include making sure it's not a publicly available control plane, so that anybody on the internet can't just log in and start messing with your Kubernetes cluster. You know, making sure that it's using SSL, that etcd is encrypted; there's a whole bunch of stuff.
C
C
So if you can, go with one of those providers; if you're managing it yourself, there's a lot you need to do to make sure you're doing it right. Also, kind of analogous to the control plane, a lot of topics are around RBAC, role-based access control: making sure that you're using the principle of least privilege to ensure that different personas at your company have the right level of access to your cluster.
C
You know, maybe the SREs need to be able to delete and modify things on the fly, but the developers only need read-only access, that kind of thing. Also, the individual workloads in your cluster, the service accounts that are doing automated operations: those too should adhere to the principle of least privilege. They should only get permissions to do the things they need to do in order to get their job done.
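A sketch of Robert's read-only-developer example as a namespaced Role and RoleBinding; the namespace, group name, and resource list are placeholders:

```yaml
# Hypothetical read-only access for a "developers" group in one namespace.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: my-app
  name: read-only
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "pods/log", "deployments", "services"]
    verbs: ["get", "list", "watch"]    # no create/update/delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: my-app
  name: developers-read-only
subjects:
  - kind: Group
    name: developers                   # placeholder group name
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: read-only
  apiGroup: rbac.authorization.k8s.io
```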
A
And it's worth making a plug there for one of our open source projects. It's called RBAC Manager; RBAC, the role-based access control Robert just mentioned, is what it manages. Part of the reason that project exists is that we see people struggle with RBAC more than we would like them to, so we've tried to ease some of that process.
A
There are a lot of ways to manage RBAC, but one of them is our open source project, so check out RBAC Manager. If that's something you're not feeling confident in today, it will help you make it a little bit easier. Ivan, anything to add before I move on?
A
Okay, so I want to wrap up security with this thought. How do I explain it? Reliability costs you something immediately: "oh, this is slow," "this isn't up," "a user tried to get on my website and it cost me." You know, think of how much money it has to cost Amazon if they have two minutes of downtime, right?
A
That's why you basically never, ever see Amazon's website down: it's probably millions of dollars for a minute, if not more. Your website is similarly affected by reliability misconfigurations, which we're going to get into in just a second. But security misconfigurations you can get away with, until you can't. It's free until it's so expensive it puts you out of business. It can do damage to your brand, to your user base, to everything. And so having a correct security posture...
A
I guess anybody who's been in this space knows this, but I feel like it's worth reiterating: it matters to get this right. In fact, security is the number one reason we see people interact with our software, both the open source and the SaaS, because they want a security posture in Kubernetes that they have confidence in. So it's not a minimal thing. Okay, now we're going to do something similar for reliability. Let's talk about health probes. Ivan, you can start, and we'll talk about...
A
...you know: what are health probes, what's the problem they're trying to solve, and what's the impact of getting them wrong? Then we'll go broader on some of the other reliability issues we see people struggle with.
B
Yeah. So I won't re-explain these probes, because we actually talked about them earlier in the overall overview of these topics, but I'll say a little bit more about the "what happens if you get it wrong" part. If you don't have readiness and liveness probes defined for your application, then Kubernetes doesn't have any way of knowing what the health of your pods and containers is, and the impact of that (apologies for the fire engine...)
B
(...in the background): the impact is that if you have a container that hangs, becomes unhealthy, or stalls in some way, Kubernetes won't know that it can, or should, restart that container. Without a probe, Kubernetes' definition of a healthy container is that the process you ran to start the container is still running, and as we know, that process could be quote-unquote "running" but deadlocked for some reason. And the readiness probes: without those, Kubernetes won't know whether it should send traffic to that pod.
B
So if you have a Service defined in Kubernetes, that could either be used internally, for other applications to talk to your application, like a microservice architecture, or be a Service that is leveraged by your ingress controller or directly attached to a load balancer. You're getting traffic in to your application through that Service, and without a readiness probe, if you have that similarly unhealthy or unresponsive pod, Kubernetes will continue to send traffic to it. Now that means your users and your customers are accessing an application pod that isn't able to do work.
B
A
C
C
A good symptom that you've got an issue here, either the probes haven't been set or they're misconfigured, is if you see a big spike in 500 errors, or some other type of error message in your logs, every time you deploy. There's a good chance you're seeing this kind of downtime as a new pod spins up and starts getting served traffic before it's actually ready for it. So keep an eye out for those kinds of spikes in errors every time you deploy.
A
Yeah. I'm trying to think of an analogy there. You know, basically, think of a new person coming into your organization: on day one you hand them a mountain of paperwork and ask them to file your company's taxes. There's no way they have the context to be able to do that.
A
They're not going to do it well, and they're going to stare at you with a very blank face. In the same way, a workload is going to respond with "not yet, not yet, not yet," except it's not smart enough to say "not yet," so it just says no. Anyway: what are some other common reliability issues that we see? I mean, "yes, I need to build my cluster so it's secure"...
A
...I just covered that. But building a cluster so that it's going to be reliable and up: what are the common misconfigurations that we see there? What do people often get wrong?
C
Another one might be the presence of a horizontal pod autoscaler, or a pod disruption budget, for each of your workloads. With the horizontal pod autoscaler, for instance, you can tell Kubernetes when to scale your workload up, when to scale it down, and what maximum level of concurrency you want, like how many pods you want at once.
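A minimal sketch of the horizontal pod autoscaler Robert mentions; the target name and thresholds are placeholders:

```yaml
# Hypothetical HPA: scale my-app between 2 and 10 replicas based on CPU usage.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10                    # cap on concurrency
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 75     # scale up past 75% of the CPU request
```

Note that the utilization target is computed against the pods' CPU requests, which is one more reason to set requests deliberately.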
C
A
Yeah. To add to that very simply: there's a big difference between how much traffic you serve as an e-commerce website on a Tuesday in February and what you're serving on Black Friday. If you don't have the ability to scale, if you're trying to serve ten thousand times the customers with the same amount of computing power, you're not going to have a reliable experience that is good for your users. Ivan, anything to add to that?
C
A
Any other reliability issues that come to mind, right off the top of your head? I mean, some of these bleed into one another, and we're going to talk about cost in a second: some ways to get cost right, and some of the effects of getting cost, and the settings around cost, wrong.
B
Yeah, let's move on to cost, because I think there's some good natural overlap coming up here.
A
Okay: cost, and inappropriate resource requests and limits. Ivan, I know you were worried that we had these separated and you weren't going to be able to talk about them separately, so...
A
...I'll let you talk about resource requests and limits and how they affect both your cost and your reliability. Dive in.
B
Yeah, not really worried, but definitely I think a lot of these categories, of course, relate to each other. Like you said about...
B
C
B
Like you said about efficiency and reliability and how they overlap, and security too; so does this. As far as cost goes, we've got two big key things here, around resource requests and resource limits, and the cost, no pun intended, of getting these incorrect is that you end up having a noisy-neighbor problem, among other things, in your Kubernetes cluster. If you've got over-provisioning of your nodes, so nodes that are too large or too many nodes...
B
...then you've got cost overruns happening. If you have under-provisioning, on the other side of that coin, then you've got instability. Instability is its own thing, but there's also a cost to instability, which is downtime and impact to your business and your data and your customers. So developers, or whoever is deploying your apps, need to be specifying CPU and memory requests and limits. Requests are essentially...
B
...what you think your app is going to need, as a baseline, and then limits are how much your application should use as a maximum, a cap of sorts. There's a lot of technical detail, which I think we'll avoid for now, about what happens when you reach those. But this relates to all kinds of other functions in Kubernetes: depending on how you have these requests and limits set, they get used for scaling new nodes into your cluster and for putting the workloads on the correct nodes.
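The requests/limits split Ivan describes looks like this in a container spec; the numbers are placeholders to be tuned against observed usage:

```yaml
# Hypothetical container spec fragment; values are illustrative only.
containers:
  - name: my-app
    image: my-app:1.0
    resources:
      requests:          # the baseline: used for scheduling and node scaling
        cpu: 250m        # a quarter of a CPU core
        memory: 256Mi
      limits:            # the cap: CPU beyond this is throttled,
        cpu: "1"         # memory beyond this gets the container OOM-killed
        memory: 512Mi
```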
A
Yeah, so these do bleed together. I just think the reason we have cost problems around inappropriate resource requests and limits is tied to reliability. I'm an engineer deploying my workload; it works on my machine. I literally click the Apple in the top corner, see how big my machine is, and make sure I provision a container that's the same size. That's probably overkill for what I'm doing in that container. "Probably" in huge, huge quotes; you never know.
A
I can write a really inefficient workload if I want to, with memory leaks everywhere, but, you know, as a developer I just want that thing to work.
A
The business wants to make sure that workload is up, but also that it's not costing a fortune. Even understanding quality-of-service levels is really important here, and it's easy to mess those things up. But yeah, Robert, anything to add?
C
No, just that, like you said, there's a natural tension here between the folks who are charged with making sure this application is up and running all the time, namely the developers, who are going to want to over-provision and say "give me all the CPUs, give me all the memory," versus finance and ops, who are ultimately responsible for that AWS budget that's getting apportioned to these containers, sometimes incorrectly.
C
C
So you can come to the table and say: this workload has never used more than half a CPU at any given time, and we've got four CPUs provisioned; we definitely need to take that down to at least one CPU. That would still be way over-provisioned, but it would save us 75 percent of our bill. Having data for those discussions is super, super helpful.
A
And it's one thing when your organization is very small and you have one or two pods running. It's another thing when your organization is very large and you have thousands of pods running, and all of them are even a little bit over-provisioned; that spirals out of control really, really quickly. So, okay: we've given you lots of examples of ways to mess things up, and now we want to talk a little bit about open source tooling to actually identify those misconfigurations.
A
We play a bunch in this space. We're going to talk about some Fairwinds open source tools, and then about a few other open source tools that are not from Fairwinds, and go from there. So let's dive in first with Polaris. A bit of backstory for Polaris: today, Fairwinds is a software company.
A
A
You find something that looks familiar, and you're going to make the same mistakes over and over again; every single person making that transition is going to struggle with some of the same things, and that's part of why we see these same issues. But Robert, give a little bit more of an overview of Polaris than I'm giving right now.
C
Yeah. Polaris checks for pretty much everything we've talked about today, all the misconfigurations, from missing CPU limits, to security context issues, to liveness and readiness probes. We have built-in checks for all of that. I think the really important thing to note about Polaris is that it's not just going to look inside your cluster and tell you "here are all the things you're doing wrong." It can be implemented not just as a dashboard...
C
...looking at what's inside your cluster already; it can also be implemented as an admission controller, so it can block things from getting into your cluster if they don't meet a certain level of configuration. And it can also run in CI/CD, so it can look at your infrastructure-as-code changes as somebody's making a PR and say: "hey, you added this new deployment that doesn't have a liveness probe specified; I'm going to block you from merging this PR..."
C
"...until you specify that liveness probe," or until you add CPU requests and limits, things like that. The fact that it can run the same checks in all three contexts, you know, infrastructure as code, admission control, and inside of a live cluster, makes it a really powerful tool.
A
And you can also write custom checks using a JSON schema, which is important because there are other tools out there that do custom policy enforcement (we're going to talk about OPA in a second), but some people struggle with Rego, the language you need for OPA. So if using a JSON schema you're already familiar with is an easier way to approach that, Polaris makes it easy to implement custom checks that way.
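A custom check of the kind just described is written as JSON Schema embedded in Polaris's YAML config. This is a sketch: the check id and registry pattern are made up for illustration, so check the Polaris docs for the exact shape your version expects.

```yaml
# Illustrative Polaris custom check: require images from one registry.
checks:
  imageFromApprovedRegistry: danger
customChecks:
  imageFromApprovedRegistry:
    successMessage: Image comes from the approved registry
    failureMessage: Image must come from registry.example.com
    category: Images
    target: Container
    schema:
      '$schema': http://json-schema.org/draft-07/schema
      type: object
      properties:
        image:
          type: string
          pattern: ^registry\.example\.com/
```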
And this is what the Polaris dashboard looks like.
A
So it gives you an overview of the cluster and gives you a grade, a health score. It shows everything it's checking, what's passing and what's failing. This is great to deploy across one or two clusters; it's really difficult to deploy across an organization, check everything, and make sure it's all implemented the right way across lots and lots of clusters. If you want to operationalize any of these tools at scale, check out Fairwinds Insights, our SaaS platform. I'll probably mention it a few more times.
A
But what Insights does is add a few proprietary things above and beyond this to add value, and it also makes it really easy to operationalize at scale: write the policy once, enforce it across all the clusters in your organization. So if you want something like that, check it out. Next is Goldilocks. Ivan, do you want to give a high-level overview of Goldilocks?
B
Sure. So the theme of Goldilocks is getting your resource requests and limits just right. What it does is watch your workloads running in your cluster and then make a recommendation for what you should be setting those to, and we have some extensions beyond that in our other products as well.
B
That helps you in scenarios where you have very spiky workloads at certain times, sort of like Kendall's Black Friday example from earlier. Goldilocks is awesome because it helps you avoid having to dig into a monitoring dashboard, hunt for the peaks in the graph, and do the guesswork of "what should I be setting my requests to, now that I know they're important and that there's a cost to not setting them?" How do you figure out what those numbers should be? Goldilocks helps you do that.
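Per the Goldilocks docs, you opt workloads in per namespace with a label, and Goldilocks then surfaces recommendations for them. A minimal sketch (the namespace name is hypothetical; verify the label key against the version you deploy):

```yaml
# Opt a namespace in to Goldilocks recommendations via its enable label.
apiVersion: v1
kind: Namespace
metadata:
  name: my-app                    # hypothetical namespace
  labels:
    goldilocks.fairwinds.com/enabled: "true"
```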
A
Yeah, it's called Goldilocks so you can get it just right. And just to add on, since we do need to speed up the wrap-up to finish on time: everyone struggles with resource requests and limits. You tell an engineer to get those right, and they don't have any clue how. Goldilocks makes it easy.
C
Yeah, so this is actually our newest project. It's a really cool way to validate whether or not you're ready to upgrade to a new version of a Helm chart. Helm charts will often make breaking changes from one version to another; for instance cert-manager, a very popular one for managing certificates. You'll have to update some CRDs and the way you're doing things in order to be compliant with the new version.
A
Great. And let's talk about a few third-party tools here: Trivy, OPA, and kube-bench. Robert, do you want to give the quick rundown of those?
C
Sure. Trivy can look inside of containers and find any known vulnerabilities in them by cross-checking against a very large database of known vulnerabilities, so it's a very powerful tool for container scanning. OPA is next in line here. OPA allows you to implement custom checks, so it's similar to Polaris but even a little bit more powerful.
C
It's really, if not quite Turing-complete, a full-fledged programming language for doing these kinds of checks, and it's great if you have very specific custom needs, like making sure every workload has a particular label set. Maybe you want a cost-center code label on everything; things that are very specific to your organization can be implemented as OPA checks.
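The cost-center example might be sketched in Rego roughly like this. The package name and input paths assume OPA running as a plain validating admission webhook; a Gatekeeper setup would shape this differently.

```rego
# Illustrative OPA admission policy: deny Deployments that are
# missing a cost-center label.
package kubernetes.admission

deny[msg] {
  input.request.kind.kind == "Deployment"
  not input.request.object.metadata.labels["cost-center"]
  msg := "Deployment is missing the required cost-center label"
}
```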
And then last we have kube-bench, which will look inside of a cluster and help you understand how well it conforms to the CIS Benchmark for Kubernetes, a set of guidelines for how to configure, particularly, the control plane of a Kubernetes cluster. So if you're managing your own control plane, kube-bench is a great way to understand how secure that configuration is and what you might need to do to really get that security profile tightened up.
A
Great. And finally, I do just want to give a quick plug. I mentioned Fairwinds Insights in passing, but this is for Kubernetes governance: putting guardrails around the ways people are deploying things into Kubernetes, from CI/CD through to production. Write policy once, enforce it everywhere. We cover security, cost optimization, policy, and guardrails. It includes Polaris, Goldilocks, Trivy, kube-bench, OPA, and a few more as well.
A
So if you need an all-in-one solution that's going to make it easy to operationalize policy enforcement in Kubernetes across your organization, check out Fairwinds Insights. And finally, we're going to wrap up with this: go check out the white paper we have covering these common Kubernetes misconfigurations, "Kubernetes: The Good, the Bad and the Misconfigured."
A
You can find it wherever this is published. So thanks so much for being with us. We're going to wrap up to hit that 40-minute mark, and we will hopefully see you another time. Thanks.