Cloud Native Computing Foundation Online Programs, 2 Feb 2023

From YouTube: Cloud Custodian - Proactive Governance of Your Cloud, Cluster, and Code

Description

Don't miss out! Join us at our upcoming event: KubeCon + CloudNativeCon Europe in Amsterdam, The Netherlands from 18 - 21 April, 2023. Learn more at https://kubecon.io The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.

A

Hello, everyone, and welcome to our webinar. Today we'll be discussing an introduction to Cloud Custodian: simple rules for your cloud, cluster, and Terraform.

B

Hey everyone, I'm Sonny Shi. I'm a staff engineer at Stacklet and a maintainer of Cloud Custodian.

A

And I'm Jorge Castro, community manager at Cloud Custodian.

A

So what exactly is Cloud Custodian?

A

Cloud Custodian is a YAML DSL policy engine for the cloud. It scales up from the startup level of just having a few resources, and it is used at massive enterprise scale in production by many large organizations. The intent is to drive behavioral change and tighter feedback loops for your developers. But what does that actually mean? I'll give you my plain explanation.

A

We can go back one. My plain explanation is that you write your policies in YAML, and Cloud Custodian is a rules engine that runs in the cloud's control plane and ensures that the policies you write get enforced on your cloud. A typical example we like to use is making sure that we're not opening a database and leaving it on the internet.

A

So you have rules that manage all of your resources, and typically you check these into Git or some other kind of version control. Cloud Custodian then ensures that those policies are enforced on your cloud. If you have a policy saying you're not supposed to serve open databases on the internet, then we make sure of that.

A

The analogy we like to use is a seat belt for your cloud resources: if you accidentally do a thing manually, you have automation in place to keep you safe. Next, as cost has become more important for people over the past 18 months, using these policies is also a great way to ensure that you're managing cost in your cloud. People are using Cloud Custodian not just to ensure that they're meeting compliance needs.

A

It also answers questions like: do you have a bunch of unused EBS snapshots somewhere, or resources that might not be tied to the specific account you were expecting, or things like that? By defining all of these rules, you can manage your entire cloud deployment more smartly, and Cloud Custodian can garbage collect the things that aren't supposed to be there. That's where the custodian analogy comes from.

A

You define what resources are supposed to be in a certain place and what their limits are, and Custodian enforces that for you. This is especially useful for cost, because resources that you are not tracking tend to pile up. For a lot of organizations, that garbage collection ends up being a significant cost saving, by ensuring that what they think is running in their cloud is what's actually running.

A

And, of course, compliance. One of the great things about this tool is that you can capture your compliance rules as policies, and by version controlling them and using them in CI/CD, Custodian enables a GitOps workflow that lets you manage all of that in a tight feedback loop, because it does real-time compliance checking of these rules.

A

So if today I were to try to deploy something into one of our resources and it violated a policy, Custodian, if you set it up that way, can remediate immediately and notify me: hey, there was a resource that I asked for that isn't getting made, for these reasons. What we are trying to do, as we alluded to earlier, is drive that behavioral feedback: okay, so where do I fix this?

A

If I tried to set up a thing that wasn't compliant, where's the actual issue that I need to fix? Does it need to be in my Terraform? Or how can I enable my developers, instead of having them run into these guardrails?

A

The idea is to allow them to have that self-service, in order to help change organizational behavior to be more compliant. Next, correctness. It's inefficient to set up a bunch of stuff, then find out that some of it is noncompliant and have to tear it back down. That costs time.

A

That costs resources, developer time especially. So the mantra behind a tool like this is to enable that tight feedback loop, driven entirely by the existing automation that you have. That's what we're going to talk about here today, specifically around Kubernetes clusters and around your Terraform.

B

Yeah, so what does all this look like exactly? You start out with a policy. The first thing you do is specify a name for your policy, and you also have to select a resource. In this case, we're looking at S3 buckets in AWS.

B

Then you can define any number of filters to apply to those resources. In this case, we're saying we want to find any buckets that have HeadBucket and GetObject actions that allow the account listed here to access them. Then you can specify what actions you want to run. In this case, we're saying we want to notify the resource owner and also send a Slack message using a certain policy template.

B

That way, you can send these notifications directly to the people who are violating your policies, instead of having to keep a list, track people down, and pass around a CSV or something to your engineering teams.
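The policy shape described above (a name, a resource type, filters, and actions) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cloud Custodian's implementation: the policy name, the filter shape, the action names, and the bucket fields are all hypothetical and chosen only for illustration.

```python
# Toy model of a policy: a name, a resource type, filters, and actions.
# The filter shape and action names below are hypothetical.
policy = {
    "name": "s3-open-buckets",                      # hypothetical policy name
    "resource": "aws.s3",
    "filters": [{"key": "public", "value": True}],  # hypothetical filter shape
    "actions": ["notify-owner", "notify-slack"],    # hypothetical action names
}


def matches(resource, filters):
    """A resource matches when every filter's key equals its value."""
    return all(resource.get(f["key"]) == f["value"] for f in filters)


def run_policy(policy, resources):
    """Return (resource name, action) pairs for every matching resource."""
    hits = [r for r in resources if matches(r, policy["filters"])]
    return [(r["name"], action) for r in hits for action in policy["actions"]]


# Hypothetical inventory standing in for what the cloud API would return.
buckets = [
    {"name": "logs", "public": False, "owner": "team-a"},
    {"name": "assets", "public": True, "owner": "team-b"},
]

planned = run_policy(policy, buckets)
print(planned)
```

The point of the sketch is the separation of concerns: filters select resources, and actions are applied only to what the filters selected.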

B

To do all this, you just run the custodian run command, where you pass in the name of the file and give it an output directory, and then you'll start to see your policies running. Here's another example policy. In this case, we're looking for IAM roles that are overprovisioned. You can see that we also support nots, ands, and ors, so any sort of Boolean expression that you want to have.

B

Here we're saying to ignore any roles that are named IAM provisioner, and to check the permissions for any roles that have this IAM ChangePassword action inside the role itself. Again, we want to notify on that. In this case, instead of sending it to the resource owner, we're sending it to the security email distro and copying the cloud team as well.
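The nested not/and/or filters described here can be illustrated with a small recursive evaluator. This is a minimal sketch, assuming a made-up expression format; the leaf filters `name` and `has_action`, the role name spelling `iam-provisioner`, and the sample roles are hypothetical, and Cloud Custodian's own filter syntax differs.

```python
def evaluate(expr, resource):
    """Recursively evaluate a nested and/or/not filter expression."""
    if "and" in expr:
        return all(evaluate(e, resource) for e in expr["and"])
    if "or" in expr:
        return any(evaluate(e, resource) for e in expr["or"])
    if "not" in expr:
        # "not" holds a list and is true when none of its children match.
        return not any(evaluate(e, resource) for e in expr["not"])
    if "has_action" in expr:  # hypothetical leaf: the role grants this action
        return expr["has_action"] in resource.get("actions", [])
    if "name" in expr:        # hypothetical leaf: exact role-name match
        return resource.get("name") == expr["name"]
    raise ValueError(f"unknown filter: {expr}")


# Ignore the provisioner role, then flag roles that can change passwords.
overprovisioned = {
    "and": [
        {"not": [{"name": "iam-provisioner"}]},
        {"has_action": "iam:ChangePassword"},
    ]
}

roles = [
    {"name": "iam-provisioner", "actions": ["iam:ChangePassword"]},
    {"name": "ci-runner", "actions": ["iam:ChangePassword", "s3:GetObject"]},
    {"name": "reader", "actions": ["s3:GetObject"]},
]

flagged = [r["name"] for r in roles if evaluate(overprovisioned, r)]
print(flagged)
```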

B

So finally, Custodian policies can be run in two different types of modes. There's a pull mode, where you are querying the cloud itself or the cluster directly. In this case, every single time you want to check those overprovisioned IAM roles, you're checking everything that's currently out in the cloud. There are also event-based modes, which utilize things like CloudWatch event triggers, CloudTrail, and Config on the AWS side, and we have equivalents for those in Azure and GCP.

B

These modes allow you to trigger off of events that happen in your cloud, as well as in your cluster. That way you can be much more reactive, and do things like remove any noncompliant resources that are net new, instead of having to wait for the resource to exist in the cloud for a while and then take some sort of action, which can lead to situations where you potentially take down live running services.
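The difference between the two modes can be sketched as follows. This is a simplified illustration, not how Custodian wires up its execution modes: the rule, the resource fields, and the event shape are assumptions. Pull mode walks the whole inventory, while event mode evaluates only the single resource carried by an event, which is what makes it possible to act on a resource while it is still new.

```python
def is_noncompliant(resource):
    """Hypothetical rule: a database must not be open to the internet."""
    return resource["type"] == "database" and resource["public"]


def pull_mode(inventory):
    """Pull mode: scan every resource that currently exists."""
    return [r["id"] for r in inventory if is_noncompliant(r)]


def event_mode(event):
    """Event mode: evaluate only the resource carried by one event."""
    return "remove" if is_noncompliant(event["resource"]) else "keep"


# Hypothetical inventory and creation event.
inventory = [
    {"id": "db-1", "type": "database", "public": False},
    {"id": "db-2", "type": "database", "public": True},
]
create_event = {
    "kind": "create",
    "resource": {"id": "db-3", "type": "database", "public": True},
}

print(pull_mode(inventory))
print(event_mode(create_event))
```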

B

Cloud Custodian on Kubernetes has support for those two modes, the first of which is pull mode. You can query your cluster with the same policy language as your cloud. Basically, this means that if you're familiar with running Custodian policies for AWS, Azure, or GCP, you'll feel right at home. In addition, there's a k8s admission mode, where you run Custodian policies in an admission controller to allow, deny, or warn on any sort of object lifecycle event.
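To make allow, deny, and warn concrete, here is a minimal sketch of how an admission webhook might turn a policy decision into a Kubernetes AdmissionReview response. The `allowed`, `warnings`, and `status` fields are part of the `admission.k8s.io/v1` response format; the `build_response` helper, the decision strings, and the sample message are hypothetical and do not show Custodian's own code.

```python
def build_response(uid, decision, message=""):
    """Build an AdmissionReview response for 'allow', 'warn', or 'deny'."""
    response = {"uid": uid, "allowed": decision != "deny"}
    if decision == "warn":
        # Warnings are shown to the client, but the request still succeeds.
        response["warnings"] = [message]
    elif decision == "deny":
        response["status"] = {"code": 403, "message": message}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }


warned = build_response("1234", "warn", "missing recommended labels")
denied = build_response("5678", "deny", "requires at least 3 replicas")
print(warned["response"])
print(denied["response"])
```

A warn decision is what lets teams ease into a policy: the object is admitted, and the person applying it sees the message.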

B

It's easiest to deploy in your cluster with a Helm chart, and you can also do things like auto-label objects as they come into the cluster to determine resource ownership, for example. Finally, we have Terraform support as well, so you can govern more than the infrastructure that's already out there.

B

You can also use Custodian to govern your infrastructure as code. This allows your developers to know ahead of time that the things they're deploying are not going to be compliant, or not going to be in line with the guardrails that you've set. This way they can make those changes early on and not have to deal with the headache of going back and potentially having to do things like stop a database, schedule downtime, and recreate it.

B

In addition, c7n-left will also annotate these policy violations inline, which is really nice: you see the exact thing that you have to change according to the policy itself, and it makes it a lot easier for developers to do the right thing.

B

So we'll go off and do a quick demo. The first thing we'll start with is a Kubernetes pull mode example. On the left of my screen, I'm just running the Kubernetes admission controller, which we'll get to in a second. But first, let's run the policy for Kubernetes. Like I said, this is pull mode, so this is pulling directly from my cluster. Let's take a look at the resources.json that comes back.

B

This is basically all of the information that you would expect to see if you did a kubectl describe pod, and you get it for every single pod. It's a great way to see the attributes that you can filter on, for example. Now let's take a look at the event-based modes. The first thing we'll do is take a look at the policies that we have here.

B

The policies here are just in a ConfigMap that we've deployed to our Kubernetes cluster, and you can see we have a few. The first one denies pod exec based on the pod. We have another policy checking for missing recommended labels, another one restricting service account usage on pods, and one last one requiring at least three replicas on any Kubernetes deployment that we have. The first thing we'll do is try to create a pod.

B

Let's take a look at our pod manifest right here. If we try to deploy it, you can see we get a warning about missing recommended labels, saying that all pods must have foo and bar labels. You can see that in our manifest here we only have the foo one.

B

If we take a look at the pod that we created, not only do you see that we only have this foo equals bar label, but we actually used the policy itself to append the owner contact label here. We can see that the Kubernetes admin was the one that created the resource, and we also have this additional message that we appended as a label, saying it's missing labels. So let's delete our pod there.

B

And then let's go ahead and add our bar label.

B

You can see we don't get any warnings. The pod was created successfully.
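The label check and auto-labeling shown in this part of the demo can be sketched as a small function. This is an illustration only, assuming hypothetical label keys (`owner-contact`, `policy-note`) and a hypothetical warning text; the actual policy in the demo is a Custodian YAML policy that is not reproduced in the transcript.

```python
REQUIRED_LABELS = {"foo", "bar"}


def review_pod(pod, requested_by):
    """Return (labels to set, warnings) for a pod creation request."""
    labels = dict(pod.get("metadata", {}).get("labels", {}))
    missing = sorted(REQUIRED_LABELS - set(labels))
    warnings = []
    if missing:
        warnings.append("missing recommended labels: " + ", ".join(missing))
        labels["policy-note"] = "missing-labels"  # hypothetical label key
    # Record who created the object so ownership can be traced later.
    labels["owner-contact"] = requested_by        # hypothetical label key
    return labels, warnings


pod = {"metadata": {"name": "demo", "labels": {"foo": "bar"}}}
labels, warnings = review_pod(pod, "kubernetes-admin")
print(labels, warnings)

# After adding the bar label, the same check produces no warnings.
pod["metadata"]["labels"]["bar"] = "baz"
fixed_labels, fixed_warnings = review_pod(pod, "kubernetes-admin")
print(fixed_labels, fixed_warnings)
```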

B

This is a great way to ease developers into doing the right thing before you impose a hard restriction. The next thing we'll do is... actually, let's keep that pod up.

B

The next thing we'll do is try to exec into that pod.

B

If we run kubectl exec, we see that we actually get an error saying that it failed due to these policies, which say you can't connect to any pods with database in the name or in the namespace c7n-system. This is really great because it gives you more fine-grained control over some of the actions that developers can take against resources. The next thing we'll do is check out what happens if we try to create a pod with a more restricted service account.
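The exec restriction described here boils down to a simple predicate. The following is a sketch of that logic only, assuming the namespace is spelled `c7n-system` (the transcript only gives it as spoken) and using a hypothetical helper name; it is not the policy used in the demo.

```python
PROTECTED_NAMESPACE = "c7n-system"  # spelling assumed from the spoken name


def exec_allowed(pod_name, namespace):
    """Deny exec into pods named like databases or in the protected namespace."""
    if "database" in pod_name:
        return False
    if namespace == PROTECTED_NAMESPACE:
        return False
    return True


print(exec_allowed("web-frontend", "default"))
print(exec_allowed("orders-database-0", "default"))
print(exec_allowed("controller", "c7n-system"))
```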

B

First, we'll create this service account here, called cluster-admin, and then try to apply the pod with the service account. Actually, let's take a look at what that looks like first. The main thing here is that we're using the service account called cluster-admin, which I'm sure you can assume has all sorts of permissions that you don't want everybody to use.

B

If we try to apply that, so we apply the pod with the service account, you can see that again we get this restriction, saying you can't use that service account. Finally, we had that policy restricting deployments, saying you have to have at least three replicas on your deployment. If we take a look at our deployment demo, we see that this one has three, so this should work just fine. But let's go ahead and drop that down to two.

B

If we run kubectl apply on deployment.yaml, you can see that it failed admission due to the policy requiring at least three replicas. So let's go back in and change that two into a three.

B

You can see that our deployment was able to go through just fine. Again, all the stuff you see on here is basically what you would see in your logs for your deployment when you deploy this on your cluster. It'll match against only the events that you actually care about from your policies.
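The replica rule from the demo can be expressed as a short check on a Deployment manifest. This is a sketch assuming a hypothetical `admit_deployment` helper and message text; the only Kubernetes facts it relies on are that the count lives at `spec.replicas` and defaults to 1 when omitted.

```python
MIN_REPLICAS = 3


def admit_deployment(manifest):
    """Return (allowed, message) for a Deployment manifest given as a dict."""
    # Kubernetes treats a missing spec.replicas as 1.
    replicas = manifest.get("spec", {}).get("replicas", 1)
    if replicas < MIN_REPLICAS:
        return False, f"requires at least {MIN_REPLICAS} replicas, got {replicas}"
    return True, ""


deployment = {"kind": "Deployment", "spec": {"replicas": 2}}
print(admit_deployment(deployment))

deployment["spec"]["replicas"] = 3
print(admit_deployment(deployment))
```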

B

The next thing we'll take a look at is our c7n-left demo. c7n-left is a separate CLI from Custodian. It has one command, so we'll just look at the help here. There's a c7n-left run command.

A

Sorry, can you increase the font size by one on this one?

B

Sure. Yeah, so if we go to c7n-left run --help, basically what you do is pass in a policy directory, which holds your Custodian policies, as well as a directory for your actual Terraform. If we take a look at the policies, you can see we have a policy here that says all resources should be tagged, and specifically they need to have this environment tag. Then we have one saying that all SQS queues must be encrypted.

B

If we run c7n-left run and give it our policies directory as well as our current Terraform directory, we can see that we failed two of these policies: the first one saying that SQS must be encrypted, and the second one saying all resources should be tagged. If you look at our main.tf here, this first one is an SQS queue that we just have here.

B

It's not in a module; it's directly in the main Terraform. If we add our tags here, like so, that should fix the first one. Then you can see that in the second one we're actually using a remote module. Rather than only being able to test the Terraform that you have directly inside your local Terraform workspace, it can actually look up the module references as well.

B

Here our problem was that we had managed SSE enabled set to false. If we set that to true, that should fix it. If we run c7n-left again, you can see that we have passed all of our policy checks. You can also look up the summary based on the resources. In this case we have some IAM documents and those paths, as well as our SQS queues.
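The two checks from this demo (required tag, encrypted queue) can be sketched against resources represented as plain dictionaries. This is not how c7n-left parses or evaluates Terraform; the resource dictionaries, the tag key `Environment`, and the failure messages are assumptions. `sqs_managed_sse_enabled` is the Terraform `aws_sqs_queue` attribute assumed to correspond to the "managed SSE enabled" setting mentioned in the talk.

```python
REQUIRED_TAG = "Environment"  # tag key assumed; the talk only says "environment tag"


def check_resource(resource):
    """Return the list of policy failures for one Terraform resource dict."""
    failures = []
    if REQUIRED_TAG not in resource.get("tags", {}):
        failures.append("all resources should be tagged")
    if resource["type"] == "aws_sqs_queue":
        if not resource.get("sqs_managed_sse_enabled", False):
            failures.append("SQS queues must be encrypted")
    return failures


# Hypothetical resource as it might look before the fixes in the demo.
queue = {"type": "aws_sqs_queue", "name": "demo", "sqs_managed_sse_enabled": False}
print(check_resource(queue))

# The same resource after adding the tag and enabling managed SSE.
fixed = {**queue, "tags": {"Environment": "dev"}, "sqs_managed_sse_enabled": True}
print(check_resource(fixed))
```

Running checks like these before apply is what gives developers the early feedback described earlier, instead of discovering the violation after the resource exists.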

B

So those are the demos and I'll go back to the slides here.

A

And that's basically a tour of Custodian on a cluster and on infrastructure as code. If you're interested in this, you'll find us at KubeCon + CloudNativeCon Europe in Amsterdam, coming up. We don't have any information now, but we're hoping to also have a maintainer session, if you're interested in contributing and checking out all the cool stuff that an open source project has to offer. And with that, Sonny, thank you very much, and thanks everyone for listening. Feel free to join us at cloudcustodian.io. Thank you.

B

Thanks everyone.