Cloud Native Computing Foundation KCD Sri Lanka 2022, 22 Dec 2022

From YouTube: Securing Kubernetes By Nilesh Jayanandana

Description

Sri Lanka has a growing group of Cloud Native enthusiasts, students, professionals, and technology leaders. KCD Sri Lanka offers a platform for this community to come together and connect with other tech communities in India and neighboring countries. It provides an opportunity to experience conferences like KubeCon / CloudNativeCon together with the rich cultural heritage of Sri Lanka.

A

Thank you, thanks Girat. So today I'll be talking to you about securing Kubernetes. In this session, I'll be talking from the point of view of an attacker: how an attacker might try to sabotage a cluster, and how we can prevent it.

A

So, on the first slide, I'm introducing you to the areas of attack when it comes to a Kubernetes cluster. At the bottom, you can see the infrastructure, or cloud, layer, which is basically where you have your data centers, your firewalls, your network, and your servers. In the next layer, we have our cluster, where we have RBAC, authentication, and authorization; then we have admission control and network policies.

A

On top of that, we have containers, where we have container sandboxing, image restriction, privilege escalation, supply chain, etc. And finally, we have developer discipline as the code base layer. So let's look at the first scenario.

A

The attacker has access to your network. So the attacker has somehow found a way into your network; now, how can you prevent that? Basically, if you're using cloud, you can use SSH key-based auth, and you can make your Kubernetes API private, so that these attackers or intruders can't discover your API and try attacking it. Then you can secure your network through a firewall, and you can have proper RBAC policies implemented in your cloud and also in your cluster.

A

The next scenario is: the attacker has access to your Kubernetes control plane. So now the attacker has somehow gotten through your network and has access to the infrastructure of your Kubernetes control plane. Before jumping into what we can do, let's go through a few slides on what Kubernetes RBAC is and who can do what. On the first slide, on the left, we have who can access. In Kubernetes we have two types: users and service accounts.

A

Users are for humans. A user is someone with a key and a certificate that is signed by the certificate authority of the cluster. Users are managed externally to Kubernetes; a user is not a resource in Kubernetes.

A

As I mentioned, it's handled externally. Then we have service accounts. A service account is for processes; it's not for human use. Rather, when a pod or some other resource in Kubernetes is trying to talk to the Kubernetes API, be that a worker node, a pod, a CRD, whatever, it needs a service account to access the API. So these are the users of the Kubernetes API. Then come permissions.

A

So, as you can see on the right-hand side, we have two kinds of permissions: the one in blue is called a Role and the one in orange is called a ClusterRole. As you can see, the Role is namespace-bound, so when you create a Role, it only affects one namespace, whereas a ClusterRole is global in Kubernetes: it applies across all the namespaces in the cluster. Then we need to bind this role to a user or a service account.

A

For that, we have something called a role binding. There are two kinds: one is the RoleBinding itself, and the other is the ClusterRoleBinding. A RoleBinding is also namespace-bound: when you have a user, you can bind a role to that user in a given namespace, and that role can be either a Role or a ClusterRole. But a ClusterRoleBinding is more advanced, and it carries a lot of privilege, because it isn't scoped to a namespace.

A

It's there for the entire cluster. So when you grant a permission to a certain person through a ClusterRoleBinding, that person has that permission across all the namespaces in the Kubernetes cluster, which is a bit dangerous. We'll look at an example on the next slide.

A

Here we have John, who can read secrets in the foo namespace. There's a Role called "read secret" in the foo namespace, and you can see on the slide that there's a RoleBinding of the "read secret" Role to John, which gives John access to read secrets in the foo namespace. That is all right. Then there's Jane. Jane needs access to read and write secrets in the foo namespace. However, "read write secret" is a ClusterRole we have created.

A

That was for some reason, and in this scenario we can still create a RoleBinding from that ClusterRole, "read write secret", to Jane, which gives her the correct access to the foo namespace only. But now look at the admin user. We have given the admin user the "read write secret" ClusterRole through a ClusterRoleBinding, which means, as you can see from the red dotted lines, that the admin user can read all the secrets across all the namespaces available in the cluster.
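
As a rough illustration of the John, Jane, and admin example above, here is a minimal Python sketch of how namespace-scoped and cluster-wide bindings differ. It is a toy model, not how the Kubernetes authorizer is implemented; the verb sets, the `Grant` class, the `can` helper, and the `bar` namespace are assumptions made for illustration, and `foo` is a guess at the namespace name on the slide.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Assumed verb sets for the "read secret" and "read write secret" roles.
READ = frozenset({"get", "list"})
READ_WRITE = READ | frozenset({"create", "update", "delete"})


@dataclass(frozen=True)
class Grant:
    """One binding of a subject to a set of verbs on a resource."""
    subject: str
    verbs: FrozenSet[str]
    resource: str
    namespace: Optional[str]  # None models a ClusterRoleBinding (every namespace)


GRANTS = [
    Grant("john", READ, "secrets", "foo"),        # Role + RoleBinding in foo
    Grant("jane", READ_WRITE, "secrets", "foo"),  # ClusterRole bound by a RoleBinding in foo
    Grant("admin", READ_WRITE, "secrets", None),  # ClusterRole bound by a ClusterRoleBinding
]


def can(subject: str, verb: str, resource: str, namespace: str) -> bool:
    """Return True if any grant lets the subject use the verb in that namespace."""
    return any(
        g.subject == subject
        and g.resource == resource
        and verb in g.verbs
        and g.namespace in (None, namespace)
        for g in GRANTS
    )


print(can("jane", "update", "secrets", "foo"))  # Jane's RoleBinding covers foo
print(can("jane", "update", "secrets", "bar"))  # but not the hypothetical bar namespace
print(can("admin", "get", "secrets", "bar"))    # the cluster-wide binding covers every namespace
```

The only difference between Jane and the admin user in this model is the `namespace` field of the binding, which is the point of the slide: the same ClusterRole is harmless or dangerous depending on how it is bound.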

A

So this can be pretty dangerous if you don't grant access properly. It's recommended that you use Roles and RoleBindings instead of ClusterRoleBindings. Now, hardening your Kubernetes cluster. We already spoke about RBAC policies and how to set them up. The next thing you've got to do is enable audit logging. When a disaster has happened, when someone has accessed your cluster, your best friend would be the logs.

A

So you need to enable audit logging to figure out who did what on your cluster, so that you can trace what has happened and then come to a conclusion. The next thing is you've got to run CIS benchmarks on your cluster. This is mostly applicable for clusters that you have built from scratch in your data centers, etc., because cloud-managed clusters such as GKE, AKS, and EKS already have CIS benchmark reports.

A

It's managed for you. But if you are managing your own clusters, if you are managing your own control planes, run the CIS benchmark and apply the recommendations it gives. The next one is applicable for clusters in the cloud, managed clusters, as well: use a CIS-hardened image.

A

By default, when you get a GKE, AKS, or EKS cluster, they provide you with a simple node: a node with a Linux runtime and the containerd runtime for containers. But you can have CIS-hardened node images, which have more seccomp profiles and AppArmor profiles baked deep into the node image itself, which would give you more security.

A

The next thing is to encrypt etcd. It doesn't matter what security tools you run; you might be running runtime security, sandboxing, everything. But if your etcd is not encrypted in the control plane, any attacker who has access to your control plane would be able to easily look at the key-value store that etcd provides, read from it, and get your secrets. The same goes for your secrets. The common recommendation, I mean, twelve-factor apps taught us to inject configuration as environment variables. That's not the case anymore.

A

Now you have to mount them as files, right? Attaching them as environment variables can leak your credentials to someone who has access to the VM. So don't do that: mount your variables as a file and then read from that file.
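
On the application side, reading a mounted secret can be as small as the Python sketch below. The mount directory `/var/run/secrets/app` and the file name `db-password` are hypothetical; they stand in for wherever your pod spec mounts the secret volume.

```python
from pathlib import Path

# Hypothetical mount point; match it to the volumeMount path in your pod spec.
DEFAULT_MOUNT_DIR = "/var/run/secrets/app"


def read_secret(name: str, mount_dir: str = DEFAULT_MOUNT_DIR) -> str:
    """Read one secret value from a file inside a mounted secret volume."""
    value = (Path(mount_dir) / name).read_text(encoding="utf-8")
    # Files written by hand often end with a newline; strip it.
    return value.strip()
```

A caller would use something like `read_secret("db-password")` at the point where the credential is needed, rather than copying it into a process-wide environment variable.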

A

Let's go to the next section: the attacker has access to a service account in your cluster. All right, so now the attacker has come in, has control plane access, and somehow has access to talk to your Kubernetes API with some permissions, right? So how do we stop that?

A

First of all, let's look at Network policies.

A

The concept of network policies is this: as you can see in the example here, we have a web server in green, a Python backend, and a database, and there's no need for the web server to talk to the database. Only the Python backend needs to talk to the database. In a separate namespace, we have something called a super important API. What network policies allow us to do is specifically say which pod can talk to which service.

A

So we can specifically say: the web server cannot talk to the database; the web server can only talk to the Python backend, and the Python backend can talk to the database. So if somebody has exploited the web server and has access to it, they won't be able to write directly to the database; they won't be able to access the database at all. And if somebody gets into the Python backend, then of course they have access to the database. But network policies still help:

A

They will still not be able to attack the super important API we have in a different namespace. So that's what network policies allow us to do. You have to use a CNI, a Container Network Interface plugin, on Kubernetes that supports these network policies. Network policies can work at many layers. Commonly it's layer 4 (TCP), but CNIs that support layer 7 network policies are out there as well, such as Cilium.
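
To make the web server, backend, and database example concrete, here is a minimal Python sketch that builds a standard `networking.k8s.io/v1` NetworkPolicy as a dictionary and prints it as JSON, which `kubectl apply -f` accepts as well as YAML. The namespace `shop`, the `app` labels, the policy name, and port 5432 are assumptions for illustration; the slide does not give them.

```python
import json


def allow_ingress_only_from(name, namespace, target_labels, client_labels, port):
    """Build a NetworkPolicy that lets only matching client pods reach the target pods."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Pods this policy protects.
            "podSelector": {"matchLabels": target_labels},
            "policyTypes": ["Ingress"],
            # The only allowed source: pods in the same namespace with these labels.
            "ingress": [
                {
                    "from": [{"podSelector": {"matchLabels": client_labels}}],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical names: only the Python backend may reach the database.
db_policy = allow_ingress_only_from(
    name="db-allow-backend",
    namespace="shop",
    target_labels={"app": "database"},
    client_labels={"app": "python-backend"},
    port=5432,
)
print(json.dumps(db_policy, indent=2))
```

Because the policy selects the database pods and lists a single allowed source, traffic from the web server pods is dropped once the policy is in place, provided the cluster's CNI plugin enforces network policies.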

A

So it's up to you to figure out which to use. Let's move on to the next slide. As I said, the first thing is you've got to use namespaces to isolate your tenants on the network, and then use network policies, which are the Kubernetes equivalent of firewalls.

A

Then it's an added advantage if you have an API gateway; you can even make the inter-service communication go through the API gateway. If you are using a service mesh, use mTLS. If you're using a service mesh that's powered by eBPF, network encryption etc. happens by default, so you can do that. That's great, because it means no one who has access to the infrastructure can snoop on your network and figure out what's going on.

A

Finally, there's something called admission controllers in Kubernetes. There's Open Policy Agent, Kyverno, etc., which allow you to write policies for your Kubernetes cluster, which I'll explain a bit more in a later slide. So now, the attacker has access to your code base.

A

Now, if an attacker has access to your code base, the attacker can do a lot of things. They can insert malware; they can insert vulnerabilities and exploit them later. They can do a lot. To avoid this, you can do a few things. The first one is, of course, static scanning in your CI pipeline. You can run SonarQube scans, etc., and figure out whether you are committing any sensitive information to Git, which is accessible by anyone.

A

That is, if it's public. Then you can do code vulnerability scanning, as I mentioned, and you have to keep scanning, right? It doesn't matter that you scanned today and it's all good and everything is green; there could be an exploit or a vulnerability discovered tomorrow. So you've got to scan on a daily routine, or on whatever schedule you prefer. Then comes image vulnerability scanning.

A

Here, our images might contain vulnerabilities. Again, you've got to scan at an interval. You can't just scan once and, if it's okay, forget about it; you've got to scan continuously. Exploiting a vulnerable image could lead to privilege escalation; someone could get remote shell access; your information could leak; someone could launch a DDoS attack from within the cluster. I personally use Clair, but people use Trivy and other tools.

A

You can use any of them for image vulnerability scanning. Then there's another one called configuration scanning. Configuration scanning is for when you are committing your YAMLs to your GitOps repository, etc. You can easily check them via Checkov or something else and see if you are missing anything: a security configuration or a policy. It gives you feedback so that you can fix them and secure your pipeline.
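
To show the idea behind configuration scanning, here is a toy Python check over a pod manifest that has already been parsed into a dictionary. It is not how Checkov or any other scanner works internally; the function name, the four rules, and the sample manifest are assumptions chosen to match the hardening advice in this talk, and it deliberately ignores pod-level settings and init containers.

```python
def scan_pod(manifest: dict) -> list:
    """Return human-readable findings for risky container settings."""
    findings = []
    for container in manifest.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        sc = container.get("securityContext", {})
        if sc.get("privileged", False):
            findings.append(f"{name}: runs privileged")
        # Kubernetes allows privilege escalation unless it is switched off.
        if sc.get("allowPrivilegeEscalation", True):
            findings.append(f"{name}: privilege escalation not disabled")
        if not sc.get("readOnlyRootFilesystem", False):
            findings.append(f"{name}: root file system is writable")
        if not sc.get("runAsNonRoot", False):
            findings.append(f"{name}: may run as root")
    return findings


# Hypothetical manifest with one hardened and one unhardened container.
sample = {
    "kind": "Pod",
    "spec": {
        "containers": [
            {
                "name": "good",
                "securityContext": {
                    "allowPrivilegeEscalation": False,
                    "readOnlyRootFilesystem": True,
                    "runAsNonRoot": True,
                },
            },
            {"name": "bad", "securityContext": {"privileged": True}},
        ]
    },
}

for finding in scan_pod(sample):
    print(finding)
```

Run in a CI step, a check like this fails the pipeline before a risky manifest reaches the cluster, which is the feedback loop described above.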

A

So let's talk about the Kubernetes admission controller, which I promised in the previous slide. Let's see what it does. In this scenario, there's a create-pod request by a user. When that request goes to the Kubernetes API via kubectl, what happens is: firstly, it figures out who you are (authentication). Then it goes to authorization: what can you do? Can you actually create a pod? Then, finally, there is something called admission control. In admission control there are many policies.

A

We can even write our own policies. In this scenario, what it's going to look at is whether the pod limit has been reached on the given namespace, that is, whether you are able to create a pod or not. Admission control has two types of plugin points: a validating webhook and a mutating webhook. A validating webhook is a read-only type: it scans a given request and just either allows it or denies it.

A

This is perfect for third-party policy controllers like Open Policy Agent. Here you can add any additional policies, like: don't pull images from public Docker registries; pull images only from your private Docker registry. You give these policies, and the request is automatically denied at the admission control level. You can supply that policy via a third-party policy controller, or you can write your own validating webhook for it as well. Then there's the mutating webhook.

A

That's a bit different: it changes the payload dynamically. This is mostly used with CRDs and the controllers and operators that work with those CRDs. Here you give a payload and, according to some logic, it changes the payload dynamically and applies it to Kubernetes as a different manifest. So that's what the Kubernetes admission controller does, and this allows you to write your policies and secure your cluster.
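
The two webhook types can be sketched as plain Python functions that take an `admission.k8s.io/v1` AdmissionReview request and return the response body. This is only the decision logic, assuming the request is for a Pod; a real webhook also has to be served over HTTPS and registered with the API server, which is left out. The registry prefix, the `team` label, and the `example_review` helper are hypothetical.

```python
import base64
import json

# Hypothetical private registry prefix.
ALLOWED_REGISTRY = "registry.example.internal/"


def _reply(uid, **fields):
    """Wrap response fields in an AdmissionReview envelope."""
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {"uid": uid, **fields},
    }


def validate(review: dict) -> dict:
    """Validating webhook: allow or deny, never modify."""
    req = review["request"]
    images = [c["image"] for c in req["object"]["spec"]["containers"]]
    bad = [i for i in images if not i.startswith(ALLOWED_REGISTRY)]
    if bad:
        message = "images not from the private registry: " + ", ".join(bad)
        return _reply(req["uid"], allowed=False, status={"message": message})
    return _reply(req["uid"], allowed=True)


def mutate(review: dict) -> dict:
    """Mutating webhook: add a default 'team' label if none is set."""
    req = review["request"]
    labels = req["object"].get("metadata", {}).get("labels")
    if labels is None:
        patch = [{"op": "add", "path": "/metadata/labels", "value": {"team": "unassigned"}}]
    elif "team" not in labels:
        patch = [{"op": "add", "path": "/metadata/labels/team", "value": "unassigned"}]
    else:
        return _reply(req["uid"], allowed=True)
    # The API server expects the JSON Patch base64-encoded.
    encoded = base64.b64encode(json.dumps(patch).encode("utf-8")).decode("ascii")
    return _reply(req["uid"], allowed=True, patchType="JSONPatch", patch=encoded)


def example_review(image, labels=None):
    """Build a minimal, hypothetical AdmissionReview request for a Pod."""
    metadata = {"name": "demo"}
    if labels is not None:
        metadata["labels"] = labels
    pod = {"metadata": metadata, "spec": {"containers": [{"name": "app", "image": image}]}}
    return {"request": {"uid": "demo-uid", "object": pod}}


print(validate(example_review("docker.io/library/nginx:latest"))["response"])
print(mutate(example_review(ALLOWED_REGISTRY + "app:1.0"))["response"])
```

The validating function only returns `allowed`, while the mutating function returns a patch that the API server applies before the object is stored, which mirrors the read-only versus payload-changing split described above.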

A

The next scenario is: the attacker has access to your container. Somehow they have exploited your container, and now they have shell access. Let's see what we can do. When it comes to container hardening, when you're building the container itself, there are some best practices that you've got to follow.

A

The first thing is: remove bash, so that they won't be able to access a shell remotely. Make the file system read-only, so that they won't be downloading anything or writing anything into your container's file system. And make sure the process runs as a non-root user. When these are done, it's hard for an attacker to attack and exploit a container. So basically, make the container immutable.

A

Then there are other things that we can do. There are images that we don't have access to change, right? There are scenarios where we don't build the image; somebody else built it and we only run it. In those scenarios we can use Kubernetes to enforce immutability without changing the image.

A

We can use this thing called a startup probe in Kubernetes, which lets you run a command in the container as it starts up, before it is considered started. Using that, we can remove bash if we want to. We can also set run-as-group and run-as-user, run as non-root, set the security context, and remove privilege escalation. We can do a lot of things. By using those, we should be able to make containers immutable at the Kubernetes level as well. Then comes runtime security.
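
Going back to the Kubernetes-level settings just listed, the sketch below adds them to an existing pod manifest without touching the image. The field names are the standard pod and container `securityContext` fields; the UID and GID of 10001, the pod name, and the image are hypothetical. The startup-probe trick is left out, because deleting bash needs a writable root file system, which would conflict with the read-only setting used here.

```python
import copy
import json


def harden(pod: dict, uid: int = 10001, gid: int = 10001) -> dict:
    """Return a copy of the pod manifest with basic immutability settings applied."""
    hardened = copy.deepcopy(pod)
    # Pod-level: run every container as a fixed non-root user and group.
    pod_sc = hardened["spec"].setdefault("securityContext", {})
    pod_sc.update({"runAsNonRoot": True, "runAsUser": uid, "runAsGroup": gid})
    # Container-level: no privilege escalation, no writes to the root file system.
    for container in hardened["spec"]["containers"]:
        sc = container.setdefault("securityContext", {})
        sc["allowPrivilegeEscalation"] = False
        sc["readOnlyRootFilesystem"] = True
    return hardened


# Hypothetical third-party image that we run but do not build.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "vendor-app"},
    "spec": {"containers": [{"name": "app", "image": "example.com/vendor/app:1.0"}]},
}

print(json.dumps(harden(pod), indent=2))
```

An application that needs scratch space under these settings would typically get an `emptyDir` volume mounted at the specific path it writes to, rather than a writable root file system.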

A

Doing all of that is sometimes not enough, and you need more. Let's look at how containers work and what this runtime security means. As you can see, in a VM, when a container is running, it's running on top of LXC, and then we have the Linux kernel below. So basically, a container is, you know:

A

A group of namespaces and cgroups. When containers are doing things, they make syscalls to the Linux kernel. That means, if there's a vulnerability in the Linux kernel, a container would be able to exploit it, if someone with the proper knowledge and tools is there.

A

So the first thing to do is, as I mentioned earlier, disable privilege escalation, drop all the capabilities of the container, and then add back only the ones that are needed, using Kubernetes constructs. And there are things like AppArmor and seccomp profiles, which restrict the syscalls that you can make to the Linux kernel, so that you don't make any weird ones and try to exploit it.
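
The "drop everything, add back only what is needed" pattern maps onto the container `securityContext` roughly as in this Python sketch. The `restrict` helper and the sample container are assumptions for illustration; `NET_BIND_SERVICE` is just one example of a capability a web server might need, and `RuntimeDefault` selects the container runtime's default seccomp profile. AppArmor is not shown.

```python
import copy


def restrict(container: dict, needed_caps=()) -> dict:
    """Return a copy of a container spec with capabilities and syscalls restricted."""
    restricted = copy.deepcopy(container)
    sc = restricted.setdefault("securityContext", {})
    sc["allowPrivilegeEscalation"] = False
    # Drop every Linux capability, then add back only the listed ones.
    sc["capabilities"] = {"drop": ["ALL"]}
    if needed_caps:
        sc["capabilities"]["add"] = sorted(needed_caps)
    # Filter syscalls with the runtime's default seccomp profile.
    sc["seccompProfile"] = {"type": "RuntimeDefault"}
    return restricted


# Hypothetical web container that only needs to bind to a low port.
web = restrict({"name": "web", "image": "example.com/web:1.0"}, ["NET_BIND_SERVICE"])
print(web["securityContext"])
```

Starting from an empty capability set and justifying each addition keeps the list short; starting from the defaults and removing things tends to leave capabilities nobody remembers asking for.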

A

Then there's container sandboxing: there's Firecracker, there's gVisor. There are a few out there that largely make this issue go away, but on top of that they add performance overhead. Still, it's great if you're running a multi-tenant system and you don't trust the images. Then there are tools like Sysdig Falco, etc., which monitor the abnormalities of the container at runtime and give you alerts, like: this is happening, stop this, etc. So, let's continue with runtime security: you've got to conceal your information.

A

This is something I said earlier as well: I asked you not to inject environment variables, but rather to use file mounts. When you're doing that, it is recommended to inject via a secret manager at runtime. Don't save them as Secrets on Kubernetes itself; Base64 is not encryption. So use HashiCorp Vault, Azure Key Vault, AWS Secrets Manager, Google. There are so many secret managers out there. Use one of them and then inject your sensitive information as files into the container at runtime.

A

You can very easily do that with those technologies. And as I already said, make your container's root file system read-only. And this one is developer discipline: do not log sensitive information. So that brings us to the final thoughts.

A

Kubernetes security is still new. Vulnerabilities get discovered every day, and they get patched frequently, so update your clusters as soon as you can. If you're running your own, or even in the cloud, just keep upgrading your clusters. When it comes to managed Kubernetes clusters in the cloud, security is mostly, I mean 70% of security, the control plane security, is managed; you don't have to do much there. And many organizations today need some level of multi-tenancy.

A

They might have different teams working on different projects within a shared Kubernetes cluster. So the things that I covered in this session would be appropriate to implement in these scenarios.

A

So keep the proper standards up, follow them, and things should be all right. That's it from me today. Thank you, and I hope you learned something today. Over to you.