Description
Learn from Liz Rice and Thomas Graf about the future of security observability with eBPF
0:00 Headlines
4:35 Open sourcing Tetragon
21:35 Tetragon CLI example
A: Hi, welcome to episode 46 of the eBPF and Cilium Office Hours, or eCHO, live stream. So today we're going to be talking about something we're previewing today, or rather, something we'll be showing next week at KubeCon.
A: So let's quickly talk about the headlines. As always, if you're here with us today, do say hello. I see Sachin joined pretty early, so good to have you with us, Sachin. As always, we love to hear your questions, get your comments, and know where you're watching us from. It's been a busy week in the world of eBPF: the Linux kernel developers' BPF conference happened.
A: I think it was last week. There were a lot of folks there from the Isovalent team, so we saw some pretty good pictures coming in, and in fact on this website you see a picture of the folks who are building eBPF for the rest of us to build tools on top of. We're grateful to all those folks, and I think they probably had some really interesting discussions last week. And speaking of interesting discussions, next week it's KubeCon in Valencia.
A: So if you're going to be there, do say hello; let us know now if you're going to be there and we'll look out for you next week. Cilium is going to be there with a Cilium project booth, and we're also going to be there in the form of Isovalent as well. There are loads of talks about Cilium, many of which are by end users. For example, we've got IKEA speaking at the eBPF co-located event.
A: On Monday there's Bell Canada talking about Cilium at the telco event, also on Monday, and several other end-user stories which I think are going to be really exciting. You'll also be able to come and talk to us about how to get involved with the Cilium project. I want to particularly highlight this session on Thursday, where we're hoping to meet developers, technical writers, folks who want to get involved in Cilium in any form. We'd love to see you there.
A: If you have ideas, things that you want to see implemented in the project, that's a great place to come and discuss them. I can see loads of people who've joined us today, so just quickly running through to say hi: hi to Sachin, hi to Nicola, hi to Quentin, who is moderating for us as always, hello to Tony Russell. We've got some regulars here, which is really great. Hello to Naveen, and hi to Kamaraju, watching us from India.
A: Let me welcome to the eCHO show my colleague Thomas. Thanks for joining us again, Thomas; it's great to have you on the show.
A: Yeah, it was only a couple of weeks ago, so we'll get some different guests on next time.
A: But I think this time it was a really great opportunity to talk about the topic of security observability. Folks who've been paying a lot of attention, following us on social media and following the blog, might have seen Natalia's blog about the security observability book we've published. We'll be giving away copies of those at KubeCon next week, so do come and talk to us for one of those.
A: So really, the topic we're going to talk about today is Cilium observability and a tool that we're describing as new. It's not so much brand new as something that has been part of the Isovalent offering for quite a while, but that we're now bringing to open source as well. So, Thomas, introduce us to Tetragon.
B: Tetragon. Let me see if the screen share is working. I think so. Okay, so Tetragon is a new open source project.
B: Many know us, or many know Isovalent, for having created Cilium: the CNI, networking, and now service mesh. We've always also been doing runtime security, in particular around observability, and also a bit on the enforcement side. And today, or rather next week, so this is a preview of what we will announce next week at KubeCon, we're essentially open sourcing major parts of Tetragon and making it available for contribution to the open source community.
B: Hopefully a lot of this will make sense very, very quickly. eBPF is obviously an amazing technology for introspecting the system and applications. Essentially, what we want to build with Tetragon brings all the eBPF experience we have as a team: many of our team members have been involved with eBPF since 2014, when it was created, and we've created Cilium, the major CNI plugin using eBPF. We're now bringing all of that knowledge to the observability space, in particular with a heavy security focus.
B: So if you look at where eBPF can hook in, we see all of these badges with the eBPF logo on them. These are all places in the system where we can hook in and actually gain visibility, from the lowest level: storage and network, like disk access and network packets; namespacing, whether it's network namespaces, mount namespaces, or user namespaces; the virtual file system; the network stack; and so on.
B: What some people don't know is that eBPF can also be used to introspect the application itself. We can actually build function tracers, we can look into library usage, and so on.
A: Right, and I guess we've seen some examples of that on eCHO before, things like the flame graphs we've seen from tools like Parca and Pixie. That gives us an indication that we can inspect the applications as well as what's happening in the infrastructure layers.
B: Exactly, absolutely, and we're bringing that: it's the same underlying eBPF technology that we use, but here with a very security-specific, or security-centric, observability focus. So this is one level down, and we can look at what types of observability, or observable data, we could extract.
B: I mentioned data access, for example: when are certain devices accessed? Or, at the network level, parsing individual protocols, whether it's HTTP, DNS, TLS, or TCP. But then, very interestingly, observing the namespacing layer. This is the Linux kernel subsystem that essentially enables containers, and because these are software boxes, when a user or an attacker attacks them, kernel vulnerabilities, escapes, or breakouts can happen, and with eBPF and with Tetragon we can actually observe them. We can see when a container is escaping its namespace isolation. We can also see when a container is escalating its privileges; maybe it's allowed to do so, maybe it's not, and we can see both. But then we can also look into, for example, TCP sequence number attacks, or observe when certain file systems or certain files are being accessed.
B: What's very unique, very specific, and I think interesting with Tetragon is that a lot of the observability logic around collecting actual signal, so not just looking at the raw observability data, but a lot of the processing, for example creating metrics, stack traces, or histograms, aggregating data together, and doing filtering, a lot of that additional logic usually runs in a user space agent.
B: With eBPF we can actually be very flexible and use its extensibility, and we can zoom in a little bit, to provide a lot of integrations into interfaces that already exist, where we can export our observability data. For example, for metrics we want to export into Prometheus, or maybe into a SIEM we want to have raw JSON.
B: That's for events; but then there's also a Fluentd integration. Same for logs: you might want to export into a Grafana instance or into Elasticsearch; traces could go into OpenTelemetry; and so on. With eBPF we have this flexibility because we can extract the data in the form we need, and then the Tetragon agent can export it into the relevant formats.
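As a rough illustration of consuming that raw JSON export, here is a small Python sketch. The event shape and field names are assumptions for illustration, not the exact exporter schema.

```python
import json

# One exported event (shape and field names assumed for illustration;
# the real exporter schema may differ).
raw = '{"process_exec": {"process": {"binary": "/usr/bin/curl", ' \
      '"arguments": "github.com", ' \
      '"pod": {"namespace": "default", "name": "test-pod"}}}}'

def summarize(event: dict) -> str:
    """Render one exported event as a compact one-line summary."""
    for kind, body in event.items():
        proc = body.get("process", {})
        pod = proc.get("pod", {})
        where = f'{pod.get("namespace", "?")}/{pod.get("name", "host")}'
        return f'{kind} {where} {proc.get("binary", "?")} {proc.get("arguments", "")}'.strip()
    return "empty event"

print(summarize(json.loads(raw)))  # → process_exec default/test-pod /usr/bin/curl github.com
```

A small pipeline like this is the kind of thing that could sit between the raw JSON stream and a SIEM, filtering or reshaping events before forwarding.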
B: So that's the entire observability piece. To summarize, I think the key aspects of Tetragon are: deep observability, we can see everything from the lower levels all the way into the applications; and transparency, so no application changes are needed, it's all done from within the kernel with eBPF, and the application does not even know it is being monitored. That's actually a difference from ptrace, for example: if you use ptrace to trace an application, the application can detect that it is being traced.
B: But then there's also the low overhead, because we can do so much in kernel, very close to where the observability data is actually produced, and this helps us gain this deep observability data. Often, when you go deeper in terms of visibility, it becomes more expensive, so having a very high level of efficiency is
B: important. That's the observability part, which is, I think, exciting on its own, and this is where tooling like BCC, bpftrace, Pixie, and others have already done quite a bit; I think we're bringing in a more security-centric focus here as well. So it's a nice complementary project to a lot of existing projects already out there. But then we actually bring in a second aspect as well, which is enforcement.
B: So, besides extracting visibility data, we can also take policies, and I've listed three examples here: Kubernetes CRDs, plain JSON policies, or something like a policy created by Open Policy Agent, that's this logo here. These can be fed into the Tetragon agent, and the Tetragon agent then enforces these rules in kernel. This is very, very different from how that's usually done in a traditional system; we'll look at an example in a second.
B: If the system observes behavior that is not correct, the process is immediately terminated or the action is immediately stopped. That's very different from a system such as ptrace-based enforcement, where ptrace is used to extract visibility, a user space agent processes that asynchronously and then kills the process reactively; in the meantime, until that kill signal has been sent, the process can continue and actually do more damage.
A: Your data could have already been exfiltrated. So being able to prevent the malicious event from occurring, rather than just saying "hey, sorry, something really bad happened, but it was, you know, 200 milliseconds ago", is a huge difference in the power and the real defensiveness of that security tool. Yeah.
B: Tetragon can do enforcement in a variety of kernel subsystems. We are used to doing enforcement at the system call level; this would be something like seccomp-BPF, being able to allow only certain system calls to be made, and that's a great way to reduce the attack surface of the kernel.
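For reference, a minimal seccomp profile of the kind described here, in the standard OCI/Kubernetes profile format; the syscall list is just an illustrative allow-list, not a recommendation:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "close", "exit_group", "futex"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

Everything outside the allow-list fails with an errno, which is exactly the "reduce the kernel attack surface" effect Thomas describes.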
B: So if an application attempts to exploit a vulnerability in the kernel, it can only do so if the vulnerable system call is actually allowed. Another example would be something like LSM or SELinux, which have enforcement points deeper down in the kernel. Tetragon is more like this: it can actually observe and enforce across the entire stack, not just at the system call level. You can, of course, do system call enforcement,
deciding what is allowed and what is not allowed, with Tetragon, but you can do more. You can, for example, say "I want to limit access to certain files", and enforce that not at the system call level but further down, when the file system is actually being accessed.
B: So even if the file access happened through some other mechanism that's not a system call, we could still detect and block it as well. An even better example, I think, is namespacing and the escape of, say, the user namespace or the file system namespace. So let's scroll down, skip a bit, and look at that specific example, because I think it will make a lot of sense. Here's the picture, a very common example.
B: I think this is the main way containers usually escape: there's a Kubernetes pod running, or just a container, and this container is aware of some vulnerability in the Linux kernel. It will make a system call that exploits the kernel vulnerability, and this vulnerability allows it to execute code and escalate its privileges.
B: So an unprivileged pod, one which does not have, for example, the CAP_SYS_ADMIN capability and is not running as root, can gain these capabilities even though it's not supposed to, and then break out of the container. Tetragon can essentially observe when this scope of privilege changes, in real time, and immediately kill the process before it can even continue, so essentially before the system call even returns.
B: The process is killed. Typically, a pod here would exploit the kernel, wait for the system call to return, and then do something to make the damage permanent: for example, install a rootkit, install a systemd unit file, or deploy a static pod onto the node; there's a variety of ways to make the attack permanent. But for that it needs to be able to execute further code.
B: We do have an example down here of what that actually looks like, so let's scroll down, and we'll go back. How does this automatic mitigation of privilege escalation and container escapes actually look? We're using as an example a CVE: a vulnerability in the netfilter/iptables kernel subsystem that essentially allows gaining CAP_SYS_ADMIN privileges.
B: If the kernel is vulnerable to this bug, with Tetragon we can create a policy that looks like this (it's a bit shortened), which is essentially a Kubernetes CRD, and if you read through it, it's fairly easy to understand. It basically says: if the capabilities of the process change and CAP_SYS_ADMIN is now allowed, then execute an action and kill the process, i.e. send the SIGKILL signal, which will kill the process.
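A sketch of what such a policy could look like as a Tetragon TracingPolicy CRD. This is an illustrative reconstruction, not the exact policy shown on screen: the hook point and the selector field names here are assumptions.

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: privilege-escalation-kill
spec:
  kprobes:
  # Hook point chosen for illustration: fires when process credentials change.
  - call: "commit_creds"
    syscall: false
    args:
    - index: 0
      type: "cred"
    selectors:
    # Selector names assumed: match a gain of CAP_SYS_ADMIN and kill.
    - matchCapabilityChanges:
      - type: Effective
        operator: In
        values:
        - "CAP_SYS_ADMIN"
      matchActions:
      - action: Sigkill
```

The important design point, as described in the talk, is that the SIGKILL is applied synchronously in kernel, before the exploiting system call even returns.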
B: If we install this policy into Tetragon and then, for example, run this CVE's exploit script, the demonstration that the CVE works: it will essentially set up a container namespace, exploit the kernel bug, and pop a root shell. When we run that script, usually it would succeed and you'd have a root shell; in this case it just gets killed, and the root shell never happens, because Tetragon observed that the capability change happened and immediately killed the process.
B: I think that's a massive part of Tetragon: besides the observability, and besides relatively basic enforcement based on system calls, you can understand, and enforce, much higher-level policy as well.
B
This
can
be
done
on
for
any
workload
on
linux.
Essentially,
there
is
tetragon
is
kubernetes
aware,
so
it
does
actually
understand
part
data
and
so
on,
but
that's
why
I
listed
here
in
the
policy.
If
we
go
back
here,
the
policy
api
there's
also
just
json,
so
you
can
actually
just
feed
in
a
json
policy
that
is
not
does
not
contain
anything
about
kubernetes.
It
could
be
my
binary
x
can
do
these
type
of
system
calls
or
if
a
binary
matching
this
pattern
escalates
privileges
kill
it.
There's
nothing
cubanetta
specific.
B
With
about
tetragon
itself,
the
agent
does
have
kubernetes
integration
and
I'm
using
kubernetes
as
an
example,
because
I
think
it's
one
of
the
primary
use
cases
that
tetragon
targets
is,
but
you
can
of
course,
use
tetragon
outside
of
kubernetes
as
well.
You
could
use
it
with
other
container
runtimes
or
choose
container
runtimes
native
processes,
or
even
inside
of
virtual
machines
as
well.
A: And I think having that Kubernetes awareness is what makes the output from Tetragon so useful. If you're in observability mode, if you're not killing things, you're still going to see traces being generated that show you exactly what process, inside which pod, in which namespace, is responsible for the potentially suspicious event.
B: We can actually look at an example here. I think we have the CLI example, which shows the Kubernetes integration.
B: So let's scroll and zoom in a bit. There we go. Each line essentially represents an event. We see that there's a process execution, /usr/local/bin/curl fetching github.com. Then we see the DNS resolution happening, and the IP that is being returned, so this is an example of Tetragon actually understanding the DNS protocol, what happens on the wire there. Then we can see the connect system call: it connects to the IP that was returned, on port 80, because we did not specify HTTPS.
B: So it's actually going unencrypted over port 80. GitHub doesn't like that, so we see the HTTP processing: an HTTP request, a GET to /, but then, because GitHub wants us to use SSL/TLS, it returns a 301: "please don't go here, you need to talk to me on port 443". Which means curl then does another DNS lookup, gets the same IP again, and now connects to port 443 on GitHub. And then it gets even better.
B: Tetragon then understands the TLS handshake of the protocol, so it actually understands: okay, you're talking to github.com, that's the SNI; this is using TLS 1.3; and we have negotiated this cipher. We actually have even deeper visibility; this is just a summary. We would have all the aspects of the handshake that has been going on here, so if this was using an old version like TLS 1.0, for example, we would see that as well.
B: Then we see that the process actually exits: the curl process exits with error code zero. We also see the TCP connections and that they have terminated, and the amount of data that was transmitted and received on them. As soon as a TCP connect or close is observed, Tetragon will accumulate the data and understand how much was actually transferred as well.
B: So that's an example of the CLI showing the Kubernetes integration. The Kubernetes integration is essentially this left part here: we can see that this is in the default namespace, and there's a pod running, with its pod name shown. If this was not running as a pod, you would simply see the process ID and the binary name here.
B: So actually, maybe this is a good point to talk about why we created Tetragon, because a major reason is actually this question. Let's go back: the short answer is no, it can actually run on much, much older kernels, and the reason that's possible is that we have invested quite a bit in understanding and creating something that does not require very recent kernels. So let's go back up here: why Tetragon?
B: So traditionally, these were the ways of gaining visibility into the system: app instrumentation, changing your app; LD_PRELOAD; or ptrace, an early debugging interface the kernel provides to trace processes and the system calls they're making. They provide some visibility, they're quite efficient, and ptrace is also transparent, but they have very limited enforcement capabilities.
B: You can do some enforcement, for example with LD_PRELOAD, loading a library, but an app can simply bypass that by static linking, and then whatever enforcement is built into the LD_PRELOAD library would not work. ptrace itself does no enforcement, so you can observe and then do asynchronous enforcement, but that leaves the race window where more damage can happen. Which is why the Linux kernel actually provides a variety of built-in mechanisms to do security.
B: Well known is seccomp, which is 15 or 16 years old, or even older, by now; SELinux; and LSM. LSM is a framework, so a specific instance in this case would be something like LSM BPF. seccomp is great: it's transparent, it's efficient, but it only enforces at the system call level and it's very limited in what you can see.
B: These mechanisms need to understand the actual attack vector, so they're very much focused on "I only want to allow what is known and good"; they're not able to understand "okay, a privilege escalation has happened that shouldn't have, let me just kill the process". Which, for most security vendors, essentially left the kernel module path, which is efficient, transparent, and extendable, but has downsides as well: in many environments you cannot load a kernel module, and loading a kernel module is a security threat in its own right.
B: It can crash your kernel, it can compromise your kernel, and it's also very challenging to upgrade a Linux kernel module without turning the functionality off and then on again. eBPF solves all of these problems, which is wonderful, and that's essentially why we created Tetragon. It gives us the efficiency; it gives us the transparency, the app doesn't even notice that it is being observed and monitored.
B: It gives us the synchronous enforcement that LSM and SELinux have as well, but it's also extendable, so we can make the kernel Kubernetes-aware, or build in new concepts or new protocol parsers that are needed now but were unknown, or not expected, when the kernel version somebody is using was created. And it also gives us deep visibility into all layers of the stack. So Tetragon works even with 4.x kernels, all the way back; depending on the availability of certain BPF functionality, Tetragon will use more or less of it, but there is no recent minimum kernel version. That's a major reason why we created Tetragon.
A: And there was a follow-up question here about how we can guarantee that this isn't going to cause the kernel to panic, and of course that is one of the beauties of eBPF: it's the verifier that's going to keep us safe from kernel panics, right?
B: Exactly right. So in this picture we see this kernel runtime box with the eBPF logo in it, and what this kernel runtime, or eBPF runtime, provides is a couple of things. First of all, as Liz mentions, a verifier. The verifier ensures that you're not allowed to load any programs that can harm the kernel: for example, you're not allowed to load a program that loops forever. eBPF does support loops, but they need to be bounded, so you cannot load a program that would spin forever and just halt your kernel.
B
You
can
also
not
just
unless
you
have
very
specific
privileges.
You
can
also
not
just
access
random
kernel
memory.
You
can
also
not
just
write
into
code
random
kernel
memory,
kernel
modules
can
do
this
and
if
they
do
that's
when
they
crash
the
kernel,
ebpf
programs
are
also
subject
to,
for
example,
all
the
variables
that
an
ebpf
program
uses
they
need
to
be
initialized
like
the
verifier
ensures
that
there
is
no
random
kernel
memory
state
in
evpf,
program,
variables
and
so
on.
B: A good example: Facebook and Google use eBPF at the lowest levels, for example to mitigate TCP-level attacks, for load balancing, and for DDoS protection. Security issues in the BPF runtime would be a major problem for these companies, so there are a lot of eyes on eBPF and the verifier. I think that's essentially the main difference between a Linux kernel module and eBPF.
A
And
I
think
we
could
see
you
know
in
the
bpf
conference
that
happened
last
week.
You
know
there's
folks
from
google
and
meta,
and
some
of
these
really,
you
know
high
scale
companies
who
are
extremely
involved
in
in
how
ebpf
works
precisely
because
they're
so
reliant
on
it
and
it
needs
to
be
safe
for
them
to
use.
A: So, speaking of how people use a product like Tetragon: of course this is something we're open sourcing now, but it's been used in production for a while, so I think we have some experience we can use to talk about how we imagine users will leverage Tetragon. Is it just to protect tactically against particular CVEs, or are people using it more broadly to defend an entire cluster against a broad range of possible attacks?
B: Yeah, let's look at a couple of use cases, because we have them listed down here. Let's do the monitoring of access to sensitive files first; this is a good start, something a lot of users do relatively early on. This is showing Tetragon's integration with Splunk, the SIEM integration that exists, and in this version we're essentially monitoring access, in this case access to /etc/passwd; there are other good examples as well.
B: So I think this is the first level where, across a fleet of cluster systems, whether with Kubernetes or without, you can gain value from monitoring access to sensitive files. Even in this case it's actually highly useful to understand the Kubernetes context: we not only understand the namespace and the pod name of the workload that made the access, we can also see the container image, so you can see the version of the application; we see the UID; and so on.
B: This is just showing a couple of columns; we have a lot more metadata visibility that we can extract.
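A file-monitoring policy of the kind just described could look roughly like this TracingPolicy sketch, hooking a kernel function on the file-access path. The hook and field names follow Tetragon's published examples, but treat the details as an approximation:

```yaml
apiVersion: cilium.io/v1alpha1
kind: TracingPolicy
metadata:
  name: monitor-sensitive-files
spec:
  kprobes:
  # Fires on permission checks for file access, below the syscall layer.
  - call: "security_file_permission"
    syscall: false
    args:
    - index: 0
      type: "file"
    selectors:
    - matchArgs:
      - index: 0
        operator: "Equal"
        values:
        - "/etc/passwd"
        - "/etc/shadow"
```

Because the hook sits on the in-kernel file-access path rather than on a specific syscall, it catches accesses regardless of which syscall (or non-syscall mechanism) triggered them, which is the point made earlier in the discussion.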
B: Another example is detection of weak TLS ciphers, for example for TLS compliance monitoring. So, yes, great, your apps are using TLS, but how do you know? Maybe I can zoom in a little bit so we see this better. How do you actually know whether your apps are using a TLS library that uses a modern version of TLS, and how do you know whether the ciphers they use are secure?
B: How do you know whether the key lengths being negotiated are above 128 bits, for example? With the TLS monitoring here we can actually see what is being negotiated, and if any pods use insecure ciphers or weak key lengths, we can tell, and we can tell exactly which pod, which container image version, and so on. We even see the SNI, so we even see what the remote destination was when this was used.
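Once the handshake data is exported, the compliance check itself is simple; here is a toy Python sketch over events with assumed field names (not Tetragon's actual schema):

```python
# Toy TLS-compliance check over exported handshake events.
# Field names ("tls_version", "key_bits", ...) are assumptions for
# illustration only.
WEAK_VERSIONS = {"SSLv3", "TLS 1.0", "TLS 1.1"}
MIN_KEY_BITS = 128

def is_compliant(handshake: dict) -> bool:
    """True if the negotiated protocol and key length meet policy."""
    return (handshake["tls_version"] not in WEAK_VERSIONS
            and handshake["key_bits"] >= MIN_KEY_BITS)

event = {"pod": "default/frontend", "sni": "github.com",
         "tls_version": "TLS 1.3", "key_bits": 256}
print(is_compliant(event))  # → True: TLS 1.3 with a 256-bit key passes
```

Pairing a check like this with the pod, image version, and SNI fields mentioned above is what turns raw handshake visibility into an actionable compliance report.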
B: Those are useful data points. And then, I think, a more sophisticated version of this, which many users are using, is the combined network and runtime visibility.
B: This is available as a Splunk app, for example, and it combines the view we had in the CLI before, the network visibility, with the system call and runtime visibility. It actually builds a tree of what an app is doing, and if we zoom in a bit, we see that there is a Kubernetes namespace and a pod name, and then we see the container runtime. Further out we see this is a Node.js app, and this node app was then compromised and a reverse shell was invoked using netcat.
B: You see the nc binary, which is reaching out to an external domain here, which, as its name says, is not a reverse shell, of course. And we see the attacker using curl to access, in this case, Elasticsearch, and then using curl again to do an HTTP PUT to upload something to an S3 bucket.
B: So we're at that level: not just monitoring, but now we're also investigating, maybe for forensic use cases. You can store all of this data in a SIEM and then look back at what actually went wrong: this pod behaved weirdly for some reason, or apparently there was an attack from this timestamp to that timestamp, and you can really dig in and figure out what was executed, who reached out to whom, and so on. And then the last level, what we have added most recently, is the enforcement capability, where we use all of this visibility for enforcement. Our goal, what we are looking to solve, is to move away from having to understand every single CVE and every single attack vector, and instead say: well, we have isolation guarantees that users want, such as the container namespace boundary, or that this process is supposed to not be running privileged, and if that's violated, kill it.
A: And, as I feel like I often say, with this cloud-native approach to software architecture we tend to put small amounts of function into each pod; we're breaking our applications across namespaces and pods. We can use those smaller pieces of function as a way to reason about what malicious behavior looks like and what normal behavior looks like. We're already doing that with network policies: we're saying these services are allowed to talk to this particular other service and this external domain name, but nothing else. And we can start to make similar policies around the kind of file access that different services need.
A: You know, the kind of user identities, the kind of capabilities, the privileges that different workloads need. The more we can build policies in a way that people don't need to dig into every single detail, asking "is this wrong?", the more it becomes a case of: if it's not right, we're going to block it. That's exactly how network policy works, and we should extend that to other aspects of security as well. I think that's exactly where Tetragon is taking us.
B: Exactly, yeah, and I think network policy is a very good example where we can actually combine Tetragon and network policy together, which we call runtime-aware policies. What Liz just mentioned is Kubernetes network policy: if you oversimplify it a ton, it's essentially a policy, and I can show one here, that says "I want my frontend pods to be able to talk to my backend pods", or it could be "my frontend should be able to talk to the 10.0.0.0/8 CIDR" or something. But the granularity of Kubernetes network policies is at the pod level.
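The frontend-to-backend example described here, written as a standard Kubernetes NetworkPolicy (the label names are illustrative):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: frontend-egress
spec:
  # Applies to all pods labelled name=frontend.
  podSelector:
    matchLabels:
      name: frontend
  policyTypes:
  - Egress
  egress:
  # Allow outgoing connections only to pods labelled name=backend.
  - to:
    - podSelector:
        matchLabels:
          name: backend
```

Note that the selectors operate purely on pod labels: nothing in this policy can distinguish which process inside the pod opened the connection, which is the limitation discussed next.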
B: As it's written here, the network policy doesn't care whether it's curl that's invoked in the pod, or a Python script, or a Node.js app, or something else; and if it's a multi-container pod, they can all produce and receive network packets, and the network policy won't care.
B: It will just allow based on pod-to-pod or IP-level rules. Adding Tetragon, and combining that with Cilium and the Cilium network policy, the CRD Cilium offers with richer network policy options, we can actually add runtime context. So if we scroll in here and look at this: we can see that this policy says it should apply to all pods with the label name=frontend, and it should allow egress, outgoing connections, to all pods with the label name=backend.
B: But then we want to lock it down further. We want to say: yes, you can do so, but only if the binary is called app.py, in this case a Python app, and only if the app is not privileged. That's just an example; we could use other filtering here as well: we could parse and match on the arguments given, or match on the UID or GID or other runtime aspects. Essentially it allows us to lock things down further.
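As a purely hypothetical sketch of that runtime-aware policy: the runtime-context fields below are invented for illustration and are not a published API; they only show the shape of the idea, a network rule plus runtime conditions.

```yaml
# Hypothetical sketch; the "runtime" selector fields are assumed, not a real API.
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: frontend-to-backend-runtime
spec:
  endpointSelector:
    matchLabels:
      name: frontend
  egress:
  - toEndpoints:
    - matchLabels:
        name: backend
    # Runtime conditions: only the unprivileged Python app may connect.
    runtime:
      binary: "app.py"
      privileged: false
```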
B: Another good example would be the container name inside the pod. If you have multiple containers inside a pod, you might say "I only want my init container to make this connection, but not the actual workload". It's actually very common for users to run an init script that, say, pulls some secrets from Vault or does various setup steps as it bootstraps; this init container may need additional privileges and additional rights that the workload afterwards should not have.
B
So if the workload gets compromised, it should not be able to access Vault just because the init script required access to Vault while it was bootstrapping, and with runtime-aware enforcement, combining these policies, we can actually achieve that. I think this is very exciting. It means that if we take this example: say the app gets compromised, it gains privileges, and then it, for example, extracts data from the local disk.
B
It could not upload that data anywhere, and it could not even talk to the backend pod here, because it's no longer unprivileged. So I think that's also very powerful. It's a good example: just like the visibility we saw before, combining runtime information with network security or network-level information is incredibly powerful.
A
All right, if you're out there and you have questions about Tetragon and everything we've been talking about today, do type them in; get your questions in now. And we do have a question here. It's a little bit off topic, but let's go with it anyway. It's about the difference between a service proxy and a service mesh in Kubernetes. We've been talking about service mesh quite a lot recently as well, so I guess it's reasonable to try and talk about this.
A
This question... I guess, for anyone who hasn't been following us recently: Cilium service mesh is moving into GA, and I think that's something that will get quite a lot of attention next week at KubeCon, as well as Tetragon. Yeah.
B
Service mesh... maybe if we go back to the picture here, that helps to understand it a bit better, if we zoom in on the pod here. So we have a pod running. Service mesh is known to traditionally run so-called sidecar proxies, and the sidecar proxies are running inside of the pod, kind of where the yellow eBPF bees are right now, essentially inside of the pod; they get injected.
B
To give... to gain visibility, and it also allows you to do enforcement, but it comes at a very high cost: if you're running a thousand pods, you need to run a thousand sidecar proxies, which for some people can be worth it, but it's not a very efficient way of achieving this. So with Cilium service mesh, we are looking to provide the same functionality.
B
Then some people refer to service proxies as a proxy that runs separately, in front of a service itself. That could be, for example, an ingress controller, an ingress proxy that terminates TLS on behalf of the app. Oh, and it can actually be the same proxy technology, so it could be Envoy in both cases, the sidecar and the service proxy, but the use of it, or the model of how the proxies are deployed, is very different.
A
And I guess in some ways we're making this question a little bit more confusing to answer with Cilium service mesh, because of the distinction between a service proxy and a service mesh: the functionality will be provided by Cilium. So it's going to be harder to draw a line and say this is service mesh and this is service proxy, because they'll be offered by the same components.
A
A question here asking about the... well, the RC. I guess this is probably asking about the GA date for Cilium. So Cilium 1.11 is the current GA version of Cilium. We are very close to releasing 1.12; we're on, I think, the second release candidate at the moment, so, just...
B
Yes, we just pushed out RC2, I think, last week. Absolutely, yes. I think there's one release blocker left on the service mesh side before we can push out 1.12, so we're getting very close to the final release. I mean, obviously next week is KubeCon; the team will be busy with that as well, but I think 1.12 will come shortly after KubeCon.
B
So, yes, it can, like any other privileged process on a node: it can look into arbitrary application memory. I mean, when I say yes it can, that's with an asterisk: there are ways to not allow this.
B
If you're interested in looking into trusted computing, there are ways of not allowing root on the nodes to look into arbitrary application memory. But if you, let's say, run a standard cloud provider instance and run Linux on it, then any process with privileged access, with access to the kernel, can do this.
A
I think this is really going to be a question about making sure that the policies you're installing are... well, that you have trusted code. If you're loading eBPF code into your kernel, make sure it's code that you trust. Don't just, you know, download an eBPF program from the internet. You know, eBPF is powerful, so use it with care. Yeah.
B
And I think this is why it's important that we're actually using eBPF, because it provides... well, the capabilities are there, right? So I think it must be known: I think you can access arbitrary memory, but it's the eBPF probe that does this, and it's using a well-standardized API; it's not random code that runs this. It's not the Tetragon agent here that will access the memory of the user-space application; it's wired at the kernel level.
A
I don't think there is a way to say, you know, read some part of user-space memory or kernel memory, but you do need to be a little bit cognizant of what you're extracting and where you're putting it, and when you're logging data, you want to make sure that those logs don't contain secret information, for example. So there's lots of, you know, interesting things to consider there, I think.
B
So I think it's very, very important that you have the observability and that you can restrict who... because not every process is required to, or really needs to, access arbitrary user-space memory, but many of them have the privileges. And that's coming back from, I think, the traditional Unix/Linux security model, where it was root and everyone else, and the fine-grained capabilities that were introduced later... we're still in the process of really rolling them out.
A
So we've got a great question here about why it's called Tetragon, and I actually have a picture that I want to show to explain this one. This is actually from my talk that I'm going to be giving at SecurityCon next week. So Tetragon, or Tetragonisca, is actually a category of bees, and there are these Tetragonisca bees, and I love the description of this particular one, because I think it's very apt: it's a very small bee and it builds unobtrusive nests.
A
It produces large amounts of honey, and it's not a threat to humans. I feel like that's a very good kind of parallel for what Tetragon is: it's very small probes, it's using, you know, a small amount of resources, it's very efficient, and yet it's going to produce a large amount of very sweet information.
B
Absolutely, yes. And I think once we publish the GitHub repo next week, there will be examples of policies that you can use for observability, for enforcement, and so on.
B
And then, if you want to have, like, a predefined rule set, that is available in one of our products as well, where the entire mitigation is a bit more automated and you don't have to load the rules manually. But there will be plenty of examples in the open source repository on how you can leverage and use Tetragon for both visibility and enforcement as well.
A
I will be showing at least one or two cases in my talk on Tuesday, so if you're there at SecurityCon, the co-located event at KubeCon, do come and watch that. The videos will be available shortly afterwards, and I'm sure we'll be showing it in a future episode of eCHO as well, so yeah, there will be examples. There's also going to be an online tutorial that we're putting together so that people can, you know, try it out for themselves. All right.
A
So I think... I mean, there are a lot of people saying some really nice, positive things about Tetragon, which is wonderful. So thank you very much, everyone, for your positive comments. And I think that's pretty much everything we wanted to cover today, right? Unless you have any last words you want to add, Thomas?
B
I would say we're super excited. We've been engaging with customers on this for a long time, several years, and we're super excited that we can now collaborate with the entire open source community. So help us take this to the next level. We see so much potential with Tetragon, and I really feel like now is the perfect time to let as many people as possible try it out, contribute, and build stuff with Tetragon, right.