From YouTube: eBPF in Microservices Observability - Jaana Dogan, AWS
Description
Don’t miss out! Join us at our next event: KubeCon + CloudNativeCon Europe 2022 in Valencia, Spain from May 17-20. Learn more at https://kubecon.io The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.
Anyway, everybody is here for a reason. eBPF is kind of like this Swiss Army knife of a toolkit. My reason to be here is to talk about observability, and within observability I'll mainly be talking about microservices and how we are using eBPF.
So, first of all, I'm not a Linux developer like some of the people here. I generally work on monitoring, observability, and performance. Tooling, multi-tenancy, and more are my areas, with a microservices focus. Especially if you think about the last ten years, with container orchestration and so on, it became so much easier to pack things, deploy things, and scale up and down. So we have this huge new world with a growing number of microservices, topology changes, all the other components, and so on.
I'm not sure if you've seen this before; this is from Brendan Gregg. It shows all the canonical tools we used to use back in the day to diagnose Linux, and as you can see, there are all these different layers. Some of the panelists mentioned briefly that this world was great because it was rich, with so many toolsets, but those tools were talking to some very concrete, inflexible APIs to read the diagnostics data.
This model lived for a long time, but it just didn't scale, and eBPF came out as a result. eBPF is a more programmable way to hook into the kernel and get the events, and then, in user-space programs, you can take that data out and do whatever you want with it: enrich, filter, or aggregate.
To give you a very, very brief intro, this is how eBPF works: you write eBPF programs, you hand them off to a verifier and a JIT compiler, and then you can attach them to certain places. In this example, I'm attaching to the sockets to be able to read the network data. Then there are BPF-map-style data structures where you can collect the data, that is, the events coming from the sockets, and BPF maps are accessible by user-space programs.
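The split described above, kernel-side aggregation into a map plus a user-space reader, can be sketched with a toy model. This is not real eBPF code (a real program is restricted C loaded via the verifier); it is only a hedged illustration of the map pattern, and all names here are invented.

```python
from collections import defaultdict

# Toy model of a BPF map: the "kernel side" updates per-key counters on
# each socket event, and the "user side" periodically drains the map.
class ToyBpfMap:
    def __init__(self):
        self._counters = defaultdict(int)

    def update(self, key, nbytes):
        # What the attached program would do per event: aggregate in the
        # kernel so only compact summaries cross into user space.
        self._counters[key] += nbytes

    def snapshot(self):
        # What a user-space reader would do: read, then enrich/export.
        data = dict(self._counters)
        self._counters.clear()
        return data

events = [
    (("pid:101", "nginx"), 512),
    (("pid:101", "nginx"), 256),
    (("pid:202", "postgres"), 1024),
]

m = ToyBpfMap()
for key, nbytes in events:
    m.update(key, nbytes)

print(m.snapshot())
# {('pid:101', 'nginx'): 768, ('pid:202', 'postgres'): 1024}
```

The point of the pattern is that per-event work stays in the kernel; user space only sees the aggregated view.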
People like to do profiling, and continuous profiling, sometimes based on the data coming from these hooks. You can hook into system calls and collect them; there's a really big security use case for that, because people like to audit and monitor the system calls going on on a machine. Network events are another example: you get network telemetry out of the box. And the other one is kernel tracepoints.
Before jumping further into eBPF, I want to recap some of the bigger challenges that we had in microservices. When I say microservices, think about it at a grander scale; think also about your Kubernetes cluster and how many different components there are. Don't just fixate on your own services, because we sometimes know a lot more about our own services than about the other components in a cluster.
One of the bigger challenges in microservices is that this is not a world where we just monitor virtual machines or processes anymore. We primarily care about the critical path: a user request comes in, hops through different services all the way to the database and storage, and we care about the health of that critical path. Our user doesn't necessarily care about one service being up or down; we can maybe restart their request on a different replica of the same service. But they do care about the health of their critical path, because that's their experience, and if something goes down, as you can see in this case with a downstream service, our critical path is broken. So it's very, very important for us to understand what's actually going on in a critical path and what is broken.
In later years I've been working on distributed tracing, and distributed tracing was becoming much, much more popular because of the growing number of services and different things in our critical paths. So this is our first challenge. The other challenge is the context. A couple of people on the panel mentioned this: we have all these different services in the chain, and downstream services don't always have the same context.
If you make a request from an upstream service, you can't really capture telemetry data at the downstream services with the context related to that upstream service. Or you have this big cluster, a multi-tenant environment, and you want to capture the telemetry with your cluster name, pod name, and all of that, in order to narrow down your telemetry. If you don't have context, it just becomes much, much harder; context matters a lot.
This is a typical M x N problem. We usually have multiple processes, and there are multiple RPCs handled by each process. Then you have containerization as the namespace and orchestration as the logical grouping, and you want to capture as much of this type of context as possible, to be able to figure out where the issue originated.
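The layering just described, RPC inside process inside container inside orchestration, can be sketched as a simple enrichment step. All the names, fields, and the lookup table below are invented for illustration; a real collector would build this index from the container runtime and the orchestrator.

```python
# Toy sketch of the M x N context problem: many RPCs per process, many
# processes per pod. A single raw event only becomes useful once the
# whole chain of identity is attached to it.
def enrich(event, process_index):
    ctx = process_index[event["pid"]]  # pid -> orchestration context
    return {**event, **ctx}

# What a collector might learn from the runtime / Kubernetes API.
process_index = {
    4321: {"cluster": "prod-eu", "namespace": "checkout",
           "pod": "checkout-7d9f", "container": "app"},
}

raw = {"pid": 4321, "rpc": "GET /cart", "latency_ms": 12}
print(enrich(raw, process_index)["pod"])  # checkout-7d9f
```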
Or, when you're narrowing down your telemetry, you want to quickly see what is being affected, in order to understand your blast radius and to follow the critical path and its health. The other thing we started to do is, when there's an issue, we first debug what is in the critical path of our request. In the monolith times it was more common to just go and debug certain functions or syscalls and so on. Now step one is to debug the critical path; step two, you can go and dig through and maybe understand what's going on in a specific service. And this is where correlations make a lot of difference.
We actually have another talk with Morgan McLean at the conference this year that talks a little bit about the challenges and about how some of the ways we do correlation are making life easier when it comes to troubleshooting. The other challenge, as someone else mentioned today, is that there's too much data in an environment like that.
Not just with eBPF; there's already too much event data, and you really want to have some runtime controls, or a control plane, to be able to say, "Hey, I just want to enable more data," or disable it. That type of thing becomes very important because of the enormous amount of data we produce. And for every customer I talk to, every team I'm working with, instrumentation itself is a huge burden.
I used to work at Google on the instrumentation team; now I'm leading parts of instrumentation at Amazon. There's a huge amount of work in aligning on the data that you produce: consistency of the labels, the shape of the data, the naming of the data. It's a long, long process, and because it's such a gradually moving area, you always end up being inconsistent in the data you produce.
Extensibility at runtime is really, really critical, because we want to be able to enable and disable things based on the situation, in order to troubleshoot more. Given there's so much data, it's costly to always keep that kind of firehose up and running, so you want to be able to enable and disable. And we talked about context: we want to decorate and enrich the data, so it becomes much easier when we're navigating it.
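The enable/disable control described above can be sketched as a tiny collection toggle. This is a made-up, minimal model of the idea, not any real control plane's API; the flag names and event shapes are invented.

```python
# Toy runtime collection toggle: a control plane flips flags, and the
# collector consults them before emitting expensive telemetry.
flags = {"syscall_audit": False, "http_payloads": False}

def collect(event_kind, payload):
    if not flags.get(event_kind, False):
        return None  # the firehose stays off by default
    return {"kind": event_kind, "payload": payload}

assert collect("http_payloads", "GET /") is None
flags["http_payloads"] = True  # e.g. an operator enables it during an incident
print(collect("http_payloads", "GET /"))
```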
So where does eBPF help? eBPF gives us a lot of the interesting things we talked about in the panel, like network diagnostics. You can get TCP, UDP, and HTTP high-level network events out of the box, and you can turn them into metrics.
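Turning raw network events into metrics, as just mentioned, is mostly an aggregation step. Here is a hedged sketch of that step in user space; the event fields and service names are invented, standing in for whatever an eBPF socket hook would surface.

```python
from collections import defaultdict

# Fold per-request events into per-service metrics: request count,
# error count (5xx), and total bytes.
def to_metrics(events):
    out = defaultdict(lambda: {"requests": 0, "errors": 0, "bytes": 0})
    for e in events:
        m = out[e["service"]]
        m["requests"] += 1
        m["bytes"] += e["bytes"]
        if e["status"] >= 500:
            m["errors"] += 1
    return dict(out)

events = [
    {"service": "cart", "status": 200, "bytes": 512},
    {"service": "cart", "status": 503, "bytes": 48},
    {"service": "auth", "status": 200, "bytes": 128},
]
print(to_metrics(events)["cart"])
# {'requests': 2, 'errors': 1, 'bytes': 560}
```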
I've specifically mentioned metrics here, but you don't have to stop there: you can get very raw events, and you can also inspect protocols. For example, this is a screenshot from Pixie. I just ran Pixie on my cluster, and this is all the inbound HTTP traffic coming to my service.
Without me making any changes or anything, you can also see a sample of the slow requests, and you can go and inspect what actually happened. This is another example: Cilium has this component called Hubble, and it comes with a nice UI. It's just so easy to install these things on your Kubernetes cluster: you run the Cilium command to enable Hubble, it deploys a couple of components, and then there's also a command to bring up the UI. You can see my services in my cluster talking to the world, and in the bottom section you can see all the different specific requests, with some metadata about them.
The other thing that not a lot of people are talking about is distributed traces. Distributed tracing is a very tough topic because it requires you to propagate trace headers. But if you already have a trace header in the incoming request, eBPF can actually help you generate the data, because as soon as I see an incoming request with such a header, I can generate a distributed tracing span. So if you generate your distributed tracing headers at your load balancer or something, you don't actually have to instrument all of your web services.
You just need to make sure that you're passing the distributed tracing header around, and you can get the data. And you can go and make modifications to the type of data produced: you can add more attributes, you can do things more programmatically to enrich the data, and so on. So this is actually a very cool thing that not a lot of people are talking about.
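The span-from-header idea above can be sketched concretely. Assuming the incoming request carries a W3C Trace Context `traceparent` header (`version-traceid-parentid-flags`), an observer that merely sees the request, as an eBPF hook could, can emit a child span without the service being instrumented. The span shape below is invented for illustration, not OpenTelemetry's actual API.

```python
import secrets

def span_from_traceparent(header, name):
    # traceparent: 2-hex version, 32-hex trace id, 16-hex parent span id,
    # 2-hex flags ("01" means the caller sampled this trace).
    _version, trace_id, parent_id, flags = header.split("-")
    return {
        "name": name,
        "trace_id": trace_id,          # join the caller's existing trace
        "parent_span_id": parent_id,   # the caller's span becomes our parent
        "span_id": secrets.token_hex(8),
        "sampled": flags == "01",
    }

hdr = "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
span = span_from_traceparent(hdr, "GET /cart")
print(span["trace_id"], span["sampled"])
```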
The other thing is continuous profiling, and a lot of people have talked about it. One of the things I like about eBPF is that it enables very low-overhead continuous profilers, and there are so many of them nowadays. What's interesting about a continuous profiler is that it unwinds the stack, so you can see kernel code invoking user-space programs and see the entire profile without breaking anything. The same capability, by the way, exists in some of the other projects I mentioned, like Pixie.
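What a continuous profiler does with those unwound stacks can be sketched as stack folding: each sample is a full stack (kernel frames included), and identical stacks are folded into counts, which is the classic flame-graph input format. The frame names below are invented examples.

```python
from collections import Counter

# Fold sampled stacks: join each stack's frames with ";" and count
# how often each distinct stack was seen.
def fold(samples):
    return Counter(";".join(stack) for stack in samples)

samples = [
    ["entry_SYSCALL_64", "sys_read", "vfs_read"],  # kernel-side stack
    ["main", "handle_request", "parse_json"],      # user-side stack
    ["main", "handle_request", "parse_json"],
]
folded = fold(samples)
print(folded["main;handle_request;parse_json"])  # 2
```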
There's also the extensibility side of it, which makes us really happy, because as I mentioned, sometimes you don't need this data all the time; there's so much of it. Being able to extend is great: as I mentioned, you can hand off an eBPF program and enable some more collection, and some of the control planes, like Pixie, are actually making that more streamlined.
So you can pass in a bpftrace program, and it takes it and distributes it to the existing agents on the existing nodes, and you can collect more data, which is very cool.
The other thing is decorating with context. I'm not sure if I'm running out of time, but as I mentioned, as you are collecting data in a user-space program, this is where I think the magic comes in, because in that same context you can actually look up additional metadata.
In this case, I'm talking to the Kubernetes API server to read which cluster I'm in, which namespace I'm in, which pod I'm in, and so on. The type of data coming from eBPF events for networking, for example, is a source IP and a destination IP. You don't know much about what's going on; if you just export it as it is, it's not that useful. But if you can resolve what services those are, what pods, or whatever additional metadata you know about those IPs, then it becomes useful. So it's really nice to be able to decorate things with context.
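The IP-to-pod enrichment just described can be sketched in a few lines. The lookup table and names here are invented; in practice it would be populated from pod metadata fetched from the Kubernetes API server.

```python
# Resolve the two IPs of a raw network event against pod metadata,
# falling back to the bare IP when nothing is known about it.
pod_by_ip = {
    "10.0.1.4": {"pod": "frontend-abc", "namespace": "web"},
    "10.0.2.9": {"pod": "cart-xyz", "namespace": "shop"},
}

def resolve(event):
    src = pod_by_ip.get(event["src_ip"], {}).get("pod", event["src_ip"])
    dst = pod_by_ip.get(event["dst_ip"], {}).get("pod", event["dst_ip"])
    return f"{src} -> {dst}"

print(resolve({"src_ip": "10.0.1.4", "dst_ip": "10.0.2.9"}))
# frontend-abc -> cart-xyz
```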
This is a profile from Pixie, and as you can see at the bottom, the data is broken down by namespace, pod, container, and PID, so you can narrow down and navigate the data. It's very useful when you have an incident and you just want to go and focus on one specific thing.
So I mentioned several projects; if you want to take notes: Cilium and Hubble do a lot of things, and Pixie does so many. Flowmill was an earlier project that has been sort of merging into OpenTelemetry now. Prodfiler is a continuous profiler, and Parca, from one of the Prometheus maintainers, has just been released as a continuous profiler based on eBPF.
So what is coming up next? I think a lot of people have different ideas; these are mine. I feel there's a burden because a lot of people are still struggling to write eBPF programs, so maybe a higher-level language. I'm not sure what it would look like, but it might make things more streamlined.
We're also talking about more platforms supporting eBPF; Windows is a very interesting example.
At AWS we have different, more restricted platforms like Fargate, where we have a very tiny VM on a different virtualization layer, Firecracker, so we're looking into making eBPF available in these places. And the other problem is that some people are very far behind in terms of kernel version, so I hope people will be moving up because of all the goodness coming from eBPF.
eBPF programs are verified and sandboxed, but if you are enabling and disabling some of these things in production, and especially copy-pasting someone else's C code, it's just not great. It would be super nice if we were able to distribute them in a signed, or otherwise safer, way.
So that's something to discuss with the larger community. I just want to thank you; I hope I didn't run out of my time. If you have any questions, find me here or email me. There's also an afterparty, by the way; I realize Pixie is having an afterparty on the rooftop. If you're around, if you're in L.A. in person, you have to RSVP; I highly recommend you check it out.