From YouTube: eCHO episode 34: Parca
Description
Liz is joined by Frederic Branczyk to talk about Parca, a project that continuously profiles CPU & memory usage using eBPF
Show notes:
A: Hello and welcome to eCHO — this is, I believe, episode number 34. As always, we'd love to hear from you and to hear where you're watching us from, so do say hello in the comments. We've got a ton of news today. Let me just bring up the headlines: loads of articles to share this week, so I'm hoping some of you who've written them are watching.
A: We've got a bunch of talks, so let me just share my — I don't know — my screen is there already. So the first of the talks that got broadcast this week is from Guild42 in Bern, where Daniel Borkmann gave a talk about Cilium and eBPF and how it creates a really great data plane; you will find the slides and the video on this site. Hi to Quentin, who is moderating for us today, and hello to Matthias from Berlin — and Mattia, very similar names there — and Margarita, who's from Bavaria. Hi Margarita, a previous guest.
A: I can't remember how long ago or which episode that was, but good to see you again.
A: Okay, so do check out this data plane talk. And if you liked Daniel's presentation, you'll be really happy to see there's another one that he was involved in — this one here about BPF and Spectre, featuring Daniel, Piotr, Christian and Benedict Schlüter, talking about how we can use BPF to mitigate transient execution attacks like Spectre. I haven't seen this particular talk yet, but I have flicked through the slides — it's really, really interesting stuff — and I have seen Daniel talk a bit about this before, I think at eBPF Summit. It's fascinating stuff, and if you're interested in how those kinds of speculative execution attacks work, this is really eye-opening. Hello to Asadisman — Asad, in fact — from Bangladesh. All the M's in chat! In fact, we've got another M.
A: We've got Michael from Bavaria as well, and hi to Christian from Peru — great to see you with us. All right, and then there's another couple of talks, and I'm afraid they've got me in them. Here is one that myself and my colleague Christopher did for our friends at Container Solutions, where we talked about what eBPF and Cilium are. They have a really fun series of WTFinars over there. And I was also on the Red Hat Developer YouTube channel earlier this week — it's called Kube by Example, KBE Insider. So if you don't get enough of me on today's show, you can check that out as well.
A: Then, if you've still not heard enough from me, I was on a podcast. Now, I actually recorded this last year, but it came out — I saw it a bit on the Twittersphere this week — so I thought I'd include it today. I had a really interesting and fun discussion with Guy from Snyk about security.
A: eBPF came into it, and I enjoyed that conversation, so I hope you'll enjoy it as well. And then there's another one I haven't listened to yet, but I hear it's really interesting: past eCHO guest Dave Tucker and Brent Salisbury were featured on this episode of Heavy Networking, talking about eBPF and cloud native networking. I can't wait to check that out — I hear it's good. So that's lots of things to watch and lots of things to listen to; we've got a few things for you to read as well.
A: This is a really good article from Arthur Chiao. I haven't worked through all the examples here, but it's exactly the kind of thing that I love: it talks about how something works and shows you it in code. What he's done here is implement Kubernetes network policy, building an example of how that would work in eBPF. That seems like a really fun thing to check out.
A: Then the other article that I've got for you today is about service mesh. I'm wondering if any of you watching today have tried out the Cilium service mesh beta — say hello if you have. We've had that beta out since just before the holidays, really, and we wanted to start hearing how people have been getting on with it and to get some feedback on what the next steps should be.
A: Okay, that's some things to read, and then maybe there are some things that you might like to try out. There is an eBPF debugger called edb. I think this is very new, and it looks really interesting. I think the idea here is that you can run your eBPF programs in user space to debug them more easily than you could if they were actually running within the kernel. If anybody's tried that out, let us know — I think it could be a really interesting thing to explore on a future episode of eCHO.
A: Another new tool that we've seen is from Rancher: they've released something called lockc, which uses eBPF and the Linux Security Modules interface. If you're a regular viewer, you might remember we had KP Singh on — I think it was in December — talking about how that LSM interface sits in the kernel and how we can use it to trigger eBPF programs that provide more flexible and bespoke security policies. This lockc project looks to be doing exactly that, so that's another really interesting development.
A: That would be fun to try out. And the last item I've got here under tools: it's not a new tool, it's a new location for the source code of bpftool. You will have seen we had Quentin on back last year, talking about bpftool and showing us all the amazing things we can do with it — he's got an incredible Twitter thread of, I don't know, 50-odd tweets with different things that you can use bpftool for. If you want to build the code or look at it, you can now find a copy on GitHub. The main development is still going to go on on the kernel mailing lists, but it's a much easier way to check out the source code. All right — yes, Mercedes, let's bring up this comment: "so much good content". I really do think it's been a busy week or two in the world of eBPF, which is fantastic to see. All right.

A: So I think it's time to move on to the main topic for today's show. It's my pleasure to welcome to eCHO Frederic Branczyk. Hi, Frederic!
A: So maybe just give us a couple of sentences about what Parca is and why you've built it.
B: Yeah, of course. So Parca is a continuous profiling project, and continuous profiling is still kind of an emerging area in the observability world, so I think it's worth spending a couple of sentences on the methodology itself before diving into the tool. Continuous profiling is, in essence, just profiling —
B: — all the time, right. Profiling has kind of been a part of the developer toolbox ever since programming has existed. Actually, when I started the company, I did some research, and basically since the 1960s we've been doing profiling, because we've always needed to understand what our running programs were doing.
B: Yeah, exactly. Theoretically, profiling can be anything where you have a stack trace and a number attached to that stack trace. We've seen network I/O profilers, we've seen block device profilers — really, any resource can usually be associated that way. But sometimes there are even profilers that are specific to certain runtimes: for example, the Go runtime has a special profiler for understanding goroutines.
B: Those are never going to be in, let's say, a generic profiler like Linux perf, because the operating system just doesn't know about concepts like that — so there are definitely also profilers that are useful for your specific language and runtime. But coming back to continuous profiling: when we think about profiling, the first thing that pops into our heads when we think about doing it all the time is "that sounds expensive", right? Profiling kind of —
B: — has this association with having overhead, and I guess there are two major reasons why we can do continuous profiling anyway. The first and foremost is that we use a technique called sampling profilers. There are a couple of different kinds of profilers, and talking about all of them is maybe a bit too much for today, so I'm going to focus just on sampling profilers. A sampling profiler essentially looks at a stack trace — in the case of CPU profiling, say —
B: — 100 samples per second, and over time that actually accumulates and we get a statistically significant representation of our program. That's why we can always have it on. And then, why are we here to talk about eBPF? eBPF put sampling profilers on steroids in terms of lowering overhead, I guess because we can capture exactly the data that we want, in exactly the format that we want, and only export it every couple of seconds, as opposed to capturing every single sample.
B: Right, right. So we use the perf subsystem for this as well, but instead of doing what the canonical perf tool does — which just captures a whole lot more data than we actually need — we really only capture the stack trace and how often we have seen that stack trace, and that's it. Then every 10 seconds, our user-space program takes everything that we've saved in our eBPF map and resets it, and does so again after another 10 seconds.
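To make that concrete, here is a minimal user-space sketch of that read-and-reset loop, assuming a BPF hash map keyed by stack identity with a counter as the value (names and layout are illustrative, not Parca's actual code):

```c
/* Illustrative read-and-reset loop (not Parca's actual code): every
 * 10 seconds, drain the counts map that the eBPF program fills. */
#include <bpf/bpf.h>
#include <stdio.h>
#include <unistd.h>

struct stack_count_key {
    __u32 pid;
    int user_stack_id;
    int kernel_stack_id;
};

static void drain_counts(int map_fd)
{
    struct stack_count_key key, next;
    __u64 count;
    int err = bpf_map_get_next_key(map_fd, NULL, &next);

    while (!err) {
        key = next;
        /* fetch the following key before we delete the current one */
        err = bpf_map_get_next_key(map_fd, &key, &next);
        if (bpf_map_lookup_elem(map_fd, &key, &count) == 0)
            printf("pid=%u count=%llu\n", key.pid, count); /* -> pprof */
        bpf_map_delete_elem(map_fd, &key); /* reset for the next window */
    }
}

/* caller: for (;;) { sleep(10); drain_counts(map_fd); } */
```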
A: Right, okay. And when you were thinking about the idea for Parca, did you have the "we want to do continuous profiling" first — you know, "how can we do that?" — or were you thinking "with this eBPF thing, this is something we could do continuous profiling with"? Was it eBPF first, or continuous profiling?
B: It's actually funny that you ask it in that way, because when I started the company, my first reaction was that I didn't want to concern myself with the collection of profiling data at all. I felt like the harder problem was storing it and analyzing it in a useful way, and, you know, I've spent the last couple of years in the Go ecosystem, where profilers are excellent —

B: — really high-quality profilers, in great, standardized formats — so I felt like, at least for the start, that was a problem I wanted to set aside.
B: It wasn't until we launched our private beta program that we realized we had to find a way to do what we called "generic profiling" back then — profiling that wouldn't require a user to add instrumentation to their code, because that's how it works with Go.

B: So we were trying to find a way to mitigate this problem — and actually, since we've mentioned it a couple of times today, our very first proof of concept was just using Linux perf. After getting some good successes with that, we were like: okay, how can we do something even better — or can we do something even better? And that's how we ultimately stumbled into the eBPF world.
B: Yeah, that's a great question. Best supported today are compiled languages — so basically Go, C, C++, Rust, Haskell, anything that compiles to machine-executable code. Then there is kind of a middle ground that also works quite well, but doesn't quite work in the zero-configuration way just yet.
B: Whenever we deal with languages or runtimes that have a just-in-time compiler, they do eventually produce machine-executable code, right? So it takes a bit more effort, but most of those JIT runtimes can write the mapping of memory addresses to symbol data — the data that we humans understand — to some standardized place, and then we can grab it there, and Parca does have support for that as well. So Node.js, Java, Erlang —

B: — all of these have support for this, but they do require you to pass a flag to your command. So it's not quite zero changes, but it is zero instrumentation: you don't need to do anything to your code. In the Node.js case, for example, I think it's `--perf-basic-prof`, or something like that.
B: Yes, exactly right. However, that's really just an intermediate state. The whole purpose of the Parca project is that it's entirely hands-off: you only need to deploy the Parca agent and you get whole-system visibility. The perf map interface is something the Linux kernel had standardized for perf, so it's something that already exists that we can take advantage of, but we want the experience to be even simpler than that. As I said, it's supposed to be 100%: you don't have to do anything.
B: In the case where you have a JIT, we kind of see both, because the code actually does end up being executed as machine code. But in cases where we don't have a JIT — let's say Ruby or Python — it's exactly like you say, and for those we also need specific integrations, which we are working on, where we actually inspect the Python or Ruby runtime memory. At the end of the day, they build kind of virtual processors — you can think of it that way — within the interpreter, and so we just need to read the frames that they've built in their memory, as opposed to the operating-system stack.
B: Yeah, absolutely, let's do it. So let me first share — it's fairly simple to set up — let me actually share my —
A: While you're sorting out your screen, I'm just going to say hello to regular viewer Russell — nice to see you — and hi to Dor and Vice, I hope I've said your names correctly. Great, and there is your screen.
B: I've already got it set up, because it's really just these three commands, but I'm going to show it on a Kubernetes cluster. As I mentioned, we have two components: there's the Parca server, which is the thing that stores the data and allows you to query it with a built-in UI, and then the thing that actually does the collection and the actual profiling is the Parca agent. So they are distinct components, and so you can —

B: So I've got this set up already. As we can see — as I mentioned before — we have the Parca agent running on every node in my Kubernetes cluster, and we have the main Parca server. Topology-wise, I think it's actually fairly simple: the Parca agent does the collection.
B: As I said, every 10 seconds it grabs the profiling data for every container running on that node, compiles it into pprof — an open standard for representing profiling data that Google developed — and then the Parca agent takes that and sends it to the Parca server, which is where we can then analyze it. Let me pull that up.
A: So it'll be the agent that is loading an eBPF program, or — I don't know.
B: Exactly, exactly. We can have a look in a second at how that works, at least at a high level, and if people are interested we can jump into the eBPF program itself as well — it's actually fairly simple, it's only, I don't know, maybe 80 lines of code or something today. But yeah.
B: So when you open the Parca UI — and this is the Parca server, which serves the UI and also serves an API to query this data — the first thing you would do is select the type of profile that you want to query.

B: In this case — the Parca agent actually only supports CPU profiling today, but that's just a matter of starting with something, and CPU tends to be the most expensive resource that we have in terms of cloud resources. That's why most people are most interested in optimizing their CPU time, and that's why we started with this one. We definitely want to add allocation profiling, network profiling, disk profiling —
B: — all of these things that eventually contribute not only to the cost that occurs from using the cloud, but also to being able to understand the performance characteristics of our running applications.
B: Yeah, absolutely — that's kind of the classic use case for profiling, right? But we find that since we have this whole-system overview — we're profiling absolutely everything, all the time — we actually get a really fantastic view of "what is the biggest CPU offender in my entire infrastructure?". That very quickly becomes a tool for cost optimization, because we can optimize our infrastructure in a statistically significant way, rather than from individual points in time.
A: We've got a question here that fits, since we're just talking about running on managed cloud services. Dor is asking whether you've run Parca on a variety of different managed Kubernetes services, and whether you think there are any problems running on those platforms. I guess the question there is: can you load the eBPF programs into the kernel on all those platforms?
B: Yeah, that's basically what it boils down to. That said, our production infrastructure runs on GKE, so I can vouch for it definitely working there, and we made extra sure that it works on the OpenShift Container Platform.

B: We know of reports that EKS and AKS work, but I'm not sure about the Rancher Kubernetes Engine. As you said, basically all of this revolves around whether we can load the eBPF program and whether we have access to the Kubernetes container runtime interface. We support Docker, CRI-O and containerd for discovering the cgroups to attach our profilers to for the individual containers. And that already brings us to the point that if we have a CRI runtime that is, let's say, a virtual machine, that wouldn't work — I think there are a handful of providers that do something like that.
B: Yeah, that's a great question, and I'll tell you how it is today, but also how it potentially will evolve in the future. Right now, the way it works is that we actually utilize the Prometheus Kubernetes service discovery.

B: If you know me, this may not be super surprising, because I happen to be the maintainer of the Kubernetes service discovery in Prometheus, and I'm generally a Prometheus maintainer. But essentially how it works is that each Parca agent, on every node, asks the Kubernetes API to tell it —
B: — what are the pods and containers running on the host that I am on? Whenever a new one gets created or deleted, we discover the cgroups associated with those containers and then attach our eBPF programs, using the perf subsystem, to those cgroups. The perf subsystem is overflow-based, so at most we would get a hundred samples per second —

B: — meaning our eBPF program will be called at most 100 times per second. That is configurable, but it's the default that we use. So that's how we get our sampling frequency, and that's how we can correlate a stack trace with a statistic.
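For readers curious what "attaching via the perf subsystem to a cgroup at 100 samples per second" looks like at the syscall level, here is a rough, generic sketch (not Parca's code; error handling trimmed). Note that cgroup-scoped perf events are opened per CPU:

```c
/* Generic sketch: a 100 Hz CPU-clock sampling event scoped to one
 * cgroup on one CPU, with an eBPF program run on every sample. */
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <fcntl.h>
#include <unistd.h>

static int attach_cgroup_profiler(const char *cgroup_path, int bpf_prog_fd, int cpu)
{
    struct perf_event_attr attr = {
        .type        = PERF_TYPE_SOFTWARE,
        .config      = PERF_COUNT_SW_CPU_CLOCK,
        .size        = sizeof(attr),
        .freq        = 1,    /* interpret sample_freq as a frequency... */
        .sample_freq = 100,  /* ...of (at most) 100 samples per second */
    };

    int cgroup_fd = open(cgroup_path, O_RDONLY);
    if (cgroup_fd < 0)
        return -1;

    /* With PERF_FLAG_PID_CGROUP, the "pid" argument is a cgroup fd:
     * the event only counts while tasks of that cgroup run on this CPU. */
    int ev_fd = syscall(SYS_perf_event_open, &attr, cgroup_fd, cpu,
                        -1, PERF_FLAG_PID_CGROUP);
    if (ev_fd < 0)
        return -1;

    /* Run the eBPF program on every overflow, i.e. on every sample. */
    if (ioctl(ev_fd, PERF_EVENT_IOC_SET_BPF, bpf_prog_fd) < 0 ||
        ioctl(ev_fd, PERF_EVENT_IOC_ENABLE, 0) < 0)
        return -1;

    return ev_fd;
}
```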
A: And you've got an instance of that eBPF program attached to each different cgroup, so I guess they're —
B: So yeah, let's continue the demo here. As I said, the very first thing you would do is select the type of profile, and this is all completely generic and based on the pprof format. So if you have pprof-formatted profiles — let's say for memory — you can send those to Parca as well, and the Parca server will understand them; it's just that the eBPF program doesn't support creating those today. So yeah —

B: — you can first look at the cumulative metrics of each of the processes, and if there is one particular one — let's say we had a spike here, maybe this is one we're particularly interested in understanding — we can then browse the particular profile that was taken at that point in time and see where the CPU time was spent.
B: At this point — sometimes people haven't seen these types of visualizations yet — this is what's called a flame graph, or actually, in this representation, an icicle graph. A funny association with our brand that was totally not intentional — you know, Polar Signals, in terms of icicles.

B: I actually only found out after I had created the company and the brand and everything that Brendan Gregg calls these icicle graphs when they're built from the top down, as opposed to bottom up, I think.
B: Exactly, yeah. Anyway, the way you read these, essentially, is that the very top one — the root — represents all the CPU time of this profile, and the further down we go, the spans are always relative to that root. So if we hover over this one, say, we can see runtime.goexit: all the stacks under this particular function cumulatively make up 96% of our CPU time, and the further down we go, the smaller it gets.
B: Basically, what this means is that if there's a span — let's say, for example, this one — where the children don't make up all the CPU time of that span, then that gap in between is optimization potential, because that's work this function is truly doing itself, as opposed to what its children are doing. Or if it's a leaf: if we can optimize that leaf, then it's going to have exactly that effect on the cumulative CPU time of our process.
B: As I said earlier, we can not only look at profiling data at a point in time; we can look at all the CPU time our process has spent over its lifetime. Here I've filtered all of our data down to just the Parca container — so, Parca profiling itself —

B: — which is a bit meta, but what I want to demonstrate is that we can look at the entire CPU time over the process's lifetime, and we can do that by hitting "merge". What that does is truly take all the data we just saw on that graph and put it into a single report. Why this is interesting, and why this is good, is that now —
B: — we've got a representation of this process not only at a point in time, but over its lifetime, and if we can optimize something in this report, then — because this is statistically significant data — it will actually translate into a cost saving in our infrastructure.

B: That's one of the reasons why continuous profiling is so interesting, and there are a couple of other things that only become possible because we're capturing this data over time. Earlier we were looking at this peak, right? Another really interesting thing to do is to look at differences between certain points in time, or even between entire versions of software. It's the question that's as old as software engineering:
B: why was my process having a CPU spike here, but not here? We can actually do that: we can select a point in time here, we can select a time here, and it will tell us — okay, in this case it was pretty dramatic, almost everything was worse — but we can see shades of red where things were particularly bad.
B: So here, this is the difference. As we can see in the second line in the hover bubble, it was plus infinity, which means this entire span was not there at all in the first profile we selected. Looking at all of the spans here, what we can basically infer is that garbage collection was happening at this point in time — and that's pretty typical to see in a garbage-collected language like Go — but it's —
B: And because the Parca project works on all kinds of profiles, we can do this with memory as well. We can ask: why was there a memory spike here? What was the difference? What was eating up all my memory? All of those kinds of things.
A: We've got at least one fan here — Madav, saying "this is so impressive, so awesome". Now, Michael at least — and I'm going to vote for it as well — would like us to dive into the code. So, is that okay? Can we do that?
B: Let's do it, let's do it. Great, all right, let me pull up — let's see. I think it's going to be easier for me to navigate if I do it from my —

B: So, you know, if you've done this — sorry, I need to not use Neovim; Neovim doesn't like —
B: Let's maybe take a step back before we dive into the program itself. The way that we build and load our eBPF programs is using libbpf.
B: I'm guessing it's been discussed numerous times here, but just for folks it might be new to — and I'm by far no expert, so if there's someone who's more knowledgeable and I'm saying something stupid, please correct me — it basically started around this "compile once, run everywhere" initiative. Because with the BCC toolchain —
B: — the way you did it was that you shipped the compiler along with the thing you wanted to do with eBPF, and it would compile your eBPF program against the kernel headers on that particular host — not before you shipped the program. We were fortunate to start this project late enough that the compile-once-run-everywhere initiative was already very well established, and it turns that idea around: we can compile programs once to BPF bytecode, and then, when a program is loaded, the things that are specific to that host are, let's say, replaced on the host at load time — there are some relocations, and a couple of things potentially get replaced — so we don't have to have kernel headers installed anymore, and we can truly precompile our eBPF programs, let's say in our CI pipeline, and then load them on any compile-once-run-everywhere-capable kernel. And that brings us to one of the things we vaguely touched on earlier: this does require a relatively new kernel — for the newer things it requires, I believe, 5.2, right?
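As a generic illustration of what compile-once-run-everywhere buys you — this is not Parca's code — here is a tiny libbpf-style CO-RE snippet. `BPF_CORE_READ` records a relocation instead of baking in a struct offset, so the same bytecode loads on kernels with different `task_struct` layouts:

```c
#include "vmlinux.h"            /* kernel types generated from BTF */
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_core_read.h>

SEC("kprobe/do_sys_openat2")
int trace_open(struct pt_regs *ctx)
{
    struct task_struct *task = (void *)bpf_get_current_task();

    /* The field offset is resolved against the running kernel's BTF
     * at load time, not at compile time. */
    pid_t ppid = BPF_CORE_READ(task, real_parent, tgid);

    bpf_printk("opener's parent tgid: %d", ppid);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```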
A: And that relates to a question that Martin asked earlier — he was wondering about the Golang support, which isn't available on RHEL 8. I'm guessing, actually, it's more the kernel support that's relevant to whether or not you can run on RHEL 8.
B: We have a bunch of things in progress, and with some of those things that may not be the case anymore, because this ecosystem is moving so quickly that some of the things we may be using may not be supported in RHEL 8. I don't actually think Go 1.17 is necessarily required —

B: — I think it's just what we happened to be using when we did the go mod initialization, so that's something we should check. But we're doing our best to keep the required kernel version as low as possible. Unfortunately, for baseline compile-once-run-everywhere you need a 5.2 kernel, and I'm not 100% sure which RHEL 8 kernel is equivalent to that. Yeah. So, with —
B: — that very long intro: that's how we write our eBPF programs, which is why you don't see, let's say, a string replacement in here or anything — it's truly just the C code, and we compile it as such. With libbpf this is also a little bit different from the BCC toolchain: we're actually even more restricted than we were with BCC, so we need to pre-declare a bunch more things. In this case, we also say how many addresses can be in a single —
B: — stack trace. Then we've got a couple of eBPF maps, and the interesting ones — which one was it? I'm kind of reading this right now as well — conceptually, we've got one that holds a mapping from a stack ID to the actual stack, and then we have another one that maps the stack key to the observations.

B: So: how often have we seen this particular stack? That's how we're ultimately going to infer the CPU time, because we're saying at most we're making a hundred observations per second, and the way we do it is by saying "we've seen the stack with this ID, let's say, five times" — that's how we infer the CPU time from there. That's essentially how any sampling profiler ever works.
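Roughly, the two maps Frederic is describing have this shape in libbpf-style C (simplified and renamed for illustration; see the Parca Agent repository for the real definitions):

```c
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define MAX_STACK_DEPTH 127          /* max addresses in a single stack */

/* Uniquely identifies "a stack trace of a particular process". */
struct stack_count_key {
    u32 pid;
    int user_stack_id;
    int kernel_stack_id;
};

/* stack id -> the actual stack (an array of instruction addresses);
 * the kernel fills this via the bpf_get_stackid() helper. */
struct {
    __uint(type, BPF_MAP_TYPE_STACK_TRACE);
    __uint(max_entries, 10000);
    __uint(key_size, sizeof(u32));
    __uint(value_size, MAX_STACK_DEPTH * sizeof(u64));
} stack_traces SEC(".maps");

/* stack key -> how often we have observed that stack. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 10000);
    __type(key, struct stack_count_key);
    __type(value, u64);
} counts SEC(".maps");
```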
B: But because we're doing it like this and storing it in the BPF map, we only have to read these maps from user space every once in a while; all the rest of the time we can just completely leave it running. Whenever the perf subsystem decides this eBPF program should be run, it runs and inserts or increments the particular stack trace that we're seeing at that point in time. So those are the definitions.
B: This one is also important — it's actually already used above. The way we identify the stack that we're counting is by the process ID, the ID of the user-space stack, and the ID of the kernel stack. That way we're uniquely identifying the stack trace of a particular process.
B: This is the actual program. The very first thing is that we check: is this even a process that we are remotely interested in? I think PID zero means this is some kernel thing that's happening — definitely not a user-space program that we're interested in right now. Next, we start building the stack key that we defined above, and then we use a couple of BPF helper methods — we didn't write these — like bpf_get_stackid.
B: That's something we get from the helpers, and here we're saying that we want the user stack. What this does, essentially, is walk the frame pointers — and we can talk about that in a bit more detail later, because we've actually got some new things we're working on in this area, and I've got some things I can show about that, like some visualizations. But vaguely speaking, we can —
B: — that's the value of our BPF map here, and the ID is basically a hash of all of that — at least, that's how we can think of it — and those are then what we use as part of our key here. Okay, and once we have that — essentially, at this point we've built the key that uniquely identifies our stack — we just check whether we've seen this kind of stack before: if we haven't, we create it in our BPF map, and if we have, we just fetch it, and then we atomically increment it. Yeah, that's basically the whole program.
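Putting that walkthrough together, a simplified version of the handler could look like this (continuing the map sketch above; illustrative, not the exact Parca source):

```c
SEC("perf_event")
int do_sample(struct bpf_perf_event_data *ctx)
{
    u32 tgid = bpf_get_current_pid_tgid() >> 32;

    /* pid 0 is kernel housekeeping, not a user process we care about. */
    if (tgid == 0)
        return 0;

    /* Helpers we didn't write: walk the stacks (frame pointers for user
     * space), store them in stack_traces, and hand back an id (a hash). */
    struct stack_count_key key = {
        .pid             = tgid,
        .user_stack_id   = bpf_get_stackid(ctx, &stack_traces, BPF_F_USER_STACK),
        .kernel_stack_id = bpf_get_stackid(ctx, &stack_traces, 0),
    };

    u64 zero = 0;
    u64 *count = bpf_map_lookup_elem(&counts, &key);
    if (!count) {
        /* first time we see this stack: create the entry... */
        bpf_map_update_elem(&counts, &key, &zero, BPF_NOEXIST);
        count = bpf_map_lookup_elem(&counts, &key);
        if (!count)
            return 0;
    }
    __sync_fetch_and_add(count, 1);   /* ...then atomically increment */
    return 0;
}
```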
B: Yeah, so we use Aqua Security's libbpfgo — I think you might have worked on that before.

B: 100%, yeah. That said, we do put a great deal of care into making sure that giving us that privilege is respected as much as we possibly can. And one of the things I think is kind of cool — since I do already have my editor here — is that we put a lot of effort into doing byte-by-byte reproducible builds.
B: So when you check out the repository and, let's say, any of the versions that we've released, and you build that container image, it will be byte-by-byte identical to the one that we've released. That way we're introducing at least some amount of supply chain security, because you can be sure that the thing we've published is genuinely the thing that you're executing as root, right? So that —
A: — is a whole other conversation we could have. But we're getting fairly close to the end of the hour, and I know you have a few things to talk about around what's coming next. William has asked — he's hoping that in the near future there'll be things like memory, disk and networking support. So, do you want to tell us a bit about what the future holds for Parca and what you have planned?
B: Yeah, absolutely. There are a lot of things happening, and I'm going to keep it focused on what's hopefully most relevant to this audience — we have an entire company working on this kind of umbrella project, so there is a lot going on, but specifically in the BPF space, let me share some slides.
B: Something that I'm really excited about — I'm not actually the one working on this, but we're kind of collaborating on the idea — so, as I said earlier, the way we currently build the stack in the BPF program is using that helper, and the way this helper does that —
B: — is by using something called frame pointers. You can think of it like this: when our compiler builds the stack frames of our compiled programs, it reserves one register to say where the upper frame is located, essentially, or where it ends — I'm not exactly sure of the exact semantics, but it doesn't really matter. The point is that we can use the frame pointers to walk a linked list and get the entire stack representation, and that's really great, because walking a linked list is super fast.
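Conceptually, the frame-pointer walk he is describing is just a linked-list traversal — something like this (an x86-64-flavored sketch for illustration only):

```c
#include <stdint.h>

/* With frame pointers enabled, each frame begins with the saved
 * caller rbp followed by the return address, and rbp points at the
 * saved rbp — so the frames form a linked list. */
struct frame {
    struct frame *prev;     /* saved caller rbp */
    uint64_t      ret_addr; /* where execution resumes in the caller */
};

static int walk_stack(struct frame *fp, uint64_t *pcs, int max_depth)
{
    int depth = 0;
    while (fp && depth < max_depth) {
        pcs[depth++] = fp->ret_addr; /* record one call site */
        fp = fp->prev;               /* hop to the caller's frame */
    }
    return depth;
}
```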
B: That said, unfortunately — and Brendan Gregg comments on this on his website as well — there's a very evil compiler optimization that omits frame pointers and reuses the register that frame pointers are supposed to be written into as an additional general-purpose register, for, you know, performance reasons.
B: And this is incredibly common, unfortunately, in the C and C++ world, but also when you do a Rust build — I forget if it's called production or release builds or something — the default is to omit frame pointers. And if they're not there, that means this strategy of walking up the stack as a linked list doesn't work anymore.
B: Well, there is an additional section in our ELF binaries that is essentially guaranteed to be there, called exception handling frames, and these are basically a lookup table for how you can compute where a frame starts and ends — and that way you can build up your frame, essentially. But you need that information every time you're at some point in the stack, in order to look up and find the next step in your stack trace. So what we've been working on is taking these exception handling tables, minimizing them to the absolutely necessary information, and passing them to our eBPF program, and that way we can do the walking even from within kernel space.
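Very roughly, the idea is to boil the `.eh_frame` data down to a compact table that the eBPF program can search — something like the sketch below. This is purely an illustration of the concept; Parca's actual table format and fields may well differ:

```c
#include <stdint.h>

/* One row per program-counter range: how to recover the caller's
 * frame once you know where you are in the code. */
struct unwind_row {
    uint64_t pc;          /* start of the range this row covers */
    int16_t  cfa_offset;  /* canonical frame address = rsp + cfa_offset */
    int16_t  ra_offset;   /* return address lives at cfa + ra_offset */
};

/* In the eBPF program, the walk becomes: binary-search the row for
 * the sampled pc, compute the frame address from the sampled rsp,
 * read the return address there, and repeat — the same traversal as
 * frame pointers, just driven by the precomputed table. */
```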
B: You know, without needing anything else. Because the alternative — and this is what perf does today — is to take an entire snapshot of the whole stack, put it into user space, and then unwind it from there, because you can do that offline. But that has a couple of drawbacks. First and foremost, we need to copy that whole stack every single time —
B: — we capture a sample, as opposed to just walking the stack. But it also means that we're potentially grabbing some super-sensitive data from a process's memory and then, at least somewhat persistently, putting it into user space. Ideally, we would be able to do all of this at collection time. So I'm really excited about this, because it's kind of a novel thing — we haven't really seen it in other profilers before. We're not even sure whether it's going to work out, but if it does —
A: — and a really interesting point: what is intended as an optimization could mean that if everybody wanted to profile their applications, they might have to find a way around that optimization. Maybe they could just turn that optimization off, but it's pretty cool that you'll have a sort of zero — you know, no need to change your configuration, no need to change your application — approach to solving that problem. So that's cool.
B: That said, please, if you have the possibility, please keep frame pointers. If anyone watching walks away with anything from this episode today, then please let it be: keep your frame pointers. Omitting them just makes life so hard — it makes debugging in production completely impossible, it makes capturing this kind of data impossible. And as a matter of fact, we've been speaking to several engineers —

B: — who've built similar systems at, you know, the hyperscaler that you're thinking of right now, and all of the hyperscalers have essentially had these internal conversations, and they all came to the same conclusion: it's not worth omitting them, just keep frame pointers. The tiny performance gain you're getting is not worth not being able to understand your systems.