From YouTube: 2022-03-15 CNCF TAG Observability Meeting
Description
* Hubble, cilium.io, https://github.com/cilium/hubble
* Pixie, https://pixielabs.ai, https://github.com/pixie-io/pixie, blog.px.dev
* KubeCon EU, TAG Observability Meetup
#observability #ebpf #cncf
A
We've got an exciting agenda today; we have two folks, two teams here to present. This is the mid-month meeting of TAG Observability; it's a CNCF-related event. As such, the code of conduct does apply, so please don't do anything that would be in violation of that code. And again, apologies for being a little tardy. Let's just start right away at the top of the agenda.
B
Yeah, so I actually put that in, and my point should be very quick. I wanted to announce that there was an opportunity to organize project-specific meetings at the next KubeCon in Europe in Valencia, which is in May, and we had the opportunity to create some dedicated time for our TAG meeting. So it's an opportunity to meet everyone who will be joining KubeCon in Europe. I just wanted to announce that it will be around 11:00 a.m. on Monday, I believe—so Monday.
B
I double-checked: the 16th of May—is that the Monday? Yeah, probably. Essentially we have two dedicated hours to speak about anything and just meet each other, and if we need any special equipment or anything, let me know. We also probably want to decide if we want to make it in-person only, or whether we want to sync up and have a virtual component as well, together with the in-person meeting. We can do that as well; we have a projector and so on.
B
Okay, if there are no other comments, then I guess we can go to the next agenda item, which is the Hubble project presentation.
C
Hey folks, this is Thomas.
C
My name is Thomas. I'm the co-founder of Isovalent and also one of the creators of Cilium. Cilium is essentially the base project, the overall project, and Hubble, the observability layer of Cilium, is what we're talking about today. Cilium itself is a CNCF project at incubation level.
C
So when we're talking about Cilium and eBPF, those are open source projects; I'm also a creator and founder of Isovalent, the company behind Cilium and Hubble.
C
What is Cilium? Cilium is actually more than observability, but today we will talk about the observability layer, which is called Hubble. Overall, Cilium also provides networking and load balancing, Cilium is a CNI, and we can do service mesh. We can do a lot of network security and also runtime security. But what's interesting today is the observability layer: we can provide Prometheus metrics, extensive flow logging, OpenTelemetry output, and service dependency graphs. What is unique about Cilium is that Cilium itself is entirely based on eBPF and uses eBPF to its full extent.
C
In fact, we extended eBPF for the first two years before we had even started the Cilium project. So let's talk briefly and look into this eBPF technology, for those of you who have never heard about it. eBPF is actually very, very simple to understand: it makes the Linux kernel programmable, essentially allowing you to run a program such as this one.
C
This is C code, but it's also possible to write this in higher-level languages—to run a program when certain events in the kernel happen. In this case we're using a system call, but this could also be a network packet, a storage access, a function call of a user-space application, a kernel tracepoint, and so on. We can then use that program to actually extract visibility.
C
The program I'm showing here is actually called every time the exec system call is made—for example, when somebody invokes a new command in the shell. We can then export statistics like CPU usage: what is the PID, what is the UID, and so on. This allows us to build flame graphs and get a lot of additional visibility.
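For readers who have never seen such a program, here is a minimal sketch of the kind of thing described above, assuming a libbpf-style build: an eBPF program attached to the execve tracepoint that reports the PID and UID of each new process. The section name, helpers, and trace-pipe output are illustrative assumptions, not Cilium's actual code.

```c
// Minimal sketch (not Cilium's code) of the kind of program described:
// attach to the execve tracepoint and report PID and UID for every new
// command. Builds with clang -target bpf against libbpf headers.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("tracepoint/syscalls/sys_enter_execve")
int trace_execve(void *ctx)
{
    __u64 pid_tgid = bpf_get_current_pid_tgid(); /* upper 32 bits: tgid */
    __u64 uid_gid  = bpf_get_current_uid_gid();  /* lower 32 bits: uid  */

    /* Write to the kernel trace pipe; a real collector would push the
       event to user space through a ring buffer or perf event array. */
    bpf_printk("execve pid=%d uid=%d",
               (int)(pid_tgid >> 32), (int)(uid_gid & 0xffffffff));
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```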
C
What is very unique about eBPF is that it is essentially a general-purpose, or almost general-purpose, runtime. It's very, very similar to JavaScript in the browser, but for the Linux kernel. We can load bytecode into the Linux kernel and essentially run it, and that means we can all of a sudden add functionality to the Linux kernel that was not there before. So we can add our tracing and observability functionality; we can parse HTTP headers.
C
I'm showing one example here where we can make the impact of this obvious. This is a quick benchmark of how efficient eBPF-based visibility can be. In this case we're showing the difference between a proxy- or sidecar-based HTTP visibility approach—the yellow one, with injected proxies—and an eBPF-based one, which is the red one; the baseline, essentially just the benchmark itself, is the blue bar. This shows, in particular at higher requests per second, how minimal the eBPF overhead is to, for example, provide HTTP visibility. It's just one example, but this is essentially true across the board: eBPF is both powerful and super low overhead, which is an ideal combination in the observability space. Overall, the opportunity is actually much bigger, because the kernel is a super powerful place to extract visibility—the kernel can see everything—but it has historically been very hard to get kernel changes into the hands of end users.
C
I was a kernel developer myself for more than 10 years, at Red Hat, and it usually took years and years for a new kernel version to make it into the hands of end users, which made it very difficult to extend the kernel or build new functionality into it and get that out to users quickly. eBPF is changing this because, all of a sudden, we can make changes in real time—at any given time—and just load the programs and run them.
C
So this has allowed for a very similar innovation to when JavaScript was added to browsers: all of a sudden we no longer needed to upgrade our browsers just to load a new website, which was clearly the case 20 years ago, when we had to upgrade our browsers frequently. So that's the power of eBPF. We'll skip the rest here and go into Hubble.
C
Hubble provides a lot of different things. It provides the visibility on the left—we'll look into that briefly. It provides metrics, Prometheus metrics, but they can also be exported into a SIEM, to Elasticsearch or something similar. And then there's also a network tap, essentially distributed pcap, where you can get real copies of the network traffic that we're seeing.
C
This is the example I'm showing here in terms of how Hubble is evolving network flow visibility. For the networking folks, this is a classic flow log; it represents typical five-tuple-based logging. It's essentially "this IP is talking to this IP, number of bytes, number of packets" and so on—not really useful in the context of containers, Kubernetes, and cloud native.
C
This is the visibility that we provide, which at first glance doesn't even look like a network flow log, but it actually shows you not only the Kubernetes namespace and the pod, but the entire process ancestry—who is invoking what. We can see that dockerd is the runtime here, spawned by systemd; then we see a crawler binary which is running containerized.
C
Then we see that this crawler binary is invoking a node app, and then—surprise, surprise—there was actually a compromised app here. We see that a reverse shell was used to reach out of the cluster and apparently received instructions, and then an attacker was using curl locally. We see the actual network connections as arrows here, with the destinations they reached out to, and we see that the actual workload is talking to our Elasticsearch server here, but the attacker also attempted to reach out to it.
C
We can even see the layer 7 observability data here: we're actually observing an HTTP GET to /user/search. So this is very, very powerful visibility, because eBPF can see everything from the network layer to the runtime layer and so on. But let's switch over to a live demo, where we can actually see what type of visibility we can provide on top of this. So let me switch my screen here and show you a couple of the metrics that we can generate.
C
I hope—excellent. So there's a ton of dashboards that you can build; I'm showing a few of them right now. Obviously we can look at the raw network level: this is the Prometheus export with just the standard Grafana dashboard. We can export the same metrics as OpenTelemetry metrics as well, or in any format that you really want. So we can see, for example, forwarded versus dropped traffic—we can see that a certain amount of traffic is constantly being dropped for policy-deny reasons.
C
Looking at the network layer, we see what type of TCP events are ongoing. We see how many TCP SYNs are being sent without being responded to—essentially a graph that shows you how many connections are timing out. We can see why packets are being denied. Let's go into the HTTP layer: we can observe the entire HTTP traffic that we're seeing, and all of this completely transparently—no changes to apps or anything like that.
C
You can deploy this as a DaemonSet into your cluster and transparently get these metrics out: for example, what type of HTTP traffic, what type of responses—we're not getting any 404s right now, which is good—and latency, just graphing p50 and p99 here. Then DNS, probably the favorite dashboard of platform teams and also app teams: we can see what type of DNS requests and responses, how many errors, and which pods are currently receiving DNS errors.
C
That's something people look at very quickly, because DNS is the source of issues so many times—but also, what are the queries that are actually being made? So that's the network view of things.
C
Let's also quickly look at some more advanced metrics. We can, for example, hook into the TCP layer and graph the smoothed round-trip time of all TCP connections. We can look into which pod is consuming or producing how much traffic. You can see, for example, the max traffic per pod here—this is a service called node-exporter, pumping out at most around 70 megs per second.
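The smoothed round-trip time mentioned above is the kind of value eBPF can read directly out of kernel TCP state. As a rough, hedged illustration of the idea only (not Hubble's implementation), a CO-RE style program could hook a TCP receive path and read the socket's srtt field:

```c
// Hedged sketch only: read the kernel's smoothed RTT for established TCP
// connections. Assumes a BTF-enabled kernel, a generated vmlinux.h, and
// libbpf CO-RE; the attach point and names are illustrative assumptions.
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

SEC("fentry/tcp_rcv_established")
int BPF_PROG(observe_srtt, struct sock *sk)
{
    struct tcp_sock *tp = (struct tcp_sock *)sk;

    /* srtt_us is stored left-shifted by 3 in the kernel. */
    __u32 srtt_us = BPF_CORE_READ(tp, srtt_us) >> 3;

    /* A real exporter would aggregate this into a per-pod histogram map
       that a user-space agent scrapes; here we just log it. */
    bpf_printk("srtt_us=%u", srtt_us);
    return 0;
}

char LICENSE[] SEC("license") = "GPL";
```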
C
You can also look at the average, and we see that the Hubble Timescape ingester is averaging the most traffic. You can look at how much traffic each pod is generating, and we can then go down to the binary level—not just the pod level, but which binary inside which pod is consuming or producing how much traffic. That also means we can look at the kubelet, for example—not just containerized or pod-level traffic, but the kubelet as well.
C
You can see that there is a source controller which is talking to github.com frequently, and apparently that is occasionally seeing TCP retransmissions. So, lots of different visibility. We could even go into lower-level metrics: these are the network interface statistics, where you can see which node is sending or receiving how much traffic on which interface, interface errors, and so on. So there's a massive amount of statistics and visibility that we can explore.
C
So to summarize—I know this was a quick intro—Hubble is the observability layer of Cilium, all eBPF-based. It provides Prometheus metrics from the lower network levels all the way to layer 7: HTTP, Kafka, DNS, and so on. We can use it at the platform-team level, for security metrics and network policy, all the way to building golden-signals dashboards. Because it's using eBPF it's completely transparent, and it's part of the Cilium project, which is a CNCF project at incubation level.
C
I hope this was a good initial intro to Hubble. If you're interested and want to learn more, feel free to go to cilium.io or join our Slack—lots of people are happy to answer questions there. You can obviously also DM me on Twitter or on Slack, and I'm happy to answer questions as well.
A
No, we—we absolutely do. Thank you so much. If you could also, afterwards, put a link to the slides that you presented into the doc. But there is a question here from Dan; I think he's still here, so I'll let him ask it.
B
Yeah, great presentation. We're actually exploring adopting a mesh right now, and the key thing that's been murky to me is how I actually enrich my metrics, so this was a great presentation. Can you go into a little bit of detail on how that actually works? Let's say I have a metric X that I want to tag—we do ownership-driven tagging, so I want to say this team owns this metric—through the sidecar, how would I achieve that?
C
So I think the wonderful part here is that Cilium achieves all of this visibility without sidecars. When we talk about service mesh, we're not talking about the sidecar model; we're talking about a sidecar-free service mesh model that's entirely eBPF- and Envoy-driven, but without sidecars. From a label perspective and an observability perspective, we have programmable metrics, which means you can add as much context to every Prometheus metric as you want—namespaces, labels, whatever—and then we have RBAC functionality where, based on those labels, you can restrict who sees which metric. So you can, for example, have a Prometheus scraping endpoint that only exposes the metrics of one particular namespace, and then have a dashboard specific to a team that only sees that namespace as well. Does that make sense?
C
So there's static configuration, in Golang as well—a lot can be done in static, CRD-based configuration—and then, if you want to go further, there's essentially a Go plug-in layer that allows you to create more metrics, aggregate them, or use a different aggregation form, and so on.
E
Tom, can I ask a question? I recently met with our director of networking, and what you just showed is exactly what he's looking for. We're mostly an AWS shop, and he's desperate to find something that will show all the dependencies between all these network pieces. And I'm curious about this not just for Kubernetes but in general, because Kubernetes is just part of this picture. So how exactly do you extract this data?
E
Do you just suggest using a simple Grafana/AWS interface, or do you extract this data using some AWS connectors or exporters and then work with that data afterwards? Yeah.
C
So for metrics we have Prometheus and OpenTelemetry support. For flow logging, that's essentially event-based—we're seeing this connection, this connection, this connection—and it's JSON.
C
We have a fluentd plugin, so you can export this into whatever system you want. Then we also have a Timescape-based time-series database where you can store this persistently in your own Kubernetes cluster, and we have Hubble UI, which is a service dependency graph that ingests the flow data and the metric data and shows it. That can, for example, show you a service dependency graph—who's depending on what—but you can also achieve that with the OpenTelemetry export if you're using something like Jaeger or another tool.
E
I see. So I'm using fluentd anyway to push this data somewhere I can look at it, right? Yep. So, if I'm understanding correctly, I just have to push this to another plugin—your plugin—and you would take care of it; you would build all these objects. Okay, exactly, yes. Okay, but this only covers what can be discovered and mapped into a relation map inside of a Kubernetes cluster itself, okay.
C
You can also run Cilium on any Linux machine, and eBPF has recently been ported over to Windows as well—though right now Cilium on Windows is still at kind of an alpha level. But you can run Cilium on any machine you want; it's not actually Kubernetes-specific. If there are no Kubernetes pods, it will simply report the raw processes.
A
Yeah, we'll have all the links and everything; I can send out a blast to the email list as well. I think Eric had a question as well—I think he's still online.
F
Yeah, so I actually have a lot of questions, but my most basic one is: with all of those statistics running, what kind of overhead does that take? Because I know you showed that one graph where, the more requests you had, it was just a tiny overhead, but you're gathering a ton of information across a lot of different network interfaces.
C
Yeah, so I think the massive benefit of eBPF is that all of this histogram collection—which in the past you used to do by exporting a lot of samples from the kernel to user space and then aggregating and collecting histograms in user space—is now done in-kernel, which is very efficient. So the real overhead actually comes at the level of Prometheus and so on, like Prometheus memory.
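To make the "aggregate in the kernel, scrape only the summary" point concrete, here is a hedged sketch of the common pattern, assuming libbpf-style map definitions; the map name and bucket count are illustrative assumptions, not Cilium's or Hubble's code:

```c
// Hedged sketch of in-kernel aggregation: keep a log2 latency histogram
// in a BPF array map so user space reads ~27 counters instead of
// receiving every individual sample.
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define MAX_SLOTS 27

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, MAX_SLOTS);
    __type(key, __u32);
    __type(value, __u64);
} latency_hist SEC(".maps");

/* Called from whichever hook measures a latency in microseconds;
   the bucket index is the log2 of the value, capped at MAX_SLOTS - 1. */
static __always_inline void record_latency_us(__u64 us)
{
    __u32 slot = 0;

    while (us > 1 && slot < MAX_SLOTS - 1) {
        us >>= 1;
        slot++;
    }

    __u64 *count = bpf_map_lookup_elem(&latency_hist, &slot);
    if (count)
        __sync_fetch_and_add(count, 1);
}

char LICENSE[] SEC("license") = "GPL";
```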
C
That's where you think about things like label complexity and how much data to add; the actual collection of the metrics itself is incredibly low overhead. Even with very extensive configurations it's somewhere in the five to twenty percent range, and it will depend a little bit on how many HTTP requests per second you have, for example, or whether you have lots of low-level UDP connections and you're looking at every packet. Lots of things can be configured, but the actual collection layer is incredibly efficient.
C
Usually the overhead, or the bottleneck, is things like JSON encoding, or how much memory the actual Prometheus instance will use, and so on.
C
I mean, we started out with networking only; now we've added this process context as well, and we're expanding. Right now we're essentially very network-heavy, very connectivity-heavy.
F
Okay. And do you support—are you adding all the eBPF bytecode yourself, or is there a modular layer where, if people wanted to add something into the Hubble project, they could do that themselves? Is that something you support, or do they have to contribute it upstream? How does that process work?
C
You would have to change Hubble's—Cilium's—code base to change the eBPF programs. So yes, Cilium loads the eBPF programs; there is no pluggable infrastructure right now where a user could add raw eBPF programs. That could definitely be done and added, but it doesn't exist right now.
A
If I could tack on to that last question, partially as a segue but also because we ask it of a lot of the folks who come to present: could you provide, either in the doc or as a short overview, how someone might engage with the project—Cilium, Hubble, all of it—if they wanted to contribute or had ideas to make it better? Where would they start, and what's the community and governance model like, pragmatically?
C
You'll find pointers to Hubble there. The best community entry point is cilium.io, which will point you to the Slack channel we have—about 10,000 people—and there's a development channel there. All the development and roadmap planning is happening on GitHub and Slack, and you'll find all of those pointers on cilium.io.
A
Awesome, thank you very much—and thanks again for presenting, this is really cool. Are there any other questions before we move on to the next presenter, to Pixie?
A
Okay, well, thank you very much. If there are further questions for Thomas or the Hubble team, feel free to use our Slack channels or mailing list or other means. But thank you again.
D
Hey everyone, I'm Zain—hopefully you can remember me.
D
Okay, sorry about that—oh, you know why: I was probably broadcasting from two devices. Is it any better now?
D
Okay, great. So I wanted to give a quick update on the Pixie project—Matt asked us to do a quick recap—so let me pull up the slides real quick. There are a few slides. For people who are not familiar with Pixie: we are currently a sandbox CNCF project that uses eBPF to enable a bunch of different observability use cases, specifically on Kubernetes.
D
We solve some similar problems, but we have a slightly different approach to it—Thomas and I know each other, and we overlap on a bunch of things. So anyway, without further ado: what is Pixie? Pixie is a developer debugging platform with the goal of providing out-of-the-box visibility for your Kubernetes cluster. Once you get Pixie installed, without having to modify anything in your cluster, you'll get a bunch of different pieces of information.
D
Things like service health, logging, request tracing, Kafka, and a bunch of different things out of the box, and all of this is done pretty transparently using eBPF. You can check out our website, px.dev, for a lot more of the technical details.
D
We were built on three core principles. The first one is that everything is code-driven, on the fly, with no manual instrumentation: Pixie uses PxL, which is basically a pandas-like Python dialect, and it allows you to process and build data pipelines over the data. Second, we split our storage between edge and cloud: the high-level idea is that we collect a ton of data using eBPF and we want to make sure we can store it efficiently.
D
So most of the data is actually stored on the node until we need to pull it out or something. And the third thing is that everything is API-driven, so you can access all of the data available through Pixie via an API to enable other tools—for example, there is a Pixie Grafana plug-in that uses the Pixie API.
D
What's new? I'll kind of walk through the whole thing, because I'm not sure how familiar everyone is, but what's specifically new in the recent releases is that we now have continuous profiling and flame graphs for Java programs, we have support for Node.js and OpenSSL-encrypted requests, Kafka tracing, and a bunch of other minor things. So I'll just go through and do a quick overview demo of Pixie, along with showing off a couple of the new features like Java continuous profiling and Kafka.
D
Over here, this is the main Pixie UI. Once you get into Pixie, you can get a very high-level overview of all the HTTP requests going on in your cluster, along with their throughput, latency, and other information like error rates, broken down by service. There's actually a load generator running; it's making a bunch of requests, which then cause a bunch of downstream requests to happen.
D
This gives you a sense of how Kubernetes is doing at that moment in time. Beyond that, we instrument everything using eBPF, so you don't have to do any additional work, and we can dive into more details over here. For example, I can dive into this online boutique pod, and you can see, specifically for this namespace, px-online-boutique, where the requests are happening, which are the slower calls, and the ranges of requests that we're seeing. If you click into a specific service—let's say I want to see what the checkout service is doing—
D
I can get a more granular overview of the HTTP requests, errors, and latencies, and I can see a sample of all the slow requests. If I click on this, I can see that this is an HTTP/2 request—a gRPC request—and here is the protobuf that was sent in, which caused the request you saw. So you can use that for debugging.
D
If this is Go or Java or C++, and a few other languages, and you click in on the pod, you can actually get pretty detailed information like the flame graph. So if you dig in over here, you can see where in the pod CPU time is being spent.
D
If you're familiar with flame graphs: basically, the wider the box, the more time is being spent there. Honestly, this pod is pretty idle, so you're not seeing a lot of activity, but if it were very active you'd see a bunch of horizontal bars that are very, very wide.
D
The one thing I'm going to point out is that all of this is basically done by a script—you can probably see up here that the scripts change with different arguments. So there's a little Python-like script that goes and grabs this data and captures it, and there's an open repository of scripts that you can submit to on our GitHub, or you can have private scripts if you don't want to share them with others. But as soon as you create a script...
D
One of the quick things I'm going to show is a new feature here, which is the Kafka support. If you're interested in Kafka, you go to the Kafka overview script: we can basically scan the entire cluster across all the namespaces and say, oh look, we see some Kafka traffic.
D
This is kind of a toy application, so you're going to see that there's a producer talking to the order service, which then talks to consumer-shipping and consumer-invoices—you're basically seeing all the Kafka traffic flow through. You can then go into specific information about topics and brokers and the actual data.
D
...that's going through the Kafka cluster. For example, you can see: here is the published topic, here is the destination, and then the actual data that we're seeing.
D
So we capture a fair amount of very detailed information—here's something else being produced—we basically capture all of the Kafka information and put it in here. What else can I show you?
D
Oh, I don't think this one's instrumented yet, but if you have the right topic, you can actually take a look at things like producer and consumer lag. If you enable it for the topics, we'll be able to tell you what the producer-to-consumer lag is between different consumer instances. In terms of Java—that was the last thing I wanted to point out—actually, I think it's in our other demo.
D
So if I pull up a service that's using Java, like the orders service over here—and let me pick the right pod.
D
This is one of the things that actually took a fair amount of work to figure out—you can read all about it on our blog and in our docs—but basically we instrument the JVM in order to be able to capture actual Java flame graphs.
D
Typically, there is a bpftrace script, and you can deploy it across the cluster and then process the output using some PxL. Once I run it over here, what you'll see is that we're now going to deploy the eBPF probe across the cluster, it's going to take a few seconds to capture data, and then you're going to see all the TCP drops that are happening across the entire cluster. So it's pretty easy for us to embed new eBPF scripts and pull the data in.
D
I think that's about all I had. We support other protocols like Postgres and HTTP if anyone is interested in the details, but we went over that earlier. What's coming soon: right now we're pretty actively working on OpenTelemetry export, and we're working on this thing called plugins, because Pixie only does data storage for a short period of time—
D
—usually a few hours. With the Pixie plugin support, we'll basically be able to plug in a different backend like Prometheus or Timescale, or commercial systems, to be able to increase how long we can retain data and also do things like alerting. Then we're adding support for script versioning, and the last item is on the governance side: we're trying to move more and more into public board meetings and a public governance process, hopefully on our way to becoming an incubating project.
A
Yeah, thank you. There are two questions already in the doc that we could start with, but folks, feel free to jump in. I think Ken had the first one.
B
Yeah, I wrote down the question, but it was basically: for the Java flame graphs, are you utilizing the data from JFR to build those flame graphs, or—I think you mentioned—did you actually build a special eBPF process to do that?
D
Yeah, so what we do for Java—right now, for the continuous profiler—is that we basically capture all the stacks.
A
Okay, cool. Is there any sort of RBAC or control over who can log into the UI and run queries, or any kind of RBAC around who can deploy trace eBPF programs? The capability is amazing, but if you're thinking about putting this into a product, into an organization, I'm sure a lot of CIOs would have some heartburn with all of this.
D
So right now we have support for data redaction and what we call high-security mode, which restricts some of the features we're talking about, like dynamic eBPF script injection, so that they only happen with authentication, and which redacts all the data that might have sensitive information in it. In the UI you can go view HTTP requests, right? An HTTP request might have PII data in it, and we have support to basically obfuscate all of that data.
D
We are working on adding support for signed scripts, so that if there's a new script being executed, you can require an admin approval before it gets executed. That way people can't just randomly add in, you know, their own programs or something, and you'll be able to know that a script has been verified and is safe to execute on your system.
D
We plan to add in basic RBAC. We have some RBAC support right now—there's a split between what admins can do and what users can do—and the RBAC support will make that a little bit cleaner. Then, over the next quarters, we'll add table-level RBAC, so for every piece of information we collect, we can restrict who can access it.
D
Then there's column-level RBAC, which will restrict which columns of data people can access, and then there's RBAC by entity, which will probably land next year—hopefully sooner—and that will allow things like: if you're in this namespace, you can access data about this namespace but not others. So that's part of our security roadmap; I shared the link.
D
It's a public document—we still need to do a better job of documenting all this stuff—but we do plan to add in better RBAC support; it's pretty limited right now.
A
Okay, I guess it's quiet. Oh, hello.
A
All right, I guess that's it then. Thanks very much for the overview. I should end with the same question I asked for Hubble: what's the best place to go to engage with the project, if you have ideas on how to improve it or you're interested in just joining the community?
D
And then we have a pretty active blog as well, which has a bunch of information about how a lot of this stuff is implemented under the hood.
A
All right, well then, thank you—thank you both for presenting. It's open floor, or we could return four minutes to everyone's day. Awesome.