From YouTube: 2023-02-09 meeting
Description
OpenTelemetry Prometheus WG
A
Yeah, everybody here, I believe, has been here before, but welcome back, everyone. Today I'm not sure how much there is to discuss on your side, Felix, or the Elastic side, but I guess it's just a recap. Last week we went over the Collector SIG meeting, where we presented our initial thoughts and ideas about profiling and how it might work in the Collector, both to update them on our current thinking and also to get some ideas and feedback.

Part of that feedback, specifically from Tigran and the other TC members, was that there should probably be some benchmarking. One of the big parts of the discussion was on stateless versus stateful protocols: if the protocol is stateful, it adds a much higher level of complexity, and so that would need to be justified by benchmarks showing that the savings of a stateful protocol are worth the added complexity. That was probably the biggest piece of feedback.

So then last week we talked through some of the ways we could collect data that would be helpful in moving the discussion along. The biggest thing on the agenda today, I think, will be the PR that Dimitri made, which uses the benchmarking repo we had already established to get the ball rolling: at least for now, a way for people to upload a bunch of profiles and get a bunch of data from those profiles, and then eventually to show benchmarking data on whether or not it makes sense to go with the stateful or the stateless protocol.

I see a bunch of people have jumped in since I started talking, but hopefully we will catch everybody up on any of the recap they just missed throughout the meeting today. If not, then I think maybe we can just start with you, Dimitri. I'll paste the meeting notes, if people want to add to the attendees list and also see the agenda and the links. Dima, do you want to go through what you built, what it does so far, and what's left to do?
D
Yeah, so last time we talked about testing the hypothesis — well, I guess it's not really a hypothesis — that if you remove symbols from pprof, the bandwidth requirements will go down, right? The throughput will go down. So what I was trying to do is figure out: okay, if we did remove symbols, or if we replaced them with hashed symbols or something like that, how much could we really save?
D
It generates three versions: one where symbols are completely removed, one where symbols are just encoded as strings, and the last one where symbols are hashed. My idea was that that would be a simple way to estimate how much we could save from using a system similar to what Elastic uses. For all three of these versions, I encode them as protobuf blobs and I also gzip those profiles, and then I report in a table the sizes of each one, along with a percentage difference: how much do symbols take, how much do hashes take?

In addition to that, another thing I do is take all the symbols from all of these files, put them together, dedupe them, and gzip that as well, to see: okay...
D
...if we were to remove symbols completely from the profiles we upload and then upload symbols separately, what kind of savings could we get? So I ran this on a few apps and I provided examples of those runs. Here are the findings: for most apps — especially the ones that get any traffic and have high CPU utilization — profiles are typically in the 10 to 50 kilobyte range.
D
I guess I didn't mention it here, but these are all CPU profiles. I did not look into memory or any other types of profiles, and I also only looked at Golang profiles at this point. So they are usually in that kind of range, and symbols take 30 to 60 percent of the total size of the pprof.
D
It varies for a couple of reasons. I noticed that when the profiles are on the smaller side, symbols tend to take a larger portion, and the other thing I noticed is that when you use labels, symbols take proportionally less, because the labels now take a lot of space. So that's why the difference varies. As for the deduplication part: when I run it on 50 to 100 profiles...
D
...the deduplicated symbols also take roughly 10 to 50K, similar to the sizes of the profiles themselves. I feel like that's good news, and I think a good case for continuing with this stateful approach. I summarized it here: in theory, if we found a way to not send symbols every time, we would get a pretty good reduction in the size of each payload.
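[Editor's note] The measurement loop Dimitri describes — strip or hash symbols, compress each variant, and compare sizes — can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: the real script operates on pprof protobufs, while the toy profile, JSON encoding, and function names here are invented for the sketch.

```python
import gzip
import hashlib
import json

# Toy stand-in for a profile: stack traces with full symbol strings.
# Real profiles repeat stacks heavily, which the *50 repetition mimics.
profile = [
    ["runtime.main", "main.main", "main.handleRequest", "encoding/json.Marshal"],
    ["runtime.main", "main.main", "main.handleRequest", "net/http.(*Server).Serve"],
] * 50

def gz_size(obj) -> int:
    """Serialized + gzipped size, mirroring the benchmark's measurement."""
    return len(gzip.compress(json.dumps(obj).encode()))

with_symbols = profile
no_symbols = [[None] * len(stack) for stack in profile]           # symbols stripped
hashed = [[hashlib.sha1(f.encode()).hexdigest()[:16] for f in stack]
          for stack in profile]                                    # per-symbol hashes

base = gz_size(with_symbols)
for name, variant in [("no symbols", no_symbols), ("hashed", hashed)]:
    size = gz_size(variant)
    print(f"{name}: {size} bytes ({100 * (size - base) / base:+.0f}% vs full)")
```

The same three-way comparison, run over many real profiles, is what produces the per-profile table discussed below.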
F
I guess I have a few questions. The first one is simple: which numbers here are before and which are after gzipping the profile? The "symbols are 30 to 60 percent of the total pprof size" — is that measured on the gzipped pprof?
D
After. I intentionally don't even measure before gzipping — I guess it would be interesting to look at, but I don't think it's important in the end.
F
I guess you haven't had a chance yet to look into what deduplicating stack traces — the program counter lists — would look like, which would be very interesting. And the second thing is, I'm a little confused by the further bullet points about using hashes for the symbols, because I don't think that would be needed for compiled languages: the program counter is essentially the unique identifier for the symbol, and you don't need a hash.
D
Yeah, that's a good point. I also don't know if those are available for all languages. But yeah, I don't know. Sean, do you want to go next?
G
Sure. I just had a quick question — I actually think Florian asked the same question, or made the same comment, on Slack — about the symbol removal. In terms of how our stateful protocol works: instead of hashing each symbol and then sending a hash-to-symbol mapping, we actually hash the entire stack and send one hash representing the whole stack, instead of, say, a hash per symbol per frame. If that makes sense.
D
Yeah, so I almost got there — that was kind of my next point. I thought that this naive approach — let's hash the symbols themselves — would be enough. But what I found in practice is that this does not improve anything, and the next step is, I think, exactly that: I think we should try hashing stack traces.

Part of the reason I did it this way is that it was also much simpler to implement: in pprof you get this string table, and it's really easy to just hash every single entry.
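[Editor's note] The two granularities being compared — hashing each symbol versus hashing the whole stack trace once, as Sean describes — can be sketched like this. The frame names, the 128-bit hash width, and the payload shapes are illustrative assumptions, not the actual wire format:

```python
import hashlib

# One sampled stack trace (frame names are invented for the sketch).
stack = ["runtime.main", "main.main", "main.work", "runtime.mallocgc"]

def h(data: str) -> bytes:
    """A 128-bit content hash, the width mentioned later in the discussion."""
    return hashlib.sha256(data.encode()).digest()[:16]

# Per-symbol hashing: still one entry per frame, so the payload for a
# repeated stack scales with stack depth.
per_symbol = [h(frame) for frame in stack]

# Whole-stack hashing: once the receiver has seen this stack, every
# repeat costs a single 16-byte identifier (plus a count).
whole_stack = h("\x00".join(stack))

print(len(per_symbol) * 16, "bytes per repeat (per-symbol)")  # 64
print(len(whole_stack), "bytes per repeat (whole stack)")     # 16
```

This is why per-symbol hashing showed no improvement in the experiment above: the hash strings are about as long as typical symbol names, while a whole-stack hash amortizes the entire frame list into one identifier.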
H
Can you hear me now? Yeah, okay — sorry, Zoom always confuses my microphone settings. I had one question for Sean. When you said you hash the whole stack, does it include all frames, including the leaf? The reason I'm asking is that when you sample, I would expect the leaf to be much more random, and — I don't know exactly how you do this — if you would hash everything except the leaf and then send the leaf separately, that might be even more efficient, at least for CPU sampling.
G
Yeah, so right now we just hash the entire thing, but we've done some experiments, and you get some good data reduction by excluding the leaf, and then you can get a little bit more by also excluding the leaf's caller. But we never bothered to implement it. It's kind of one of those things where we're like: okay, say 90 percent of the job is done by just hashing everything — and it's been on our to-do list for a while to come back and exclude those frames.
H
Okay. And another, more open-ended question: I was curious whether profile size can be reduced by also ordering things within the profile in some way. For example, if you take the string table in the pprof protobuf and you sort it, would it gzip more efficiently, so you get a size reduction that way? Or could the string table be encoded as a trie, for example? I was just looking at the 30 to 60 percent and thinking, well...
G
On that — maybe, Christos, you can speak to this — didn't we also recently start doing string interning, or deduplication of strings, in our wire protocol as well?
E
Yeah, we did this some time ago, so effectively we're doing something similar to what you see in the pprof format: they have a string table, we have a string table. I did a whole bunch of experiments when I was coming up with that — I tried a few different encoding schemes, including sorting — but there was no meaningful difference, so we went with just simple string interning. That doesn't necessarily mean that we shouldn't do additional experiments going forward.
E
There could be gains to be had, but as far as making the case to Tigran goes, I don't think we're going to get another order-of-magnitude difference. These are also the numbers, by the way, that I'm seeing in my own tests. I spent a few days last week doing experiments with the host agent, coming up with numbers for our own optimized protocol. Getting a truly representative benchmark going — handcrafting a new protocol to work statelessly —
E
— that would be a lot of work, so I took some shortcuts there, but it's still representative in a good way, I think. And the numbers I'm seeing are on the order of 3 to 4x worse if we did things statelessly. In concrete numbers, this basically translates to a single machine — a single agent — going from sending less than 300 megabytes a day to more than one gigabyte a day, and that's with a 20 Hertz sampling rate. 20 Hertz is not a lot, and for the future we have plans to drastically increase this. So that's when things get interesting: that's one machine sending one gigabyte a day at 20 Hertz, and we'll go to 200 Hertz.
E
Now that's 10 gigabytes a day per machine — that's what I'm assuming, right — and then the whole fleet amplifies that number. I'm not sure what current cloud egress pricing is; I just did a quick Google search, and it's on the order of 0.08 dollars per gigabyte, I think, for AWS. So basically, for a cluster of 100 machines sending that data, that translates to about 90 dollars a day in egress costs.
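[Editor's note] That back-of-the-envelope math can be written out as follows. All inputs are the rough figures quoted in the discussion, not measurements; with these round numbers the product lands at about $80/day, in the same ballpark as the roughly $90/day mentioned:

```python
# Rough egress-cost estimate for a stateless protocol, using the
# figures quoted above (all of them approximate).
gb_per_machine_day_20hz = 1.0      # ~1 GB/day per machine at 20 Hz, stateless
sampling_scale = 200 / 20          # planned move from 20 Hz to 200 Hz
machines = 100                     # example fleet size
egress_usd_per_gb = 0.08           # rough AWS egress price quoted

gb_per_machine_day = gb_per_machine_day_20hz * sampling_scale  # 10 GB/day
fleet_gb_per_day = gb_per_machine_day * machines               # 1000 GB/day
print(f"fleet egress: {fleet_gb_per_day:.0f} GB/day "
      f"= ${fleet_gb_per_day * egress_usd_per_gb:.0f}/day")
```

Note that Felix's point immediately below is that the scaling with sampling rate is not actually linear for aggregated formats, so this linear extrapolation is a worst-case framing.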
F
Yeah, that's super interesting. I have a few comments. On the question of sorting things: I did experiments, not on sorting the string table, but on sorting the program counters — the samples in pprofs — to also get a little bit more compression, but it didn't move the needle either in my testing.
F
On the last thing Christos said, about increasing the sampling rate causing a proportional increase in bandwidth: I don't think that's the case for pprof, because with pprof, if you aggregate into a one-minute interval, you will see a lot more stack traces being the same, and so you should actually not see a linear effect. I don't know how it will scale exactly, but it's probably logarithmic or something similar to that.
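[Editor's note] A toy simulation of that effect — more samples over a fixed interval hitting an increasingly repetitive set of stacks — might look like this. The stack population and its skew are invented purely for illustration, so only the sublinear shape of the result is meaningful, not the numbers:

```python
import random

# Hypothetical application: a fixed set of distinct stacks, where a few
# hot stacks receive most of the samples (a Zipf-like skew).
random.seed(0)
stacks = list(range(10_000))
weights = [1.0 / (i + 1) for i in stacks]

# ~20 Hz vs ~200 Hz over a 60-second aggregation interval: 10x the
# samples yields far fewer than 10x the unique (stack, count) entries.
for samples_per_minute in (1_200, 12_000):
    seen = set(random.choices(stacks, weights=weights, k=samples_per_minute))
    print(samples_per_minute, "samples ->", len(seen), "unique stacks")
```

Since an aggregated pprof payload grows with the number of *unique* stacks rather than the number of samples, this is why bandwidth should grow sublinearly with sampling rate.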
F
Yeah, and I had a question for Dimitri about where the inputs are coming from. The typical size of 30 to 50 kilobytes seems a little low from what I've seen on my end — but not crazy low — so I'm just curious where you got the data from. The string table size does strike me as a little high; I think I've seen profiles where the ratio of string table to samples is a little different. So, just curious where the data is from.
D
Yeah, so the data is from a few test users — basically production-grade data from some users. For some of those applications I know the CPU utilization, and I mentioned it there; for others, I don't know how much those apps actually work. I mentioned this in the comments as well.
D
There was definitely a split: some services, it seems, process a lot of data constantly, and those tend to have larger pprofs, and there are some services that are clearly mostly waiting for things.
D
And those tended to have pprofs sometimes even in the 5K range — that's what I've observed. And again, these are CPU only. When we look at memory profiles, those tend to be much larger, but I haven't run an analysis on whether they're large because of symbols or not.
A
Yeah, one thing I'll just jump in real quick to add there: that's also part of the actual analysis piece. Like you said, there are a bunch more optimizations that could be made — Dima and I talked about some of them as well while he was doing this. And from last week, I remember part of the discussion:
A
the Elastic folks mentioned that you were going to look into it, and that it would take some time, so I figured Dima's time might not be best spent trying to recreate what you're already more familiar with. But I'm curious — hopeful — that maybe there's some way to do both. Like you were saying, Felix, if you see something slightly different, other people can also run this script on a bunch of profiles.
A
There's no actual data in it — it's just the metrics about the profiles — so perhaps you could get a corpus of profiles to run it on, the Elastic folks could do the same, and we could even ask community people as well to run this script on their own profiles. Then we'd at least have some kind of mechanism, as we come up with this algorithm — or, I guess...
A
...however you want to say it, this protocol — where we can then run this script and get real data about real profiles from various corners of the profiling world, and hopefully have a good case to make for whatever we ultimately choose. So that if we say it's 50 kilobytes, or whatever it is, that's substantiated by a lot of data. Sorry — I went before you, Matt. Did you want to jump in, Matt?
I
Sure. I was thinking about this problem of symbols and the trade-offs to make around them, and it strikes me that we really have two fundamental streams of data, right? We've got all of the symbolic info, strings, and other things that get duplicated if multiple instances of the same binary are running across the fleet — and that duplication could be quite extreme if you're talking about hundreds of thousands of nodes or something like that, or...
I
...even before you get to very large clusters. And so then I got to thinking about how this will be used — in the processors in the Collector, for example. Those things could just focus on one of the two streams: either the strings and symbols that fly by, or just the stream of samples that later needs to be combined with that first stream. The processor could become much simpler if we actually treat it like a multi-stream protocol. A weird analogy might be TCP's out-of-band data facility. So that was one idea I had as to how to think about the namespace, or the key space, of all of the possible symbols.
I
In an unrelated work stream, I've been looking at the project Quine — q-u-i-n-e — and they have an interesting design element that I thought might be relevant here. In a nutshell, it's a giant graph database that does cool things, but in order to reduce the total size needed to store all of the nodes, they just say: we assume that all nodes exist. And this is an event-processing thing.
I
So if there's a node that doesn't exist yet — rather, a node that I haven't gotten any attribute data or metadata about, in other words a node I haven't heard about — it doesn't mean it doesn't exist. It just means it doesn't matter, so I'm not spending any storage on it at all. We could have a similar approach here, where we assume that every individual unique binary has a unique symbolic-info hash or ID — across all languages, across all things, a globally unique key — and just assume they all exist, without having to instantiate them all. That approach to protocol design makes it very, very lightweight, and then if you have very large fleets running the same thing, you're only sending samples and there's no duplication at all. So this trade-off kind of goes away a little bit. So I was kind of curious.
G
So, Matt, just on the first point — I'm not actually sure I fully got what you meant — but for us, in our stateful protocol, we do separate the sending of frames from the sending of symbol data for those frames. It's a fairly straightforward thing, though: essentially just two different types of messages. And similarly, if we have a stack trace and frames and there are no symbols for them, we still just store it, under the assumption that at some point in the future something else will send the symbol data. Based on what I think you're describing, it sounds like something similar — but maybe I've just totally missed the point. Is that possible?
I
I guess I'm asking — maybe I'll make it more concrete. Say we're using the new protocol that this work stream comes up with, or informs the creation of, and my scenario is very large fleets. Then, in the normative case, I almost don't want any symbolic info — anything that's not just samples — being sent at all, period, because I really only need access to the symbols, and IDs for them, at the point of analysis.
I
So when we're thinking about these trade-offs around compression — the trade-off between stateful and stateless, and all of the other discussions — is that already inherent in the design we're thinking of? In other words, if I have a hundred thousand containers, all running the same binary, all being continuously profiled, are 99,999 of them only sending samples, because they don't need to send the other stuff?
I
Or is it the case that every individual instance of a profiled thing is going to negotiate its own stateful protocol with the collector, where in that one case the symbols are only sent once, but they are still sent? Does that make sense?
F
I think my understanding is that the protocol stuff we've been discussing so far would allow symbols to be sent literally just once, when you create a binary — like during CI — but that only works for statically compiled code that doesn't dynamically create symbols. If you have something like Python or Ruby, where you could create a new function with a new name at runtime, that symbol will have to be sent from each instance.
F
You have no way of deterministically knowing in advance what symbols are going to come out of that thing until you run it. So I think the protocol would be designed to support both cases, and users with statically compiled binaries could choose to only send the symbols once per binary — potentially not even going through the collector, though I think there's value in still sending that stream through the collector, so it could do some filtering on those symbols if somebody decides that's something they care about. But maybe Sean has other thoughts on this.
G
No — just that, in practice, that's how ours works. For native binaries we by default never send any symbols: we assume that somebody will have a CI hook that actually pushes those, or, for symbols that come from binaries in open-source projects, we simply mirror those debug symbols in the backend for the user and then automatically inject them. But as Felix was saying, you do have the issue with languages like Python...
G
...and their ability to just generate new symbols on the fly. And then you also have the problem of, let's say, being the operator of a large cluster, with all sorts of random containers running all sorts of random software.
G
But basically, to echo what Felix said, I think the protocol that we're designing should handle these cases. And if you have a user who really is just running a homogeneous fleet everywhere, and they know there's no symbol creation going on at runtime...
G
...there's no reason the protocol, as we've described it so far, wouldn't be able to support that, I think. So it makes sense to me.
A
Yeah, so I guess one thing that I definitely want to make sure we talk about today: after the last Collector SIG meeting, Tigran had mentioned that we need some more benchmarking data, and I guess the idea here was to take a step in — hopefully — the right direction, or something near it, toward having a benchmark that we could eventually present to the TC, to the Collector SIG, to whoever, about what the actual data is. Like, Christos, you mentioned that you found certain numbers when you were doing some analysis on your side, and I guess the idea here is to try to bring some analogous thing. I mean...
A
Maybe you can't represent the entire protocol in this kind of small, sandboxed benchmarking suite, but I'm hoping we can at least do something. Again, the idea is that if we're going to go with this stateful protocol, we need something that people can run, where we can show them output metrics that justify it. And I'm wondering what you all think: whether this is a step in that direction, and if this is something we can, you know...
A
Obviously this is a first step, and there are plenty of ways to extend it — like Dima said, there are other types of profiles we can do, and we can maybe even add that as a column, so that you can eventually filter this by language, by profile type, by various things, and still look at the output metrics — have it be a more comprehensive sort of suite. But yeah, what do you all think: is this something we can potentially build on?
A
Even if it's not a perfect representation of, for example, the exact stateful protocol that you use — but of a stateful-like protocol and what it might look like if we were to implement it. Felix?
F
Yeah, I think the question comes down to: do we all agree with the numbers that have been mentioned so far, which to me seem to hover around a 50 percent decrease in bandwidth for realistic scenarios? Or does anybody here think that if we set it up a little differently we'd be closer to 90 percent? Because I think that would make a big difference — 2x versus 10x would make a compelling case. But it sounds like we're going to be somewhere around a 2x reduction. Does anybody see that differently?
A
And by that, do you mean the content of the profiles themselves, or the way that they're being calculated? Because if it's the content of the profiles themselves, then I would say we could just add more profiles to this list — if you're saying that the profiles you see are different, you can add your own profiles, and that would change it. But if you're...
F
I don't think the ones I have would be totally different, no. I'm kind of wondering whether we think the overall bandwidth requirement of sending, let's say, CPU profiles will be more than 2x different between the pprof approach — buffering things up for, say, 60 seconds and sending it — versus the stateful approach of doing the hashing and all the good stuff.
F
That's what I'm curious about, because if we are all in agreement, then I think we've sort of had two methodologies converge on similar numbers, and then it doesn't matter too much how much more detail we add here. But if we think the methodologies have led to wrong numbers, then we should continue iterating on that, obviously.
G
Cool. I just had one question about the applications that were profiled. The one case that's always in my head is Java programs, with very deep stacks and very long function names. I was just wondering — Felix, it sounds like you work with Java stuff, and Dimitri, I don't know — have you included, let's say, Java applications in your testing?
D
So on our side we do have a lot of Java traffic. The problem is that all of it is in JFR format, so I would have to first convert it to pprof — I guess I could, but that would be quite an undertaking — so I don't have those. I think the easiest thing we could do for that is to go at it from the other end: it would be for you all to try to add symbols and then calculate the difference.
G
Cool. And then, Felix, did you test with a Java application?
F
I did not, but I just had a thought when Dimitri mentioned that it's difficult to convert JFR to pprof, which might be a prerequisite for a fair comparison here. Mark Hansen's Profilerpedia has at least one path of tools that you can chain together to go from JFR to pprof, so maybe that's better than writing something from scratch.
H
Yeah, I have two questions about the benchmark. One is: I looked at the output, and you print the percentages as positive percentages — there's "no symbols", and then "symbols" is X percent larger. I would print it in the other direction, because plus 100 percent is minus fifty percent, and the minus percentage is what we care about.
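[Editor's note] The direction flip being requested is just this bit of arithmetic — a hypothetical helper for illustration, not code from the PR:

```python
def increase_to_reduction(pct_increase: float) -> float:
    """Convert "+X% larger with symbols" into "Y% saved without them"."""
    factor = 1 + pct_increase / 100
    return (1 - 1 / factor) * 100

print(increase_to_reduction(100))  # 50.0  (+100% larger == 50% saved)
print(increase_to_reduction(60))   # 37.5  (+60% larger == 37.5% saved)
```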
H
I don't want to mentally translate it every time — it's just more mental load. And another question is: how many profiles do we compare? Because if we compare just two profiles, what is the delta, and what is the increase? If you have a series of profiles, then the savings can be greater, because you ship most of the information in the first profile — but the subsequent profiles would be... well, they still wouldn't be that much smaller, right?
H
If it's a series — yeah, if the second profile is 2x smaller, then the whole series is still, I guess, about 2x smaller, right? Yeah, I was curious whether you need to look at the whole series, or whether two profiles are enough — two profiles gives you essentially a lower bound, I think. Correct me if I'm wrong; maybe I'm not thinking straight.
D
I don't know where you're getting two profiles. Each line in that output is a separate profile, and then for each one there are three numbers: one without symbols, the regular one, and one with hashed symbols. Well...
C
I think it comes down to the comment I had in the spec: if you send the profile the first time with all frames, you spend some amount of data, but the second time, if you hash the complete trace and just send 128 bits for the hash, it's a huge reduction. So it matters if you send it multiple times — and I think most of us can already say that most programs repeat something at some point.
H
I guess one practical question is: do we want to extend the benchmark to also deduplicate the full stacks, like Elastic does? Because I think that's one of the things that was mentioned as a significant difference between what we're trying to emulate and what the current implementation actually does.
A
Yeah, I guess that's the one that I'm most interested in too. From a practical standpoint: how far is the current benchmark — the PR that currently exists — from that? Whether that means modifying the code of that PR in some way, or merging the PR and coming up with a separate PR to add that functionality. I don't know how much people have had a chance to really dig into the actual code of it, but yeah...
A
I think if it's reasonably close — the action item from last time was that Elastic would look at measuring the efficiency of their protocol, which would take a few weeks — I'm wondering if this can be sort of the single place where that takes place.
I
You mean in a processor — in a collector processor, like inline — or do you mean as part of the actual protocol?
A
Yeah, I don't know — I'm curious what people think: whether this is a good place to update the benchmarking suite, to have that more sophisticated version like the one you're talking about, Florian.
D
Well, I would maybe go back to Felix's point: maybe we should agree on the theoretical number — is it a 2x maximum difference, or is it a 3x maximum difference, that kind of thing? I don't know what you guys think, but we could potentially even go with that and say: hey, best case scenario, we're going to cut this much off profile sizes if we go with a stateful protocol. What do you guys think — that kind of thing?
H
For me, a 10x is like "wow," and a 2x is like "meh" — not exactly compelling. If someone came to me and said, "hey, Alexey, let's have a stateful protocol for this system," and I asked "what's the delta?" and they said "2x," I would be like: are you sure that's something to carry for years? I would try to check my benchmarks to see — is it really 2x? Maybe it's 10x after all. But I wonder what others think.
E
To jump in quickly here and answer Felix's question: the numbers that I'm seeing with Java would be the worst case, like Sean mentioned. In our benchmark I think it's roughly three to four x, right? So it's not 2x, it's 3 to 4x — and this is, again, a preliminary benchmark, so in reality it could be even better. But that would take more than a month of development time to actually get to test.
E
Okay, I'm comparing stateless with stateful — so stateful is three to four x better in terms of less traffic on the wire, right.
E
No, I don't have a specific program. I'm taking our own Elastic protocol and I'm simply putting the information back in — so I'm sending everything and compressing.
F
So you go from the stateful data you have and you're trying to create stateless files out of it, but they're not necessarily JFR-encoded or anything like that — you just dump it? Yes, all right.
E
So I took our stateful protocol and I changed it to be stateless, so it's still optimized: it's still using the deduplication, compression and everything. All the numbers that I'm mentioning here are post-deduplication, post-compression, so it's not a naive benchmark; it's somewhat representative of real-world conditions. But no, I didn't go and convert it to pprof or JFR or anything like that, and I think that's another point I wanted to make, because the current...
E
pprof is not really the best format to use when we talk about stateful, right? Because essentially we would design our own format to best accommodate a stateful protocol; we wouldn't use pprof. And to go back to something we need to show Tigran: I think it's not going to be easy for us, for Elastic, to come up with a nice, realistic benchmark that works the same way that our host agent works.
E
It's going to be a lot of work. But in terms of the benchmark that we're looking at today, yes: the first thing that comes to my mind is we could simply compare pprof with a custom format that, like Florian said, simply sends stack traces.
E
It has the traces plus an associated count, which is kind of the gist of what the Elastic protocol does. So I wouldn't send any symbols whatsoever, right, because you would assume that symbols would be sent separately once and periodically resent, but that period is large enough that it gets amortized out over the long run and wouldn't affect the numbers.
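A minimal sketch of the comparison being described, with made-up data: the frame names, payload shapes, and hashing scheme below are all illustrative assumptions, not the actual Elastic protocol. It just contrasts a payload that inlines symbol strings with every sample against one that sends only a stack-trace hash plus a count, with symbols assumed to travel separately.

```python
import gzip
import hashlib
import json

# Hypothetical profiling data: 500 samples drawn from 10 distinct stacks.
frames = [f"com.example.Service.method{i}" for i in range(50)]
samples = [{"stack": frames[i % 10:(i % 10) + 20], "count": 1} for i in range(500)]

def wire_size(payload):
    """Rough wire cost: gzip-compressed JSON, mimicking a compressed HTTP body."""
    return len(gzip.compress(json.dumps(payload).encode()))

# Stateless-style: full symbol strings travel with every sample.
inline = wire_size(samples)

# Custom-format-style: hash each unique stack, send only (hash, count) pairs;
# the symbol strings themselves are assumed to be sent separately and rarely.
counts = {}
for s in samples:
    key = hashlib.sha1("".join(s["stack"]).encode()).hexdigest()[:16]
    counts[key] = counts.get(key, 0) + s["count"]
compact = wire_size(list(counts.items()))

print(f"inline symbols: {inline} bytes, trace-id+count: {compact} bytes")
```

With repetitive stacks like these, the trace-id-plus-count form is far smaller even after gzip, which is the kind of delta the benchmark would try to measure on real profiles.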
G
Christos, to do that, it seems like it would be possible in the context of Dimitri's framework, right? Yeah, I guess the challenge is how we would represent something like Java in that, because I guess there's no way to get a pprof profile from Java, is there? Like, I exclusively use our tool to do whatever profiling, so I don't really know.
F
Yeah, one thing I also want to add is: if we think we can report a number that's much better than 2x for potential savings, like 4x (yeah, it's not 10x, but we're getting closer to 10x), then I think it's worth getting that really nicely written up and being able to show that we can get 4x instead of 2x, or maybe even more. Because I think if we present 2x to OTel, it will be difficult to argue for the complexities of a stateful protocol.
G
I guess I'm wondering: does it make sense to do both, Christos? You write up what you have, based on what you've done just by making our existing protocol stateless, and then in parallel let's also figure out some way to make the representation in Dimitri's format as faithful as possible to what, say, a truly optimized stateful protocol would be. Does that make sense, or do you think that's overkill?
G
Because it sounds like it would be valuable to have: if we're going to use Dimitri's benchmark and whatnot to demonstrate results, we should probably also have a representation in it of what we think the best possible results are. Does that make sense? Besides Java, is there anything else important that we can't get a good pprof representation from?
A
I guess maybe other languages, right? Like, would it matter for, I guess, Python or Ruby or any other languages?
F
So if needed, I can get pprof profiles for all those languages, because we use pprof for all the languages, not just Go. Java is the only exception, where we use JFR right now.
G
I think that would be pretty interesting to see, because then we can essentially, within this framework, construct what we think would be an optimal representation of the stateless protocol, and we can have an apples-to-apples comparison.
A
Yeah, Jonathan, your hand's up.
J
Yeah, two things. One, on the data conversion: I have looked at JFR to pprof. The general case is hard and it is lossy, but if you're only interested in CPU samples, dealing with that subset is actually quite tractable; it's not that much code.
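As a rough illustration of the CPU-sample subset Jonathan mentions: the sketch below builds a pprof-shaped structure (string table, functions, samples) out of already-decoded (frames, count) pairs. It uses plain dicts rather than the real pprof protobuf, collapses pprof's separate Location/Function tables into one, and elides the JFR parsing entirely, so every name here is a simplification, not the actual converter.

```python
def cpu_samples_to_pprof(samples):
    """samples: iterable of (frame_names, count) pairs, leaf frame first.

    Returns a dict shaped loosely like a pprof Profile message.
    """
    string_table = [""]          # pprof convention: index 0 is the empty string
    str_index = {"": 0}
    functions = {}               # function name -> function id

    def intern(s):
        # Deduplicate strings the way pprof's string table does.
        if s not in str_index:
            str_index[s] = len(string_table)
            string_table.append(s)
        return str_index[s]

    profile = {"string_table": string_table, "function": [], "sample": []}
    for frames, count in samples:
        location_ids = []
        for name in frames:
            if name not in functions:
                functions[name] = len(functions) + 1  # pprof ids are 1-based
                profile["function"].append(
                    {"id": functions[name], "name": intern(name)})
            location_ids.append(functions[name])
        profile["sample"].append({"location_id": location_ids, "value": [count]})
    return profile

prof = cpu_samples_to_pprof([(["a", "b", "main"], 3), (["c", "main"], 1)])
```

The point of the sketch is that, once you restrict yourself to CPU samples, the whole conversion is interning strings and mapping stacks to id lists, which matches the "not that much code" observation.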
J
The other point was around amortizing costs and the benchmarks. A stateless protocol nevertheless does some of the same things a stateful one does, in that it's batching a number of samples along with a single set of symbols, and the interval of that batching is quite important, because at some point you've essentially seen all the symbols.
J
So the number of samples you're packing into, I don't know, a 60-second batch matters, and I think the benchmarks need to reflect that. And that goes to what load is on the server as well: if it's changing between a lot of different things over the course of that interval, that minute, that looks different than if it's running a tight loop on the same thing.
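Jonathan's amortization point can be put into toy numbers. Both byte sizes below are invented purely for illustration: a fixed per-batch symbol block gets divided across however many samples share the batch, so the per-sample wire cost falls as the batch grows.

```python
# Assumed, illustrative sizes (not measured from any real agent):
SYMBOLS_BYTES = 200_000   # symbol/metadata block sent once per batch
SAMPLE_BYTES = 40         # marginal size of one encoded sample

def bytes_per_sample(samples_per_batch):
    """Per-sample wire cost when one symbol block is shared by a whole batch."""
    return SYMBOLS_BYTES / samples_per_batch + SAMPLE_BYTES

for n in (100, 1_000, 10_000):
    print(f"{n:>6} samples/batch -> {bytes_per_sample(n):8.1f} bytes/sample")
```

This is also why workload matters: a tight loop on one code path saturates the symbol set almost immediately, while a server churning through many different code paths keeps adding symbols over the interval, so the fixed block never fully stops growing and real curves look different.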
A
Yeah, that makes sense. So we have a couple minutes left; I want us to be clear on what we think next steps are. It sounds like there are no huge objections, at least to the foundation that this pull request is sort of starting, so I'm thinking, maybe as next steps:
A
If some people from this group can check out that pull request, review it, and maybe we can even get it merged, then as we start to iterate on it we can get into a somewhat regular workflow of creating pull requests to add functionality, to add profiles for more languages, that kind of stuff.
A
What do you guys think of that as at least a hopefully somewhat easy, concrete next step? I guess I'll stop there. Do you guys think that's fair? Cool, no objections; speak now or forever hold your peace. Okay, cool.
A
And then I guess we can even make some issues for some of the other stuff that was mentioned today and hopefully make progress towards that. I definitely agree with what Sean said, that it would be nice to have, even if it's not a perfect representation, just some sort of representation of the more efficient way of, I guess, hashing (I don't know how to describe it necessarily), but have it in this repo, just so that again we have one place where we can say, you know:
A
"Look here, TC; look here, collector SIG; look here, community." I mean, I think at some point, once we get this to a point where we feel comfortable with the suite itself, it would be cool even just to involve the community in general and have them run this script on some profiles that they have, just so that more people are involved in this process and getting familiar with profiling. I feel like it has a lot of other positive effects too.
A
But yeah, that's what I would see as immediate next steps. I don't know, Felix, I know you had mentioned last week potentially creating a draft PR for something; I don't know if you wanted to... oh, I just saw your hand's up.
G
Just a quick question before we move on: Dimitri, had you planned to add the, should we say, more optimized versions of the stateful protocol, or do you need more hands on deck on that one?
A
Yeah, I mean, I think definitely more hands on deck. I guess just maybe some guidance on, again, it doesn't have to be perfect, but what the easiest way to estimate or approximate it would be, what that might look like. I do think having y'all's input on that would be good, since you're more familiar with it. So maybe we can just create an issue for it and discuss there offline, and maybe then it will be easy to come up with a way forward there.
G
That makes sense. I think, on the Elastic side, we'll sync up later and have a think about that as well, and then just contribute that way, yeah.
F
Yeah, for the other thing you mentioned, I had planned to continue with the pull request.
F
I had already shared a little bit in the last meeting, which was my attempt to take the OTLP protocol buffer stuff that the collector is using and figure out how our profile signal could look there. I did not have a chance to work on this in the last two weeks because of some other stuff, but I will do it for the next meeting and have some more to share on that, yeah.
A
All good, we've all got other jobs here too. So yeah, all right, cool. Well, I think that is everything, unless anybody else wants to add anything before we leave. Otherwise we'll see you all on GitHub and in a couple weeks. See you.