From YouTube: 2022-06-23 meeting
Description
OpenTelemetry Profiling WG
A
All right, well, I think four minutes in, I feel like we can probably go ahead and get started. So, just to start off: welcome, slash welcome back, everybody. I was thinking we could recap what we've been talking about at the meetings so far and then get into some stuff for today. For those who don't know, or if you're watching this on YouTube: this group is meeting to talk about creating an event type for profiles that's supported by OTel. We've had two meetings so far; this is the third. At the first meeting we met with everybody to get an idea of what people wanted out of a standardized profiling format or profiling event, and to get a wide overview of what some of the goals might be.
A
For the overwhelming majority of people, the type of profiling we were interested in was one where we can connect profiles to other OTel signals. Another big goal was representing and transmitting profiles across native code and runtimes. A couple of people mentioned that one; we didn't dig too much into it yet, but it was definitely interesting to some people. And then there's being able to map between existing profiling formats and whatever new format we come up with, because there's already a decent amount of support for some profiling formats. So whatever format we ultimately choose, we identified it as a goal that it should play nicely with the existing formats, or be able to migrate easily, or something along those lines.
A
I guess let me pause there, in case I missed any. I put out here on the agenda that, after we've had a week to think about it, people can add any goals they think we didn't cover, or aspects of any of those. Oh yeah, Florian's here; he had mentioned in Slack something about network bandwidth. That's something we didn't explicitly talk about, but I suppose it falls under the bucket of the ability to do data-center, system-wide profiling. I don't know if you want to add anything there, Florian.
D
Yeah, sure. So the question is: if there is always-on profiling, system-wide and data-center-wide, it will generate some amount of data crossing the network, and depending on where everything is deployed, this can result in huge additional traffic, and therefore can cost customers additional network traffic. So I think this should also be a point for the existing or proposed formats: how expensive is the proposed, or decided-on, format on the wire?
E
I had a related question on Florian's point, which is: how do the OTel working groups typically think about this? Do they typically try to take into account the amount of data on the wire, or the amount of processing that might be involved in serializing or deserializing a format? How is that typically thought about? Because, as Florian says, in the world of continuous profiling these are potentially fairly impactful things.
F
Yeah, I can tell you about what we did for the OpenTelemetry protocol. The wire size of the format was obviously one of the points we focused on when we worked on OTLP, and I think it's also important, maybe even more important, for profiling, because you expect maybe an even higher volume of data. So it definitely is important.
F
When we were designing OTLP, we did multiple alternate designs and a deep comparison: benchmarking from the size perspective on the wire, uncompressed and compressed, and also from the perspective of the CPU required for marshaling and unmarshaling, all those things. So it very likely applies to profiling as well.
A
Yeah, I'm curious if that is publicly available somewhere, like in an issue or a PR or something, that we could look at to get an idea of how to set up a benchmark suite.
A
Okay, yeah. So, related to that: I also put a link in here to the logs data model. There's a file there that explains the logs data model, and that's where I was coming from for the overall structure of this, thinking about the goals and the types of fields that we care about.
A
All right. If anybody wants to add any, either throw them in the chat or we can talk about them later.
A
One of the things we also talked about last week was getting a bit of a taxonomy of the existing formats for profiling. That's obviously important as we start to think about this new format, even, as we just mentioned, for the benchmarks of the current formats. That gives us somewhat of a range once we actually get tangible numbers: whether whatever this new format is falls somewhere between, for example, pprof and JFR in terms of bandwidth, or whatever benchmarks we actually choose. So I guess that's a good segue. First, let's talk about the formats we had on here. I believe Alexey added some info here on pprof.
A
I think that's probably a good place to start, since when we polled everybody last week that was the most popular format people were using. We evaluated several different formats against a couple of those goals we mentioned. So I wonder if either Alexey or someone else wants to chime in here about pprof and how it stands up to the goals we've mentioned so far.
H
A little bit, and maybe Alexey can add stuff; you have more context on the history, and if I say something wrong you can correct it. Is my microphone working well? Yep, okay. So I pretty much tried to answer the questions on that doc, and I think the questions stem from the requirements you were talking about. The summary is that our main use case at Google, as I think was mentioned in maybe the first meeting, is statistical profiling, which is more of an aggregated view of profiles, rather than a timestamped profile that we can look into to see the change across time. The implication of that (and this was brought up in my previous conversations with Felix; I don't think Felix is here today) is that there's one pain point of this:
H
The pprof format doesn't carry a timestamp for each stack. It just says: here is this stack, and here are the aggregated values; multiple profiles all look like that. That is a bit of a disadvantage if we're working with timestamped profiling. And, Ryan, I think you pointed us to the custom profile format at Pyroscope, or something like it, that is trying to address that issue. If we're also trying to support something more general, like timestamped profiles, that would be the direction to go. Also, I don't think we have native support for coupling the profile data with, for example, tracing data.
H
That's my impression, at least. Another thing is the question about whether pprof is stateful. The data format alone just represents the profile, but if you're talking about the protocol, we have something where the server determines the frequency of the profiling. So a client tries to talk to the server, the server throttles it and lets the client know when it's time to collect the profile. I'm not sure, if OTLP is the name of the protocol, whether the OTel protocol has something similar to this. But if we want to make OTel generic enough to at least support Google Cloud, that is one requirement we probably should talk about as well. And finally, I wasn't super sure I understood the last question correctly:
H
Can pprof represent profiles generated across native code or runtimes? My interpretation is: is it able to represent enough info if the profile comes from native code, like perf, or from an interpreter stack on Java, and can those be mixed together? My answer is yes. It has enough fields to represent that kind of information, or it can just represent, for example, function names and lines, and that would be it. So that's the summary for pprof. I don't have a lot of historical context; when I joined the team, the pprof proto was already there. If the context matters, I think Alexey would be the better one to answer history or maybe benchmark-related questions.
B
Yeah, for benchmarks (can you hear me well?), not much to add. Overall, the pprof format tries to be compact. This is why, for example, string tables and string interning are used, even though many protos don't do that: if you take protobufs in the abstract, they don't do such extreme interning. It was a pprof decision to minimize size on the wire further.
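As a rough illustration of that interning, here is a minimal Go sketch of a pprof-style string table. The type and method names are invented; the one real convention kept is that index 0 is the empty string.

```go
package main

import "fmt"

// StringTable interns strings so repeated names (function names, file
// paths, label keys) are stored once and referenced by a small integer.
type StringTable struct {
	index   map[string]int64
	strings []string
}

func NewStringTable() *StringTable {
	// By pprof convention, index 0 of the string table is the empty string.
	return &StringTable{index: map[string]int64{"": 0}, strings: []string{""}}
}

// Intern returns the index for s, adding it to the table if needed.
func (t *StringTable) Intern(s string) int64 {
	if i, ok := t.index[s]; ok {
		return i
	}
	i := int64(len(t.strings))
	t.index[s] = i
	t.strings = append(t.strings, s)
	return i
}

func main() {
	t := NewStringTable()
	// The repeated name costs one table entry plus integer references.
	fmt.Println(t.Intern("main.handleRequest")) // 1
	fmt.Println(t.Intern("main.handleRequest")) // 1 again
}
```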
B
On the other hand, one thing that, over my time dealing with the pprof format, I always went back and forth on and wished maybe we could have done differently, is the stack representation. Currently in the pprof format the stack is basically a flat table, so each stack carries the full sequence of frames. It's not encoded as a tree, and that is one obvious place that can be improved.
B
For example, we have another internal format in a different tool, and there we encoded the stack as basically two arrays: one array holds the indexes of nodes, and another the indexes of parents. You can encode a tree in basically two arrays, and that is a more efficient representation; in pprof it's a flat table. On the other hand, having flat stacks is maybe more convenient for users. When you have a profiling agent, and you have those for many languages, like four or six languages, you want to optimize so that the agent code is as simple as possible, and a flat stack table usually maps well to how profiling agents capture the stack during runtime, because it would be a hassle to manage a tree
A
You're cutting out a little bit. Could you say that last part again? We heard, I think, "it's a hassle," and then you cut out.
B
Yeah, okay. It's a hassle for a profiling agent that runs within the application process. It's a hassle to manage, basically, a stack tree, because then you would need to deal with the lifetime of nodes if you want to evict a specific stack. I can put notes on this somewhere. One question for the formats overview could be: what are people unhappy with in the format that they use? There might be some interesting findings there.
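For contrast, a hedged Go sketch of the two-array tree encoding Alexey describes, where frames[i] and parent[i] together encode the call tree so shared stack prefixes are stored once. All names are illustrative, not from the internal tool.

```go
package main

import "fmt"

type key struct {
	parent int
	frame  string
}

// StackTree encodes a call tree as two parallel arrays: frames[i] is the
// frame at node i, parent[i] is the index of node i's parent (-1 for roots).
type StackTree struct {
	frames []string
	parent []int
	index  map[key]int // (parent, frame) -> existing node, for deduplication
}

func NewStackTree() *StackTree {
	return &StackTree{index: map[key]int{}}
}

// Add inserts a stack (outermost frame first) and returns the leaf node index.
func (t *StackTree) Add(stack []string) int {
	node := -1
	for _, f := range stack {
		k := key{parent: node, frame: f}
		i, ok := t.index[k]
		if !ok {
			i = len(t.frames)
			t.frames = append(t.frames, f)
			t.parent = append(t.parent, node)
			t.index[k] = i
		}
		node = i
	}
	return node
}

func main() {
	t := NewStackTree()
	a := t.Add([]string{"main", "serve", "encode"})
	b := t.Add([]string{"main", "serve", "decode"})
	fmt.Println(a, b)     // distinct leaves: 2 3
	fmt.Println(t.frames) // [main serve encode decode]; the prefix is shared
	fmt.Println(t.parent) // [-1 0 1 1]
}
```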
B
Can you hear me now? Is it better? I think so. Okay, so clients are supposed to aggregate the metrics per stack. For example, if it's CPU sampling, then CPU samples that happen at the same stack are supposed to be accumulated: the counts are accumulated by the client.
B
But I'm not sure I get the question. For statistical profiling, maybe it's because we don't timestamp data. Typically profiling is done for 10 seconds, and during those 10 seconds you get a call tree with associated sample counts. So yes, that data is aggregated, but maybe I'm missing the point of the question.
D
I think I did understand you correctly: you have a sampling rate, or time frames of 10 seconds, and you aggregate these stack traces on the client side before sending them out.
B
Profiling runs for 10 seconds, the sampling rate is 100 hertz, and the samples that occur within those 10 seconds are aggregated per call stack. Does that make sense?
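A small Go sketch of that client-side aggregation, assuming a 10-second collection at 100 Hz with counts accumulated per call stack. The types and keying are illustrative.

```go
package main

import "fmt"

// aggregate folds individual samples into per-stack counts, which is what
// a pprof-style client ships instead of ~1000 timestamped samples.
func aggregate(samples [][]string) map[string]int {
	counts := map[string]int{}
	for _, stack := range samples {
		k := fmt.Sprint(stack) // illustrative key; real formats use IDs
		counts[k]++
	}
	return counts
}

func main() {
	// 10 s * 100 Hz = 1000 samples collected, but usually far fewer
	// distinct stacks, so the aggregated payload is much smaller.
	samples := [][]string{
		{"main", "serve", "encode"},
		{"main", "serve", "encode"},
		{"main", "serve", "decode"},
	}
	fmt.Println(aggregate(samples))
}
```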
E
Sean here; my question is actually a little bit of a follow-up. What I was wondering about when Alexey and his colleague were describing their wire format is that it's a little difficult to evaluate the suitability of a wire format without knowing the context in which it's used. This touches on what Alexey was just saying.
E
I was wondering if, alongside the wire-format descriptions that we provide, it might also be worth describing some sort of architecture for how we currently use our agents, because I think what Alexey is describing is actually quite a bit different from what we do.
E
I think he's describing an architecture where they trigger collection for a certain amount of time, intermittently, on some subset of nodes. But the requirements you might have for that are quite different from if, for example, your assumption was to run all the time on all nodes, maybe at a lower frequency like 20 hertz or something, because that's what our use case is. Depending on which way you come at it, you might end up with a very different wire format being satisfactory.
B
I would say the pprof format is not designed with streaming in mind, and I think you're talking about something more like streaming, where you have ongoing profiling and clients send essentially deltas, explicitly relative to the previous state.
E
Yeah, to a degree. The main difference is that we run, say, continuously at 20 hertz on all nodes, all the time, and the constraints you have there end up being different than if you were doing it otherwise, perhaps less frequently. But my point really is: when we're describing the wire formats, I wonder whether it would be helpful to add some sort of description of how we're actually using them in production at the moment, because it will help contextualize the solutions.
C
We can definitely add that in, yeah. I think it's an important piece.
B
By the way, one thing just before I forget: I don't think we describe anywhere which wire formats OTel actually uses, for example, for protobufs.
B
So I wonder if we need to take that into account, because, for example, not in this format, but in some internal formats where we had to optimize for JSON encoding, we used structures of arrays extensively. Again, I don't know how deep we want to go here, but it would be nice to capture somewhere whether we want to optimize for any specific wire representations.
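As a hedged illustration of the structure-of-arrays point for JSON encoding (field names invented): the array-of-structs form repeats every key per sample, while the struct-of-arrays form pays for each key once.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Array-of-structs: every sample repeats the JSON keys.
type SampleAoS struct {
	StackID int64 `json:"stack_id"`
	Value   int64 `json:"value"`
}

// Struct-of-arrays: keys appear once, values are packed into arrays.
type SamplesSoA struct {
	StackIDs []int64 `json:"stack_ids"`
	Values   []int64 `json:"values"`
}

func main() {
	aos, _ := json.Marshal([]SampleAoS{{1, 10}, {2, 20}, {3, 30}})
	soa, _ := json.Marshal(SamplesSoA{[]int64{1, 2, 3}, []int64{10, 20, 30}})
	fmt.Println(len(aos), string(aos)) // keys repeated per element
	fmt.Println(len(soa), string(soa)) // keys amortized across all elements
}
```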
F
Yeah, the binary protobuf was the first encoding introduced in OTLP. JSON was proposed later; it is not stable yet, but it will likely become stable. Both are supported, and for transport we support both gRPC and HTTP. And to answer the other question, which I think was about throttling: yes, OTLP does do throttling.
F
The server can signal to the client that it is overloaded, and the client is supposed to follow a specific exponential back-off strategy. But it is generic throttling; it doesn't have any domain knowledge, unlike what was described earlier, where the server may know what profiling is and offer a specific profiling rate to the client.
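A minimal Go sketch of the generic back-off behavior described here. The base delay, cap, and jitter are illustrative choices, not values mandated by the OTLP spec.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffDelays yields exponentially growing retry delays with jitter,
// the kind of generic strategy a client follows when the server signals
// that it is overloaded. All constants here are illustrative.
func backoffDelays(attempts int) []time.Duration {
	base, cap := 1*time.Second, 2*time.Minute
	var out []time.Duration
	d := base
	for i := 0; i < attempts; i++ {
		jitter := time.Duration(rand.Int63n(int64(d) / 2))
		out = append(out, d+jitter)
		if d *= 2; d > cap {
			d = cap
		}
	}
	return out
}

func main() {
	for _, d := range backoffDelays(5) {
		fmt.Println("retry after", d)
	}
}
```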
F
Gzip is pretty efficient at eliminating this duplication, which is what you actually gain by doing the dictionary encoding. It probably still is worth doing, but I'm curious to see some benchmarks which show how much you gain, particularly with compression, because you're very likely going to be doing compression when you send this data over the network.
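A quick Go sketch of the kind of measurement this suggests: gzip a payload with repeated stacks and a dictionary-style payload, then compare sizes. The data is purely illustrative.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"strings"
)

// gzipSize returns the compressed size of b (errors ignored for brevity).
func gzipSize(b []byte) int {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write(b)
	w.Close()
	return buf.Len()
}

func main() {
	// Duplicated representation: the same stack repeated per sample.
	dup := []byte(strings.Repeat("main;serve;encode\n", 1000))
	// Dictionary-style representation: stack once, then small indexes.
	dict := []byte("main;serve;encode\n" + strings.Repeat("0\n", 1000))
	fmt.Println("raw:", len(dup), "vs", len(dict))
	fmt.Println("gzipped:", gzipSize(dup), "vs", gzipSize(dict))
}
```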
B
Yes, but compression assumes that at one point in time you have to hold in memory both the uncompressed and the compressed data set, and for in-process profiling this can have a memory-usage impact on the profiled application.
A
Okay, cool. We can follow up after this meeting with more, maybe more explanation of how OTLP currently does it and that kind of stuff, some documents there. In the meantime: Dmitri, do you want to talk a little bit about the custom format that is somewhat close to pprof, and give a little context there?
J
Yeah. So we've been internally working on a kind of variation of pprof that is slightly more optimized for correlating profiles with other OTel data, particularly traces, as well as optimizing the network bandwidth a little bit. Are we allowed to screen-share in this meeting?
J
All right, so let me show this visually. On the left we have pprof (this is a visualization of the proto file), and on the right we have the variation that we're internally working on. This is not currently used in production; it's still a work in progress. You can see that these are pretty similar: the mapping stuff is the same, location, line, a lot of these things are more or less the same.
J
What's different is the representation of samples. The problem we're seeing with pprof, when you correlate profiling data with traces, is that you end up with a lot of these Sample structures, where each one represents a stack trace, connects it to a set of labels, and has some sort of value. If you have a lot of different combinations of labels, you end up with a lot of duplication, because for a lot of label sets you'll have the same combinations of location IDs. I hope that makes sense. So we tried to normalize this a little bit: we moved these things into a separate stack-trace structure, and that allows us to de-duplicate these.
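A hedged Go sketch of that normalization (the type and field names are invented for illustration, not taken from the actual proto): samples reference a shared stack-trace table by index, so many label sets can point at one location sequence.

```go
package main

import "fmt"

// In a pprof-style Sample, each sample carries its full location ID list,
// so the same stack is repeated for every label combination. Normalizing
// moves the stack into a shared table referenced by index.
type StackTrace struct {
	LocationIDs []uint64
}

type Sample struct {
	StackTraceIndex int // points into the shared stack-trace table
	Labels          map[string]string
	Value           int64
}

type Profile struct {
	StackTraces []StackTrace
	Samples     []Sample
}

func main() {
	p := Profile{
		StackTraces: []StackTrace{{LocationIDs: []uint64{1, 2, 3}}},
		Samples: []Sample{
			// Two label sets share one stack: no duplicated locations.
			{StackTraceIndex: 0, Labels: map[string]string{"span_id": "a1"}, Value: 5},
			{StackTraceIndex: 0, Labels: map[string]string{"span_id": "b2"}, Value: 7},
		},
	}
	fmt.Println(len(p.StackTraces), "stack traces,", len(p.Samples), "samples")
}
```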
J
In addition to that, we are splitting out the global profile context. It's called "scope profiles" here; I think it's called that because it's somehow similar to naming in OTel, I'm not super sure. But the point is that we do this split to represent the profile, the holding structure that we send, and then separately represent all those individual profiles that are associated with traces. That's where the linkage happens.
J
It has things like trace ID and span ID, things like that. And the last thing I'll say is that we also started to incorporate other types that are defined in the OTel spec into this format. For example, we use the KeyValue structure that is already defined, to encode attributes, which I think are kind of the same as labels.
J
And I think we do a few other similar things like that. This concept of specifying whether a metric is cumulative or not, I think, is copied from a similar concept in the metrics spec. There's a doc that describes these in more detail; I'm happy to answer any questions people have so far.
J
Oh, you're talking about the time fields; I see what you're saying. Yes, it adds some nice things on top of pprof as well: start time and end time, also kind of global.
J
Actually, wait, let me think about that; maybe it doesn't do that yet. Another thing we talked about earlier is adding global labels to the whole profile. Maybe that's not here in this version.
K
John, sorry, I need to rename that; it should say Pete. I'll answer either way. Dmitri, I've got to say, those are really nice diagrams; thank you for taking the time to render them, very helpful. This comment is more about the thoughts I'm having as we dig into these details. One thought I realized we haven't discussed much: if you're just writing an agent, you may not want to comprehend the vast amount of thought that went into the normalization scheme within the protocol we're describing. What do we want the API into this thing to be, and what's the division of labor? Because if we're gzipping or hashing or doing something else, it starts to raise the bar quite a bit on what the person on the other end of this is going to do, and how they're going to interface with it.
K
Is the thing we're going to invent here responsible for taking something simpler, like "I'm just going to throw you a stack trace," and then some other piece of machinery that I don't own encodes it this way for me? Because that's kind of my secret hope, if this is where we want to go. And the other thought in my mind: I do think the convergence around scaled-up, data-center-wide profiling is very real, but I do see a few differentiations in how that's done. Some of the pprof discussion is "we're going to target something for 10 seconds and get that," and then there's some other version of things where we're always running profiling, maybe at a lower frequency, sending everything, and we're very efficient about how we send it.
K
Oh, sorry, final thought, and I think this might be something we just do: once we do this, we can do some back-of-the-envelope math on the bytes per packet and the packet rate we're going to send out, in some vague sense. There should be a very easy, back-of-the-envelope way to comprehend how much data is going to be sent over the wire.
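As an example of the back-of-the-envelope arithmetic Pete is asking for, here is a tiny Go sketch; every number in it is an assumption for illustration, not a measurement.

```go
package main

import "fmt"

func main() {
	// Assumed inputs; all illustrative, not measured.
	const (
		hosts          = 1000
		hertz          = 20 // samples per second per host
		bytesPerSample = 64 // average encoded size after dedup
		reportEverySec = 10 // batching window
	)
	bytesPerHostPerReport := hertz * reportEverySec * bytesPerSample
	totalBytesPerSec := hosts * hertz * bytesPerSample
	fmt.Printf("per host per report: %d bytes\n", bytesPerHostPerReport) // 12800
	fmt.Printf("fleet-wide: %d bytes/sec (~%.2f MB/s)\n",
		totalBytesPerSec, float64(totalBytesPerSec)/1e6) // ~1.28 MB/s
}
```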
A
Thank you; those are all great points. I added to the goals section that it should be reasonably easy to use this new format. Obviously we will have to define what "reasonably easy" is, but the idea is that it's not as much of a headache for those who have to work with it. All great thoughts. Jason, you had something you wanted to add?
L
Thanks, Ryan. Dmitri, thanks for showing that; I'll try to keep it short. I'm curious whether, in that format, you have anything that accounts for thread identity for multi-threaded languages, and whether that's important at all. And then I have a follow-up.
J
It doesn't have that out of the box. If you treat thread IDs as just another attribute or label, you could encode that information that way. I'm kind of curious, actually: the way our system works, we're never really interested in the specific thread ID of where something was running. So I wonder, why is this even important?
L
Yeah. If one of the goals of profiling is to track down where you have hotspots or problem points, those can often be associated with a problem thread. That's a place for a developer to go look: if I have the name of a thread, that can help me pinpoint it.
L
IDs or names, it's probably both, actually. I think the ID is probably more unique, but to a human user it's meaningless, right? So it's probably both.
B
It just seems to get into the land of debugging, because when I collect the data from hundreds of production machines, I don't know what the thread IDs are. Usually it's cattle, not pets.
L
Yep, respect that, okay. And then I noticed, in the linking with span and trace identity, that you included attributes there, and I'm wondering what the need for that was. Are those the same set of attributes that exist on the span itself, or a different set of attributes, just from the data model that you showed earlier?
A
Assume,
yes
matt,
you
have
no.
B
Sorry, I actually was muted. I saw that OTel labels, or the KeyValue structure, are used. Is that typed, or is it just string/string? Because pprof labels can have a numeric value and also have a unit field.
A
Matt, you had your hand up.
G
Yeah, hi everybody. I wasn't able to attend the first two meetings, so if this was already covered and it's not in the notes, apologies. In the past I've written profilers for embedded systems that use things other than timer interrupts as the predicate on which to collect a sample. For example, you might care about hot spots for TLB cache utilization, or other sorts of cache hit/miss counters, where every nth TLB miss you might want to generate an interrupt, something like that. So, in all of this discussion of protocol wire formats, are we making an implicit assumption, in sort of where we are now, that this is timer-based, or timer-interrupt-based, sampled profiling? Because if that's true, then all you need is the frequency or something like that.
G
But I would propose a more open or extensible definition of what the trigger for these samples is. That was one thing; I had two other responses to other things that were said, but on this topic, that was my primary rumination over the last week or so.
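A sketch of what a more extensible trigger description could look like, in Go. The field and constant names are entirely hypothetical and not from any existing spec: the profile declares which event drives sampling and at what period, instead of hard-coding a timer.

```go
package main

import "fmt"

// TriggerKind names the event that fires a sample. Timer interrupts are
// just one case; hardware counters like TLB or cache misses are others.
type TriggerKind string

const (
	TriggerTimer     TriggerKind = "timer"      // e.g. every 10 ms
	TriggerTLBMiss   TriggerKind = "tlb_miss"   // e.g. every Nth TLB miss
	TriggerCacheMiss TriggerKind = "cache_miss" // e.g. every Nth I-cache miss
)

// SampleTrigger describes the predicate: one sample per Period events of
// the given kind (nanoseconds for timers, an event count otherwise).
type SampleTrigger struct {
	Kind   TriggerKind
	Period int64
}

func main() {
	timer := SampleTrigger{Kind: TriggerTimer, Period: 10_000_000} // 100 Hz
	tlb := SampleTrigger{Kind: TriggerTLBMiss, Period: 10_000}     // every 10k misses
	fmt.Println(timer, tlb)
}
```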
G
Sure. For an example, and this is now 20 years ago: our team was porting some stuff from Win32's network stack down to Windows CE, and we were moving from an x86 CISC architecture to a MIPS RISC architecture, where suddenly pipelining and cache utilization are more predictive and determinate of performance than, say, CPU speed or where time was spent.
G
In the case of that networking code, one big pass through a TCP packet-processing loop would just obliterate the instruction cache on a MIPS, but not on an x86 that had larger I-caches. So when you're asking how come everything is working but it's 40% slower, you start caring about instruction-cache and data-cache utilization and hotness versus, say, time spent. So we took our sample-based, Monte Carlo-style, pprof-style profiler and just changed the predicate.
G
Instead of a timer interrupt, it's now generating a statistical profile of where you're blowing your I-cache. These kinds of profilers, I suspect, will be needed as we try to diagnose scenarios with lots of edge-computing nodes running non-x86 hardware, maybe on custom SoCs that have their own counters. So that's a concrete example of why I think it might be important to have a permissive, extensible way to describe what's generating these profiling reports.
G
What I wasn't sure about is: in the canon of all the modern profiling formats, is this already captured by any of them, or are they all pretty much implicitly timer-sample-based? On the matter of thread IDs: they can be really useful for debugging, but there are a lot of scenarios where thread pools might be in effect, or Go's use of its concurrency model, which doesn't always jibe exactly with the physical processors.
G
So I think thread IDs are useful, but in non-single-threaded scenarios, or scenarios where you have a work pool asynchronously processing things, sometimes they can cause a little bit of confusion. The last little thing, and I think I already heard it answered, was a minor finer point on labels: Dr. Sites's book Understanding Software Dynamics, which just came out, uses labels in a profiler at the kernel level with a base64 encoding.
G
So if we're willing to throw away a case and just have uppercase characters, then when we get down to benchmarking how many bits go on the wire, that can be a quick, easy way to encode labels that chops a bunch of storage out. But really, that first point was the one I'm most interested in hearing what people think about.
A
Thanks for adding that; that's a lot of information. I don't know that we've necessarily come to a decision on that yet. I think that's kind of what people were mentioning at the beginning of this call: it's something we still need to figure out.
A
Sounds good, yeah. Florian in the chat mentioned he double-plusses that as well, so we can definitely add that in. There was one more custom format, and we have a little bit of time, so I was going to ask Pete, if he's still here... oh yeah. So last week we talked about a couple of custom formats, and the Elastic profiler people mentioned theirs.
A
Just as a recap there: the idea is that those using custom formats obviously found something inconvenient, or something that could be improved over the existing formats, and we're trying to better understand that. Pixie uses a slightly custom format as well, so I was wondering, Pete, if you wanted to give us a quick overview of what that is in the time we have left.
K
Sure, all right. Thank you for the interest. I would describe our format as minimally viable. We went with a flat representation where all the stack traces go into a single table, with columns to represent the stack trace, and the schema for that table is directly translated into a proto format. That becomes the wire format, and it gets unpacked on the other side when we need it. Because it's flat, there is not as much efficiency in this. We basically put a timestamp; a stack-trace ID, which is loosely an integer that represents this particular stack trace; the stack trace itself as a string, which might have symbols or might have virtual addresses, depending on whether we were able to get the symbols populated for that stack trace; and then the count, which is essentially the frequency of that stack trace over the window of time in which it was collected.
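Based on Pete's description, one row of that flat table is roughly the following; this is a hedged reconstruction in Go, not Pixie's actual schema.

```go
package main

import "fmt"

// ProfileRow is one row of the flat stack-trace table described above:
// a timestamp, an integer ID for the stack, the stack rendered as a
// string (symbols or raw virtual addresses), and how often it was seen
// in the collection window. Field names are guesses from the talk.
type ProfileRow struct {
	TimestampNS  int64
	StackTraceID int64
	StackTrace   string // "main;serve;encode" or "0x4005f0;0x400720;..."
	Count        int64
}

func main() {
	row := ProfileRow{
		TimestampNS:  1_655_000_000_000_000_000,
		StackTraceID: 42,
		StackTrace:   "main;serve;encode",
		Count:        17, // times this stack was sampled in the window
	}
	fmt.Printf("%+v\n", row)
}
```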
K
So this is actually probably the simplest one I've heard so far. And, to add a little context on push versus pull: Pixie was originally conceived to not require customers or users to deploy additional resources.
K
So we constrain our memory usage, but we do store the profiles on the host where they are collected. You don't necessarily have a long history, but if you want to see a profile, you just ask the cloud component to go pull the profiles off the hosts, and it does so and renders them. In terms of data over the network, we're not always sending the data; we're just sending it on demand, so it wasn't really a network-bandwidth problem for us. I think the discussion around how to make this efficient is quite fascinating, and I've certainly seen a lot of really neat ideas out there.
M
This is Omid; I work with Pete. I think it might help: I'm a visual person, and I liked the diagram that was presented before. That really helps as we go through all the different implementations, look at their pros and cons, and go through this process. It would really help to have visualizations, and maybe we can put together just one picture for that purpose.
M
Yeah, Pixie did go with a very, very simple one, and we had plans to optimize it. Commenting on Pete's earlier point about the API: we were considering it more as just the API level of how we get it out to the rest of our infrastructure. But yeah, maybe we can just share something out, Pete.
A
Yeah, that would be great. I kind of suspected that this part of the conversation, about formats and such, would take some time, for sure. We'll definitely talk about how Elastic is doing stuff related to this next week, so maybe if you want to share that next week, that would be good in the meantime.
A
One more thing: I'm curious if anyone else has thoughts on what good next steps would be here. It sounds like one of the biggest themes is figuring out something around benchmarking: how we're ultimately going to evaluate these formats, and on what criteria. That seems to be a big theme as we continue to evaluate the existing formats. And then also some more granularity on the various goals that we've mentioned, and the context that matters as we approach those goals.
A
So, as I think Sean was mentioning: understanding not only the formats themselves but also the context in which they're most used is important stuff to add as we continue to evaluate the existing formats. Is there anything else anybody thinks should be top of mind currently? I'm also curious whether Tigran or Morgan have anything to add. But first: Matt, you have your hand raised.
G
Super brief. As we decide what happens next moving forward: there are some nice examples, for instance in the Prometheus data block format, where they have a scheme in which every block can be encoded with a different compression, or have a different encoding.
G
So, in talking about efficiency over the wire, this compression versus that compression, I personally would again favor sort of an open protocol that allows for a variety of different encodings or compressions or what have you. That might make this protocol more about what the data on the wire is, and leave some of the other scenario-specific concerns to be fulfilled within the protocol, but in a well-formed way, if that makes sense.
A
Yeah, that's a great point.
F
Yeah, just one comment: OTLP supports arbitrary structured events, right? These events can also represent profiling events, and it's possible to do that. It will likely be less efficient than a custom format designed specifically for profiling.
F
I think it's important to show how much worse it will be if you do this using generic OTLP.
F
All right, so what I was saying is that there is a way to represent these profiling events using OTLP. It would be great if you could show that whatever custom, profiling-specific format you're designing is significantly better; that would be a strong argument in favor of doing it. If you don't show that, it kind of weakens the argument for why you want a custom format and why we don't just use OTLP.
F
So I think we're out of time, so I'll stop here.
A
Yeah, that's awesome, thanks. We are out of time. If anybody else has any thoughts, feel free to throw them in the Slack channel. I'll try to organize these notes and then send them out with a recap.
I think we have a good start on next steps. Next week we'll talk a little more about the Elastic profiler, as well as potentially a more visual representation of Pixie. Thanks, everybody, for coming, and we'll hopefully see you all next week. See ya, thanks.