From YouTube: 2021-08-10 meeting
Description
No description was provided for this meeting.
B
I have three topics. One is regarding the exporter PR. I ran into a technical issue: my clone of the repo has some corruption. So I'll share my screen, quickly go through the changes, and get some feedback, because I have questions; then I'll polish the PR and fix the technical issue. On my side there are two other PRs: one is from Victor, who started a new PR trying to address the major feedback, and another is currently marked as draft from Josh. We can talk about that later.
B
Would you add that to the agenda? We'll cover that, sure. Thank you. Okay, so for the exporter: if you can see my screen, let me see if we can make this work.
B
The structure is very similar to the trace spec. However, there are two things that are special to the metrics part. Number one: we need to define what the data type is, and I believe our current thinking is that the data type should be a per-language representation of how OTLP models the data.
B
So instead of trying to declare that in detail, I would just have some very high-level information saying: this is the language's choice as the native representation of OTLP, without specifying any details. Later, if we want to add something to OTLP, we can do that. Would folks be okay with that?
C
The only thing I want to call out (sorry, I was a little late to this), the only thing I want to call out is that, as in tracing, this also concerns resource and instrumentation library in OTLP.
C
You have this notion of resource metrics, and then instrumentation library metrics, and then you have the actual metrics, right? And we might actually flatten that in practice, which is kind of standard in traces. So I am totally on board with letting languages decide the right way to format that; you might want to call out, like, here's what most languages did with traces, and we're probably going to do a similar thing for metrics.
C
Okay, the only reason that's important is the actual interface for an exporter.
C
Does it take a resource as an argument, or is that bundled into the list of metric data that it gets, right? And if it is bundled in there, are we pre-aggregating, or pre-grouping, the metrics based on meters and resource?
B
Okay, so here we're saying the exporter must support two functions: Export and Shutdown. ForceFlush is not required, although it's mentioned here; I believe that is due to history. Initially, when we released the spec, we only asked for the two; ForceFlush was added later, and that's why, for compatibility, it is a nice-to-have but not mandatory for metrics.
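The shape being described above (two required operations plus an optional ForceFlush) can be sketched roughly as follows. This is an illustrative Python shape, not the spec's normative interface; the class and method names (MetricExporter, ConsoleExporter, ExportResult) are assumptions for the example.

```python
from enum import Enum
from typing import Sequence


class ExportResult(Enum):
    SUCCESS = 0
    FAILURE = 1


class MetricExporter:
    """Sketch of the two required operations plus the optional ForceFlush."""

    def export(self, metrics: Sequence[dict]) -> ExportResult:
        # Required: deliver a batch of metric data to the destination.
        raise NotImplementedError

    def shutdown(self) -> None:
        # Required: release resources; no exports may happen afterwards.
        raise NotImplementedError

    def force_flush(self, timeout_millis: int = 30_000) -> bool:
        # Optional (added to the spec later, kept non-mandatory for
        # compatibility): push any buffered data out before returning.
        return True


class ConsoleExporter(MetricExporter):
    """Trivial concrete exporter used to show the contract."""

    def __init__(self):
        self.exported = []
        self._shutdown = False

    def export(self, metrics):
        if self._shutdown:
            return ExportResult.FAILURE
        self.exported.extend(metrics)
        return ExportResult.SUCCESS

    def shutdown(self):
        self._shutdown = True
```

A concrete SDK would wire the periodic reader (or the pull endpoint) to `export`, and only push-style pipelines would give `force_flush` a non-trivial body.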
B
Okay, and the biggest trouble I have is ForceFlush, because ForceFlush, from a semantic perspective, only makes sense for push exporters: when the SDK is actively sending data to the destination, then you can force a flush. So what I'm trying to do, from the top-level description: if you look at the exporter interface, I still try to inherit the spirit from tracing. Everything is basically worded the same way as in tracing.
B
Another possible approach is that we don't try to restrict this. For example, say someone calls ForceFlush while the exporter is still busy replying to the scraper. Then the flush can wait: for example, you can block for a few seconds until that HTTP response has finished, then return. That might be a good idea, but I'm not seeing a huge benefit, and I think it tends to create confusion. So here's my proposal.
C
So I think the use case we've seen for ForceFlush, from OpenCensus, has been cloud functions, function-as-a-service style compute, where you need to get metrics out when the thing is killed: so flushing on shutdown and force flushing. That's specifically an area where you need to pull metrics out.
D
I feel like this might be a little draconian. I mean, if you're doing both push and pull, and you call ForceFlush and you've pushed your metrics, I would have expected that ForceFlush just means the next pull will succeed at seeing the data that I just flushed. It wouldn't cause an error; that seems a little surprising to me.
B
Yeah, good topic; that brings up my last topic here: the in-memory exporter. I'm trying to say that the in-memory exporter should support both push and pull. Number one question: do people think this is a good thing, or do you think the in-memory exporter should be push-only or pull-only?
B
Okay, so imagine you write a unit test. Sometimes people might have some scheduled process, for example saying: every five seconds, harvest the metrics and send them to the in-memory exporter; and then they will check some state, like a list or something. But the problem is you have to handle the timing. The benefit of a pull exporter is that you can report all the measurements as much as you want, and after you think you're done...
B
...you just call the pull metric exporter. Using the SDK, you're saying: I want to harvest the metrics. And at that point it gets everything that has been reported so far, so it's very predictable.
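The test pattern described above (triggering the harvest yourself instead of waiting on a timer) could look roughly like this. The InMemoryMetricExporter class and the collect_fn hook are hypothetical names for this sketch, not the SDK's actual API.

```python
class InMemoryMetricExporter:
    """Illustrative in-memory exporter usable in both modes:
    push (the SDK periodically calls export) and pull (the test
    triggers a collection and reads the result directly)."""

    def __init__(self, collect_fn=None):
        # collect_fn stands in for the SDK hook that harvests all
        # measurements reported so far (hypothetical name).
        self._collect_fn = collect_fn
        self.points = []

    def export(self, batch):
        # Push path: the SDK delivers batches on its own schedule.
        self.points.extend(batch)

    def pull(self):
        # Pull path: deterministic for tests; harvest now, on demand.
        harvested = self._collect_fn() if self._collect_fn else []
        return list(self.points) + list(harvested)


# Usage in a test: no sleeping for a 5-second harvest interval;
# the test decides exactly when collection happens.
reported = []
exporter = InMemoryMetricExporter(collect_fn=lambda: list(reported))
reported.append({"name": "requests", "value": 3})
snapshot = exporter.pull()
```

The point of the pull path is predictability: the assertion runs against whatever had been reported at the moment of the pull, with no timing races.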
D
That's what I was trying to say: flush puts the data into an in-memory place where a pull exporter can get it. That's what I would expect. It's almost like, language-wise, calling a pull exporter an "exporter" is hard; it's not really an exporter. That's why we're getting into trouble here. I think that if we described pull-based exporters as an operation over the in-memory exporter, which makes sense as terminology, then we would get out of this trouble.
D
But which PR were you referring to just now?
B
D
I would like to talk about that PR anyway. I was looking at the PR and I couldn't respond to it in GitHub today, because GitHub's down. But Josh's last comment was something like "want to confirm a few points," and I feel like there's a conflict here. The point was, Josh wrote: default aggregation for synchronous instruments is delta right now, not cumulative. And to me that is an implementation detail, because I don't care how the SDK is actually implemented.
D
What I do care about is that the default SDK we put together works for Prometheus, and delta as the default does not give that outcome. I'm not sure how to write this down in our specs, but what we've done is say that the API for synchronous instruments receives deltas, and that's not saying anything about how you aggregate. I thought Josh had said this really well two or three weeks ago: there are two different viable implementation strategies here, and anytime we say something about the aggregator...
D
...it seems that we're saying something about the default implementation strategy, and that's not what we want to do, right? Anyway, I'm just trying to say: I don't think the default aggregation should be delta. The default behavior should be cumulative, and it doesn't matter how you aggregate; that's a question of how you implement the SDK.
C
Exactly. What I wanted to call out at that point is: I actually think the default aggregation should probably start as cumulative, and you should have to opt into delta. Specifically, if we think about this aggregator configuration as "I configure a view, and I get metrics out of an exporter," right, then: what do my exporters see? The default should probably be cumulative to start with.
D
And you opt into delta. The implementation of that is to have a sort of store-and-forward, where I'm calculating deltas over short periods and then adding them together, and that's how I get cumulative. And Victor's point, which I learned something from, is that you could totally rearrange your SDK and do it differently: have your cumulative aggregators be computed right away, on the fly, so you never get a delta there. That's totally valid, but it changes the memory, the synchronization, and all the management of your data.
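The two implementation strategies just described (store-and-forward of deltas versus computing cumulatives on the fly) can be sketched like this; both yield the same cumulative series, which is the point being made. All names here are illustrative, not SDK API.

```python
from collections import defaultdict


def cumulative_from_deltas(interval_deltas):
    """Store-and-forward: each collection interval produces per-key
    deltas; the cumulative series is obtained by summing the deltas
    seen so far at each interval."""
    totals = defaultdict(float)
    series = []
    for deltas in interval_deltas:
        for key, value in deltas.items():
            totals[key] += value
        series.append(dict(totals))
    return series


class CumulativeOnTheFly:
    """Alternative strategy: keep cumulative aggregators up to date on
    every measurement, so a delta never materializes. Same result,
    different memory and synchronization profile."""

    def __init__(self):
        self.totals = defaultdict(float)

    def record(self, key, value):
        self.totals[key] += value

    def collect(self):
        return dict(self.totals)
```

Whichever strategy the SDK uses internally, the exporter sees the same cumulative numbers; that is why the spec can talk about the default temporality without prescribing the implementation.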
C
Yeah, so I guess the question I have then is... I stated those as implicit assumptions in the spec that I wanted to make sure we walk through and agree on. My question is: if we agree that the default aggregation coming out of Prometheus needs to be cumulative for these things, do we also agree (I don't know if we agree on this) that the default aggregation for OTLP should also be cumulative?
C
I would argue that in the spec it should say: the default aggregation, from the spec perspective, would then be cumulative for all the metrics.
C
Specifically because, again, the specification should focus on the configuration that goes in from the user and the metrics that come out in the exporter, or that the exporter sees, right? There's an open question of how we want to deal with an exporter that gets delta metrics and can't deal with them. I think that's one of my other points that I talked about, that I think is not addressed in the PR; nor do I think we should address it in that PR.
C
I think we need a whole PR to address exporter influence on aggregation, but I did want to call that out: that's a whole thing that we had talked about a whole bunch, and it's not in the PR. I'm kind of happy with it not being in the PR; we say cumulative by default for now, and then we can talk about exporters influencing default aggregation temporality...
C
...as a follow-on thing. I had a third point too; I need to go look at what it was.
C
Yeah, but it also means that the collector can ignore dropped points, because it has cumulative points. So by default it has some nice properties around guaranteed delivery, where we don't have to worry about it as much. I mean, you still have those missing points, but it's not as dangerous as, say, deltas.
B
Do you think the collector could do the delta-to-cumulative conversion, with a way of reporting missing data, so that we would have confidence that the conversion is accurate? Or do you think this is a problem we could solve in principle, but technically it wouldn't work, so cumulative by default would just be much easier to handle?
D
But having limited cardinality and high memory usage is just a choice you get, and because Prometheus is the dominant ecosystem, we make that choice by default and go with cumulative. But many vendors support deltas, and I'm hopeful that the open-source ecosystem will begin to support deltas.
C
Yeah, just to echo that: I think there are trade-offs on both sides that need to be made; neither side is a panacea. And at this point, leaning in favor of being Prometheus-compatible is how we've been going. I totally agree with Josh that if we could get to a point where deltas were the optimal way to go, that would be ideal from our own overhead standpoint.
C
You know, if we were only exporting deltas, we could really minimize the cost of using our SDKs, and there's some power there, right? But then there's a whole slew of other problems we have to deal with. So it's balancing the two, and until the databases handle those problems for us, we have to handle them with cumulatives a little bit, specifically for Prometheus. So anyway, I still think cumulative as the default is the right way to go.
C
Yeah, yeah, it just loaded. Okay, so the last one is the public names that we expose in the SDK. Right now we have it exposing the name "gauge," the name "sum," and the name "histogram," where gauge means keep the last value, sum just means add up points, and histogram means fixed-bucket histogram.
C
The reason I raise this is: if we want to provide other types of aggregators, not last-value aggregators, that produce sum metrics, do we think that's going to be confusing? Is that worth talking about? The second thing is, for histogram: do we want to use a name in the view API that is explicitly "fixed-bucket histogram," so that we have freedom to change to, or prototype, a new type of histogram without overloading the word "histogram" so that it only means fixed-bucket?
B
I would guess that histogram can be arbitrary buckets, and there can be special cases. For example, if you specify explicit boundaries, then it should work; and of course you can model exponential buckets in this way, but you won't be able to get all the benefits. Then you can special-case it: if it's exponential buckets, and especially base two, then you get very high performance. But the model shouldn't just assume that everything will be explicit buckets, or that it will be exponential.
D
I agree. I think the term "histogram" should just mean "I'm doing something with histograms," because likely the SDK, or the distribution, will have an ideal default, and you may not even have the code linked in for some other type of histogram. You may only have one or two options, and I think "any histogram" is probably what is meant in the view; if we want to get more specific, I think it's a sub-option.
C
Okay, so to clarify: in the proposal, would you say "create view, here's my instrument selection criteria, and then I want to aggregate as a histogram, and here are my fixed buckets as options," right? Or would we say "I want to aggregate as a fixed-bucket histogram, and here are the options"? You see what I mean, the difference? Versus just "I want to aggregate as a histogram," with no options, and then the SDK is free to choose what that looks like.
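The two naming styles being contrasted here could be sketched roughly as follows. The names "histogram" and "explicit_bucket_histogram," and the resolution logic, are assumptions for illustration, not the view API as specified.

```python
def resolve_aggregation(name, **options):
    """Hypothetical view-API resolution illustrating the two styles:
    - 'histogram': the SDK is free to pick the implementation (options
      may still steer it, e.g. explicit boundaries).
    - 'explicit_bucket_histogram': the caller pins the exact
      aggregator and must supply its configuration."""
    if name == "explicit_bucket_histogram":
        return ("explicit_bucket", options["boundaries"])
    if name == "histogram":
        if "boundaries" in options:
            # Explicit boundaries imply the fixed-bucket implementation.
            return ("explicit_bucket", options["boundaries"])
        # Otherwise the SDK picks its default; a base-2 exponential
        # histogram is assumed here purely for the example.
        return ("exponential_base2", None)
    raise ValueError(name)
```

With the generic name, the user does not have to know which histogram they are getting; with the explicit name, they trade that convenience for control over the exact aggregator and its options.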
C
Okay, so what's interesting here, then, is: do we feel like we have room with the current option strategy to say you can define a histogram where you have these fixed buckets, and we have another optional configuration parameter in the future that we can put there that allows this automatic bucketing strategy, or some other form, a sketch, right? The logarithmic...
D
...thing, for example. And I see your point: it sort of makes it more difficult to spec out what the options look like, because you're saying "histogram," and then you've got some explicit-bucket options, which imply an explicit aggregation, and you've got some exponential options, which imply an exponential one; and then what happens when you add a third, or whatever? Yeah, the spec becomes a little bit more murky, I guess, but the user doesn't have to think about what kind of histogram they're getting, which probably they don't.
B
Yeah, I can imagine in Python you'd just have something like a buckets= argument where you can pass an array, which is the explicit list of buckets. If you want some specific type, then you can do the right thing and pass, say, the base-two exponential buckets; or you can pass something like "smart buckets," and the SDK will automatically figure out what's the best option for you.
D
We spec out long names for all the aggregators and then have, like, spec abbreviations that say: if you choose the term "histogram," it means best available, or the like.
C
Yeah, yeah. So I'd say: if you use the words "gauge," "sum," "histogram," the SDK has freedom to pick the right aggregator for that thing. But if you say "I want a fixed-bucket histogram," then you know you're getting that specific aggregator with this specific set of config. If you use the word "histogram," you don't get that control.
C
That would be why they have the ability to use an explicit name instead. So if you're going to use the word "histogram," you're just saying "make a histogram, and I don't care how." If you use "I want a fixed-bucket histogram, and here are my buckets," then you're saying "here's the specific histogram I want."
C
In this case, the major concern there is the configuration parameters of the aggregator that's called "histogram."
C
I think the difference here is: for sum, we don't have any alternative aggregators planned that I know of. For gauge, we might have other aggregators planned, so it might apply there, but there's also no configuration for it whatsoever right now, so I don't know if it's as important. I think histogram is the only one that we have to talk through, because we know there's another implementation coming.
D
I think it might be even more general than that. Can you imagine any other aggregators for sum that don't compute a sum? I mean, are you thinking of, like, there's the one that uses atomic operations and there's the one that uses a mutex? That's the type of thing we're talking about, and they still compute the same result. Whereas with histograms, the space is so big that it's easy to imagine other alternative implementations; it's not so easy with sum and gauge.
A
I think for the histogram, the problem is the representation at the end: how this histogram is represented. For example, the HDR histogram, the high-dynamic-range histogram, has a different representation than a bucket-based histogram where the buckets are fixed.
A
So that means that if you are an exporter and you specify a different representation at the end, the backend and the intermediary systems have to understand this representation; otherwise they will not be able to do anything with it.
D
I see. So you're pointing out that you don't really care what the aggregator is; you care what data point type you get out. And that's a legitimate point: whether the view should state the type of histogram data that you get.
D
Well, at the moment we're expecting a second histogram, and so there would only be two. I think the question we're facing right now is whether, to choose a histogram, you have to say "exponential histogram" and "explicit histogram," or whether, if you aren't so inclined, you can just say "histogram" and let the compiler, or the build rules, whatever puts together the dependencies, choose the histogram implementation, which then implies which data point you're going to get.
D
So if you just say "histogram" and you only link in the exponential histogram code, you're going to get exponential histogram data out, and if you say "histogram" and you only link in the explicit histogram code, then you're going to get the explicit histogram. What we're discussing right now is how you influence that decision in a specified string, I guess. I see both being fine; I'm not sure I care.
C
There's also a practical bit right now: I agree your backend should influence what histogram you get; practically, I don't know if you have a way to do that just yet, and most people are doing that with configuration in libraries that support it. So if we support multiple histograms, then to the extent we can figure out how to have the backend influence that, we should; I think that's going to be through some sort of config-based mechanism that will influence views and defaults.
C
You know: when you see this instrument, use this histogram implementation that my backend supports. So we still need to actually specify how to configure that, so the SDK can set itself up appropriately; that influence should eventually come from the backend implementation, but we still have to specify what it looks like.
F
...to death by now. Anyone have any other points? Mine is more a question, but would that mean that you would either have all of your histograms as exponential histograms, or all of your histograms as explicit histograms? Or would you be able to mix and match those two? Because maybe you might want to have some histograms in an exponential form and some not in an exponential form.
C
I think... so, we're discussing the full configuration of the SDK, and if you look at what we specified with views, you should be able to accomplish...
C
...actually all of that, right? We should be able to select all instruments of a particular type that are histograms and say "use this as your default," and you should also be able to say "here's a specific set of instruments; make these this histogram." That's part of view configuration. We haven't gotten into how the programmatic view configuration turns into, say, something like a file-based configuration, or a remote, protocol-based configuration, but that's kind of what I'm suggesting here from a specification standpoint.
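The view-selection behavior described above (a type-wide default plus a per-instrument override) might be sketched like this; the View class, the matching rules, and the aggregation names are all hypothetical, not the specified configuration surface.

```python
import fnmatch


class View:
    """Illustrative view: selection criteria plus an aggregation."""

    def __init__(self, instrument_type=None, name_pattern="*",
                 aggregation=None):
        self.instrument_type = instrument_type
        self.name_pattern = name_pattern
        self.aggregation = aggregation

    def matches(self, instrument):
        if self.instrument_type and instrument["type"] != self.instrument_type:
            return False
        return fnmatch.fnmatch(instrument["name"], self.name_pattern)


def pick_aggregation(instrument, views):
    # First matching view wins; more specific views are listed first.
    for view in views:
        if view.matches(instrument):
            return view.aggregation
    return "default"


views = [
    # Specific override for one named instrument...
    View(name_pattern="http.server.duration",
         aggregation="explicit_bucket_histogram"),
    # ...and a default for every histogram-typed instrument.
    View(instrument_type="histogram", aggregation="exponential_histogram"),
]
```

A file-based or remote configuration, once specified, would presumably compile down to the same kind of ordered selection list.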
B
Okay, exemplars.
C
Yeah, I'll try to make this quick, so we can get to the columnar data. So this is waiting on the aggregator PR, because there's a component of this where we need access to the aggregator in some fashion to determine exemplars; I'm just waiting to see what the names of those aggregators will be. That's one of the open things. But in the meantime, there are three things raised in the PR to talk through. One is: when should we filter metric attributes on exemplars?
C
I honestly don't care either way. If it goes all the way to the exporter, I think we need to publicly expose some kind of filter mechanism on attributes in our public SDK, to be able to do this filtering in exporters; but otherwise it doesn't matter to me. The second thing I'm going to walk into real quick is: we have two ways of filtering, or sampling, measurements in the spec.
C
One is just a filter: should this measurement be eligible for sampling? The only reason this exists is to make it easy for users to turn sampling on or off: to turn off sampling of reservoirs, or of exemplars; to turn on sampling that is attached to traces that are sampled; or to just sample any possible measurement.
C
The expectation here is that that hook is mostly important for "sample exemplars with traces that are sampled" or "turn this whole thing off"; that binary flip is the only important bit of the filter. The reservoir will also make additional sampling decisions, so we actually have two pieces here making sampling decisions. That was raised as a "why does this exist, what's it for?" So, explicitly, that's why: so we can make this on/off decision with traces, or everything, and then the reservoir can do additional sampling.
C
Just from a practical-need standpoint: we don't want to be sending every single sampled measurement from a trace for a sum, for example; that's probably way too much data. So the notion of reservoirs gives us a fixed memory cost. That's the idea behind these two. So that's an open question; if we want to walk into that now, we can.
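The two-stage decision described above (a binary eligibility filter followed by a fixed-size reservoir) can be sketched as follows. The filter predicate and reservoir class are illustrative names, with classic reservoir sampling standing in for whatever sampling the spec ends up allowing.

```python
import random


def with_sampled_trace(measurement):
    # Stage 1: the binary filter. Here: only measurements recorded in
    # the context of a sampled trace are eligible for exemplars.
    return measurement.get("trace_sampled", False)


class FixedSizeReservoir:
    """Stage 2: reservoir sampling keeps the memory cost fixed
    regardless of how many eligible measurements arrive."""

    def __init__(self, size, rng=None):
        self.size = size
        self.samples = []
        self.seen = 0
        self.rng = rng or random.Random(0)  # seeded for reproducibility

    def offer(self, measurement):
        # Algorithm R: each eligible measurement ends up in the
        # reservoir with equal probability.
        self.seen += 1
        if len(self.samples) < self.size:
            self.samples.append(measurement)
        else:
            j = self.rng.randrange(self.seen)
            if j < self.size:
                self.samples[j] = measurement


reservoir = FixedSizeReservoir(size=4)
for i in range(1000):
    m = {"value": i, "trace_sampled": i % 10 == 0}
    if with_sampled_trace(m):       # filter first...
        reservoir.offer(m)          # ...then reservoir-sample
```

The filter answers the on/off question (traces only, everything, or nothing); the reservoir independently bounds how many of the eligible measurements are actually kept.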
C
We can also comment on the PR. And the last thing was: does the default of always having exemplars, and only sampling where traces are sampled, make sense?
C
Those are the three open questions. I don't know if you want to walk into any of them now or whatever, but I hope to see more comments on the PR around those three, and as soon as the aggregator PR is in, I will open that up for more review.
D
So to summarize: you're asking whether we all agree that the default should be sampling on by default, and I think you were looking for one more constraint. I would propose that it's, like, one exemplar per point per reporting period, and that says what you'll get.
D
I mean at least one, so that histograms can have one per bucket, but sums and gauges could give you one exemplar. And it seems like that might be controversial, because there's Prometheus infrastructure that hasn't been built to do this. Who's going to care about the exemplar of a gauge? That may not matter to anybody. But the sum exemplar is well-defined.
C
You'd have one exemplar per thing. I'm actually not suggesting we only have one per point at all; I'm leaving that unspecified. You can decide, for your reservoir, how big you want to make it: if you want to have two per sum, that's fine; if I want to do min/max sampling on sums or on gauges, that's fine as well. That is left unspecified; that's up to you.
C
Min/max exemplar sampling, totally fine; on every point, that's fine. That's allowed in the spec. It does expose that reservoir interface to users, to define their own.
B
And regarding the first question: I'm guessing the exemplar data point will be very similar to the span event, and if logging comes in later, it might be similar to the log record as well. My worry is, at least, I don't want to see the exporter having some filtering mechanism, because that's not what the exporter is supposed to do; and also, we give individual vendors flexibility to implement their exporters, so I can imagine eventually it will diverge.
B
I want that to be part of the core SDK functionality, and whether we have, like, five different processors, or we can combine them, I think it might take extra time. So my question is: what if we just tell people we're going to give all the attributes to the exporter, and by default we'll export them to Prometheus and OTLP; and if the feedback is that people think some are too costly to collect, or there's a privacy concern, and they want to change or alter that, we will cover that later?
C
Makes sense. The data model already requires that OTLP has filtered attributes, so we can provide all the attributes to the exporters, but at least for OTLP we'll have to filter them down. Further, for Prometheus, looking at the implementation, I don't know yet whether or not labels will be filtered in practice.
C
So, Riley, I think we might be spending too much time on this. You can always get back all the attributes, because the metric data point has a set of attributes, and the exemplar just has the attributes which are different from the metric data point's; so you should always be able to take the metric data point and the exemplar and restore every attribute. This is just a compression technique. The only open question here was: when does that happen? Does it happen in the SDK, or do we push that onto...
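The compression technique just described (the exemplar carries only the attributes that differ from the data point, and the union restores the original set) is straightforward to show; these helper names are hypothetical.

```python
def split_exemplar_attributes(point_attrs, measurement_attrs):
    """Keep on the exemplar only the attributes not already carried
    (with the same value) by the metric data point: the 'filtered'
    attributes."""
    return {k: v for k, v in measurement_attrs.items()
            if point_attrs.get(k) != v}


def restore_attributes(point_attrs, exemplar_attrs):
    """The union of the data point's attributes and the exemplar's
    filtered attributes recovers the original measurement attributes;
    exemplar values win on conflict."""
    return {**point_attrs, **exemplar_attrs}
```

Because the split is lossless, the open question is only *where* it happens (SDK or exporter), not whether information is lost.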
A
Okay, can I share the screen?
A
Okay, so a very quick introduction: I'm Laurent Quérel, and I'm new to this kind of meeting. I already met some of you, like Josh and Josh. So let's start with the motivation behind this benchmark I did, in order to demonstrate what we could do to improve the representation of multivariate time series and, to some extent, the representation of traces and logs in general. I'm working at F5, and we are producing a lot of multivariate time series. Multivariate time series means that we have multiple metrics related together, for the same timestamps.
A
So a good example of that is the x/y mouse movement: when you move the mouse, x and y move together, and there is no real interest in observing x and y separately. And we can imagine many, many multivariate time series like that.
A
So the problem with the existing OpenTelemetry protocol is that there is no efficient support for multivariate time series.
A
Obviously we can represent them, but to do so we have to create multiple univariate time series: one for x, one for y. And you can imagine that you have to do that when you have, let's say for a server, something like 12 metrics combined together: you have to create 12 univariate time series. The problem with that is that, most of the time, every univariate time series will share the same context.
A
So, for example, we were talking about processors, some intermediary processing: if they have to work at the multivariate level, they have to recombine those univariate time series together in order to apply some modification. If you have a backend that is able to store and process multivariate time series, again, you have to join this information back together.
A
So the experiment I did compares OTLP v1, the current protocol specification, with a version of it that represents events. You can map every multivariate time series onto events, and for those events we have a columnar representation: instead of having one row per event, or one row per metric, we have multiple columns.
A
So every column represents one of the metrics of this multivariate time series, and the same thing for the labels; you have arrays like that.
A
So for a batch you will get, for example, for a multivariate time series with six labels and six metrics, only 12 arrays: six for the labels and six for the metrics, and that's it. So even in terms of memory allocation it's much more efficient, because you just have to instantiate 12 arrays, instead of instantiating the number of events multiplied by the number of dimensions plus the number of metrics. So the second implementation is columnar-based, fully implemented in protobuf.
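The row-versus-columnar contrast described above can be sketched as follows: for a batch of N events with L labels and M metrics, the row form materializes N×M points, each repeating the labels and timestamp, while the columnar form allocates only L+M arrays plus one timestamp array, independent of N. The function names and event shape are illustrative, not the proposed protocol.

```python
def to_univariate_rows(events, label_keys, metric_keys):
    """Row-oriented: one data point per (event, metric); labels and
    timestamp are repeated on every point."""
    rows = []
    for e in events:
        for m in metric_keys:
            rows.append({"metric": m, "value": e[m], "ts": e["ts"],
                         **{k: e[k] for k in label_keys}})
    return rows


def to_columnar_batch(events, label_keys, metric_keys):
    """Columnar: one array per label and one per metric, plus shared
    timestamps; the allocation count is len(label_keys) +
    len(metric_keys) arrays, independent of the number of events."""
    batch = {"ts": [e["ts"] for e in events]}
    for k in label_keys + metric_keys:
        batch[k] = [e[k] for e in events]
    return batch
```

Beyond allocation counts, the columnar arrays are contiguous, which is what later gives the CPU-cache and compression wins discussed below.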
A
Okay, so the results (and we can go into the details in the next sections of this document): let's say this one represents the current one. The results are, for a typical multivariate time series, something like 18 times faster for this representation and 46 times faster for this representation.
A
So, as you see, the difference is huge. Obviously it's huge because we have a lot of redundancy when we are talking about multivariate time series; the difference will not be that big when we are talking about traces and logs. That really depends, obviously, on the type of traces and logs you have: if you have some kind of duplication, then you will see the difference.
A
If the amount of duplication is very minimal, then you will be close to one, but the gain will still be there, because we have a lot less memory allocation, and the serialization will be faster and better. So, at the end, I think even for traces and logs we have some win. Any questions on that?
A
Okay, so the scenario of the benchmark: I tried to create a bench where every step is represented, starting from the collection of the information. So we collect a batch of information, a batch of multivariate time series; we create the batch; we process the batch (this basically represents a processor doing something, for example some aggregation or some filtering); then a serialization step, a compression step, and decompression and deserialization.
A
So here you have OTLP v1; the error bars are represented vertically there; and you have, in red, the OTel columnar, and in green, the OTel Arrow. As you see, in terms of total time for all these steps, the difference is huge, and Arrow, the Arrow-based representation, is the clear winner.
A
I think that, as you are all aware, when you have to compute something and the data is collocated in a contiguous area of memory, you get the full benefit of the various CPU cache optimizations and accelerations.
A
So the batch processing also will be faster, just because we have organized the information in a way that is more friendly for the CPU and the memory. For the serialization it's the same thing: because we have a single array, it's just one copy, boom, and we can put that on the socket, or whatever the way is to transmit this information. And for compression, columnar data stores were invented for that.
A
I still have some parts that need more attention, because they are surprising, but definitively, what Arrow did was really improve the time spent in serialization and deserialization, and even the batch creation is much faster. What is surprising is the batch processing; I still have to do some work there to understand perfectly why, because I was expecting something very similar to this one, or even better. So take this part with caution.
A
We had some discussions, including with Josh MacDonald and Josh, regarding this multivariate representation, and the outcome was: why not create a new type of object, named "event," that we could use to represent metrics, logs, and spans all together in a columnar representation?
A
So with this benchmark we demonstrate, first, that it's feasible, and that it adds some good properties. In terms of SDK evolution, I think what we could do is introduce this new event without changing anything in the existing SDK, except that we can add support for, like, a switch: if you want to use the columnar representation, the same SDK could be used, but the representation of the information will be different; and we have a forward- and backward-compatible modification of the protocol supporting this batch of events.
A
The next diagram is more about the decisions that, in my opinion, we have. We have two decisions: do we want to go with this first representation, based on events?
D
Thank you very much. I don't have questions; that was wonderful. Speaking as a member of the TC, I think we should review Arrow: I'm not very familiar with it, and I think this is very promising. I think we're out of time to have much more discussion; I really appreciate that. And, speaking as a vendor, I really want columnar, or some sort of efficient multivariate encoding, especially for spans, which is another topic, for another forum: how this could really impact span recording as well.
D
I don't know that we have any more time to discuss it right now. I think this should be discussed in front of a larger group, and I also think we should see a proposal for a protocol change. Thank you very much; I think we should look at moving this forward. Okay.
C
Right, I want to echo that, and also thank you again, because now I feel like everyone can read this and understand what's going on, and the implications. So thank you.