From YouTube: 2020-07-31 meeting
B
Hi, I'm Nara. I work for Netflix.
C
Hey Matt, this is Sandra from AWS X-Ray.
D
Oh, hey Matt. Who are you with? I'll introduce myself: I'm Will Tran, I'm with Autonomic AI. We're building a connected vehicle platform, and sampling is very important to us. So if you need a user to talk to, I can be one of those.
D
Aws,
I've
actually
been
playing
around
with
secondary
sampling,
trying
to
make
some
modifications
to
the
jager
java
client
to
because
the
way
that
works
is
it
does
a
sampling
decision
at
the
very
beginning
of
the
trace,
and
then
that
decision
holds
for
the
rest
of
the
trace
and-
and
I
read
through
that
secondary
sampling
link-
that
was
posted
a
few
weeks
back,
and
that
was
a
great
idea.
D
I think the use case there is that you want to turn up the sampling rate for certain sections of a trace. In my case I have traces where there's a lot of noise and uninteresting stuff, and then there's this one critical path that's really interesting to me. So I want to turn down that noise, and I can do that through secondary sampling; you just have to reevaluate the sampling decision at every span, and that can be a little heavy on performance.
B
Yeah, so the link that you got is basically the experiment that we did. We do have a prototype, but I haven't pushed anything to production yet; I'm still trying it out.
B
So that's just the idea. Another way of implementing it, which is much easier on the client library, is to collect a hundred percent but do the secondary sampling in the back end while the data is streamed. But then another challenge is that now we have to collect a hundred percent and publish a hundred percent.
D
Yeah,
that's-
and
that
was
my
that's
my
fear
in
that
I
mean-
and
maybe
it's
just
a
premature
assumption
of
mine,
where
I
can't
afford
to
collect
a
hundred
percent
of
everything.
Maybe
there
are
some
certain
use
cases
that
I
must
collect
a
hundred
percent
of,
but
in
general
I
cannot
collect
a
hundred
percent
of
of
everything
going
through
the
platform
and
then
post-process
it
it's
just
it.
D
I fear it would be too expensive, and so I'm trying to do as much work as possible up front to cut down how many spans my pipeline needs to process.
E
That makes sense. Yeah, at New Relic we have a sampling strategy that's reservoir sampling. I was trying out the new sampling API that we talked about last week; that was my first week, so I'm still somewhat new to all this. But New Relic does the sampling just before harvest, rather than when a span is created, so that it can determine how many spans have been created; and if we haven't created more than some maximum number, then we just process them all.
E
Reservoir sampling is what we call it internally. I don't know if there's a wider term for it, but yeah.
F
That's a standard term in the computer science literature; you can find several algorithms, and they're good things. I actually want to see us get some standard reservoir sampling, because it gives you the ability to do statistically accurate summarization of data that you can't collect all of. So I'm glad to hear that, and I think that's an answer to your question.
F
I
I
dialed
in
because
last
week
we
had
sort
of
some
action
items
about
getting
back
to
this
priorities.
Concept,
which
was
going
to
be
there's
gonna,
be
some
follow-ups
from
a
lolita.
I
don't
see
her
here,
and
so
I
don't
actually
have
an
agenda
or
I
was
just
gonna.
Listen
in.
F
But
as
long
as
nobody's
talking,
I
I
would
wanna,
I
do
have
an
idea
of
the
pitch.
I
can't
remember
which
issue
it's
been
discussed
in
the
most.
It
is
that
I
do
enjoy
reservoir
sampling.
I
think
it's
really
something
we
ought
to
be
doing,
and
so
there's
been
this
proposal
to
have
a
sampling
probability
somehow
encoded
as
data
on
the
spam.
F
So
this
in
the
exporter
now
you've
done
some
reservoir
sampling
and
the
thing
about
reservoir
sampling
is
it's
in
size.
That's
the
word
reservoirs
for
fixing
size
and,
if
you
end
up
seeing
more
data
that
can
fit
it's
going
to
selectively
drop
them
in
an
ideally
unbiased
way,
so
that
you
can
then
extrapolate
from
the
statistics
you
get
out
and
summarize
the
whole
population.
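[Editor's sketch: the fixed-size, unbiased reservoir described above, written as the classic "Algorithm R" in Python. This is a minimal illustration under stated assumptions, not an OpenTelemetry API; all names are made up.]

    import random

    def reservoir_sample(stream, k):
        """Keep a fixed-size, uniformly random sample of a stream.

        Every item seen has the same k/n chance of ending up in the
        reservoir, so sums over the reservoir scaled by n/k estimate
        sums over the whole population.
        """
        reservoir = []
        n = 0  # total items seen
        for item in stream:
            n += 1
            if len(reservoir) < k:
                reservoir.append(item)
            else:
                j = random.randrange(n)  # replace a slot w.p. k/n
                if j < k:
                    reservoir[j] = item
        return reservoir, n

    kept, total = reservoir_sample(range(10_000), k=1_000)
    sample_count = total / len(kept)  # each kept item "represents" ~10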
F
Now,
if
you're
doing
that
and
you're
doing
this
correctly,
it
means
that
you
can
see
one
span
in
your
exporter
and
and
say
this
represents
10
spans
in
the
output
and
there
are
sort
of
simple
uniform
strategies
for
for
this
type
of
sampling
and
there
are
more
complicated
weighted
strategies
for
this
sort
of
sampling,
but
both
in
both
cases.
What
you
get
out
at
the
end
of
your
analysis
is
an
estimate
of
either
probability
or,
if
you
invert,
that
you
get
an
estimate
of
count.
F
I
like
to
call
it
sample
count,
because
it's
in
the
natural
units
that
we
think
of
when
we
talk
about
statistics
of
these
things,
this,
if
you've
done
one
in
10
pro
sampling,
which
is
sort
of
a
probability
sampling,
I
would
call
fixed
probability
sampling.
Then
every
span
you
get
out
is
going
to
have
a
multiplier
of
10
on
it.
F
So
I
call
that
sample
count
if
you're
using
reservoir
sampling,
you
don't
know
what
that
counts,
going
to
be
until
you
actually
close
the
period
and
do
the
computation,
but
you
do
get
a
number
which
is
either
probability
or
count
and
that
can
be
used
to
generate
graphs
to
do
sort
of
approximate
analysis
in
your
downstream
system.
So
that's
why
I
was
here
to
advocate
that
we
actually
put
that
information
in
the
spam
and
I
prefer
to
see
it
as
a
span
data
field.
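[Editor's sketch of the proposed "sample count" field: the inverse of the span's effective sampling probability. The field name and the dict-as-span representation are illustrative only.]

    def record_sampled_span(span, probability):
        # "This span represents 1/p spans": inverse probability.
        span["sample_count"] = 1.0 / probability
        return span

    # Fixed 1-in-10 probability sampling: every surviving span carries 10.0.
    kept = [record_sampled_span({"name": "op"}, 0.1) for _ in range(37)]
    # Summing sample_count over any set of kept spans estimates how many
    # such spans existed before sampling (~370 here).
    estimated_total = sum(s["sample_count"] for s in kept)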
F
That's
my
position
on
sampling.
I
don't
particularly
have
a
strong
feeling
about
this.
Other
concept.
That's
been
discussed
about
sampling
priority.
It
has
to
do
more
with
what
you
do
in
band
when
you're
calling
your
peer
to
tell
them
how
much
sample
you
think
you
have,
and
it
does
get
more
complicated
when
you're
doing
scale,
sampling
or
as
reservoirs
something
because
in
the
moment,
in
band
when
you're
doing
reservoir,
stamping
in
the
tail,
you
don't
actually
know
your
effective
probability.
F
D
Hi
josh.
Yes,
I
remember
talking
about
this.
This
sample
counts
issue
on
the
github
issue.
That's
that's
tracking,
this
stuff
and
I'd
love
to
get
some
some
more
momentum
in
in
defining
this,
and-
and
maybe
I
can
provide
some
more
examples
of
how
sample
count
could
be
useful.
D
You
may
want
to
have
like
multiple
samplers
running
through
your
data
and
if,
if
the
subsequent
sampler
can
pick
up
the
sample
account
of
input
data,
it
can
continue
to
use
that
in
its
output,
and
so
it
will
output
the
further
sampled
data
that
can
still
be
reinflated
to
represent,
I
guess
some
semblance
of
the
original
population.
D
So
yes,
that
would
that
would
support
subsequent
sampling
like
another
kind
of
sampler,
it's
a
little
more
simple,
I
guess
would
just
be
like
a
leaky
bucket
rate
limit
kind
of
sampler
and-
and
that
could
I
mean
it's,
not
probabilistic,
but
it
can
keep
track
of
what
it
throws
away
and
then,
when
it
outputs
something,
then
it
just
outputs
the
the
count
of
all
the
things
that
it
threw
away
along
with
its
output
span.
D
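[Editor's sketch of the rate-limiting sampler just described: non-probabilistic, but it counts what it drops and attaches that count to the next span it passes, so downstream consumers can still reinflate. A simple fixed-window limiter stands in for the leaky bucket; all names are illustrative.]

    import time

    class CountingRateLimiter:
        """Pass at most `limit` spans per second; remember the drops."""

        def __init__(self, limit):
            self.limit = limit
            self.window = int(time.time())
            self.passed = 0
            self.dropped = 0

        def offer(self, span):
            now = int(time.time())
            if now != self.window:
                self.window, self.passed = now, 0
            if self.passed >= self.limit:
                self.dropped += 1
                return None  # span is thrown away, but counted
            self.passed += 1
            # This output span stands for itself plus everything dropped
            # since the previous span that got through.
            span["sample_count"] = 1 + self.dropped
            self.dropped = 0
            return span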
F
Yeah, you just gave me an idea I'd never considered, about how to estimate the probability from a leaky-bucket type sampling scheme. You could sample the things you drop and then add them to the things you keep. I haven't really ever looked at that, but I am familiar with at least two good reservoir sampling algorithms.
F
Maybe
three
actually
and
I've
experimented
with
them.
They
work
I
really
like
them.
So
I
want
to
make
sure
that
we
we
all
kind
of
if
you,
if
you're,
not
familiar
with
how
to
get
a
estimated
sample
count
from
probabilities.
F
E
We
do
resource,
oh
sorry,
go
ahead,
please
I'm
just
going
to
say
the
way
that
we
do
reservoir
sampling
at
new
relic.
We
also
doing
incorporate
a
priority
so
that
it's
not
completely
random.
So
if,
for
example,
a
span
has
an
error
on
it,
we
can
increase
the
priority
and
thus
the
chance
that
it
will,
you
know,
get
processed,
and
so
you
know
we,
you
know
we
definitely
want
to
keep
track
of.
You
know
how
many
are
getting
dropped.
E
You
know
how
many
and
all
of
that,
but
it's
it'll
the
way
that
we
do
it
it's
a
little
harder
to
statistically
tell
you,
know
things
about
them
other
than
the
numbers,
because
you
know,
maybe
you
know
50
come
in
with
errors,
and
so
we
increase
the
the
priority
on
those.
So
all
of
them
get
sampled,
but
that
does
not
necessarily
represent
the
the
percentage
of
them
that
had
errors.
F
Yeah
I
see
I
I'm
I've.
At
least
I
think
I
I'm
trying.
Let
me
try
to
say
that
sounds
good.
I
think
there
may
be
ways
to
it.
Doesn't
I'm
not?
I'm
not
sure,
I'm
familiar
exactly
with
what
you're
doing,
obviously,
but
I
I
am
confident
there
are
ways
and
probably
ways
that
you're
actually
doing
to
get
fairly
good
signals.
Out
of
that
which
is
what's
what's
really
important
here.
F
I
think
the
the
debate
that
I
remember
in
one
of
the
issues
was
whether
there
is
a
firm
sort
of
mathematical
concept
here,
which
is
that
there
is
a
way
to
put
a
single
number
on
a
spam
and
extrapolate
information
from
it,
which
is,
is
the
claim
and
all
there's
so
many
ways
to
compute
that
single
number
and
they
all
have
different
characteristics.
F
You
know
you
can
move
variants
around
like
you
can
improve
accuracy
in
one
place
of
the
data
space
and,
like
you
know,
you're
sacrificing
variability,
you
know
variance
for
bias
and
so
on,
and
ideally
we're
unbiased.
But
there
is
a
concept
of
I.
I
don't
like
the
phrase
biased
sampling
that
has
been
used
in
the
observability
space.
In
recent
years,
I
think
from
looking
through
statistics
textbooks
you
might
actually
prefer
to
call
that
unequal
probability
sampling
and
there
are
papers
which
I've
posted
links
to
in
some
of
these
issues.
F
Talking
about
how
to
implement
weighted
reservoir
sampling.
When
you
have
weighted
reservoir
sampling,
you
can
implement
this
unequal
probability
sample
strategy.
So
I
I
I
will
look
up
this
issue.
I
gave
a
brief
summary.
I
wish
I
had
code.
I
could
just
open
source
and
I've
actually
asked
my
company
to.
Let
me
do
this,
but
so,
let's
suppose
that
you
have
a
a
reservoir
which,
in
which
you've
captured
for
temporary
purposes,
all
of
your
data
right
excuse
me
for
temporary
search
purposes.
F
You've
aggregate
you've
got
all
of
your
data
sitting
in
a
buffer
and
you
and
it's
like
10
000
spans,
but
you
only
want
to
send
out
1000
spans
so
you're
in
the
tail
of
this.
At
this
moment,
you're
exporting
from
span,
you
need
to
reduce
the
data
by
a
factor
of
10..
If
you
have
a,
you
can
take
two
passes
over
your
data.
Take
the
first
pass
over
your
data
you're
just
going
to
count
how
many
errors
are
there
and
it
looks
like.
F
F
What
you
then
can
do
is
construct
this
arbitrary
weight
factor
and
you
want
the
the
weight
factor
on
error
spans
to
be
33
times
as
much
as
the
weight
factor
on
non-error
spans,
because
then,
when
you
multiply
them
out
the
span
times
the
weight
you
get
an
equal
sum.
So
the
sum
of
weight
times
span
count
for
errors
is
the
same
as
some
subtle
span
time
weight
count
for
non-errors.
Now
you
run
your
weighted
probability
sampler
using
a
reservoir.
F
So
what
what
that
means
is,
you
know,
like
every
span
of
an
error,
had
an
equal
probability
of
getting
into
the
output,
and
every
non-error
span
had
an
equal
probability
of
it
within
its
group
of
getting
into
the
output
and
this
reservoir
sampling
algorithm
or
any
any
weighted
reservoir
sampling
algorithm
will
then
give
you
a
select
for
you,
a
thousand
spans
out
and
each
one
of
those
gets
a
weight
factor
applied
to
it,
which
is
going
to
be
approximately
10.
F
Well
sorry,
because
the
output
now
of
this
algorithm
is,
we
expect
500
error,
examples
and
500
non-error
examples.
That's
what
this
map
does
here,
because
the
goal
of
a
weighted
probability,
sampling
algorithm
is
to
give
you
estimated
sums,
and
so
the
estimated
sum
of
all
error
spans
is
going
to
equal
the
estimated
sum
of
all
non-errors
fans.
F
The
multiplier
on
the
error
spans
in
that
case
is
exactly
one
and
the
multiplier
on
the
non-error
spans
is
going
to
be
add
up
so
that
so
that
they
each
represent
seven
hundred
out
of
ten
thousand
seven
nine
thousand
seven
nine
thousand
ninety
three
thousand
out
of
ten
thousand
or
something
like
that.
Ninety
three
hundred
out
of
ten
thousand
that
ratio
inverted,
is
the
weight
or
something
like
that.
I
don't
do
math
very
well
in
my
head,
but
I
don't
know.
Hopefully
this
didn't
sound
extremely
complicated.
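[Editor's sketch of the two-pass scheme worked through above: count errors in a 10,000-span buffer, give the two groups equal total weight, select 1,000 spans with a weighted reservoir, and attach sample counts. The weighted selection uses the Efraimidis-Spirakis "A-Res" keying, one published choice among the algorithms mentioned; all names are illustrative.]

    import heapq
    import math
    import random

    def two_pass_weighted_sample(buffer, k, is_error):
        # Pass 1: count the two groups.
        n_err = sum(1 for s in buffer if is_error(s))
        n_ok = len(buffer) - n_err
        # Equal total weight per group: with 3% errors, an error span
        # weighs ~33x a non-error span, so selection favors errors.
        def weight(s):
            return 1.0 / max(n_err, 1) if is_error(s) else 1.0 / max(n_ok, 1)
        # Pass 2: weighted reservoir selection (A-Res, in its numerically
        # stable form): keep the k items with the smallest -log(u)/weight.
        keyed = ((-math.log(random.random() or 1e-300) / weight(s), s)
                 for s in buffer)
        kept = [s for _, s in heapq.nsmallest(k, keyed, key=lambda t: t[0])]
        # Each kept span represents (group total / group kept) spans.
        k_err = sum(1 for s in kept if is_error(s))
        for s in kept:
            s["sample_count"] = (n_err / max(k_err, 1) if is_error(s)
                                 else n_ok / max(k - k_err, 1))
        return kept

    buffer = [{"error": random.random() < 0.03} for _ in range(10_000)]
    out = two_pass_weighted_sample(buffer, 1_000, lambda s: s["error"])

With 3% errors there are only ~300 error spans, so essentially all of them survive with a sample count near one (matching the "multiplier of exactly one" remark), and the ~700 surviving non-error spans each carry a sample count near 14, so both groups reinflate to roughly their true sizes.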
F
This
is
the
basic
approach
when
you
use
weighted
sampling
and
it
gives
you
the
sample
count.
The
single
number,
which
is,
I
think
this
band-
represents
10
rising.
This
10
represents
20
and
you
can
use
those
numbers
to
generate
statistics
that
are
they're
accurate,
if
not
precise,.
F
A
Yeah
josh,
I
look
forward
to
to
reading
that
paper
right.
I
think
what
you
said
makes
sense
to
me
my.
Hopefully,
this
isn't
kind
of
a
stupid
question,
but
you're
trying
to
get
you
know
essentially
metrics
that
that
aren't
skewed
and
so
you're
trying
to
get
like
an
accurate
percentage
of
errors
and
non-errors
and
so
like.
Where
does
the
scope
of
you
know
this
sampling
metrics
accurate
metric,
stop
and
like
pre-aggregated
metrics
begin?
A
F
A
C
What
what
we
have
in
azure
monitor
that
similar
to
new
relic?
We
do
sampling
class
thing
after
we
collected
everything
and
we
use
the
tracing
data
to
build
metrics
like
number
of
http
requests
with
certain
cardinality.
So
I
think
what
matt
you're
asking
is,
whether
the
approach
described
by
josh
helps
to
solve
this
problem.
Isn't
the
question.
A
F
I
I'm
not
sure
if
I'm
fully
following,
but
I
think
I
can
answer
affirmatively
that
that
the
goal
here
is
to
say
we're
only
looking
at
spans,
we're
not
concretely
computing
metrics
after
seeing
10,
000
spans
and
computing
unbiased,
that's
really,
the
key,
unbiased,
unbiased,
unbiased
is
that
every
span
had
an
equal
chance
of
getting
into
the
output
according
to
its
weight
and
therefore
those
you
I
mean
this
is
just
like
this
is
like
basic
statistics
and
I'm
not
really
an
expert,
I'm
like
not
a
mathematician,
but
because
of
the
unbiased
property
that
you
get
this
estimated
count
or
inverse
probability
whatever
you
want
to
call
it,
which
is
accurately.
F
That
is
accurate
in
the
sense
that
I
can
sum
any
subset
of
my
data
sum
those
counts
and
estimate
that
that
is
the
number
in
the
whole
population.
From
my
sample.
I
just
put
a
link
in
the
chat
where
the
comment
I
was
referring
to.
There
are
two
papers
that
are
linked
in
there.
This
the
authors
overlap,
there's
sort
of
a
sequence
of
papers,
the
one
the
first
one,
the
sort
of
simplest
one
is
called
priority.
C
F
F
The
whole
point
of
that
paper
is
is
what
they
call
subset
sum
so
you're
going
to
compute
a
weighted
sample
and
then
using
the
items
in
your
weighted
sample.
You
can
estimate
the
weight
of
a
subset,
an
arbitrary
subset.
So
that
means
we
can.
We
can
filter
our
sample
count.
The
sum
sum
the
counts
and
say
that's:
the
proportion
in
the
whole
population,
and
this
this
meant.
This
thing
I
mentioned
earlier
with
a
two
pass
algorithm
where
I
I
compute
the
probability
of
an
error
and
then
use
that
to
generate
a
weight.
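[Editor's illustration of the subset-sum property, continuing the two-pass sketch above and reusing its hypothetical `out` list: filter the sample however you like, sum the sample counts, and the total estimates that subset's size in the original buffer.]

    # Using `out` from the earlier sketch: estimate how many spans in
    # the whole 10,000-span buffer were errors, from the sample alone.
    estimated_errors = sum(s["sample_count"] for s in out if s["error"])
    # estimated_errors comes out near the true count (~300).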
F
I've
been
calling
that
to
myself,
I
call
that
inverse
probability
sampling
and
there
are
some
papers
out
there
that
roughly
describe
the
same
thing
and
put
the
math
behind
it.
That
explains
what's
what's
going
on,
but
by
the
same
authors.
Actually,
if
you
just
go
digging
through
the
sort
of
citations
to
these
papers.
So
if
you
read
through
this
first
paper
from
2005
called
priority
sampling,
it
gives
you
a
technique
for
what
I
call
natural
weight
sampling.
F
So
if
you're
counting
network
bytes,
let's
say
you've
got
a
packet
with
those
500
bytes
you've
got
a
packet
that
was
a
thousand
bytes.
You've
got
a
million
packets
and
they
each
have
a
size,
and
you
can
only
store
a
sample
of
a
thousand
packets,
and
so
you
do
that
using
the
actual
size
of
the
packet,
and
the
reason
is
that
your
goal
in
this
case
is
to
estimate
something
about
the
total
network
traffic
from
your
thousand
packets
and
your
thousand
packets
can
be
categorized
and
summed
in
many
different
ways.
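[Editor's sketch of "natural weight" sampling on packets: each item's own byte size is its weight, the reservoir keeps items with probability roughly proportional to size, and the attached counts let any subset's traffic be estimated. Inclusion probabilities are approximated; all names are illustrative.]

    import heapq
    import math
    import random

    def natural_weight_sample(packets, k):
        """Keep ~k packets, chosen roughly in proportion to their size."""
        total_bytes = sum(p["bytes"] for p in packets)
        keyed = ((-math.log(random.random() or 1e-300) / p["bytes"], p)
                 for p in packets)
        kept = [p for _, p in heapq.nsmallest(k, keyed, key=lambda t: t[0])]
        for p in kept:
            # Approximate inclusion probability under size-weighted
            # sampling; its inverse is the packet's sample count.
            incl = min(1.0, k * p["bytes"] / total_bytes)
            p["sample_count"] = 1.0 / incl
        return kept

    packets = [{"bytes": random.choice([64, 576, 1500]), "dst": i % 7}
               for i in range(100_000)]
    sample = natural_weight_sample(packets, 1_000)
    # Subset sum: estimated total bytes sent to one destination.
    est = sum(p["bytes"] * p["sample_count"] for p in sample
              if p["dst"] == 3)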
F
That's the subset thing: I can say, give me all packets with a particular endpoint, or for a particular IP address; now sum all those counts, and that's approximately the number you expect for that one endpoint, from your sample data. So that's what I've been calling natural weights: you actually have a weight on the piece of data itself. The case I described earlier for tail sampling, where you take two passes, is one where you're just making up weights to give you the probabilities that give you the output you want.
F
You
first
just
understand
how
natural
related
sampling
works
with
a
reservoir
and
a
weighted
sampling
algorithm,
and
then
there's
this
next
step,
where
you
can
take
two
passes
or
there's
again
a
million
ways
to
do
this,
so
that
does
give
you,
I
think,
just
trying
to
tie
this
back.
F
The
original
question,
metrics
from
spans
metrics
from
a
sample
of
spans,
is
what
we're
actually
saying
so
that
the
you
can
basically
compute
a
pre-aggregated
and
sorry-
and
I
don't
say,
pre-aggregate,
you
can
compute
an
aggregated
metric,
approximately
from
a
sample
of
spans
if
it
was
done
in
an
unbiased
way,
and
that
that
is
something
I
feel
is
not
quite
well
enough
understood
in
the
group
of
open
summit
tree
to
push
forward
on
these
issues
like
I
just
want
to
have
a
sample
count.
It's
a
double.
F
It's
a
floating
point
number,
because
it's
going
to
be
approximate
and
it's
on
every
span
and
if
it's
not
set,
it
means
I
wasn't
using
probability
sampling
and
that's
okay.
You
can
still
keep
your
token
bucket
or
whatever
it
is,
and
but
I
think
what
I'm
sensing
is
that
in
the
open
source
community,
there
wasn't
an
unknown
solution
for
this
problem,
and
so
they
adopted
token
bucket
sampling,
and
that
means
that
tracing
has
been
seen.
F
That
gives
us
not
just
an
example,
but
an
estimated
weight
or
a
count
on
that
thing
and
getting
back
to
elizabeth's
original
statement
like
reservoir
sampling
is
exactly
the
way
you
can
do
both
a
rate,
limited
sample
and
get
probabilities
at
the
same
time,
and
and
just
having
this
one
field,
there's
so
much
flexibility
in
how
a
vendor
goes
about
this.
Like
I
mentioned
this
two-pass
algorithm,
it
was
very
arbitrary.
The
way
I
described
that
you
can
just
do
lots
of
different
things.
A
Josh, what... what's going...
A
Oh no, it's good, I mean it makes sense. You said initially that the algorithm would have to know what the total sample size is to know how to properly assign the weights, right? So there's some work up front that needs to be done. Is that possibly why this hasn't been done before, that it's kind of more expensive?
F
I
yeah,
I
definitely
gave
a
kind
of
boot
for
a
solution
here
where
I
was
buffering,
10
000
spans,
which
is
expensive
and
that's
sort
of
the
drawback.
As
I
took
two
passes
over
my
data
and
you
don't
always
have
that
luxury,
and
I
keep
I'm
also
kind
of
waving
my
hands
here,
because
there
are,
I
mean,
like
it's,
not
difficult,
I
think,
to
come
up
with
a
strategy.
F
That's
adaptive
is
like
I
can
say,
I'm
gonna,
like
kind
of
make
a
guess
like
in
the
last
hour,
I
had
three
percent
errors,
so
I'm
just
going
to
guess
that
I
should
wait
errors
33
times
more
than
non-errors.
That's
not
necessarily
going
to
get
you
the
output
you
want,
but
it's
also
still
an
unbiased
thing
so
like
if
you
also
all
of
a
sudden,
have
no
errors.
Your
your
output's
still
going
to
have
statistical
validity.
It
just
won't
have
any
errors
in
it.
F
If
you
all
of
a
sudden
have
100
errors,
the
output
will
have
100
errors
and
everywhere
in
between
there's
there's
these
trade-offs
between
bias
and
various
that
are
just
kind
of
you're.
Shifting
around
your
uncertainty,
like
you,
must
have
new
uncertainty
because
you
reduce
the
size
of
the
data,
so
the
various
approaches
basically
have
different
properties
and
I'm
really
not
a
mathematician.
F
So
I'm
not
so
good
at
explaining
that
these
trade-offs
sort
of
technically
speaking-
and
I
mentioned-
I
don't
know-
let
me
pause
again-
I'm
definitely
advocating
for
this
field
and
I'm
happy
to
back
it
up
with
any
talk
about
sampling,
algorithms
that
we
can.
F
Have
if
you
wouldn't
mind,
I
want
to
indulge
another
idea
which
I
mentioned
a
week
ago.
The
reason
I've
come
to
this
meeting
is
not
just
to
like
to
sort
of
promote
the
idea
of
weighted
reservoir
samples
and
so
on,
but
in
the
metric
space,
where
I've
been
we've
been
sort
of
finishing
up
the
specifications.
F
I
have
this
it's
more
of
a
personal
like
mission.
I
think,
and
then
that's
why
I
did
back
off
a
little
bit,
but
the
metrics
world
has
been
divided
between
prometheus
and
staff,
see
for
10
years
or
so,
and
the
prometheus
world
has
a
very
sort
of
strict
notion
of
what's
allowed
as
far
as
cardinality
this
fc
world.
Just
does
not
so
you
you
have
these
sort
of
different
users
user
pools,
one
of
which
is
accustomed
to.
I
guess,
high
performance
and
low
cardinality
and
the
other
is
accustomed
to
less
performance.
F
But
high
cardinality
is
okay
and
getting
to
a
point
where
open
telemetry
had
a
viable
metrics
library
that
any
user,
including
assatsy
or
prometheus
user,
might
accept,
meant
sort
of
like
at
least
opening
the
door
to
high
cardinality,
but
also
not
requiring
some
sort
of
huge
memory
consumption,
which
is
what
prometheus
has.
If
you
use
high
cardinality
and
the
way
we've
I
think
approach
this
is
rests
on
a
belief
that
I
can
that
I
have,
which
is
that
we
can
use
the
same
stamping
ideas.
F
I
just
described
to
reduce
the
dimensionality
of
a
metric
so
that
you
may
have
metrics
coming
in
with
three
dimensions
or
four
dimensions
and
the
exploded
cardinality
of
those
three
or
four
dimensions
might
be
very
high,
so
you
might
say
in
your.
I
only
intend
to
monitor
precisely
in
terms
of
exact
counts.
I
might
prefer
to
only
monitor
two
of
those
dimensions,
so
I'm
going
to
drop
two
dimensions.
F
I
can.
I
can
do
this
in
actually
two
ways
I
can
compute
one
sample
for
every
combination
of
the
first
two
dimensions.
I
could
compute
like
100
points
for
every
exact
combination
of
the
first
two
dimensions,
or
I
could
compute
a
thousand
points
for
all
of
them
and
each
in
each
case,
I'm
basically
going
to
do
the
same
type
of
thing.
F
I'm
going
to
say
I
would
like
to
get
you
don't
actually
have
to
do
any
of
this
weight
stuff,
for
example,
I'm
just
gonna
say
uniformly
sample
all
the
events
that
match
some
combination
of
my
first
two
dimensions
and
get
a
hundred
points
out.
Those
hundred
points
have
sample
counts
in
them,
just
like
I
described
for
spans
and,
if
I
add
those
up,
those
will
estimate
approximately
the
missing
dimensions
in
my
exact
calculation.
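[Editor's sketch of the metrics idea described here: aggregate exactly on the dimensions you keep, but also retain a small uniform sample of raw events, each carrying a sample count, so the dropped dimensions can still be estimated. This is illustrative only, not a real OpenTelemetry metrics API.]

    import collections
    import random

    def aggregate_with_exemplars(events, keep_dims, n_exemplars=100):
        """Exact counts on keep_dims; sampled exemplars keep all dims."""
        counts = collections.Counter(
            tuple(e[d] for d in keep_dims) for e in events)
        # Uniform reservoir of raw events (Algorithm R, as sketched
        # earlier in this discussion).
        exemplars = []
        for i, e in enumerate(events):
            if len(exemplars) < n_exemplars:
                exemplars.append(dict(e))
            else:
                j = random.randrange(i + 1)
                if j < n_exemplars:
                    exemplars[j] = dict(e)
        for e in exemplars:
            e["sample_count"] = len(events) / min(n_exemplars, len(events))
        return counts, exemplars

    # Summing sample_count over exemplars grouped by a *dropped*
    # dimension approximates the counts that were never aggregated.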
F
It's
the
same
approach,
saying
that
you
know
we
aren't
going
to
collect
all
the
spans,
we're
not
going
to
collect
all
the
metric
dimensions,
but
here's
a
sample
that
lets
you
approximate
those
metric
dimensions
that
were
that
weren't
exactly
calculated
so
I've.
Actually,
we
have
a
proposal
that
puts
a
sample
count
field
into
the
raw
exemplar
for
a
metric
data
point,
and
I
intend
to
say
it's
exactly
the
same
thing
that
is
being
discussed
for
sample
count
in
a
span
data
and
I'm
now
going
to
stop
talking
like
somebody.
F
D
I
just
caught
up
so
I
posted
and
I'm
trying
to
keep
notes
here
as
our
facility.
A
regular
facilitator
is
not
with
us.
I
put
a
link
to
the
discussion
of
in
the
open,
telemetry
specification
that
request,
so
you
can
open
that
up.
I
just
caught
up
with
that.
I
had
no
idea
that
there
was
a
lot
all
this
movement
from
10
days
ago,
but
but
thanks
to
josh
for
moving
this
thing
along.
F
I'm
happy
to
help.
This
has
been
an
area
of
personal
interest
for
me
for
a
long
time
before,
dating
to
before
I
came
to
lightstep.
Even
so,
I've
been
trying
to
figure
out
how
to
solve
sampling
in
high
cardinality
as
a
user
practitioner
for
a
while,
and
I
ended
up
asking
around
at
google
back
when
I
worked
at
google
and
found
these
algorithms
through
sort
of
asking
smarter
people
than
myself.
D
So
I
think
I
brought
this
this
up
before
a
couple
of
weeks
ago,
or
maybe
it
was
three
weeks
ago
when
I
last
attended
this
meeting
and
ted
young.
He
was
talking
about.
Well,
I
guess
he
he
didn't.
D
He
wasn't
part
of
like
directly
part
of
this
conversation
around
sample
count,
but
he
kind
of
knew
of
it
and
he
knew
that
there
was
a
lot
of
waffling
and,
and
he
suggested
well
could
we
could
we
just
like
put
this
behavior
in
a
sampling,
plug-in
and-
and
I
I'm
not
familiar
with
the
sampler
plug-in
api,
but
but
I
feel
like
it's
a
strong
enough
concept
that,
like
that,
can
work
universally
well
and
put
it
in
the
spec
rather
than
just
these
sort
of
sampler
plug-ins
that
you
could
write
that
that
may
or
may
not
support
it.
D
Unless
we
need
some
more
like
empirical
evidence
of
it,
of
its
universal
usefulness,
and
that
may
be
what
we
kind
of
have
to
start
with,
yeah
there's.
F
This
trace
state
thing,
which
is
where
you
can
put
vendor-specific
stuff,
and
you
could
imagine
like
fitting
it
in
there.
I
guess
my
I
think
you're
right,
there's
a
question
and
there's
a
lot
of
uncertainty.
So
the
question
is:
is
there
a
broadly
useful
meaning
for
a
single
value
called
sample
count?
F
Somehow
and
that's
not
known
like
if
I
was
using
one
type
of
reservoir
sample
versus
another
there's
just
trade-offs
here,
so
I
think
it
is
a
middle
ground.
An
argument
is
that
the
middle
ground
is
that
there's
something
useful
in
the
middle
ground
and
if
there's
something
more
complicated,
you
need
like
it's,
probably
not.
It
doesn't
fit.
D
Now,
if
we,
if,
if
sampling
plug-ins,
only
give
us
access
to
trace
state,
whereas
like
a
sample
count
as
a
I
think,
you're
saying
a
span
attribute,
I
I
think
sample
like
within
a
single
trace
and
when
we're
talking
about
secondary
sampling,
then
then,
if
we're
going
to
use
sample
count
and
secondary
sampling,
then
it
actually
has
to
be
part
of
every
span.
D
F
There's
a
choice:
I'm
there's
like
an
area
where
I've
felt
vague
uncertainty
just
because
there
was
something
in
this
in
a
spec
that
didn't
have
it
wasn't
flushed
out
enough
to
know
exactly
what
it
was
there
for,
but
there
is
something
both
in
the
spam.
Struct
called
trace
state
as
well
as
I
think
in
the
link
structure,
which
that's
one
where
I
get
confused
about,
but
so
there's
like.
I
don't
understand
why
you
have
the
same
thing
in
both
your
link
and
your
node,
but.
F
So,
oh
thank
you.
I
remember
sergey
explaining
that
microsoft
had
a
position
here.
So
probably
you
could
explain
that.
C
Yeah,
so
the
idea
of
the
trace
state
is
that
it's
something
that
flows
with
your
trace,
but
you
can
mutate
it
on
every
boundary
and
it
is
like
from
my
best
understanding.
It
is
a
way
to
stitch
things
together.
C
Let's
say
you
have
legacy
correlation
protocol
or
legacy
ids
that
don't
fit
into
the
ws3c
trace
parent
or
if
you
need
to
propagate
some
control
information,
so
our
interest
from
microsoft
site
here
in
sense
of
sampling,
that
we
want
to
propagate
the
score
of
the
sampling,
the
value,
the
hash
value
you
calculate
out
of
the
trace
id
and
it's
somewhat
similar
to
what
your
josh
is
describing.
C
The
item
count,
and
I
would
be
curious
to
think
more
about
it
and
understand
if
you
can
do
this,
the
one
thing
that
would
work
in
both
cases.
So
it
seems
in
your
case,
you're
more
interested
in
in-process
value
and
we
are
interested
in
both
in
process
and
the
fact
that
this
value
can
flow
downstream.
F
Yeah,
I
am
familiar
with
the
complexity
that
arises.
I
like
I'm
thinking
backwards.
I
was
on.
I
was
in
the
dapper
team
back
at
google
and
there
was
this
mechanism
that
they
had
and
I
didn't
own
the
code.
Thankfully
it
was
pretty
complicated,
it
was
all
c
plus
plus,
and
it
was
basically
like
at
the
moment
when
you
start
a
spam.
F
You
have
a
probability
that
you're
that
you're
given
is
a
target
for
your
own
process.
If
you're
a
root,
then
just
flip
a
coin
or
whatever
with
that
probability,
but
if
you're
not
a
root.
This
is
when
it
got
complicated,
so
you're
expected
to
propagate
your
parents,
probability
in
so
that
you
know
at
the
moment,
when
you're
starting
a
child,
what
your
parents
probability
was,
and
then
you
have
your
own
probability
and
you
compare
those
two
if
your
own
probability
is
less
than
the
parents.
You
just
like.
F
Take your parent's decision; but if your own probability is greater, then you make your own new decision, create a new root, and so on. This is the point where you want this notion of an in-band probability. And this is where, for me, it gets really complicated, because tail sampling is different from head sampling, and the sampler API that we have is head sampling.
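[Editor's sketch of the Dapper-style rule exactly as recalled here: the parent's probability travels in band; a child whose own target probability is at most the parent's reuses the parent's decision, while a higher target forces a fresh coin flip, effectively a new root. Names are illustrative.]

    import random

    def child_decision(parent_sampled, parent_prob, my_prob):
        """Combine an inherited head-sampling decision with a local target."""
        if my_prob <= parent_prob:
            # Parent sampled at least as aggressively as I want:
            # take the parent's decision (and keep its probability).
            return parent_sampled, parent_prob
        # I want a higher rate than my parent used: flip a new coin,
        # starting a new "root" decision here.
        return random.random() < my_prob, my_prob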
F
So
if
you
will
indulge
me
for
the
moment,
try
and
explain
one
thinking
here
is,
like
you
start
a
span
knowing
its
probability,
let's
say:
you're
a
root,
I'm
I'm
head
sampling
at
that
point
and
let's
say
I've:
I've
decided
to
sample
only
10
at
the
head
because
it's
like
too
much
throughput
too
much
data.
So
so
now,
I'm
running
through
my
code
and
I've
done
10
samples.
So
I
have
an
effective
count
now
of
10.
F
And I start a child, meaning I'm going to make an RPC, and I'm going to send some information to that child through an in-band connection.
F
This
is
where
we've
there's
been
a
request
for
this
thing
called
sampling
priority
and
I
think
it
came
from
microsoft
and
it's
understandable.
But
when
you
think
about
tail
sampling,
it
starts
to
get
a
little
bit
trickier,
but
but
it's
still
legitimate,
it's
just
really
complicated.
So
then
you
make
your
call
to
the
to
the
child.
You
pass
some
numbers
saying
I'm
sampled
with
max
with
10.
F
Now
the
child
may
have
some
logic
that
it
wants
to
do
which
says
I
I
have
myself.
I
have
a
absolute
rate
limit
on
on
spam
output
like
it's
not
even
about
sampling
rate.
It's
about,
I
can
only
send
thousand
spans
per
second
like
and
I'm
getting
15
000
requests
per
second,
so
I
have
to
do
something
so
at
any
given
moment,
I'm
going
to
get
a
new
fan,
starting
in
the
child.
I
have
to
make
a
decision.
F
I
don't
know
a
way
to
do
this
other
than
to
be
speculative
and
to
say
I
I
I
have
an
estimated
target
and
I
have
an
incoming
probability
and,
like
I'm,
gonna,
have
to
flip
a
coin
and
just
like
in
order
to
get
my
output
rate,
I'm
going
to
do
something
based
on
the
input
rate,
but
the
input
is
still
speculative
side,
because
the
problem
is
there's
a
divergence.
F
F
And
yet
yeah
there's
no
good
solution
here.
All
I
can
say
is
that
in
band
I
think
we're
actually
propagating
is
a
head
sampling
position
or
else
like
a
upper
bound
or
a
lower
back.
It's
a
lower
bound
on
probability.
It's
an
upper
bound
count.
You
can
only
you
can
only
you
can
only
change
that
in
one
direction
you
can't
lower.
I'm
sorry,
I'm
getting.
F
You
can't
lower
a
raised
probability,
one
way
or
the
other.
It's
it's
one
of
those
directions.
You
can't
do.
You
can
only
increase
your
sample
count.
C
Yeah,
so
if
you
sampled
in
head-based
sampling
on
the
on
the
incoming
boundary
10,
you
still
can
have
lower
rate
somewhere
else,
and
I
guess
the
the
best
effort
that
we
can
get
here
is
not
like
consistent
sampling
everywhere
right.
We
cannot
do
this
ever,
but
what
we
want
is
to
assuming
you're
services
are
configured
in
like
any
compatible
way
like
you,
don't
do
completely
different
sampling,
algorithms
in
different
places.
C
What
we
want
to
achieve
is
that
no,
we
want
to
eliminate
this
algorithm,
how
you
flip
a
coin.
So
basically
you
flipped
it
once
and
you
have
this
double
or
flawed
value,
and
if
later
you
want
to
flip
a
coin,
you
you
don't
flip
it.
You
compare
it
with
your
probability.
C
So
this
way
we
just
don't
stick
with
a
particular
sampling
algorithm.
We
don't
even
care,
it
could
be
random.
F
Yeah,
well
that
that
that
actually
connects
with
this
paper,
I
I
linked
to
from
2005
called
priority
sampling.
The
way
that
one
works
is
you
just
literally
attach
a
random
number
to
every
piece
of
data
and
it's
got
to
be
safe.
So
at
the
moment
you
get
a
new
piece
of
data,
just
generate
a
random
number
between
zero
and
one
and
attach
it
to
the
data
and
make
sure
it
stays
with
the
data.
And
then
you
can
resample
it
downstream
and
you
can
continue
resampling
it
and
the
properties
of
the
algorithm
work.
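[Editor's sketch of priority sampling from the 2005 paper referenced here (Duffield, Lund and Thorup): attach one permanent random number to each item, rank by weight divided by that number, keep the top k, and use the (k+1)-st priority as the threshold in the weight estimates. Because the random number stays with the item, the same data can be resampled downstream and the estimates stay unbiased. Names are illustrative.]

    import random

    def priority_sample(items, k):
        """Duffield-Lund-Thorup priority sampling over weighted items."""
        for it in items:
            # Generated once and kept with the item forever.
            it.setdefault("u", random.random() or 1e-300)
            it["priority"] = it["weight"] / it["u"]
        ranked = sorted(items, key=lambda it: it["priority"], reverse=True)
        kept = ranked[:k]
        threshold = ranked[k]["priority"] if len(ranked) > k else 0.0
        for it in kept:
            # Subset sums of "estimate" over `kept` approximate the true
            # weight sums over the full population (subset sum).
            it["estimate"] = max(it["weight"], threshold)
        return kept

    # Because "u" stays attached, a downstream stage can rerun
    # priority_sample on the kept items with a smaller k.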
F
I
mentioned
that
it
was
a
little
harder
to
use
this
algorithm
and
it's
because
you
have
to
attach
a
random
number
to
every
piece
of
data,
but
and
that's
why
the
second
link
is
actually
easier
to
use
harder
to
implement,
but
that
was
that
gave
me
an
idea
when
we're
having
a
conversation
about
this
fixed
decision
quality.
Is
that
instead
of
having
this
spec
out
spec
for
a
sampling
priority,
which
is
like
a
number
between
zero
and
whatever
you
could
instead
create
a
pro?
F
B
So
these
are
like
smart
ways
to
we're
talking
about
smart
ways
to
sample,
but
I'm
curious
like
another
way
to
look
at
another
angle
to
look
at
this
is
what
is
the
use
of
the
trace
data
that
is
collected
this
way?
Like
you
know,
what
use
cases
can
this
enable
right?
So,
ultimately,
we
are
going
after
troubleshooting
use
cases
right,
so
data
collected
this
way.
Would
it
be
useful
because
in
in
our
experience,
what
we
have
found
is
like
anything
like
anything
less
than
100
like
for
what
I
want
to
look
at
is
useless.
B
Really
it
doesn't
help
other
than
just
understanding
the
service
topology
or
you
know
things
like
that,
but
not
really
for
troubleshooting.
B
So
when
I
say
hundred
percent,
not
hundred
percent
hundred
percent
sometimes
like,
if
I
get
hundred
percent
of
what
I
want
like,
for
example,
we
have
on
demand
tracing
where
we
sample
hundred
percent
for
certain
criteria,
and
it
said
sampling
again,
that's
one
possibility
and
the
other
one
is
the
secondary
sampling.
So
I
want
to
look
at
trace
for
few
services,
so
we
collect
hundred
percent
for
one
or
two
services.
B
F
F
I
think-
and
I
think
part
of
this
sort
of
appeal
of
open
tracing
or
open
telemetry
is
that
you
may
be
able
to
sample
spams
but
ensure
that
you
get
100
of
logs
in
each
span,
so
that
you're
at
least
looking
at
a
consistent
all
of
the
information
when
you're
looking
at
a
single
piece
of
it,
but
and
that's
that's-
maybe
sort
of
arbitrary
definition,
but
I
think
of
it
that
way.
F
So,
if
you're,
if
your
use
cases,
I
need
to
find
exactly
some
specific
information,
it's
rarely
the
sampling
is
going
to
get
in
your
way.
True
truly,
so
I
think
that
just
the
quick
answer
to
your
question
is
really
that
if
you
have
sampled
data,
you
can
generate
metrics
from
it.
That's
that's
the
application
and
there
are,
and-
and
hopefully
you
can
tolerate-
high
cardinality,
because
there's
much
more
of
a
convention
of
using
high
cardinality
tags
on
spams
than
it
is
on
metrics.
F
So
and
and
then
I
think,
some
of
the
applications
that
that
these
vendors
have
including
us,
especially
on
honeycomb,
I'm
thinking
right
now,
are
like
I'm
issue.
An
arbitrary
query
give
me
some
statistics
on
the
rates
of
things
that
match
my
query
and,
like
you
can
generate
graphs
from
an
arbitrary
query
using
if
the
samples
are
unbiased
and
so
on.
So
that's
the
use
case.
It's
it's.
Please
give
me
some
graphs
about
this
broad
data
that
I've
summarized
right.
B
I
guess
there
the
assumption
is:
the
metrics
are
created
from
the
spam
data,
because
the
thing
that
is
getting
counted
is
not
instrumented
otherwise,
but
in
practical
sense
like
we
have
ipc
counters,
like
you
know,
for
any
any
rpc
calls
made.
There
are
like
separate
counters,
at
least
like
for
us.
Metrics
is
first
when
it
comes
to
instrumentation
anything
to
do
with
accuracy
and
like
that's
the
primary
tool
for
troubleshooting.
F
Yeah, I guess the only leftover bit there is that when you have high cardinality, you'll have too many counters. And again, it's very speculative; as a vendor, I don't even know that this is going to catch on. So I'm just promoting an idea here.
B
Yeah,
so
I'm
I'm
not
an
expert
in
high
cardinality
metrics
data,
but
from
what
I
have
seen
the
other
parts
of
the
organization.
The
way
they
do,
that
is
high.
Cardinality
data
is
converted
to
a
matrix
to
a
metric
using
our
stream
processing
system
called
mantis
so
there,
depending
upon
what
to
do
with
the
metric.
Certain
tags
are
dropped
and
the
the
metrics
the
metric
that
is
created
represents
what
they're
trying
to
troubleshoot
so
yeah.
F
That's
that's
fair,
that's
fair!
That
is
probably
the
most
traditional
way
and
I'm
advocating
for
something
non-traditional
a
little
bit
so,
but
it's
just
really,
I
mean
advocating
for
something
that
that
can
be
useful
if
you
use
an
algorithm
that
generates
this
number
and
it
yeah-
and
I
guess
another
way
if
I
just
one
last
way
of
looking
at
this-
is
that
lifestep
started
as
a
tracing
company
and
and
doesn't
do
didn't,
do
metrics
we're
we're
starting
to
get
metrics
in
our
product
right
now,
and
so
so
that
change.
F
That's
changing,
but
and
and
but
if
you
look
at
sort
of
us
and
like
honeycomb
who's,
sort
of
a
peer
of
ours
in
the
in
the
space
right
now
like
the
whole
model
is
you've
got
events
and
you
sample
them
and
you
make
queries
and
you
get
graphs
and
the
graphs
are
the
time
series
of
the
sort
of
events
and
you
can't
collect
all
of
them.
So
it's
a
new
way
of
thinking
about,
I
guess,
a
sort
of
a
replacement
for
metrics,
not
all
it's
not
completely
replaceable
for
metrics.
F
It's
sort
of
a
complementary
tool
here
that
lets
you
dig
into
high
cardinality
data
and
and-
and
I
think
just
another
way
of
thinking
about
what
you
just
said
earlier-
is
when
you
sampled
your
data,
you're,
never
going
to
be
able
to
use
it
for
this
sort
of
like
finding
a
needle
in
a
haystack
there's
just
too
many
needles
and
there's
the
haystack's
only
so
large.
F
So
what
do
you
use
sample
data
for?
Well,
you
can
usually
pick
out
the
like
heavy
hitter
or
the
like
leading
cause
of
something
so
like.
I
may
have
described
earlier
a
case
where
I'm
going
to
drop
some
dimensions
and
it's
like
those
dimensions,
cost
me
too
much
because
of
cardinality,
but
I'm
going
to
keep
this
sort
of
like
modest
size
sample.
That
does
include
a
little
bit
of
information
about
the
dimensions
I
dropped.
F
It's
very
approximate,
but
and
and
and
some
some
of
those
pieces
of
data
are
going
to
be
so
tiny
that
they
don't
statistically
mean
anything.
I
don't
have
enough
examples.
F
But now I have a new clue. That's the sort of application that people are getting from sampled data, I think.
G
I don't have a great feeling about doing anything on...
B
The client side. When I say client side, what I'm referring to is the tracing library, or instrumentation. Rolling out changes takes months, so even if you fix a bug in that probability algorithm, say you've implemented something and you want to change it, it's going to take a while to roll out. That's one problem, and the second problem is the variety of runtimes, like Node.js, Python, Java, and so on. Getting this implemented across...
B
You
know
polyglot,
you
know
run
times,
that's
another
challenge,
so
this
this
things,
like
you
know,
make
make
it
more
complicated
to
like
compared
to
metrics
and
logs,
like
metrics
and
logs
they're,
like
you
think,
they're
like
very
stateless
right.
If
you
want
to
fix
something
and
fix
it
and
roll
it
out,
but
with
trace
it's
it's
like
you
know,
the
entire
fleet
yeah.
F
We
have
this
problem
in
tracing
in
like
more
than
one
way
like
you
want
to
assemble
a
whole
trace,
so
you
got
to
figure
out
how
to
gather
all
those
spams
and
you
need
a
buffer.
And
so
the
only
response
I
have
to
what
you
said
now
is
is
that
the
collector,
the
hotel
collector,
ought
to
be
able
to
do
sampling
so
that
you
can
say
cost
100
of
your
spans
into
your
collector
and
then
do
something
more
clever.
F
But you still have the problem that I need all my spans from the same trace to be in the same place, potentially, if I'm sampling or doing anything based on trace structure. And this is the architecture that Lightstep started with; we're changing that as well. But we collect 100% of the data into our satellites, and then we can do this type of summarization. But it's...
B
Yeah, the more we think about it here at Netflix: we try to keep the tracer library instrumentation very, very thin and move the complexity to the back end. That's the general approach, but then we'll have to take it case by case, whether it is feasible to do something in the client or in the back end. Wherever possible, we'll first try to do it in the back end; that way it's easier for us to operate.
F
Yeah,
I
think
we
should
we
should
all
hope
for
this
to
get
into
the
collector
and
that
one
of
the
reasons
that
I
was
originally
I
mean
this
is
another
connection
here.
Is
it?
Is
that
there's
no
such
thing
as
a
reservoir
sampling
algorithm
in
the
collector
right
now,
but
there
is
a
rate
limited
sampler
which
is
non-probabilistic,
and
I
think
that
we
could
have
a
probabilistic
rate,
limited
sampler
if
we
just
swapped
in
a
reservoir
algorithm.
So
that
was
my
goal.
F
I
mean
I
made
that
post
that
I've
linked
to
you
can
check
it
out.
My
goal
was
both
to
like
say:
I
want
this
for
metrics
because
of
high
cardinality.
I
think
that's
interesting,
but
I
also
want
this
because
right
now
the
the
collector
is
doing
rate
limited
collection,
but
not
giving
me
probabilities,
and
I
think
we
could
solve
that.
F
We've
reached
the
end
of
an
hour.
It's
been
lovely
talking,
I
don't
I,
I
don't
know
that
we've
made
any
decisions
or
if
we
I
think
we
were
missing
a
lawyta.
We
were
going
to
talk
about
this
sampling
priority
question,
which
lumiela
was
also
interested
in.
So
maybe
next
week
and
I'll
be
back
and
I
but
I
hope
I'm
not
derailing
this
conversation
by
ranting
about
weighted
sampling,
for
example,.
F
Great
so
I'll
be
back
next
week
and
we
can
try
and
push
forward.
I
hopefully
there'll
be
more
more
to
talk
about
and
maybe
we'll
be
able
to
come
to
a
decision
about
sampling
probability
as
a
as
a
field,
rather
than
an
attribute
see
you
next
time.