Hey everyone, my name is Joe. I'm with Grafana Labs, and today the promise was to talk about getting started with Tempo and to demonstrate an OpenTelemetry-instrumented application that supports exemplars, and we're mostly going to be doing that. But things have drifted a little bit, so this presentation is divided into maybe three sections: we're going to talk about Tempo; as promised, we're going to talk about the state of open source exemplars and why things stand where they do; and then we'll get to the demo.
So let's start with Tempo. Tempo is a new distributed tracing backend that we built at Grafana Labs with the goal of sampling 100% of our read path. At the time we were sampling maybe 15% or so of our read path, and we often had long-running queries that we wanted to diagnose.
We wanted to do this, but unfortunately, with our previous backend, we would have had to scale Cassandra or Elasticsearch to the point where the cost in memory and CPU, the operational cost, the cost to my sanity, would have been far more than was worthwhile. The size of that cluster would have meant my whole job was managing Elasticsearch or Cassandra, which is not what I want my job to be. I want my job to be building Tempo.
So we built Tempo, and the difference here is Tempo's dependency: the only dependency of Tempo is object storage, so S3, GCS, or Azure, and these are of course cheap to manage. I mean, there's essentially no management, right?
These are managed services; it's very cheap to store very large amounts of data in them and to use them. So the goal here is basically to build a tracing backend on object storage only. The trade-off, for now at least, is that Tempo can currently only search by trace ID. The goal, at least for upcoming versions of Tempo, is to support some sort of native search, but right now we can only do trace ID search.
So you are only going to be able to ask the question: give me the trace for this trace ID. This may seem limiting, but at Grafana we've found many different ways to get around this, and we'll talk about those in a second. Maybe not even "get around": ways that we feel are very powerful for finding traces without requiring native search in your backend.
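For a sense of what trace-ID lookup looks like in practice, Tempo exposes it as a single HTTP GET; a minimal sketch, with a made-up host, port, and trace ID:

```
curl http://tempo:3200/api/traces/2f3e0cee77ae5dc9c17ade3689eb2e54
```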
Tempo currently supports all the major open source instrumentation libraries: Jaeger, OpenCensus, OpenTelemetry, and Zipkin. So if you're already instrumented with any of these, you can immediately start using Tempo without issue. We put things in S3 or GCS, and also Azure; this slide was made before Azure support was added, and I wouldn't even recognize these symbols anyway, but the red one is S3 and the blue one is GCS. And we visualize everything in, shockingly, Grafana. Okay, on to discovery, since you can only look up by trace ID.
We were often going through our logs to find our traces, so we would be logging the trace ID on the same line as everything else. This is standard request/response-style logging, nothing new or weird about it: log the trace ID, the HTTP method, path, latency, status code, a bunch of common fields. Those fields have now effectively built an index into our traces.
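For illustration, a single request log line of that shape might look something like this (logfmt-style; the field names are just an example, not a required schema):

```
level=info ts=2021-02-04T17:32:01Z traceID=2f3e0cee77ae5dc9 method=GET path=/api/v1/users status=200 duration=2.31s
```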
Exemplars are the new upcoming feature that we're going to talk about in a bit here, the state of exemplars. This feature is a way to discover traces through your metrics as well. We'll talk more about what that means in a little bit. So: trace ID search only in Tempo.
Currently we use logs very effectively for discovery, and we'll look at that in the demo; hopefully in the future we'll be using exemplars. So where are we at? Around 380,000 spans a second, and it fluctuates more than I care to admit. I wish it didn't fluctuate at all; I wish I could just push this higher and higher. But we're at 380,000 right now, and at a little under 7,000 traces a second.
If you do the math there, that works out to roughly 55 spans per trace, which I should have worked out before the camera was rolling. But anyway: about 380,000 spans per second, about 7,000 traces per second, and our latencies are good. I'm very happy with this. Certainly we can always push this down.
In fact, recent additions to Tempo include the ability to scale the query frontend, to scale the query path, so we could actually reduce this quite a bit if we wanted to. But right now we're querying over 4 billion traces and our p50 is around 400 milliseconds, which I'm very happy about.
You can also see the p90 is right around 500 milliseconds, and the p99 occasionally reaches up to one or two seconds. So I'm very happy with these latencies; there's always something to improve, of course, but I think this is well within operational expectations for a tracing backend. Now, the architecture of Tempo. This is a little detailed and we won't get too far into it, so don't be afraid not to understand all these pieces, but it's architected roughly like Loki or Cortex, if you've spent time with those open source projects. So we have a distributor.
The distributor handles the replication factor and pushes our traces to ingesters. The ingesters then batch those traces up into blocks, and those blocks are pushed into our storage backend, into S3 or whatever. We have this idea of the compactor over here on the side, and we are currently flushing something like 400 blocks an hour. The compactor takes those small blocks and builds larger and larger blocks.
Right now we're doing, I'd say, 400 blocks an hour, 24 hours a day, and we currently have a retention of two weeks. So 24 times 14 times 400, about 134,000 blocks, would be our total blocklist length without the compactor, which would require a lot of time to search. The idea of compaction is basically to keep the length of this blocklist as short as possible in order to improve query performance. And then there's the query path.
We have this thing called the querier; its job is to ask the ingesters for recent traces and also check the backend. We have the query frontend, which handles parallelization and sharding of queries. And this tempo-query piece is hopefully going to go away soon: it's a shim to translate to Jaeger, which is the only tracing backend that Grafana can handle right now, so that's why we're using it for the near future.
Okay, so that was a lot of things, and you don't have to know all of them to just get started, and this is supposed to be a getting-started talk, so check out these links here. The single-binary deployment is important: a way to deploy Tempo not as a distributor and a querier and all these crazy pieces, but as a single binary, to get started and to understand what it does and how to configure it. There are tons of examples in the docker-compose folder you'll see listed here, and we have Helm options.
There's a very simple Helm chart that I made that I think the Helm community would hate, but it's kind of the way I would do a Helm chart, and there's a PR open now for a more official, more robust Helm chart that I think meets the expectations of people who regularly use Helm. There's also a blob of jsonnet to deploy with. So there are all of these different deployment options; hopefully you can dig into them.
Look at the YAML, look at the docker-compose files, get a feel for what configuration looks like and how to deploy this thing, so you can get started working with Tempo on your own.
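To give a flavor of what that configuration looks like, here is a rough single-binary sketch modeled on the examples in the repo; the exact fields and defaults may differ, so treat the docker-compose examples as the source of truth:

```yaml
server:
  http_listen_port: 3200

distributor:
  receivers:            # accept spans in any supported protocol
    jaeger:
      protocols:
        thrift_http:
    otlp:
      protocols:
        grpc:

storage:
  trace:
    backend: local      # s3, gcs, or azure in production
    local:
      path: /tmp/tempo/blocks
```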
Okay, so Tempo: it's our tracing backend; discovery happens through logs and exemplars; and high volume is the goal here. But exemplars are something we really want at Grafana, and we need to talk about where those stand, because part of this presentation was supposed to be demonstrating them with OpenTelemetry.
We need to talk about why we can't actually do that just yet. First: exemplars are a record of a single request, an instance of a single request, that was then aggregated away to create a metric. The power of metrics is this aggregation, right? I can aggregate a hundred, a thousand, ten thousand, however many requests into a very simple number, a single floating point, and if I aggregate it all away, I can query these extremely quickly.
I can store them extremely cheaply and provide very powerful visualizations of my infrastructure. That is the power of metrics. But what's lost is the individual instances, the individual requests, and exemplars aim to complete that picture: to display the aggregation and give you all that power, while at the same time letting you drill in and find a single instance of a request that was used to create that aggregation.
So what's going on right now? Well, exemplars are defined in OpenMetrics; the OpenMetrics spec has a defined standard for them. But that is not currently supported by OpenTelemetry, and in some of the issues I've read, they're not requiring it for GA. So I really just don't know exactly what timeline OpenTelemetry is looking at to support exemplars specifically. I believe they want full support for OpenMetrics, so we are excited to see that, but it's not quite there yet.
What about the Prometheus client libraries? The Prometheus client libraries support OpenMetrics generally, so where do they stand? Go and Python are ready to go. If you're using either of those instrumentation libraries, exemplars are available to you now: you can expose exemplars in an OpenMetrics-compatible format, or the literal OpenMetrics format, which Prometheus will then scrape, store in its backend, and make available to a visualization layer.
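To make that concrete, here's a minimal sketch of exposing an exemplar from Go with client_golang; the metric name, label key, and trace ID are made up, and in a real application the ID would come from your tracing context:

```go
package main

import (
	"math/rand"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
	reg := prometheus.NewRegistry()

	// A hypothetical request-latency histogram.
	latency := prometheus.NewHistogram(prometheus.HistogramOpts{
		Name: "request_duration_seconds",
		Help: "Request latency in seconds.",
	})
	reg.MustRegister(latency)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// In a real app this would be the trace ID of the current request.
		traceID := "2f3e0cee77ae5dc9"
		// Record the observation with the trace ID attached as an exemplar.
		latency.(prometheus.ExemplarObserver).ObserveWithExemplar(
			rand.Float64(), prometheus.Labels{"traceID": traceID},
		)
		w.WriteHeader(http.StatusNoContent)
	})

	// Exemplars are only rendered in the OpenMetrics exposition format,
	// so it has to be enabled explicitly on the /metrics handler.
	http.Handle("/metrics", promhttp.HandlerFor(reg, promhttp.HandlerOpts{
		EnableOpenMetrics: true,
	}))
	http.ListenAndServe(":8080", nil)
}
```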
Java and Ruby have issues open, and the unofficial .NET client also has an issue open. There are a lot of other OpenMetrics clients out there, and if your language of choice is not Go or Python, if you're using Prometheus with the .NET or Java library or some other one, I encourage you to get into those repos and request OpenMetrics support.
Let the maintainers know these things are important to you, and hopefully we can all get exemplar support into these various client libraries. In the demo today I'm going to use Go, because Go already supports exemplars and it'll be significantly easier. But what about our backend, and what about our frontend? In Prometheus, there are two pull requests that need to be merged for complete exemplar support. The first is in-memory support.
The first PR, when it is merged, will add an in-memory ring buffer, an ephemeral store of exemplars. They're held only for a short amount of time, determined by how much you're scraping and the amount of memory you give to the ring buffer; they're stored internally only, and when they're thrown away they're just dropped.
The second PR adds support for exemplars in the WAL, to store them permanently and then to remote-write them, to push them to a backend. After that second one is merged, you would hopefully see support for exemplars start coming out in backends like Cortex or Thanos, or some of the other long-term Prometheus storage backends we're using. So with the first PR, those of us who use just Prometheus will immediately be able to use exemplars.
With the second PR, for those of us who use Prometheus in combination with a permanent long-term storage backend, those backends can start recording, storing, and exposing exemplars. Okay, so what about Grafana? Support for exemplars is in the tip of master of Grafana right now, so it's coming soon, actually soon, for real soon.
I
can't
commit,
of
course
anything
I
don't
work
on
the
grafana
team
directly.
I
don't
def
with
them.
I
don't
do
any
of
their.
A
You
know
milestones
or
project
planning,
but
hopefully
maybe
like
in
7.5
or
some
soon
near
future
release.
We
should
see
example.
Our
support,
like
I
said
right
now
it
is,
it
is
tip
of
master
merged
already,
which
is
fantastic.
Okay, so let's get to the demo and talk about instrumentation a little bit. The goal here again was to do OpenTelemetry everything, and OpenTelemetry has a really compelling and very powerful server and client instrumentation setup. It's very easy. I've got my server here, right? I just set up a new handler and wrap my actual handler, so I have some HTTP handler.
I just wrap it in this otel handler, then I serve the otel handler out of my server, and then magic happens: it instruments my HTTP server for me. Same with the client side. In this case I'm just replacing the transport with an otel transport, and my HTTP client is now instrumented.
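Here's roughly what that wrapping looks like in Go with the otelhttp contrib package; this is a minimal sketch with placeholder handler names, and it omits the tracer-provider boilerplate mentioned below:

```go
package main

import (
	"context"
	"net/http"
	"time"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

func main() {
	// Server side: wrap the application handler so each request gets a
	// span and any incoming trace context is extracted automatically.
	appHandler := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello"))
	})
	http.Handle("/hello", otelhttp.NewHandler(appHandler, "hello"))
	go http.ListenAndServe(":8080", nil)
	time.Sleep(100 * time.Millisecond) // give the demo server a moment to start

	// Client side: swap in the otelhttp transport so outgoing requests
	// get spans and the trace context is injected into request headers.
	client := http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
	req, _ := http.NewRequestWithContext(context.Background(), "GET", "http://localhost:8080/hello", nil)
	if resp, err := client.Do(req); err == nil {
		resp.Body.Close()
	}
}
```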
In this case I'm worried about tracing, so context will be propagated correctly from client to server and everybody will be happy. This all works nicely with very few lines of code. Of course, there's also boilerplate setup code to initialize the tracing libraries and get things set up, but for just using an HTTP client and setting up a server, it's a pretty tight set of clean code. But OpenTelemetry doesn't support exemplars yet, so here we are.
So this is Grafana, a build off the tip of master, which is why it's not something you just pull as a Docker container or just install; it's not GA or anything. Like I said, hopefully support for this will exist soon. I'm querying Prometheus, and this Prometheus is an image from Callum's branch; Callum Styan is the author of the two PRs I mentioned.
So
I'm
asking
like
this
a
normal
can
I
maybe
I
should
zoom
in
a
little
bit,
maybe
a
little
bit.
So
this
is
a
you
know:
normal
histogram,
prometheus,
query,
p99
and
what's
being
returned
along
with
the
normal
return.
The
normal
return
is
the
metric,
and
this
is
actually
a
little
crowded
at
the
moment,
but
you
can
see
these
exemplars
as
well.
So
I
see
my
trend.
I
can
also
go
over
here
and
click
on
a
exemplar.
I
can
choose
one
of
these
and
I
can
immediately
jump
over
here
to
a
trace.
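The query itself is just a standard quantile over a latency histogram; for a hypothetical metric it would look something like this:

```promql
histogram_quantile(0.99, sum(rate(request_duration_seconds_bucket[1m])) by (le))
```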
So I have my metric, and that metric is a ton of requests all aggregated together. I can then use this new exemplar support in OpenMetrics, and hopefully very soon in Prometheus, to store an example of a single request, and then here I can dig into my trace; this is all normal distributed tracing for those of us who have played around with it. Cool. So exemplars are this new upcoming feature, and we've talked about all the different places where they will hopefully be available soon.
Cross your fingers. Like I said before, Tempo also depends on this; excuse me, it depends on logs for discovery.
So this is how we actually discover traces right now, and this is a Loki query. As discussed... hmm, it doesn't find anything at the moment. As discussed, this is Loki, but this is compatible with more than that.
Is it there? Oh yeah, okay, we were just in a weird spot. So this is compatible with Elasticsearch or any logging backend where you can build a link from an ID; this is going to work, so there's no dependency on Loki or anything like that. But to show you what we do in Grafana: you can see here I have the path recorded, and I have the latency down here.
I have the trace ID, and I probably should have other things too, of course, like the HTTP verb, status code, and other information. If I put all of these on a single log line, I can then do some really clever queries, like the ones sketched below. Let's look for something over two seconds, maybe. Yeah: so now all of these traces are greater than two seconds, and I'm now looking at some long-running traces, so I can diagnose some kind of latency issue.
If I were more clever and had, let's say, method equals GET, I could do another pipe and do something like status equals 500 if I were interested in failed queries. So with some careful logging, and this is normal HTTP request logging, which a lot of us already have, I can build basically an index into my traces that lets me do advanced searches and also discover traces.
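Against a Loki backend with logfmt-style lines like the one shown earlier, those queries might look roughly like this; the label and field names are illustrative:

```logql
# traces slower than two seconds
{app="myapp"} | logfmt | duration > 2s

# slow, failed GETs
{app="myapp"} | logfmt | duration > 2s | method="GET" | status == 500
```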
So let me jump over here, boink, and I should be able to get a trace out of this log line, and it's greater than two seconds, like I asked for. Cool.
Okay. So, Tempo is this new distributed tracing backend, like I said, designed for high volume, extreme volume, and designed to be inexpensive to run: put everything in S3 or GCS, and don't bother with complicated backends.
Discovery through logs allows us to store traces super cheaply, super inexpensively, and then soon, hopefully, we're going to see exemplars like the ones we dug into, which are only supported by some clients right now.
Exemplar support for Prometheus has two PRs up; once they're merged, we'll have complete support. Grafana has support in the tip of master. So we're real close to the point where we should start seeing open source exemplars in all of our favorite metrics tools. We also used OpenTelemetry for the demo application, so if you want to dig into the code, we showed some example code there, and we also had to use OpenMetrics.