From YouTube: Tempo Community Call 2021-08-12
Description
Discussion of Grafana Tempo news.
- Version 1.1
- Update on Search
- New Maintainer
A
Enormous and amazing search capabilities, and then Annanay took off. All right, so we'll start with: we have a new maintainer, which is cool, Koenraad Verheyden. Is that how you pronounce your last name?
A
Koenraad has been with us for, I guess, a bit over three months and has just done great work in Tempo. He's been in the agent, he's been doing operational stuff, he's been adding code, he's been adding features. He's been instrumental in search and participated with that in the hackathon, and he's even contributed to some of our upstream dependencies in OpenTelemetry. He's kind of just been everywhere. He sees any kind of need, he jumps in and he gets some work done. It's been fantastic having him. So.
B
No, I mean, I had a really good time working on Tempo, it's really fun. And yeah, I think we have a very good foundation right now, so I'm looking forward to everything we will be building on it. So with search, and all these reliability improvements we're doing, yeah, it's really cool to be a maintainer.
A
Well deserved, absolutely. It's been a lot of fun having you, and yeah, I think you're going to kill it with search, with Marty and the rest of the team, coming up in the next couple months, so I'm very excited to see that. We also have an additional team member, Zach Leslie, master of egg crates.
D
Sure, yeah, my name is Zach and I collect milk crates. This is true. So yeah, I've got a background in system administration, network engineering, SRE-type work, and yeah, I've been a long-time Grafana user.
D
You know, because if you're running systems and you want to know what's going on, visualization and observability really matter in that. So yeah, I'm super excited to be here after using Grafana for so long, and I love all the toolsets. And yeah, I'm just getting started with distributed tracing, so Tempo is going to be a lot of fun. I'm looking forward to it.
A
Cool. Also, okay, a couple blog posts that we had: this really cool InfoWorld blog post. It's in that doc; I'll also just go ahead and slam it in the chat here.
A
That's right. Okay, okay, let's swap these real fast. It's all part of the show, gentlemen. All right, there we go. So this is the InfoWorld one; we'll talk about that in a second. There's nothing really new in here, I don't think, for anyone who's used Tempo. It's really just kind of a high-level overview of what it can do and some of the other integrations, with like exemplars and such. I really wanted to show it just to kind of show some of the growth Tempo's seen, some of the attention we're getting.
A
You know, out there in different publications, Tempo is really kind of catching on, and we're very excited to see more and more companies use it and get more exposure like this.
A
I think in particular we've hit this kind of volume niche that people love, where the cost of operations is much lower than other backends, and now we're looking to add those features through search that are really going to explode, I think, the Tempo kind of universe. And then this link says Zach, and it has nothing to do with Zach Leslie.
A
So, oh man, somebody put a much better description than I did in there. Thank you very much. But yeah, just kind of wanted to bring attention to some of these things that are going on in Tempo. Cool, v1.1. So we just cut v1.1, or I'm sorry, we cut the first release candidate, rc0. Don't install it.
A
Koenraad will tell you why in a second. But to go over some of the features, some of the new things that are coming up that we're excited about, that we think everyone will be able to use: first is this hedged request thing. I should probably be linking PRs here; if anyone is capable of finding those PRs and bringing them up, I'd appreciate it. Hedged requests basically will repeat a request, a GET request, to the backend after so much time.
A
So I think internally we have it set to 500 milliseconds or something. So if your request to your backend, GCS or S3 or whatever, exceeds 500 milliseconds, it can repeat that request. It'll do that once, so it'll only do two requests at most. And really this is just to get around the long tail of the backend. Any backend, even something as stable as S3 or GCS, has a long tail, and you can easily set this kind of timeout.
A
This hedged request timeout can be set at a point where you're only going to be repeating a very small percentage of your requests, say one percent, and see an amazing impact on the p99 of Tempo. So it costs almost nothing, because you're only repeating maybe one percent of all GETs to your backend, and the consequence of that is you just kill the p99 of Tempo. I think we were seeing like five or six seconds at the p99, which is not great at all.
A
So when Tempo does execute a query, like a find-by-trace-ID, it can often result in hundreds or thousands of requests to S3, because it's doing this huge parallel search across all your blocks. And any time you do a thousand requests to something, you hit not the p99 but the p99.99, right, because you've executed a thousand requests. So you're seeing the long, long tail of the backend service you're using, and this hedged request kind of helps deal with that problem. The second one: the tenant index.
E
I was just going to say, there's a slightly elevated number of requests to the backend because of that. It would be like, whatever, if you set it to the p95, then five percent of your requests would be repeated, right? But it's a small cost to pay to get the latency improvements.
A
But if you are using Tempo, I think when you get 1.1, by default this is turned off; it's set to zero, which turns this setting off. So I'd recommend heavily looking to tune that setting to reduce your latencies.
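For reference, a minimal sketch of what that tuning looks like in the object-store backend config; the 500ms threshold and the at-most-two-requests behavior come from the discussion above, but treat the exact option names and the bucket name as assumptions to verify against the 1.1 docs:

```yaml
# Sketch only: hedge slow GET requests to the object store.
storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces        # hypothetical bucket name
      hedge_requests_at: 500ms    # re-issue a GET that has taken longer than this
      hedge_requests_up_to: 2     # never more than one extra request per GET
```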
A
The second one, the tenant index, is here to reduce calls to the backend. Right now, every compactor and querier has to know the state of the backend all the time, for queriers to execute queries and for compactors to reduce your blocks, and they are polling the backend by just listing every block over and over and over again.
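A rough sketch of the knobs involved, assuming the polling options as they are documented for this release; the names are best effort from memory, so double-check them:

```yaml
# Sketch only: a couple of instances build a per-tenant index object in the
# backend; everyone else reads that index instead of listing every block.
storage:
  trace:
    blocklist_poll: 5m                        # how often to poll the backend
    blocklist_poll_tenant_index_builders: 2   # assumed key: how many instances write the index
```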
A
So I tried to get some good documentation in here, because it is kind of an operational change that, if you do operate Tempo, would be good to know about, so you understand the way your system is working, to help understand how to configure it and also maybe what could go wrong, some of the failure modes. I did my best to add some very solid documentation around that, but please review it, and if you have any concerns, or if some of the documentation doesn't make sense:
A
Please file an issue, bring it up on this call, or mention it in Slack, or yeah, just let us know and we'd be glad to help clarify what's going on and why we made the changes we did, and hopefully we can get you confident in this switch. Cool. Annanay?
F
Well, yeah, I guess about the tenant index: did we call out that it's kind of beneficial for some workloads or installations where just the polling itself has become a non-trivial cost? It can become significant, and so this really helps drastically reduce the polling operations overall. Yes, I mean, yep, I'm not sure where that's come up, but it has before, yeah.
A
It's come up, right. There's definitely a flat cost to polling, on our GETs, but also people who have run... I haven't heard this for Tempo, but I've heard it for Cortex and Loki: people are running it on-prem, and maybe they're running with something like MinIO or any of the billion on-prem S3-compatible things you can run, and maybe they're having issues with Tempo or one of these other applications overwhelming their backend.
A
Like, none of us can overwhelm S3, right, but if you stand up MinIO, you can overwhelm that, because you're operating that as well. So it's kind of also just to reduce the list calls and the GETs on maybe an on-prem backend.
E
Cool, so yeah. The next linked change is about caching flexibility, and this is also about sort of larger workloads. I would say it's not super relevant to smaller workloads where you can fit all of the working set of Tempo in cache. The working set of Tempo is the total size of all of the bloom filters for all blocks in the backend, right?
E
So for smaller installations that's relatively small, but as the number of blocks grows, like I think we had 22,000 blocks at some point in our backend, the size of the bloom filters just sort of shoots up to 120, 150 gigs, and at that sort of level it's not really advantageous to cache all of those bloom filters. You may want to pay a latency penalty and just have a fraction of those bloom filters in memory. So that's kind of what we implemented with this.
E
It turns out that for the lower levels in the compaction hierarchy, the blocks actually have a lot of churn, so the compactors will quickly combine level-zero blocks together and move them higher up the levels, so they disappear kind of quickly. And also super old blocks, you know they're going to get deleted soon. So it's kind of advantageous to just cache recent blocks that are higher up in the compaction level, and this PR sort of adds the ability to do that.
E
It's kind of complex, and let me see if I can share a way in which we can figure out how we can choose which blocks we want to cache. Let me go ahead and share the caching doc.
E
It works. So we have this section on caching in our docs, and if you scroll all the way down over here, we have this cache size control section, and this sort of talks about what we're trying to do here, which is to just store a fraction of the bloom filters in cache; for the rest of the blocks, Tempo will go and fetch them from the backend anyway. There'll be a slight latency cost because they're not in cache.
E
Obviously, right. And thanks, Joe, nice talk. So we've added two sort of configuration options, the minimum compaction level and the max block age, and that sort of limits the blocks that will be cached. And if you sort of run this... okay, let me see if I can bring this image up. Let me share this tab instead. So this is sort of a summary command that we added to our CLI, and it lays out how the blocks, the bloom filters, break down over the days as well as by compaction level.
E
So you can see that there are, like, 20 or so bloom filters which are one day old and at compaction level zero, and so on. So this sort of breaks down how the bloom filters lay out over the days and compaction levels, and you can see that the ones that are lower in the compaction level, we don't want to cache them, and the ones that are really old, we also don't want to cache them, because they're going to get thrown out anyway.
E
We have these two parameters, so you can just say: cache a block only if it's above compaction level 2, or only if it's not older than 48 hours. And I know this is a little complex, so if you have any issues when you're using it, feel free to just open PRs or issues and then we can improve on it. But this is something that we thought would work really well for the first pass, and it's already started showing some results.
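For concreteness, a minimal sketch of those two knobs as described in the cache size control docs; the option names are recalled from the docs and worth verifying there:

```yaml
# Sketch only: cache bloom filters only for blocks that are worth caching.
storage:
  trace:
    cache_min_compaction_level: 2   # skip low-level blocks that churn quickly
    cache_max_block_age: 48h        # skip blocks that will age out soon anyway
```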
E
It's not very pronounced, because we have a lot of blocks at the moment, but it's still better than what we had without it. So yeah, I'll stop here.
F
Yeah, we would really like to hear from anyone, you know, what their experience is with caching, what amounts they're using, whether these settings are helping. And maybe it really comes into play for larger installations; like you mentioned, our environment had over 100 gigabytes of bloom filters and it's impractical to cache all that. So yeah, we'd just like to hear any feedback.
A
Cool, sure. So there's a lot of channels to reach us: Slack, of course, this meeting, all kinds of different ways to get a hold of us, or make an issue in the repo if you do have feedback on some of these things we're adding, or, you know, certainly anything to help you operate it. Let us know. We kind of have a scale which is good, because it lets us find issues before other people.
A
That can also be bad, because we just don't see maybe some of the issues of running this at low scale, and we're maybe fixing issues that we have. So we highly suggest, or encourage, everyone to help us know what issues you're running into running Tempo, so we can improve things and help you all get things going.
A
Yeah, next on the list is bad things, which we used an upside-down exclamation point for, because it felt like the opposite of an exclamation point. But I actually don't really know what an upside-down exclamation point is for. Does anybody know that?
B
Yeah, so while cutting the 1.1 release we have been noticing a couple of issues with memberlist in our largest clusters. So yeah, in this release we have updated our Cortex dependency, which brings a lot of upgrades to, you know, the ring, to memberlist, to other stuff we use, but this also changed a bit how our memberlist is configured, and yeah, we're thinking something there might be causing trouble. So we're still investigating that.
B
But the things we are seeing right now are that compactors have trouble getting ready and that memberlist is causing a lot of CPU. So yeah, basically that's what we're noticing, and we're still investigating whether you have to change the config somehow. So for now our advice with the 1.1 release candidate is, you know, you can try it, but it might cause some issues. So keep a good eye on that. We'll be running it internally and we're trying to track down the issues we're seeing. Again, similar to the previous part, if you have any issues running memberlist, just let us know, create an issue, reach out on the community Slack. We're constantly trying to improve it. So yeah.
A
Right. We cut that rc0, we deployed it, and we immediately noticed just much heavier CPU and memory. We kind of tracked that down to...
A
Defaults had changed, so I would recommend not installing that just yet, but we are on track to get 1.1 out. We'll just have some, you know, big bug fixes and such; we will be adding no more features until we cut 1.1.
A
So everything from now until 1.1 is going to be stability and, you know, rounding up whatever these issues are that we're seeing. And we are continually fighting memberlist; if anyone else is having issues with that, please let us know. We struggle sometimes to forget things.
A
It sounds funny: like, you go, you know, to the ingester ring or the compactor ring and click forget. Memberlist has generally been very good, but this is one haunting issue we can't get rid of. I'm kind of deep-diving it now, so maybe we'll have a resolution, or maybe we'll tell you to stop using memberlist and we'll remove it from the default. I'm not sure; it's going to be one of those two directions.
A
Probably. But I don't know, I hate to say I am close, because I kind of think I'm close, but I don't know. It'd be nice to finally put this one to bed for sure. Another one... oh sorry, go ahead.
A
Deleting memberlist from our code base, or finding the issue, actually. It is one of those two things; I'm not sure which one it is yet. One more thing to add about 1.1: we've deprecated two older block formats, which you probably don't even know about because Tempo kind of deals with these seamlessly, but there are some notes in the changelog for how to deal with this.
A
If you install 1.1, there's nothing you need to do, but we're going to remove support for these in 1.2, and there are some details in that changelog about what you need to do, which is basically: upgrade to a version that will use the newest block version, which is v2, and let all your older blocks fall off. The changelog has some really good details, added by Marty, on how to migrate forward.
A
It's not really that you have to do anything, except install a version of Tempo that understands both the v1 and the v2 blocks, wait for all the v1 blocks to go away, and then install 1.2.
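A sketch of the block setting involved, assuming it lives under the block config the way the docs describe it; check the changelog notes mentioned above before relying on this:

```yaml
# Sketch only: newer releases write the v2 block format; older formats stay
# readable until support is removed in 1.2, so let those blocks age out first.
storage:
  trace:
    block:
      version: v2   # format written by ingesters and compactors
```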
So just a heads up that that kind of change is occurring. And now, service graphs are up, and unfortunately I don't have a ton to say here, because Mario is the primary kind of owner of that on our team and he is currently on PTO.
A
There was this PR that we dug up, Connor. I don't know if you have anything to add; you're welcome to say "I have nothing to add, Joe," since I'm totally putting you on the spot here and we discussed this not at all beforehand. But if you know anything about service graphs, Connor's on the Grafana team proper; if you have anything to share, we'd appreciate it.
A
Once Mario gets back... he's doing the work in the agent to generate the metrics necessary to draw this graph, and Connor and Andre are both working on the actual code to display this in Grafana. So when he gets back we'll probably look to have a demo, or look to show a little bit more, maybe next month at the community call in September. Cool.
A
Finally, let's wrap this one up with Marty. Well, let's not wrap it up; if we have anything else, I don't want to say we're done. But Marty has some new info on search and some of the work we've done there.
F
Cool, yeah. So in the last community call we kind of walked through the timeline for what we see for Tempo search, and I think we did a live demo and showed off some of that, so we're not going to repeat that here. None of that has really changed. We just have some numbers from some internal benchmarking and performance testing and stability and things like that, so I'm going to go ahead and share that.
F
Okay, cool, so everybody can see that. So what we did is we were testing out search in an internal environment that's kind of heavy; it's kind of like the one that we're always talking about, with 100 gigs of bloom filters and things like that. Some stats about the environment: at the time it was processing 1.5 million spans per second, and it has 40 ingesters and 20 distributors.
F
So that's the write path, and that's kind of what we're focusing on here, and the ingesters are keeping data for around nine minutes. So that's the environment we were working with. So for searching: after letting it run, with all the new data present on the ingesters indexed and things like that, an exhaustive search, meaning it's searching everything on the ingesters, was able to be completed in about 1.5 seconds.
F
That was fairly consistent, and that means it was searching around 34 million traces, or about 66 gigabytes of search data. And so each ingester, there were 40 of them, each one was able to process kind of over one gigabyte per second. So overall, the search performance, we felt pretty good with it: it was 45 gigabytes of search data per second, leading to the 1.5 seconds for the exhaustive search.
F
Now, an exhaustive search we force by searching for tags that don't exist, so it was forced to search everything; other searches were much faster if they were able to be completed and find enough results sooner than that. Cool. So that's kind of the performance, and again, that's just over around nine minutes of data. So this is promising, but you know, we want to search a lot more than the past nine minutes of data. Next, resource usage.
F
So there was a lot more overhead to turning this on, which we expected, but we have some numbers for that too. Storage-wise, it was around 1.5 gigabytes of extra disk storage per ingester for the additional search data, and so that's kind of the 66 gigabytes distributed across the 40 ingesters. On the write path, which is the distributor and the ingester:
F
There was around a 25 percent increase in CPU and 30 percent in memory. The CPU is required just for processing the additional data, because as each trace comes in, that search metadata is extracted, built up, and then saved. And the memory actually comes from storing the live traces with their search data in memory, a lot of that, and then of course just other work going on. Cool, any other things to cover about the performance?
F
So this was overall. I could pull that document back up, but that was overall, across the entire environment. I mean, yeah, I'd have to get better numbers about that, sure.
F
Yeah, so the ingesters do have a lot of work to do, because although the distributor extracts it initially and sends the bytes over to the ingester, that part's kind of efficient for the ingester, but there's still a lot of work to do. If the trace is coming in in multiple parts, those have to be unpacked and repacked together, and then after it's written to the WAL, it then has to be flushed into another block. So there's a little bit more work to do there. Okay.
A
Yeah, something that I want to do is... right now, when we first release this in 1.2, it'll be in 1.2, right? Question mark.
F
So this is the same timeline that we showed off previously, so this hasn't changed; I'm just highlighting here the section that we're talking about. So this is phase one of search: Tempo will have an API, it will search the ingesters, and we will have this experimental upcoming Grafana UI. The Tempo pieces we're targeting for 1.2, so it's not the release we're talking about today, that's 1.1; 1.2 is the one after this. And then the Grafana version is to be determined, so we'll have more on that as we get there. Cool.
A
So I think longer term that flag will be dropped and it will be on by default, but I think we always want a flag to turn it off, for people who want Tempo as just this high-volume key-value store backend, with the kind of performance it has today; we want to always preserve that. We went over, or Marty went over, some of those resource increases, which is of course expected, but for extremely high-volume deployments it's possible you'd just prefer continuing to use your logs or your exemplars for discovery. So we want to maintain that after this. So for a while it'll probably be a flag to turn it on, a feature flag, and then eventually it'll be flipped and it'll be a flag to turn it off.
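If the experimental flag ships the way it is being described here, opting in would look something like the sketch below; the flag name is an assumption, not a confirmed option, so check the docs once 1.2 lands:

```yaml
# Sketch only: hypothetical opt-in for experimental search, off by default.
search_enabled: true
```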
A
If you prefer doing it the old-fashioned way and keeping the level of performance that you have today, because we do know some people enjoy it for that kind of extreme high-volume backend, this feature may not make sense for them. Cool, all right. I think that's roughly the end of our stuff. Questions? Any other agenda items or concerns or questions, or if anybody wants to, you know, tell a hilarious joke?
A
Not very funny people... a joke, that's right. What's his name, Richie, likes using my name in terrible ways.
A
All right, all right. Well, everybody take care. Let's go ahead and call it if there's nothing else to discuss, and we will see you in about a month. Please reach out on our Slack channel; it's probably the best place to get real quick help. We also have the community forum... oh, there's a question here, one sec.
A
We have the community forum, which is a great resource, the Grafana forums, to ask questions, and then the issues on the GitHub repo. We work very hard, every member of this team, to be very responsive on all of these different methods of contacting us, so please reach out on them if you're having issues. "An overview of the Grafana Tempo deployment would be pretty helpful: CPU, memory, storage, any config changes of note." Tanner?
G
Hey, I'll turn on the video for this one. Sure, yeah. So at Shopify we're just rolling out Tempo; I've gotten it up to about 60 million spans per minute now, so it's getting pretty big, and that's at a low sample rate, so we have a lot of spans. So our Tempo deployment is going to be pretty heavy.
G
Basically, I'm just running into so many issues around, you know, things like CPU throttling. There's no real good guidance on how much CPU you give the distributors, how many replicas you run, how much memory you give them.
A
You're on Kubernetes?
A
I don't know if it's completely up to date; this cluster is taking 2 million spans per second at the moment. This dashboard is awful, it needs a lot of love, and we use it every day, or I do at least. Ingesters... we don't really see spikiness on the ingesters. I mean, unless it's below the Prometheus scrape interval, which is of course possible, right? It might be that we're just not seeing something; that could be, no.
F
Oh yeah, no, that's a good one. I was going to mention also the ring, to see if there's anything unhealthy in there.
G
The ring is typically good. I've had some issues where, when I'm restarting my ingesters and it's replaying the WAL, that can get a little funky; sometimes it's flapping because of that. I'm actually having pods flapping because of that right now.
F
So a couple versions back we improved the ingester replay to not actually replay. It used to actually duplicate all the data into a new WAL on startup, that was the actual replay, but, oh yeah, maybe it was in 1.0, we changed it to just quickly rescan the WAL files and replay a lot less.
G
Right now I just scaled it up, so I have 20 ingesters right now. Okay, it's been going...
G
So it seems to be some ingesters would stop running, or they would restart, and so I think I triggered this: I recently scaled down and lost two ingesters right off, and then everything has kind of struggled to come back a little bit because of that, it seems like. So then some get unready, and then it starts to kill some, and then those come back, they start doing the WAL replay, so they're unready for like five minutes, and there's kind of this backlog going on.
A
Sounds like your WALs are quite big. Our replay is, I feel like, seconds, like under 30 seconds. I would look at maybe your settings on cutting a block, like in terms of the size and age.
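The block-cutting settings being referred to live on the ingester; a minimal sketch with the option names as recalled from the docs, and with purely illustrative values rather than recommendations:

```yaml
# Sketch only: cut blocks more aggressively so WAL replay stays short.
ingester:
  max_block_duration: 30m       # cut a block after this much time
  max_block_bytes: 1073741824   # or after roughly 1 GiB, whichever comes first
```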
A
Yeah, I wonder if we should make that the default. It is not; I kind of forgot about that setting, we've had it on snappy for so long.
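The setting in question is the WAL encoding; assuming it is exposed the way the storage docs describe, turning on snappy compression for the WAL looks roughly like this:

```yaml
# Sketch only: compress the WAL with snappy to shrink replay time and disk use.
storage:
  trace:
    wal:
      path: /var/tempo/wal   # hypothetical path
      encoding: snappy       # per the call, not yet the default at this point
```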
A
I'll help people with that. The compaction window in particular is a good one to reduce. I think by default it's like an hour, and when you're creating a huge number of blocks, reducing that window to like five minutes or something will allow a lot more compactors to participate very early on in the process, at level zero, which will reduce your blocklist quite a bit. Okay.
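A sketch of that compactor tweak, using the option name as it appears in the compactor config; the five-minute value is just the example from the call:

```yaml
# Sketch only: a smaller compaction window lets more compactors work on
# level-zero blocks in parallel, which shrinks the blocklist faster.
compactor:
  compaction:
    compaction_window: 5m   # default is around an hour
```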
F
Yeah, I mean, I guess we don't have numbers that we've arrived at through any sort of formula, but maybe we could do a better job with some guidance for different levels of scale. Like, this environment, you know, is in the millions of spans per second; we could do guidance for something like that, and then maybe other loads, like 100,000 or something.
A
How many compactors do you have, Tanner?
G
The compactors are pretty low right now; I think it's five.
A
Yeah, we have 20... 40, 40, 40. Yeah, it is a bottleneck of the system, the compactors, because a compactor has to pull the entire block, which is a lot of data, rebuild the blocks, and send them back out. I don't know, I think at some point... we don't have plans in the immediate future to deal with that, but I think to get into like the tens to 20 millions of spans per second per cluster, we might start looking at...
A
We might start looking at changing the way we do compaction, from this full-block compaction to something else. I don't know.
A
Sure. What's your backend?
E
I think maybe in every community call we can just share our numbers on CPU, memory, ingestion and so on. That would be like a nice timeline; we could just go back through the, not backlog, but like changelog of community calls and see how CPU and memory are doing.
A
Right, that's not a bad idea. I kind of want to look at our other settings and see if I can pass anything else on. What are you using to hold your ring, Tanner?
A
Can you show us how to fix it, Tanner?
G
I had one weird one where some old distributors were sticking around in the ring, yeah.
G
Like, all of my deployments were gone, and I had like four or five unhealthy distributors.
A
Yeah, that is, unfortunately, the bug, and we are right now working hard to figure that out. And if I cannot figure it out in a couple of days, I think we're going to say please use Consul from now on, because it is plaguing us in multiple environments.
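If it did come to switching the ring off memberlist, the change would be in the ring kvstore config; a rough sketch, with the config path assumed from the Cortex-style ring configuration Tempo inherits, so verify it against the Tempo configuration docs:

```yaml
# Sketch only: back the ingester ring with Consul instead of memberlist.
ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul
        consul:
          host: consul.example.svc:8500   # hypothetical Consul address
```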
A
It's the same bug, it's the same problem. All right, well, I will talk about it in Slack some if I can figure this out.
A
There's eight unhealthy ones here. That's also part of the reason you have a long blocklist, although just scaling up would be helpful. With those compactors sitting there that aren't actually there, they own pieces of the ring, and the other compactors are basically ignoring entire sections of blocks they could be compacting.
A
Yeah, I think for sure the snappy WAL would be a good thing to do. We also do this.
A
Cool, awesome. Honestly, it's awesome to hear you having success at 1 million spans per second. 70,000 blocks is enormous and we need to work to get that down, but we can help you do that. I would kind of expect it to just fail because of the queue size and the queriers, but give it a shot; let us know what happens when you do that.
A
Yeah, man, reach out on Slack or whatever is convenient for you as you play with that over maybe the next couple days or weeks; we'd be glad to get you in a more stable spot. And if you can't get your ingesters stable, that's concerning and we need to dig into that, so if you can't, please reach out and we can share logs or whatever, you know, panics, stack traces, or whatever you can get. Okay, cool, cool.
G
Yeah, we just started sending traffic, like the actual traffic, on Monday, so yeah, we're just...
A
That's awesome, cool. All right, anything else, Tanner? Nope.
A
Appreciate the feedback, man. It's always good to hear about other people's installs; we use that to balance our own experience. You know, we do have good experience running at scale, but hearing what other people are doing is always good, of course. Cool.
A
All right, well, on that note, unless there are any other questions, any other things in the doc, please, anything for the agenda, question mark. Oh man, we got good notes here, good job, Annanay. If there's not any other questions, we can just go ahead and finish up. All right, everybody take care. I will see you all next month, if not before; I'll probably be hearing from Tanner in the next couple hours.