From YouTube: Mimir community call 2023-05-25
A: All right, welcome to the May Mimir community call. I'm Vernon Miller, software engineer on Mimir. Just going through the agenda: today we have some updates on stuff we're working on. The first item on the agenda is about reducing querier memory utilization. Marco, do you want to talk about that?
B: I'm speaking on behalf of Ganesh and Charles; they are both engineers from Grafana Labs and they're not in this call today. Over the past month they worked to introduce an experimental feature in Mimir which is expected to significantly reduce the memory utilization peaks in the Mimir querier. The main idea of this work is that, instead of loading up front all the chunks, which are basically, you know, batches of compressed samples, before executing the PromQL engine, these chunks are actually read from the ingesters in a streaming way while the PromQL engine is running.
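(A minimal Go sketch of the streaming idea just described, assuming a made-up chunk-iterator API; the names below are illustrative and are not Mimir's actual types or functions.)

```go
// Illustrative only: contrast loading every chunk up front with streaming
// chunks from the ingesters while the query engine consumes them.
package main

import "fmt"

// Chunk stands in for a batch of compressed samples.
type Chunk struct{ Samples []float64 }

// loadAllChunks models the old behaviour: every chunk is held in memory
// before the engine starts, so memory peaks with the size of the selection.
func loadAllChunks(fetch func() (Chunk, bool)) []Chunk {
	var all []Chunk
	for {
		c, ok := fetch()
		if !ok {
			return all
		}
		all = append(all, c)
	}
}

// streamChunks models the experimental behaviour: each chunk is handed to
// the engine as soon as it arrives and can be released afterwards.
func streamChunks(fetch func() (Chunk, bool), consume func(Chunk)) {
	for {
		c, ok := fetch()
		if !ok {
			return
		}
		consume(c) // the engine processes this chunk while the next is fetched
	}
}

func main() {
	chunks := []Chunk{{Samples: []float64{1, 2}}, {Samples: []float64{3}}}
	i := 0
	fetch := func() (Chunk, bool) {
		if i >= len(chunks) {
			return Chunk{}, false
		}
		c := chunks[i]
		i++
		return c, true
	}
	streamChunks(fetch, func(c Chunk) { fmt.Println(len(c.Samples), "samples") })
	_ = loadAllChunks // kept only to show the old behaviour side by side
}
```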
B: So far we did some testing in a dev Mimir cluster, and we got some promising results in terms of memory utilization reduction in the querier. However, we've only tested, you know, a small set of queries so far, like the most common ones. sum(rate(...)) is, you know, the first query from which we started, which is also the most common one in Prometheus.
B: So more testing is required, and we also want to start testing this new feature, you know, in bigger Mimir clusters, but so far the results we got are pretty interesting. There's one main downside, which is something we are still working on: when you enable this experimental feature, it's expected to increase the ingester CPU utilization.
B: That's definitely something we will, you know, keep an eye on. We are also very careful to try to keep the ingester CPU as low as possible, and maybe there is also room for some opportunities there. I think we will have some data to share, not for the next release, whose release process is expected to start next week, but probably in the release after; so in Mimir 2.10 we should have some, you know, data to share with the rest of the community.
A: All right, thanks, Marco. The next one I can talk about. A feature that we've started working on is per-series extended retention. We're still finishing up the design phase and working on a proof of concept, but the idea is, you know, you may have a ton of metrics, but some metrics are more valuable than others, and for some subset of those metrics you may want to save them for longer than the default retention, or the retention that you would like for your less important metrics. So the feature we're working on is, on a per-tenant basis, to define two tiers: a default tier and then an extended retention tier. And, you know, the user would select a set of series that they want to be retained for longer, and then Mimir would automatically move those. So this is something we're excited about, and we should have more updates in future community calls.
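(The design here is not final, so this is only a hypothetical Go sketch of what a per-tenant, two-tier retention configuration could look like; the type, field names, and matcher syntax are invented for illustration and are not an actual Mimir API.)

```go
// Hypothetical sketch only: a per-tenant retention configuration with a
// default tier and an extended tier selected by series matchers.
package main

import (
	"fmt"
	"time"
)

// ExtendedRetentionConfig is an invented type; Mimir's real design may differ.
type ExtendedRetentionConfig struct {
	DefaultRetention  time.Duration // applies to every series by default
	ExtendedRetention time.Duration // applies to series matching the selectors below
	ExtendedMatchers  []string      // e.g. PromQL-style label matchers
}

func main() {
	cfg := ExtendedRetentionConfig{
		DefaultRetention:  90 * 24 * time.Hour,
		ExtendedRetention: 2 * 365 * 24 * time.Hour,
		// Only the series a user considers most valuable get the longer tier.
		ExtendedMatchers: []string{`{__name__=~"slo:.*"}`},
	}
	fmt.Printf("%+v\n", cfg)
}
```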
A: All right, the next one is about reducing memory utilization in the store-gateways. Dimitar, do you want to cover that?
D: Yeah, so, the last few months I've been working on reducing memory utilization in store-gateways. The gist of it is that the store-gateway can now trade fetching postings through the TSDB index for fetching series. So, overall, the volume of data fetched from the TSDB indexes should be reduced.
A: I had a question about that, Dimitar. I saw the parameter you have there to try out, setting the worst case. I think, if I'm correct, there are like three different choices for that parameter; is that right?
D: There are, yeah. So, like, "worst case" sounds bad, but worst case refers to optimizing for the worst case in fetching data: it assumes you'd fetch many series and then tries to fetch postings which are, in volume, no more than that. So you can have worst-case. You can also have speculative, which assumes that with each selector in your query the number of series the query would select halves, and in this way it decides how many series and how many postings to fetch. There's also worst-case-small-posting-lists, which favors postings a bit more than series; it should be less aggressive. And, yeah, that's interesting. And there's "all", which is just the default, which kind of disables the optimization.
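(A rough Go sketch of the trade-off described above, assuming the store-gateway compares rough volume estimates for the two approaches; the strategy names mirror the ones mentioned in the call, but the estimation logic and function names are illustrative, not Mimir's actual implementation.)

```go
// Illustrative only: decide between fetching postings from the TSDB index and
// fetching the series directly, based on rough estimates of data volume.
package main

import "fmt"

type strategy string

const (
	all                        strategy = "all"                            // default: optimization disabled
	worstCase                  strategy = "worst-case"                     // assume many series could match; cap postings volume by that
	speculative                strategy = "speculative"                    // assume each extra selector halves the selected series
	worstCaseSmallPostingLists strategy = "worst-case-small-posting-lists" // like worst-case, but favors postings a bit more
)

// estimatedSeries guesses how many series a query with n selectors touches.
func estimatedSeries(s strategy, totalSeries uint64, selectors int) uint64 {
	switch s {
	case speculative:
		for i := 0; i < selectors; i++ {
			totalSeries /= 2 // each selector is assumed to halve the selection
		}
		return totalSeries
	default:
		return totalSeries // worst case: assume everything could match
	}
}

// fetchSeriesDirectly is true when pulling the series is estimated to move
// less data from the index than pulling all the postings first.
func fetchSeriesDirectly(estimatedSeriesBytes, estimatedPostingsBytes uint64) bool {
	return estimatedSeriesBytes < estimatedPostingsBytes
}

func main() {
	fmt.Println(estimatedSeries(speculative, 1_000_000, 3)) // 125000
	fmt.Println(fetchSeriesDirectly(100, 500))              // true
}
```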
A: And on the next item, I've actually been looking at some of Marco's PRs around the compactor, reducing object storage API calls. It seems like it could be pretty huge. Marco, do you want to talk about that?
B: Yeah, sure. I wouldn't call it huge, but I worked on a couple of optimizations to reduce the API calls issued by both the compactor and the ruler to the object storage. So, the ruler: the ruler synchronizes the configured rule groups on a, you know, regular interval, which is configured by the configuration option called ruler poll interval, and by default it is one minute.
B: However, most of the time there are no changes to the rule groups between, you know, two consecutive intervals, but unfortunately the ruler doesn't know that unless it, you know, keeps checking all the configured rule groups in the storage. What we did was introduce another way to sync the rule groups, which is sort of event based. So whenever you call the ruler config API to change the configured rule groups, a notification is pushed across the ruler replicas and the sync is triggered.
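(A minimal Go sketch of combining periodic polling with event-based sync, as described above; the channel-based notification is illustrative and is not how Mimir actually propagates the event across ruler replicas.)

```go
// Illustrative only: sync rule groups on a timer, and also immediately when a
// config-change notification arrives, so the poll interval can be much longer.
package main

import (
	"fmt"
	"time"
)

func syncRuleGroups(reason string) {
	// In the real system this would list the configured rule groups in object storage.
	fmt.Println("syncing rule groups, reason:", reason)
}

func runSyncLoop(pollInterval time.Duration, notifications <-chan struct{}, stop <-chan struct{}) {
	ticker := time.NewTicker(pollInterval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			syncRuleGroups("periodic poll") // safety net for missed events
		case <-notifications:
			syncRuleGroups("config API change") // event-based, near-immediate
		case <-stop:
			return
		}
	}
}

func main() {
	notifications := make(chan struct{}, 1)
	stop := make(chan struct{})
	go runSyncLoop(10*time.Minute, notifications, stop)

	notifications <- struct{}{} // e.g. a rule group was just updated via the API
	time.Sleep(100 * time.Millisecond)
	close(stop)
}
```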
B: This is in addition to the periodic polling, but we expect that, thanks to this feature, we will be able to increase the polling interval, because, you know, unless there are edge cases or race conditions that we didn't expect, the sync should actually always be triggered by the events. We are rolling out this feature to production right now at Grafana Labs, so based on the results there I expect to eventually change the default poll interval in Mimir, to increase it.
B: Then the compactor: the compactor issues a bunch of "object exists" API calls to the object storage. Like, at Grafana Labs, if you look at all the object storage API calls issued by the compactor, 96% are these "object exists" calls on the block's meta.json file. So after some work and some refactoring in the code, we changed a bit how we discover the blocks in the storage, and we have been able to reduce, sorry, to completely remove this "object exists" API call. This feature, I mean, I just worked on this this week, I merged the last PR three hours ago, so we will start rolling out this feature to our dev clusters next week and to our production clusters the week after, so I should have, you know, some results to share in a couple of weeks from now.
A: All right, thanks, Marco. And the last item, around... oh, Matt, you have a question, go ahead.
C: I was only curious if you had a second to mention how... how do you know? Presumably we needed to know whether the object exists or not, and so what is the detection mechanism for that, if we don't do the, I don't know, HEAD calls like before?
B: Yeah, so the compactor, before the changes, was doing two different things. One, we were listing the blocks, issuing a "list objects" API call, and when we list, what you actually list in the object storage are sort of prefixes. So the number of API calls you have to issue to list all the blocks of a tenant is not equal to the number of blocks for that tenant inside the object storage, because we list a thousand with just one call. But then, for each of them, we were checking whether the meta.json file existed, and the reason why we did that was to check whether there was a deletion in progress on the block. Which means: when we delete a block, we start by deleting the meta.json file. It's our way to signal that the block should be ignored when the meta.json file does not exist, and then we delete all the other files of that block until the whole block is deleted.
B: Now, there could be a deletion in progress for that specific block between when we issued the "list objects" call and, you know, when the compactor runs the planning and starts the compaction. So, yeah, we were basically checking every meta.json file of each block to see if there was an in-progress deletion or not, and that was inadequate. I mean, this code was years old; it was originally inherited from Thanos, you know, in a different state of the world.
B: Now we have the bucket index in Mimir, which is mandatory, so we can infer the same information from the bucket index as well. So before listing objects we actually, you know, look up the bucket index, which is just one call per tenant, to check all the blocks which have been marked for deletion, and then we use this information to check whether a block should be excluded or not from the compactor, because the compactor doesn't need to take into account any block marked for deletion.
B: So we can just, you know, exclude all these blocks without checking the meta.json file. Sorry, it was a long description, but unfortunately it's quite tricky the way it works, no?
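(A small Go sketch of the approach Marco describes, assuming a simplified per-tenant bucket index; the types and field names are illustrative, not Mimir's actual code.)

```go
// Illustrative only: instead of issuing one "object exists" call per block to
// check for an in-progress deletion via its meta.json, read the per-tenant
// bucket index once and skip blocks marked for deletion.
package main

import "fmt"

// BucketIndex is a simplified stand-in for the per-tenant index object.
type BucketIndex struct {
	Blocks            []string            // block IDs known to exist
	MarkedForDeletion map[string]struct{} // block IDs with a deletion mark
}

// blocksForCompaction returns the blocks the compactor should consider,
// excluding anything marked for deletion, with zero per-block API calls.
func blocksForCompaction(idx BucketIndex) []string {
	var out []string
	for _, id := range idx.Blocks {
		if _, deleting := idx.MarkedForDeletion[id]; deleting {
			continue // a deletion is in progress; the compactor must ignore it
		}
		out = append(out, id)
	}
	return out
}

func main() {
	idx := BucketIndex{
		Blocks:            []string{"block-a", "block-b", "block-c"},
		MarkedForDeletion: map[string]struct{}{"block-b": {}},
	}
	fmt.Println(blocksForCompaction(idx)) // [block-a block-c]
}
```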
C: I asked, and I appreciated that. Thanks for the in-depth rationale.
A: Yeah, very informative. All right, the last item is that next week we'll cut the next release, Mimir 2.9, and so, you know, after that we will run it in, you know, our dev and production environments and then it will be, you know, published.
E: I would have a question. Sorry, I'm in between meetings, so, like, I'm trying to see how much time I have. I see that there is a new recommendation now in the documentation for the compactor to basically have one shard every 8 million active series instead of the old 25-30 million. Is that recommended for all use cases, for big tenants as well? What's the... why was there a change?
E: Are there any pros and cons to changing that? Is it faster, or... what was the limitation there, or...?
A: I'm not very familiar with... I mean, I saw that change, but I'm not familiar with the rationale. Anybody else on the call know?
E: Yeah, just a few questions first. It's going to be a big change for us, because right now we have 64 shards per tenant. So if we go from 30 to... right now I think we are considering 30, so we would, yeah, need to multiply by quite a lot.
B: So there was some discussion internally about whether we should, you know... sorry, whether we should keep recommending one shard... I think in the documentation we were recommending one shard every 20 or 25 million, and then, you know, some people from Grafana Labs suggested that at Grafana Labs we actually run with a lower target than what our documentation says.
B: Yes, we run targeting nearly eight to ten million, so yeah. That's, you know, how this question was raised; someone proposed to change it, we had some brief discussion internally, and I personally didn't have a strong opinion.
B: ...it affects both the compaction and the querying up to a certain degree, which means, you know, if you target, I don't know, one million series per shard, you end up probably with too many shards and too much overhead querying all these small blocks. But targeting between 10 and 20-25 million active series is, I think, you know, a reasonable number, at least based on the recommendations we also give in terms of CPU and memory requirements for the store-gateway.
E: It works fine; it's difficult for us to, yeah, know if it would work better, because it would be, like, a very difficult experiment to run, because we have 900 million active series, and re-replicating this data and then having, you know, the heavy queries that we are having in production would be kind of tricky. So, yeah, we don't really know how to tell if changing the shard count would help, even because when I change the shard count, then I can't even go back... well, like, I can go back at a future time, but all the blocks, you know, that are already compacted remain compacted the same.
E: Yeah, yeah, okay, thanks! Well, that's good to know, so maybe we don't need to change this dramatically. Maybe I will try to... We are trying to keep the number of shards as a power of two, basically, and the shards that the querier uses as well, so at least they can kind of interact with each other, even if in the future we change the number of shards for the querier or for the compactor.
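(A tiny Go sketch of the sizing arithmetic being discussed, using the numbers mentioned in this call; rounding up to a power of two reflects the setup E describes and is not an official recommendation.)

```go
// Illustrative only: how many compactor shards a given per-shard target
// implies, rounded up to a power of two as E describes doing.
package main

import "fmt"

func shardsFor(activeSeries, targetPerShard uint64) uint64 {
	n := (activeSeries + targetPerShard - 1) / targetPerShard // ceiling division
	p := uint64(1)
	for p < n {
		p *= 2 // round up to the next power of two
	}
	return p
}

func main() {
	const activeSeries = 900_000_000 // figure mentioned in the call
	fmt.Println(shardsFor(activeSeries, 25_000_000)) // old ~25M target -> 64
	fmt.Println(shardsFor(activeSeries, 8_000_000))  // new ~8M target  -> 128
}
```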
E: Okay, okay. Then maybe, if we change that, I will try to see how the queries are, if they're faster, and I can let you know, if that information is useful.
E: Okay, thank you very much, guys.
D: How are you doing this, like... are you cutting it? Are you cutting new blocks? Are you making use of the tombstones...?
A: Yeah, so, Tyler, correct me if I'm getting the design wrong, but yeah, we'll use like a series matcher to, you know, specify what series to include. And then, when the compactor would normally, you know, mark blocks for deletion based on the default tier retention, it would recognize, based on, you know, extra information that we put in meta.json or in another, you know, file just in the same directory, which series need to be, you know, kind of extracted from that block and then written to a new block in the other tier.
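(This is still at the design/proof-of-concept stage, so this is only a hypothetical Go sketch of the compactor-side step just described; the types and matcher handling are invented for illustration, not an agreed design.)

```go
// Hypothetical sketch only: before a block ages out of the default tier, copy
// the series matching the extended-retention selection into a new block for
// the other tier, so deleting the original block becomes safe.
package main

import (
	"fmt"
	"strings"
)

type series struct {
	Labels  map[string]string
	Samples []float64
}

// matches is a stand-in for real PromQL label-matcher evaluation.
func matches(s series, namePrefix string) bool {
	return strings.HasPrefix(s.Labels["__name__"], namePrefix)
}

// extractForExtendedTier returns the series that should be re-written into a
// new block in the extended-retention tier.
func extractForExtendedTier(block []series, namePrefix string) []series {
	var kept []series
	for _, s := range block {
		if matches(s, namePrefix) {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	block := []series{
		{Labels: map[string]string{"__name__": "slo:availability"}},
		{Labels: map[string]string{"__name__": "go_goroutines"}},
	}
	fmt.Println(len(extractForExtendedTier(block, "slo:"))) // 1
}
```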
A: So, when we would normally delete a block at the end of the default retention and...
A: Everything, yeah, okay. And so, yeah, that makes it kind of a no-op if, you know, a user decides, "oh, actually I want to keep these in the default tier"; if they haven't aged out, putting them in the other tier just, you know... yeah.