From YouTube: Mimir Community Call 2023-07-27
A
So hi everyone, we are doing our monthly community meeting. We are really doing it this month because we skipped the last one; it was only Peter and me on the call, so we decided to just skip it. Today we have some points, some news around the new upcoming version coming out, and maybe we can discuss some of them. The first point is about the Helm chart; I guess you added this one.
B
Yeah, this was a surprise release. I guess we surprised everybody; we surprised ourselves a little bit as well. So in Kubernetes we are dealing with Pod Security Policy objects, and apparently Helm is not great at handling deprecated APIs, meaning that if you have a deprecated object in your Helm release and then you upgrade Kubernetes to a new version that doesn't have that API anymore.
B
Then the Kubernetes upgrade will work, but you cannot upgrade the Helm chart anymore, because Helm basically keeps a history of what it installed, and the history will contain the deprecated object, which is impossible to handle at that point. So basically you have to make sure that you remove the Pod Security Policy objects from your release, or from the deployment, before doing an upgrade to the new Kubernetes version. So that's basically what we did.
B
Down
well,
yes,
it's
in
the
reason,
not
sorry
it's
1.25,
so
1.25
removed
pod
security
policy
and
we
don't.
We
don't
render
Port
Security
Police
objects
on
1.25
since,
like
a
year
now,
but
apparently
that's
not
enough
for
him
for
him.
You
need
to
stop
rendering
it
in
the
on
some
previous
Series.
B
So
you
have
to
not
have
that
even
before
the
upgrade
of
kubernetes,
so
basically
what
we
did
at
5.0,
we
were
forced
to
do
this
major
release
and
this
major
release
basically
stops
rendering
or
security
policy
on
1.24
already,
which
is
a
breaking
change.
But
you
can
force
it
to
do
that.
But
you
have
to
be
aware
that
you
will
run
into
issues
with
kubernetes.
B
That said, this API has been deprecated for a long while now, so I don't think it causes any issues for anybody, and also the Helm chart can run within the restricted Pod Security admission control for Kubernetes, so you can switch to that admission control. So it shouldn't be a big deal, but according to the rules of Helm versioning and compatibility we had to do the 5.0, and it contains this removal. So that was 5.0.
A
And let's talk about the upcoming release. We are still deciding the date for this release; we are talking about switching to a quarterly release, rather than the six weeks that we've been doing up until now. So we will announce soon, probably in the community Slack, when we are going to do the transition.
A
Document
and
some
news
we
have
here-
I-
don't
know
who
has
some
context
on
this,
because
I
have
no
context
any
of
the
changes.
B
Yeah, I've added the changes, but the changelog is huge, so I tried to select some interesting ones. The first one is a minor thing, which is just conforming to our, you know, policy to always be Prometheus API compatible, so they adopted this filtering of the rules API. And then I listed a bunch of experimental features, and maybe Marco will be much better at talking about those.
D
Yeah, sure, okay. So the next one in the list is that we've added experimental support to cache the query responses for the cardinality API endpoints and the label names and label values API endpoints. Now, this caching is very, very simple and it works like a CDN cache: given the same input parameters, we cache the response for a short period of time. This cache is not invalidated when the data changes in Mimir.
D
So
the
idea
is
that
you
may
configure
this
cache
with
a
short
TTL
we've
rolled
out
these
caches
in
production
at
profound
labs,
with
a
TTL
of
one
minute,
so
very
short,
but
we've
seen
a
pretty
good
benefit.
Just
to
give
you
an
idea,
30
of
the
label
name
and
label
values.
Api
in
point
are
now
picked
up
from
the
the
cache.
Even
if
the
you
know,
the
TTL
is
is
pretty
short.
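A minimal sketch in Go of the CDN-style response caching described here, assuming a cache keyed only on the request parameters with a short TTL and no invalidation when data changes; all names are illustrative, not Mimir's actual code.

package main

import (
	"sync"
	"time"
)

// cachedResponse pairs a serialized API response with the time it was stored.
type cachedResponse struct {
	body     []byte
	storedAt time.Time
}

// resultsCache caches cardinality / label-values responses keyed on the request
// parameters, CDN-style: the same parameters within the TTL are served from the
// cache, and entries are never invalidated when data changes, they simply expire.
type resultsCache struct {
	mu      sync.Mutex
	ttl     time.Duration
	entries map[string]cachedResponse
}

func newResultsCache(ttl time.Duration) *resultsCache {
	return &resultsCache{ttl: ttl, entries: map[string]cachedResponse{}}
}

// get returns the cached response for key if it is younger than the TTL.
func (c *resultsCache) get(key string) ([]byte, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok || time.Since(e.storedAt) > c.ttl {
		return nil, false
	}
	return e.body, true
}

// put stores a response; the key would typically combine the tenant, the
// endpoint, and the request parameters (label matchers, time range, and so on).
func (c *resultsCache) put(key string, body []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[key] = cachedResponse{body: body, storedAt: time.Now()}
}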
D
The typical example is Grafana dashboards: they call the label values API endpoint to populate the drop-down menu to select, I don't know, the cluster name or the namespace (the variables, basically). And if multiple people open the same dashboard within one minute, or if the same person refreshes the dashboard within one minute, the second time it will be picked up from the cache. Someone raised their hand.
D
We haven't noticed any issue with the latest version, so hopefully it will turn stable relatively soon, maybe the next release. By next release I mean 2.11, not the one that we're going to publish. Any question on this? Otherwise we can move to the next one.
D
The next one is another experimental feature, and the idea is to protect the write path in the ingesters by rejecting queries if we detect that the ingester is overloaded. When this feature is enabled (by the way, it's disabled by default), you can configure a CPU utilization and a memory utilization threshold. The ingester continuously monitors the process CPU and memory utilization, and if the utilization is above the configured threshold, it will start rejecting query requests but will keep ingesting write-path data.
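A rough sketch of the kind of check described here, assuming the ingester samples its own CPU and memory utilization (the CPU figure smoothed, for example with the exponential moving average mentioned later in the call) and rejects read requests while above the configured thresholds; the names are illustrative, not Mimir's actual implementation.

package main

// utilizationLimiter is a sketch of per-ingester read-path protection: queries
// are rejected while the process is above the configured CPU or memory
// thresholds, but the write path keeps accepting samples.
type utilizationLimiter struct {
	cpuLimit    float64 // smoothed CPU utilization threshold (e.g. cores)
	memoryLimit uint64  // memory utilization threshold in bytes
}

// canServeQuery reports whether a query should be accepted given the current
// smoothed CPU utilization and memory usage of the process.
func (l *utilizationLimiter) canServeQuery(cpuUtilization float64, memoryBytes uint64) bool {
	if l.cpuLimit > 0 && cpuUtilization >= l.cpuLimit {
		return false
	}
	if l.memoryLimit > 0 && memoryBytes >= l.memoryLimit {
		return false
	}
	return true
}

// The CPU figure would be smoothed, for example with an exponential moving
// average over periodic samples: ema = alpha*sample + (1-alpha)*ema.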
D
It's something we are still playing with. We haven't fully rolled it out to production yet; we are still doing quite a lot of testing on this feature. We have done some load testing to see how effective it could be, and it looks like it's working as designed.
D
I
mean
if
the
injection
is
overloaded
because
of
some
heavy
queries.
We
can
prevent
the
Injustice
from
being
either
overloaded
or
even
booming
at
the
cost.
Obviously,
of
starting
rejecting
queries.
D
Rolling
out
to
production
the
memory
based
limit,
we
haven't
rolled
out
the
production,
the
CPU
based
limit,
because
we
are
still
observing
some
edge
cases.
We
want
to
to
improve
in
how
we
compute
the
CPU
utilization.
We
are
currently
using
the
so-called
exponential
moving
average,
but
yeah
I
expect,
in
a
couple
of
releases
to
be
to
be
stable
and
ready
to
use
for
in
production
for
for
everyone,
any.
D
Yeah, the next one is something I worked on, and it's what we call the TSDB head early compaction.
D
As
you
know,
the
most
recent
series
data
is
kept
in
the
ingester
memory
and
specifically,
the
data
structure
where
the
series
are
stored
is
the
tsdp
head,
which
is
basically
an
in-memory
data
structure
inside
tsdb
and
then
every
two
hour
we
run
the
so-called
gstb
head
compaction,
which
takes
all
the
samples
in
the
tsdb
head
with
a
timestamp
between
minus
three
hours
and
minus
one
hour
ago
and
compact,
a
new
block
which
is
stored
on
disk
and
uploaded
to
the
object
storage.
D
This
means
that
if
you
have
a
spike
in
the
in
the
in-memory
series,
it
can
take
up
to
three
hours
before
this
series
are
compacted
into
a
tstv
block
and
the
number
of
in-memory
series
drop.
D
Now
the
idea
of
the
family
compaction
is
to
add
the
another
dimension
or
another
trigger
to
decide
when
to
compact
the
tstp
head
before.
It
was
just
by
time
every
two
hour
regularly
with
the
tsdb
head
compaction,
it
all
other
than
the
regular
to
our
compaction.
D
We
can
also
compact
by
space
and
what
I
mean
by
space
is
when
we
detect
that
the
the
number
of
in-memory
series
grows
above
a
configured
threshold,
but
the
number
of
active
series
is
significantly
lower
than
the
number
of
in-memory
series
and
that's
mean
that
we
could
drop
the
in-memory
series.
If
we
trigger
an
early
compaction,
then
what
we
do
is
triggering
an
early
compaction.
So
basically
we
compacted
all
the
series
up
until
20
minutes
ago
and
20
minutes
is
not
a
random
number.
It's
actually
the
the
active
series
threshold.
D
So
since
this
logic
is
based
on
an
estimation
of
the
number
of
series
we
may
drop
if
we
trigger,
if
we
trigger
an
early
compaction
to
have
this
estimation
accurate,
we
use
the
active
series
tracker
to
detect
the
actual
number
of
active
series
over
the
past
20
minutes
and
then,
when
the
early
compaction
trigger.
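A simplified sketch of the trigger just described, under the stated assumptions: compact early only when the in-memory series count exceeds the configured threshold and the active series estimate (from the active series tracker, over the last 20 minutes) is significantly lower, so compacting data older than 20 minutes would actually free series. The function name and the "significantly lower" ratio are illustrative, not Mimir's actual code.

package main

// shouldTriggerEarlyCompaction decides whether to compact the TSDB head ahead of
// the regular 2-hour schedule. inMemorySeries is the current head series count,
// activeSeries is the estimate from the active series tracker (last 20 minutes),
// threshold is the configured per-ingester limit, and minReductionRatio is an
// illustrative knob for "significantly lower" (e.g. 0.15 = expect to drop >= 15%).
func shouldTriggerEarlyCompaction(inMemorySeries, activeSeries, threshold uint64, minReductionRatio float64) bool {
	if inMemorySeries < threshold {
		return false // under the limit, wait for the regular compaction
	}
	if activeSeries >= inMemorySeries {
		return false // nothing would be dropped by compacting old data
	}
	estimatedDrop := float64(inMemorySeries-activeSeries) / float64(inMemorySeries)
	// Compact everything up to 20 minutes ago only if it's worth it.
	return estimatedDrop >= minReductionRatio
}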
D
Then, when the early compaction triggers, we compact all the series data until 20 minutes ago. In the agenda I've shared a couple of screenshots, just to give you an idea. Here you can see one single Mimir cluster deployed multi-zone: we enabled the TSDB early compaction in one zone and kept it disabled in the other zones, so the same exact data is stored across the ingesters.
D
You
can
see
that
without
the
helicopaction
sorry,
the
early
compaction
trigger
was
set
to
2
million
in-memory
series
pairing
gesture.
So
without
the
Hurley
compaction
in
just
a
real
memory,
series
grows
up
to
2.5
million,
with
the
in
with
the
early
compaction
enabled
we
see
that
it
keep
it
push
the
in-memory
series
down
close
to
the
2
million
threshold,
because
whenever
the
ratio,
the
number
of
in-memory
series
goes
about
this
2
million,
it
checks
if
there's
an
opportunity
to
trigger
an
early
compaction
again.
D
If
the
number
half
active
series
is
significantly
lower
than
the
number
of
in-memory
series
and,
if
so,
trigger
heneral
compaction
to
push
down
the
in-memory
series.
F
So
I
guess
the
the
goal
here
is
to
to
reduce
these
spikes
of
of
in-memory
series
to
produce
memory
usage
right.
Is
it
does
it
only?
Does
it
like
compute
these
early
Confections
just
to
get
back
below
the
threshold
or
does.
D
I mean, I guess, yeah, it drops just below; it's just enough. To answer the question about the views here: here we're just looking at the ingester with the maximum number of in-memory series.
D
We don't have a long history of results to show yet, but so far it's working as expected, and it's helping us keep the ingester memory utilization, which on the write path is mostly driven by the in-memory series, under control whenever there's a customer with a high series churn rate, so they create series which live for a short period of time, like in this case. Here you can see that the blue line is the maximum number of active series across the ingesters.
E
Was the 20 minutes based on a previous architecture of the storage with chunks, or was there a different reason for that number?
D
I think it's related to how it was built at Grafana Labs historically.
D
Okay, back to myself; hopefully the last one from me. Yeah, this is something done by Yuri, another engineer working at Grafana Labs.
D
If you run Mimir, you've probably noticed that the number of in-memory series between ingesters is not perfectly balanced.
D
Now
we
spent
quite
a
lot
of
time
investigating
why
the
number
of
in-memory
series
are
not
perfectly
balanced
between
between
investors,
and
there
are
a
couple
of
reasons
one,
which
is
the
one
we
addressed
is
related
to
the
Token
ranges.
Basically,
the
ranges
of
the
tokens
assigned
to
each
ingester
inside
the.
D
Tokens are generated randomly, and even if, you know, with an infinite number of random numbers you may get a fair distribution, in the specific case of Mimir, where we generate 512 tokens per ingester, you may end up with some imbalance in the number of tokens owned by every single ingester.
D
The
second
reason
is
actually
related
to
shuffle
sharding
and
do
how
it
works,
and
we
don't
have
a
solution
for
that
yet
so
the
problem
we
solved
is
the
imbalanced
token
ranges
registered
by
Injustice
in
the
ring.
So
if
you
don't
use
Shuffle
sharding
with
the
new
token
generation
strategy,
which
we
call
the
spread
minimizing
strategy,
you
may
get
almost
perfectly
balanced
series
between
Injustice.
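To illustrate the difference, here is a toy sketch, not Mimir's actual algorithm: random token generation can leave ingesters owning uneven slices of the ring, while a spread-minimizing approach hands out evenly spaced, interleaved tokens so each ingester owns an almost equal share. The function names and the interleaving scheme are assumptions for illustration only.

package main

import "math/rand"

const ringSize = 1 << 32 // token space of the ring (uint32)

// randomTokens is the classic strategy: tokens drawn uniformly at random, which
// in practice leaves some ingesters owning noticeably more of the ring than others.
func randomTokens(n int) []uint32 {
	tokens := make([]uint32, n)
	for i := range tokens {
		tokens[i] = rand.Uint32()
	}
	return tokens
}

// evenlySpreadTokens is a toy "spread-minimizing" variant: ingester ingesterID out
// of numIngesters gets n tokens spaced evenly around the ring, offset so that
// different ingesters interleave, keeping ring ownership nearly perfectly balanced.
func evenlySpreadTokens(n, ingesterID, numIngesters int) []uint32 {
	tokens := make([]uint32, n)
	step := uint64(ringSize) / uint64(n*numIngesters)
	for i := range tokens {
		tokens[i] = uint32(uint64(i*numIngesters+ingesterID) * step)
	}
	return tokens
}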
D
If you use shuffle sharding, like we do in many of our production clusters, you will still experience an imbalance. So we introduced, again, a new strategy which we call spread-minimizing. In the agenda you can see the screenshot of a production cell, sorry, production cluster, before and after the migration to this new strategy. Before, we had about 20 percent imbalance of in-memory series between ingesters.
D
Of
in-membraces
between
investors,
after
migration,
modulating
to
the
new
tokens
generation
strategy,
we
dropped
the
imbalance
to
below
0.5
percent.
If
you
don't
use
Shuffle
sharding,
like
you,
have
a
single
tenant,
for
example
in
your
cluster
or
a
few
tenants,
but
you
don't
need
a
shuffle
sharding.
Then
you
may
consider
migrating
to
this
strategy
was.
B
This has been in the product for several releases now, but now we see an uptick in people starting to use it, and when profiling this endpoint we noticed there are some things that are not very optimal. So there were a couple of optimizations done in the memory utilization and the algorithms that we use. And the reason we have to do this is that the OTLP endpoint actually converts everything that you send to it in the OpenTelemetry format into Prometheus metrics: Prometheus series and metadata and everything.
B
So
it
is,
it
is
doing
a
transformation
from
open
Geometry
to
to
promote
use.
Metrics-
and
another
thing
to
note
note
here,
is
that
it
already
supports
the
open,
Telemetry
exponential
histograms,
which
is
almost
but
not
exactly
the
same
as
Prometheus
native
histograms.
So
we
do
a
translation
for
that
as
well,
and
that
translation
was
missing
a
feature
which
is
the
dark
scaling.
B
So
there
is
a
feature
of
feature
difference
between
exponential
histograms
and
Native
programs,
which
is
that
the
open
gravity,
exponential
histograms
can
have
any
resolution.
So
basically,
the
buckets
that
you
had
in
those
histograms
can
be
arbitrarily
small,
but
for
practical
reasons,
the
promoters
native
histograms
restricted
to
to
a
certain
value.
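As a rough illustration of what the downscaling does (this is the general exponential-histogram rule, not Mimir's exact code): lowering the scale by one halves the resolution, and each pair of adjacent buckets at the higher scale collapses into one bucket at the lower scale, so a histogram that is too fine-grained can be reduced until it fits the maximum scale the target accepts. Function names are illustrative.

package main

// downscaleBuckets merges exponential-histogram buckets when lowering the scale
// by one: bucket index i at scale s maps to index i>>1 at scale s-1, so pairs of
// adjacent buckets collapse into one. counts maps bucket index -> count.
func downscaleBuckets(counts map[int]uint64) map[int]uint64 {
	merged := make(map[int]uint64, (len(counts)+1)/2)
	for idx, c := range counts {
		merged[idx>>1] += c // arithmetic shift also floors negative indices
	}
	return merged
}

// downscaleToMaxScale repeatedly halves the resolution until the histogram's
// scale is within what the target (e.g. Prometheus native histograms) accepts.
func downscaleToMaxScale(counts map[int]uint64, scale, maxScale int) (map[int]uint64, int) {
	for scale > maxScale {
		counts = downscaleBuckets(counts)
		scale--
	}
	return counts, scale
}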
B
That downscaling was added in this release. So if you run into that use case where you, for example, try to use span metrics generated through OpenTelemetry, those span metrics would be too high resolution, especially when you start the measurement, and you would be losing those because we would reject them. But now that the downscaling works, you will get those metrics.
B
Now, there's still some discrepancy between OpenTelemetry and Prometheus regarding histograms, because we don't support the delta temporality, only the cumulative temporality histograms, but so far that hasn't been an issue, and there's already a processor in the OpenTelemetry Collector to actually convert from delta temporality to cumulative. So I'm not sure there will be support for that in the future.
B
Yeah, that's what happens. The specific use case where this could be a bit tricky is when your server is autoscaled and you suddenly start accumulating your, you know, delta temporality metric in a different server than before the scaling. But then again, histograms support detecting counter resets and such things, so it should work. Like I said, we are not quite sure yet if we need to do anything with this; I'm just mentioning it here to get some feedback.
B
If somebody runs into it, then, you know, please tell us and reach out. All right.
C
One
sorry
to
interrupt
only
one
note
on
this
point:
it's
slightly
related
with
orthogonal,
though,
which
is
that
in
Prometheus
we
are
about
to
merge
native
Auto
ingestion
as
well.
So
it's
like
similar
to
what
memory
is
doing
and
I
wondered
if
some
work
could
be
reused
or
if
there's
something
to
be
shared
here,
I
don't
know,
because
I
haven't
looked
in
detail
to
Mimi's
implementation
of
the
all
the
ingestion,
but
just
so
that
you
are
aware
that
this
is
happening.
B
Right,
though
yeah
the
name,
your
implantation
is
just
using
the
open,
termite,
contribute
Poland
or
contribute
GitHub
wrap
up
basically
GitHub
project.
Okay,.
D
Yeah
Jesus,
can
you
share
some
details
about
how
will
work
in
Prometheus.
C
So
it's
still
very
early.
Basically
what
we've
done.
We've
talked
with
the
hotel,
Community
folks,
and
so
they
have
in
the
other
collector.
They
have
code
for
remote
writing
into
Prometheus.
So
we've
basically
copied
that
code
to
create
our
own
native
endpoints,
and
the
idea
is
to
improve
support
for
auto
metrics.
C
So
the
first
step
was
just
copying
the
code.
Eventually,
it
will
will
be
removed
from
the
hotel
collector,
but
only
after,
like
we've
reached
a
certain
degree
of
stability
on
the
endpoint
yeah,
so
those
are
more
or
layers
the
state
chart
in
which
we
are
right.
Now.
B
You know, we never promise any release numbers, but these are coming sometime. So one big chunk of work that we're working on, which is really work in Prometheus actually, is the support for out-of-order ingestion of native histograms. The only thing I would say about it is that out-of-order isn't a trivial thing to solve, and native histograms aren't trivial either: they are a new data structure and there are some details there that are quite deep, and we are trying to match the two together.
B
So
this
is
some
quite
important,
but
we
are
working
on
it
and
it's
going
to
come
in
some
pull
up
release.
B
Another
thing
I
wanted
to
mention
is
that
there's
an
open
PRS
for
a
couple
of
months
now
on
mimir
about
adding
Auto
scaling
to
a
couple
of
simpler
components
for
mimir
and
another
engineer,
and
myself
are
working
on
trying
to
get
this
into
the
the
official
ham
chart,
because
now
we
feel
that
enough
time
passed
and
we
have
enough
kind
of
experience
with
with
the
auto
scaling
on
the
Json
net
side
and
in
production,
so
that
we
can.
B
So we are working on that. Yeah, again, I don't have a timeline for it.