From YouTube: 2018-07-19 Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
A: People are still trickling in; hopefully we'll get a couple more, but it's been small lately. I think we've had a drop-off in community folks showing up, and I'm not totally sure if that's just because it's summer and people are away a lot, or if maybe folks are happy with the performance now and don't bother showing up. I don't know.
A: So, having said that, let's look at PRs. This one, Greg, if you want a throwback to looking at the MDS: there's a PR here to optimize the way the max export size is enforced. I don't know anything about MDS optimization stuff, so I'm not sure if that's interesting or not, but it's there.
A: And then Neha's PR to limit the PG log length merged; that is fantastic. I should try to get that cherry-picked into my branch here so that I can test with it, because it might affect the defaults I'm using for the memory limit stuff. If we don't end up peaking as high, I might have to change those a little bit, but yeah, glad to see that got merged.
A: Yeah, and I haven't even tested recovery yet; I've still just been playing around with normal workloads. But looking at recovery scenarios was on my list of things to do, so getting that merged, or at least cherry-picked, before I do that probably makes sense.
A: Let's see, what else. There's this EC stripe cache one; I don't think anyone's looked at that yet, if anyone wants to review it.
A: There's the one that adds a perf counter for recording the latency of kv finalize. That hopefully isn't that big of a PR, I don't think, but it could be kind of interesting to see. I haven't seen that particular kv finalize thread being super busy lately, but maybe in certain circumstances it might be, so it would be interesting to see. My PRs are all stuck on this libstdc++ runtime issue on 16.04.
A
If
who's
gonna
try
to
fix
it,
which,
if
he
was
here
I'd
thank
him
again
profusely
because
that
I
don't
understand
why
it's
exactly
happening
or
how
to
fix
it.
But
hopefully
he'll
he'll
have
some
make
some
progress
on
that.
A: Yeah, and then there's this remove-async-recovery one, and that's gotten a lot of discussion. I think Sage is also in the middle of discussing that with the people who have been intimately involved in it. The issue there seems kind of subtle to me; I haven't looked real closely at it, but for people that are interested it might be worth looking at, and I think there's been some progress on understanding what exactly the issue that's come up there is.
B: [reply not transcribed]

A: I didn't even understand that. So the proposal is an optimization that makes it unsafe, basically? Yeah. Okay, all right, now I understand. Well, okay, good luck. What else? Adam is on vacation right now, but before he went on vacation he was looking at implementing some kind of tool for trying to model fragmentation of the different BlueStore allocators.
A: I don't think I've seen any results from that yet, but it looks like there are at least some outstanding PRs here for it. Hopefully once he gets back he'll get a chance to get that merged; it would be very interesting to know. Jason has been doing some testing with RBD workloads in one of our labs and saw some interesting behavior with the stupid and bitmap allocators, where the bitmap allocator actually...
A: Sorry, the bitmap allocator may actually be a little bit smarter about using the largest chunks of free space, which turns that random I/O into more sequential-looking I/O. Whether or not that's actually a good idea may be a point of contention, though: maybe you don't want to use your large free contiguous chunks of space for random I/O, even if it makes it kind of sequential.
A: Maybe you want to save those for big objects or big writes if they come in, and do your 64k random I/O in 64k, or close to 64k, contiguous free regions on the disk. So anyway, he definitely sees some differences in behavior, so anything we can do to get a better understanding of what the fragmentation looks like, and what kind of free space we have under different kinds of workloads of different sizes, would be useful.
A: Let's see, what else. I'm sort of surprised that the RGW thread pool size to 512 change has not merged yet; I can't imagine that it's a particularly complicated change to do, but I don't think it has actually merged. Let me just double check to make sure I didn't miss it. Nope, not yet. So anyway, what else: aging tests again for allocator stuff.
A: Yeah, I think a lot of that stuff is pretty old; not a whole lot new right now. So, all right, moving on to discussion topics. I've been continuing to work on the cache balancing stuff, refactoring and cleaning up and trying to figure out reasonable defaults. In the current state it needs some cleanup, but it's doing pretty well.
A: I did go and increase the number of priorities: there are currently ten user-defined priorities that you can set for kv, meta and data, which could potentially have resulted in 30 different config options if you actually defined every single one independently. Others convinced me to just shove this into a space-delimited string, so now it's three options; you can kind of see the defaults that I'm testing right now there.
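Purely as an illustration of the space-delimited approach being described, the three options might look something like the sketch below in ceph.conf. The option names and values here are hypothetical, invented for this example, and are not the actual settings from the branch:

    [osd]
    # hypothetical: one space-delimited list of per-priority weights for each
    # of the kv, meta and data caches, instead of 30 separate options
    osd_memory_cache_kv_priorities   = 5 3 1
    osd_memory_cache_meta_priorities = 4 3 2
    osd_memory_cache_data_priorities = 1 1 1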
A: It's basically five-second intervals and then the number of intervals, so if you're looking at the etherpad you can kind of work the delay out there. That's just a very rough idea on my part of what might make sense here. It seems to be working pretty well: this setup means that for small cache sizes the kv cache ends up using the majority of the cache, and then at large cache sizes that all transitions over to the meta cache.
A: Instead of the kv cache, because you have meta hits happening really rapidly, and data that's in the kv cache ends up degrading, getting old, and then falling off and not being prioritized relative to the meta cache. So it kind of does the behavior we want, where for small cache values everything's in the meta cache... sorry, everything's in the kv cache, and for large cache values all the onodes end up in the meta cache instead, and then the data cache sits somewhere below all of that. So anyway, still testing going on with that, but I think it's doing what it's supposed to do, which is good, and the performance numbers right now, interestingly, I think are probably going to be higher than just using static ratios, even when the static ratios are for an aggregate cache value that's larger than what the auto-tuner has available to it. All right.
A: So that's going well. One question I proposed earlier, which I think everyone here already knows about or has responded to, was looking at malloc plus placement new for allocating memory for a struct all at once up front, rather than allocating memory for the struct and then allocating memory for, say, a char* array inside of it.
A: This all came out of RocksDB doing things that were kind of evil, so we might be able to still get some of the benefit of what they're doing without quite having to be as evil as they were. Anyway, I'm hoping to look at that a little bit more too and see if there are any applicable places we can do some of that, maybe inside BlueStore, for limiting or reducing the number of new, or I guess malloc, and free or delete calls that we make. So anyway.
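Since the malloc-plus-placement-new idea may not be obvious, here is a minimal sketch of the pattern being described. The Record type and function names are made up for illustration; this is not the actual BlueStore or RocksDB code:

    #include <cstdlib>
    #include <cstring>
    #include <new>

    // Hypothetical record type with a variable-length name stored inline.
    struct Record {
      size_t name_len;
      char   name[1];          // really name_len bytes, allocated just below
    };

    // One malloc sized for the struct plus its character data, then placement
    // new to construct the Record in that buffer, instead of two allocations
    // (one for the struct, one for a separate char* array).
    Record* make_record(const char* name) {
      size_t len = std::strlen(name) + 1;
      void*  raw = std::malloc(sizeof(Record) + len - 1);
      if (!raw) return nullptr;
      Record* r = new (raw) Record;   // placement new: no second allocation
      r->name_len = len;
      std::memcpy(r->name, name, len);
      return r;
    }

    // A single free matches the single malloc.
    void free_record(Record* r) {
      r->~Record();
      std::free(r);
    }

The point is that the struct and its character data share one allocation and one free, which is roughly the kind of thing RocksDB does internally and what is being suggested here in a less evil form.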
D: So we had this parameter called osd_max_pg_log_entries, which was there earlier, but it didn't really act like a max, because during recovery or backfill we were just extending the log to whatever was easier for log-based recovery. What my PR does is: whatever value you give this parameter, it's just going to stick to it no matter what, whether it's recovery, backfill, or a regular scenario; that's the maximum upper bound for the PG log now. Okay.
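For reference, the knobs in question are the existing osd_min_pg_log_entries and osd_max_pg_log_entries options; with the PR described above, the max is treated as a hard upper bound even during recovery or backfill, as I understand it. A ceph.conf sketch with placeholder values, not recommendations:

    [osd]
    # placeholder values for illustration only
    osd_min_pg_log_entries = 1500
    osd_max_pg_log_entries = 3000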
A: Well, that's really good; that will absolutely be helpful. And then, Josh, do you remember when we were walking through the code? I thought there was something where we were either copying these into some kind of temporary data structure, or there was something where I thought we were using extra memory we didn't necessarily need to, I think.
D: That reminds me that we did talk about the rollback information being irrelevant in one case, and I think the size of the object name also came up, because we were discussing RGW or something in particular. So these two things definitely came up, yeah.
B: But I think it would be good to take an empirical look at exactly how big these entries are and where that space is being used; sometimes looking at the code is a little bit misleading.
A
Though,
in
the
the
mempool,
just
in
some
very
very
you
know,
just
random
tests
I've
been
running
recently
well,
I
guess:
they're,
not
random.
It's
for
K,
random,
writes
and
an
RB
d
with
a
256
gig
volume
and
I
think
it's
256,
P
geez,
I'm,
1,
OS
d
I
can
verify
that
but
I
think
or
PG
log
entries
in
the
mempool
we're
using
around
400
megabytes
of
memory.
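For anyone who wants to check the same thing on their own cluster, the per-OSD mempool accounting, including the PG log pool, can be read from the OSD's admin socket; osd.0 below is just an example id:

    # dump per-pool memory accounting; the osd_pglog entry is the
    # PG log usage being discussed here
    ceph daemon osd.0 dump_mempools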
A: The other thing I have noticed in those tests is that, and I don't know if it's fragmentation or if it's more that we have spans that can't be released by TCMalloc, it's a combination of a lot of different things, I think, but when the auto-tuning code is rapidly shrinking or growing the RocksDB cache versus the onode cache in BlueStore, it seems to get worse.
A
I
think
it's
actually
true,
maybe
also
for
the
PG
log
entries
that
are
stored
in
memory,
the
the
more
kind
of
different
size
of
things
that
we
have
a
memory
and
the
more
we
kind
of
remove
those
and
add
new
ones
in
the
worse
that
seems
to
get-
and
this
is
just
kind
of
speculation
and
observation
on
my
part,
but
it
seems
like
when
we
have
lots
of
different
sized
memory
allocations.
We
can
kind
of
confuse
TC
Malik
pretty
pretty
easily.
That
sort.
A: But beyond that, mempools have trade-offs involved in allocation speed and whether you're doing lots of deletes or not, and I suspect, given how much we are creating and deleting stuff, that a mempool might be better for some things, at least if we have roughly equal-sized objects that we're allocating.
B: Well, I think, with the object name there's a fair bit of variance there. If we had an allocator that was able to accept, say, the largest size we ever expect, or maybe the 90th percentile, that kind of thing, that might be sufficient to avoid confusing TCMalloc so much, maybe.
A: Well, in any event, lots of questions here. I think, though, just based on the stuff I've been playing around with recently, it seems like focusing on this area would probably be good for us, if we can do things to help the memory allocator. You know, certainly when we switched over from SimpleMessenger to AsyncMessenger that helped dramatically, but it still seems like there's more going on, based on some of these behaviors I'm seeing where we try to keep the memory size limited.
A
It's
it's
really
interesting
to
watch
TC,
Malik
struggling
to
do
so
and
kind
of
the
amount
of
space
that
ends
up
available.
When
we
do
that,
I
suspect
that
we'll
see
both
we
could
improve,
have
the
the
memory
fragmentation
effects
and
then
also
probably
improve
performance
quite
a
bit
if
we,
if
we
kind
of
approach
this
with
kind
of
a
a
goal
of
making
things
friendly
so
anyway,
maybe
when
I
get
back,
I'll
I'll
try
to
look
at
it
a
little
bit
more,
but
that
I
suspect
that
there's
gains
that
we
could
make
there.
G: Kind of, yeah. All right: we're running into some problems with the PG overdose protection; has anybody in the Ceph community had a similar experience? We're trying to install OpenStack, which creates a bunch of different storage pools, and most of them aren't that big, but what we're seeing is that it's pretty easy to hit that limit.
G: It was the mon max PG per OSD limit, and that's causing a lot of heartburn. I'm wondering if we kind of went to the opposite extreme there: before, there was not much of a limit on how many PGs you could create, and that got us into certain kinds of problems, but you know, it's hard; you can overdo it the other way too, yeah.
B: I agree. I think we have seen other folks run into this as well, and so relatively recently, I guess, we bumped the default hard limit to be something like 600 PGs per OSD instead of 400.
G: Yeah, that's the hard limit; I agree with that. But this is the soft limit. I think the soft limit seems to assume that you're only creating, you know, one or two big pools, and I'm not sure that's true for everyone. If we could loosen that up a little bit, you might prevent a lot of people from getting frustrated.
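For reference, the soft limit being discussed is the mon_max_pg_per_osd option; the hard limit, as I understand it, is derived from it via osd_max_pg_per_osd_hard_ratio. A hedged example of loosening it, with a placeholder value rather than a recommendation:

    [global]
    # placeholder value; raise the per-OSD PG soft limit if overdose
    # protection rejects pool creation at install time
    mon_max_pg_per_osd = 300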
B: The tools could give folks good guidance there; that should probably stay with Josh.
G: You're absolutely spot-on; that was the first problem that was hit. They tried to create the pools before they created the OSDs, and that clearly didn't work. So they put in a check in ceph-ansible to say: wait until all the OSDs are up before you create the storage pools, and that helps. But even then the check may not be enough, because when I do the arithmetic, they may even have reasonable PG counts set for their pools.
G: So here's the thing: you basically have this PG calculator that recommends certain settings, right? And what I want to do is make sure that if the PG calculator is recommending things, the overdose protection limit isn't then saying no, you can't do that; that just creates a hairy situation. Like, he was trying to create a pool with 1024 PGs on twenty OSDs; I don't think that's... yeah.
G: I understand that, but I mean, so the case was 1024 PGs for this VM storage pool, and they had a few other pools that were smaller. And when you do the arithmetic, you only have 4,000 PG instances to work with, because at 200 PGs per OSD times 20 OSDs, that's 4,000; and then in the calculation it takes the 1024, multiplies it by three for replication, and so it basically starts to add up when you add all these pools together.
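Roughly the arithmetic being described, assuming 3x replication and a 200 PG-per-OSD soft limit:

    budget:   200 PGs/OSD x 20 OSDs  = 4000 PG instances
    VM pool:  1024 PGs x 3 replicas  = 3072 PG instances
    leftover: 4000 - 3072            =  928 instances for all the other pools

so once the other OpenStack pools, also multiplied by their replica count, are added in, the total goes past the limit and pool creation is refused.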
A: I do want to draw attention to the fact, though, that this is going to increase memory usage on the OSD as soon as you write to any of these pools, right? Unless I'm quite mistaken. (It will, yeah.) And it's not insignificant; it's enough that, if we want to keep the OSDs to a certain memory ratio or memory limit, we might be blowing all of our available memory on the PG log and not on other caches if we increase this dramatically.
G: Well, I thought that was sort of a different but related issue, which is that there's the PG log max and min, or whatever they're called, and the difference between them was causing it to consume a lot of memory; and there might be other factors as well, like what Neha is working on. But this is a much more urgent thing, because you can't even get the cluster running; you can't even get to the memory issues, yeah.
A: So the two primary reasons I can think of, and add to them or correct me if I'm wrong, that we recommend so many PGs per OSD are, one, the random distribution nature of it, and two, the locking behavior inside the OSD for the number of PGs that you have. The first one, I think, hopefully Sage's PG balancing code helps with dramatically; the locking behavior, no. But I wonder if we really need as many PGs per OSD now as we used to.
B: Yeah, potentially. I think the main thing there is the distribution, and it's tough with the smaller pools at least; I guess they don't matter as much as the I/O-intensive ones, but...
B: It's certainly balancing things in terms of data usage right now, but I... oh, okay.
A: They are, yeah. I couldn't remember if it was Luminous or Mimic that it finally merged for, but it's there. I don't know how to even enable it or use it, but it's there, yeah.
A: You can have locking issues potentially if you're on really fast devices, so keep that in mind. Can you explain the locking a little bit? So there's a PG lock: if you have fewer PGs, and you do a wall-clock profile on the OSD, you might actually see contention for that lock, especially if you have really, really fast devices like NVMe. So you just have to be careful to watch for that and see.
F: That applies to a situation where you have more than one worker thread per OSD work queue shard, I would add. However, that is our default configuration anyway: we have shards with two worker threads per shard, so the lower the number of PGs you have, the more likely you will have a problem with PG lock contention.
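The sharding being referred to is controlled by OSD options along these lines; the values shown are placeholders for illustration, not a statement of the shipped defaults:

    [osd]
    # work-queue shards and worker threads per shard; with only a few PGs
    # per OSD, multiple threads are more likely to contend on one PG's lock
    osd_op_num_shards = 8
    osd_op_num_threads_per_shard = 2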
A: You're kind of effectively reducing the length of that by shrinking your number of PGs, and reducing your average contention, right? Exactly; all these things tie into each other, and this is one of the things that's complicated for the user: by tweaking certain things you affect other things.