Ceph Performance Weekly, 8 Jun 2017

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: 2017-JUN-08 :: Ceph Performance Weekly

Description

Weekly collaboration call of all community members working on Ceph performance.

http://ceph.com/performance

For full notes and video recording archive visit:

http://pad.ceph.com/p/performance_weekly

A

Good morning, guys.

B

Good morning,.

C

D

The groom, like might not be what we tried to talk on.

E

It I got a mute and to none please yeah.

D

You're not you're good.

E

All right we have CDM yesterday talked about a bunch of utility, ops, stuff and already cash and came up so start packing, like there's a pull request, um not much new. Let's see, there's a something about the scrub job. Priority I've looked that one yet looks like Greg read it um and Mark squeaking I'm sumus, yet freaking the throttle cost for hard disks yeah. It's.

D

Not really, you see.

E

That looks good, though. Let's approve that and we can just make it.

E

Be for the other ones, um let's see like entry, oh yeah, those yeah, the one that I think is proposed. Dick svet innocently care. If that merged the thing that dad did that he was going to clean up subside. So there's that there's a there's a new command for the monitor that will just show you the most recent blog entries in the cluster log and there's a related map of recent entries. That's used to prevent, duplicates when demons resubmit their stuff. So let's find a sea of voiding overlapping extension Rashard and that merged reviewed.

E

You read that dynamic, restarting merged yeah, that's like the one of the biggest most important new things to come to are saving a long time. Basically, you have to worry about buckets. Isis emerged of you anymore. It's just going to make it big, it's going to split them and if they get small again, it's going to merge them. So one less thing for users, not race to think about.

E

Let's see I think this amid transactions mission. Frankly batches but I can actually close this and he had a different approach. Azusa, yes, close yeah I have different parties. Looking at the status, fest thing emerged that this morning, I just forgot to merge it after the rebase happened in my memory that was merged yep and the unshare blobs was merged, yet there's a slowly buggin net. So that's that's a pretty good I, don't think, there's anything major that that's standing on the performance front.

E

That is super high priority for luminous and there's the PowerPC SAS zero thing, which sounds great but I, don't really know anything about PowerPC or how to optimize it so deferring to to you, there see.

F

It doesn't need to be improved.

E

You know just let's see if it's as anybody reviewed it she's.

B

Going to reviews it sin Rose.

E

Yeah I mean it needs to go through. The key I see just make sure, there's no like build and regressions or whatever so I tagged it as such and peer reviewed into proof. But are you here.

E

Yep, taking are you yep, okay,.

B

B

In it, I think fine.

E

If you want, you can add your approved by on there. That just means I'll put the thing in that to commit. You know, get credit, I, guess button, yes, kind of thing, self-born, a security make sure it builds and I assume it does speed things up for you, Justin yeah,.

C

E

Okay, all right sounds good any other one. Is this up commit app applied thing I, think word, yeah I think we just need to figure out what path to take there. I don't think this is super critical. It's not going to go into a luminous I, don't think so, but yeah Josh, I think I just asked for your opinion on direction. There and I can't remember what it was and it was a video yeah I think we can wait on that. One I'm!

E

Sorry, it's the only the one that sort of outstanding and interesting is the one 206 to garishly of based on void bags work that does all the commit synchronously in kc, f, NV, gems or something I haven't rebased that yet I think I want to wait until after luminous to to revisit that and wait until they have a little bit more know a little more about what what we should be targeting there.

E

um Maybe we should get marks on and B dams, so you can play with it sure: okay, no complaints there um yeah I, think all of other stuff is so not not super important mark mark sometime Dana, my guess but yep. That's it from the care front.

D

Iron, it I've got some discussion topics here, but maybe I'll open it up. If anyone else wants to speak first, anyone from the community or anyone else have anything they want to bring up this this week,.

E

D

Alright, so first blue star, so I I think I'm going to keep harping on this until until we figure out this way, which means that you might be starkman's for a while, but metadata metadata, metadata metadata. That is the big thing right now.

D

So, if we don't have cache misses for the Oh notes, we can do pretty good I'm being me and those he can do about 38,000 right, I, apps, maybe even better, depending on festers cookies, are then how that's convenient, but once we miss hit, cache misses for the Oh notes and extents and everything else that we've got. It slows down pretty dramatically, and this is lung, the order of like maybe 10,000 right, apps into 38,000.

D

So it's a big deal.

D

If we can somehow yeah yeah, that's exactly configure a bigger, oh no, somewhere there I think I. Even it said that today, higher Depot cache size rather see oh yeah.

E

Oh yeah yeah yeah, so.

D

I, have you got it right there, um so yeah? The the questions here are: okay, if cache misses, are this expensive for us, I mean yeah. The the immediate solution right is helping even Ono's cache. Can we make the default for SSDs and maybe we need a bigger one, but especially.

E

Good stuff right there, um the default cache size that we set for buspar, is one gigabyte and I sort of just sort of picked that, because that's sort of similar to what um a lot of people that feels like that should be within the bound of what people have deployed on hardware, but I think it technically are recommendations which should be taken with a grain of salt um were one gigabyte per terabyte of disks, which means that somebody has like a six terabyte disk or four terabyte disk, and it should be or gigabytes per OS T over him anyway.

G

E

Wonder if I we should just set our default at two gigs I, don't know what do you guys think what should the default? Oh.

G

Well, it's quite a sleepy from my instigations. It took about one geek pair.

G

128 geek of overall space and that so everything in RAM yeah and that's for the case where, when check sums, are off okay, I guess.

D

I can keep everything with it gigs for 512 gigabytes. If you give us a ramp with either in particular, yeah I think.

E

I wasn't a problem: I can happen right, I mean. If walk, we have to increase our cache size it and you know 20 Meg's, then obviously performance is going to be horrible and meta natives can read a problem. So it's at some point. There is there's just some cost here and need to figure out I.

G

E

It just seems like there's two ways to approach it: one is to figure out what what sort of a reasonable amount of RAM is that we can expect users with current and future hardware to do those two O's DS we base it on that and the other is to try to figure out. The sort of the curve is and figure out where quite close to the knee is on the curve and then.

G

And I can see, I will prefer the first one is to try to reduce or not yeah yeah as well, and the second one is try to optimize. The read performance from frogs. Defecation maybe indicate what's the bottleneck and things like that, yeah.

D

Yeah we can I mean if you, if you worthwhile, looking at how much the the encoded owner sizes on average our verses, we, the unencoded, then also be compressed encoded in laxity of. If those are talking the compressed cash, whether or not we gain anything over just the internet. You know everything is going to be a trade off, but maybe maybe one or both of those are worth it to save. On space I.

E

Think that the other, the other one that we should remember, is that there's actually a Roxy cache that is also super arbitrarily sized I can't member what it set at footage. Yes,.

D

That's what I mean yeah.

E

Okay, so we're showing that yeah.

D

It might be that that's better than the oh, no cash shot to do a store side. They you know, maybe maybe keeping most of this encoded is better or maybe even keeping the most of encoded and compressed is better, which is the Yurok speaks. You get the two caches who's going to be onion uncompressed in the compressed cache you.

E

Know what maybe, what we should do actually also is right. Now you have like recast settings, multiple casting so set well, one of them is the blue star cast size, which is the memory that blue stars managing one of them is the rock City cast size, which is a separate option and then there's another one that is I, think embedded in the rocks to the option string, which is super awkward. So maybe what we actually want is to make blue store control.

E

The rx-3 cast size as a and configure it as a fraction of the overall blue store cash yeah and then, and then we in blue, so we can decide that the best way to spend my memory is, you know, take your percent or actually, if you percent me or whatever it is.

D

We do we have any kind of we keep track of um how big and then coded instead of metadata is relative to be in memory size of it. Please do track that anymore. Okay,.

E

D

That's one way to start making like live decisions on this right. We don't get any benefit from a much benefit by encoding if you're, better off just keeping stuff in the only resource on the dash.

E

Yeah, what so I think I think it's it's more complicated than that, because there's the cost of encoding and decoding it, and so, even if smaller, whatever it might still make sense to keep some ratio whatever set I, think I think what we really need to do is actually test it.

E

So do do some runs where we fix the blue star, cast size and then bury the rest of the cast size and figure out or even better and then hopefully like what we really want is maybe I can I can do this change first, where we configure in terms of ratio both like with the specific memory budget. Is that memory better spent in our XP cash or and just do some a/b tests? Let me happen to see which is which does better yeah. We can it.

D

Depends entirely on how overload your CDs I was enough that encoding Homer set I'm coding her to day yeah.

E

D

If you have lots of CPU to burn, then it might be almost free, but if you have like tons of nvme drives on one know, there's something and- and you just cpu buns everywhere then helymus get that so.

E

We'll have to pick up that you could just sort of a middle ground. There.

E

E

The OpenOffice, yes, so by the way, I think some simplifying this config. It's like a good step.

D

E

D

E

To it back to my original question no time can we just take a quick poll at the people in this call. What a reasonable amount of memory by default for a no steve process to consume is.

E

Like right now, the blue star cache size is one gigabyte, which means that the OSD footprint is probably somewhere around two gigabytes total, without once you get on all the other stuff.

E

Is that a reasonable default should be four.

B

So gigabytes sounds we can I do for me out of it.

D

B

D

H

D

Their thinking about, like nvme or SD back to SDS versus like hard drives the main concept to supper. Two evils for us: I.

B

F

G

Would you submit.

F

A

These gems are for us are we. We would like to use the more memory to get a more performance so far on our server. We pretty much have 119 a 100 to 80 gig of memory for each it server, so we can consume as up to 10 gigabyte each OS, because I don't have a 12 or D for each server and there we can consume up more than cause unintended, app or OST, but I'd. Rather, we could at good America's good performance, yep.

E

D

The sage I wonder: can we can you roll some of the other things that we dedicate memory for into this tune, not just the risk of cash and me the rock CD cache that we get rid of some of the other tools that we have regarding and buffer sizes and other random things, and can just give you one number that you designed for your speaker, members, the.

E

Only other knob, well the only other main thing that the OCC- oh they're, okay, they're, only they're only handful about their sort of pools of memory that the OSD uses that are tracked. um One is the PG log, that's the bulk of it, and we do them in pool that. So we could base that on memory usage and the other one is the throttle theme of the amount of memory we allow to be read off the wire before we process it yeah, but that's kind of a sixth thing and it just it's not.

E

It just needs to be big enough to like make sure that we don't starve the OC of work, but no bigger time that unless I don't think that's something that is going to be tuned much in practice.

E

I'm not sure about I can't decide this, because if you have ten gigs of RAM, for example, data, it gives your OC. That doesn't mean that you want to dedicate more memory to logs like they're sort of diminishing returns there. So I'm kind of inclined to keep that one separate. But.

D

Is it something that we ever see a user needing to tweak those valleys up really high, though like the logs and.

E

D

Of you increase.

E

It I think I'm decrease it to reduce memory, but never increase it, so it just means that you'll do. If you have OC, is down the amount of time it can be down is shorter before it falls in the back, though, but like it's a rare event and they're like it's difficult to, like balance the cost of like memory overhead versus time down before you have to do a more expensive recovery operation, it's like super difficult to make a judgement anyway. So they're not too concerned about it. You.

D

Know it feels to me almost like that's one of those things where yeah. If you have little memory, you want to be down, but you have lots of number. Maybe you very slowly cooked up, but it's not even a decision the user needs to make, though right I mean if they have lots member you can do that. Imax is what we did have them, set it to something yeah.

E

D

If you dedicate only one gigabytes of the OSD to use, then ok, well, let's push everything down, see all.

E

Right Paul, oh.

E

Are you dad, let's see, is um Josh sorry on the call skip out on that replacing the blue star cache size with like an OST memory or basing on an OST memory.

H

Could be Roble I think it's a long term. It would be great to have a single suitable for Oh, Caesar, Marine general, um but in quickly said, there's much things to doesn't the place. That's work before that works well, yeah, okay and the FBI legate.

F

H

Weak so applying clever ones for like the PG log stuff that just be a constant instead of being epicenter. That.

E

And we don't account for all those to memory usage either, so it wouldn't work very well yeah, exactly okay,.

H

That's maybe that's the.

E

Quandary voltar's yeah, oh yeah, yeah, yeah, all right all.

F

Sorry, yeah yep.

E

F

It is listen I, just wondering like. Is it possible to like keep track of free memory in the system run accordingly, I guess the value. This is Wondrich that.

E

Is a good question, I think it's technically possible, although it's a little bit awkward because um there's all kinds of possibilities like feedback, loops and noise and whatever it's not a very precise thing, but you could sort of try but I'm worried about is whether that's a good idea. um The Bennett people come any input or opinions from anything on that the kernel can sort of do it because it owns the memory system and it can reclaim any memory it needs to if it needs to for any other purpose.

E

So it just uses all memory for cash by default, but for users to pace processes. Do that that's dangerous, because the kernel, if they'll, react quickly enough and they're gonna all start to swap them out instead of just doing a reclaim, just sort of a problem reminds me, I seem to remember I, don't know: did anybody have any thoughts on that.

B

Baidu adaptable store, organizes 0x10 in page cache. It's in its own page, guys.

E

The blue store has its own cache. It doesn't use the kernel page, cache, yeah I, don't know if it optimizes zero extense. So it does not know I, don't know how common that is that it doesn't. It.

E

There was a talk of Colonel of the curl for writing a feedback mechanism so that userspace processes can respond to out of memory.

E

But is not know how no I don't know if anything like this is usable. That might be a bad Fork, but even then I'd like, if you have 2000 students on the same host, they need to cooperate somehow also so that they only yeah have to be very careful.

E

It would be nice to just have them magically use as much memory as they need and trim it as necessary, but there are a lot of things that can go wrong like even if givenness they realize there's memory pressure and they try to free bunch of memory. If their heap is fragmented, then they won't actually be able to release that memory back to the kernel. So.

E

E

Imperfect situation.

F

Okay yeah, but that is not the goal. An ideology houses believe in stuff. The two we should yeah.

H

It seems like something that the foaming tools could to a good tune when they're, installing, like excess keys on server, become green, hasn't, yeah, conservative and portion of thos d. You.

D

Have to be really careful without, though, if you're talking about like some kind of hyper-converged pollution, which seems to be the hot.

H

Thing sure constrain memory, so therefore there to have a good idea of a delimited. You.

D

Know um maybe that I don't actually don't agree with that. It seems like a lot of times in hyper-converged scenarios, people have no idea how much memory applications are going to use like at peak right, and so this is kind of a problem for us now, especially with having static memory settings rather than just using page cache.

E

So this is a related question, not go into blue store, but we currently have this behavior incest, where, when a PG is degraded, we use more memory, and this is this is the primary reason why there is this.

E

But wouldn't we put this idea out there that stuff posties use more memory when you're doing recovery? It's not because it's hard it's because we deliberately make the PG log longer so that we can tolerate a longer post ease being down for longer. So it was written that way as a feature, but it's sort of mostly I think people interpret it as a bug and something that makes it difficult to do. Capacity planning I, wonder if we should just stop doing that.

E

Just fix the size of the PG along so that I, remember a to put footprint is fixed and doesn't increase or decrease with recovery.

H

Answer the King's birth justing out and the necessary, maybe even adding some and on disk creases in memory semantics or you keep longer one on disks well, but might not even be necessary.

F

But what do we visit.

H

F

E

It's if the PG log is shorter than if there's no see that's down for currently. If, as OC goes down for like a few minutes, while it reboots and comes back up, the PG long will be long enough that we know exactly which objects were updated, and so it can do a very fast recovery.

E

But once you go past the end of the log and you don't know which objects have been touched, and so you have to do a scan of the data on the OSD in order to bring it resync it and bring it back in. So it just means at that time window where a short down period can be recovered quickly. Shorter.

D

We have intend of knowledge from the field about how often recovery events involve short periods of time like that versus ones where we flow past anyway.

D

Now that's Ken. What we need to know if this is even remotely useful looking at and then actually well.

E

I think there are two questions like what should the length of the log be in general and how does that translate to like minutes of downtime?

E

That's sort of question, one which lets you choose, what develops you'd be and then question two is: should that value be artificially increased or expanded to like doubles or whatever in a degraded scenario to like to make bring those aussies back in more quickly? The.

D

In terms of downtown business can depend on the characteristics of you this and like seventy basic regions. It.

E

Depends on how well your I/o rate is yeah. That's another thing.

G

E

Is that if you have a pretty idle pool like we're, keeping all this memory and your windows very long or if you have two different pools, the log length of logs are the same, but the recovery window for each is different. But.

D

Yeah I, don't think we can really answer the question. Okay, I mean all we can do is look at their profile. We can't even really do this that. How we maybe could do is look at what really happens in practice with people yeah.

D

E

F

I

Like it really just.

F

I

I'll just say it seems like really just interested in sort of the CI op and VI ops with the node performs. And how long do you want? How many seconds you wanted to last so I mean it's the default. One is what is it a thousand for PG and they assert you could use it 10 times. 30 does not, which for a hard drive, is what five minutes so yeah.

E

D

E

I

E

Interest, but what.

I

We happen also, what isn't actually enough for a reboot. Maybe we just like double to do that. Actually,.

E

I'm sick of that guy, 10,000.

I

E

Not a thousand so I think we're well.

I

Well, 10,000, which would be great and one I, think a thousand is the non-degradable one. Unless or maybe those got turned down.

E

I'm ii be right, it's three thousand and ten thousand are the she thresholds currently.

E

I

Is like, like my.

E

F

I

F

Makes any other so.

I

Maybe we make an another rotational media thing and just stick the harddrive ones at 3,000.

H

Okay windows are also /, PG change so, and we have like, we have more than one PG bra guys, just put like even with marked end in that.

E

Well, okay, so here's a here's, a question: we could. We could switch over hip there bunch of questions. One is um we could change the PT log trimming so that, instead of having a fixed length of log / PG, we have a minimum that we keep just because for like do pop detection, but then we instead set a. We did. The trimming based on whatever the the oldest is across all PG s, so yeah, mostly idle pull in a very active pool.

E

The very active pool gets a long log and the idle pool gets a short log. So that means that our memory would be efficiently utilized right.

H

E

Got to be the first lady and then the second thing is maybe this should be instead of configuring, the number of entries we could instead configure the amount of memory which is now impossible, because it's actually a mental and things like EC pools have much fatter log entries because they have all this other metadata for roll back in there or third.

E

Maybe we should actually configure this as a time window or at least report a time window so that you can actually ask the OSD, and it will tell you the current log horizon is 15 minutes or something like that.

D

Can we can we within an OSD? Can we steal memory from other things now that we have mumbles? Can we say you know evict all cash, you know, don't don't do anything else. Just give everything to the TV log.

E

Anything's possible that that would be bought to work now. Yeah.

D

There would be food would be I, think.

E

Having that type of thing would make sense, once you get into the world where he set a memory budget for the total OSD, and then it has some smarts that figures out allocate it until we get to that point. Yeah.

D

Look at that because it isn't even a goal to shoot for I. Guess is why I'm asking.

H

How useful is in practice, yeah it.

E

Feels like so that the window PT isn't degraded. The only purpose of the PG log is for the du pape detection. So.

H

With with the branch that Eric's finishing up now and those should be stored in much smaller mass memory and then.

E

I totally sad for any structure.

H

That we call a service.

E

From the PG log entries, in which case we could make the PG log almost empty in the case, where we're not degraded, yeah, yeah and reclaim a bunch of memory that way, but it does mean that there is a the memory usage will balloon when you have degraded PGs, so I think as long as it. That makes me think that we should configure the amount of memory that we devote to that in terms of megabytes, so that there's a setting that says that the recovery overhead for logs is 400 megabytes or whatever, whatever it is.

E

So then them the administrator knows what what Headroom may have to plan for.

H

E

Know it said maybe.

H

Move where you understand well, it's.

E

Probably more interesting, it might be more like useful to configure, say: I actually want a 15-minute window and, however much armor that takes. But if you can't really plan around that because we don't know how much that's going to be so.

D

So in terms of capacity planning with potential customers and say, here's the static value per list- B plus, you know, however much per per gigabyte of disk policy. They have. What do we do with this I.

E

Don't think it's related to gigabyte of disk, it's actually no.

D

No they're saying that's part of the static value right. We've got like a static value that you have to have. Perl esteemed and I recommend this in advance with you right now because I for me, unless someone would be very happy, Thanks yep.

E

And they and the in practice it's there's a bunch of a couple things that go into that static value. It's like it's that if you like overhead and then it's also the the buffering incoming messages and yeah so on, because.

D

That I think it's great. If we do it like that, we can. We can even get rid of those as two bolts right, if you just say how.

G

D

Memory, yes, this is what you have to have plus then add this in, for your your caching to be effective. Yes,.

E

I, like that, so yeah so I think we should. We should replace the TG min log max log fields with a amount of memory where we just say the mempool PG augment pool it's just much memory. If.

I

We're actually going to go to dynamic stuff like that I think we probably would rather steal it from the from the blue store cache, rather than having a change. What the administrators get allocating.

E

So that when there's liners.

I

E

We we reclaim memory from yeah yeah.

I

I think that was a concept value. It's a lot easier like we don't have to talk the administrators into reserving for hundreds of free RAM or whatever they're, never going to get that right and that.

E

And it's actually the moat the better way to think about that is that when you're not to create it, then blue Sorkin borrow the memory, because that's the common case.

H

E

D

This, what we measure this ones can't thinking of Sony and yeah.

E

Yeah but it the more I think about it, the more that makes sense. Okay,.

E

Okay, well, let me I think about Baltimore how that wouldn't, but it maybe maybe the way to think about it- is that we allocate a budget for PG log memory, but then have a mechanism to loan it to blue store when it's not being used.

E

And how we're not sort of explicitly I'm worried about going to the tunable for the total amount of memory do st uses and as like the one tunable, because I'm not sure we're canning for enough of our memory usage to like actually respect it. I wish I yet.

D

You have pieces and post that we can do it. I mean if they're stuff out there that we're really worried about that. We can't account for easily I think.

E

We did I think we need to evaluate that experimentally I just had it looking at production clusters and comparing the mempool with the actual RSS and figuring out how well we're doing and.

D

A big part is like teasing mouth right and like foreign cash and other things that our hard press, the ed plan for yep.

E

District cash and there's just keep overhead like, even if we account for everything, there's like we're going to pay like 20 30, whatever how many percent, just because it fragments that happens over time, it's dependent on access patterns. It's like super hard to account for yeah I.

E

Feel reasonably good about the blue store cash accounting, because I've run an OST under load with a configured cache size and seen the RSS.

E

You know stay at a fixed size, but there's a there's, a Delta between the actual RSS and what the blue star cast eyes was I'm, not sure that everything else in between there sort of I can it for well. Okay, well, anyway, that gives me a couple things. So one is the blue star box, we cash ratio, one is the borrow memory from PG log and configure like in terms of ramp. Does that seem like a reasonable approach to you guys, question Greg.

H

Yes, yeah I think in general sense like these bees look rich Angus long as it's not too complex. It seems like something that we could do easily. Okay, oh.

E

D

In sage I can I came at the same time. Try to see if I can look at how rocks TVs, both compressed cache and uncompressed cache and the owner of cache all kinds of battle, or she shut her or not. Some work with each other will say. Okay, we do this alright, but buried in all. This I've got this little elm deeper proposal right discussed at CBM yesterday, but they'll have to go over it. Just this.

D

Anyone is interested in trying to resurrection, shins or PR for MDB and see if I can make it make it work and maybe give it another try. I, never I, never really got the impression that we did a real evaluation of why it was slow for rice and kind of meat, see how how bad it really is. It may be this totally worthless, but we'll see the only other thing I'd get on here was I had the white right back rattle settings, but this has come up again: I figured.

D

We should probably ash it a little but I hate this discussion because I feel like there's. It's a just a giant spectrum of it where, yes, we can, we can make it we can make it have more things that are shorter. Oh, we can fewer sins that are longer no advantages and disadvantages to both and we go back and forth on the spectrum mode. Well, maybe, if I do this, or maybe starter, did that and it always just kind of sucks- that's kind of the conclusion that comes in Jim yeah.

E

I think the only I figure it kind of doesn't matter so much where we end up here, because you're sort of on the spectrum between batch throughput versus, like um you know, tail Layton sees and burstiness whatever having more consistent performance. So it's not like there's a wrong answer yeah, but these numbers just feel huge for hard disk um like the hard limit of 5,000 names you journaling and things like an order of magnitude. ah You know I prefer a disc with this. Every.

D

Time that we were visited making them smaller, it seems like what happens. Is we test it out and it's for work? And then we read back off because we're like okay, you know it slows us down, maybe, and maybe maybe there's something unreasonable that we're doing that's the reason why it slows us down, but we are like okay, we can't take the 10% performance hit or whatever it's not.

D

You know it's too big people are going to be pissed, and then we don't so, and we can it's just every time we've tried to do this in the past. It seems like it. It gives us the corner tip that they were not willing to take where that products not will instead.

E

So yeah and I, don't I, don't know, I mean I kind of lean towards whatever like generates go. Beast support some users, some customers.

E

D

Do we with the current values you can know that there were no, but what we see is ridiculously high. Do people run into issues with those be dying because the things are taking too long? Is that actually a real problem that people run into.

D

It hasn't been something I've heard people complaining about, but I I could be wrong. I, don't I, don't have a whole lot of insight into customer problems. We face Josh Jo.

E

A sense of like a brisket.

H

We heard okay, not really this time. Oh I, don't.

I

Have numbers and I could definitely happen.

I

It's conceivable that we've added enough other throttles since the last time that it's no longer an issue, but only a couple years ago, having pretty frequently and I think it still happened. Periodically.

D

If, if we could even just get, if those happen, if someone someone could explain them, so we can actually get a sense of what it is and feel a lot better about justifying the performance hit to do the people that yell at me. Whenever we see you know some problems, baby.

E

We can ask negative: it gets to.

E

What he's seen recently since.

H

He hears all of these answers every Muslim. It sounds like a reasonable way to gather data I.

I

Mean so many are doing, the testing did. Did you actually like run out on hard drives and turn you down? Some 5000 slowed you down or just that's the steep thing. There's.

D

Nothing yes, both and no several times over the last like six years, I mean it comes up, maybe every like year or two, but let's look at our back paddle settings again and it's for whatever reason the setting then really really high seems to be the thing that that uh the helps is not just kind of high but but kind of having no oh yeah. I know higher was not even effective right. Yes,.

I

When they're not effective, then you know you just sort of pile right IOT's in as long as your benchmark. That's just short enough. It said that I'll get swept up then you're great, but.

D

We do a we look like a five-second window. Let you sink on. Oh.

I

It's much longer than that. It.

A

I

Dumb yeah, because it's a like the exit, that's delayed right back stuff when it has to push them out of it's on this journal and into and into the actual directory structure, is insane like wait. I.

D

That's 30 seconds: isn't it.

I

From it's on to sternal into it directories, I have no idea, I think it's when the journal gets full but I think.

D

I thought for some reason there was a certain second in the dinette that I might be wrong. um That.

I

Sounds like a dirty memory thing to me: maybe not a knotted search. This I.

D

Guess what what comes back to you, for me, then, is is okay is it? Is it actually causing is what we have now causing problems in practice, because we could? We could spend a lot of time on this I'll, sir right I mean yeah.

E

I, don't want to spend a lot of time by sparking yeah.

I

Yeah can't we just like sweating, that's a few HR guy ones and set the hard drug ones where we think it should be by running some Mac and math like. If that causes problems, then people will tell us.

E

How do you want to support this? One.

I

By doing math and saying okay does 100 I out, got it yeah, okay,.

E

Yeah I think we can do that I'm, so well, I'd, say ping, be cat and, what's just charity says, and so we get some sense of the pain but I think even if we don't make a dramatic change there, we might as well just do the work and split them out and we can adjust that. Let us usually adjust up and down going forward. I.

D

Mean argue, arguably, we have known in WAV files are now that we're pushing blue stores the future and things and still use this. We could just just file store down and be like that. You know slower, Barry, okay, yeah.

E

They're still every deployed cluster out there they're going to use it for a while, and conservative users won't use it for a while. So it's not that we can't totally not care well.

D

I mean we could we could make this, we could make file store more conservative. Now, since we have, you know there are some tens that we could just we could just make it slower and be like yeah well as to playing it slow. So don't look behind the curtain is slow and it's ain't. So true, that's true.

H

All right mind interprets what paging is to a few from big see whether it is actually causing any issues right now or recently, yeah.

E

I guess the it's slow car game. It cuts both ways right.

E

Yes, sirree says.

H

E

Anything else you brought my rap a.

C

Taejun Marlon a and have a couple of questions. The first one is related to that Glossner issue that we were talking about. I was wondering: how does it compare to Westeros its didn't improvement of transport or disintegrate fast henceforth, which.

E

Which issue at.

C

The one rated to rock steady cash when the amount of metadata becomes crowd.

E

My understanding is that we're still consistently better than file store for hard disks and certain normal SSDs. When you start using fast in Viennese, then we're pretty close to.

F

E

Saw number whatever booster was actually slightly slower, the main problem. There is just that the the nvme is no longer the slow thing. It's not. The bottleneck at all. Their bottom onyx are elsewhere in the sack and pulsars doing a bit more work then, and all right, fostering.

D

Excellence but.

E

I'm not sure where referral real and now but.

D

Adding on to that, from what I've seen recently when notes are in cash, blue store is pretty minute. It can be fostered now for branded nights, because we did the KB synthetic splitting that's what I was a big benefit, but when you're out of cash, but if you're significantly out of cash, then then we're probably slower.

D

E

So I think this is it feels to me like the takeaway here is that if you're going to get performance out of in dream, you need to make sure to vote plenty of react. What yeah.

C

I got it thing. Did you know the question that I have is that I benchmarked before this a very same grade level? I? Have some interesting results, I'm going to send them out now, just because I have it, we are completed the benchmarks, one of the. What is missing is that I want to change the SDN list or an ICD vector. Yes, just see what so, it's not that straightforward, like it's, not just changing the declaration of the code, so I was wondering someone.

I

C

For that, so I can read it or do it. We've.

E

We talked about it several times in the past and looked at it I think. Actually, the last time we talked about this, the proposal wasn't to push to a vector per se, but to switch to a Allen.

E

Allen Samuel evolve, acted versions of the STL containers that had a custom allocator that was statically in line with the declaration, so it would be like, in with the list declaration, there's an inline buffer of like five entries or something so because the shortlist it would be stored in line, and if it was long, then it would fall back to a traditional list.

E

I think that was the most. That was the proposal we talked about before, but it might just be that with a straight up vector, it's simple and easy: that's what also worked too, but it would be nice too. The nice thing about the the container list type was that it was his container types was that they were drop-in replacements for the assailant, so they would still support things like swap, swap and splice and everything they would fall back to order and if they were in line, but in was always like three, so it matter.

E

That make sense absolutely.

C

E

Okay, there's still a full request: that's dnm from with his original prototype for those types and it meets, needs love before it can sort of be resurrected, but I'm still hopeful. I still think that that's probably a good path forward, because once we have those types in place, we could sprinkle them all over the good base in places where we expect the containers to be small and avoid any allocation, because everywhere we're doing like sets of those T's.

E

And you know in the and do SD all those for lists could just avoid allocation completely, because they're always less than ten.

E

But does all that set I'm very interested in hearing I'm? What your? What your, what your benchmark, showed cuz that'll that'll tell us what direction to go and I think.

E

That's your um anybody else.

E

All right thanks, everyone all right, a good Thursday u2's.

D

E