From YouTube: Ceph Performance Meeting 2021-05-06
A
Good morning. So the core meeting looks like it's going to go over again; when I left they were maybe about two thirds of the way through the people they had to go through. So I'll wait just a little bit here before we get started, but I suspect people will be joining late.
A
Ah, Doctor, have you... oh excellent, excellent, yeah? I hadn't seen you in the meetings before. Are you a community member, Randy?
B
I work for Red Hat. Do you... okay, excellent. Absolutely, I'm in CEE; I support SBR-Ceph and SBR-OCS, as well as all the cross-collaboration for Ceph integrations. Nice, nice. You know Michael Kidd, or, yeah, Joe Quinn? Same team.
A
Yep, my peers. Excellent, yeah, yeah. Michael, I'm pretty sure, was actually there back in the Inktank days. That's right, so yeah.
A
Well, very good, very good. Maybe while we wait for people: I'm always interested in hearing how things are going on that side. What kind of stuff are you guys running into these days?
B
Well, I spend the majority of my day just trying to stay afloat. I'm one of the only folks on the team that can take on, you know, the L3-type OCS cases. Most of the time it's OCP-type issues, or just integration, or it's always something that's got a lot of time spent on it, and it's mostly, you know, OpenShift, and for that reason I normally get handed those cases. So that keeps me pretty busy. IBM Cloud Paks is an abuser of our SLAs.
A
Do you deal with very many performance issues on that side?
B
Well, yeah, actually, now that I think about it, we've got a lot of folks who are just throwing anything and everything at OCS, and it still has three devices.
B
You know, concepts of tiering, and understanding what will or will not get reconciled by the operator, which objects are, you know, stateful, and which have these specific static settings, and whether it's appropriate, or you would reach a scenario when it's appropriate, to turn them off. You know, scale them down to zero and say: don't do anything, I know what I'm doing, do what you've got to do from, like, a Ceph admin perspective, and then flip them back on and see what happens.
A
Semi-recently, it's been a couple months now, but we kind of heard about people that wanted to run etcd on top of our RBD devices, right? And that's a nasty use case, because it's, like, small, unaligned, synchronous, sequential writes, yeah.
B
It's the old "let's just have a virtualized control plane": put Satellite, Director, CloudForms and everything on RHV, three nodes backed by Gluster, which just so happens to also be running the mons and the whole virtualized control plane for the stack. Gluster hates that tiny I/O, so the minute you start a CDN sync, that would bring Gluster down. I mean, it's the same fight we're always fighting, and I think now, with local storage...
B
Having that native association with the host name where a pod can get scheduled is going to be very helpful for us to say: hey, it's not your root device anymore, let's patch local storage in via an NVMe that's on your master, because it should have had that to begin with anyway. And that was the story that we were kind of telling.
B
When me and Michael Hackett synced up, it was: we didn't do ourselves any favors by saying we support this IPI installation, because all that did was bring the problem back down to Ceph backed by spinners. You know, what did the client expect? So ephemeral, or local, on a compute node with SSDs would still be better.
B
If you trust the HA aspect that etcd brings to the table and you're on top of it, if you want to back it by RBD because you say, hey, it's going to work: it will work. But, you know, ramp up a workload, do a solid update, have it go crazy, you know, OLM kick in, and it's not going to be happy. You know, the image registry takes a fair amount of I/O; there's a bunch of stuff that happens, you know, with just OLM in general.
A
Interesting, interesting. One thing that I've been talking to Intel about lately, and I haven't seen them on the call recently, but they've got something called Open CAS. It used to be this proprietary caching layer that they made, but now they've open-sourced it, so it's kind of competitive with dm-cache, sort of. But they've been doing a lot of benchmarking lately, looking at comparing those, and they really want to have something that can sit on the client side.
B
I'll have to scrub for a BZ, but we had an instance where we had someone trying to do that on the LVM layer, okay, as the front end. The whole "we don't have enough solid state, but the reality is, how much solid state do we need? Let's see if this works." It was a topic that was briefly discussed, and I think it was shot back down, in terms of: we're not going to move in that direction or support that model.
B
We tried to weigh the value of the benefit against the time invested, with the initial results that came out of it. I don't remember what Bugzilla that was, do you remember?
B
It was the LVM cache that they were using, I remember; it was specific to LVM, and it was an LVM cache thing. But I can't remember what the Ceph side was actually going to consume, you know, how it was going to get done; whether it was outside of our purview, native to LVM, or something else. Let me see if I can find it.
B
Throwing random keywords out there... come on. Of course, our Bugzilla is super fast.
A
But then, from Intel's perspective, right, they just want to sell, like, you know, Optane drives. So the terrible part of this is, they're working on the software piece, but ultimately they want to sell lots of Optane drives for client-side caching for RBDs. So that's where they're coming from. But legitimately, their stuff actually looks pretty good; I mean, their caching layer, in the test that they just ran, it looks like they were...
A
They were faster than LVM cache. So I'd be really interested to go back to those developers that are working on this and ask: okay, you know, is there anything that we did wrong here that should have been done differently? Can we make this faster? Because maybe there's some way we can share ideas and work together with Intel to, you know, make this work.
A
All right, looks like we're starting to get people from core. Maybe we'll get this thing off the ground.
A
All right, hey guys. So I think it's going to be a pretty short meeting today; there's not a whole lot on the agenda unless anyone has something. But we've got a couple of new PRs this week, one from Adam to make _do_write_small never do buffered writes. I started writing up a review on this.
A
I think my concern would be that right now we don't actually have the buffer cache enabled by default, and I think we probably need that to get rid of this. And after being burned a little bit when we turned on direct I/O versus buffered I/O at the BlueFS layer, I want us to be very, very sure that we know all of the unexpected consequences of doing something like this. So we definitely want benchmarks and data.
D
Yeah, I agree about the benchmarks, but I think you yourself were saying that it's off by default, so the default behavior shouldn't change, at least.
D
And the not-doing-buffered-I/O part, in the AIO path, I thought it wasn't related to our cache. But maybe I'm misunderstanding that, so...
A
So my understanding is, and maybe I'm mis-thinking this, that Adam's PR wants to not have aio_write do buffered writes that pollute the page cache. Why pollute the page cache? My understanding of his idea is that we already have our own buffer cache in BlueStore where we cache stuff. So why pollute the page cache, when that then makes it so that, like, RocksDB stuff might get forced out of cache, or whatever? Am I thinking about that right, Josh?
D
I think he's also thinking about it as a correctness thing: doing buffered I/O and direct I/O at the same time isn't really recommended with the kernel in general, and AIO you're generally supposed to use only with direct I/O. So it's unclear why we were kind of doing this, why for libaio we had this path as a possibility in the first place.
D
Yeah, yeah, but it's been off by default for so long now... that is, from your comment it sounded to me like it was making no change in practice with default settings. It was only if you turned on this BlueFS buffered write option that it would change behavior.
A
Oh, I see: so if that's enabled in the I/O context, then we do that code path. Okay, right, right, okay, yeah. So in this case, it would only follow that path if you already had the buffer cache enabled, and in that case you would then want to do the direct I/O rather than the buffered I/O. Exactly, okay, yeah, I get it. That seems reasonable; something we should benchmark.
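To make the code path being discussed concrete, here is a minimal C++ sketch of that decision. All the names here (IOContext, write_direct, write_buffered, do_write) are hypothetical stand-ins; the real BlueStore and BlockDevice code is structured differently:

```cpp
// Hypothetical sketch, not the actual Ceph code: when the I/O context asks
// for buffered behavior because BlueStore's own buffer cache already holds
// the data, issue a direct write instead of also filling the page cache.
#include <cstdint>

struct IOContext {
  bool buffered = false;  // set when BlueStore's buffer cache is in use
};

struct BlockDevice {
  // Stubs standing in for the real device write paths.
  int write_direct(uint64_t, const void*, uint64_t)   { return 0; }  // O_DIRECT
  int write_buffered(uint64_t, const void*, uint64_t) { return 0; }  // page cache
};

int do_write(BlockDevice& bdev, const IOContext& ioc,
             uint64_t off, const void* buf, uint64_t len) {
  if (ioc.buffered) {
    // Old behavior (as understood in the discussion): go through the kernel
    // page cache as well.
    //   return bdev.write_buffered(off, buf, len);
    // Proposed behavior: BlueStore already caches this data itself, so a
    // page-cache copy would only evict other things (e.g. RocksDB blocks).
    return bdev.write_direct(off, buf, len);
  }
  return bdev.write_direct(off, buf, len);  // non-buffered path was already direct
}
```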
D
Yeah, yeah, it's always worth benchmarking. Actually, there is one other case where we have the flag as well, which is when we have the FADVISE_WILLNEED flag on the operation, although I'm not sure if we ever actually use that at a higher level.
D
Now, it appears we do, in at least cls_fifo for RGW, and cls_rbd. Okay, yeah, okay. So it's maybe worth changing that; the patch changes that behavior for those operations, then.
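For reference, a small sketch of how an fadvise-style op flag can drive the buffered-versus-direct choice. The flag concept matches Ceph's rados op flags, but the helper and the constant's value here are illustrative, not the real ObjectStore code:

```cpp
// Illustrative helper, not actual Ceph code. WILLNEED hints that the data
// will be read back soon, so caching it on the write path can pay off even
// when buffered writes are globally disabled.
#include <cstdint>

// Placeholder bit for illustration; see Ceph's include/rados.h for the
// authoritative CEPH_OSD_OP_FLAG_FADVISE_* definitions.
static constexpr uint32_t FLAG_FADVISE_WILLNEED = 1u << 4;

bool want_buffered_write(uint32_t op_flags, bool buffered_by_default) {
  if (op_flags & FLAG_FADVISE_WILLNEED)
    return true;                 // caller expects a near-term read-back
  return buffered_by_default;    // otherwise follow the configured default
}
```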
A
Yeah, I have a theory that, back when a lot of this stuff got written, the thought was that we'd have the page cache as part of a kind of hierarchical cache structure. Right? Where we had the BlueStore cache for our own stuff, supposed to be super fast; then we were supposed to have, like, the RocksDB block cache, that would be bigger and encoded, so that theoretically everything was smaller in it and we'd be able to use it as, like, a secondary level.
A
All right, so let's see, next... oh, initial support for SeaStore from Sam. I didn't give it the performance tag because it's probably not appropriate yet, but I wanted to add it in here. That's really exciting.
A
Yeah, yeah, okay. Next, closed PRs: my little description change for bluefs_buffered_io merged, it looks like, so that's good. Josh, I saw you commented on that: it's not in the docs.
D
Yeah, yeah: in the BlueStore config reference there's no documentation about the option, so you have to be looking at the source to know about it.
A
Forget it, I'm checking the PR right now. It is marked "advanced".
A
Sure, okay. So yeah, we could certainly put something in the docs; especially, you're right about swap, disabling swap. That's, I think, going to be really important. Okay.
A
That lock change for BlueFS was reverted... oh sorry, FS from BlueStore was reverted; that's good. And also the other fix that, I think, probably wasn't actually a fix was closed.
A
This mempool cache line optimization detection stuff that Luis has been working on: it looks like smithfarm had written something that I think was related to QA for testing this, and that got merged. And then, oh, I closed my old PR that had the BlueFS buffered I/O documentation change, and changed the onode map to a tree structure instead, since Ronen wants to try playing around with the hash specialization of the onode map when using an ordered map.
A
Let's see. Updated: RGW compression bypass. Casey, you reviewed that further; I didn't really pay too much attention to what he's talking about there, but it looks like it's still actively being looked at. Gabi, it looks like you've been making more fixes to your allocation work.
E
Hey, come again?

A
Oh, it looks like you've been making more fixes for...
A
All right, last updated one here: this Crimson OSD client request parallelism PR. It looks like that got rebased; I'm very curious to see how well that one works in performance testing. Otherwise, that's the last of those, no movement. I actually didn't make it through the whole list, unfortunately; I didn't have time to finish up this morning. But I think most of this other stuff is probably still just stale.
A
Unless... oh, occasionally we see updates, but a lot of times it stays stale. So anyway, did I miss any PRs anyone's working on?
A
All right, so the only discussion topic I have is that I've just got a couple of updates for the omap bench work that I had mentioned last week.
A
Gabi, some of this is for you. So I went back and I looked at... okay, so for people that haven't seen these before, I'll just link the spreadsheet into the chat window here. Gabi had noticed in these tests last week that we saw, like, a huge performance improvement between Luminous and Nautilus in a couple of these tests, specifically the set-keys and the remove tests.
A
It's a really dramatic performance improvement when going from Luminous to Nautilus. These numbers are in seconds, so basically it's between, like, five and even up to, like, fifteen times faster, maybe even higher. And he was worried about it because, you know, that's a big change. So, in the third tab here, I started going back and trying to look at different versions of master in that time period.
A
We're seeing what, to me, looks like not a single commit making things faster, but perhaps multiple commits between Luminous and Nautilus that were having an effect. It's hard, because the version of Luminous I tested had a bunch of backports already done to it, and when I went back in time to these point-in-time snapshots of master... one, it was super hard to compile these on CentOS 8; it was not happy. I had to do a lot of screwing around to get stuff to compile right.
A
I also had to backport the omap bench to work with intermediate versions of the code as it changed, since the API, the ObjectStore API, had changed some. But I was able to get it to work, and I saw that set-keys was faster, but not as fast as Nautilus, in these kind of intermediate versions, which had the specific PRs that Sage did that reduced flushing behavior. And I had noticed, as I was doing these with wall clock profiling...
A
That flushing really seems like it's the key to this story as to why it's so much faster. In Luminous, in, like, set-keys, we're spending a huge amount of time in the kv sync thread just doing, I think, sync_file_range, if I remember right; like sixty percent of the kv sync thread's time was spent just doing that. And in Nautilus we actually don't see sync_file_range at all.
A
If I remember right, I think we're doing fdatasync, and surprisingly that seems to be much faster; not really exactly what I would have expected. But we also, I believe, reduced the amount of flushing and syncing that we're doing specifically for omap. So I believe that this story will become clearer as more stuff comes to light, but that's kind of what we're seeing now.
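For anyone unfamiliar with the two syscalls being contrasted, here is a minimal sketch. Both are standard Linux APIs, but the specific flags and the framing as "Luminous-era" versus "Nautilus-era" are just an illustration of the profile difference described above, not the exact calls in either release:

```cpp
// Minimal illustration of the two flush strategies seen in the profiles.
#define _GNU_SOURCE  // sync_file_range is Linux-specific
#include <fcntl.h>
#include <unistd.h>

// Luminous-era profile: the kv sync thread spent most of its time here,
// pushing dirty pages for a byte range out to the device.
void flush_range(int fd, off_t off, off_t nbytes) {
  ::sync_file_range(fd, off, nbytes,
                    SYNC_FILE_RANGE_WAIT_BEFORE |
                    SYNC_FILE_RANGE_WRITE |
                    SYNC_FILE_RANGE_WAIT_AFTER);
}

// Nautilus-era profile: one call that flushes the file's data, skipping
// metadata that isn't needed to read the data back.
void flush_data(int fd) {
  ::fdatasync(fd);
}
```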
A
Interestingly, though, the remove performance didn't change; it was more or less what it was in Luminous. So something else seems to have improved that in Nautilus; it doesn't seem to be the same factor, at least not all the same effect. So lots of interesting questions still exist, but that's what I see right now. And separately...
A
I was looking at buffered versus direct I/O, and why buffered is so much better for, like, omap get and a couple of other cases (remove actually is one of them). And with wall clock profiling there, I saw that with direct I/O we are indeed going back and doing a bunch of reads on disk, in prefetch, that we don't do when using buffered I/O. The strange thing is that both should be reading from the RocksDB block cache.
A
We
already
knew
that
this
is
just
more
further
confirmation
that
in
the
wild
clock
profiles,
we
do
not
see
that
happening.
We
instead
see
us
instead
of
reading
from
cache.
We
see
us
reading
from
disk.
That's
really
really
clear.
So
still,
don't
know
why,
but
it's
just
kind
of
more
fuels
of
the
fire
that
that,
for
some
reason,
things
that
I
would
have
expected
us
to
read
from
cash
in
those
scenarios
were
not.
A
We switched to direct I/O, like, a year ago because of some internal testing, I believe downstream, that showed we were seeing really bad swap behavior with RGW, with OSDs that had RGW targeting them, where that was essentially just causing everything to, like, fall over. It looked like the actual OSD processes were getting swapped out, possibly, and everything was super slow and problematic. When we switched to direct I/O that didn't happen. But it turns out that we didn't realize, when we switched to direct I/O, that it was causing certain things, like collection listing and omap performance, to really degrade badly.
A
I
didn't
show
up
in
any
of
our
performance
testing
at
the
time,
so
people
started
really
noticing
that
when
they
were
doing
things
like
deleting
pg's,
I
believe
in
a
couple
other
scenarios.
A
Is
this
the
result
of
the
of
the
bloomberg
issue?
A
This
is
a
public
meeting,
so
we
don't.
We
don't
talk
about
it,
but
but
it
was
a
customer
that
that
ultimately
resulted
in
a
lot
of
this
work
done.
C
I have no way to know what you were talking about, Mark, until I rewatch the meeting, which I will not, so...
C
...about buffered I/O, or what? But I'm still not in the context.
A
I have a suspicion that Sage did it back then because, at the time, he was thinking a lot about things in terms of, like, multi-level caches. Like, he wanted the onode cache in BlueStore to be a high-level fast cache, with the RocksDB block cache being a slower but more dense cache underneath it, and then potentially the page cache under that, with that being kind of like the third-level cache that can be shared between different processes and things.
C
I don't know; I mean, we are using it that way. I mean, aio_write with a dependence on whether we cache it or not, only in _do_write_small; in other cases we don't do that at all. So that still leaves some room for thinking that maybe we wanted a behavior where, if we do a small write of only a sector and a half or something, we expect to append to it soon, so we will read it soon.
C
So I thought that maybe trying to go straight and clean up that wiggle room would be better.
C
Yes, I will... I did not test that fully yet, and I intend to test it using a basic small unit for testing, smaller than the allocation unit. So I will be spamming a lot of really small writes, and that's possibly a place where I can see a difference, because otherwise there will be no difference at all.
A
Josh-
and
I
also,
I
think,
both
agreed
with
each
other
that
it
it
would
be
really
nice
in
general
if
we
can
fix
whatever's
going
on
with
roxdb
to
just
get
rid
of
buffered
io,
all
over
just
make
everything
direct
io.
C
I think Josh is gone... yeah, he left. But yes, yeah, I agree; I would prefer to go for direct I/O in all cases if we're doing any buffering ourselves. So for BlueStore it might make some sense to use buffered I/O, really, but maybe not even there so much, if we finally verify properly that we're using the block cache as we should use it.
F
Yeah, it's really annoying, but it is what it is, so...
C
Yes, we do not differentiate turning buffered I/O on and off depending on the type of device you are based on; I did not see any such recommendations, and I didn't hear anyone talking about it that way. We either decide to make it buffered at the page cache level, and then we make it buffered, or we buffer it ourselves, and then we just use the direct interface and do the buffering ourselves.
F
It's not the most... especially when we're talking about multi-level buffering, right? I mean, if it was just one specific buffer that we control, yeah, sure, go for it. But once you have multiple players trying to buffer it, and we're only talking about SSDs, I mean, I would try to do without buffering, just because we have...
A
We're trying to head in that direction, Gabi. The problem we have right now is that, for some really strange reason, it doesn't appear that... if we have, like, an onode miss... I don't know if it's an onode miss exactly; it's more like a...
A
If we have a miss for, like, omap data, when you're, you know, looking at an onode or something, or something else, right, like an RGW object, we don't read it from the block cache like we should, especially during iteration. For some reason, and I don't understand why, but we end up...
A
That's the problem: for some reason, it doesn't look like it's reading from the block cache. Those blocks don't seem to be getting read the way that we would expect them to from the block cache. Instead, it's, like, going into a prefetch step and re-reading from the disk, and it's doing that even as we, like, re-iterate over and over again; it will go refresh that same block from the disk over and over again, leading to this huge read amplification.
A
...the drive? I mean, you could, but the problem we have, right, is that if you're, like, iterating over a collection and you're always re-iterating from the same point, looking for something in the collection, which is apparently what it seemed like we were doing, especially in older versions before some of Igor's fixes, it would just keep re-reading the same block over and over again from disk. It was, like, awful.
A
RocksDB typically uses the page cache, though: usually it's doing buffered reads, and it uses the page cache as, like, a secondary cache for the block cache. And by doing that, even if you're, like, re-reading the same block over and over again, if it ends up in the page cache it doesn't matter, really, sort of. It's not great, but that's how we get around this right now: we have to turn bluefs_buffered_io back on, so that those block reads come from page cache rather than from disk.
A
...where I think Adam has some stuff that he's written that will let us look at the hit rates in the...
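As a generic way to look at those hit rates (this is plain RocksDB, not whatever Adam wrote), the built-in statistics expose block cache hit and miss counters:

```cpp
// Generic RocksDB sketch for observing block cache hit rates.
#include <iostream>
#include <rocksdb/db.h>
#include <rocksdb/statistics.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.statistics = rocksdb::CreateDBStatistics();  // enable counters

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/cache-test", &db);
  if (!s.ok()) return 1;

  // ... run the workload under investigation here ...

  std::cout << "block cache hits: "
            << options.statistics->getTickerCount(rocksdb::BLOCK_CACHE_HIT)
            << ", misses: "
            << options.statistics->getTickerCount(rocksdb::BLOCK_CACHE_MISS)
            << std::endl;
  delete db;
  return 0;
}
```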
F
That probably means that they are using some options. There are some options to bypass the cache, right? I mean, in every caching system there's an option to say: don't trust the cache. There are options, or maybe there's some indication, causing RocksDB to suspect that the cache is not up to date, that there was some change which is not reflected in the cache. If we could find what this thing is that causes RocksDB to mistrust the cache, maybe we can just fix that issue instead of doing double caching.
C
Yes, I concur. That's the thing I would like to see: we just make the block cache work perfectly fine, we disable buffered I/O, and we fully control the memory we are using for the OSD.
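A minimal sketch of that end state at the RocksDB level, assuming direct reads plus an explicitly sized block cache. This is generic RocksDB configuration with an arbitrary example size, not Ceph's actual tuning:

```cpp
// Generic RocksDB sketch: direct I/O for reads, with all read caching done
// in a block cache of a known size, so the process controls its own memory.
#include <rocksdb/cache.h>
#include <rocksdb/db.h>
#include <rocksdb/table.h>

rocksdb::Options make_direct_read_options() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.use_direct_reads = true;  // bypass the kernel page cache on reads

  rocksdb::BlockBasedTableOptions table_opts;
  table_opts.block_cache = rocksdb::NewLRUCache(512 << 20);  // e.g. 512 MiB
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_opts));
  return options;
}
```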
A
We see the effect of it primarily, I think, when we're doing, like, iteration with omap. That seems to be a really common case in that spreadsheet that I linked; if you look at the second tab, that's, like, the 500k/20s one.
A
Although, the iteration was faster... I take that back: iteration, seek-to-first, and lower-bound were actually fast in that case. It was just get that was slow, and delete, or remove, that was slow.
C
No, it's irrelevant, because that option only tells it how to operate its port, and we have our own file system port. So those options do not even matter for us.
F
Okay, so we definitely need to try and understand this behavior, but it should be something documented; I would expect such a thing to be: when to bypass the block cache, when to mistrust the cache, or what kind of logic causes RocksDB to suspect that the cache is invalidated. Do we have an option to invalidate a cache?
C
We don't know. We don't know if RocksDB mistrusts the cache or is just using it incorrectly. From what I can see, when I observed it in detail, I've seen that RocksDB tended to request a lot of entries from the cache that were never put there; it was, like, a different pattern of key that they were trying to retrieve, and of course they weren't there. So maybe there's just an...
F
Isn't the simplest thing to always go first to the cache and, if it's not there, go to the disk? But you are saying that the data exists in cache; we know that it's in the buffer cache, but RocksDB still goes to the disk.
F
So it doesn't ask if the thing is in cache. Is there some reason for them to say "we don't trust the cache", or not to expect it? Maybe it's an optimization: when they don't expect it to be in cache, they don't go there. Or maybe it's something automatic: after that many cache misses, they stop going to the cache, because going to the cache introduces some extra hop that they want to avoid.
A
Gabi, I put a link in the chat window. This is, like, a walkthrough of the code that Adam and I were doing a couple weeks ago, I guess, or longer... holy smokes, back in March, I guess. If you're interested in looking at it, that might be a good place to kind of get the flow of how the code works.
C
Yes, Gabi, I would really love to, I mean; but it's not like I know the answers. I'm also just searching, almost half blind, through the RocksDB code and the effects I see in the logs. So I would gladly debug that with you; maybe we'll...
A
Yeah, I think the next step in my mind is: we need to see just what the block cache is actually doing. Like, are we actually fetching stuff from it, and how often, and how often are we actually doing the reads from disk? And then from there we might be able to actually start instrumenting the cache itself to tell us why it's making the decisions the way it is, yeah.
E
Oh no, we still got Randy; we might scare him off yet.
B
I'm part of the COVID generation now; I'm just trying to find friends that aren't my kids. I hear you.
A
I hear you. All right, well, I'm actually gonna wrap up early today, guys, because I want to go have lunch and eat. So, any last-minute things anyone wants to bring up?
B
Just that BZ I threw in the chat; just something to keep in the back of your mind if you ever hit a bucket with millions of objects. If it has a soft quota tied to it, we saw a recent issue, and it's present in master right now, where it will go and still do a list-objects or list-buckets against the bucket with millions of objects on a simple GET, even when it's not explicitly called, because of the quota. And that was generating a significant amount of I/O...
B
...on the bucket index pool, for a customer we had recently.
C
Okay, I would just like to signal that I will be touching the block device interface and straightening up the dependencies between direct I/O, buffered, and asynchronous. I just want to make it so that asynchronous is never buffered, and the other modes might or might not be buffered. In addition, possibly, I will add an API call for a synced write that is different from write.
C
Sorry, a synced write: one that makes sure the data always goes to disk, so there's no possibility to spam small writes to the device and then, at some other time, make it flush. Yeah, that stems from my problems with performance with BlueFS and buffered I/O. That was it, so yep.
yep.