From YouTube: Ceph Performance Meeting 2023-07-03
Description
Join us weekly for the Ceph Performance meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contribute
What is Ceph: https://ceph.io/en/discover/
A
Good. I guess it's a little more relaxing this week than last week. So that's nice.
A
All right, we're starting to get folks from core here, so let's get this started. Okay, I didn't see any new pull requests this week. If I missed yours, please speak up and add it, but for now I'll go on. I saw two closed pull requests this week.
D
No, no, no, no. I think it's a two-stage merge: rocksdb and .gitmodules. According to my notes, and I will double check, I think that we matched it to a branch that is simply named reef.
D
That's right, exactly, yeah. Instead of that, it's just main, okay. However, it's basically a fix-up for the version of rocksdb that we are using in main, in any event. Now, in Squid, until there is a big difference, a big update of rocksdb, I would prefer to keep using the reef branch even in Squid. Thank you.
D
Yeah, basically we said we would upgrade, keep up with rocksdb major releases, I think.
D
The thing is, we have an outdated rocksdb, yeah, yeah.
D
I guess we will make the squid branch maybe later, like, I don't know, winter 2023 or early spring 2024, something like that.
D
Even now there's a mess with branch naming.
D
Anyway, I think that this fix-up is also beneficial for reef. I'm assuming that we want to keep buildability of reef on new Fedora. Is that right, Casey?
D
Maybe not necessarily... well, is it critical to have it in the reef RC, or in the first main release of reef? Because if not, maybe we could backport it later, after some more staging.
E
So I'll just create a tracker issue for it, and yeah, we'll handle it with the normal backport process, whenever you're ready.
D
Cool. Well, we can create the PR now and click the button later.
E
Yeah, I know that libfmt made some breaking changes with version nine, and we have some ifdefs to try to handle that. Oh thanks, I think. For main, ultimately we want to just bump the required version to nine or higher and remove all that backward-compat stuff, but that doesn't solve the Pacific and Quincy branches: if you use system fmt at version nine there, then I know that we're seeing build failures. Yeah, on Snappy I don't have a strong opinion; we might just consider deprecating it.
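For context, here is a minimal sketch of the kind of version-gated ifdef being discussed, assuming a hypothetical entity_t type; this is an illustration of the pattern, not Ceph's actual compat code. fmt 9 stopped formatting types implicitly through operator<<, so code built against a system fmt at version nine or higher needs an explicit formatter specialization.

```cpp
#include <fmt/format.h>
#include <fmt/ostream.h>
#include <iostream>

// Hypothetical type with a stream operator, standing in for the real types.
struct entity_t { int id; };
inline std::ostream& operator<<(std::ostream& os, const entity_t& e) {
  return os << "entity." << e.id;
}

#if FMT_VERSION >= 90000
// fmt 9 no longer formats types implicitly via operator<<; opting back in
// requires an explicit specialization such as this one.
template <> struct fmt::formatter<entity_t> : fmt::ostream_formatter {};
#endif

int main() { fmt::print("{}\n", entity_t{42}); }
```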
C
I'm fine with getting rid of that, but I just wanted to mention it: it's working, and I don't know if we have to preserve it for compatibility.
D
Well, if you want to deprecate Snappy, this must go through the entire, the usual deprecation process, and reef is pretty soon. So if we want to get rid of Snappy, there's not much time to make that happen.
D
Yeah, yeah: keeping versus removal versus deprecation, two different things, yeah.
D
Make a doc change, a docs suggestion that says that we are deprecating Snappy with reef.
D
Moving from one set of compressors to another, okay.
E
I don't think the rgw migration will be as easy; we'd effectively have to find all of the rgw objects that are compressed with Snappy and recompress them with something else.
D
Oh man, that's a bigger thing. So for sure it's not just a config change; it's Ceph in general, basically.
A
Oh, sorry, one thing at the end: I accidentally found a thread, and somewhere we've got the comment from Google saying that they're not going to allow re-enabling of this in their code, in the Snappy code.
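For background on why that setting matters here: Ceph subclasses snappy::Source, and a libsnappy built with -fno-rtti does not emit type info for that base class, so consumer code that needs it (dynamic_cast, typeid) fails at link or run time. A minimal sketch with a hypothetical subclass, not Ceph's actual compressor code:

```cpp
#include <snappy-sinksource.h>
#include <cstddef>

// Hypothetical Source subclass, standing in for Ceph's real one.
class BufferSource : public snappy::Source {
 public:
  BufferSource(const char* p, size_t n) : p_(p), n_(n) {}
  size_t Available() const override { return n_; }
  const char* Peek(size_t* len) override { *len = n_; return p_; }
  void Skip(size_t n) override { p_ += n; n_ -= n; }
 private:
  const char* p_;
  size_t n_;
};

// dynamic_cast needs the typeinfo for snappy::Source, which a libsnappy
// built with -fno-rtti does not provide; this is the breakage in question.
BufferSource* as_buffer_source(snappy::Source* s) {
  return dynamic_cast<BufferSource*>(s);
}
```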
D
So it looks like there's no way around it: either you are consuming Snappy from our repository, or...
D
The downside I see is that basically we are forking another component. It might not be so terribly bad, but it would be better if we could avoid it.
A
Well, that would be the thing, though, that we could do while we consider deprecation, right? The first step would be to just do it ourselves. Yes.
D
We aren't living in an ideal world; likely we will need to make some tiny fork of Snappy, like we did with rocksdb and with fmt, so a short-lived fork that we must support forever.
D
Have we checked for any API-compatible replacement for Snappy? I guess that we aren't the only project facing this issue; maybe somebody has already forked Snappy, called it, I don't know, snappy-old-generation, and re-enabled RTTI there.
D
Well, as Adam mentioned, we are providing the option to build with our own. That's the case even now, I think; am I correct?
A
All right, let's see, we can continue on with the pull requests now, I guess. What else do we have? Igor, I think your writable file allocation PR merged, so some good news on that front, yeah. It looks like that updated... I see where you also approved the first part of the elastic shared blob work, so Adam's work on elastic shared blobs, and then reviewed part three. And you approved part two as well? Yeah, yeah. So Adam, do you feel like... or do we have Adam?
B
I think I saw something upstream about Debian looking to do that, because they don't like it breaking everything else either. So it might be that Google doesn't want to enable it upstream, but the people making the repos might enable it. Okay.
A
Well, yeah, Adam, if you saw something with the Debian folks talking about it, then yeah, maybe we can get that added into a tracker here too, for us on our side, just trying to figure out who's doing what where.
A
All right, so back to the elastic shared blobs PR. Adam, do you feel like, with Igor's approval on the first two parts of this, should we merge those first, or do you want to do this all in kind of one big go?
C
I would want to go with one big go, especially since it seems now we want to have it a bit different. I don't know how we will need to review it, but it seems we want to have the mandatory part, meaning parts one and two, just merged without any ifdefs, and then drop the ifdefs for parts three and four and replace them with just a runtime conditional.
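As a sketch of the pattern being proposed (the names and the config flag are hypothetical, not the actual PR code): both code paths stay compiled, and a runtime flag selects between them, so the new behavior can be toggled without a rebuild and both branches keep getting build coverage.

```cpp
#include <iostream>

// Hypothetical stand-ins for illustration only.
struct Config { bool elastic_shared_blobs = false; };
void do_elastic_path() { std::cout << "elastic shared blob path\n"; }
void do_legacy_path()  { std::cout << "legacy shared blob path\n"; }

// Before (compile-time gate): only one path exists in a given build.
// #ifdef WITH_ELASTIC_SHARED_BLOB
//   do_elastic_path();
// #else
//   do_legacy_path();
// #endif

// After (runtime gate): both paths always compile; an option flips behavior.
void handle_shared_blob(const Config& conf) {
  if (conf.elastic_shared_blobs) {
    do_elastic_path();  // new behavior from parts three and four
  } else {
    do_legacy_path();   // existing behavior
  }
}

int main() {
  Config conf;
  conf.elastic_shared_blobs = true;  // e.g. set from configuration
  handle_shared_blob(conf);
}
```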
C
It seems that we're leaning toward that direction, and unfortunately it might mean that we shouldn't merge it as is; I will need to modify it before merging: at least drop the ifdefs for parts one and two for now, and then modify parts three and four to wrap things not with ifdefs but with a runtime condition. So that's it; I'm not feeling okay with merging it as is now, okay.
A
Okay, well, good luck on that, and I'll be curious to see how that all works in the end. All right, moving on. So I have a very old PR here to enable tcmalloc when using Seastar for Crimson, and it broke some of the AddressSanitizer checks that were being done. And Radek, I think you just merged the PR for that, basically, yeah?
A
I think that fix makes the sanitizer happy again?
D
I think so, I think so. Good, good. I think there was some trivial merge conflict; after that, we are ready to go.
A
Okay, okay, let's see here. Oh, I think it's blocked on your review, was there...?
D
Oops, I'm going to correct myself: I think I was just pointing out the dependency on the ASan suppression; otherwise we would have the crimson-rados tests red again.
A
Good deal, good deal. Let's see, then, moving on: Corey Snyder's PR for setting the RocksDB iterator bounds for collection listing. I think, Adam, the only thing there is you were reaching out to see if he's still working on it, right?
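For reference, this is roughly what bounding a RocksDB iterator looks like: a generic sketch against the public RocksDB API, not Corey's actual patch, and the key layout here is made up. The bounds keep a scan from walking past the end of one collection's key range.

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <rocksdb/slice.h>
#include <memory>

// Hypothetical key prefixes: one collection's keys sort inside [lower, upper).
void list_collection(rocksdb::DB* db) {
  rocksdb::Slice lower("C.2.");    // first possible key of the collection
  rocksdb::Slice upper("C.3.");    // first key past the collection
  rocksdb::ReadOptions ro;
  ro.iterate_lower_bound = &lower; // iterator never returns keys below this
  ro.iterate_upper_bound = &upper; // iterator becomes invalid at/after this
  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
  for (it->Seek(lower); it->Valid(); it->Next()) {
    // process it->key() / it->value()
  }
}
```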
A
And Corey's not here today, so I guess we'll move on. Next is this RBD "replace image context config with image config proxy"; there's a request for Ilya to re-review that again, I think, and since neither of them are here, we'll move on from that. Oh, and that's the last one for updated PRs, so I don't think there's anything else interesting in here for today. Was there anything I missed from anyone?
A
All right, well then, if not, maybe the only discussion topic I have, which I didn't write down here, is that Paul has been continuing to do his tests looking at RBD performance in reef versus Quincy. His tests are rate-limited, so he's trying to basically look at a given, consistent workload: what is the CPU usage, and what is the underlying physical I/O that's hitting the disks? And in his tests he was seeing that reef is looking like it's maybe less efficient than Quincy, both in terms of CPU and in terms of the amount of I/Os that we're doing for client I/O.
A
He did see variations: when he increases the memory target size, the CPU usage effect goes away. Our hypothesis right now is that this is probably the rocksdb change that we made. The rocksdb change basically makes the kv-sync thread more efficient, but we might be seeing evidence that the trade-off is that, by making the memtable smaller so that the kv-sync thread has to do less work to keep each memtable in sorted order, rocksdb is having to do more work for something like an onode lookup when you have an onode cache miss. So it might be that as long as your onodes are cached, this is just a pure win, but if your onodes are not, you know, well cached, I guess, then you might be seeing more overhead when you have a cache miss.
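To make the trade-off concrete, here is a hedged sketch of the knob in question using the public RocksDB options; the values are illustrative, not the actual reef tuning. A smaller write_buffer_size means each insert has less sorted memtable data to maintain, which helps the kv-sync thread, but data reaches the SST files sooner, so lookups that miss the onode cache can cost more.

```cpp
#include <rocksdb/options.h>

// Illustrative only; not the tuning Ceph actually ships.
rocksdb::Options make_options(bool favor_kv_sync_thread) {
  rocksdb::Options opts;
  // Per-memtable budget in bytes: smaller memtables are cheaper to keep
  // sorted on every insert, but they flush more often.
  opts.write_buffer_size = favor_kv_sync_thread ? (64u << 20)    // 64 MiB
                                                : (256u << 20);  // 256 MiB
  // How many memtables may accumulate before writes stall.
  opts.max_write_buffer_number = 4;
  return opts;
}
```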
A
That might be driving CPU usage up. I'm trying to replicate some of his tests right now on Mako; he's also running more tests himself.
A
The new tests he's going to run will use a very low osd_memory_target, so that onodes are not cached well, to see if that makes the effect more exaggerated. Also, his tests have so far all been mixed workloads, random reads and random writes, so he's going to look at just a pure random read or pure random write workload to see if the effect is more or less present in either of those, or just always present.
A
He also saw that the effect was exaggerated at, like, 8K I/O sizes and less exaggerated at 4K and 16K. I have no idea why that would be; I suspect it's just an artifact of something happening at 8K versus those other I/O sizes, but that was one small, interesting bit of this as well.
C
I can tell you how it could be: it could be pressure from the caches. A larger blob data cache could further steal memory from the onode cache.
A
Well, in any event, hopefully later today or tomorrow I'll have some tests that, hopefully, at least to some degree, replicate what Paul's doing. It won't be exact.
A
He's letting this I/O age for quite a long time, so I'm right now just trying to get the lay of the land and do some quick tests. One thing I do notice, though, is that our CPU consumption in kind of a repeated test like this, after you do other I/O, does go up, as you'd expect, since there's more metadata and more work to do to track things. So I do wonder a little bit if we're seeing some effect of the aging in Paul's tests, and I want to make sure that the aging actually looked exactly the same between reef and Quincy, because if it didn't, maybe we're seeing some artifact of the way that it aged. But in any event, more work to do there. So that was one update; oh, I have one other. I've got a user in the community that's doing some high-performance tests with just a bunch of NVMes in one node, and he reported really good results. He actually gave me access to do some work on that node, and I'm seeing much worse results, but the results I'm seeing actually mimic what I'm now seeing in our own analysis as well. Basically, once you get past like three or four NVMe drives, we're seeing scaling slow way, way down within the node.
A
If you just do, like, a single-node cluster test on that node, it's interesting, because I hit a limit at about 400 to 500 thousand random read IOPS, and maybe around 180 thousand random write IOPS, not counting replication.
A
That's really similar to the per-node numbers that we're getting out of Mako when we do, like, a big 10-node test and get like four and a half million IOPS; again, it's like 450,000 per node. It makes me think that we're hitting some limit in the node that we can't easily see: when we add more OSDs, the scaling just kind of tapers off after about four.
A
So that is something that needs to be investigated as well, and I've got tests also going simultaneously looking at that. But anyway, that's it, at least for me.
B
I was gonna say, I found the Debian Snappy bug. I don't know if we want to talk to the people in Fedora and ask them if they can do it, or if we'll do it ourselves, because, you know, Debian's doing it, and I wouldn't say we should always do what Debian does, but it's an argument. Yeah.
B
Sure, sure. Do you want me to put it in the Ceph bug tracker, or do you want me to go find someone in Fedora and ask them to do it, or...? Yes.
D
But actually it might be even both, because we also want to be aware of it; we want to track the problem during our bug scrubs. So if it's not already in the main Ceph tracker, I would kindly ask to put it in there as well.