From YouTube: Ceph Developer Summit Quincy: RADOS Follow-up
A: All right, I guess we'll give people a couple of minutes to trickle in. While we're waiting, welcome, everyone, to the follow-up session for CDS RADOS, which happened a couple of weeks ago.
A: So in the interest of time, what we've done is that we've already created some Trello cards for the items that are in the etherpad, and there are also some items that have been there since Pacific which just spilled over to Quincy.
A: Does that sound good to everybody? Any other ideas?
A: Cool, perfect. Okay, I think I'm just going to start. First of all, I've shared the etherpad in the chat and also the backlog Trello. I can share my screen so it's easier to follow what I'm talking about.
A: Cool, so yeah. This is the RADOS section, the first item and the second item. The first one was about dashboard and RADOS, and there are detailed things that we want to do there, but I don't think there are any particular RADOS Trello cards required.
A: Those that are required will be made by the dashboard team, and we'll figure out what needs to be done there, so I'll skip that. Next, the crash telemetry panel stuff: lots of improvements are mentioned in the etherpad, nothing in particular to add. I mean, we have one card here about mapping device errors and OSD crashes back to devices. That's a RADOS card that's been there, but I don't think there's anything specific that we want to create there either.
A: So moving on to the next item, which is about BlueStore split cache improvements. We did not go over this in detail in the last CDM session because of the availability of some of the BlueStore folks, but there's a detailed document about how we want to make these improvements, and we'll probably go over it in a performance call or something. For now we have added it here; the link to the etherpad is already there.
A: We have a bunch of improvements that we have mentioned here already, things like "improve progress module scalability", which is just about the progress module. Similarly, for insights we have a separate card, and the entire etherpad that we have is linked here.
A: In general, I feel we have cards that capture some of the improvements that are mentioned in this, but I guess the idea is that some of the cards that are already there are more like short-term things that we are aiming to get done in Quincy. As we start marking things as done, we will be able to make more cards that we can probably target for Quincy, or maybe even later. That was the general approach for this.
A: The other thing, which I think is going to come up next, is about the common manager pool. There is a pull request from Patrick that already addresses this.
A: The idea is to be able to use the common SQLite database. So currently the plan is to be able to migrate the device health metrics pool.
A: In future, when insights starts using a RADOS pool to store its data, we can also use the same pool, .mgr, which is what we decided to name it. So yeah, go ahead, Sage.
B: I was gonna say we could just rename that pool and reuse it instead of creating a new one.
A: All right, next there is this item about autoscaler improvements. Here also, work has already started happening. There is a PR from Junior that already came in to start off pool creation with one PG if the autoscale mode is on, but I guess we also have items here in general for the autoscaler.
A: Like one is this: we decided that we will have separate autoscaler profiles, and users can just select which profile they want based on what kind of workloads they are running or what the state of the cluster is. So this card addresses that bit, and yeah, this is the main thing. This is what we attempted to address in Pacific; there are some rough edges, so it's still marked for Quincy.
A: Good, okay. So the next item here is about cluster log messages going through Paxos, which is bad. Here I think there are a couple of things that we discussed.
A: One was about controlling the trimming rate in the monitors in a more adaptive fashion, versus just statically choosing a value, which is what we were doing earlier. That PR has already merged.
A: So that addresses one part of the problem, but I guess the other main issue was that we decided that we do not want to store all these slow request messages in the cluster log, and instead probably just direct them to the manager log and not store everything, keeping only, you know, some n number of updates. So this addresses that problem.
A: It will be both OSD and manager cluster logging, which will now go into the manager log, and we just limit the number of messages going in.
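(The limiting being described is essentially a bounded buffer of recent entries. A minimal illustrative sketch of that idea follows; the class and cap name are hypothetical and this is not the actual Ceph LogMonitor/mgr code.)

```cpp
#include <deque>
#include <string>

// Purely illustrative: keep only the most recent N cluster log entries,
// dropping the oldest once the cap is exceeded. Not actual Ceph code.
class RecentClusterLog {
  std::deque<std::string> entries;
  size_t max_entries;  // hypothetical cap on retained entries
public:
  explicit RecentClusterLog(size_t max) : max_entries(max) {}

  void add(const std::string& entry) {
    entries.push_back(entry);
    while (entries.size() > max_entries) {
      entries.pop_front();  // discard the oldest entry
    }
  }

  const std::deque<std::string>& recent() const { return entries; }
};
```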
A: Those are basically the three main things that we can address in Quincy. There are also things about storing data in a dumb way, which we should fix; that will need a refactor of the log monitor.
A: It is under RADOS; it's hidden somewhere. So this one was a good idea, this needs to be...
A: The next one on the list is structured config files, and we've already merged a PR from Kefu that implements this, which is awesome. So hopefully we can get the rest of the items he has here as part of Quincy. We've also put it in here: auto-generated docs.
A: That brings us to this last item here, automated auth key rotation. We had a detailed discussion about this. Unfortunately, from the source where this requirement came in, there is a lack of clarity on what is actually needed, so we hope to get more clarity from them, maybe in a month or so. If we do get clarity on that, it would be a nice-to-have feature in Quincy, but I don't have any more details at the moment. We have a card for it, though.
A: So the first item here is the second part of the PG removal optimization. We've got a PR for it, and Igor is already working on it, so hopefully we can get that done.
A: Then we've got a few items related to QoS. This is something we did not discuss in particular at CDS, but we have a lot of clarity around what we want to do and what is done. There is snap trimming, scrub and PG deletion that we want to address next, as some of the background activities that mClock will also prioritize. If we get these done, I think this covers everything, and we won't require any of that manual sleep stuff that we have for all of these background operations. Snap trimming I don't think we've started work on, but PG deletion and scrub we have ongoing work on.
D: Yeah, so we have identified the PG deletion and the scrub tuning parts, for which we've come up with test cases using CBT itself. So once we test using that, we can go ahead and fine-tune as we find things.
A: Perfect. And I would like to also mention that the aim with Quincy is to default to the mClock scheduler. I already have a PR out for that. There are some testing issues that came up, so we might need to adjust some tests and such, but if everything goes well, we will go ahead with that change. So Quincy will default to the mClock scheduler versus WPQ, which is the default at the moment.
A: Then, apart from the background activity stuff, we also have automating the baseline measurements. Currently the mClock scheduler requires us to feed in some data or some information about cluster performance or throughput, and that's a manual step at this moment. We have some default values, but those may not hold for different kinds of clusters.
A: So the idea is to be able to automate this step, so that users can just run something, or this can also be done as a background activity by the manager, and we automatically figure out what those values should be instead of having a manual step involved. That, I believe, is going to be cool, and I think that's the only thing holding us back from making everything work out of the box at the moment.
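(To make concrete what "feeding in throughput information" means here: the scheduler needs a per-OSD capacity estimate, which the chosen profile then carves up between client and background work. The sketch below is purely illustrative; the struct, the fractions and the idea of deriving them directly from a benchmark number are assumptions for this example, not Ceph's actual options or code.)

```cpp
#include <cstdio>

// Hypothetical illustration of turning a measured per-OSD IOPS baseline into
// mClock-style allocations. Not the actual Ceph implementation.
struct MclockAllocations {
  double client_reservation;
  double background_reservation;
  double client_limit;
};

MclockAllocations allocate_from_baseline(double measured_max_iops) {
  // Example split loosely modelled on a "prioritize client I/O" style profile:
  // most capacity reserved for clients, a smaller slice for background work.
  MclockAllocations a;
  a.client_reservation     = measured_max_iops * 0.60;
  a.background_reservation = measured_max_iops * 0.20;
  a.client_limit           = measured_max_iops;  // clients may burst to full capacity
  return a;
}

int main() {
  // In the automated scheme discussed above, this baseline would come from a
  // benchmark run by the OSD or manager rather than being entered by hand.
  double measured_max_iops = 3000.0;  // hypothetical measurement
  MclockAllocations a = allocate_from_baseline(measured_max_iops);
  std::printf("client res %.0f, background res %.0f, client limit %.0f IOPS\n",
              a.client_reservation, a.background_reservation, a.client_limit);
  return 0;
}
```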
A: And yeah, the last item about QoS: all of this that we talked about earlier was the background activity stuff, but we also want to do QoS client versus client, and that is going to be kind of the final piece to make sure we have complete QoS in Ceph. Anything else we want to talk about, Sridhar?
D: Yeah, I think that covers the high-level stuff that we want to do.
B: Do we need something that covers sort of the user-facing part of this, like how users either set QoS policies or how they can monitor them?
A: Yeah, I guess we could have that. I don't think there is much integration with the dashboard, but in terms of how users can use it, at the moment we have detailed documentation, both developer documentation and user-facing documentation, from Sridhar, one part of which has merged.
A: The other part is still under review, so we have the docs side of it, but it'd be really nice to have it as part of the dashboard, and that's one thing I discussed with the dashboard team at CDS. Currently, mClock just has some profiles, and the default profile is going to prioritize client I/O, but it would be nice to give users the ability to change that profile from the dashboard itself.
A: So the next item is just a cleanup item, but I think it's again useful. The idea is that we have a whole bunch of asserts in the code which are multi-condition asserts. From a debugging point of view, it's very difficult to understand, it's not obvious, let's just say, which condition of the assert was hit. So there is some extra work that developers need to do to be able to identify which assert was hit or what the bug is about.
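(The cleanup being described is easiest to see side by side. The following is a minimal sketch with made-up conditions, using the standard assert macro so it is self-contained; it is not taken from the Ceph tree.)

```cpp
#include <cassert>

// Illustrative only; the variable names are invented for this example.
void check_state(int num_objects, int num_replicas, bool peered) {
  // Before: a multi-condition assert. If it fires, the failure only tells you
  // the whole expression was false, not which of the three conditions failed.
  assert(num_objects >= 0 && num_replicas > 0 && peered);

  // After: one condition per assert, so the failing line pinpoints the culprit.
  assert(num_objects >= 0);
  assert(num_replicas > 0);
  assert(peered);
}

int main() {
  check_state(10, 3, true);
  return 0;
}
```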
A: I guess it's just... yeah, go ahead.

C: I wonder if it's worth considering integrating the data that we have from the crashes, now that we know that we do have I/O errors from devices. I haven't looked into that too deeply, but maybe we can use this information as well.
B: It might depend on where we show it. I'm not sure if we've put much thought into that. My first guess would be that the device ls screen that lists your devices should show something like an error count or error history, I don't know, and it could show crashes too if we wanted to do that, but crashes don't map as cleanly to a device as they do to a daemon. So that may or may not make sense.
B: But I think, yeah, having I/O errors by device, and also by pool, like if we can link them to a PG and not just a device, that might also be helpful.
A: I remember one of the initial ideas was that we already have this metric, num_shards_repaired. Just exposing this at the manager level will be useful to begin with.
A: Okay, you should see that somewhere; you should probably see this, it could be there. Yeah, I'll take a look to see what these values are, and whether these values are even reflecting any of the repairs.
A: There are a whole lot of other items, some marked "small", which would be nice for somebody who wants to get introduced to RADOS, or even Ceph, and start picking things up, but we don't really have a target in mind for them. So I'm not going to go through all of those items; I'm going to move on to BlueStore, monitor and manager.
A: So for BlueStore we discussed the split cache improvements. There are a couple of others. One is about making asserts unique per return value: currently, when we assert in some places in BlueStore, we just assert, but it's not clear why we asserted, as in, the return value is not clear, whether it was out of space or some sort of EIO. Making them explicit again helps with debuggability in general. This is also marked "small" for BlueStore.
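(A minimal sketch of what "unique per return value" means in practice; the function and the error codes chosen are illustrative, not actual BlueStore code.)

```cpp
#include <cassert>
#include <cerrno>

// Illustrative only; not actual BlueStore code.
void handle_write_result(int r) {
  // Before: a single catch-all assert. When it fires you know r was nonzero,
  // but the crash alone does not tell you whether it was ENOSPC, EIO, or
  // something else entirely.
  // assert(r == 0);

  // After: distinguish the interesting error codes explicitly, so each failure
  // mode trips its own assert (and therefore its own line in the backtrace).
  assert(r != -ENOSPC);  // ran out of space on the device
  assert(r != -EIO);     // underlying I/O error reported by the device
  assert(r == 0);        // any other unexpected failure
}

int main() {
  handle_write_result(0);
  return 0;
}
```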
A: And then the final item for BlueStore; this has been there for a bit. I think there were several other improvements that we were making in Pacific, because of which the cache age binning PR from Mark had to wait. But at this point I believe all of that stuff has already merged, yep.
F: Okay, we can do this now, I think, based on the stuff that Adam wrapped up and based on some of the other stuff that we've merged; it should be able to go in now.
A: Yeah, and then one of the main blockers was the RocksDB column family sharding. That's, you know, gone in with Pacific, so it's pretty stable, and we can start working on this again.
A: Then there's this item about the severity of degradedness in ceph health. Danny already started working on this at some point, but I think he got distracted by some of the QoS work he was focusing on.
A: I guess let's go to the RFE. The idea is to be able to display PGs; the example here is for EC, k plus m, minus one and so on for all quantities, in ceph health.
A: Yeah, essentially: at what point do you, you know, make PGs inactive? I think it's just about knowing how many of them are in what state, the breakup of how bad it is or how much time a PG would need to recover, and making that more obvious to the user. Currently it's just the overall degradedness of the cluster that we show, so that's where this comes from.
A: Cool. And this item actually spilled over from Pacific; I'm not sure what the status of this one is, again.
A: All right then, we are already running over, so I'm just going to go through the manager ones real quick. We talked, okay, there's a separate card for this, perfect, that you added, Sage. Okay, so this one, can we say it's merged now, or not?
B: We'd have to check. I actually thought we might have already included this, but I'm not sure. This is basically just to include which multi-site RGW features are being used in telemetry. We should just refresh.
C: Yeah, I think we collect just how many, like we count them, but not more than that. And it goes together with the other card: we just need to find a good way of collecting new data from telemetry.
A: All right, then the next card here is about capturing exceptions and crashes for the manager, similar to what we do for C++ crashes. I think this is going to be another one that doesn't need any more discussion here. Next, we've got this item, which has also spilled over from Pacific; the implementation was not that trivial. The idea is to be able to use the actual OSD utilization for balancing PGs with the upmap balancer, versus what we use currently, which is the number of PGs per OSD. There's a lot of discussion we've done around this, and we all think it's a good idea; getting it in is what's left.
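(The contrast being drawn is between balancing on PG counts, which assumes every PG holds about the same amount of data, and balancing on how full each OSD actually is. The sketch below is only an illustration of that idea; the struct and the data are invented, and this is not the upmap balancer's actual algorithm.)

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Illustrative sketch: pick migration direction from measured utilization
// rather than PG counts. Not actual Ceph balancer code.
struct OsdInfo {
  int id;
  double bytes_used;
  double bytes_total;
  double utilization() const { return bytes_used / bytes_total; }
};

int main() {
  std::vector<OsdInfo> osds = {
      {0, 600e9, 1000e9}, {1, 420e9, 1000e9}, {2, 510e9, 1000e9}};

  // A utilization-based balancer would move PGs from the fullest OSD toward
  // the emptiest one, even if their PG counts were already equal.
  auto [emptiest, fullest] = std::minmax_element(
      osds.begin(), osds.end(),
      [](const OsdInfo& a, const OsdInfo& b) {
        return a.utilization() < b.utilization();
      });
  std::printf("move data from osd.%d (%.0f%% full) toward osd.%d (%.0f%% full)\n",
              fullest->id, fullest->utilization() * 100,
              emptiest->id, emptiest->utilization() * 100);
  return 0;
}
```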
A: This is another thing we all agree on, yeah. The next one is also something that fits in as a short-term improvement to manager scalability; it also fits in with some of the trimming work we did on making trimming more dynamic. This manager stats period is again a static value, and the idea is to be able to change it dynamically based on what the ingest rate is for the manager. Brad has been working on this.
A: And then there's this broader card about manager performance and scalability; we talked about the telemetry shared data page. I see you added it.
A: That's all. There is one more small card about displaying detailed output for enabled modules, but yeah, I think that's another beginner card.
A: I'll get to this; there's a pull request for it.
B: I would put this in the whole bucket of discussion about how we need to look at the Objecter implementation itself, as well as the messenger and reactor frameworks. I'm not sure, this came up a little bit during CDS, but it's not clear... it's likely that there's more work that needs to be done for the Objecter itself to actually run in the reactor framework. Right now it uses the Seastar function calls, but it still uses that same threaded architecture.
E: Okay, yeah.
A: Okay, that's noted, yeah. I think that's it; anything else that I'm missing?
A: Cool, thanks everyone for joining, and have a great day. See you later, thanks!