From YouTube: Ceph Developer Summit Quincy: Orchestrator Follow-up
A: Hi everyone, and yeah, welcome to the CDS follow-up. As far as I can see, we are supposed to kind of build a roadmap for Quincy, if I'm not mistaken. What I did is I used the orchestrator Quincy etherpad to write down what things were on the list at the CDS, plus I scrubbed through the tracker and through Trello, and I hoped to come up with a list of kind of high-level things that we need to do. I mean, I didn't list bugs, because that would be overwhelming and it wouldn't be sensible, and I also didn't include things that are minor additions to existing features, because that would also kind of explode.
A: I'm looking at the etherpad. Okay, I've copied the Trello. Or do we want to use the Trello? What did you do for the etherpad?
A: Yeah, I mean, managed client keyring is already in progress, right? Yeah. If we are looking...
A: If you're looking at the cephadm column here, it's already at the top: managed client keyring is already in progress. Host drain and remove is kind of done already, right? Yeah, the documentation. Yeah, and clarify host maintenance mode.
A: It's just a daemon stop; it stops the daemons safely for one host, so it doesn't have anything to do with the scheduler. It just stops the daemons, and it's pretty okay.
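For reference, the host maintenance and drain commands being discussed already exist in cephadm in roughly this shape (a sketch; the host name is a placeholder):

    # put a host into maintenance mode, stopping its daemons safely
    ceph orch host maintenance enter host1
    ceph orch host maintenance exit host1
    # drain a host, i.e. schedule removal of all daemons placed on it
    ceph orch host drain host1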
B
My
recollection
was
that
that
helper,
that
I
renamed
recently
that's
all
the
schedulable
hosts
excludes
the
host
maintenance,
maintenance,
hosts
and.
A: It's a gap that we have compared to ceph-ansible. I would also keep it there, wherever it is, right?
B: Yeah, I had it there just because it seems like it should come after auto-tuning OSD memory, which is more of a functional cephadm agent thing. It's only going to be performance at scale, which...
B: Yeah, at the end, yeah. That might be redundant. I mean, if we're thinking of the same thing, I can pull that off. Well, I mean, if the agent gets us the reconciliation speed that we want, then we can just call that one partly done. I use this list also just so that, when we finally merge and we're looking at the end of the release, we're trying to build this high-level list of the things that we delivered.
A
I
mean
I
would
my
idea
would
be
really
to
focus
on
improving
the
architecture
and
making
sure
we
can
update
the
dashboard
pretty
fast,
even
on
really
large
clusters,
instead
of
maybe
sparing
one
or
two
ssh
connections
or
yeah,
make
cutting
the
the
the
time
to
carry
a
host
by
half,
because
it's
not
going
to
it.
It's
increasing
the
the
the
static
portion,
it
it's
just
a
yeah,
it's
the
constant
part,
you're,
reducing
the
constant
part,
but
you're
not
really
having
a
big
impact
on
larger
clusters.
A
That's
my
problem
was:
was
adjusting
improving
the
loop
okay,
smb
number
kinda
important
right.
B
They
were
missing
that
one's
blocked
because
we
don't
have
the
the
high
level
config
option
for
the
mds
yet
so
we
have
to
wait
for
that.
First.
B: Yeah, the thing is, I started a pull request for this and I just kept running into annoying technical issue after annoying technical issue. It's just really complicated, because the manager has so many ports and they can come and go depending on whether you turn on a module; then suddenly the haproxy configuration has to change, there are port conflicts, daemons that weren't conflicting before might conflict, whatever. It was getting really complicated and I... whatever.
B: It didn't seem as high priority as, like, NFS.
A: Refactoring cephadm into a proper Python package, I think that's kind of important, and I would really like to have it, because there is so much duplicated code within the cephadm binary, without really the ability to refactor at a large scale, without being able to have proper modules.
C
This,
yes,
I
think
that,
maybe
I
I
don't
know
I
maybe
it
is
related
with
the
possibility
to
have
some
kind
of
fdm
agent
in
each
of
the
host.
C: Yes, but what I mean is, if we are going to install the cephadm agent on each node, maybe we could use the same installation mechanism for this agent in order to also install cephadm, and to convert cephadm into a complete Python package and divide the script into modules to make it clearer. Maybe we can use the same approach and, at the same time, convert everything to a proper Python package.
B: Then yeah, yeah, yeah. The only thing to watch out for is that we are using the remoto bit, where it runs Python code remotely, for a couple of cases, as opposed to...
A: That's Crimson, but I had that feature request even before thinking about Crimson, more than half a year ago already, so it's really something users are interested in: having at least the ability to CPU-pin the OSDs.
B
So
we
right
now
we
do
it
through
the
config
options.
That's
it's
sort
of
independent
of
that
video,
but
there
are
certain
limitations
to
what
I
think
yeah.
My
question
would
be
what
the
what
the
requirement
there
is,
but
we
already
have.
We
already
have
it
some
degree
like
if
you
set
up
for,
if
you
set
an
option
on
a
single
osd,
you
can
tell
it
what
numa
node
to
use
okay.
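The option that exists today for this, as far as I know, is osd_numa_node; a minimal sketch, with the OSD id and node number as placeholders:

    # pin a single OSD to NUMA node 0 via its config option
    ceph config set osd.3 osd_numa_node 0
    # restart the daemon so the pinning takes effect
    ceph orch daemon restart osd.3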
A: But if we have a workaround for that, it's not such a high priority to me. It would be great to have it, right, but only if you have the time to do that.
A: Progress items: what kind?
A: Yeah, the progress module items.
B: Put it up here... no, like, my first priority, I think, is just making sure that we have the smoke test, that separate sub-suite or whatever, that covers as much as possible with short-running tests, so virtually no actual client workload, but just deploying it, making sure it works and then removing it, and expanding the coverage that way. And then, it seems like, separately...
B
We
want
all
of
our
stress
tests
to
be
migrated
to
use
that
video
where
possible,
and
then
I
get,
I
think,
in
a
few
cases,
they're
like
they're,
actual,
like
step
fading
stress
tests
that
we
want,
for
example
like
the
nfsha,
for
example.
We
should
probably
create
a
thrashing
truss
test.
That's
like
destroying
ganesha
nfs
servers
and
running
a
workload
and
making
sure
that
clients
are
behaving
and.
A
What
about
an
a
manager,
thresher
yeah?
Are
we
really
sure
that
everything
in
safe
adm
is
item
put
in.
A: So, bind to specific IPs for monitoring services. That's kind of important.
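One possible shape for this, purely as a sketch: service specs already have a networks field, but whether the monitoring daemons actually honor it for binding was exactly the gap being discussed, so treat this as hypothetical. The network and count are placeholders:

    cat > prometheus.yaml <<EOF
    service_type: prometheus
    placement:
      count: 1
    networks:
    - 10.0.0.0/24
    EOF
    ceph orch apply -i prometheus.yaml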
A
Mean
so
if,
if
you're
deploying
an
ntw,
you
do
not
have
any
dashboard
integration,
I
think
for
dashboard.
We
need
to.
I
think
we
need
two
things
right,
configure
the
dashboard
to
know
where
it
needs
a
doctor
and
set
up
a
an
admin
user
in
order
to
make
the
dashboard
able
to
actually
do
stuff.
B
Yeah,
I
would
put
that
under
the
dashboard
trello,
probably.
C: Yes, because we talked about that. This process is probably something that should be executed interactively, okay, in order to see what is happening at the moment you are creating it; it's part of the architecture topology. So maybe a wizard or something like that in the dashboard is going to help.
A
But
we
really
need
a
plan
for
that,
because
it's
a
bit
awkward.
You
know
you're
deploying
a
writer's
gateway
with
safety
amp,
and
then
you
need
to
configure
the
dashboard
to
actually
manage
the
writer's
gateway
that
you
just
deployed.
B: I think we have a rough plan. I just need to sit down with Yehuda and Ernesto and make sure that it's the right path forward. Okay.
B: But I would be inclined to leave this card off for now; it seems like the cephadm part is really just that, if the dashboard is going to do realm and zone management, then it should call orch apply and actually create them when you create them in the dashboard. But I'm not sure if it does that yet.
B: Well, yeah, yeah. I think it could get away with doing both, because it could show all of the realms and zones based on the realm and zone config, and if you create one, it'll also deploy it for you; but if it already exists, then you'll just see it. You won't be able to delete it, that won't work properly; I guess it won't shut down the daemons for you, but that doesn't happen on this one.
B: Yeah, yeah. So maybe, yeah, probably fixing the monitoring IPs can happen before SMB, because that's easy.
B: I mean, it seems like an easy thing to do would be to not persist them, and then have them disappear if the manager restarts. Because most of the time the manager doesn't restart, and these are sort of cosmetic anyway, just to see that I'm creating, you know, four NFS servers or whatever it is, or that I'm deploying something.
B: Anyway, okay, we can worry about that when we actually do it, I think.
C: What about the possibility of purging the cluster?
B: Because, yeah, cephadm can't purge itself. I mean, I wonder if we want... and this is the doc that you're starting internally, Sebastian, but like, if we have an auxiliary Ansible playbook, like, maybe...
C
It's
not
possible
to
execute
the
the
fatm
rmrm
cluster
in
each
of
the
host.
In
fact,
we
are
using
a
file
was
file
that
is
doing
that.
Okay,
so
I
think
that,
for
example,
if
we
have
a
cluster
and,
for
example,
in
the
theft
cell,
we
are
moving
the
monitor
and
the
manager
to
the
to
the
host
where
we
are
running
the
theft
cell.
Okay,
and
after
that,
we
are
using
an
rmrm
cluster
in
each
of
the
host.
We
are
going
to
remove
everything.
C: Okay, and the only thing that we are going to need to clean are the OSDs. Okay, I know this is infrastructure, okay, but I think that is easy, because we have the ceph-volume command. Okay, and after that we can even remove the files that we have in /var/lib/ceph and the /var/log/ceph configuration. So I think that it's perfectly possible to remove everything until you have only one host, and at the moment that you have only one host, you...
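Roughly, the per-host pieces being described here already exist; a sketch, with the FSID and device as placeholders:

    # on each host: remove all daemons and data belonging to this cluster
    cephadm rm-cluster --force --fsid 00000000-0000-0000-0000-000000000000
    # clean up the OSD devices that belonged to it (run via cephadm shell,
    # or with ceph-volume installed on the host)
    ceph-volume lvm zap --destroy /dev/sdb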
B: I think the one thing that we could add would be a command that looks at the ceph-volume inventory and zaps every OSD that matches the cluster you're trying to destroy.
A: So the problem with orchestrating the cluster removal from within the cluster is that you are being optimistic, right? If you don't have a mon quorum, you can't execute anything.
C
You
can
do
that
because
you
have
information
in
the
organizator
about
what
is
the
composition
of
the
cluster,
but
we
can
do
also
from
the
from
the
third
cell
and
sorry
from
from
there
from
the
horse
directory
with
the
theft
in
the
embinary.
But
if
you
want
to
do
that
from
the
inside
orchestrator,
I
think
that
even
that
is
possible.
B: Yeah, I think we're better off having a cephadm rm-cluster with... it seems like we need two things. We need a cephadm zap-all, or something like it, that you pass an FSID, and then have rm-cluster take an optional flag that includes that zap-all for the same FSID, so you can do it in one command instead of two. And then, when you tear down the cluster, you just do it as one line of bash.
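A sketch of what that proposal could look like; note that the zap-all subcommand and the combined flag are hypothetical here, only rm-cluster itself existed at the time, and the FSID is a placeholder:

    # hypothetical: zap every device that belongs to this FSID
    cephadm zap-all --fsid 00000000-0000-0000-0000-000000000000
    # hypothetical combined form, run once per host
    cephadm rm-cluster --force --zap-osds --fsid 00000000-0000-0000-0000-000000000000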
B: Well, I can add the cephadm rm-cluster item here.
B: But we should just make sure that the other one, or whatever, also has that property.
A: It actually works already, and I've added that into the etherpad, but I agree that we need to have some kind of... it's too flexible, right? On the one hand, it's super flexible to have those config options overridden by the spec, and on the other hand we have too many competing ways to do that right now.
B: We'd have to audit the code to make sure that the... there's a pull request that I'm testing right now that upgrades the monitoring containers, and I'm not sure that it's...
A: Yeah, that would just be my preferred option: to have one way to update all the different default container images, and not have to change that config option for the Ceph base image, set this config option for the monitoring images, and set the respective property for the haproxy stuff; have one workflow that updates all the container images.
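For context, the separate per-image options that exist today look roughly like this; the option names are from memory and the image tags are just examples:

    # the main Ceph image is normally changed via the upgrade workflow
    ceph orch upgrade start --image quay.io/ceph/ceph:v17
    # the monitoring stack images are separate mgr/cephadm options
    ceph config set mgr mgr/cephadm/container_image_prometheus quay.io/prometheus/prometheus:v2.33.4
    ceph config set mgr mgr/cephadm/container_image_grafana quay.io/ceph/ceph-grafana:8.3.5
    ceph config set mgr mgr/cephadm/container_image_alertmanager quay.io/prometheus/alertmanager:v0.23.0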
A: No, it doesn't really work, right? If you have a custom container image for the SNMP gateway, then you need a specific image for that service, and if you have...
B: Kind of. I mean, the upgrade is pretty aggressive for the monitoring stuff; we just redeploy it if it's different. There's basically no pause for the monitoring upgrades, and so we don't need that, because there's no pause.
C: I think that having different versions for each daemon opens the door to unexpected behaviors and problems that will be difficult to resolve, okay. I think that, well, if we decide to deploy a service, Grafana, monitoring, monitors, managers, whatever, we need to use the same image for all the daemons of one service.
B: Yeah, I guess I would be inclined to wait until we have an actual use case, somebody who needs multiple different custom images, before we worry about it, because it's work to support that, whereas what we have now still lets you customize it. It just assumes that all your Grafanas and so on in the cluster are the same, which I think is...
B: But what there was, I wanted to check on it. It's in pretty good shape, I think. I think the main problem is just that it's a little bit delicate: if you start scheduling services that conflict on ports, then things get confused.
B
If
I
think
this
might
be
a
problem
with
ingress
in
general,
if
you
like,
if
you
delete
the
backend
service
and
the
front-end
service
like
it
might
throw
an
assertion
or
in
some
cases
I
noticed
that
in
like
the
prepare,
create
and
generate
config
functions,
if
those
through
assertions
they
get
swallowed
and
aren't
shown
anywhere,
which
made
the
booking
interesting.
Those
are
all
these
are
all
sort
of
unrelated
to
nfs,
typically,
but
we
should
clean
that
up
at
some
point.
B
Yeah,
I
mean
one
of
the
things
that
I'm
wondering
about
the
well
the
biggest
thing.
Actually,
that
has
to
happen
with
the
nfs
thing
is
just
testing
like
it.
It
all
fits
together.
The
way
that
it's,
I
think
it's
supposed
to
fit
together,
but
I
don't
need
to
actually
scale
the
demons
and
make
sure
that
I'm
just
guessing
at
the
aja
proxy
configuration
and
we
need
to
actually
test
it.
B: There's a whole set of timeouts in there, and I'm not sure which values make sense in the haproxy config. And then the other thing that this has been...
B: I think the thing that most worries me is that, by default, the NFS service is on port 2049, the NFS port. But if you're going to use ingress, then that's the one that should be on 2049, and the backend NFS port should probably be something different, in case they get scheduled on the same hosts as 2049, yeah. And so I'm wondering if the NFS service should default to a different port, with the expectation that you'll then apply HA on top, or not. I don't really know.
A: Then I think we're good, and then, if we have a dedicated section about... yeah, okay, having two sections, right: one on how it's supposed to look, but if you really want to have NFS Ganesha without ingress, then you need to set the port for NFS Ganesha to the default port, 2049, yeah.
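A sketch of the kind of two-spec split being discussed, assuming the ingress frontend takes 2049 and the Ganesha daemons move to a non-default backend port; the service ids, ports, and virtual IP are placeholders:

    cat > nfs-ha.yaml <<EOF
    service_type: nfs
    service_id: mynfs
    placement:
      count: 2
    spec:
      port: 12049            # backend port, so ingress can own 2049
    ---
    service_type: ingress
    service_id: nfs.mynfs
    placement:
      count: 2
    spec:
      backend_service: nfs.mynfs
      frontend_port: 2049    # the standard NFS port clients connect to
      monitor_port: 9049
      virtual_ip: 10.0.0.100/24
    EOF
    ceph orch apply -i nfs-ha.yaml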
B: Oh, I guess this is sort of related; it came up in the review. There's still this osdspec_affinity thing, yeah. I was always confused by this before, I never fully understood it, but now that we have the service_name property on all the daemons, is that obsolete now?
A: By the way, if you do that, then we have a bug, because we then have the same daemon name on two different hosts. We need to periodically, in our reconciliation loop, look over all the daemons, find duplicates, and delete all daemons that are...