From YouTube: Ceph Orchestrator Meeting 2022-01-18
A
Looking at the agenda: we have Quincy, and we have a potential race.
A
Okay, so first of all: the Quincy release is not completely imminent, but we have to wrap up Quincy, and we discussed that last week. So there isn't too much that we really have to take care of at this point.
A
It's mostly that we should be aware that by now we have a dedicated Quincy branch, so we have to make sure we are properly backporting to Quincy at this point, if there is something worth backporting — which I think every bug fix and every feature at this point is going to be.
A
And as always, my idea is to propose that we do batch backports again. I did a bunch of individual backports to Pacific today and yesterday — or I think it was last week already — and my experience, or my take on it, is that we should avoid individual backports by all means, because it's creating a ton of extra work. Any thoughts? Maybe you, Mike?
A
Yeah, and actually the biggest pain point I discovered is that we are going to merge the backports in a different order than they are in the master branch, which means that at this point there is no way to properly sync up master and Pacific anymore, because it's going to create a ton of extra conflicts. So I would continue with batch backports. Anyone against batch backports for cephadm at this point?
B
Yeah, I think there are two things left open, and then you can start working on the backports — those are going to have to get in first. So can we just backport to Quincy at the same time? Yes?
A
Though I wouldn't create an extra backport to Quincy, because Quincy already contains the agent. So I...
B
I just mean for the two changes that are left. Yes, yeah.
A
But
pacific
is
going
to
be
a
bit
more
more
work
because
you
have
to
resolve
non-trivial,
merge
conflicts.
B
Yeah, there'll be a lot again once those two changes are in. It'll probably take a few days to get that working.
B
I'll find the names. There's one, I know, for using its own cache object, and what was the other one? And then one already got merged in, actually.
A
Okay, the next topic I have on my list is a race in the serve loop against other things happening in different threads that change the data structures. Mike, I guess you have investigated this a bit, right?
C
I casually looked at it. I can't find a root cause. One of the things I've speculated, when I've seen this, is that people are doing some of the synchronous commands while the serve loop is blocked doing something else — so, for example, a ceph orch daemon add or something like that runs async to the serve loop being stuck somewhere else, which then has the result of adding things to the host cache.
C
So maybe something ends up in there that should not be there, such as a daemon that's in a strange state. But we'd have to be able to reproduce that — there's a very narrow timing window in there.
A
So
yeah,
as
far
as
I
know,
I
have
created
at
least
two
pull
requests
trying
to
fix
individual
races.
A
As I said, the host add command was racing with the service addition, so we created... First, I can actually share my screen.
A
Can you see my screen? Yeah, okay. So one race I found a year ago was this:
A
We
have
checked
the
host
demons
before
applying
services
on
a
given
host
that
so
that
was
a
workaround
for
this
particular
thing.
The
next
one
was
a
pull
request.
A
There
is
a
possibility
that
we
first
deployed
a
new
nfs,
ganesha
demons
and
then
in
the
next
iteration,
we
deleted
the
old
nfs
ganesha
daemons,
which
didn't
work,
because
we
now
ended
up
with
with
a
port
conflict
between
the
old
servers
and
the
new
servers.
So
in
general,
everything
worked
because
we
first
deleted
the
old
straight
humans
and
then
created
the
new
demons.
But
if
it,
if
we
are
racing
with
a
surf
loop,
we
are
creating
the
new
nfs
ganesha
daemons
before
deleting
the
old
ganesha
demons.
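
[Editor's note: to make the ordering constraint concrete, a minimal Python sketch of the two orderings being described. The function names are hypothetical, not cephadm's actual API; both daemon generations bind the same port, so removal must come first.]

    def redeploy_nfs(old_daemons, new_daemons, remove_daemon, deploy_daemon):
        # Safe ordering: the old daemons release the port before the new
        # ones try to bind it.
        for name in old_daemons:
            remove_daemon(name)
        for name in new_daemons:
            deploy_daemon(name)

    def redeploy_nfs_racy(old_daemons, new_daemons, remove_daemon, deploy_daemon):
        # The interleaving the serve loop can produce: the new daemons are
        # deployed first and fail to bind the port the old ones still hold.
        for name in new_daemons:
            deploy_daemon(name)  # port conflict happens here
        for name in old_daemons:
            remove_daemon(name)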
A
Yeah, I fixed this particular race by manually deleting the old daemons first.
A
No, I had... that's a different thread, yeah.
A
Oh,
it
might
end
up
in
a
different
thread
exactly
so
we
we
ended
up
doing
the
migration
in
between
those
lines
here
before
deleting
the
old
demons
and
and
creating
the
old
demands,
and
if
we
do
that,
we're
creating
the
new
nfs
demons
before
deleting
the
old
demons
ending
up
in
a
port
conflict.
I
I
did
it.
I
I
resolved
this
particular
race.
B
I don't know what we could do for the migration. I think, if we really wanted to make sure we never have any issues, we'd almost have to do it as part of the serve loop rather than having it be its own thread. It'd be slower, but if you're really worried about it, we could do that for the synchronous stuff — if we really wanted to never have any race conditions.
B
But
again
it
sounds
like
something
kind
of
annoying
to
do
and
I'm
not
sure
how
often
these
things
are
going
to
come
up.
That's
worth
doing.
B
Yeah,
so
the
agent
does
there's
only
one
data
structure,
it's
really
updating
and
that's
just
the
demon
cache,
but
we
get
around
that
by
having
that
like
host
metadata
up
to
date
check,
and
so
we
never
apply
services
if
that
check
doesn't
pass.
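
[Editor's note: a minimal sketch of the gate being described here, with illustrative names rather than cephadm's real ones — service application is simply skipped for any host whose metadata is stale.]

    class HostCache:
        def __init__(self):
            self._up_to_date = {}  # hostname -> bool

        def host_metadata_up_to_date(self, host):
            return self._up_to_date.get(host, False)

    def apply_service_on_host(cache, host, apply_fn):
        # Never place daemons based on stale state; wait for the agent to
        # deliver fresh metadata and retry on the next serve-loop pass.
        if not cache.host_metadata_up_to_date(host):
            return False
        apply_fn(host)
        return True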
A
Can we do that also for the other operations, like from the CLI?
B
I
mean,
theoretically,
we
could,
if
somebody
applied
a
spec
mark,
all
the
hosts
is
not
up
to
date,
anymore,
required
getting
metadata
again
before
we
apply
anything
that
would
guarantee
refreshing
demons
essentially
happens
before
applying
specs,
and
you
do
the
same
thing
with
scheduled
actions
as
well.
We
mark
that
particular
host.
That's
not
updated.
We
really
wanted
to
again
that
would
just
basically
enforce
refreshing
demons
always
before
doing
anything
else.
On
those.
B
Yeah, it can be done per host. The way it works right now is it gets set to false whenever we create or remove a daemon on a host — then that particular host's metadata is marked out of date. It was really only intended for fixing applying specs, where you don't want to double-place daemons. So it gets marked false.
B
Whenever
we
remove
a
daemon
and
then
we
update
the
counter
value,
and
then
we
don't
mark
it
back
up
to
date
until
we
get
new
metadata
with
the
right
counter
value.
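
[Editor's note: putting the flag and the counter together — a hedged sketch of the mechanism as described, with assumed names. Any daemon create/remove marks the host stale and bumps the counter; only a metadata report carrying the current counter flips it back.]

    class HostMetadataState:
        def __init__(self):
            self.up_to_date = False
            self.counter = 0

        def on_daemon_created_or_removed(self):
            # Any change invalidates what we know about the host.
            self.up_to_date = False
            self.counter += 1

        def on_metadata_report(self, reported_counter):
            # A report generated before the last change carries a stale
            # counter and must not mark the host up to date again.
            if reported_counter == self.counter:
                self.up_to_date = True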
C
Seems like it's worth a try to me. Yeah, I mean, one of the other things I've seen is: if you kind of pound the scheduled actions over and over — say a redeploy or restart, and you queue a whole bunch up — you can confuse the scheduled actions in a way that the serve loop is broken as well, and the fix for something like that is to drop the host cache. So putting a lock around the host cache sounds like the right thing to me.
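
[Editor's note: what "putting a lock around the host cache" could look like in its simplest form — a sketch under the assumption that every reader and writer goes through the same lock, so the serve loop and synchronous CLI handlers cannot interleave mid-update. Naming is purely illustrative.]

    import threading

    class LockedHostCache:
        def __init__(self):
            self._lock = threading.Lock()
            self._daemons = {}  # hostname -> list of daemon names

        def add_daemon(self, host, daemon):
            with self._lock:
                self._daemons.setdefault(host, []).append(daemon)

        def remove_daemon(self, host, daemon):
            with self._lock:
                self._daemons.setdefault(host, []).remove(daemon)

        def daemons_on(self, host):
            with self._lock:
                # Hand back a copy so callers never iterate a list that
                # another thread is mutating.
                return list(self._daemons.get(host, ()))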
B
Hey
man,
I
kind
of
like
want
to
see
the
actual
effects.
You
know
I'm
kind
of
confused
how
this
works
with
scheduling
actions,
housing
issues-
that
was
one
I
didn't
think
would
be
a
problem
just
because,
like
you
just
schedule
them,
then
at
the
end
the
server
was
one
responsible
for
actually
doing
anything.
D
Another
place,
maybe
what,
if
with
another
start
any
time
on
in
a
host,
if
we
see
that
there
is
a
diamond
of
the
same
type
for
the
same
service,.
D
So at that moment, if we check that we have another daemon from a previous version on the same host for the same service, I think we can stop and simply not launch the new daemon.
D
Is
in
the
moment
that
we
are
going
to
start
the
diamond
when
we
are
going
to
in
the
moment
that
you
have
make
all
the
checks?
Okay
in
the
in
the
same
moment
that
you
want
to
start.
The
new
diamond
is
the
moment
that
we
need
to
check
if
there
is
another
diamond
of
a
previous
version
or
the
same
service
running,
and
if
this
happen,
but
we
can
abort
the
staff.
D
Probably
we
will
need
more
information
about
the
diamond
okay,
that
is
running
about
versions
or
get
this
this
information
directly
from
from
the
diamond.
I
don't
know
how
to
do
that,
but
in
this
way
we
avoid
to
really
in
any
kind
of
extra
data
structure
in
the
gazette
or
in
the
in
any
other
place.
So
it's
in
the
moment
that
you
are
going
to
start
the
diamond
in
the
host
is
the
moment
that
you
are
doing
the
check.
D
What is the service the container belongs to, and what is the version of the daemon that is running — checked just at the moment you are going to start them? I think that is the more difficult part. Probably we will need to add more labels to the containers.
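
[Editor's note: a sketch of the start-time check being proposed, with hypothetical types and helpers — the real information would presumably come from container labels, as just mentioned.]

    from dataclasses import dataclass

    @dataclass
    class RunningDaemon:
        service: str   # e.g. taken from a container label
        version: str

    def can_start_daemon(running_daemons, service, version):
        # Abort the start if a daemon of the same service -- e.g. from a
        # previous version -- is still running on this host.
        for d in running_daemons:
            if d.service == service and d.version != version:
                return False
        return True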
A
Yep,
the
the
problem
is
that
we
are
that
that
we
can't
do
that
right
because
it
would
be
too
slow,
and
this
is
why
we
are
adding
the
the
the
agent
in
order
to
push
information
to
the
miniature
module
asynchronously.
A
But
now,
as
a
side
effect,
we
are
having
to
deal
with
income
with
both
possible
inconsistencies
between
the
data
we
are
having
and
and
in
the
real
world,
and
I
think
the
the
way
adam
you
fixed
it
in
the
in
the
800,
with
the
metadata
up
to
date.
Flag
is
a
good
idea
that
you
should
leverage
more
often
also
in
the
in
the
non-agent
cases,
because
I
think
it's
a
an
easy
way
to
avoid
this
particular
problem.
A
Given
that
the
self-monitor
key
value
store
on
the
keystore
doesn't
have
resource
ids,
I
mean
if
we
could
get
a
resource
version
of
a
particular
time,
then
we
could
just
get
a
consistent
snapshot
right,
but
we
are
not
going
to
have
it.
We
are.
We
have
to
deal
with
the
the
fact
that
new
updates
to
our
data
structures
are
overwriting
existing
values
and-
and
we
have
to
be
aware
that
this
is
racing.
C
Like that RBD issue I found, where our set-volume call hangs: if we acquire the lock there and the thread never comes back, it has the potential to hang everything up too.
B
And
this
came
back
to,
I
know
we
just
on
wednesday.
In
this
end
up
we
had
some
ideas,
I
think
we're
talking
about
like
timeouts
and
like
watchdog
threads
or
something
we
cannot.
A
Okay,
the
problem
is
that,
if
you're
killing
a
threat,
you're
not
cleaning
up
blocks,
you're,
not
cleaning
up
some
airflows,
which
means
as
soon
as
you're,
killing
a
threat.
Your
application
is
prone
to
deadlocks.
B
We discussed whether we should put in a timeout system, but we never got anywhere with the discussion, I don't think, because it's sort of tricky to come up with good timeouts. You can't make them really small, because certain commands are slow — a deploy command that also has to pull an image is going to take a while. But if you make it too long, then you wait for 20 minutes, then it runs the serve loop one more time, then it gets stuck again.
B
So
I
mean
you
could
raise
a
health
warning
at
that
point
at
least,
but
it's
so
slow,
still,
no
there's
no
inclusions
or
anything.
B
Yeah
it
would
take
a
while
to
get
to
there,
but
I
mean
it
could
be:
maybe
a
launch,
a
long-term
solution,
I'm
going
to
start
short
term.
It
still
is
probably
worth
having
some
time
out
like
a
really
long
one,
maybe
and
just
raise
the
health
corner.
So
people
know,
instead
of
just
the
circle
of
hanging
forever
in
the
long
term,
maybe
go
with
that.
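
[Editor's note: a hedged sketch of that short-term idea — a long per-command timeout that raises a health warning instead of letting the serve loop hang forever. The names, including the health code, are made up for illustration; only concurrent.futures is real. The stuck worker thread is deliberately not killed, for the deadlock reason given above.]

    import concurrent.futures

    # Deliberately generous: deploy commands may have to pull an image.
    COMMAND_TIMEOUT = 15 * 60

    _pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

    def raise_health_warning(code, detail):
        # Stand-in for the mgr's health reporting.
        print(f"HEALTH_WARN {code}: {detail}")

    def run_with_timeout(fn, *args):
        future = _pool.submit(fn, *args)
        try:
            return future.result(timeout=COMMAND_TIMEOUT)
        except concurrent.futures.TimeoutError:
            # The worker cannot be killed safely (it may hold locks),
            # so at least surface the stall to the operator.
            raise_health_warning("CEPHADM_STUCK_COMMAND",
                                 f"{getattr(fn, '__name__', 'command')} timed out")
            raise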
A
Properly terminating SSH connections... except that creating actual Linux processes for every established SSH connection is going to be super resource-intensive.
A
Okay, I have one more thing, and it's announcing that I'm going to leave Red Hat by the end of January, so this is going to be my second-to-last orchestrator weekly.
A
We pull orders of magnitude more images than the Docker registry allows us to, so we have to have a proxy registry for docker.io, and upstream in the Sepia lab we're doing that — David set it up.
B
Yeah,
I
wasn't
sure
where
they
were.
If
because
I
was
just
asking
because
the
openstack
team
is
still
looking
for
those
containers
like
where
they
pulled
them
from,
because
they
still
have
been
pulling
the
docker
ones
and
their
ci
is
breaking
with
the.
A
No, it's internal to the Sepia lab, unfortunately.