From YouTube: Ceph Performance Meeting 2022-06-09
Description
Join us weekly for the Ceph Performance meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contribute/
What is Ceph: https://ceph.io/en/discover/
A: So, okay, I don't actually have a lot to talk about with the PRs, so we'll go quick. I didn't see many new or updated ones this week; I think everybody's taking a little bit of a break after all the Quincy work that happened. But the one that did close was this tracing PR. I approved it last week; the new numbers that came out looked really good, so that has now merged, which is really good. Otherwise, I have a bunch of stuff I need to go through and figure out what to do with, a bunch of old PRs. I still need to look at the CBT teuthology perf-testing thing. I don't know why it's apparently sometimes not giving results, but there's a lot of glue code in all of this, so it's probably something to do with that.
A: Let's see, tcmalloc: enabling tcmalloc with Seastar. We have to decide what we're going to do about the errors that were cropping up with tcmalloc. I don't think it's our fault; we just have to decide if we care about them or if we're just going to whitelist them. So, what else?
A: Let's see, we don't have Adam here. I was gonna say, this one: allowing control of the tcmalloc thread cache size with a Ceph configuration option. That's still, actually, I think a really good idea, but I don't know, not necessarily super high priority.
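As a rough illustration, here's a minimal sketch of what such an option could drive, assuming the gperftools tcmalloc is in use. The property string is real gperftools API; the function and the idea of feeding it from a Ceph config option are assumptions for illustration.

```cpp
// Hedged sketch: apply a configurable tcmalloc thread cache limit.
// "tcmalloc.max_total_thread_cache_bytes" is a real gperftools property;
// the function below and the Ceph-option wiring are illustrative.
#include <gperftools/malloc_extension.h>

#include <cstddef>
#include <iostream>

void apply_thread_cache_limit(size_t bytes) {
  // SetNumericProperty() returns false if the property is unknown.
  if (!MallocExtension::instance()->SetNumericProperty(
          "tcmalloc.max_total_thread_cache_bytes", bytes)) {
    std::cerr << "failed to set tcmalloc thread cache size\n";
  }
}
```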
A
Gabby
related
thing
we
were
just
talking
about
so
in
the
ether
pad.
There's
this
pr,
which
adds
some
old
map
benchmarking
capabilities
to
the
google
test,
suite
store
test,
but
it
adds
a
lot
of
extra
stuff
in
store
tests
like
it
takes
longer
to
run,
but
it's
it
was
actually
really
useful
when
we
were
playing
with
it.
This
may
help
give
an
idea
of
like
what
omap
and
roxdb
overhead
looks
like
like
how
fast
we
can
do
that.
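As a rough sketch of the shape of such a microbenchmark (not the PR's actual code), a harness like this could time the omap write and iteration passes; the closures passed in stand in for the real ObjectStore work.

```cpp
// Rough sketch of an omap microbenchmark harness; the closures passed in
// would do the actual ObjectStore omap writes and iteration.
#include <chrono>
#include <cstdio>
#include <functional>

void bench_omap(const std::function<void()>& write_keys,
                const std::function<void()>& iterate_keys) {
  using clock = std::chrono::steady_clock;
  const auto t0 = clock::now();
  write_keys();    // e.g. set N omap keys via a transaction
  const auto t1 = clock::now();
  iterate_keys();  // e.g. a full omap iterator walk
  const auto t2 = clock::now();
  const auto ms = [](auto d) {
    return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
  };
  std::printf("omap write: %lld ms, iterate: %lld ms\n",
              static_cast<long long>(ms(t1 - t0)),
              static_cast<long long>(ms(t2 - t1)));
}
```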
C: Are you saying there's a benchmark for snap removal?
A: Yep. And I think, did we determine that iteration behavior was still a big deal?
D: We are not sure, but whatever the PR... actually, sorry, there is no real iteration in the current code. What it is doing is iterating with a size of two, so it's unlikely that iteration is a serious cost, but...
D: We hope to change it to do, I don't know, maybe 100 objects each time, so iteration would become more of an issue. But another thing we think might be possible to do there is to change the data structures. At the moment, the way I understand it, SnapMapper is a single global object holding all the snap information.
D: So if you remove a snap object, it's only going to affect the... sorry, you're going to remove the snap on every PG, but a PG is a standalone thing, so I don't see what kind of dependency there would be between two PGs, though I might be missing something. And then, if we broke it down further and made it so that every PG would have a table of snaps, and only there would it hold the objects, then when you remove it, it's an object that nobody else is accessing. So you don't even need locks.
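A toy sketch of that proposed layout, using placeholder types rather than Ceph's actual ones: each PG owns its own snap table, so trimming a snap touches no state shared with other PGs.

```cpp
// Toy model: per-PG snap tables instead of one global SnapMapper object.
#include <cstdint>
#include <map>
#include <set>
#include <string>

using snapid = uint64_t;
using object_id = std::string;  // placeholder for hobject_t

struct PgSnapTable {
  // snap -> objects in *this PG* belonging to that snap
  std::map<snapid, std::set<object_id>> snap_to_objects;

  // Removing a snap is purely PG-local: no other PG reads this table,
  // so no cross-PG lock is needed.
  void remove_snap(snapid s) { snap_to_objects.erase(s); }
};

struct Pg {
  PgSnapTable snaps;  // one independent table per PG
};
```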
A: That piece that you just described, I think, was the only thing really holding us back from doing per-PG sharding in BlueStore. Oh, I mean, there's other stuff, but nothing really important like that was. That was the thing that always came up as one of the reasons why we can't do independent hashing across PGs, or, you know, PGs across shards, right: that you had this global SnapMapper thing.
A: Josh D., do you remember? Because I don't remember there ever being any real reason why it looks the way it does, but every time I've asked about it, it's always kind of been a hand wave; you don't really remember.
D: The only reason I could think of is that somebody was trying to provide an easy mechanism to create a snap across multiple PGs, because that's a single synchronization object. But I think that could be done even if you had multiple snapmappers, one per PG; you could still synchronize the creation by grabbing multiple locks.
A: So in the chat window I posted a link. I think this was the original commit of the SnapMapper. So, yeah, what this appears to be is the state that it came in, like what the original intent or thought here was. Things were a little, yeah, less documented back then, I think. But in any event, so your proposal, Gabi, I mean, the gist of it is that you want to do the same thing with the SnapMapper that you did with the allocation data, right? Yeah.
D: We need to delete them all, and every deletion is actually translated into two deletions, so there's a burst of tombstone creations. We suspect that might be one of the reasons for the RBD mirroring pain, because RBD mirroring is creating a snap every 15 minutes, and once it does, it iterates over the snap to copy the data to the remote side.
A: And right now, the iteration that you mentioned, when you iterate: is that before or after the deletion? Is it also iterating during deletion?
D: For deletion you do an iteration, but you just start the iteration, take the first two objects, and you are done, and then you're going to start another iteration. But if you do a copy, I would imagine you do another iteration; so in RBD mirroring they probably have to iterate over the snap to know which objects to copy to the remote side. So I'm sure there is iteration there, you see.
D
Let's
just
delete
this
later.
I
just
want
to
first
introduce
the
first
concept
yeah.
D
So
the
first
concept
saying
we
don't
need
ropes
to
be
to
give
us
protection
for
the
snap
mapper
at
shutdown
it's
going
to
be
staged
to
a
file.
It's
going
to
cost
us
hundreds
of
milliseconds
in
worst
case
scenario,
no
big
deal
on
startup.
We
going
to
read
it
from
the
file
into
the
memory
object,
and-
and
here
we
actually
going
to
save
time
because
reading
from
a
flat
file
is
much
faster
than
reading
from
works.
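A minimal sketch of the stage-to-a-file idea, assuming a simple text encoding; a real implementation would presumably use Ceph's own encode/decode machinery, as the allocation-map work did.

```cpp
// Hedged sketch: stage an in-memory map to a flat file at shutdown and
// reload it at startup. Plain key\tvalue text here; Ceph would encode
// into a bufferlist instead.
#include <fstream>
#include <map>
#include <string>

using SnapMapperState = std::map<std::string, std::string>;

bool store_to_file(const SnapMapperState& m, const std::string& path) {
  std::ofstream out(path, std::ios::trunc);
  for (const auto& [k, v] : m)
    out << k << '\t' << v << '\n';
  return bool(out);  // one sequential write: hundreds of ms at worst
}

bool restore_from_file(SnapMapperState& m, const std::string& path) {
  std::ifstream in(path);
  if (!in) return false;  // fall back to rebuilding from RocksDB
  std::string k, v;
  while (in >> k >> v)    // a flat-file scan beats a RocksDB walk
    m.emplace(std::move(k), std::move(v));
  return true;
}
```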
D
To
be
my
experience
with
that
location
map
seems
to
indicate
we
could
actually
save,
maybe
one
second
on
a
again
big
system
if
it's
small
system
none
of
this
matter.
But
if
it's
a
big
system,
I
I
I've
seen
that
reading
the
allocation
from
roxy
b
can
take
a
few
seconds
again
very
big
system,
hundreds
of
million
of
objects
here
with
snapmap.
I
don't
think
it's
ever
going
to
grow
to
100
million
of
object.
D: So all this work, and all the pain we experienced, is going to benefit us now, because we have the exact entry points to call for store and restore. So that part should be relatively easy. It's essentially serialization of the object and doing a write to disk, and then reading it back from the disk and populating the object; it should be a very simple process, and this time it should be safe, because we know what we're doing and when, so it should be safe to do that. So that's part one.
D
One
part
two
is,
of
course,
what
should
be
done
in
case
of
a
disaster,
because
roxdb
used
to
give
us
protection
from
disaster
by
using
right
ahead
logs
and
all
the
availability
concept
that
you
got
from
works
to
be
we're
going
to
keep
rocks
db
so
in
disaster,
we're
going
to
lose
that
the
snap
marker
the
same
way.
D: But I think this time we're not going to add a lot of extra cost, because we have to iterate over the RocksDB objects anyway. This is the thing we do for the allocation map, and that was the biggest part of the cost.
D: It makes sense, and I ran it by George Targan and got some feedback from Matan, and the question was whether the onode has enough information to rebuild the data, because that's the real problem: how do you rebuild the snap map from the onode? And there was agreement that the information there is sufficient for rebuilding. That's project number one, independent of this project.
D: And then we delete it. The deletion that is done today: we iterate over the SnapMapper and take two objects each time. I don't know why we chose the number two, but never mind; even that number isn't really utilized, because we then process each object from the two that we took on its own. The only thing the two buys us is making the iteration a bit more efficient. And then, for each object, we act as if this was a write, meaning we need to create a replication job and a PG log entry to represent the fact that we are going to delete the object.
D
So
every
object
node
that
we're
going
to
delete
is
going
to
generate
the
pidgey
log
is
going
to
do
communication
with
the
replicas
and
it's
going
of
course,
to
go
to
roxdb.
So
first
thing
we
go
to
roxdb.
We
delete
the
object
node
from
the
snap.
We
also
delete
the
two
objects
on
the
snap
mapper,
which,
if
we
do
the
first
project,
then
that
part
is
going
to
go
away,
make
it
cheaper
and
we
create
a
pretty
long
entry
to
describe
it.
D: The number we are thinking about is maybe 100, but of course we could go 50 or 200, I don't know, just find a good number, and then we're going to generate one PG log entry with the list of all the entries that we're going to delete. It's not... all right, I think the problem is that we simply took the code we had for a client write, because a client write is something we cannot anticipate.
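A sketch of the batching idea with illustrative names (the real change would live in the OSD's snap-trim path): collect roughly 100 objects, then describe them in one PG log entry and one replicated operation.

```cpp
// Hedged sketch of trim batching; names are illustrative, not the OSD's.
// Instead of one PG-log entry plus one replica round-trip per trimmed
// object, collect ~100 objects and describe them in a single entry.
#include <string>
#include <vector>

struct TrimBatcher {
  size_t batch_size = 100;           // 50..200 all plausible, needs tuning
  std::vector<std::string> pending;  // objects waiting to be trimmed

  // Returns true with a full batch in *out when one is ready; the caller
  // turns it into a single PG-log entry and one replicated transaction.
  bool add(std::string obj, std::vector<std::string>* out) {
    pending.push_back(std::move(obj));
    if (pending.size() < batch_size)
      return false;
    out->swap(pending);
    pending.clear();
    return true;
  }
};
```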
D: The pain is going to be amortized over maybe 100 objects, so in fact it means the PG log is going to cost us 1% of what it used to cost us; and, of course, we're going to save the TCP communication with the replicas, and we can do everything else somewhat faster. So that's project number two.
A: On project number two, Gabi, I was wondering: where is the code that governs the number of deletions that is happening at the same time?
D: Oh, you actually can see it; I will share my screen. Sure. Okay, so this is where we start: PrimaryLogPG, AwaitAsyncWork::react. This thing is being called externally.
D: Yes, pushing them here, but the number is really just two. Even that is no big deal, because let's assume you change it and make it 100: then you ask the SnapMapper to give us the next objects to trim, and if everything went fine and there were no errors, we reach here. Now we iterate over the objects that we got; the trim gives us the list of objects belonging to the snap which need to be trimmed, and then we call the PG's trim_object.
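A simplified paraphrase of the flow just described, not the real PrimaryLogPG code: fetch up to max objects still mapped to the snap being trimmed, then trim each one individually.

```cpp
// Simplified paraphrase of the snap-trim flow (not the real code).
#include <string>
#include <vector>

struct SnapMapperIface {
  // Return up to `max` objects still mapped to the snap being trimmed;
  // negative on error, 0 on success.
  virtual int get_next_objects_to_trim(unsigned max,
                                       std::vector<std::string>* objs) = 0;
  virtual ~SnapMapperIface() = default;
};

// Stand-in for the per-object trim: the real code queues a replicated
// delete plus a PG-log entry for each object.
static void trim_object(const std::string& obj) { (void)obj; }

void do_snap_work(SnapMapperIface& mapper, unsigned max /* currently 2 */) {
  std::vector<std::string> to_trim;
  if (mapper.get_next_objects_to_trim(max, &to_trim) < 0)
    return;  // nothing left (or an error): leave the trimming state
  for (const auto& obj : to_trim)
    trim_object(obj);  // one replicated op per object today
}
```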
A: Maybe while you were talking I was simultaneously looking at how we kind of ended up with some of that code, I think. I'll be honest, it doesn't totally make sense to me, because I was only half paying attention, but I was also looking at this and I wanted to get your opinion on it. This was part of the erasure coding work, I think; this PR, 11701, is, I believe, when erasure coding was introduced.
A: Yeah, I confess I'm just trying to do some archaeology to understand some of the rationale here inside the SnapMapper and the snap trimmer, to grab N at a time, and that N is two. Is that right?
A: So I see there they added unsigned max = g_conf osd_pg_max_concurrent_snap_trims. That should be in the YAML. Yeah, but I don't see it... sorry, back then, sorry, they could have changed that; at the time it would have been in the config options. Maybe it already existed, because I do see in the original code there was a reference to this, so we must have already had osd_pg_max_concurrent_snap_trims, but...
A: Well, and in any event, yeah, that's the setting that's being read.
A
But
okay,
but
then
what
does
it
mean
if
it
the
this
says
here
that
we're
changing
it
to
n
if
we
were
already
using
that
somewhere,
unless
that
was
part
of
this
erasure
coding
pr
in
general
and
it
it
was
done
in
a
previous,
commit
in
in
the
pr
as
possible.
A: Sorry, yeah, it should be in the config options in the source. Yeah, it's okay.
D: So we have it: osd_pg_max_concurrent_snap_trims, type unsigned integer, level advanced, default two, minimum one, with_legacy true.
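A toy model of the option as read out (the real definition lives in Ceph's option tables): an unsigned integer with default 2 and minimum 1.

```cpp
// Toy model of the option just read out; the real definition lives in
// Ceph's option tables, not in code like this.
#include <algorithm>
#include <cstdint>

struct UintOption {
  uint64_t def;  // default value
  uint64_t min;  // smallest accepted value
  uint64_t clamp(uint64_t requested) const {
    return std::max(requested, min);  // anything below min is raised to min
  }
};

const UintOption osd_pg_max_concurrent_snap_trims{/*def=*/2, /*min=*/1};
```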
D: But once you've got this vector, let's say you got 100 entries (at the moment you just get two, but let's say you got 100 entries), you're now going to loop over each one of them and create a replication operation for each one, with its own PG log entry.
D: Thinking about it, I think the first change is going to make a much larger impact, because, first, it's going to affect any interaction of the PG log with snaps, not just trimming, and even for the deletion it's going to eliminate two tombstones per object. Because at the moment, when we delete an object, we are generating four tombstones: one for the object, two for the snap mapper, and one for the PG log.
D: So you're going to have, I don't know, if we have 100 PGs and we've got 10 active snaps, then you're going to end up with 1000 small snap mappers, each of which could be accessed using its own set of locks.
D
So
when
you
delete
a
snap,
mapper
you're
not
going
to
impact
anybody
else
when
you're
deleting
a
snap,
so
that's
the
third
change
we
suggest
I
expect
its
impact
is
going
to
be
much
smaller,
but
it
might
help
with
reducing
lock
collisions.
I
don't
know
how
much
of
this
we
got.
How
many
collisions
we
see.
A
Gabby,
I'm
still
trying
to
see
if
I
can
find
where
that
was
introduced,
specifically
that
both
of
those
those
options,
but
I
haven't,
haven't
gotten
it
figured
out
yet
so.
In
any
event,
my
my
reading
on
all
of
this
is
that
you
should
trust
your
gut.
A: You know, I think you're right that there's probably a lot here that just kind of ended up in the code base without a whole lot of thought or testing in terms of how fast we could do this or what the RocksDB behavior would be like. So I think especially your first proposal is a good idea; the second one I'm not sure about, but I think it's a very fruitful area to be looking at in general.
A: Yeah, I agree, I think getting more info from Sam would be a really good idea. Yeah, all right!
A: Well, just a couple minutes left here, so I'll just quickly say that I'm looking at our ShardedOpWQ in the OSD. We talked about this a little bit last week, but the gist of it is that when you have lots of shards and one messenger thread, everything's really, really slow, and I think I figured out the reason for that this week: it's when we can't keep the per-shard queue full. We basically let the threads just, you know, wait on the condition variable, and then, if we detect that the queue is empty but there's now something coming in, we wake everything up; we do a notify_all. And I think that is the reason why this is really, really slow when we create lots of shards and don't have enough data coming in to fill them all up: waking those threads up is really expensive. That's what it looks like is going on.
A: So my guess is that any time we get into a state like that, where we don't have enough data to keep the per-shard queues full, we're seeing a lot of overhead and a lot of performance degradation.
A
If
you
can
keep
the
queues
full,
maybe
you
do
okay,
but
if
you
can't
consistently
do
that,
then
we
might
start
seeing
performance
and
efficiency
go
down.
So
I'm
I'm
thinking
about
different
ways
to
try
to
change
this.
The
one
I'm
kind
of
focusing
on
right
now
is:
maybe
we
can
have
an
alternate
method
to
wake
up
these
threads.
Maybe
we
instead
keep
track
of
the
threads
that
that
we
should
be
waking
up
and
we
we
do
so
individually
in
a
loop
rather
than
a
global
notifial
based
on
one.
We.
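A sketch of that idea in isolation, illustrative only (the OSD's ShardedOpWQ is organized differently): track which shard threads are idle and wake each with its own notify_one instead of broadcasting notify_all.

```cpp
// Hedged sketch: targeted per-shard wakeups instead of a broadcast.
#include <condition_variable>
#include <mutex>
#include <vector>

struct Shard {
  std::mutex m;
  std::condition_variable cv;
  bool has_work = false;
};

// Instead of notify_all() on one shared condvar, track which shards are
// idle and poke each one individually.
void wake_idle_shards(std::vector<Shard*>& idle) {
  for (Shard* s : idle) {
    {
      std::lock_guard<std::mutex> l(s->m);
      s->has_work = true;  // set the predicate under the shard's lock
    }
    s->cv.notify_one();    // targeted wakeup, no thundering herd
  }
  idle.clear();
}
```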
B
Well,
I
think
that
the
problem
would
be
that
you
need
to
you
need
to
wake
the
proper
spread
for
all
of
them
and
then
check
yes,
just
necessary
number
of
ophress.
You
will
be
interested.
You
need
to
pinpoint
exact
ones.
A: I think, Radek, what I'm a little concerned about is that we can essentially end up in a situation where the messenger threads can't keep the shards full. That's when we hit it, right? And maybe that's not common if you have a huge amount of work coming in, but I wonder...
A: ...if, with our default configuration variables, it might often be that you see those queues become empty.
B
Okay,
this
basically
means
that
an
alternative
could
be
increasing
the
default
number
of
messenger
threads.
A
Maybe
maybe
you
know
three
definitely
is
much
better
than
that.
A
Yeah
we
could,
we
could
both
simultaneously
reduce
shards
and
increase
messenger
threads,
but
I
think
what
I
I'm
a
little
afraid
of
is
okay.
So
one
question
is
how
good
and
how
even
is
our
hashing
over
over
the
shards
like
are?
We
are
we
ever
seeing
situations
where
the
clumpiness,
due
to
the
random
distribution
with
the
hashing
means
that
that
you
know
one
particular
shard
might
be
less
less
activated
than
others?
Well,.
B
You
mean
I,
I
bet
you
mean
sharding
by
selecting
the
the
shards
for
selecting,
basically
tp
or
dtp.
The
thread
for
selecting
the
pg
will
will
be
handled
by.
A
Yes,
I
agree
with
you
on
that.
I
guess
what
I'm
I'm
wondering
is
you
know?
How
often
can
we
end
up
in
a
situation
where
any
particular
shard
ends
up
calling
notify
all.
A: Like, do we have cases where it's easier? Certainly, if we have fewer shards it's a much better situation, right, because it's much less likely that any individual queue becomes empty. But then we have to increase the number of threads per shard, and that's not optimal either, it seems.
B
Yeah
well
how
about
thinking
that
on
that
next
week?
Well,
tomorrow
is
tomorrow's
good
day.
A
We
should
wrap
this
up,
but
I
agree
with
you
and
then
I
can
also
show
some
of
the
the
locking
behavior
too,
because
I
think
we
do
see
contention,
especially
on
the
guard
lock,
but
also
we
see
contention
on
the
the
weight
lock
as
well.
So
it's,
I
think
it's
a
balancing
act,
a
lot
in
this.
How
to
to
manage
all
these
things.
B
Well,
we
could
improve
the
messenger
to
tp
passing
at
the
price
at
let's,
let's
say
more
contention
around
pd,
lock
and
guard
items.
A
Okay,
I
was
even
just
just
to
think
about
well
well
you
over
the
week,
but
perhaps
maybe
you
have
a
messenger
thread
that
that
well
could
we
re
work
this
in
some
ways
that
each
messenger
thread
is
per
shard
and
can
pull
off
data
off
the
of
the
the
network
layer?
I
don't
know
I'm
not
exactly
sure
how
that
works,
but
things
to
think
about
how
how
all
of
these
different
threads
interact
with
each
other
in
the
system.
B
Yeah,
oh,
it's!
So
you
really
need
to
wrap
up
okay,
cool
thanks
for
thanks
for
beginning
the
discussion.
We
will
get
back
we'll
get
back
to
you.
A
Okay
sounds
good
all
right.
I
think
it's
probably
good
time
to
wrap
up
then
too
so
have
a
good
day.
Everyone
thank
you
for
coming
and
we'll
continue.
Maybe
that
discussion
next
week.