From YouTube: Ceph Developer Summit Quincy: Crimson
Description
00:00 - boost::asio-based code adapted to use the Seastar reactor
3:12 - Manage RWL Replication Info
21:51 - m-n mapping
1:08:51 - Seastore
Etherpad: https://pad.ceph.com/p/cds-quincy
B: For instance, wrap it and just run it directly in the reactor with appropriate wrappers for wiring up the callbacks. I think this is just an investigation of whether it needs to be done, or librbd to start with. I think that's the important first case. I haven't looked into this myself yet, though, but that's my...
D: About this as well: Jason had already rewritten librbd to use the neorados interface, which is based on boost::asio, and his thinking was that it might be possible to template out the implementation-specific pieces (futures and coroutines/reactor) and leave the messenger, like the Crimson messenger for example, with only the protocol logic.
E: Right, okay. I'm a software engineer from Intel working on Ceph, and my peer Chunmei is also online today; we will talk about the design of the replica daemon and the replica monitor. Is it okay for me to share the slides?

E: It's loading. Previously, I think Lisa has talked about the RWL replication work, and today Chunmei and I will focus on how to design the replica daemon and the replica monitor to manage the replication information. For the RWL work, we have previously finished the design of the single copy for the NVMe and SSD write-back cache, and those features have been merged. For the next steps that we are going to do:
E: We are going to replicate the data across the media over the RDMA protocol. For the first part, we focus on how to manage the client cache information in order to do the further replication work; the other part we haven't started yet.

E: Today, this slide shows the basic framework of how we design the replica daemon and the replica monitor. Firstly, on the left side of the slide is the replica daemon; it will report the replica daemon information to the replica monitor. This information includes items such as:

E: the RDMA IP address and the port that is used to listen for incoming connections; the other information also includes the free size of the NVMe. After the replica monitor receives this information, it will store it into the replica daemon map, and it will also go through the Paxos service to keep the replica daemon map information consistent across all the replica monitors. And for the client side:
E: On the right side of this slide, the librbd client will use the get-replica-daemon-map message to request that the replica monitor choose the proper replica daemons and aggregate the information, and then the replica monitor will feed back

E: the replica daemon map information to librbd, and then librbd will start the connection to the replica daemons for the further replication work. Currently we have written the code and requested further comments from the community. On this slide we show the basic framework of the replica daemon and the replica monitor.

E: Because of that, we have also defined three kinds of messages, and we have designed these three message classes to be used between the replica daemon and the replica monitor. They also cover how librbd sends the request to the replica monitor, and how the replica monitor then uses another message to feed the information back to librbd. I will open the framework design document.
E
Okay
and
on
this
screen
and
for
the
for
the
replica
damage
information,
it
included
that
the
demon
id
and
it
also
included
the
rdma's
port
and
the
idm
is
ip
at
the
address,
and
there
is
another
there
is.
Another
memory
field
is
to
store
the
free
size
and
for
them
and
for
the
replica
map
enter
in.
Because
of
that,
the
replica
map
will
be
stored
in
the
replica
monitor
and
it
includes
all
the
replica
damage
information.
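A minimal sketch of the replica daemon entry and map just described, using illustrative C++ types and field names rather than the actual classes from the proposal:

    // Sketch only; field names are assumptions, not the real Ceph types.
    #include <cstdint>
    #include <string>
    #include <vector>

    struct ReplicaDaemonInfo {
      int64_t     daemon_id;   // replica daemon ID
      std::string rdma_ip;     // RDMA IP address the daemon listens on
      uint16_t    rdma_port;   // RDMA port for incoming connections
      uint64_t    free_size;   // free size of the daemon's NVMe cache
    };

    // Stored by the replica monitor and kept consistent across monitors
    // through its Paxos service; one entry per reporting daemon.
    struct ReplicaDaemonMap {
      std::vector<ReplicaDaemonInfo> daemons;
    };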
E: Currently we only use an STL vector to store all the replica daemon information. Further, for the client side, when librbd...

E: When librbd sends the request to the replica monitor, it will include the number of replicas (here we define a replica-count field) and it will also include the requested replica size. For these three kinds of metadata, we have also defined the proper messages to be used between the replica daemon and the replica monitor, and also between librbd and them.

E: librbd needs to send the get-replica-daemon-map message to the replica monitor, and then the replica monitor will choose the proper replica daemons, aggregate all the information into another replica map, and send that replica daemon map information back to librbd.
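A rough sketch of the request/reply pair described above; these structs are only illustrative (a real implementation would be Ceph Message subclasses with encode/decode) and assume the ReplicaDaemonMap type from the earlier sketch:

    // Sketch only: the librbd -> replica monitor exchange.
    #include <cstdint>

    struct GetReplicaDaemonMapRequest {
      uint32_t replica_count;   // how many replica daemons librbd wants
      uint64_t replica_size;    // requested size of each replica cache
    };

    struct GetReplicaDaemonMapReply {
      // The monitor picks suitable daemons and aggregates their info into a
      // smaller map that librbd then connects to directly over RDMA.
      ReplicaDaemonMap chosen_daemons;
    };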
B: So there are other ways to maintain persistent mappings in RADOS that don't involve modifying the monitor. RGW, for instance, maintains information about caches using watch/notify in RADOS, and RBD does the same thing. Why does this need to use a new Paxos service?

E: Apparently the current thinking is that, for this kind of information, I have looked through the OSD monitor, and because most of the Paxos services are included in the monitor, I added another replica monitor which is inherited from the Paxos service. Yeah.
E: For the design of the RADOS or RGW approach, currently I haven't looked through it yet, because I just took some other Paxos service as a reference, yeah.

B: I'm saying RADOS itself, like the interface, the library interface, offers primitives for performing operations on regular RADOS objects, with the ability to create locking and notify mechanisms. I don't think you need to embed this in the monitor's Paxos service: it'll be less code to write, it will work better, and it'll scale better if you implement this in RADOS.

E: So, for this design talk: is there some design document about how to implement this kind of design based on librados?
B: All I see here is an RBD name to set-of-daemons mapping; that's how you locate the replicas for an RBD image, right. Secondly, there's a registry of available replica daemons. You could put each of those two things in a RADOS object with some kind of watch/notify to maintain authoritative ownership and ensure atomic mutation. librbd already gives you ownership of a particular RBD object, so that part's easy: you simply ensure that only that client is allowed to modify that mapping.

B: The second part you can maintain with just a registry: just an object containing a binary representation or whatever, an omap entry for each one of these daemons with a last-seen lifetime if you're trying to detect failure.

B: Right, you already have a cluster that you agree on, so all you need to do is write an object with a well-known name, or a family of well-known names; there are a lot of ways to do it. I suggest that you study the way librbd manages ownership of the RBD head object and the way RGW manages its cache semantics.
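To make the suggested alternative concrete, here is a hedged librados C++ sketch of a daemon registering itself as an omap entry on a well-known object and notifying watchers; the object name, key layout, and function are assumptions for illustration, not part of the presented design:

    // Sketch only: registry object + watch/notify instead of a new Paxos service.
    #include <map>
    #include <string>
    #include <rados/librados.hpp>

    int register_daemon(librados::IoCtx& ioctx,
                        const std::string& daemon_id,
                        const std::string& daemon_info_blob)
    {
      librados::bufferlist bl;
      bl.append(daemon_info_blob);   // encoded daemon info plus a last-seen timestamp

      std::map<std::string, librados::bufferlist> entries;
      entries[daemon_id] = bl;

      librados::ObjectWriteOperation op;
      op.omap_set(entries);          // one omap entry per replica daemon
      int r = ioctx.operate("replica_daemon_registry", &op);
      if (r < 0)
        return r;

      // Wake up anyone watching the registry object (e.g. librbd clients).
      librados::bufferlist notify_bl, reply_bl;
      return ioctx.notify2("replica_daemon_registry", notify_bl,
                           10000 /* ms timeout */, &reply_bl);
    }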
E: All right, yes; today we are talking about whether this way is the proper method, yeah, yeah.

D: I think when we had discussed this previously at a CDM, it started turning into almost implementing an entire OSD on the client side, and at that point...

B: That's why I was asking, for instance: is it actually necessary that the librbd instance have a local replica daemon? Does this system require that? I would recommend that you do not make it require that, to enable the possibility that all of the replica daemons are actually remote.

F: Mapping, yeah.
B: To run Crimson on multiple cores, I'm just going to begin with the assumption that it doesn't make sense to run one process per core, or just flat out that it doesn't. So, coarsely, we have three different resources we need to deal with: there are incoming connections from clients, which will come in on whatever port the messenger handles, and there are state bundles for each PG.

B: So the short-term goal, I think, is to implement that middle piece where the PGs are allowed to be on multiple cores. We create multiple SeaStore... Seastar, what do you call it, whatever their thing for multiple... we permit multiple reactors, and we create a partition service or whatever that spreads the PGs across them, and then we allow the messenger to farm out messages to the relevant core based on which PG. I think that's the first piece of work after that.
H: I was thinking about... what is currently in Crimson, what is currently...

H: So I'm curious whether we really need to pass connections, pass messages, across the cores. I understood that we are thinking about... we need to bring more, let's say, front-end processing capabilities to be able to saturate fast devices.

H: One core is not enough; we need to have more front-end processing power to saturate a fast device. But I'm not sure whether, in order to do that, we need to shard across PGs. Maybe we don't need to worry about sharing resources at all and just have multiple OSDs in the same process, in the same address space, sharing just a single instance of...
H: ...of course, and this would... I'm not saying it's extremely straightforward; it would require some extension, like...

H: Okay, that's something new I was unaware of. Keep in mind, I pictured... I literally had the 10,000-OSD cluster testing in mind.

H: Okay, but still, if this happens in production, then we could tackle it just in time; not sure whether...
B: ...a severe space wastage problem, from the smaller or less-used OSDs in that group sitting on space that's allocated but that they can't actually make use of. If they were all one big bucket, we'd get much better disk utilization.

H: Okay, I need to think about that, because I don't have an answer just now; I will rethink it, and we could think about it more.
B: ...way is what Radek is describing: we set things up so we just run multiple OSDs, whatever. There's planning for Quincy, and then there's deciding what to do next. I think this is an important thing to do next. It's not that it needs to be there for production users, but it's important; if I were placing resources, this is one of the places I would put them.

H: And actually, the implementation-related thing I wanted to ask about during this meeting: I understood that Chunmei has started working on the m-n mapping.
B: Like I said, this isn't a single feature; this is a collection of improvements that will need to be made. So no one's going to work on "m-n mapping": someone's going to work on modifying Crimson so that PGs get split up among several cores, and that's probably the first step. There's work to be done in Seastar... SeaStore, rather, that I'll probably work on, and then there's work to do in the messenger, both in allowing...

B: I'm not really worried about it. That's already running in its own process, in its own threads, so it'll scale independently, and any core that needs to do I/O simply sends it to the common queue. We can be clever about improving parallelism there by partitioning the queue, but I'm just not worried about it; it's not a big deal. It'll be much more relevant for Sea...

H: No, sorry, SeaStore.
B: That's what I'm saying: that isn't necessarily true. Once we've fully implemented all of the protocol work to inform clients of which port is mapped on which core and which PGs are on which core, yeah, that'll mostly be true. But even then the client... until we choose to do that work, the message will come in on whatever messenger it comes in on, and the messenger will need to look up which core it's going to.

K: No, no, I mean the OSD map lookup, to evaluate which message belongs to which PG, I think.
B: That may not be necessary; Josh is saying that...

B: ...look at that part of the message, figure out where it's supposed to go, and then look it up in the map. Not the OSD map, my dude, that's a whole separate thing; that's just reserved words and stuff. Sorry, I meant the map hosted by the OSD service inside of Crimson that maps a placement group to a core, whatever we choose to call it: the PG-core mapping.
D: ...in the fastest fashion, immediately queues the message into the appropriate shard in the sharded work queue, based on the PG that the message references; and then, if that PG doesn't exist, or is being split, or is being merged, or something, when it gets dequeued...

D: ...the classic OSD notices that this PG needs to wait for some reason, maybe for the split to finish or for an OSD map to be processed by the PG, and puts that operation on a waitlist.
B: Again, all the messenger will do is check the current cross-core mapping from PG to reactor. If the PG is there, it just immediately queues the message on the relevant reactor; if it's not there, it sticks it on a waiting list for that core, or for that PG, and returns. That's it. When the PG is created, the reactor will go "oh, I have this PG now", it'll claim that queue, and it'll move on with its life. There will be some subtlety in exactly how those operations happen, just to ensure correct sequencing, but not very much.
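A hedged Seastar-flavored sketch of that dispatch path; the types, the global maps, and enqueue_for_pg are invented stand-ins for the real Crimson machinery, not the actual code:

    // Illustrative sketch only; not the actual Crimson messenger.
    #include <cstdint>
    #include <deque>
    #include <map>
    #include <memory>
    #include <seastar/core/future.hh>
    #include <seastar/core/smp.hh>
    #include <seastar/core/coroutine.hh>

    struct Message { uint64_t pgid; };           // stand-in for the real message type
    using MessageRef = std::unique_ptr<Message>;

    // Published PG -> reactor mapping and per-PG waiting lists (sketch only).
    std::map<uint64_t, unsigned> pg_to_reactor;
    std::map<uint64_t, std::deque<MessageRef>> wait_lists;

    // Hypothetical per-PG operation queue on the owning reactor.
    seastar::future<> enqueue_for_pg(MessageRef m) {
      return seastar::make_ready_future<>();
    }

    seastar::future<> dispatch_message(MessageRef m) {
      auto it = pg_to_reactor.find(m->pgid);
      if (it == pg_to_reactor.end()) {
        // PG not instantiated yet: park the message; the reactor that later
        // creates the PG will claim this queue and drain it.
        wait_lists[m->pgid].push_back(std::move(m));
        co_return;
      }
      // Hand the message off to the reactor that owns this PG.
      co_await seastar::smp::submit_to(it->second,
          [m = std::move(m)]() mutable { return enqueue_for_pg(std::move(m)); });
    }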
B: If you're worried about the memory sequencing problem, there are techniques for getting around that. These mappings will have some monotonicity properties that let us do RCU tricks, so likely what's really happening under the hood is that there's a published pointer to a current, consistent mapping from PG to reactor, and any messenger wishing to do a read from these things does so without barriers.
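A minimal sketch of the RCU-style trick being described, under the stated assumption that the mapping changes monotonically: readers load a published pointer to an immutable snapshot, writers install a new snapshot and retire the old one after a grace period. Names are illustrative only.

    // Sketch only: publish an immutable PG -> reactor mapping snapshot.
    #include <atomic>
    #include <cstdint>
    #include <map>
    #include <memory>

    struct PGCoreMap {
      std::map<uint64_t, unsigned> pg_to_reactor;   // immutable once published
    };

    std::atomic<const PGCoreMap*> current_map{nullptr};

    void retire_later(const PGCoreMap* old);        // hypothetical deferred reclamation

    // Reader side (e.g. a messenger): a single pointer load, no locks.
    const PGCoreMap* read_map() {
      return current_map.load(std::memory_order_acquire);
    }

    // Writer side: build a new snapshot, publish it, defer freeing the old one
    // until any reader that might still hold it has finished.
    void publish_map(std::unique_ptr<PGCoreMap> next) {
      const PGCoreMap* old =
          current_map.exchange(next.release(), std::memory_order_acq_rel);
      retire_later(old);
    }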
H: Yep, but if I understood correctly, we would need, at least initially, without the extension to the RADOS protocol, to somehow pass data between cores again.

H: Yeah, but it will be a bit complex, because we need to implement... even after finding the proper reactor for handling traffic to a particular PG, when we are responding we cannot directly send, no...

H: Well, I'm perfectly fine with that. I would just love to ensure that the assumption that we really need to worry about the OSD map entries, the infrastructure traffic, is correct. Yeah, I think we do.
H: Maybe two years ago there was a discussion on ceph-devel; we were iterating over that, and I bear in mind that we have plenty of resources there. Maybe... well, maybe I'm just misunderstanding.

H: I imagined... I was thinking about the ten-thousand-OSD cluster testing.

B: Yeah, I mean, we did this. I mean, Sage worked with, oh god, CERN, there we go, with their extremely large cluster, and they're constantly hitting monitor scaling...

H: Problems, yeah. If so, then I'm afraid there would be no other viable option than to initially do the PG sharding and then make an extension to the protocol.
H: Okay, I agree, it's far from being elegant, but it's stupidly simple.

H: Sorry! Well, I believe it's the one OSD per core; it's very, very simple.

H: The situation where you are sharing solely single objects, for instance...

H: Just simplicity: no need to worry about protocol extension at all, no need to worry about passing...
H: At the price of putting extra burden on deployment engineers and deployment tooling, yeah. It actually moves the complexity completely somewhere else.

H: Shared memory; not the entire address space, just a region, and...

K: I have a second question. We have a heartbeat messenger, which is not PG related; it is like a host-level service. So we can leave it as it is now, right? Because if we need to make the heartbeat messenger work across cores, it will be really difficult to manage that metadata on different cores, and we don't... I...
B: Yeah, I'm not talking about the messages, I mean. What the heartbeat messenger actually does is maintain a bunch of OSD-wide state, some of which are minimums and maximums over the PGs on that OSD. So all of the PGs, when they're doing their own thing, are updating state that the heartbeat messenger then condenses into a message that gets sent; that is the heartbeat, right. So there is information that will need to make its way from the other reactors to the heartbeat messenger.
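A hedged sketch of how that per-PG state could flow to the heartbeat reactor under a Seastar multi-reactor design; PeerStat, its fields, and local_pg_stats() are invented for illustration:

    // Sketch only: the heartbeat reactor pulls per-reactor min/max PG state
    // before composing the heartbeat message.
    #include <algorithm>
    #include <cstdint>
    #include <seastar/core/future.hh>
    #include <seastar/core/smp.hh>
    #include <seastar/core/coroutine.hh>

    struct PeerStat {
      uint64_t min_last_epoch_clean = UINT64_MAX;  // minimum over local PGs
      uint64_t max_pg_epoch = 0;                   // maximum over local PGs
    };

    PeerStat local_pg_stats();                     // hypothetical per-reactor accessor

    seastar::future<PeerStat> collect_osd_wide_stat() {
      PeerStat acc;
      for (unsigned shard = 0; shard < seastar::smp::count; ++shard) {
        PeerStat s = co_await seastar::smp::submit_to(shard, [] {
          return local_pg_stats();
        });
        acc.min_last_epoch_clean = std::min(acc.min_last_epoch_clean,
                                            s.min_last_epoch_clean);
        acc.max_pg_epoch = std::max(acc.max_pg_epoch, s.max_pg_epoch);
      }
      co_return acc;
    }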
K: Okay, a third question: how do those events, like the connected event, the reset event, the remote-reset event, work if we...

B: What are these for, an OSD heartbeat failure or an OSD connection failure? That's entirely the messenger. Well, okay, so there are the parts for stateful connections...

B: Like, client connections don't generate events, so those don't matter for OSD-to-OSD information.

B: When we get a reset, it triggers the behavior where we send a message to the monitor, so whatever handler exists for that probably just gets pinned to reactor zero for now. I don't know, but let's look at that code in some more detail.
B: I think that's correct, yes. I think messengers will, in general, be single core. So actually, I guess this is something to ask you about: if you're serving on a single port, is there any advantage in the messenger being able to run on multiple cores at all, keeping in mind that the messenger...

B: ...a reactor accepting on a single port, where each connection that comes in ends up local to a single core, that might be worthwhile. But even in that case, the core the connection ends up on won't necessarily have any relationship to the messages it's serving, without the future work we haven't talked about yet; so in that case you're still going to have to hand off the messages to whichever reactor they're supposed to go to.

B: So my suggestion to you is that, indeed, we just let the messenger be one core for now, and work on creating the multi-core infrastructure in the OSD first.
J: Hey Sam, one question here. There's still some information shared between the cores, for example configuration and the OSD map collection. So how do we share that information: does just one core hold all the data and the other cores request the information?

B: Not necessarily. We can do the same RCU trick I was describing before: configuration information could be shared as a sort of widely available memory map, or we can investigate some other implementation if we need to; I'm open to suggestions on that. In other words, I don't think we want every config query to involve a remote message to another core, that would get expensive.
H: And when it comes to watch/notify, I believe this shouldn't be a problem in Crimson. Everything related to watch/notify in Crimson is actually encapsulated inside the boundaries of the PG, except...

B: ...for the fact that the sessions involved actually interact with multiple PGs. Watch/notify is going to be a problem: not an unsolvable one, but one that will require actual architecture.

B: I don't want to get too far into this because, honestly, I don't remember the interface well enough even to remember what the requirements are, but we're going to want to write some little boxes on Google Docs or something until we're all satisfied with what it looks like.
B: I don't... I mean, it's not a big problem. Again, the classic OSD already has pretty much the solution we want here; it's really a matter of translating it into libraries and primitives that are more convenient for Seastar watch/notify.

I: In classic OSD, there is also something like collocation of OSDs on the same host, something, no?
D: So, to summarize, you're discussing going forward with m-n mapping, where the OSD operates much the same as it does with the classical OSD, having essentially a single OSD instance per object store and per device.

B: Yeah; for deployment purposes you could still choose to partition a device and create two of them, same as with the classic OSD, but I think there are a couple of core questions. One: do we want the use of multiple cores to require different processes? I would argue no, because we want the backing object store implementation to be able to share memory.

B: So that leaves us with this version, which is that an OSD runs multiple reactors: a subset of the reactors host messengers, a subset of the reactors host object stores, and a subset of the reactors host PG data.
B: It would also support a situation where we don't want 128 SeaStore instances, so instead we have 128 messengers, 128 PGs, but only, whatever, 32 of the store instances, because for whatever reason that's the more efficient way to do it; so we would be able to independently scale these things. Does that make sense?

B: ...OSDs is bad; like, that's not a good thing. So by the time we've chosen to go to the work of supporting multiple reactors, at least at the object store level, we also don't want multiple OSDs, because having multiple OSDs in the same process when they don't have independent failure domains means we're artificially imposing extra work on the monitors.
H: And our discussion here was whether this is actually a problem or not.

H: Okay, the reason would be, could be, actually, to not worry at all about extending the protocol, and not worry at all about passing messages across cores.

H: I'm just arguing that in that scenario the cross-core communication would be extremely limited, just the object-store-related crossbar, actually.

H: Conceptually, yep, I can agree with that; at the level of implementation it will be two different things.

I: So, in other words, we still need to worry about how to shard a given set of cores to different OSD instances, right?
D: We can talk more about that part in the orchestrator session; that's tomorrow, worst case.

D: Yeah, I don't think it's a big deal if, like, cephadm has to pass in a list of cores to use, for example.

B: ...be embedded; like, there's metadata even with classic OSD that we write right down into the little BlueStore config folder. This can be one of those things, and it's soft state too, so it doesn't even need to be consistent boot to boot.
B: No, I think the hard part is, as you guys have put your fingers on it: there are a number of pieces of state, like the watch/notify state, the OSD map state itself, and this mapping from PGs to cores, that are fundamentally cross-core pieces of state. We need to write code for all that stuff, we need to define the messenger-to-OSD interface that appropriately deals with this localization of PG to reactor, and we need to test the hell out of it.

B: And if we do want to defer this, and we are happy with statically partitioning disks and running multiple OSDs, that's fine too; we should work on something else in that case and use what we already have, if we want to defer this work to later. But I don't think we should do an intermediate thing: we should either start working on the final version or use this version for a while longer.
D: It's going to essentially make it have the same deployment model as the classical OSD, which means that the integration with deployment tools is going to be quite a bit simpler. There's no reason to implement, like, a crazy deployment method that we're just going to throw away in six months.

D: So it looks like the last thing here is SeaStore.
B: ...an onode index, an omap, and garbage collection, and a little tool for running fio workloads against the lower-level interfaces. We're just now wrapping up the work of wiring up the higher-level interfaces so that we can actually run an OSD on it. We've got the omap portions wired up as well as the onode stuff itself; I'm finishing up the extent stuff in the next week or two, and I believe Xuehan is working on xattr and the metadata machinery.

B: Neither of those should be a huge deal. After that, there are two sort of big device-integration stories to consider. One is that SeaStore currently is designed for ZNS devices, because that seemed like the primary design-limiting use case, so I wanted to make sure the transaction internals were capable of dealing with it; but there are devices for which direct mutation of the storage is actually fine, so there's a document at the bottom detailing the initial ideas of how we're going to support that.
B: ...here for integrating into the cache itself. The notion here will be that the currently ephemeral cache, when there is persistent memory, will just have its extents located directly there, and we will journal enough state to reconstruct the mapping from those extents back to the physical addresses they're meant to represent, which should layer nicely on top of the other support.

D: ...when we're consulting these pieces of metadata, they're in the same form they're represented in memory, so there's no decoding.
B: So by cache here I mean a block cache: 4k blocks, it's just 4k extents. The thing that maps those extents back to where they came from is an ephemeral mapping that we will periodically dump into the journal, with delta updates as we go through. So every time we put something into the cache, the transaction that represents it will also have a special delta updating the in-memory mapping, so that...

D: The question is more on the cache lookup side. So you're looking up an onode; what does that involve?
B: The onodes are in a B-tree, so you descend the B-tree looking at each extent in turn. One presumes the upper levels of the B-tree will be appropriately kept in cache, so those will be in-memory or persistent-memory lookups. Eventually you either fall out of cache or you find what you need.
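A very rough sketch of that lookup, with made-up node and helper names (not the actual SeaStore onode-tree code): descend from the root, serving each level from the extent cache where possible and reading the extent from the device otherwise.

    // Sketch only; illustrative stand-ins, not SeaStore's real types.
    #include <cstdint>
    #include <memory>
    #include <optional>

    struct BtreeNode {
      bool leaf = false;
      uint64_t child_addr(uint64_t key) const;              // hypothetical helpers
      std::optional<uint64_t> onode_addr(uint64_t key) const;
    };

    std::shared_ptr<BtreeNode> cache_lookup(uint64_t addr);  // hit: already in memory
    std::shared_ptr<BtreeNode> read_extent(uint64_t addr);   // miss: read from device

    std::optional<uint64_t> lookup_onode(uint64_t root_addr, uint64_t key) {
      uint64_t addr = root_addr;
      for (;;) {
        auto node = cache_lookup(addr);
        if (!node)
          node = read_extent(addr);        // fell out of cache at this level
        if (node->leaf)
          return node->onode_addr(key);    // found (or not present)
        addr = node->child_addr(key);      // keep descending
      }
    }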
D: ...but what I'm getting out of it, yes, is that because there's no extra translation needed, that's why using a block cache for everything makes sense. Like in BlueStore, for example, there's a separate block caching layer versus a cache that is dedicated to things that were already decoded, since of course accessing things that are already decoded is much quicker; but in this case there's no extra step there, so it's not necessary.
B: No; I do not think the encoding will show up at any point here right now. All we have are actual direct integers and, like, uninterpreted buffers, so clearly you don't decode those. The object info...
B: Presumably you still do, because that's the OSD's problem, but that's a problem to be solved later. But for the rest of the onode now, for instance if you're trying to read out the extent information, what you'll read out is a couple of integers representing the logical address map, or logical address range, corresponding to the object's own range, and then you just directly look those up; you don't decode it, to the extent that those extent blobs exist at all.

D: I guess maybe further in the future... whenever you hit the object info piece, but you said that's later.

D: There's also the memory management aspect, but that's probably less of an issue with Seastar.
B: To answer the larger question: yeah, as this starts to become usable and we start messing with that, we may want to move object info out of xattrs into its own special-purpose thing.

B: Let's see, what other SeaStore things... then there's just, like, a ton of performance work to be done. It's also a little bit crashy right now, as Chad may have been discovering, so debugging would also be good.

B: There is essentially no way that any of the data structures I used in the transaction manager are appropriate, so those will need to be profiled and replaced. We need a lot of work, like I've listed.
B: So, if anyone's interested: basically many of these little subheadings are things pretty much anyone could work on, and SeaStore has the benefit that it's a much smaller code base than Crimson as a whole. Right now it literally can't run in Crimson, so you can only run this little tester, which is much less code than Crimson as a whole. So I would suggest that, if anyone's interested in getting comfortable with Seastar...
B: I don't know; so, for instance, I can just predict what I'm going to want to know in the next couple of months. I'm going to want to be able to know how many extents were mutated, how many extents were retired, how many...

B: ...bytes were written, how many bytes were released. I want to track the bytes written and released because I want to be able to track the number of disk bytes written versus the number of logical bytes written, to track write amplification. I'll also want to be able to track space amplification, the way the garbage collector is working, as well as performance indicators.
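A minimal sketch of the kind of counters being asked for (names invented for illustration), including the derived write-amplification ratio:

    // Sketch only: illustrative SeaStore-style counters.
    #include <cstdint>

    struct SeastoreStats {
      uint64_t extents_mutated = 0;
      uint64_t extents_retired = 0;
      uint64_t logical_bytes_written = 0;   // what the caller asked to write
      uint64_t disk_bytes_written = 0;      // what actually hit the device
      uint64_t bytes_released = 0;          // freed by cleaning / garbage collection

      double write_amplification() const {
        return logical_bytes_written
                 ? double(disk_bytes_written) / double(logical_bytes_written)
                 : 0.0;
      }
    };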
H: Yeah, it actually sounds like a combination of a perf counter with an event tracker.

H: I believe it's already done, at least; Abner was working on that, on dumping the Seastar-provided counters with the tooling we already have in the OSD admin commands.

H: I see; well, I'm not surprised, the kernel drivers...
B: Yeah, I think that's right; until those things just become available, what we have now is fine. It's not our goal right now; I think there's not an immediate need to transition to more efficient interfaces, but even when we do, it's more likely to be io_uring than DPDK, so I sort of doubly don't anticipate DPDK being a big deal.

H: Okay, but those things are actually very different.

H: ...about kernel-provided io_uring.

H: It's not a replacement for DPDK; it's rather a replacement for the syscall interface.
B: They're both... io_uring and DPDK are trying to do similar things: they're trying to make it so that, when you're accessing certain kinds of underlying devices, you can do so with low latency and low overhead. They do it in wildly different ways. Admittedly, io_uring does technically still involve the kernel, but it also allows the device to still behave properly for all of the other tooling, which is a pretty big deal. So I agree that they aren't the same thing; I don't agree that they're, like, unrelated.

B: Anyhow, mainly Josh's point still stands: this isn't actually the thing to worry about now. Okay.
H: Well, there's still a lot to do anyway. Maybe five hours ago the watch timeout passed the unit test, so... oh nice.

H: Getting back to the topology testing, just after polishing the code, which I believe will take maybe a few...

G: Yeah, pretty sure there are a lot of them.

D: Yeah, yeah, I think Gabi will be able to work on the snapshots once he finishes up his current project on the BlueStore allocation metadata; that's getting pretty close now.