From YouTube: Ceph Performance Meeting 2021-09-02
A: All right. Well, I'll quickly just mention here that I linked in the etherpad a spreadsheet that the core team has already seen, but for folks that are interested in Crimson: this is our first attempt, which Brett had kind of requested, at looking at trying to simulate what we might see with a multi-reactor setup by just using multiple OSDs.

A: We can do this now, so these tests are basically just looking at what happens when you stick multiple OSDs on a given device, both with our traditional classic ObjectStore implementations and also in Crimson with SeaStore and with AlienStore using MemStore or BlueStore. There are a couple of interesting things, probably the biggest of which is that we do see good scaling with random reads and still see lower CPU usage in terms of cycles per op. That's really good.

A: I think that means that when we do multi-reactor we're going to see that too. I don't see any reason why we wouldn't, if we can do it with multiple OSDs. So, you know, we'll see, but that's good news. What I'm a little more concerned about is why we're not seeing better scaling with random writes; classic BlueStore is actually really good compared to everything else.

A: We've got, when you have high IO depth with random writes... so we probably have some work to do there, even just baseline with one OSD. We don't do great with either MemStore or SeaStore, but yeah.

A: I was expecting to see almost a linear increase, so I have to figure that out. Then there's this weird behavior with sequential writes in, sorry, in Crimson. I don't know if we're just doing some write combining through the classic OSD; we see, with both MemStore and BlueStore, that we're quite a bit faster than anything through Crimson. So, another thing to look at. That's it, though. Really a lot of data, but those were kind of the trends.
A: All right, let's move on then. Today, Josh, do you want to talk about billions of tiny objects and stuff?
B: Sorry, okay, so this came up from a discussion with Software Heritage. Loïc, actually, I think, brought this up at a CDM call maybe four months ago, six months ago, something like that. They basically came back and they designed this whole architecture that sits on top of RADOS to store bajillions of tiny objects. Basically they're write-once objects; they're source files.

B: So it's kind of a different use case than what RADOS is normally used to, and their approach in a nutshell is basically to have RADOS on the back end, and all RADOS is going to be doing is storing RBD images that are like 100 megs or something like that. I can't remember how big, no, not 100 gigs, I don't know, they're big RBD images, and those RBD images are not actually file systems or anything. It's just essentially a big file.

B: That's a concatenation of a bajillion of these objects, and so they have these write servers that they put in front, that sort of serialize the writes and just append to these big objects, and they dump it into RADOS, and then they have a whole bunch of databases that track how to look them up later.
B: So the main challenge that they have is that they want to be able to read an object by the hash of the object content, and so they have some other external database where you look up the hash of the object. It tells you which chunk it's in, which RBD image (they're called shards), which shard it's in and what offset in that shard, and so they can go find that object. And they have to sustain certain read rates and write rates, but that's sort of the architecture.

B: In a nutshell, the main sticking point is that they want to be able to look up by object ID, but they also want this efficient packing, and so the architecture that they defined on the wiki, I think it'll work.

B: Fine, it's just a whole lot of moving parts, and it occurs to me that we can do almost the same thing, almost exactly the same thing, but with the tiering v2 stuff that we've started to build up, using a subset of the RADOS manifest stuff, object manifests, where basically you have a RADOS object where in the onode there's a thing that says...
B: So basically, the idea has two main parts. You could make a RADOS class, so it'd be a little bit different in that, in their proposal, they have these big RBD images that are striped across RADOS objects; in this proposal we'd simplify that so that they're just RADOS objects. Each shard would be a RADOS object, and we'd make them as big as we think is sane. I don't know if that's 64 megs; I think we have a hard-coded limit of 128 megs, but I wouldn't want to get bigger than that.

B: I think 64 megs is probably big enough, and so you pack up, you know, thousands, many thousands, of these inside the RADOS object, and so you'd have a RADOS class where you say "append this thing to the shard", and it either would say the shard is full or it would say sure, and it would... the return...

B: The write return would include the offset that it wrote to, and then you would take that offset and create a second object in a different pool that's named after the object hash, and you would set the manifest for that object to point to the shard and the offset where the data is. And so, as a result, if you want to read the object, you just read the hash from the index.
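A minimal sketch of that write path, with loud assumptions: the pool names are placeholders, the packing is done client-side with a stat-then-append (the actual proposal would do this atomically in a RADOS object class that returns the offset or reports the shard as full), and a plain xattr stands in for the manifest / set_chunk pointer being discussed. It uses the Python librados bindings.

```python
# Hypothetical sketch of the proposed write path, not an existing Ceph API.
# A blob is appended to a big packed "shard" object, and a tiny per-hash
# index object in another pool records where it landed.
import json
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
archive = cluster.open_ioctx("archive-pool")   # big packed shard objects
index = cluster.open_ioctx("index-pool")       # one tiny object per content hash

def store(content_hash, data, shard_name):
    # 1. Find the current end of the shard and append the blob there.  In the
    #    real proposal a RADOS object class would do this in one op on the OSD
    #    and return the offset (or report that the shard is full).
    try:
        offset, _mtime = archive.stat(shard_name)
    except rados.ObjectNotFound:
        offset = 0
    archive.append(shard_name, data)

    # 2. Create the index object named after the content hash.  The proposal
    #    would set a RADOS manifest (set_chunk-style) pointing at the shard
    #    and offset so that reads get proxied; an xattr stands in for it here.
    index.write_full(content_hash, b"")
    index.set_xattr(content_hash, "location", json.dumps(
        {"shard": shard_name, "offset": offset, "length": len(data)}).encode())
    return offset
```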
B: The tiering in RADOS would then do its little read thing where it proxies the read over to the chunk and reads it out, and so the reading is sort of trivial, existence checks are trivial; RADOS just handles that. And then, if you want to do the bulk stuff, you would go to the archive tier and just enumerate the chunks.

B: The shards, whatever you want to call them, and you'd read them in their entirety. So you'd read these big 64-meg objects and you could stream them out or whatever, and I'm hoping that size is big enough to meet their bulk requirements. I think that would work too.
A
B
A
B: A million. They want to enumerate a million objects at a time, and so they probably just said, oh, we can fit a million objects into whatever it is, 100-gig RBD images, but I don't know that it really matters how big that is for them. My guess is that 64-meg shards are a big enough bulk container that you could still efficiently stream these out in bulk, en masse. You could... you could?

B: Write once, delete never, read frequently probably, whatever it is, yeah, yeah. So I think that, I mean, there are a few implementation quirks, because writes wouldn't be atomic: if you have two people trying to write an identical object with the same hash at the same time, they might append it to the same shard or to two different shards before they realize that the first one already completed and created the index object. So, if you had simultaneous writers, they might waste a tiny bit of space. I don't think that matters.
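A small sketch of how a writer could cope with that race, reusing the hypothetical pools and store() helper from the earlier sketch. Losing the race only wastes the bytes already appended to a shard, since the index object for a given hash is only created once.

```python
# Hypothetical race handling for concurrent writers of the same content hash.
# A real implementation would use an exclusive create on the index object;
# a stat-then-create check is enough to illustrate the idea.
import rados

def store_once(content_hash, data, shard_name):
    try:
        index.stat(content_hash)   # does an index object for this hash exist?
        return                     # yes: another writer already won, nothing to do
    except rados.ObjectNotFound:
        pass
    # Append to a shard and create the index object (see the earlier sketch).
    # If two writers pass the check at the same time, both append, one creates
    # the index object, and the loser's appended bytes simply go unused.
    store(content_hash, data, shard_name)
```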
B: It probably never actually happens in practice, so I wouldn't worry about it. Otherwise, I think it would work fine. The real question, I think, for us is whether we want to support that RADOS manifest op, because so far it's a small piece of all that work around the deduplication and tiering v2 stuff, and there are definitely bugs in there, I think mostly around the reference counting, but I think they don't need the parts that are buggy; we could narrow it down.

B: So it's just this idea of having a manifest that points at something, and ignoring all the other stuff. We could separate it out into a RADOS op, a separate RADOS op, so that it's distinct. But I don't remember exactly what tests are failing, so I'm hoping... yeah.
E: Yeah, I think, from my perspective, there are two kinds of bugs: there are tiering bugs, and then there is development going on in this manifest area which does expose bugs, and those have also been fixed. So I'm not 100% sure, because I haven't looked at all the changes that have gone into this area, how stable it is. If we can get confidence, you know... maybe I'll talk to Sam as well, because he's been reviewing some of those PRs.
B: The second question, for them, is whether this alternate set of size trade-offs, or whatever, is acceptable to them, because if so, I think it would be way less work. Because, I don't know, if you pull up that page on their wiki, they've got Postgres databases and a couple of different servers and dispatchers.
B
B
E: Yeah, I'll take a look at any existing bugs that are popping up currently and also check with Sam as to how stable he thinks it is. Some of them have been, you know, mostly test issues that have gotten fixed over time.

E: Some are race conditions that have also come up, but as far as I know, recently I haven't seen any of those, so I'd double-check on those. And in general I like the idea, given that we're already actively developing in that area; if they can just reuse the same logic, that makes sense to me, if that works for them as well.
B: I think the one thing that I might suggest we do: I think a lot of the complexity with the current manifest stuff is that it's sort of coupled with the reference counting, so that if you have an object with a manifest that points somewhere else, it increments the reference count, and if you delete it, it decrements the reference count, and for this use case we don't need that. So I think we should make sure that it's possible to describe a manifest...

B: ...that, I don't know if there's a flag or if it's a different type of reference, isn't a reference-counted reference but is just a pointer, and possibly separate that out into a separate RADOS op to set these. Because right now the RADOS op that you use to create this is called set_chunk, and you can set basically an arbitrary manifest with arbitrary content, so it's kind of a wide surface.
B
C
B
B
C: Yeah, and that would actually work for other users too that have the right kind of archival, write-once-read-many case, or that literally never want to delete, for similar kinds of reasons. Not necessarily to archive it forever, but to meet legal requirements or something, to keep it.
B: Yeah, and my recollection here is vague, because it's been a while since I looked at this, but I thought that we actually had, in these manifests, a flag to indicate whether or not it was a...

B: Whether... yeah, there is a flag for whether we have a reference. So in the manifest structure, when it points at something, there's a flag on that reference that says whether we're responsible for decrementing a reference on the remote object or not, so we just wouldn't set that flag. I think all the plumbing is there; I think we might just want to make a sort of more narrowly scoped RADOS op, perhaps, that does just this, so that we can...
B: The other simplifying thing in this particular case is that a reference is only ever one extent, whereas we actually support, you know, a whole list; a single logical object might have lots of references to lots of other bits for the dedup case, where you chunk it up, and we don't need that here.
A: Do you know how hard that requirement is, that the first byte of any object never takes longer than 100 milliseconds?
B
A
B: Yeah, I mean, none of these require... I mean, there's always going to be some 99.9th percentile where that doesn't happen, but I think in general, if your index tier is on SSDs and the other tier is on whatever, the reads will be pretty quick, because you'll hit the index tier and it'll proxy the read off to the thing, and usually RADOS will serve that in not more than 100 milliseconds. But it all depends on how you provision your storage, right? So it totally depends, yeah.

B: Basically, the performance balance is a little bit different, so I think those would certainly want to be pure SSD; you'd want to have just NVMe backing in that case so that you get reasonable performance. But I'm not sure we've really done much performance testing of how BlueStore behaves in that kind of scenario.

B: Okay, it kind of depends on... yeah, it depends on what our expectations are.
C: Would it be bad if it was all spread out across the, yeah, cluster? Like, if you had OSDs that have a mix of NVMe in there, that might make it better.
C
B: Yeah, yeah, and if all the OSDs are hybrid with disk and flash, right, the metadata would land mostly on the disk, yeah. That might be another... I mean, I think these are probably performance questions that they could experiment with today, right? Even if they're not using the manifest API, they could just create an object with an attribute that has... well, they don't even have to create an attribute.

B: You could just create empty objects and, modulo like 50 bytes, it's going to be basically the same as what the end result would be, and just see what happens when they create that number of objects, either on dedicated OSDs or on OSDs that have, like, the expected ratio of...
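A minimal sketch of that experiment, just to make it concrete; the pool name and the object count are placeholders, and the loop simply creates empty objects so the per-object metadata cost can be observed on the target OSDs.

```python
# Hypothetical experiment: create a large number of (near-)empty objects to see
# how the cluster and BlueStore metadata behave at the expected object count.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("index-pool-test")     # placeholder pool
for i in range(1_000_000):                        # placeholder count
    # An empty object is within ~50 bytes of what the real index object
    # (name plus a small pointer) would cost, per the discussion above.
    ioctx.write_full("hash-%08d" % i, b"")
ioctx.close()
```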
C
B: Yeah, let's see, that's the idea. Okay, the tiny-object case, yeah, and that's, I think, the whole point of going down this path: to get that right, so that you can just say "I have the object hash", go straight to RADOS, and it handles the indexing, figuring out where that content is. Right, right, just thinking.
B
C: Yeah, because I think it was pretty large, like the onode was around 10K, but that's probably because of all the checksumming and blob information for a large object; for these it would be much smaller.
B
B
B
C: Yeah, yeah, and the fact that they never delete is actually a benefit, because deletes are really expensive with RocksDB, so that probably improves their performance quite a bit.
B: So this is sort of the extreme version. If you take one step back, instead of using the manifest thing, you could just create the object with an attribute holding that same data, and then the readers could stat that object, get the offset and the chunk, and go read it directly, without having RADOS do it transparently and magically for you.
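A minimal sketch of that reader side, assuming the same placeholder pool names and "location" xattr layout as the earlier write-path sketch; here the client resolves the pointer itself instead of RADOS proxying the read.

```python
# Hypothetical reader for the "attribute instead of manifest" variant:
# read the small index object's xattr, then read the packed extent straight
# out of the shard object.
import json
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
archive = cluster.open_ioctx("archive-pool")   # packed shard objects
index = cluster.open_ioctx("index-pool")       # per-hash pointer objects

def fetch(content_hash):
    loc = json.loads(index.get_xattr(content_hash, "location"))
    # Read only the extent we need; no server-side redirect is involved.
    return archive.read(loc["shard"], loc["length"], loc["offset"])
```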
B
B
C
B
B
B
C: Anyway, okay, looking at the tracker, there are like 27 open bugs in the tiering category. I'm not sure how many of those are duplicates, but there are a number of them that are even from this year.
C
E: Yeah, yeah, so talking of those bugs, some of them could definitely be related to old versions; I see some from 2017 and so on, which we are not seeing today. I don't remember having removed any tiering tests yet, so I'd be curious to see the newer ones, the 2020 and 2021 ones, which are still relevant. So, yeah.

E: If we really want to go down this path, then we want to see how relevant these are, and, as I said, I also want to understand Sam's confidence, because some of these bugs have been ignored in the past because nobody was using this, so we didn't care too much about their priority. But if somebody wants to use it, then we will take it seriously.
C: Yeah, there are definitely things around ordering and recovery and PG logs between tiers that have not been fixed.
E
C: Yeah, I mean, Gabi and I were talking a while back about an idea like this, where you could have a specialized pool just for storing metadata like this, or you could have a dedicated kind of onode structure that wouldn't need to take much space; it would be more like, maybe, a larger xattr that would contain these kinds of references.
C
E: Yeah, I was just saying, Sam has just joined. Sam, we're just curious to know your confidence in the tiering code and in the manifest handling in the OSD, how far along it is.
F
C: Yeah, I guess Sage was kind of hoping that we're trimming it down to a very small subset of what the manifest is capable of for this use case, namely just tracking...
C
F
F: Because the cache pool it's coming from still needs to store the metadata pointing at those packed objects, right?
F
F
F
G: So the main question here, and I'm just jumping in as a newcomer to this problem, is whether to use existing infrastructure at the cost of maybe some efficiency, or to invent something entirely new for efficiency, with the cost then of creating something entirely new. That's the dilemma, yeah.
C: Or not do anything with that structure, and instead do this outside of Ceph, using an existing database to handle this kind of redirection metadata.
A
E
F: I mean, we have the ability to set a redirect at the cache pool to an object in the base pool, if I'm remembering right, but it doesn't actually have the ability to point to a sub-object, so the packing feature would still need to be implemented. That's a whole new thing that doesn't exist now.
C
F
A
F
F
A: Sam, I somewhat disagree with your premise that RocksDB is as well suited to this. They have a requirement that no first-byte read of any object takes longer than a hundred milliseconds. I don't think RocksDB can do that, well, not in one instance, but with maybe, you know, hundreds of millions to a billion key-value pairs or more in one instance.
F
C: The Software Heritage Foundation use case, where they're kind of archiving billions and billions of tiny objects, mainly the objects from git repos, and trying to pack them together and index them. Their indexing scheme is actually in a separate storage system, but they're going to use Ceph purely for storing the contents.

C: The contents of the repos, and potentially the mapping for them.
F: ...suited to this, namely using CRUSH to distribute the objects across OSDs; that was a good fit.
C: Their alternative is basically using RBD images: they have their own indexing scheme where they write a bunch of small objects into the RBD, keep their own index within that RBD of which objects are where, and also store the information about how to look up the different repos in a separate database.
E: Yeah, it does sound like... Josh, you asked this question earlier, but if they want a solution as early as, you know, now, or soon, maybe going the RBD route is a better way for them, given the state of the tiering code.
C: Yeah, certainly right now. I think their time frame is more like six months to a year, but even if we started working on this stuff now, I think it wouldn't be ready before Quincy. So...
C
F: Yep, this would be a large development effort. Like I said, the large-scale properties of Ceph seem to fit this well, that is, a way of distributing these keys across the whole cluster and dealing with recovery and so on, but there are a bunch of details that really don't fit. Like, we like to do recovery on a per-object basis; that's flatly inappropriate for this. Even within the OSD we'd want to pack them and only consider them in bulk during recovery or whatever. Yep.

F: So I'm not sure this is even just a matter of making efficient onodes. This might be a matter of creating a pool where, within each host, we have a much more complicated indexing structure, yeah. So...