From YouTube: Ceph Crimson/SeaStor OSD 2020-07-22
A
Let's start. [inaudible] I was working on some other streams. [inaudible] This week I will wrap up your review and [inaudible]. That's it for me.
C
D
It occurred to me that I can put off free-space handling almost completely and do it all in memory. The LBA mapping itself is authoritative: it represents all used space. If something is not mapped by the LBA tree, it's not in use. So during startup we'll just scan the LBA tree, count up all of the blocks in use in each segment, and then maintain that map in memory. It's only a few thousand entries; it's basically free, like a megabyte of space.
D
It's a little expensive for the initial scan, but we can improve it later if we need to. The main trade-off during operation is that when cleaning segments we'll have to do a bunch of reads to find out whether blocks are still in use or not. But the truth is, for QLC flash in particular, I think we always want to trade reads for writes unless we have proof otherwise. So I think we'll go
D
implement this, and we'll do some performance testing and find out how it does. It has the benefit of being super easy, so I'll be able to get this and segment cleaning going at the same time. I think it's still correct: we're still writing down transactions that represent whether things are in use; it's just that on startup we'll have a structure we'll have to rebuild.
D
D
If it is a problem, then we'll do the thing we'd have to do anyway, so I don't think we're losing anything.
D
C
D
What I'm working on now: in a week or two, or, well, I'm off next week, so this will probably take several weeks, but after that we should have enough that I'll be able to wire up something stupid and do performance testing with fio or something. It'll be below the level of the rest of the object-store functionality, so it doesn't require anything else, just the transaction manager.
A
But somehow it fails to find the remover; the compiler complains that the remover is an incomplete type. I'm still struggling with it. I will try to come up with a unit test which just exercises the remover and try to find the minimal reproducer, because the other example works for me, but when it comes to the remover it fails. I will find out why, and try to finish up. Okay.
C
C
Currently the next step is the internal node: first I will implement a unit test for the internal node split, then use those tests to complete the split logic for the internal node. Then I will start working to make it work on the SeaStore block layer.
A
D
Let's back up a level. I don't think it's ever a good idea for us to map an extent with unused space, first thing. So let's say at some point an object has a 4K extent mapped and we overwrite one K of that. I claim that we should rewrite that entire extent; we should not keep three K of it and write a new extent with only one K mapped. And I think the same is true of larger extents.
D
So let's say we already have a 16K extent written, and that might happen if user action, or RBD, actually sent down a 16K write for whatever reason. If we get a 4K write in the middle of that, I think what we should do is not punch a hole in the middle of the 16K extent and remember two old references to the same extent plus a new one in the middle. I think we should split it into an eight and two fours, or a 16 with an actual mutation in the middle.
D
D
So if the user submits a 200-byte write in the middle of an object, then the layer above the extent map (whose only purpose in life should be remembering the mapping from the extent offset, or rather from the object offset, to the logical address) should say: okay, this is a 200-byte write, so I'm actually going to grab the 4K-aligned extent that contains it, or possibly two of them.
D
If it happens to cross a boundary, it should ask for all extents that cross that padded-out range. Then it should return the set of extents that result from the write, whatever that happens to be. If we do it that way, there is never, ever a reason for the extent map to map an extent that doesn't start at the beginning of an object, or of an extent; at least, I can't think of a reason why we would need one.
D
I don't think there's any reason to do less than that. We're never, ever going to do allocations that are unaligned; it makes no sense. 4K alignment exists everywhere in the block stack. If we get a 200-byte write, nothing in the stack assumes that it's aligned; there's no way.
D
A field called size: the size of the object is the only indicator for everything else. Within the object, either there will be an extent mapped or there won't be. If there isn't, then it's a hole and it reads as zero. If there is, then whatever the bytes there are, that's the answer. There are more sophisticated strategies, but I don't want to deal with them now, and they require a hell of a lot more design than we've done so far.
D
Yes, the user would have to; it has to work that way anyway, because you can only read whole blocks. So if you get a write at offset 4100, you have to request all of the extents that overlap that block, do whatever you're going to do to them, and then write the results back to the extent map. It might be that you mutate the blocks in place, in which case you don't have to do anything to the extent map.
D
It's just journal deltas. Or it might be that you have made choices; let's say the extent is now old and you want to rewrite it. So you'll allocate a new extent, copy the data over to it, and then replace the logical address in the extent map. But that logic needs to happen above the extent map, for several reasons. First, the extent map .h is an interface; from this I infer that the goal is to later have other implementations.
D
Exactly what we're doing here, now. Later on, if we want to do something clever like tail packing, which is the thing where you take the ends of objects... for instance, for very small objects, let's say you have a 200-byte object: you don't really want to use a whole 4K extent for that, right? We're going to, for now, but you don't really want to. So what you could do is, the transaction manager could remember, like, a block.
D
There's a block it has on disk that it's packing little bits into, and you would then update the extent manager. You could either update the extent manager to be able to remember more about this, to say: oh, this isn't a real extent, this is a blob that has separate accounting rules. Or, if it's just the tail, it could be an extra member of... oh no, because there'd only be one of them, right, just the end of the object. That would be enough to optimize for the small case, I guess.
D
For instance, if we're targeting RBD, there are no 200-byte objects in RBD, ever; they're all 4 megabytes. Moreover, writes already come down in page-size increments anyway. We don't get sub-4K-sized writes; that's not a thing, not really, because it typically comes from a file system's page cache. Right, in real life, RBD...
D
Yes; what it's going to be is that the extent map never maps things smaller than the transaction manager's minimum block size. So it's not even a config setting; it's an API call. You just ask the transaction manager what the smallest size is. I haven't bothered to set it up, but it's okay for now, I mean.
D
What we're doing works: we're tracking the total amount of free space in buckets. So if there are a thousand segments on the whole disk, then we track the amount of free space in each segment. Once it gets low enough, we go through the segment and move out everything that's live; after that, the segment is completely free. We don't remember which 4K pages are alive; we just remember how much of an entire segment is live. Oh, right.
D
It's a block layer, right? Block layers don't tend to get itty-bitty random writes; most disks can't even handle it. So for that reason we don't want to optimize for little writes; it doesn't make sense. So if we do get a little write, we're just going to do something stupid: we're going to allocate a big space around the little write and treat it as though it were a big write.
D
You won't typically put the zeros on disk, but if anyone does do a read to those extents, you have to return zero. So in our case, if you write to a very far-out extent in the object, like at four megabytes, and the object is empty, then we'll allocate a little 4K chunk around that write, with zeros except for the part you wrote, and the rest of the object will be implicitly zero.
D
But the important part is that, as far as the extent map is concerned, it doesn't care: the extent that it maps at whatever offset is the data. The user above will have ensured that whatever data actually got written gets written into the extent, and the parts that need to be zero will actually be zero, so the read results will be correct in the future.
D
Let's say you do a write at 4K plus 100, for 200 bytes, from Kefu's example. Well, is that 4K plus 100... yeah, anyway. What will happen is the user layer will allocate one 4K extent starting at 4K. So the extent map will contain a single extent beginning at 4K, of length 4K. Yes, the first 100 bytes of that extent will literally be 0, because they have to be. Yes.
D
That's all that's going on here. It's really no different from the way BlueStore works; more like, this is a normal way of handling things if you don't expect to handle a lot of small files. In the future, if we do need to handle small files, we will have to do something smarter, but this will work for now.
D
And I'm okay with other designs. I just want, like, when you're writing out your interface: people reading or reviewing your code, and later on people reading your code because they need to either use it, expand upon it, or just understand it, they're not going to be able to guess, necessarily, how you meant it to be used. So it's really, really helpful to have, at the top of the interface, for instance, just an example of how you expect the user to work with it.
D
What happens here is the first write causes the object size to be 4K plus 300, right? That's not optional; we're required to do that because of the way RADOS works; several components rely on it. So after the first write the size is 4K plus 300; after the second write the size is 12K plus 300.
D
A literal extent: 4K, mostly zeros, with 100 bytes of something at that offset, right, and the rest of it zeros. And it really is data; later on there's no difference between those zeros and the 100 bytes the user wrote. In fact, those hundred bytes could be zeros; we may not have bothered to check. Okay. Similarly for the 13K one that actually falls past the end of the object.
D
There's nothing preventing the user from actually writing zeros. If the user writes an entire page of zeros at offset zero, we actually could just not do the mapping, right? We already have rules that say we're going to return zero anyway, so it's a no-op; all we have to do is update any mtime or whatever stuff. Right, there genuinely is no difference, from the user's point of view, between implicit zeros, because there's a hole, and zeros we remember because they happen to fall into an extent we bothered to map.
D
I mean, we will optimize out internal holes. We're talking about crumbs of the size of a few kilobytes, right? But if real RBD objects are four megabytes in size, that's a thousand of these guys. So if we get a few 4K writes spread throughout the object, we're not really going to fill that whole four megabytes; we're just going to have a few pages mapped.
A
D
Plus, the overall metadata size gets bigger, so you need to spend more CPU cycles finding the right extent: there are more branches, you have to do more offset calculations and corrections; it's just more complicated. And since we're never going to get unaligned writes in the first place, it's not worth optimizing for.
D
Oh, n minus 1, that's M? So I don't know exactly what design you're using, but the standard way this works is: for the internal node, you have n pointers to child nodes, and between each two of them you have a pivot, which is why there are n minus 1 of them. So n minus 1 keys in the internal node; that should be M.
B
So, the layout of both the inner node and the leaf nodes: the key string and the value string both start from the right side, and the left side is fixed-size records, where the [inaudible] is, so that it is easy to calculate the position of each item.