Description
"This is the future."
Speakers:
Alexey Akhunov (@realLedgerwatch)
Mikhail Kever (@NEARProtocol)
Michael Birch (@NEARProtocol)
Evgeny Kuzyakov (@ekuzyakov)
Dani Osorio (@waverlymaven)
A
Hello, how are you? I'm good, thank you. And we're live right now, one sec, one sec! There we go. So hi everyone, thanks for joining us. Today we have a whiteboard session with Alexey, who is working on Turbo-Geth, and joining us are Dani, Michael, Misha and Evgeny, who will be asking all kinds of questions that they have after watching the recent video by Alexey about Turbo-Geth.
B
Awesome, okay, so I'll get started. So, Alexey, I know we talked about how you've been researching this for Ethereum for the past three years, and you found some really interesting proposals for improvement regarding constructing state in a better way, in order to avoid having to deal with issues like state rent, or having to bolt state rent on later down the line, by just getting the design right. And so we talked about kind of learning from those best practices. So I wanted...
C
Yep, yep, that sounds good to me. Okay, so... I just came across muffled to myself anyway. So the conversation, I guess, started when we were thinking about state rent. A lot of people are probably aware that we had the state rent project in Ethereum, and the reason it started is that there was a realization that the state is very large, and basically the size of the state was creating a lot of issues, for example with synchronization and processing. So it looked like a very crucial, very urgent problem, and nobody really knew exactly how big the state was. And that was actually interesting, because depending on how you store it, the number would be different.
C
You know, this is not very well defined. And in Turbo-Geth, very recently, we have been experimenting with different ways of laying out the state, and until recently we had a representation in which the state currently would be 50 gigabytes, which was already quite a good achievement.
C
But then we found another representation which is as efficient to access as that one, but is basically 10 gigabytes. And that made me think that if that is possible, then a lot of the worries we had back in 2018, when we kick-started the state rent project, were actually not as urgent as we thought they were, and we can address them in a much better way.
C
Instead of, you know... we do still have time to think about those things properly, and about whether state rent actually has to come right now, or whether it could come a bit later. Because state rent does come with a very hefty price tag if you introduce it on a live system.
C
So yeah, that's basically what we were talking about before.
D
So in our case, contract execution is very fast because we compile contracts to native code, but storage access is still really expensive, so we are very interested in optimizing it.
C
Okay, well, to be fair... actually, I don't know if you've seen that on Friday there was... I mean, I gave another talk on Friday, at least online, but there was also another one about EVM384, which is a project that also has a long story. Essentially the moral of the story is that you can actually make an EVM interpreter very efficient. I mean, the perception people had before, that an EVM interpreter has to be super...
C
...slow, is not really justified, because you can write very efficient interpreters, and actually there already are efficient ones. And in this particular talk they demonstrated that, for certain things, if you had certain operations added to the EVM, it actually becomes more efficient than WebAssembly, simply because the EVM has long-word operations, like 256-bit, and this proposal, for example, introduces 384-bit operations, whereas WebAssembly essentially has 64-bit operations and has to assemble larger, more complex operations out of 64-bit ones.
C
So it's not fair to say that WebAssembly is by definition faster than the EVM; it all depends on your interpreter. But coming back to the state problem: yeah, definitely, the flat representation of the state is exactly where Turbo-Geth started.
C
Yes, the project started from the observation of what happens when we do state access in the traditional layout, where we represent the state as a radix tree, essentially. So I can start drawing, if you want. Some people know it as the Patricia Merkle tree, but actually there is a much more standard definition of this, the radix tree, that usually comes up if you have a job interview; sometimes they ask you a question: how would you take a dictionary and put it into a data structure in your program? What would be an efficient way of representing a dictionary? In a dictionary, essentially, you have to look up whether a word exists or not, and if it does exist, you need to, let's say, find its definition or its translation into a different language. So usually the way... Can I... oh, why is it not...?
C
Let's say that we have only six letters in the alphabet. The first level is that all the words that start with the letter "a" go this way, and all the words starting with the letter "b" go that way, and so forth, right? And then the next level is the second letter of the word in the dictionary. Let's say that we have the word "apple": the next level will of course branch on the letter "p". And let's say that...
C
...there's no word starting with the two letters "aa"; this one is like "ab", for example, so that branch would start with "b". So we have words starting with "ab", like "abacus", and then we have words starting with "ac", which is like... what starts with "ac"? I don't know, one definitely exists, or something like that, right? So you get the idea: on the second level you have all the letters again, and so forth, and then on the third level you get...
C
...the third letter of the word. So "abacus" would have an "a" again, and so forth, and then "abba", for example: if "abba" were in the dictionary, it would be here, and so forth. In the end you go through all the levels, and somewhere at the end you will get either the definition of the word or the translation.
C
So here you get... that's the leaf of the tree: the definition or translation. So basically, if you're building a dictionary in a job interview, this is what you have to say: the radix tree. And that's a similar idea to what the Patricia Merkle tree is, or any other kind of radix tree. But in order to map this problem...
C
So you know that on the first level there will be at most 16 elements, and then on the second level at most 16 elements, and so forth, and it goes all the way down to a certain level where there's no more branching, as you can imagine with the dictionary, for example.
C
At some point you don't really have a lot of branching anymore. For example, if you take the same example of "abacus": after you've gone through "a", "b", "a", there are not many words that actually start with "aba". So imagine that there are no other words that start with "aba"; in that case, your "abacus" could actually sit straight here.
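The dictionary structure Alexey is drawing can be sketched as a minimal radix trie in Python. This is a simplification of what he describes (one node per character, no leaf compression yet), just to make the lookup walk concrete:

```python
class RadixNode:
    """One node per character; children keyed by the next letter."""
    def __init__(self):
        self.children = {}   # letter -> RadixNode
        self.value = None    # definition/translation, set only at word ends

class RadixTree:
    def __init__(self):
        self.root = RadixNode()

    def insert(self, word, value):
        node = self.root
        for ch in word:      # walk/create one level per letter
            node = node.children.setdefault(ch, RadixNode())
        node.value = value

    def lookup(self, word):
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:  # no branch for this letter -> word absent
                return None
        return node.value

t = RadixTree()
t.insert("abacus", "a counting frame")
t.insert("abba", "a pop group")
print(t.lookup("abacus"))   # -> a counting frame
print(t.lookup("apple"))    # -> None (not inserted)
```

The compression Alexey describes next (collapsing the branch-free "cus" tail into a single node) is exactly what the Patricia variant adds on top of this.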
C
So let's erase this, and let's imagine that there are no other words that start with "aba". What we do then in Ethereum is we put the "cus" in here as some sort of special element, and then we map it to the definition, or something like that. The reason we can do that is that there are no other words with this prefix, and this is, you know, a way to compress the...
C
This is how you can compress the tree. In Ethereum, instead of a definition, you will get a balance, a nonce and so on, and of course this is not only in Ethereum. So that's basically the structure, and it's all good, but there are a couple of nuances to it. When Ethereum had one of its audits, the guy who did the audit, Andrew Miller from Least Authority (or at least he was at Least Authority at the time), basically noticed...
C
...one thing: if you just use the addresses of the accounts as the words in such a dictionary, then it's pretty easy to construct a very deep tree. Because here it's not a dictionary, it's the addresses, and you can generate new addresses very, very easily. So what you can do is construct a structure which is super, super deep, and you can see how you can do that.
C
What you need to do is find two addresses which have really long common prefixes: two different addresses, but with a very long common prefix. For example, if you manage to find two addresses for which... which is actually not very hard, because you can basically just say "fff...f", all the way to the very end, let's say ending in "f", because the address is 20 bytes long.
C
So the depth of that thing is going to be 40 nibbles (that's what they're called, nibbles). And then you generate another one which is exactly the same, but with an "e" at the end, or something like this. And then you're going to have a very deep tree, and there could be other things like... oh, so he was actually suggesting that you could do something like this spine structure.
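The attack Alexey describes relies on manufacturing keys with long shared prefixes. A small sketch with two hypothetical 20-byte addresses (40 hex nibbles) that differ only in the final nibble shows why the branch holding them must go the full 40 levels deep in a nibble-keyed trie:

```python
# Two hypothetical 20-byte addresses, written as 40 hex nibbles,
# that differ only in the very last nibble.
addr1 = "f" * 40            # ff...ff
addr2 = "f" * 39 + "e"      # ff...fe

def common_prefix_len(a, b):
    """Length of the shared nibble prefix of two hex strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# In a 16-ary (hex-nibble) trie, these two keys only diverge at the
# last level, so the path holding them is the full 40 nibbles deep.
print(common_prefix_len(addr1, addr2))  # -> 39
print(len(addr1))                       # -> 40 nibble levels in total
```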
C
So basically, by doing this, you create a very deep structure, like 40 levels deep. And then... usually you would say: okay, 40 levels deep, what's the problem? That's fine! It becomes a problem precisely when you start thinking about how you store this in the database, and this is where...
C
It doesn't have to be sorted; it could just be a map or something like that, yeah. But you need to do it in such a way that you can actually calculate some kind of commitment to this entire map, and it happens to be that people decided that the commitment to this entire map is going to be a Merkle tree of a certain construction. But the problem of computing the hash root in one specific way wasn't there at the beginning; it has been constructed.
C
Correct, yes. So where do we come to the state root thing? We come to the state root because, when we say: okay, we need to make sure that everybody has the same mapping in memory, or somewhere in the database, how do we ensure that? Well, first of all, let's look at what Bitcoin does.
C
Let's step back from Ethereum. In Bitcoin, you can imagine, they have the same kind of structure, but it is not explicitly specified in the protocol. In Bitcoin there is also a state, which is the set of unspent UTXOs, and you need to have the state in order to be able to verify whether a certain transaction is valid or not. So it's the same kind of thing, but the difference is that in Bitcoin they don't construct a commitment to the state, and they don't put it in a header.
C
So what they do is assume that everybody is running the same code, or code that does the same thing without bugs, so that if they run all the blocks they will arrive at the same state. They're assuming that everybody can do that, and if somebody doesn't, it means they're incorrect and they can sometimes do wrong things. In Ethereum...
C
Essentially, for lots of reasons, the designers decided: okay, we don't just assume that everybody has the same mapping; we want to somehow compare these mappings. I mean, of course, Merkle trees were known for a long time for such purposes. If you take products like Cassandra, for example: it uses Merkle trees all the time to reconcile replicas. If they have two replicas and there is data missing in one replica, they use a Merkle tree to quickly find out.
C
So here it is exactly the same idea: okay, let's have a Merkle tree, and then we basically have a way to reconcile the replicas, because it's basically a replicated database; it's the replicated state. And somebody chose this particular form of Merkle tree for the task. Yes.
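The replica-reconciliation idea can be sketched in a few lines: build a Merkle-style commitment over a key-value map, so two replicas compare a single hash instead of the whole map. This is a minimal illustration, not Ethereum's or Cassandra's actual construction:

```python
import hashlib

def commit(mapping):
    """A minimal Merkle-style commitment to a key-value map: hash the
    sorted leaves, then fold pairwise up to a single root.  Any change
    in content changes the root, so two replicas can compare one hash.
    (Real systems keep the whole tree, so the *location* of a mismatch
    can also be narrowed down level by level.)"""
    level = [hashlib.sha256(f"{k}={v}".encode()).digest()
             for k, v in sorted(mapping.items())]
    if not level:
        return hashlib.sha256(b"").digest()
    while len(level) > 1:
        if len(level) % 2:           # duplicate the last node on odd levels
            level.append(level[-1])
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]

replica_a = {"alice": 10, "bob": 20}
replica_b = {"alice": 10, "bob": 20}
replica_c = {"alice": 10, "bob": 99}

print(commit(replica_a) == commit(replica_b))  # -> True: in sync
print(commit(replica_a) == commit(replica_c))  # -> False: diverged
```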
D
...you have different ways of building the trees, such as balanced binary search trees.
C
Yes, I did. So the trade-off is this; one of the main properties, one of the main differences... Can I create a new page here? Is it this plus button, right?
A
Yeah, but you can also scroll down. You have the controls in the right panel; you can just click on the cross there and scroll it down, or you can create a... yeah, you can click the plus there.
C
A pan, you mean the panning, okay, fine, yeah. So essentially the trade-offs are quite interesting. The trade-offs are: in the balanced tree, let's say it's an AVL tree, or it could be a weight-balanced tree, or a red-black tree, whatever. The AVL trees are probably the best ones, because they have a much better bound on the depth.
C
So what happens in the balanced trees? If you take one tree (they're basically binary), you can imagine that this is some kind of algebra: in this algebra, your trees are your elements, and then you have operators that operate on these trees. For example, one type of operator is adding a key-value pair, which is basically adding some sort of key and then a value at the end. So that's one operator.
C
Another operator, for example, is deleting a node; so another operator is removing a certain key-value pair, and so forth. So there are two main operators: you add something, you delete something. And the main property of the radix tree is that all the operators...
C
...commute with each other. In this particular algebra, these operators are commutative, so you can basically add elements in any order, or remove them in any order; in the end it doesn't matter in which order you've done it. What really matters... sorry, okay, one correction: the additions commute with additions, and the deletions with deletions, but they do not commute between each other.
C
So basically, additions commute with additions, and deletions commute with deletions, but additions and deletions obviously do not commute, so you have to know in which order to apply them. But for the balanced tree that's not the case: in the balanced tree, these operators are not commutative, so it does matter in which order you add them. So essentially, if you have two elements and you add them, let's say "a" and "b", to a balanced tree...
C
So that is a bit of a challenge, and if you do need the property of commutativity, then you have to solve it somehow; I mean, there are solutions to that, but that's basically the main disadvantage of the balanced tree. There is an advantage too, of course: in the balanced tree you don't have to do this extra hashing of the keys to keep it balanced, because the tree balances itself, and that simplifies quite a lot of things.
C
I would say, for example: because you have this key-hashing stuff in some implementations (initially we also had it, but now we don't), you've got to have these pre-images, which are quite annoying. You know, when you look at the tree, it's hashed, and you always have to, let's say...
C
Basically, if you go back to this tree: imagine that these letters, these paths in the tree, are not actual addresses but the hashes of addresses, and you've somehow found one of these hashes. How do you recover what the address was? Well, it's not really possible without having another database which maps the hashes to the addresses, which is a bit of a trouble. Anyway, so I think the main thing about the balanced trees...
C
The main thing I like about balanced trees is that you don't need to do this extra step, but, as we said, it's a bit of a... yeah.
C
That's what everybody essentially is doing. So if you go back to that structure... or, I don't know, it doesn't matter which structure it is; it could be this one. If I just... let's see... let's make it... of course, what is it? I'm just going to get them confused.
C
Okay, three levels will be enough, okay, right. So the way I usually think about this is: imagine that the nodes are pointing to each other; they're kind of pointers, and I draw each pointer as an arrow here.
C
So usually, when you have those structures in memory, you basically have this element at the top, okay. So how do you actually point to something here? Is there a way to point...?
C
My cursor? Well, I don't have a cursor on the iPad, because... yeah, yes. That's why I chose the iPad, because it's easier to draw, but then I don't have a cursor. Anyway, it doesn't matter. So I'm just going to say: you know, even in this top element, where I drew the two arrows from it: imagine you store this structure in memory, not in a database. What it would mean is that this top element would have two pointers, right? And the pointers...
C
...how these, I would say, naive implementations do it: instead of the pointer being an 8-byte memory address, the pointer is, in the case of Ethereum, a 32-byte hash. And in order to dereference this pointer, in order to find the thing it points to: in the case of memory, we simply ask what is at this memory address; in the case of the database representation, dereferencing the pointer means looking up that record using this hash as the key. So it's basically the same process. So that's why...
C
And then, if you remember that in a database dereferencing means doing a lookup, you realize: okay, if I go eight levels deep, I do eight lookups; if I go 40 levels deep, I do 40 lookups. And another feature is that you cannot parallelize it, because the lookups are data-dependent. You cannot hop, let's say in this picture, straight from the top element to the second level; you have to first go to the first level, then to the second level.
C
So you can't skip the levels, because you need to find this... it's like a quest, you know: you first need to find the next pointer, then the next pointer, then the next pointer. And that's why the depth really matters, and that's why, in this security audit I was talking about, it was a problem if somebody could create those spines I was pointing out... okay, where's this thing again... oh no, it's the wrong button.
C
Sorry, yeah, the spines that I was drawing here, right. You see the spine? No? Actually, no... anyway, you can't see the spines, because I somehow moved this thing around, I don't know. But the spines that I was drawing, yeah, the spines here on the right. So the problem with those spines: if you manage to construct them, then you force the tree to be very deep, and then all your queries inside that area of the tree are going to be super slow.
C
So you can do it specifically: if you don't like a certain contract, if you want that contract to become slow, you would create spines around this contract, by basically sending a transaction with one wei to those addresses, and you don't even need to do any kind of address mining. You just do that. And of course, if you now hash everything with SHA-256 or whatever before you put it in the trie, it's still possible, but it's much more resource-intensive.
C
So you'd need to buy some SHA-256 or SHA-3 ASICs to be able to pull that off. I mean, you could still probably go a few levels deep, but it's not that potent anymore.
C
And yeah, that's actually a good point: this pre-hashing, before putting things into the tree, is not a perfect protection against these unbalancing attacks, whereas the self-balancing trees are pretty much the best protection you can get, theoretically. So it's theoretically the best protection.
C
Most people basically gave up on the idea of computing the state root at each block, and that came in the form of what you know as fast sync, right? You're probably familiar with fast sync. With fast sync, essentially, instead of trying to go through the blocks starting from genesis, computing the state, executing everything and hashing it at every block, which is super slow...
C
What we do instead is choose a certain recent block, which we call the pivot, and then we download the pieces of that tree from some peers, and once we've done that, we compute the state root on that tree and verify: okay, that matches up with the header. But notice that we have skipped the computation of the root for every block before that. From then on, of course, we start computing the state root for every...
C
Something has to be true... and the reason I'm saying this is that in Turbo-Geth, for example, we sometimes skip the verification of the root. For example, if we sync on a hard drive, on an HDD, it cannot do it for every single block, but it can, for example, do it every 15 blocks, which is, I think, still fine.
C
You know, because if there's some problem, you will notice it anyway after 15 blocks. So I don't see this as a completely dogmatic thing, that we have to validate the state root after every single block. Depending on what you're doing, you might sometimes skip it, and in fact a lot of people skip it for most of the blocks when they do fast sync.
C
Okay, so, so...

D
Then my question is: now we have two different things we can do to optimize it. We can either change the way we store the state completely, or we can store a separate, secondary version of the state.
C
Okay, so what you just said, the second option: the second option is where we start building a secondary structure which is basically a kind of flat-ish store. This is what I believe go-ethereum is currently doing. They also realized that the flat representation has advantages, but they don't want to completely redesign the model, at least for now, so they basically started building this thing on the side, which is the snapshotter. That's one of the approaches.
C
But from my point of view... maybe that's the only thing they can do; I mean, it depends what situation you're in. If you want to get the best results, obviously you have to change the data model entirely, because there are a lot of things you simply cannot do if you still have the trie. I mean, basically, we're not completely... so, the thing to understand...
C
...is that we're not completely throwing away the trie as a concept, but we're throwing it away as a physical thing. Essentially, another dogma that you have to get rid of is that the trie has to have a physical embodiment; it does not.
C
It can be just a logical concept, and as long as we can produce the state root in some way that matches up with whatever the trie would produce, that's fine. You never actually have to have the trie, even in memory, and in fact we don't. We used to have a trie in memory in Turbo-Geth at some point, but now we don't even have it anymore as a physical embodiment of that thing.
C
So yeah, you can go as far as you want: you can go baby steps, but then it might take you a long time, or you can just rip everything apart and create a new thing. Of course, in our case... I mean, I didn't rip everything apart; I was discovering lots of things along the way. As we would probably plan in this whiteboarding session: we're going to start applying optimizations bit by bit, and eventually we will come to the final design.
D
We could try to go directly to the final design, but then let's talk about the problems we would have if we tried to implement it right away. So one difficulty I see is that there is one thing this naive way of storing the trie gives us, which is accessing any version of the state.
C
No, no, okay, so again, yeah. I think what we're kind of missing a bit here is that, again, in the naive implementation you're right: you can have multiple states at the same time that you hold on to, because you can point to any version of the state with just one single pointer, which in your case is the root hash. Yeah, that's fine! So you don't actually have to make a choice about which one is the current one.
C
You can say: oh, you know, I've got a few of them around, and you can use any of them. If you start using the flat model, you have to make a choice. You always have to know which one is my current state: that's my current state, and you can only have one. So now you have to do things slightly differently.
C
So if you receive multiple blocks at the same height and you need to make a choice, or you already made the choice but it was the wrong one, you have to come back and reapply the other one. So you always have to maintain your chosen state; you cannot have it, you know, in multiple ways. So what Turbo-Geth does: it actually has only one current state, and it moves it back and forth, mostly only forward.
C
The most recent blocks. So the current state is always the state that has resulted from whatever last block we have received; I mean, whatever was the best last block received, yeah.
C
Yeah, okay, so that was the case some time ago; we started with diffs, but actually now we store reverse diffs, which is a slight difference, but it makes a big impact later on. Yes, so we store diffs as we process the blocks.
C
Every time we process a block, we remember which bits we have changed, and obviously during execution we have information about what the value of everything was before the change and after the change, right? And obviously the value after the change becomes part of the new state, and the value before the change...
C
...is what we remember, right? And we remember that for every single block as we go; we call them change sets. So we have them, and that means that whenever we need to go back, what we simply do is take that change set and apply it to the state, because it contains the things that were there before the change, so essentially it rolls everything back. You could also call them...
C
...maybe rollback sets or something like that. So they basically allow you to go back, and you can also go back pretty fast, because if you want to go back, let's say, three blocks, you read all three of these things, you combine them, and you roll them back. I mean, you can do it in many different ways; you can do something like a merge sort, because they are already pre-sorted. There's quite a lot of things you can do.
C
It's a sorted map, yes. I mean, it doesn't strictly have to be sorted, but being sorted helps a lot of things down the line, because basically having them sorted allows us to do some kind of map-reduce things. You know what I mean: you can basically set up these kinds of flows of data which have to happen in a certain order.
C
Oh, actually, that's interesting. One of the things that hasn't happened yet, but let's say that if Etherscan, or whatever block explorers, started using Turbo-Geth, what they could actually do... I mean, we could introduce an RPC request, because we actually store the state as the pre-hashed version.
C
Let's say, before the address hashing is applied: we essentially store it as a mapping of addresses to the values, not of hashes of addresses. What you could also do... I don't know how Etherscan does it; let's say you have this search bar where you start typing an address, and it shows you: what are the addresses where this prefix exists in the state?
C
Right? Let's say you type in "abc" and it already shows you: these are the existing accounts starting with "abc". They probably pre-made this specific table.
C
But in our case you don't have to have a special table, because you can simply search through the state, and being sorted kind of helps for that case, for that reason.
C
We do, but we don't hash them until... so basically we only hash them after we have executed the transactions, so it's all separated: the EVM works on the unhashed state; once it's finished, then we convert unhashed to hashed, and only after that we compute the state root.
C
That's an excellent question; I know what you're going to ask, but it's an excellent question. So why do we store the state twice, right? Once unhashed, once hashed, right? Okay, that's kind of a waste, isn't it? Yes, it is a waste, and it was even more of a waste when we actually introduced this extra separation of hashed...
C
You know, the state was 50 gigs, or whatever, 50-60 gigs. So then, instead of one table of 50 gigs, now we have two tables of 50 gigs; bad, isn't it? But it wasn't actually bad, because we also had another table, a tiny bit more than that, which was pre-images. Remember I told you about pre-images? So guess what: when we introduced the separated hashed and plain state, we could get rid of the pre-images, because we didn't need them anymore; now we have all the plain state.
C
This is what... no, the thing is that I... no, no, it's okay.
C
...the screen here... it will take me a while anyway. So, okay, no, just scroll it back, so that we get past the binary tree, because I'm going to be drawing here. Look, I'm going to be drawing right here. Can you see that, under the binary tree, where I'm drawing? Okay, yeah, this will put it at the top of the screen or something; I'm going to start from there.
C
Let's say the BALANCE instruction takes the address as an argument, right, and it wants to get a balance. Or if you have, essentially, an SLOAD instruction, then it gives you two things: it gives you an address and it gives you the location, something like this, and it wants you to find a storage item, or, if it doesn't exist, it gives you zero, right. And that's basically... this is exactly how we map this in the database, so we make it easy.
C
So we map, in our database (it's the same table), addresses to very simple structures, which are basically balance, nonce, something like this, maybe code hash (we actually wanted to remove hashes from there as well, but let's say there are some sort of hashes there; it doesn't matter). And the same thing here: balance, nonce and some sort of hashes. For all these things, of course, we have some kind of concatenation.
C
There is a tiny bit of complexity here, which we called incarnation. Incarnation — I would say it's an advanced concept, but that's basically how we managed to get around CREATE2 and self-destructs, because those create a bit of trouble for a flat model. So, essentially, for the contract storage, the keys in our database consist of three components, not two. The first component is the address.
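The three-component key can be sketched as a plain concatenation — address, then incarnation, then storage location. The widths here (8-byte big-endian incarnation) are illustrative assumptions, not necessarily Turbo-Geth's exact encoding:

```python
def storage_key(address: bytes, incarnation: int, location: bytes) -> bytes:
    # key = address (20 bytes) ++ incarnation (8 bytes, big-endian) ++ location (32 bytes)
    assert len(address) == 20 and len(location) == 32
    return address + incarnation.to_bytes(8, "big") + location

addr = b"\x01" * 20
loc = b"\x00" * 31 + b"\x05"
k1 = storage_key(addr, 1, loc)
k2 = storage_key(addr, 2, loc)
# Same address and slot under a different incarnation -> a different database
# key, so a redeployed contract never sees the old incarnation's storage.
assert k1 != k2
assert len(k1) == 20 + 8 + 32
```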
C
Okay, so the incarnation is essentially required because there is some really odd thing about self-destruct, and even more odd things about the combination of self-destruct and CREATE2. Self-destruct is a very unique operation, because of what it does for a very fixed amount of gas.
C
It
allows
you
to
delete
a
lot
of
data
theoretically
like
if
you
have
a
contract
which
has
like
million
entries.
Of
course
it
doesn't
really
happen
in
practice,
because
such
contracts
usually
don't
have
self-destruct
instructions.
But
theoretically
you
can
have
a
really
large
contract
and
then
with
one
in
one
go
you
do
self-destruct
and
all
that
data
is
gone.
It's
not
visible
anymore
and
of
course,
if
you
store
everything
as
this
tree
model,
that's
easy
right.
C
You
just
remove
the
pointer
and
everything,
and
you
expect
there's
some
kind
of
garbage
collection
will
be
going
on
and
then
everything
which
is
not
referenced
by
the
pointer
will
be
collected
right.
That
was
the
still
thought,
but
for
turboget
because
we
use
the
flat
model.
It
means
that
we
can't
do
this
thing.
We
have
to
remove
it.
Genuinely
remove
data
and
that
wouldn't
work,
because
that
would
be
a
dos
attack
vector
if
you're,
requiring
to
let's
say
for
this
fixed
gas
cost,
remove
a
million
entries
from
the
database
right.
C
You
know
it
could
take
quite
a
long
time.
So
that's
why
we
decided
to
get
around
this
problem
is
to
so.
First
solution
will
be
to
just
ignore
it
because
we
said
okay,
let's,
if
do
we
do
self-destruct?
Let's
not
do
anything
with
the
storage,
let's
remove
it
at
all.
Okay,
fine,
because
because
I
thought
the
only
way
to
do
self-destruct
is
first
to
do,
create
or
create
or
create
a
contract
using,
let's
say
a
usual
transaction
to
a
non-existent
address
or
doing
some
sort
of
create.
C
But
if
you
remember
how
the
address
of
the
contract
using
create
is
calculated,
it
calculates
from
the
creators,
so
it's
basically
calculated
from
the
creator's
address,
plus
nonce
and
some
sort
of
hash
of
it.
So
essentially
the
idea
was
that
it's
not
it's
it's
theoretical
or
it's
basically
practically
not
possible
using
create,
or
this
the
old
way
of
doing
things
to
to
land
the
contract.
C
On
the
on
the
same
address
as
as
the
previous
one,
so
that's
why
the
our
strategy
of
simply
not
removing
the
storage
worked,
we
just
simply
didn't
remove
it
and
it
was
fine
until
the
crate
2
came
along
and
with
crate
2,
it
is
possible
to
land
the
contract
on
exactly
the
same
address
as
before,
and
that
is
a
problem
like
what
do
we
do
with
the
self-destruct
right
now?
Okay,
so
that
was
a
big
issue
and
that's
why
we
started
to
introduce
incarnation.
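The contrast between the two address schemes can be sketched as follows. Note the hedge: `hashlib.sha3_256` is used here only as a stand-in for Ethereum's keccak256 (which uses different padding), and the RLP encoder is just the minimal case needed for this example — swap in a real keccak256/RLP library for actual address computation.

```python
import hashlib

def keccak(data: bytes) -> bytes:
    # Stand-in: sha3_256 is NOT Ethereum's keccak256; illustrative only.
    return hashlib.sha3_256(data).digest()

def rlp_bytes(b: bytes) -> bytes:
    # Minimal RLP for short byte strings (< 56 bytes), enough for this sketch.
    if len(b) == 1 and b[0] < 0x80:
        return b
    return bytes([0x80 + len(b)]) + b

def create_address(sender: bytes, nonce: int) -> bytes:
    # CREATE: address = keccak(rlp([sender, nonce]))[12:]
    n = b"" if nonce == 0 else nonce.to_bytes((nonce.bit_length() + 7) // 8, "big")
    payload = rlp_bytes(sender) + rlp_bytes(n)
    return keccak(bytes([0xC0 + len(payload)]) + payload)[12:]

def create2_address(sender: bytes, salt: bytes, init_code: bytes) -> bytes:
    # CREATE2 (EIP-1014): keccak(0xff ++ sender ++ salt ++ keccak(init_code))[12:]
    return keccak(b"\xff" + sender + salt + keccak(init_code))[12:]

sender = b"\x01" * 20
# CREATE depends on the sender's nonce, which only moves forward...
assert create_address(sender, 1) != create_address(sender, 2)
# ...while CREATE2 is fully determined by (sender, salt, init code), so the
# same inputs land on the same address again after a self-destruct.
assert create2_address(sender, b"\x00" * 32, b"\x60\x00") == \
       create2_address(sender, b"\x00" * 32, b"\x60\x00")
```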
C
We
still
did
not
want
to
remove
the
whole
storage
straight
away,
maybe
later
during
the
pruning,
but
so,
instead
of
deleting
it,
we
wanted
to
mask
it
to
make
it
invisible,
and
this
is
how
we
introduce
why
we
introduce
incarnation.
So
every
time
we
create
a
new
contract
on
the
same
address,
we
increase,
we
increment
the
incarnation.
So
let's
say
the
first
time
a
contract
created
on
this
address.
It
has
incarnation
one.
C
So
basically,
then
we
this
way
we
separate
the
the
storage
items
of
the
different
incarnations
of
the
contract
and
we
don't
let
them
to
be
mixed
up
so
that
if
you're,
you
know,
we
basically
make
them
invisible
and
then
the
idea
is
that
when
we
implement
pruning,
which
we
haven't
done
yet,
but
it
should
be
pretty
easy,
we
will
figure
out
which
incarnations
are
dead
and
we
just
clean
them
up.
But
that's
how
we
get
around
the
issue
with
create
two
using
this
incarnation
thing
which
is
kind
of
weird.
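A tiny sketch of the masking behavior described above, with illustrative names: reads always go through the current incarnation, so items written under a dead incarnation become invisible without being deleted.

```python
storage: dict[tuple, bytes] = {}        # (address, incarnation, location) -> value
current_incarnation: dict[bytes, int] = {}

def deploy(address: bytes) -> None:
    # New contract on the same address -> bump the incarnation; old rows
    # stay in the table untouched, to be swept later by pruning.
    current_incarnation[address] = current_incarnation.get(address, 0) + 1

def sstore(address: bytes, location: bytes, value: bytes) -> None:
    storage[(address, current_incarnation[address], location)] = value

def sload(address: bytes, location: bytes) -> bytes:
    # Missing item reads as zero, per EVM semantics.
    return storage.get((address, current_incarnation[address], location), b"\x00")

addr, loc = b"\x01" * 20, b"\x00" * 32
deploy(addr)                         # incarnation 1
sstore(addr, loc, b"\x2a")
assert sload(addr, loc) == b"\x2a"
deploy(addr)                         # self-destruct + redeploy -> incarnation 2
assert sload(addr, loc) == b"\x00"   # old data is masked, not deleted
```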
C
But
it's
one
of
these
things
that
we
have
to
figure
sort
of
figure
out.
C
And
incarnation
also
crucial
when
you
do
unwind,
because
basically,
let's
say
that
if
we
didn't
do
the
incarnations
and
we
did
it
delete
all
the
things
right.
Imagine
that
you
did
delete
everything
self-destruct,
but
then
you
said
wait
a
second
I
want
to.
You
know
I
want
to
do
rework.
Can
I
just
go
back
and
then
do
the
block,
which
did
not
do
the
self-destruct?
C
So
that
means
that
if
you
do
want
to
do
that
in
the
in
the
rework,
you
have
to
reinsert
all
this
million
things
into
the
database
again.
So
that
actually
kind
of
convinced
me
that
we
we
should
not
never
delete
them.
Basically,
because
you
want
to
you,
want
to
go
back
and
you
have
to
reinsert
them,
and
then
you
might
remember,
maybe
you
haven't
seen
this,
but
basically
I
was
saying
that
the
self-destruct
was
probably
a
bit
of
a
design
flaw
in
the
evm,
because
it's
a
it's.
C
Okay
stage
root
is
computed
in
two
steps.
So
can
we
scroll
it?
Can
I
scroll
by
the
way,
let
me
see:
does
it
work?
Nope,
nope,
nope,
okay,
you
scroll
it
okay,
so
so
step
one
way,
you're
going
to
scroll
it
under.
I
want
you
to
write
under
the
self-destruct
thing.
C
Okay,
thank
you
so
first
step.
One
is
that
we
obviously
need
to
convert
the
unhashed
to
hashed,
which
is
easy
right.
It's
it's
pretty
easy.
I
mean
it
takes
a
while,
but
it's
easy,
so
I'm
not
going
to
get
into
the
details
of
that.
You
can
figure
that
out.
So
in
the
end,
we
are
we're
sorry,
okay,
so,
at
the
end,
after
the
stage
we
have
address
hash
at
one
mapping
to
something
an
address
hash
to
map
into
something
okay
to
some
values:
values.
C
Okay,
so
then,
what's
next
and
the
next,
you
want
to
compute
the
miracle
root.
Okay,
so
what
we
do
so
the
problematic
bit
about
the
flat
model
is
that
in
initially
I
didn't
even
bother
having
some
extra
data
structures
for
for
for
this
particular
operation,
because
I
would
just
say:
okay,
we
just
recomputed.
This
whole
thing
from
scratch
all
the
time,
but
obviously-
and
it
at
that
moment-
the
state
wasn't
that
big
and
it
took
about
you
know.
C
C
Okay, so that's basically the meaning of this intermediate hash: it's the hash of that subtree, right. And that's exactly why we like this explicit representation of the Merkle tree, because at any point of time we could say... okay, so let me illustrate this problem.
C
Let's,
okay,
so
let's
say
that
we
wanted
to
modify
a
bit
this
bit
right
this
bit.
Okay,
so
we
wanted
to
modify
this
bit
and,
as
you
know,
what
will
happen
is
that
will
be
some
sort
of
ripple
effect
right.
So
if
we
modify
this,
then
the
hash
of
this
whole
element
will
be
modified.
I'm
going
to
circle
it
then,
because
of
this
is
modified,
then
the
hash.
C
The
hash
of
this
is
modified
right
and
then
eventually,
the
hash
of
this
is
modified
right,
but,
as
you
can
notice,
is
that,
although
the
so
when,
let's
say
we're
computing,
the
hash
of
the
root
element,
because
the
other
hashes
are
not
modified,
we
can
just
use
them
as
they
are,
and
this
is
exactly
where
how
we
we
basically
do
the
auxiliary
structure,
so
we
store
the
mapping
of.
C
To
intermediate
hash-
and
that's
basically
it
that's
the
whole
magic
so
for
each
node
in
the
tree,
we
simply
store
the
key
prefix
mapped
to
the
intermediate
hash,
that's
it
and
that
structure
is
so
turns
out
to
be
pretty
small.
I
mean,
I
think,
now
it's
about
two
to
three
gigabytes
for
the
entire
state
and
that
taking
into
account
that
it
contains
all
the
intermediate
hashes
for
all
the
for
all
the
basically
non
degenerate
prefixes,
I'm
just
gonna
call
it
okay,
so
basically
there.
This
is
this
again.
C
This
is
not
the
magic.
Basically
is
in
the
way
that
we
are
so
it's
not
actually
again.
This
is
not
a
magic,
the
the
bit
that
where
you
need
a
lot
of
interesting
engineering,
is
that
how
do
you
update
the
structures
and
how
do
you
essentially
walk
through
them,
because
when
we
are
when
we
are
updating
the
state
route,
we
first
we
start
with
this
change,
set
which
we're
talking
about
or
rollback
set,
where
which
basically
is
the
enumeration
of
all
the
keys
that
have
changed
and
then
we're
starting
with
the
structure?
C
And
then
we
we
know
we
basically
walk
through
the
structure
and
we
we
carefully
using
some
sort
of
hash
stack.
We
just
walk
through
this
whole
tree
and
compute
compute
the
result,
so
it
is
a
kind
of
like
it
just
it
requires
a
careful
algorithmic
work.
It's
not
it's
not
like
something
out
of.
C
Ordinary,
so
basically,
because
it's
it's
it's
like
imagine
that
you
have
to
do
some
kind
of
merge,
two
two
structures
that
already
pre-sorted
right,
the
merge
swords.
C
You
know
and
one
way
to
do
it
if
you,
if
you
basically
do
it
in
memory,
then
it's
easy,
but
if
you
let's
say
do
merge
sort
of
with
the
two
huge
files
which
are
pre-sorted,
then
of
course
you
enter
you
open
two,
I
would
say
we
call
it
core
source
or
whatever
one
of
them
goes
through
the
first
file
or
iterators
go
through
and
another
one
goes
through
the
second
file
and
then
just
at
each
point.
You
compare
them
and
you
just
move
them
in
a
lock
step
with
each
other.
C
So
the
same
mechanism
here.
But
what
you
do
is
that
your
these
two
structures,
one
the
first
structure-
is
this
mapping
of
address
hashes
to
the
values,
and
the
second
structure
is
the
mapping
of
the
key
prefixes
to
intermediate
hashes.
Essentially
we
have
this
kind
of
merge
iterator
going
on
through
those
things.
F
So
so,
basically,
this
preferences
are
only
if
there's
a
note
right.
So
so
let's
say
you
have
abba
and
then
you
have
like
a
b
is
the
same
abacus
all
right.
So
then
you
will
have
prefix
a
b
yeah.
C
That's
okay,
so
yeah,
so
basically
we
don't
store
all
the
possible
prefixes,
but
only
the
prefixes
that
are
non-degenerate
that
are
that
are
kind
of
have
something
in
them,
and
I
guess
this
is
actually
this
is
this
kind
of
comes
to.
The
point
is
that
people
before
before
us
did
not
essentially
just
attempt
these
things,
because
I
think
they
should
have,
because
somehow
they
assume
that
all
these
things
will
be
super
hard
and
and
it
will
take
a
long
time,
but
if
they
tried
it
would
be
fine.
C
You
know
like
did
somebody
try
before
us
to
just
take
the
entire
state
and
mercalize
it
you
know
even
for
this,
like
you
know,
we
we
tried
it,
it
worked
and
then
people
other
people
started
trying
it
and
said:
okay,
actually,
that's
not
that
bad
and
then
and
then
they
started
trying
other
things
like.
Oh
what
happens
if
you
yeah,
when
you
happen,
when
we
do
this
prefix
mapping?
Oh
actually
it
turns
out
we're
not
that
not
that
large.
You.
C
You know, you probably thought it would be, like, tens of gigabytes. Actually, it's two gigabytes or three gigabytes, yeah.
C
Yes — that's why it took two and a half years to essentially retrofit all the functionality using this approach: because basically I broke everything and had to fix it back again, and I just kept breaking it, until basically this year, around when we did the alpha release in July, I sort of realized that, okay, we have un-broken enough — we have fixed enough of the stuff that we'd broken. So that's basically the thing that you do.
F
So
what
yeah?
So
the
question
is
like,
let's
say
you
need
to
from
rpc
perspective
return
some
historic
state.
Is
it
even
possible
in
ethereum
or
like
you,
don't
really
need
to
worry.
C
Good question — yes, that's a very good question. So there are two ways you can look at the historical state; one is, I think, more important, one is less important. You can ask questions like: okay, what was the balance of such-and-such account at such-and-such block, right?
C
That's
quite
easy
with
a
flat
structure
right.
So
you
just
look
at
the.
I
mean
you
basically,
because
if
you're,
if
you
store
the
state
in
a
flat
structure,
you
can
also
state
the
history
in
the
flat
structure.
In
our
case,
we
store
the
history
as
these
those
rollback
sets
and
then
we
index
them
and
that's
another
story.
I
think
actually,
the
data
structure
which
stores
the
history
is
more
complex
than
the
data
structure
that
stores
the
flat
state.
C
So
it's
we're
basically
making
an
index
over
the
robux
sets,
but
we
can
pretty
efficiently
figure
out.
You
know
which
block
this
certain
account
has
changed.
You
know
around
this
sort
of.
Basically,
if
you're
asking
what
was
the
balance
of
such
and
such
account
and
a
block
one
million,
then
we
go
through
our
index
and
we
find
okay.
When
was
the
this
account
changed
first,
after
the
block
one
million,
and
we
we
use
the
index
to
find
it,
and
once
we
found
it,
we
look
at
the
robux
set.
C
Let's
say
that
it
was
changed
in
the
block,
one
million
and
fifteen.
That
was
la
the
first
time
it
has
changed
since
1
million.
We
then
open
up
the
rollback
set
and
we
look
okay.
What
was
the
value
before
that
and
the
value
before
that
was
actually
the
value
at
the
block,
one
one
one
million-
and
this
is
what
we
do
for
such
queries.
It's
still
much
quicker
than
going
through
all
the
trees
right
right.
C
Okay,
so
we
have,
we
have
a
one
way
stored
indexes,
but
now
we're
actually
going
to
change
it
to
a
different
way.
So
the
way
we
change
it
is
so
index
is
essentially
a
series
of
block
numbers.
C
So
index
has
one
record
well,
logically,
one
record
per
per
account
or
per
a
storage
item,
and
imagine
that
the
value
is
basically
the
string
of
block
numbers
so
from
and
those
that
the
block
numbers
all
the
block
numbers
where
this
account
has
changed
its
value
and
you
can
pack
it
in
different
ways.
For
example,
we
we
currently
store
them
as
the
actual
numbers
and
when
split
them
into
the
chunks.
C
But
we
have
now
we're
now
moving
most
of
the
indices
into
the
bitmap
roaring
bitmap
indices,
so
essentially
just
going
to
be
a
bitmap
with
a
certain
representation,
and
it
turns
out
to
be
even
more
efficient
than
than
than
this
just
stored
as
a
sequence,
and
I
think
we're
going
to
change
that
soon
to
to
the
bitmaps,
but
even
without
bitmaps
it
works.
The
index
works
pretty
well
and
the
index
is
not
actually
that
large
either.
C
Well,
oh
yeah,
you
know
I
what
I
mean
by
not
that
large
I
mean
in
total
for
all
the
accounts
in
this
in
the
system
I
mean
he.
Let's
say
that
I
just
let
me
find
if
I
can
find
the
the
latest
data
I
mean
the
latest
that
was
in
like
september,
of
course,.
C
Okay: for accounts it was 50 gigabytes; for contract storage it was 65, 66 gigabytes; and then, on top of that, the index for accounts was 16 gigabytes and the index for storage was 40 gigabytes. So that basically allows you to go back to any account, any storage item in history, reasonably quickly, and figure out what the value was. And this is what we use for historical re-execution.
C
So
what
you
can
do
with
trooper
get
is
that
you
can
pretty
efficiently
re-execute
all
the
historical
transactions,
and
my
recent
test
was
about
38
hours.
I
mean
it
takes
just
38
hours
to
rakes,
get
everything
back
using
the
history.
I
mean
this
is
basically
where
you
don't
keep
updating
the
current
state.
You
don't
actually
do
any
rights.
You
just
basically
run
through
the
history
and
you're
re-executing.
C
Everything-
and
I
need
this
for
because
I
want
to
add
some
extra
indices
like
for
code,
traces
and
stuff,
like
that,
and
I
decided
instead
of
just
rerunning
this
whole
whole
state
calculation.
I'm
just
going
to
try
to
do
this
historical
thing
and
that's
also
quite
useful
for
other
analysis
that
people
do
it's
very
useful
to
to
be
able
to
run
this
through
in
about
couple
of
days.
You
just
get
through
all
the
historical
stuff.
F
Yeah,
that
sounds
sounds
good.
I
think
most
of
the
stuff.
We
understand
well,
okay.
I
would
probably
want
to
ask
michelle
michael
if
they
have
any
follow-up
questions
on
top
of
what
we
do,
that
might
be
unique
to
us.
F
So
one
thing
we
do
is
like
garbage
collection,
and
that
seems
like
might
help,
especially
with
indices
so
we'll
just
clean
it.
Oh.
C
Actually,
so
the
garbage
collection
is
going
to
be
trivial
in
this
case,
because
what
happens
with
the
rollback
sets,
for
example,
because
rollback
sets
are
keyed
by
the,
so
the
keys
are
the
block
numbers
so,
and
the
the
beauty
of
the
robux
sets
is
that
if
you
remove
the
older
robux
sets,
it
does
not
affect
the
functionality
of
the
newer
ones,
because
they
all
kind
of
they
never
look
back.
They
always
look
forward.
C
So
if
you,
if
you
basically
think
of
history,
if
you
think
about
history
as
represented
as
the
miracle
trees
right
in
the
miracle
trees,
the
historical
references
go
backwards
in
history
like
they,
you
know
they
go
back
back
back
and
that's
why
the
garbage
collection
becomes
quite
tricky
because,
as
you
try
to
delete
something
old,
you
have
to
first
figure
make
sure
that
this
old
thing
is
not
referenced,
some
from
something
which
is
which
is
still
current.
So
you
have
to
look
before
that.
C
However,
if
all
your
references
go
always
forward,
so
this
is
what
this
the
property
of
the
of
the
rollback
set.
Is
that
all
the
references
going
forward?
Never
back!
That's
why,
when
you
chop
something
up,
it
never
invalidates
anything
in
the
more
recent
history.
So
that's
why
the
rollback,
because
basically
becomes
deleting
a
bunch
of
records
from
a
database
and
for
the
history
indices.
C
It's
also
quite
quite
tricky
whoa.
You
still
have
to
go
through
every
record,
but
basically
you
truncate.
At
most
your
you
essentially
delete
some
records,
and
then
you
truncate
one
of
the
records
you
just
throw
a
bunch
of
numbers
out
of
it
or
if
it's
a
bitmap,
you
just
basically
truncate
the
bitmap.
So
in
all
these
cases
I
think
we
had
we
used
to
have
the
pruning
implementation,
but
we'll
just
throw
it
away
to
rewrite
it
again.
E
Yeah
one
thing
I
was
interested
in
that
we
haven't
touched
on
yet,
but
you
mentioned
in
your
talk,
was
the
concurrency
changes
that
you
had
made
where
you
were
switching
from
this,
you
had
parallel
streams
where
each
stream
was,
you
know,
entirely
processing
the
block
to
more
like
stages
where,
in
parallel,
you
were
doing
all
the
signature,
verification
and
all
the
transaction
execution.
You
know
et
cetera,.
E
Like
ten
times
faster-
and
I
was
wondering
yeah,
you
know
more
details
about
why,
specifically
that
kind
of
a
change
made
made
things
faster
and
like
what
what
what
the
parallelism
underneath
looks
like.
Okay,.
C
So
there's
there's
two
reasons
why
it
made
things
faster
kind
of
first
reason
is
that
when
we,
if
we
manage
to
to
pro
basically
like
first
of
all,
if
you
zoom
out
of
this
whole,
what
does
the
ethereum
client
actually
do
right
and
what
it
does
really
is
that
it
receives
some
new
data
from
the
network
right
that
ingests
new
blocks
or
something
or
and
then
it
basically
goes
through
the
series
of
data
transformation.
C
It
transforms
this
block,
I
mean
transforms
the
the
current
state,
so
it's
all
kind
of
data
transformations
until
you
get
to,
let's
say
to
the
data
in
the
stores
history
dating
in
the
state,
and
then
you
figure
out
the
state
route
and
so
forth.
So
it's
basically
all
data
transformation,
and
so
the
first
realization
that
the
first
reason
why
the
stage
sink
is
faster
is
because,
if
you
now
take
this
on
mass,
it's
basically
a
large
scale.
C
If
you
have
to
do
a
lot
of
transformations,
then
you
can,
and
all
these
transformations
end
up
taking
stuff
from
the
database,
doing
something
with
it
and
putting
it
back
into
the
database.
So
when
you're
actually
putting
stuff
back
into
the
database,
it
matters
quite
a
lot
in
which
order
you're
putting
it
in.
C
So
if
your,
if
your
database
is
empty,
if
your
table
is
completely
pristine
empty
as
it
what
happened
in
initial
sink,
if
you
pre-sort
every
all
the
data
with
the
sorted
level
of
keys,
then
inserting
in
in
this
way
is
about
10
times
faster
than,
if
you're,
trying
to
insert
them
in
some
kind
of
random
order,
essentially,
instead
of
using
database
as
your
sorting
algorithm,
which
is
basically
like
ins,
and
it
would
be
some
kind
of
insertion
sort
right,
basically
on
the
disk,
which
is
actually
a
bad
idea.
C
Instead,
if
you're,
if
you
did
the,
if
you
said
what
we
do
is
that
we
do
quick
sort
of
the
memory
chunks
then
put
them
into
the
files
and
do
merge
sort
out
of
the
files
and
into
the
database,
and
then
database
doesn't
have
to
do
any
sorting.
It
just
puts
the
data
in
the
disk
on
the
disk,
so
essentially
we're
utilizing
much
more
efficient,
sorting
algorithm
to
put
the
data
in
and
on
that
large
scale.
C
It
matters
quite
a
lot,
I
mean
you
would
not
think
of
this,
but
it
doesn't
matter
the
sorting
algorithm
matters
and
that's
the
first
reason,
and
the
second
reason
is
slightly
and
more
unexpected
is
that
when
you
separate
this
work
into
these
kind
of
much
more
homogeneous
stages-
and
you
start
looking
at
them
first
of
all
your
profiling
before
it
becomes
much
clearer,
you
start
seeing
things
that
you
haven't
seen
before
you
start
looking
you.
Basically,
all
your
bottlenecks
become
clearer
and
then.
C
Right now, worker threads... there is a directory in the code which is called eth/stagedsync, and it lists all the stages by name — this stage, the headers stage, and so on. I mean, we don't call them by numbers, because we sometimes insert new ones. But you can see each stage essentially has two functions: one function is "move the stage forward".
C
One block — you always see it at the points where the block is completely processed and committed, and if you're an external reader, that's what you see: very consistent snapshots. And that's actually quite important for our RPC requests and stuff like this. You can simplify things quite a lot if you have this assumption. But, yeah, the second reason is that, as you can imagine, the code organization becomes different.
C
You
know
you
have
these
much
clearer
chunks
of
code,
which
does
do
one
thing
and
do
them
well,
and
you
know
you
can
think
about.
Okay,
can
I
do
so
you
you
say
sort
of
like
you
unleash
the
the
creativity,
a
bit
more
of
people
who
working
with
this
code
because
they
can
now
it's
sort
of
the
the
problem.
Space
is
much
smaller,
you
can
simplify
things
and
you
can
profile
them
much
better.
C
You
can
like
oh
yeah
like
this
is
the
bottleneck
and
we
had
these
moments
where
lots
and
lots
of
optimizations
actually
happen
after
we
have
separated
stuff
out
because
it
becomes
basically
obvious
okay.
This
is
why
it's
slow,
or
this
is
why
it's
still
slow.
C
To explain what I meant: it started with the RPC daemon, when we split it up and decided to ship Turbo-Geth as two components already, at the beginning. So the RPC daemon was already separated, and at the moment it's running through a gRPC connection. But actually — and this is already working, we just don't advertise it yet — if you still want sort of the old way of running everything on the same machine through the same database, you can still do that, using shared memory.
C
It
works,
but
we
as
I
say
we
don't
advertise
it
yet
because
we
need
to
fix
some
other.
We
need
to
make
sure
that
migrations
happen
in
exclusive
mode,
so
and
so
forth.
C
So
there's
some
technical
details,
but
now
we
have
an
appetite
for
more
component
separations
because
actually
I
think
this
is
important
and
I
think
it
would
be
crucial
for
this
whole
project
to
grow
and
become
stronger,
because
I
think
the
the
sort
of
the
death
of
other
projects-
or
maybe
all
certain
like-
not
not
a
death,
but
a
considerable
difficulties
of
other
projects
were
that
over
time
you
know
the
the
the
development
team
cannot
grow
very
fast
and
so
over
time
the
the
team
becomes
overburdened
by
all
the
feature,
requests
and
all
the
stuff,
because
it's
been
pulled
into
all
sorts
of
different
directions
and
the
core
team
which
still
understands
the
code
cannot
basically
satisfy
this
growing
demand
and,
like
oh,
we
need
this.
C
We
need
that
and
we
need
to
fix
that,
and
we
need
to
fix
this.
So
I'm
expecting
that
by
splitting
things
out
and
actually
making
other
people
code
owners.
Let's
say
one
of
the
good
examples
would
be
transaction
pool.
We
want
to
split
it
out
and
make
other
people
code
owners
so
that
they
can
run
their
own
releases.
They
can
so
we
will
have
interface.
We
will
agree
on
interface,
we
will
run
the
the
core
part
and
they
will
be
running.
C
You
know
they
will
be
developing
transaction
pool
and
it
will
be
compatible
and
at
some
point
I
think,
when
we
fill
the
functionality
gaps.
I
would
like
the
rpc
demon
to
also
be
a
separate
project
run
by
different
people
and
then.
C
Exactly, exactly. It's only natural that this should happen if we want the system to grow. Because, okay, of course it becomes more complex for the users: okay, now it consists of, like, six components — what do I do? I need to run a sentry, a transaction pool daemon, this daemon, that daemon — like, here are six different things.
C
How
do
they
talk
to
each
other
yeah
that
becomes
more
complex
but
like
you,
then
there
will
be
another
tier
of
people
who
will
be
whose
job
will
be
to
package
this.
All
together,
like
that
node,
for
example,
they
would
they
already
made
the
package
of
trooper
yet,
but
I
expect
them
in
the
future
to
make
packages
of
selected
bits
that
people
want
like.
Do
you
want
the
component
with
these
things
or
like
the
the
packaging.
C
Your
needs,
it's
like
it's
like
a
different
linux,
distributions
and
stuff
like
this.
So
there
are
people
who
are
specializing
on
providing
certain
distribution
with
certain
flavors
and-
and
I
think,
that's
kind
of
a
sign
of
the
healthy
growth
of
the
system,
where
you
have
different
tiers
of
people
doing
different
things
like
not
necessarily
like
core
developers
doing
everything.
E
Yeah
yeah
for
sure,
okay,
cool
yeah,
I
think
at
near.
Actually
we
do
have
a
pretty
good
separation
of
components
into
different
modules
and
in
the
code
anyways
it
all
ultimately
gets
compiled
up
into
the
same
monolithic
binary
at
the
end.
But
like.
C
Yeah, but it's not an implementation detail, because it's much more. Basically, I was explaining why it's very hard to get changes into some of the implementations — pretty much all of them — and it's because people feel ownership.
C
They
don't
want
to
just
you
know
they
need
to
personally
verify
a
lot
of
things
like
most
of
the
things
to
make
sure
that
if
something
bad
is
you
know
it's
not
going
to
affect
the
quality
of
the
product,
because
they're
going
to
be
responsible
for
the
result
in,
in
any
case,
not
the
people
that
you
know,
you
can't
blame
the
people
that
you
accepted
contributions
from
for
all
these
things.
Yeah,
of
course
I
mean
you,
I
mean
you
can,
but
it
doesn't.
You
know
it
doesn't.
E
Yeah — and I guess the other advantage, right, is that when you have separate binaries, you can have entirely distinct release cycles. Like, if you're all compiling up to the same binary, then you're all on the same release schedule; ultimately, at the end of the day, you cut one release and, you know, who knows what features got in there.
C
Exactly
and
then
you're
you
have
to
be
much
more
disciplined
about
the
versioning
of
the
interfaces
you
have
to
be.
You
can
write
in
different
languages.
Of
course
you
know
if
you,
if
your
interfaces
are
and
that's
actually
already
happening.
As
I
pointed
out
my
presentation,
we
already
have
two
implementations
of,
I
mean
they're,
not
production
ready,
but
we
have
two
implementations
of
p2p
sentries,
one
in
rust,
one
in
go.
C
So I think that's... this is the future, right.
F
Cool
well,
thank
you
so
much.
I
think,
out
of
time,
yeah.
C
Right,
no,
it
was
a
pleasure,
so
I
I
hope
that
it
was
useful
and
yeah.