From YouTube: DHT Routing Table Health - Guillaume Michel
Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
The DHT is one of the principal components of libp2p and IPFS for finding content, so we've studied how it works in practice. We've measured it, and that's what we're going to dive into.
So first, let me give you a quick introduction to what the DHT — a distributed hash table — is. It is basically a computer overlay network, which means that nodes are connected to the same network using the internet, and the nodes participating in this DHT are connected to each other on top of the internet. The abstraction that the DHT provides is a distributed key-value store, where you can find the location of some content according to the content itself.
All of the data is stored in a flat key space, which means that if some files have similar names or other similarities, there will be no similarity in the locations where they are actually stored in the DHT.
And why do we have a DHT? What are the trade-offs? If we have n nodes in a network, we could just keep track of every single node, so that we can reach all of them directly. That would be fast, but it doesn't really scale, because we would have to keep a very large state. Alternatively, we could keep track of only a small number of peers, but then, if we want to look up some content in the network, we have to iterate through all of the peers, and that's not efficient. So the trade-off is to keep track of log(n) peers, which allows us to do lookups with O(log n) complexity — meaning we need log(n) hops in order to reach the content.
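As a rough sketch of that trade-off (back-of-the-envelope numbers only, not the actual libp2p accounting):

```python
import math

def routing_tradeoff(n):
    """Compare routing state vs lookup hops for a network of n nodes.

    Returns (state, hops) for three strategies: full mesh,
    a single known neighbor, and Kademlia-style log(n) state.
    """
    return {
        "full_mesh": (n - 1, 1),                 # know everyone: O(n) state, 1 hop
        "minimal":   (1, n),                     # know one peer: O(1) state, O(n) hops
        "kademlia":  (math.ceil(math.log2(n)),   # O(log n) state...
                      math.ceil(math.log2(n))),  # ...and O(log n) hops
    }

for n in (1_000, 1_000_000):
    print(n, routing_tradeoff(n))
```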
Now some specifics: IPFS and libp2p implement the Kademlia DHT. We use a key space of 256 bits. Each node in the network has its own peer ID — whose format you may already have seen — and it can also be represented as 256 bits, so it lives in this key space, and the content also lives in the same key space as the peer IDs.
The locality, let's say, in this key space is based on the XOR distance between the bits. The DHT itself in IPFS doesn't store content — except in the case of IPNS — but it stores pointers to the content. That means that if I want to publish a file to the DHT, or to IPFS, the file is going to stay on my machine and I'm just going to advertise that I have this file.
So if someone is looking for this file, they come and reach out to me. And the way the state — how we keep track of the peers — is organized is that each peer sorts the other peers it knows by logarithmic XOR distance and stores them in what we call k-buckets. I'm going to focus a bit more on this.
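The XOR metric and the sort-into-buckets-by-shared-prefix idea can be sketched like this (a toy 4-bit example, like the tree discussed next, rather than the real 256-bit key space):

```python
def xor_distance(a, b):
    """Kademlia distance between two IDs: bitwise XOR, read as an integer."""
    return a ^ b

def bucket_index(me, other, bits=4):
    """Bucket index = length of the shared bit prefix, i.e. bits minus the
    position of the highest set bit of the XOR distance."""
    d = xor_distance(me, other)
    assert d != 0, "a node does not bucket itself"
    return bits - d.bit_length()

me = 0b1101
for p in (0b0010, 0b1000, 0b1111, 0b1100):
    print(f"{p:04b} -> distance {xor_distance(me, p):04b}, bucket {bucket_index(me, p)}")
```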
Now, if we take an example: this is an existing network — let's say it's Kademlia — and each of the peers is represented there; the peer IDs are just bit strings. Just for convenience, an identity is not 256 bits but just four bits, but you get the idea. Then we have one new peer that wants to join the network — that's the one in green — and what it needs to join the network is a bootstrap node.
It means that if you want to join the club, you have to know someone; otherwise you cannot join the network, which is obvious. So you need to know at least someone. What we can represent here is the tree with the XOR distance — I'm going to get to this later — but in order to get to know some more peers in the network, the green node, the new one, needs to ask the peers it knows: okay, can you give me some more peers?
In this case it can ask 1100, which is closer to the green node, and then, iteratively, the new node is going to search for itself — ask all of the peers it knows: give me some peers that are close to me. Eventually it will get to know some new peers, and that's how a node fills in its routing table: it keeps track of all of the visited peers.
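The join procedure just described — iteratively asking the closest known peers for peers close to your own ID — might be sketched as follows (a simplified synchronous toy; `neighbors` is a stand-in for the network RPC, and real libp2p lookups run concurrently):

```python
def self_lookup(my_id, bootstrap, neighbors, k=3):
    """Iteratively query the closest known peers for peers close to my_id.

    `neighbors` maps a peer ID to the peers that peer knows (it stands in
    for the "give me peers close to this key" network request).
    Returns all discovered peers, sorted by XOR distance to my_id.
    """
    known = set(bootstrap)
    queried = set()
    while True:
        # pick the closest not-yet-queried peers
        candidates = sorted(known - queried, key=lambda p: p ^ my_id)[:k]
        if not candidates:
            return sorted(known, key=lambda p: p ^ my_id)
        for peer in candidates:
            queried.add(peer)
            known.update(neighbors.get(peer, []))

# toy network with 4-bit IDs, as in the example above
net = {0b0001: [0b1100, 0b0010], 0b1100: [0b1110, 0b1111], 0b1110: [0b1111]}
print([f"{p:04b}" for p in self_lookup(0b1101, [0b0001], net)])
```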
Once the node has joined the network, it will periodically look itself up in order to keep its routing table up to date: a node has to know all of the peer IDs that are in its close neighborhood — here, those would be the nodes around it, shown in orange. When you perform a request for some content or for some peers, it happens in the same key space: the request is iterative and is resolved similarly, and the requests happen concurrently, which means that you don't need to wait for the answer of a single peer.
When you publish a provider record — a pointer to the content you are hosting — in the DHT, you push it to 20 peers: the 20 closest peers to the content identifier. And each bucket — I'm going to come back to buckets later — is capped at 20 peers. Please note that these three constants — the number of closest peers to the requested key, the replication index, and the bucket capacity — have the same value in IPFS, but they don't necessarily need to be the same value.
Those are three distinct constants. So now, if we come back to the key space and the k-buckets: as we said, a node learns about other peers and sorts them into buckets. We give IDs to the buckets just to name them, and bucket number zero is for the peers that are far away, in XOR distance, from the node itself.
Those are the peers that share a common prefix of zero bits. Here we'll take the same node as a reference node, and the nodes that are far away from it — if our reference node were to learn about them, it would put them in bucket number zero, because they are far away. Then bucket number one: those are the peers that are closer and that share a prefix of length one, and it would place them into bucket number one.
Then we go on — bucket number two, number three — and as we get closer and closer, there are going to be fewer candidates. So now imagine you have a tree of depth not four, but 256. The buckets far from you are going to be full, because there are so many candidates to fall into those buckets, but the buckets close to you are going to be totally empty, because there's just no one that shares that prefix. That's roughly what you would expect.
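The expected halving of candidates per bucket can be checked with a quick simulation (a toy sketch using random 16-bit IDs rather than real 256-bit peer IDs):

```python
import random
from collections import Counter

random.seed(0)
BITS = 16
me = random.getrandbits(BITS)
others = [random.getrandbits(BITS) for _ in range(10_000)]

counts = Counter()
for p in others:
    d = me ^ p
    if d == 0:
        continue
    counts[BITS - d.bit_length()] += 1   # shared-prefix length = bucket index

# roughly half the network lands in bucket 0, a quarter in bucket 1, ...
for b in range(8):
    print(f"bucket {b}: {counts[b]} candidates")
```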
Something that is also important in Kademlia is how you select the peers. It's simple: when you learn about a new peer that you didn't know before, you just compute which bucket it would belong to. If there is some space left in it, then you just put it there, but if the bucket is already full, you just don't take it.
It goes on a waiting list, and then, regularly, each peer will probe all of the peers in its routing table; if they fail to reply, they get evicted, and that's how new peers can get into the routing tables.
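That insert-or-evict rule might look roughly like this (a hypothetical sketch, not the actual go-libp2p routing-table code):

```python
K = 20  # bucket capacity used by IPFS

def try_add(bucket, peer, is_alive):
    """Add `peer` to `bucket` (a list of at most K peers).

    If the bucket is full, the newcomer is only admitted when probing
    reveals a dead incumbent to evict; otherwise it is dropped.
    """
    if peer in bucket:
        return True
    if len(bucket) < K:
        bucket.append(peer)
        return True
    for incumbent in list(bucket):        # periodic liveness probe
        if not is_alive(incumbent):
            bucket.remove(incumbent)      # evict the unreachable peer
            bucket.append(peer)
            return True
    return False                          # full of live peers: newcomer waits

bucket = list(range(20))
print(try_add(bucket, 99, is_alive=lambda p: p != 7))  # peer 7 is dead -> evicted
```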
Eventually, only the very stable nodes are going to be in there, because the ones that are not stable get evicted. So we performed some measurements. We used the Nebula crawler to crawl the network; it has a technique to get the routing table of every single peer in the network, which is pretty awesome. We took some data earlier in April this year.
So we can reconstruct the k-buckets for every single peer, and, as we have global knowledge of the network, we can verify whether some peers should belong to some buckets but are missing from them. Also, one of the properties of Kademlia is that the routing is sound only if you know all of your closest neighbors and your closest neighbors know you — because if your closest neighbors in XOR distance do not know you, then you cannot expect anyone to route to you.
If nobody knows you, you're not part of the network, and that's what we wanted to verify. The first graph we have displays the ratio of unreachable peers. We took all the routing tables and tried to probe each one of the peers, and that's the ratio of peers that were in a routing table but were not reachable at the time of the crawl — entries that are still in the routing table but are useless; we don't really want them. And it's about 0.16, which is very low.
It means that, out of the 20 closest peers, only a couple of peers are missing, and we were expecting much more. We can also see a difference between the buckets with a low ID — bucket zero to, let's say, eight — and buckets nine and up: buckets 0 to 8 are the buckets that are full.
They have many candidates, so they will easily replace the unreachable peers; the others may keep dead peers for longer, because they don't have many candidates to join the bucket, and it takes more time to clean up. Second result: the peer distribution in the buckets. On the x-axis you have the bucket ID, meaning that buckets zero, one, and so on are far from you, and the buckets up to bucket 20 are close to you. We can see that the buckets up to number eight are almost always full, which means there are a lot of candidates there — they are capped; there would be more otherwise — and, as we go down, we can see the exponential decrease.
That is expected because, as we said before, we expect the number of candidates to be divided by two each time we increase the bucket ID. In blue is the number of peers per bucket that we observed in the routing tables of the peers, and in orange are the missing peers — the peers that were not in a given routing table but that we observed in the network.
So, as I said, you have to know your neighbors and your neighbors have to know you in order for the routing to work. The result we got is that 61 percent of the peers know all of their 20 closest peers, which is excellent, and 95 percent of the peers know at least 18 out of their 20 closest peers. And it is not a critical failure if you are missing one of the peers, because, in order to be discoverable, you need at least one peer in your close neighborhood to know you. So those are, again, very good stats. In fact, we would not even expect a node to know 20 peers, because when you look up yourself, you're going to get the 20 closest peers to your own identity, and you're going to be included in that set yourself. So in fact we expect each node to know no more than its 19 closest peers, plus itself.
So yeah, that's the good news we have. Then, something a little bit more concerning is the diversity in the buckets — meaning the total number of distinct peer IDs in, let's say, all of the buckets number zero.
If you take the bucket number zero of all of the peers and count the number of distinct peers inside this union of buckets, it tends to decrease over time, which is a bit bad for load balancing and decentralization, because it means that if you want to reach a peer that is on the other side of the network, only a small set of peers is going to take all of those requests.
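Measuring that diversity amounts to taking the union of a given bucket across all crawled routing tables and counting distinct peer IDs — roughly like this (the data shapes here are hypothetical, not the Nebula crawler's actual output format):

```python
def bucket_diversity(routing_tables, bucket_id):
    """Count distinct peer IDs found in bucket `bucket_id` across all
    crawled peers. Each table maps a bucket ID to a list of peer IDs."""
    distinct = set()
    for table in routing_tables:
        distinct.update(table.get(bucket_id, []))
    return len(distinct)

tables = [
    {0: ["p1", "p2"], 1: ["p3"]},
    {0: ["p2", "p4"], 1: ["p5"]},
]
print(bucket_diversity(tables, 0))   # -> 3 (p1, p2, p4)
```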
So we also did some measurements on that, using data from the first measurement. We can see that the number of peers was approximately constant. Let me explain this graph: when we take the bucket number zero for all of the peers and count the distinct number of peer IDs in there, we reach these numbers. We counted this for each week, from week number zero to week number nine.
That was the whole duration of the study. What we can see is that the most diverse bucket is bucket number nine — which, if we recall, was the first bucket not to be full — and we would expect to have more diversity in the buckets that are full, right, because there are more peers there.
But the thing is that, over time, as we tend to keep the very stable peers inside the buckets with a lot of candidates, the very stable ones are going to end up in every peer's buckets number zero and one. So over time we get rid of the unstable peers and only accept the stable ones, which is good, because now you have a stable network.
But then it becomes centralized, because all of these very stable peers start to get lots of requests, and so it doesn't scale anymore. What we can see here is that the diversity is decreasing over time. It will probably stop at some point, but the worst case would be that bucket number zero becomes fully centralized.
We would have only 40 different peers — 20 for each half of the key space — and that would be just terrible, but we're not there yet. And that's the moving average of the difference: we can see that the diversity in bucket number nine, in green, is always higher than in buckets zero and one, but the diversity is decreasing much faster in the buckets with a low ID than in the buckets with a higher ID.
Another improvement that we could make: currently, when an IPFS node leaves the network — you shut down your computer, or you just kill the IPFS or Kubo process — you flush everything, and when you start the computer again you have to insert yourself into the DHT again, do all of the lookups again, and get to know some other peers.
What could be good to fight centralization, and to make the bootstrap process faster, is to keep state: when you shut down your node, you just write your routing table, or the peer list, to disk, and when you boot up, you try to contact the same peers again, so that you converge much faster on boot. You also avoid getting those central bootstrap nodes directly in your routing table — they are the most popular and stable ones — so they get fewer requests, and we get faster convergence.
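A minimal sketch of that persistence idea (the file name and JSON format here are made up for illustration; Kubo's actual mechanism may differ):

```python
import json
from pathlib import Path

STATE_FILE = Path("routing_table.json")   # hypothetical location

def save_peers(peers):
    """On shutdown: dump the current routing-table peer IDs to disk."""
    STATE_FILE.write_text(json.dumps(sorted(peers)))

def load_peers(bootstrap):
    """On boot: prefer previously known peers; fall back to bootstrap nodes."""
    if STATE_FILE.exists():
        saved = json.loads(STATE_FILE.read_text())
        if saved:
            return saved
    return list(bootstrap)

save_peers(["peerA", "peerB"])
print(load_peers(["bootstrap1"]))   # -> ['peerA', 'peerB']
```

Re-dialing the saved peers first means the central bootstrap nodes are only contacted when the saved state is empty or stale.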
And now I'm going to give an explanation of this graph. It's actually a very new graph — I think we got it two days ago. The explanation we have for why we can see so many clusters and so many colors — always in groups of powers of two — is this: the largest empty space we get is between the left and the right, so those are two groups, and then we can see another big line between the top and the bottom.
So that's another line, and then we can split each one of these quarters — that's the colors — and inside each of the colors we can clearly see that there are two distinct groups, and sometimes you can even see four distinct groups within a color; each subgroup of a color has again two subgroups. The way we can explain this is that the larger, let's say, empty space means that, for instance, on the left — if we take it from the bottom, let's take the two same reference nodes again — these nodes are going to be closer together, so they will always be connected together and form a cluster. And then, when we look at the closest nodes to them, it's going to be those two, and it is mutual: those two are going to be the closest to this one, so they're going to form a cluster. Yep, go ahead.
We can place it according to the color. Let's say, for instance, we define some arbitrary prefix: the left is going to be zero and the right is going to be one, okay, and then the upper left is maybe zero-zero, and the lower left is going to be zero-one. Okay, so there are three bits plotted.
Audience: What do you mean, the origin? I see a bunch of dots on the screen with different colors, and some of them are farther from the origin of the plot than others. So what determines whether a node is plotted farther from or closer to the origin — the numerical value associated with the peer ID, or what?
I can't say exactly for the location — why a node is on the left or the right, or more on the outside or the inside — the explanation is more for the clusters: how nodes are grouped together.

So, to conclude this presentation: we saw that the DHT is in very good shape. We know all of the peers that are close to us, and there is a really low rate of dead peers — stale entries — inside the network. The distribution inside the k-buckets is as expected: we can see the exponential decrease as the bucket index increases. The only concerning thing, maybe, is that we lose diversity over time, but there are some solutions to tackle this, and it's not critical at the moment. All right, so that was it.