IPFS þing 2022 - Content Routing 1: Performance, 10 Aug 2022

From YouTube: Decentralization Paths for Network Indexers - @jbenet - Content Routing 1: Performance

Description

Decentralization Paths for Network Indexers - presented by @jbenet at IPFS þing 2022 - Content Routing 1: Performance - https://2022.ipfs-thing.io

A

I'm going to talk about decentralization paths for network indexers. This follows some talks about the network indexers in general. The current architecture is a centralized system, so the network indexers that we're describing come in as one additional content routing system for IPFS networks. There are many existing content routing systems out there. This is probably not even a full mapping of them; these are just the most widely used ones. The public DHT is by far the most common content routing system.

A

There are also automatic local area network DHTs that form. Kubo and a couple of other implementations do this: if nodes are disconnected from the rest of the world, they will form automatic DHTs in their LANs and so on. So those already exist. Many private networks exist today that run their own separate DHTs.

A

There are the Hydra boosters that you heard about, and there have been other talks about them in the past. They come in and support the public DHT, so that's not a separate content routing system, but a subsystem that ends up helping the public DHT.

A

There's Bitswap content routing too. There are frequently proposed ways of enhancing Bitswap to sort of hack content routing, in a sense, where you can use wants and so on within Bitswap itself to optimistically end up in a spot where you're likely to already be connected to peers that have the content, or they're connected to somebody who has the content.

A

So there are ways to hack that, and many people have tried things like that. And then there's just local DNS and mDNS discovery. Many IPFS implementations have ways of connecting in local area networks using standard protocols, and once nodes find each other and connect to each other, they'll automatically Bitswap content to each other.

A

So there's this implicit content routing discovery loop, where nodes that happen to be in the same network will happen to connect to each other, and once they're connected, content requests will just work. There's an implicit content routing system there; it's not an explicit thing that you enable. Now, the storetheindex indexer you just heard about is just one additional content routing system. There are going to be many more over time.

A

The community is going to experiment with a lot of tools, and I'm just going to go through a few. By the way, all these systems are optional; different groups connect to different systems, and they have different reasons for using them. I think we're still working on how to build composability into this: how could you compose these into a thing that just works for most use cases?

A

It's not clear that there is a just-works solution, because there are a lot of security and authentication considerations in terms of where you want to ship your queries to. As we'll see tomorrow, there are a lot of privacy constraints here. Nevertheless, when shifting a lot of content into one indexer that is currently centralized, it would be good to describe the paths that we're considering for turning this into a decentralized network.

A

One other thing worth noting: the amount of data that storetheindex is storing now just can't go into the DHT. It's an enormous amount of data, so it's either spam the DHT and wreck it, or do this. So currently these are five different ideas, or five different paths. There are many more potential things, but this is just to give you a sense of different directions.

A

One very simple thing that we can do is follow a model similar to the gateway model, where there are just a lot of different indexers and there's a registry of indexers. If there are fewer than about 10 indexers and they're big, this can work fairly well; networks of around 10 different organizations tend to work reasonably well. It's not great, it's not an ideal solution, but it can work for some period of time. I wouldn't take any one of these solutions as permanent.
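A minimal sketch of the registry-of-indexers idea, assuming each indexer can be modelled as a function from a CID to a set of provider IDs. The names `find_providers`, `table_indexer`, and the sample registry are hypothetical illustrations and do not correspond to any real IPFS or indexer API.

```python
from typing import Callable, Dict, List, Set

# Hypothetical model: an indexer is any callable mapping a CID string
# to the set of provider IDs it knows for that CID.
Indexer = Callable[[str], Set[str]]


def find_providers(registry: Dict[str, Indexer], cid: str) -> Dict[str, List[str]]:
    """Query every indexer in the registry and merge the answers.

    Returns provider ID -> sorted list of indexer names that reported it.
    """
    merged: Dict[str, List[str]] = {}
    for name, lookup in registry.items():
        try:
            providers = lookup(cid)
        except Exception:
            # One unreachable indexer should not fail the whole lookup.
            continue
        for provider in providers:
            merged.setdefault(provider, []).append(name)
    return {p: sorted(names) for p, names in merged.items()}


def table_indexer(table: Dict[str, Set[str]]) -> Indexer:
    """Build a toy in-memory indexer from a CID -> providers table."""
    return lambda cid: table.get(cid, set())


def broken_indexer(cid: str) -> Set[str]:
    """Toy indexer that is always unreachable."""
    raise ConnectionError("indexer unreachable")


registry = {
    "indexer-a": table_indexer({"cid1": {"peerX", "peerY"}}),
    "indexer-b": table_indexer({"cid1": {"peerY"}, "cid2": {"peerZ"}}),
    "indexer-c": broken_indexer,
}
result = find_providers(registry, "cid1")
```

Recording which indexers reported each provider keeps the client free to weigh answers differently per indexer, which matters when only a handful of organizations run them.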

A

I would think of all these as transient systems, and so federated separate systems could work fairly well. The gateways are doing that today and they're working quite well. In the past, people were considering building one gateway out of multiple gateways, but that would have meant integrating them. So that's the second idea: could you take multiple different organizations that together work on maintaining one system, for something like the gateways?

A

I probably wouldn't recommend that for gateways, because with the gateways problem performance is a huge constraint and the variance of applications is very high, so you end up with different gateways that tune for different use cases. But that's not the case with indexers; indexers all face basically the same problem.

A

We could take a path similar to the drand network. The drand network is a blockchain randomness beacon network that emits a new bit of randomness every 30 seconds. No, every 10 seconds now? Every three seconds now.

A

Cool. So the drand network has a... here, I'll share.

A

So that you can look at something. The drand network is a distributed randomness beacon. It's built with the participation of many organizations.

A

There is one set of large deployments of the drand system called the League of Entropy. A number of organizations together run the League of Entropy, and that deployment has multiple different networks that you can connect to that emit randomness at different rates. There are some security properties embedded in this protocol, with some sort of trust threshold.

A

This is a threshold cryptography protocol, and as long as you trust enough participants and expect them not to cheat, this matches your security model and it can work. Something like that can work for network indexers as well: you could have an integrated system that works kind of like drand, with multiple different entities working together and maybe each being assigned portions of the index. So this might be sharding across parts of the key space and so on.
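One way to picture "assigning portions of the index" is a fixed split of the hash key space across operators. The sketch below is an assumption for illustration only: `shard_owners`, the operator names, and the choice of SHA-256 with equal contiguous ranges are hypothetical, and this is not how drand or any existing indexer assigns work.

```python
import hashlib
from typing import List


def shard_owners(key: bytes, operators: List[str], replicas: int = 2) -> List[str]:
    """Return the operators responsible for `key`.

    The SHA-256 key space is split into len(operators) equal contiguous
    ranges; the key's range picks a primary operator, and the next
    `replicas - 1` operators in list order hold copies.
    """
    if not operators:
        raise ValueError("need at least one operator")
    replicas = min(replicas, len(operators))
    digest = hashlib.sha256(key).digest()
    position = int.from_bytes(digest[:8], "big")   # value in [0, 2**64)
    primary = (position * len(operators)) >> 64    # index in [0, len(operators))
    return [operators[(primary + i) % len(operators)] for i in range(replicas)]


operators = ["org-a", "org-b", "org-c", "org-d"]
owners = shard_owners(b"example-multihash", operators)
```

Holding each shard at more than one operator is what would let participants cross-check each other, in the spirit of the trust threshold described above.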

A

The problem with that is that it will quickly turn into needing some way of showing whether entities are working well or working against the protocol. You'll need to start defining what defecting from the protocol looks like, and then start programming against it.

A

We could also explore traditional peer-to-peer techniques, similar to the DHT model. There are a lot of past protocol designs for things like this, so we could find a distributed peer-to-peer indexing system that is meant to run in this kind of trustless but working-well-enough structure, and arrive at a good solution.

A

The problem is that we've looked at a lot of these kinds of things over time, and there are promising directions, but it's a pretty hard problem: you need to meet the scale requirements and the performance requirements, knowing that these indexers will become very large.

A

They will end up as large routers in a small number of locations, so it's fairly difficult to guarantee that this will work really well. There are other directions dealing with blockchains. Once you have smart contracts and mechanism design, you can do verifier networks, where you use blockchains to commit resources against running an indexer. You can do things like check that participation is continuing to work well.

A

You can reward participants for doing this, and this is the kind of thing that can yield the hardware build-out that I was talking about earlier. You can get to content routers that are massive scale, localized, and work really well, if we have a proper economic model for how you run these things. That's one of the other things that might make structured peer-to-peer not work.

A

Once you deal with massive amounts of traffic, it costs a lot of money. So if you want to handle massive amounts of traffic with high availability, you need an economic model to support that, especially if you want to go to regions all over the world.

A

So you could do this as an L2 chain. You could also take Eudico, the hierarchical consensus tool, and just create an entire L1 chain just for indexing. You could start it today without economic structures, see how this thing might work, and then over time evolve it to have some incentive structures. So these are five possible paths. All of these have varying degrees of complexity and varying degrees of utility over time.

A

It's a good time to discuss these things, because we could take many different paths here, or we could take any one of these paths. So if people have opinions about the direction we should go, definitely come talk to the network indexer team. Cool, that's it for that. Any questions on this?

B

C

So as we get bigger and bigger.

B

Great place for us to be at that point in time with all those indexers.

D

Have you thought about how you're preventing basically every single index.

A

Yeah, it would depend on the structure here. What exactly do you do? What topology do you use? Are these full replicas of the index everywhere, or do you do some of that regioning that I was describing earlier today, where you split things up such that indexers in particular regions get a different distribution of content?

A

It's likely that you have one content set that you want to be available everywhere in the world with extremely low latency, and then a different content set where you're okay with that content being available in a set of regions, and you're willing to pay for it to be super low latency there, but you're okay with 100 milliseconds or one second of latency from everywhere else in the world. So it depends on the latency requirements that you need

A

For that content. Once you start pushing into petabyte-scale indices, the user's goals really matter.

A

Also, if you have a massive content index inside of a data center, and it's just content that's relevant in that data center and not really outside, you don't want those applications to have to come up with some economic model to move all of that data out into the rest of the world and replicate it everywhere else.

A

C

Is there a solution you prefer as more relevant, specifically for web users?

A

Like some of these approaches mapping better to specific use cases that people have?

A

I think for web use cases you certainly don't need the intra-data-center content routers, right? With web users, you definitely want content to be accessible, at least some fraction of it globally and a large fraction of it regionally, with very low latency.

A

So I think you might be able to scale one single index and replicate it everywhere, but I think that would get into the scale of

A

I don't know, five to ten petabytes of content, and so you would need five to ten petabytes of index everywhere, and that's fairly big. If it was a hundred terabytes, 100 terabytes for a petabyte, that's easy to replicate around the world. So that one order of magnitude is kind of annoying: you go from a box this big to a whole rack. That difference makes the scale go from

A

One person can go and set something up in one data center or one internet exchange, to no, you need a whole team to operate this thing long term. So I think the more we can end up with one box replicated everywhere, the better. That's something you can get ten thousand copies of, or like a thousand copies of, no problem. Yeah, a thousand, probably, maybe.

B

A

A

Other questions.

B

D

Is it on the planet, but in the first model.

D

Different entities could go and set up an indexer node, and by that they can observe requests coming in. At the same time they act as a retrieval provider, where they have a big enough cache to cache the most popular content, and that's how they make money out of the most popular content in their region.

B

So they have an incentive.

D

To actually keep an up-to-date indexer.

A

Yeah, you could. A problem there is that you might get into a conflict of interest, where they're supposed to run this indexer and return all providers, but if they're making money on the retrieval side, they might start showing only that one. It's kind of like when a search engine also runs the shopping tool, and it's like, oh, check out these results, click here, and don't look at these other lower-priced items, right?

A

That's the issue. I think you can turn it into a thing that could still work well. If you have a good guarantee about all the providers being returned and the right metrics being gathered, then I think you can get there.

A

On the internet today there are many systems that fall into this, and we have found good ways of handling it. But I do think that, given that we have mechanism design and blockchains already, and they're well deployed, just use those to verify that operation is correct.
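The "guarantee about all the providers being returned" could be spot-checked without any blockchain by comparing answers from several indexers for the same CID. This is only a sketch under that assumption; `possible_omissions` and the sample data are hypothetical, and a missing provider may simply mean an indexer has not ingested an announcement yet rather than that it is hiding competitors.

```python
from typing import Dict, Set


def possible_omissions(responses: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Compare per-indexer provider sets for one CID.

    `responses` maps indexer name -> providers it returned. The result maps
    each indexer to the providers that some other indexer returned but it
    did not. Indexers with nothing missing are left out of the report.
    """
    everyone: Set[str] = set()
    for providers in responses.values():
        everyone |= providers
    report: Dict[str, Set[str]] = {}
    for name, providers in responses.items():
        missing = everyone - providers
        if missing:
            report[name] = missing
    return report


responses = {
    "indexer-a": {"peerX", "peerY", "peerZ"},
    "indexer-b": {"peerX", "peerY", "peerZ"},
    "indexer-c": {"peerZ"},  # also a retrieval provider, only lists itself
}
report = possible_omissions(responses)
```

A report like this is a signal to investigate, not proof of misbehavior; it could feed the kind of metrics gathering mentioned above.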

D

I think it has to do with the economic models for providing. So, say one indexer or retrieval provider is the main one for some content, because I have paid this particular one. Then, you know, I can

B

D

It's always in the best interest of everyone to get a cached copy of my content closer to where the requests are. But then the ones serving the content always return a share of the money that they're going to get, because

B

A

Well, you could have a model where you use cryptography and smart contracts in a blockchain setting without needing an economic model. Meaning, you could run a separate blockchain that has no economic incentives, run by this federation, and you encode in it the way of checking that behavior is correct. You figure out the cryptographic protocol part of it, and you just use it as a mechanism to catch misbehavior and slash it. Well, sorry, you don't necessarily slash economically, but you can have a reputation system.

A

You could go back to the peer-to-peer history; there are tons of reputation systems there. You just use one of those and that's your economic model, and if your reputation drops below a certain level, then you get expelled.
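A minimal sketch of the "reputation drops below a certain level, then you get expelled" rule, using an exponential moving average of check results. The class name `Federation`, the threshold of 0.5, and the decay of 0.8 are all assumptions chosen for illustration; the talk does not specify any particular reputation system.

```python
from typing import Dict, Iterable, Set


class Federation:
    """Toy reputation tracker for a federation of indexer operators."""

    def __init__(self, members: Iterable[str], threshold: float = 0.5, decay: float = 0.8):
        self.threshold = threshold      # expel when score falls below this
        self.decay = decay              # weight given to past behavior
        self.scores: Dict[str, float] = {m: 1.0 for m in members}
        self.expelled: Set[str] = set()

    def record_check(self, member: str, passed: bool) -> None:
        """Fold one verification result into the member's score."""
        if member not in self.scores:
            return  # unknown or already expelled
        outcome = 1.0 if passed else 0.0
        score = self.decay * self.scores[member] + (1.0 - self.decay) * outcome
        if score < self.threshold:
            del self.scores[member]
            self.expelled.add(member)
        else:
            self.scores[member] = score


fed = Federation(["org-a", "org-b"])
for _ in range(3):
    fed.record_check("org-b", passed=False)
still_member_after_three = "org-b" in fed.scores
fed.record_check("org-b", passed=False)
```

With these parameters a member that fails every check falls from 1.0 to about 0.51 after three failures and is expelled on the fourth, so a single bad check does not remove anyone.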

A

So I think something like that could totally work. Other questions?

A

Thoughts, suggestions, other models?

A

Cool all right, the question.

B

Is anyone who's running a gateway if they can solve the they can just resolve on them?.

B

That's a nice win on performance and on.

A

B

So there's a of what we need to get to for that to happen, and it's something about being able to follow the current and having accomplished that you're willing to just use your local index.

B

B

A

Yeah, we can follow a model kind of like NTP's or drand's. We based drand's cache layers on the NTP structure, where you have one set of authoritative nodes that is figuring out reality, and then there's a set of nodes that are just caching and extending. So you could have a model like this.

A

Where there are these network indexer cache copies, and they're not writable at all. When you run a gateway or some large service, you can run one of these entirely and stay up to date from the indexer. So content still has to go through the indexer and then propagate down, but you can now run one full replica here. Would that be interesting to you guys?
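The read-only cache copy described here can be sketched as a replica that follows an append-only log published by the authoritative indexer. Everything in this block is a hypothetical illustration: `AuthoritativeIndexer`, `ReadOnlyReplica`, and the log format are assumptions, not the actual indexer or drand relay design.

```python
from typing import Dict, List, Set, Tuple

LogEntry = Tuple[str, str, str]  # (operation, cid, provider)


class AuthoritativeIndexer:
    """Toy authoritative node: the only place where writes are accepted."""

    def __init__(self) -> None:
        self.log: List[LogEntry] = []

    def publish(self, cid: str, provider: str) -> None:
        self.log.append(("add", cid, provider))

    def retract(self, cid: str, provider: str) -> None:
        self.log.append(("remove", cid, provider))

    def entries_since(self, seq: int) -> List[LogEntry]:
        """Return every log entry at position `seq` or later."""
        return self.log[seq:]


class ReadOnlyReplica:
    """Toy cache copy: no write API, it only replays the upstream log."""

    def __init__(self, upstream: AuthoritativeIndexer) -> None:
        self.upstream = upstream
        self.seq = 0  # number of log entries applied so far
        self.index: Dict[str, Set[str]] = {}

    def sync(self) -> int:
        """Apply new upstream entries; return how many were applied."""
        entries = self.upstream.entries_since(self.seq)
        for op, cid, provider in entries:
            if op == "add":
                self.index.setdefault(cid, set()).add(provider)
            else:
                self.index.get(cid, set()).discard(provider)
        self.seq += len(entries)
        return len(entries)

    def lookup(self, cid: str) -> Set[str]:
        return set(self.index.get(cid, ()))


upstream = AuthoritativeIndexer()
replica = ReadOnlyReplica(upstream)
upstream.publish("cid1", "peerX")
upstream.publish("cid1", "peerY")
applied = replica.sync()
upstream.retract("cid1", "peerX")
stale = replica.lookup("cid1")   # replica has not synced the retraction yet
replica.sync()
fresh = replica.lookup("cid1")
```

A gateway running such a replica answers lookups locally and only pays the cost of following the log, which is the trade described above: content still goes through the indexer, and the replica may be briefly stale between syncs.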

C

Something that we started thinking about.

A

Do you guys talk about content routing as a thing, and moving the world towards that?

C

So another moment, especially because we've mainly focused on... but content routing especially is, like, we're trying... historically regular handshake is like certain communities clusters, and now we're trying to move towards operating more directly in the PoPs. Until that becomes a lot more prevalent issue.

C

In order to know where the content is, what it has to do, and how we provide internationally.

A

Yeah, sounds good. Cool, any other thoughts? Questions?

A

All right sounds good. Thank you.