Description
Network indexer roadmap - presented by @willscott at IPFS þing 2022 - Content Routing 1: Performance - https://2022.ipfs-thing.io
So this is sort of a state of the world on network indexing and storetheindex. It's a little bit higher level than what Andrew gave in the lightning talk earlier today, and also a little more forward-looking at the current paths we're thinking about. In particular, we've got a problem of scaling content routing, and we've got storetheindex as the current thing that's addressing it.
We can talk about where we want to go, what interfaces we've put in place, how we're thinking about evolving them, where we're heading, and the things that we don't know how to do right now.
This is a problem that we've brought up a few times, so I'm going to be quick about this part. When you look at the Hydra, it's got 2 to 4 billion records and 20,000 to 40,000 peers.
That means it's asking for about four megs of routing resources per peer right now. When we look at our network, storetheindex, we've got on the order of eight billion records, and those are coming from, if you think about Filecoin, maybe a thousand-ish miners or storage providers producing them.
So we've got a scale issue here, which is that the amount of content is growing a lot faster than the number of peers. That's one of the fundamental shortcomings of the DHT model of simply scaling the content with the peers: we can't just have a homogeneous pool of peers, we need to make use of the heterogeneity in resources, and the current DHT has some security issues you run into around Sybils and resiliency.
So there are a couple of things that we've, I think, become somewhat opinionated about in terms of what we need to make this problem tractable. One is that we think it's pretty important for the index providers, the producers of content, to have an ongoing interface where they say "this is the content that I have" and that statement stays accessible: they're responsible for tracking it and for letting it get pulled by the network, rather than just firing off "hey, I've got this CID"
as one-off messages. They need to remember the CIDs they've announced, in that string of advertisements, and be able to have multiple parties ask them about it. That gives them accountability: you can't go tell one party "I've got this CID" and then give a different manifest to someone else, because if we want any sort of reputation, you really do then need to actually serve the content
you're claiming. We need this ability to hold providers accountable, and it's also an interface that allows multiple competing indexing systems to pull: the content is now available, and it hasn't just been pushed into one network's system; another indexing system that's trying something different can also pull that content. That's what led to the structure that we're using in storetheindex.
Anyone producing content that they want indexed creates a chain of advertisements. The advertisements are signed, and each advertisement points to a bunch of CIDs, or rather multihashes.
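To make the shape of that concrete, here is a minimal Go sketch of an advertisement chain. The field names are hypothetical and simplified; real storetheindex advertisements are IPLD objects with their own schema, so treat this only as an illustration of the chaining idea.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// Advertisement sketches one link in a provider's advertisement chain.
// Field names here are illustrative, not the exact storetheindex schema.
type Advertisement struct {
	PreviousID []byte   // link to the prior advertisement; nil for the first one
	ProviderID string   // who can serve the content
	Addresses  []string // where to retrieve it
	Entries    [][]byte // multihashes covered by this advertisement
	IsRemove   bool     // true if this advertisement retracts the entries
	Signature  []byte   // provider signature over the advertisement (see the signing sketch later)
}

// adID derives a content address for an advertisement. A real system would
// use CIDs/IPLD; hashing the serialized form is enough to show the chaining.
func adID(ad Advertisement) []byte {
	b, _ := json.Marshal(ad)
	h := sha256.Sum256(b)
	return h[:]
}

func main() {
	// The provider extends its chain by pointing each new advertisement at
	// the previous head and publishing the new head.
	first := Advertisement{ProviderID: "provider-1", Entries: [][]byte{[]byte("multihash-1")}}
	second := Advertisement{
		PreviousID: adID(first),
		ProviderID: "provider-1",
		Entries:    [][]byte{[]byte("multihash-2")},
	}
	fmt.Printf("chain head: %x\n", adID(second))
}
```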
We have the index provider put out a proactive ping when there's new data available. That helps with latency, because it says "hey, there's an update available", and then the indexers fetch the delta of what they don't have yet: they walk back until the previous advertisement they've already seen, and that keeps them from getting overwhelmed.
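A sketch of what that indexer-side catch-up could look like, reusing the hypothetical Advertisement type from the earlier sketch: walk back from the announced head until an already-seen advertisement, then apply the collected delta oldest-first. The fetch, seen, and apply callbacks are assumed helpers, not storetheindex APIs.

```go
// syncDelta walks a provider's chain from the announced head back to the
// newest advertisement we have already ingested, then applies the missing
// advertisements in chain order. fetch, seen and apply are assumed helpers.
func syncDelta(
	head []byte,
	fetch func(id []byte) (Advertisement, error),
	seen func(id []byte) bool,
	apply func(Advertisement),
) error {
	var delta []Advertisement
	for cur := head; cur != nil && !seen(cur); {
		ad, err := fetch(cur)
		if err != nil {
			return err
		}
		delta = append(delta, ad)
		cur = ad.PreviousID // keep walking back toward what we already know
	}
	// Apply oldest-first so later adds/removes win over earlier ones.
	for i := len(delta) - 1; i >= 0; i-- {
		apply(delta[i])
	}
	return nil
}
```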
So if we end up with less headroom and we can't just respond to every ping, we've got a graceful degradation where we can collapse and do these fetches of multiple advertisements at a time. In particular, if someone is creating new advertisements every second and we decide that's really quite often, it's on the indexer side to tune that down and say "actually, we're going to pull every 10 seconds, or every minute" and get the content that way.
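A small, generic sketch of that coalescing, under the same assumptions as the earlier snippets (plain Go, hypothetical names, not the actual storetheindex ingest code): announces only mark the provider dirty, and a ticker drives at most one sync per interval, so a provider announcing every second still only costs one delta fetch per period.

```go
import "time"

// throttledSyncer collapses frequent announces from one provider into at
// most one sync per interval.
type throttledSyncer struct {
	dirty chan struct{} // capacity 1: "something changed since the last sync"
}

func newThrottledSyncer(interval time.Duration, sync func()) *throttledSyncer {
	t := &throttledSyncer{dirty: make(chan struct{}, 1)}
	go func() {
		tick := time.NewTicker(interval)
		defer tick.Stop()
		for range tick.C {
			select {
			case <-t.dirty:
				sync() // pull the whole delta once, however many announces arrived
			default:
				// nothing new this interval
			}
		}
	}()
	return t
}

// Announce records that new content is available without fetching anything;
// extra announces within the same interval are absorbed.
func (t *throttledSyncer) Announce() {
	select {
	case t.dirty <- struct{}{}:
	default:
	}
}
```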
So, where are we now? These are some distributions of how large the number of entries per advertisement is, and of how long these advertisement chains are growing.
And then, at the high end, you've got some getting into the order of millions, and that's someone like nft.storage, which I'm going to keep staring at through a lot of this talk as the one out on the edge of the storage-provider range of what indexes look like. The counterpoint is the number of entries in those advertisements, and what we're seeing there is again
this big jump area, which is the bulk of Filecoin deals: you've got between 32,000 and half a million CIDs in one of those 32-gig deals. That's about what we expect given the block size: if you're pushing in files and they're getting chunked into one- or two-meg blocks for a lot of that, then in your 32 gigs you're going to end up with under half a million CIDs. Then you've got a few that get up to that million-ish range, where it's something like the Filecoin state tree with a lot of really small blocks, and then you've got a few that are really small, where it's just a trickle of data, but there are very few providers doing that.
So this is per provider: if we counted the number of advertisements that had small entry counts, there would be a lot of those, but they're coming from relatively few providers. That's the way to read this, I think.
Okay, so where are we going? Right now, in terms of this year, I think the goal is: let's get it pretty fast and pretty reliable. We're not there yet.
I think we're doing pretty well on both of those numbers, and it's a matter of activating enough of the network to be able to say that we're actually doing this. When we look at storetheindex itself, for most of our lookups, at the point that your HTTP query or your network query reaches us, we get a response out in about one millisecond, pretty much all the time.
So we're reasonably happy with that, and then the question is: how much of the time do you hit us, and how long does the whole path take? For things like Hydra we're under 100 milliseconds. I think in order to get IPFS gateways to match this sort of target, we're going to need replication, so at least regional replicas, and we're going to need the Reframe operations hooked up so that we're not going through Hydras.
This goes to Juan's point about pushing stuff further out, which is great. We can say we're going to do three, or some set of, regional replicas, and at this level that's going to get us under 100 milliseconds. To get down to 10 milliseconds or something like that, we can take advantage of the fact that a lot of the content in the index
is never getting requested. So if we can just have a cache that keeps anything that actually gets asked for, and on a miss falls back to the full index the first time at least, that cache will probably be a few orders of magnitude smaller. And for content that only gets asked for once, maybe we're okay with that being a little slower than content that has been asked for before in the region.
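A minimal sketch of that on-demand regional cache, assuming a lookup keyed by multihash that returns provider records; the `origin` function stands in for a query against the full index, and all names are hypothetical.

```go
import "sync"

// regionCache keeps anything that has actually been requested in this region
// and falls back to the full index on the first miss.
type regionCache struct {
	mu     sync.Mutex
	local  map[string][]string               // multihash -> providers previously requested here
	origin func(mh string) ([]string, error) // lookup against a full replica of the index
}

func newRegionCache(origin func(string) ([]string, error)) *regionCache {
	return &regionCache{local: make(map[string][]string), origin: origin}
}

func (c *regionCache) Lookup(mh string) ([]string, error) {
	c.mu.Lock()
	provs, ok := c.local[mh]
	c.mu.Unlock()
	if ok {
		return provs, nil // hot path: asked for before in this region
	}
	provs, err := c.origin(mh) // first request in the region pays the full cost
	if err != nil {
		return nil, err
	}
	c.mu.Lock()
	c.local[mh] = provs
	c.mu.Unlock()
	return provs, nil
}
```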
A
So
that's
that's
sort
of
like
the
next
plan
that
doesn't
get
us
into
this
this
edge.
Yet
I
think
that's
going
to
be
this
incentive
problem
like
that's,
not
something
we're
going
to
try
and
talk
to
all
the
data
centers,
but
but
I
think
that's
that's
sort
of
like
this.
A
Where
we
would
like
to
go,
I
think
I
think
there
is
a
question
of
like.
Should
we
be
pre-caching
to
ipfs?
hosts. Is the on-demand cache, where we only fill it when we get a request, enough, or do we need to pre-build those caches and push them out in advance? Because even going down to regions, there is content that's hotter than whatever someone has already asked for in that region: there is a set of content that we know is going to get asked for when you spin up a new replica, and we should proactively push that. It shouldn't purely be on demand, like "oh, you got unlucky because your node restarted". So for the really popular stuff, like the front page of the New York Times,
you shouldn't have to fall back; we can push a bunch of that stuff in already. So we have to identify what this head of content-routing traffic is and get it packaged as an even smaller subset that is proactively pushed. That's going to be harder, but it seems like there's a tractable problem in there.
This is scoping down a little onto replication, the stuff we're working on right now. We want to be able to spin up new instances pretty quickly, where the critical factor for how long it takes to spin up a new replica is the bandwidth that node has: how fast can it pull down three terabytes of data? It's going to be hard to get around that, but it shouldn't be,
"oh, there's this storage provider that has data we care about, but it's really slow, so everyone who's starting up has to wait and re-pull the data from that source." We need to get to a point where the data doesn't have to come only from the authoritative origin; instead we can have archive snapshots, because this data is signed and I don't need to get it from the original source.
I can just go back to content routing and get the data from a Filecoin backup, or from a CAR file, or something like that. We need to be a little careful that we don't cause infinite loops of content
routing in doing that: we've already realized that if this goes into CIDs, then those themselves become provided records and we just make our lives harder. We would also like instances to be eventually consistent, so that if things did pause, there would be an even place to land. Right now, provider sets do not always stay equivalent enough that we're confident we have consistency, so we'd like to move towards stable snapshots. And then, as we finish this replication work,
the next thing we're starting to stare at and scope is how we get instances of either the index or these large caches near the gateways and make the gateway case work well. There are a couple of other things I'm going to throw under this replication heading that we haven't necessarily said when we're going to take on, but that we're thinking about. One is the consensus problem, and in particular one of the things we might do short term:
should there be a daily, or every-six-hours, consensus snapshot of what the index is for that period? That's not "every CID goes through consensus", but if you've got a stable snapshot where there is agreement about what the correct index is for today, or for those six hours, now you've got accountability for the indexing system: this indexer did not answer correctly against the snapshot that contained the record, and so you can start to penalize indexers that are wrong, because you know what correct is.
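If such a periodic snapshot existed, checking an indexer's answer against it is straightforward. This sketch assumes the snapshot is just an agreed map from multihash to provider set; how that agreement is reached is exactly the open question discussed later, so the types here are placeholders.

```go
// Snapshot is a placeholder for the agreed index state for one period:
// multihash -> set of providers that should be returned for it.
type Snapshot map[string]map[string]bool

// CheckAnswer reports whether an indexer's response for mh matches the
// snapshot exactly; a mismatch is verifiable misbehavior that could be
// penalized.
func CheckAnswer(s Snapshot, mh string, answered []string) bool {
	want := s[mh]
	if len(answered) != len(want) {
		return false
	}
	seen := make(map[string]bool)
	for _, p := range answered {
		if !want[p] || seen[p] {
			return false
		}
		seen[p] = true
	}
	return true
}
```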
That's a nice thing to have, and it is probably also useful for some of the privacy work, so it's likely either privacy or reputation that drives needing, or wanting, to do this. The other one is how much of this caching and proactive replication we want to take on: we'll probably have a cache that is on-demand fallback,
but whether we figure out hot content to proactively push is going to be another thing we need to work out on the performance side. Then, for scaling, we're working on checkpoints, as Andrew alluded to. We don't want to need a replay of full historical chains, and in particular for cases like nft.storage, where it's a really long chain, it would be really nice to just have a snapshot of all the content
that is current and can be re-aggregated. That's as much about letting providers not have to store all of that previous historical data as it is about the case where we've caught up but the next indexer hasn't; so there are reasons to have the chain rebundled for efficiency. We also think we will at some point need to shard our ingest layer: as we fetch from providers right now,
we have a centralized node doing that, and as the number of providers expands we'll probably want multiple nodes that take different subsets and then combine them. So that part probably splits up as the number of overall providers grows, and we need to figure out exactly how that then feeds into a single index.
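A sketch of the simplest version of that split, under the assumption that a provider's whole chain stays on one ingest node: hash the provider ID to pick the node, so each node polls a disjoint subset of providers and the results are merged downstream. Names are illustrative.

```go
import "hash/fnv"

// ingestShard deterministically assigns a provider to one of numNodes ingest
// nodes, so each node syncs a disjoint subset of providers.
func ingestShard(providerID string, numNodes int) int {
	h := fnv.New32a()
	h.Write([]byte(providerID))
	return int(h.Sum32() % uint32(numNodes))
}
```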
There's a question around performance here, which is: what is the need for a low-latency pipeline from content being newly published to being accessible, and is that all one system? Could we have a different streaming system for the first six hours, with maybe weaker consistency guarantees, and then content that's available in the next snapshot goes into the main indexing system? The main system has a more MapReduce-y, periodic feel, versus a streaming behavior for a much smaller but very churny amount of fresh content.
Those may be different systems, depending on where we end up. If I've got content that's newly available, I might want to push it out, and it may sit in some temporary cache and not get the same consistency guarantees until it gets put into a consensus snapshot.
And then the thing is also going to get bigger, and we'll have to figure out whether it ends up spread across multiple disks. We can already sort of handle this: we've got a lot of files, and it's easy enough to mount them across multiple disks, but we need to figure out what sort of sharding a data center or a rack will actually want, so people feel comfortable with those deployments as we get there. I think we're fine for a while, though; a ZFS pool will keep us going. I think it's really more that,
as we get higher query volume, we need to figure out whether the answer is more replicas, or whether the answer is "here is the one copy, and we've got lots of readers that can independently access it". There may be data-center cases where you do want the efficiency of having that one copy on fast storage with multiple compute nodes accessing it; that would be the sharding case, or the multiple-readers-over-one-instance case. And then we've got some trust questions.
We're not working on any of that right now, but the things on the horizon are figuring out signing and authenticity of records. In the Reframe provider spec we're trying to say that we should start signing the records that we're publishing; the current index writers are already doing this, so let's keep that ball rolling. And then we have some questions about whether we should be blinding or hashing or otherwise adding privacy in here, and what that looks like.
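As a shape for that, here is a minimal sketch using ed25519 from the Go standard library. The real records are signed with libp2p peer keys and a defined envelope format, so this only illustrates the sign-on-publish / verify-on-read flow.

```go
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"fmt"
)

func main() {
	// The provider holds a keypair and signs each serialized record it publishes.
	pub, priv, _ := ed25519.GenerateKey(rand.Reader)
	record := []byte(`{"provider":"provider-1","multihash":"multihash-1"}`)
	sig := ed25519.Sign(priv, record)

	// Anyone holding the provider's public key can verify authenticity,
	// regardless of which replica or cache handed them the record.
	fmt.Println("record verifies:", ed25519.Verify(pub, record, sig))
}
```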
Some more immediate ones that we're likely to deal with at some point: we may have some malicious replicas. How do we handle those? Do clients cross-compare results from the different replicas that they can see, to make sure they're not just being given bad data?
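One simple client-side version of that cross-comparison, sketched with hypothetical lookup functions: query two replicas for the same multihash and only trust the result if they agree, trading extra latency and bandwidth for not trusting any single replica.

```go
import (
	"errors"
	"reflect"
	"sort"
)

// crossCheck queries two replicas for the same multihash and returns the
// providers only if both replicas agree.
func crossCheck(mh string, a, b func(string) ([]string, error)) ([]string, error) {
	ra, err := a(mh)
	if err != nil {
		return nil, err
	}
	rb, err := b(mh)
	if err != nil {
		return nil, err
	}
	sort.Strings(ra)
	sort.Strings(rb)
	if !reflect.DeepEqual(ra, rb) {
		return nil, errors.New("replicas disagree; treat the result as suspect")
	}
	return ra, nil
}
```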
There are some latency tradeoffs there, and some excess bandwidth, in making that the client's problem, but as you get to trustless operation you're either doing some crypto or you're making it the client's problem. And then there's also a feedback loop of how the clients find the right replicas in the first place. In the short term,
that may be some sort of gossip-based discovery of these instances in a permissionless world, or we may do some sort of consensus-based routing table of what's out there. I'm less worried about clients finding the fastest one.
I think we already have enough IP-geolocation-type databases that it's pretty easy for a client to make good guesses: given a list of multiaddresses, you can probably find the ones closest to you with reasonable precision, and then, if you keep some state about which ones have worked well previously, that plus your general IP map is probably going to do pretty well.
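A sketch of that client heuristic, with illustrative names: probe the candidate replicas concurrently (a plain TCP dial here stands in for a real request), prefer the fastest, and give a mild bonus to ones that have worked before.

```go
import (
	"math"
	"net"
	"time"
)

// pickReplica probes each candidate address, preferring fast responders and
// mildly favoring replicas that have worked well before. Returns "" if none
// responded.
func pickReplica(addrs []string, previouslyGood map[string]bool) string {
	type result struct {
		addr string
		rtt  time.Duration
		ok   bool
	}
	results := make(chan result, len(addrs))
	for _, a := range addrs {
		go func(addr string) {
			start := time.Now()
			conn, err := net.DialTimeout("tcp", addr, 2*time.Second)
			if err != nil {
				results <- result{addr: addr, ok: false}
				return
			}
			conn.Close()
			results <- result{addr: addr, rtt: time.Since(start), ok: true}
		}(a)
	}
	best, bestScore := "", time.Duration(math.MaxInt64)
	for range addrs {
		r := <-results
		if !r.ok {
			continue
		}
		score := r.rtt
		if previouslyGood[r.addr] {
			score /= 2 // mildly prefer replicas that have worked before
		}
		if score < bestScore {
			best, bestScore = r.addr, score
		}
	}
	return best
}
```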
Some things we don't know how to do. The first one goes back to the DHT question: where do we root our notion of consensus about the correctness of an index snapshot, if we do that, and what is that baseline? Who learns what? Do we add limits on information leakage, and are we going to be able to do that efficiently enough that it makes sense, or is there going to be a tension there that we need to worry about? And then, what does the distribution of these indexes look like?
There's a website, cid.contact. The firewall here didn't like it this morning; I think we complained to them and it works now. There's also #storetheindex on the Filecoin Slack. And on research open problems: there is an open problem around privacy and private retrieval, which includes content routing, that we'll talk more about tomorrow. All right, happy to take questions.
[In response to an audience question] I think more integration. There are two sides: there's an ecosystem thing here, and there's only so many of those links that we can build ourselves. So be proactive: if you are building this, have it open source or generalizable, help us with docs, tell us where things don't make sense, so that the next people building those links can do it faster, because getting more clients and more providers is what makes this gain momentum and succeed.