From YouTube: Retrieval Incentives Intro - Marina Kostioutchenko
First, I will give a quick introduction to the problem itself: where we're seeing the drop-offs and pain points, and we'll learn about existing efforts. I'll do my best to cover everything that's going on in the retrieval incentives world, and then Jake will navigate us through the brainstorming. Hopefully, we will come out of today's session with a joint understanding of what the highest-priority items are for us all to work on together in the next six months or so, that is, in the first half of next year.
So when we think about successful retrievals, I would classify the challenges into two big buckets. First is the reliability of retrievals, the technical process of serving them, and in that regard we've made huge progress in the past couple of months. First of all, data-transfer success has more than doubled in the past three months.
So thank you to the Bedrock team for all the great effort they've put in there, and we're continuing to work towards making the failure rate even lower than what we have right now. In particular, there are some UX improvements needed just for getting visibility into Lotus node processes.
But then there is the second bucket where we see challenges, and this is basically storage providers' willingness to serve retrievals. What we see right now, and we'll dive into this a bit more in the next few slides, is that storage providers either choose not to serve retrievals at all, or they limit access to the data, and this is where we want to focus today. On the willingness of storage providers to serve retrievals, we have gathered some data in the past couple of months.
We see that, if you take out everything that's cached in the auto-retrieval bridge, there is a first big drop-off, from 203 million to 12.5 million requests; this is the data that we can find with storage providers. That is kind of expected, but the area we want to focus on is the second drop-off, from 12.5 million down to 475,000. The first drop-off is basically the retrieval request, or any request at all, being unsuccessful. But the interesting part is the drop from 1.77 million to 475,000, where we see errors with retrievals. If you look at the breakdown of all the errors, it turns out that more than 50 percent of these retrieval requests, for public data served through the GraphSync protocol to IPFS nodes, failed due to restrictions introduced by storage providers.
Roughly half of those are rate-limited retrieval requests, storage providers basically saying "we can only serve, say, three requests per day". The other half, roughly, is access-control restrictions: storage providers, usually through the CIDgravity tool, saying that they don't want to serve this content.
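To make the funnel concrete, here is a quick pass-through computation over the numbers above. The stage labels are my own shorthand for the slide's categories, not the slide's exact wording:

```python
# Retrieval-request funnel from the talk (approximate request counts).
funnel = [
    ("requests after cache", 203_000_000),
    ("findable with storage providers", 12_500_000),
    ("retrievals attempted", 1_770_000),
    ("retrievals successful", 475_000),
]

# Print the stage-to-stage pass-through rate for each drop-off.
for (prev_name, prev), (name, count) in zip(funnel, funnel[1:]):
    print(f"{prev_name} -> {name}: {count / prev:.1%} pass through")
```

Even at the final stage, only about a quarter of attempted retrievals succeed, which is why the error breakdown above matters so much.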
So how do we solve this issue? First of all, let's try to understand why storage providers are exhibiting this behavior; given the current state of the world, it's pretty natural. First, there are no rewards for serving retrievals. Currently you don't earn Filecoin, or, frankly speaking, even reputational standing, by serving retrievals, because such systems don't exist yet; nor are you punished for not serving retrievals.
Second, there's opportunity cost: you can use that same capacity to mine Filecoin through CC (committed capacity) sectors or storage deals. There's time and effort, both in setting things up and in debugging, because there are still operational inefficiencies. And finally, there might be some legal liability if you accidentally or unknowingly serve data that's illegal, for example. So storage providers choose to just not do it. Before we start diving into incentive structures, let's take a look at what data is on chain right now. What data do we want to retrieve?
Potentially, I've gathered, or tried to estimate as best we can, a breakdown of the 258 petabytes of data stored on chain. First of all, in terms of who pays for the data being stored: currently it's mostly the Fil+ program.
We see four petabytes that are legacy Slingshot deals, but the majority of the data stored on chain is sponsored by the network, so that's the Fil+ program. In terms of data privacy, this is a highly estimated number, so please don't quote me on it, but if you look at the amount of data being indexed, roughly 30 percent of the data is currently provided to the indexer; maybe it's more, maybe there's some additional public data.
But what we can deduce from some of the programs being run is that about 18 petabytes out of the 258 are Slingshot data; there are 3.5 in another two programs, which are Discover and Evergreen; the aggregators, dag.house and Estuary, only store about one petabyte of data; and the rest, 236, is this bucket of different kinds of Fil+ clients. That's both public and private data, research institutions and enterprises. So that's the best we can tell about data clients.
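These rough per-program estimates should approximately sum back to the on-chain total. A quick check (program names reconstructed from the audio, figures as stated in the talk, all rounded, so the sum is only approximate):

```python
# Estimated breakdown of on-chain data, in petabytes (figures from the talk).
breakdown = {
    "Slingshot": 18,
    "Discover + Evergreen": 3.5,
    "aggregators (dag.house, Estuary)": 1,
    "other Fil+ clients": 236,
}

total_claimed = 258  # petabytes stored on chain, per the talk
total_estimated = sum(breakdown.values())
print(total_estimated)  # 258.5, close to the claimed 258 PB
```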
With that in mind, let's put these clients, or programs, on a two-by-two: the latency requirement on one axis, and how open the data is on the other. The yellow ovals are current use cases; you can see that they're mostly concentrated around cold storage. We have both publicly accessible and access-controlled data, and the publicly accessible, cold quadrant is mostly all the data programs that we have, like Slingshot.
There are some aggregators that are also doing cold storage and store public data. We also see, in green, the emerging use cases, and they're very highly concentrated in publicly accessible data that requires non-archival storage: the IPFS nodes that want to request and retrieve data from storage providers, the emerging Saturn network, and the compute nodes that will serve compute over data. So this is the area we want to focus on today.
So when we break into the workshop and start brainstorming, we want to talk about incentive systems for publicly accessible, non-archival storage. In terms of the systems that we've spoken about, or that some teams may have started working on, here's the lay of the land. We don't assume that there will be one incentive system that fits all the use cases, and that's how it should be. So consider, for example, access-controlled data.
We anticipate it's going to be, and continue to be, legal contracts that incent retrieval: storage providers will serve retrievals because that is what they agreed with the client for data that only they can access. For publicly accessible data in the archival world, we will also see legal contracts, but on top of that there are some reputation systems emerging, mostly through word of mouth right now, so clients would come to our team and ask around.
They would ask team members which storage providers have been trustworthy and good to work with. There is also an emerging set of dashboards, which I'll show you later, that provide visibility into which storage providers are reliable and will serve retrievals. And then there are, of course, enforced requirements: for example, the Slingshot program requires that data is served and retrievals are successful.
With that segment and focus in mind: today we've seen a lot of interesting ideas emerging for how to make retrievals attractive to storage providers. There's reimbursement at the end of the month, for example; I think this is what the Saturn network is planning to do. Other ideas include clients paying per byte, so pay-as-you-use, so to say, and staking at deal time, which is more of an insurance system.
Teams have spoken about prepaid coupons, where, for example, the user of the data is not the person who is interested in the data being retrieved, so there's a separation between payment and usage of the data. So this is the initial set; later today we'll break into groups and brainstorm some more, but this is the area where we want to ideate today. And very quickly, I want to give you an overview of what's going on right now in terms of retrieval incentives. I would categorize the efforts into three different buckets.
The first is generating the metrics themselves: getting some visibility into whether retrievals are even successful or not, plus all the detailed characteristics of retrievals. The second bucket of efforts is surfacing these metrics to users in a way that's useful to them and their goals. And finally, there are the systems that align incentives between data clients and storage providers in a way that makes data easily retrievable.
So in the first bucket, the generating-metrics bucket, there are three projects that were or are ongoing. The most prominent and interesting one is the validation bot. Currently it's built just for the Slingshot program: as I mentioned before, Slingshot requires data to be retrievable, so we want to be able to observe and enforce that rule. But the vision is to grow it further, make the solution open source, and open it up to more use cases than Slingshot. Slingshot used to use Dealbot, which is currently inactive and was semi-manual.
That solution required a lot of manual effort from the team's perspective, but it was sort of a previous version of the validation bot, so to say. And the data that we reviewed earlier is the auto-retrieval bridge data that Bedrock is collecting: these are IPFS gateway requests for content that we were able to find stored with storage providers.
Estuary also ran their own auto-retrieval bridge; based on what I know it's currently paused, but it also exists. So these are the systems that gather metrics.
Now, there are a few dashboards that present these metrics and serve them back to data clients. The first one is a project called Filgram, developed by the Filmine team.
The interesting part about this dashboard is that it allows clients to actually leave qualitative and quantitative reviews of storage providers. So, for example, if I stored data with a storage provider and had a great experience, I can rate them with a five-star rating and write a quick review.
We see some friction with wallet authentication, which is not an easy UX, so currently we don't see a ton of reviews, but the capability is there, and it's a very interesting idea in terms of developing this sort of reputation metric based on Yelp-like reviews. Retrieval metrics are not included yet, but once we have at-scale metrics collection, there's the capability to include those metrics here. Another dashboard that I'm sure a lot of people have seen, because it's one of the oldest, is filrep.io.
Unfortunately, we again don't see high adoption, partially because retrieval metrics are not included, and it's currently not actively supported. But again, there's an API and there's the capability to include more data points here. And finally, there is the Starboard dashboard.
There is also room for including retrieval metrics here; currently they're not being served and the work has been paused, but again, there's the capability to plug retrieval metrics in and make this dashboard a place where clients can go and learn retrieval details about this or that storage provider. And then, finally, there are a lot of exciting projects in the incentive-systems bucket right now. I would categorize the mechanisms as either a carrot or a stick, and I've tried to draw that distinction here.
First is a stick: if you don't serve retrievals, you're excluded from the data programs and excluded from the opportunity to earn Fil+ rewards within the Slingshot program. Then there is one that's more in planning: a project that is hoping to set up a storage DAO around these metrics, which would allow storage providers to earn DAO tokens, for example, for good retrieval behavior. And I say "hoping" because Luca or Irene will cover that today. Oh, you're not going to cover it? All right.
We can share more about this project if you're interested, but at a high level it's basically rewarding storage providers for exhibiting good storage and retrieval behavior. The next one that will be covered, I hope, is retriev.org, and this is more of a stick mechanism, where the provider in this insurance scheme will get slashed if retrievals are not served: you stake some amount of assets at the beginning, and then, based on whether you serve retrievals or not, you either get the stake back or you get slashed.
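The stake-and-slash flow just described can be sketched as follows. This is a minimal toy model; the `RetrievalInsurance` class, the all-or-nothing slashing, and the numbers are my own illustration, not retriev.org's actual protocol or API:

```python
class RetrievalInsurance:
    """Toy model of a stake/slash retrieval-insurance deal."""

    def __init__(self, stake: float):
        self.stake = stake   # assets locked by the provider at deal start
        self.settled = False

    def settle(self, retrieval_served: bool) -> float:
        """Return the amount paid back to the provider.

        If the retrieval was served, the full stake is returned;
        otherwise the stake is slashed (here: lost entirely).
        """
        self.settled = True
        return self.stake if retrieval_served else 0.0

# A provider stakes 10 FIL on two deals; one retrieval succeeds, one fails.
good = RetrievalInsurance(stake=10.0)
bad = RetrievalInsurance(stake=10.0)
print(good.settle(retrieval_served=True))   # 10.0: stake returned
print(bad.settle(retrieval_served=False))   # 0.0: stake slashed
```

In a real system the slash would likely be partial and dispute-mediated, but the incentive shape is the same: not serving retrievals now costs the provider something.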
And finally, we do have one aggregator that is currently using a notion of a replication score to decide who deals will go to, and that's FilSwan, developed by Charles's team. Again, this is more of a carrot: if you serve retrievals, you're rewarded with more deals in the future. And this is an active system that exists already.
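The carrot mechanism here, routing future deals toward providers with good retrieval track records, can be illustrated with a toy allocator. The scoring formula and the provider records below are invented for illustration and are not FilSwan's actual algorithm:

```python
# Hypothetical retrieval track records: (provider, successful, attempted).
records = [
    ("sp-alice", 95, 100),
    ("sp-bob",   10, 100),
    ("sp-carol", 60,  80),
]

def replication_score(successful: int, attempted: int) -> float:
    """Toy score: fraction of retrieval requests served successfully."""
    return successful / attempted if attempted else 0.0

# Rank providers; new deals go to the best retrievers first.
ranked = sorted(records, key=lambda r: replication_score(r[1], r[2]),
                reverse=True)
print([name for name, *_ in ranked])  # ['sp-alice', 'sp-carol', 'sp-bob']
```

The design point is that the reward is future business rather than a direct payment, so no new token or escrow mechanism is needed.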
And Jake will help us go through the phases today; I'll let him cover the plan.