Filecoin Fil+ Day - Austin '22, 16 Jun 2022

From YouTube: Fil+ Canvas - Anomaly Detection and Network Governance

A

Without further ado, here's the agenda for today's talk. I'll quickly go over some context setting to convey the importance of understanding DataCap flow. Next, I'll move on to quickly defining what anomaly detection is in our context, what characterizes an anomaly, and what some of the archetypes are that we have identified in the network.

A

Next I'll be talking about Fil+ Canvas, which is a product we're building. It's an analytical solution that's trying to model, quantify, detect, and ultimately prevent any type of suspicious behavior that's happening inside the network. Finally, I'll close with what's next and also Fil+ Watch, which is a separate visualization layer we're building. Cool. So I think a lot of speakers have touched on this, so quickly, two interesting statistics.

A

We have seen huge adoption of Fil+ across the ecosystem. As of most recently, I think the daily active deals attributed to Fil+ are around 88%, and the new committed deals, measured in TiB, are almost 100% attributed to Fil+, which really demonstrates the adoption of Fil+ across the ecosystem. So great success, kudos to the team.

A

I think we would argue that as Fil+ scales, it is imperative for us to understand how DataCap flows throughout the ecosystem. Here on the slide you see a pretty simple bow tie graph: DataCap flows from notaries to clients through their DataCap applications, and then from clients ultimately to storage providers through DataCap spending. So that's a very simplified framework of basically how DataCap works.
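
The two legs of the bow tie described above can be sketched as a small data model. This is only an illustration, not the Fil+ Canvas implementation: the `Allocation` and `Deal` records, the `flow_by_edge` helper, and all identifiers and amounts are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Allocation:
    """Hypothetical record: a notary grants DataCap to a client."""
    notary: str
    client: str
    datacap: int  # amount in arbitrary units


@dataclass(frozen=True)
class Deal:
    """Hypothetical record: a client spends DataCap with a storage provider."""
    client: str
    provider: str
    datacap: int


def flow_by_edge(allocations, deals):
    """Aggregate DataCap on each notary->client and client->provider edge."""
    granted = defaultdict(int)
    spent = defaultdict(int)
    for a in allocations:
        granted[(a.notary, a.client)] += a.datacap
    for d in deals:
        spent[(d.client, d.provider)] += d.datacap
    return dict(granted), dict(spent)


# Made-up example data.
allocations = [
    Allocation("notary1", "clientA", 100),
    Allocation("notary1", "clientB", 50),
]
deals = [
    Deal("clientA", "sp1", 60),
    Deal("clientA", "sp2", 40),
    Deal("clientB", "sp1", 50),
]
granted, spent = flow_by_edge(allocations, deals)
```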

A

We would argue that modeling DataCap flow has four main benefits. Starting from number one, you have reduced cost and complexity to monitor the adoption of the Fil+ program. On top of that, you can also model and visualize the different types of incentives or interactions that are happening between the actors, which are also very interesting to look at and offer a lot of deep insights.

A

What we are actually more interested in is to detect and prevent any type of anomalous behavior that's happening inside the ecosystem. This is very important both to promote the governance and health of the network and, at the same time, to promote fairness among the players that are playing by the rules and actually contributing to the system.

A

And finally, on top of that, we can quantify actor reputation to drive long-term network health and governance. In particular, our product is concerned with anomaly detection and prevention at scale, and we would argue that is key to the network's success in the long term.

A

So on the left you see your familiar bow tie graph, and on the right are a couple of examples of the anomaly archetypes we have observed. The first type is when, hypothetically, clients are colluding with storage providers, and in their minds they have a predetermined set of storage providers they'll channel their DataCap flow to, basically without any type of proper due diligence happening.

A

We think that this behavior is relatively easy to detect through the data that's recorded on chain. An interesting thing is that these storage providers usually belong to the same owner, so the ultimate beneficiary of this type of behavior is relatively concentrated in one or a very few entities.

A

Moving on, there's another layer of complexity, which is when a notary could theoretically also participate in this type of collusion. This is a little bit harder to detect, because you have to substantiate any sort of suspicious behavior between notaries and clients. We believe this is the next step on top of the first type of collusion.

A

Yet another, more interesting type of anomaly is one where we observed the same set of notaries channeling DataCap to the same set of storage providers, but through a transient set of clients. That is to say, the clients kind of disappear: they only do one deal and disappear, but ultimately those who benefit from the DataCap flow are the same set of storage providers.
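
One way to look for the transient-client pattern described above is to measure, per storage provider, how much of its received DataCap comes from clients that only ever made a single deal. This is a minimal sketch under assumptions: the talk does not say how Fil+ Canvas detects this, and the `transient_client_share` helper and the tuple layout are hypothetical.

```python
from collections import Counter, defaultdict


def transient_client_share(deals):
    """Share of each provider's DataCap that came from one-deal clients.

    deals: list of (client, provider, datacap) tuples (hypothetical layout).
    Returns {provider: value in [0, 1]}.
    """
    deals_per_client = Counter(client for client, _, _ in deals)
    total = defaultdict(int)
    from_transient = defaultdict(int)
    for client, provider, datacap in deals:
        total[provider] += datacap
        if deals_per_client[client] == 1:
            from_transient[provider] += datacap
    # Skip providers with no DataCap to avoid dividing by zero.
    return {p: from_transient[p] / total[p] for p in total if total[p] > 0}


# Made-up example: sp1 is fed only by clients that appear once.
example_deals = [
    ("c1", "sp1", 10),
    ("c2", "sp1", 10),
    ("c3", "sp1", 10),
    ("c4", "sp2", 10),
    ("c4", "sp2", 10),
    ("c4", "sp3", 10),
]
shares = transient_client_share(example_deals)
```

A high share on its own is not proof of anything; a new provider may legitimately serve many first-time clients, so a value like this would only be a starting point for investigation.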

A

So, to detect anomalies at scale, what we're trying to create is Fil+ Canvas. We're collaborating with Deep and Galen, so the governance team, and we're doing this iteratively. We start from ideation, which is to hypothesize any type of behavior that suspicious actors could have, for example channeling DataCap to the same set of storage providers. Then we formulate these behaviors into risk metrics. Next we validate: we implement these risk scores and test their effectiveness against a known list of anomalies.
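
The validation step described above, testing a risk score against a known list of anomalies, could be sketched as a simple comparison of group averages. This is an assumption about one possible check, not the team's actual method; `compare_groups`, the client IDs, and the scores are all made up.

```python
from statistics import mean


def compare_groups(scores, flagged):
    """Return (mean score of flagged clients, mean score of all other clients).

    scores:  {client_id: risk score} (hypothetical layout)
    flagged: set of client IDs from a known list of anomalies
    """
    flagged_scores = [s for c, s in scores.items() if c in flagged]
    other_scores = [s for c, s in scores.items() if c not in flagged]
    if not flagged_scores or not other_scores:
        raise ValueError("both groups need at least one client")
    return mean(flagged_scores), mean(other_scores)


# Made-up scores: a useful risk score should separate the two groups.
example_scores = {"c1": 1.0, "c2": 0.5, "c3": 0.25, "c4": 0.75}
flagged_mean, other_mean = compare_groups(example_scores, {"c1", "c2"})
```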

A

And finally, what we are thinking in terms of production is that v1 is a simple reputation system, which could be a weighted average of the most effective risk scores, whereas v2 is a more advanced analytical solution that does clustering, graph algorithms, or even ML on top of these risk scores.
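
A weighted average of risk scores, as mentioned for the simple reputation system, might look like the sketch below. The score names and weights are hypothetical; the talk does not specify which scores are combined or how they are weighted.

```python
def reputation_score(risk_scores, weights):
    """Weighted average of per-client risk scores.

    risk_scores: {score name: value in [0, 1]}
    weights:     {score name: non-negative weight}
    Both dictionaries and their keys are illustrative assumptions.
    """
    total_weight = sum(weights[name] for name in risk_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(risk_scores[name] * weights[name] for name in risk_scores)
    return weighted_sum / total_weight


# Made-up example values.
example = reputation_score(
    {"concentration": 1.0, "zero_price": 1.0, "fast_spend": 0.5},
    {"concentration": 0.5, "zero_price": 0.25, "fast_spend": 0.25},
)
```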

A

So far, we have already identified quite a few effective risk scores that can detect these sets of suspicious behaviors within the ecosystem. We divided the population into two groups, suspicious clients and other clients. The suspicious clients are a couple of examples we're working on, subject to change, but based on Galen's investigation they are relatively suspicious clients that we should look into. Here on the screen I'll be presenting three different scores that we have, which are quite interesting.

A

Number one is the concentration score as related to owner IDs. It's a score from zero to one that measures how concentrated the deal flow is from clients to storage providers. We observe that among suspicious clients this concentration is very high, very close to one, whereas for other clients it is 0.78.
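
The talk does not give the formula behind the concentration score. One common way to get a zero-to-one concentration measure is a Herfindahl-Hirschman style index over owner shares, which is what the sketch below assumes; the `concentration_score` function, the `owner_of` mapping, and the sample data are all hypothetical.

```python
from collections import defaultdict


def concentration_score(deals, owner_of):
    """Sum of squared owner shares of one client's deal volume.

    deals:    list of (provider, size) for a single client (hypothetical layout)
    owner_of: {provider: owner ID}
    Returns 1.0 when everything goes to one owner and approaches 0 as the
    volume spreads evenly over many owners.
    """
    by_owner = defaultdict(int)
    for provider, size in deals:
        by_owner[owner_of[provider]] += size
    total = sum(by_owner.values())
    if total == 0:
        return 0.0
    return sum((volume / total) ** 2 for volume in by_owner.values())


# Made-up example: sp1 and sp2 share an owner, sp3 does not.
owners = {"sp1": "ownerX", "sp2": "ownerX", "sp3": "ownerY"}
same_owner = concentration_score([("sp1", 50), ("sp2", 50)], owners)
two_owners = concentration_score([("sp1", 50), ("sp3", 50)], owners)
```

Grouping by owner rather than by provider matters here: a client that spreads deals over several providers still scores 1.0 if those providers all belong to the same owner.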

A

Another interesting one is the average price paid per unit stored per epoch. It's very interesting to observe that among suspicious clients this is zero, whereas for other clients it's 11. We believe this also makes conceptual sense, because suspicious clients don't really have the incentive to pay for the deals; their ultimate goal is rather to directly channel the DataCap to their predefined set of storage providers.

A

And last but not least, there's another metric, the minimum time between any DataCap allocation activity and the next deal proposal. Among the suspicious client group this time is roughly just one epoch, whereas for the other group it is 850 epochs. That's also very interesting to observe, and it makes intuitive sense, because this set of clients already knows that they're going to channel their DataCap flow to this set of storage providers, so they don't have to perform due diligence to really examine which storage providers are relatively better before storing their deals with them.
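
The price and timing metrics described above could be computed roughly as follows. This is a sketch under assumptions: the exact definitions used in Fil+ Canvas are not given in the talk, so the averaging method, the function names, the input layouts, and the choice to count a proposal in the same epoch as the allocation are all hypothetical.

```python
import bisect


def avg_price_per_unit_epoch(deals):
    """Total price paid divided by total (size * duration) across deals.

    deals: list of (total_price, size, duration_epochs) tuples.
    """
    paid = sum(price for price, _, _ in deals)
    unit_epochs = sum(size * duration for _, size, duration in deals)
    return paid / unit_epochs if unit_epochs else 0.0


def min_allocation_to_deal_gap(allocation_epochs, proposal_epochs):
    """Smallest gap, in epochs, between an allocation and the next proposal.

    Returns None when no proposal occurs at or after any allocation.
    """
    proposals = sorted(proposal_epochs)
    best = None
    for allocated_at in allocation_epochs:
        # Index of the first proposal at or after this allocation epoch.
        i = bisect.bisect_left(proposals, allocated_at)
        if i < len(proposals):
            gap = proposals[i] - allocated_at
            if best is None or gap < best:
                best = gap
    return best


# Made-up examples.
free_deals = avg_price_per_unit_epoch([(0, 10, 100)])
paid_deals = avg_price_per_unit_epoch([(2200, 10, 20)])
quick_gap = min_allocation_to_deal_gap([100, 2000], [101, 950, 2900])
slow_gap = min_allocation_to_deal_gap([100], [950])
```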

A

So with all that said, I think there are a couple of challenges we're still facing. For example, anomalies are very easy to detect but hard to substantiate, so this requires continuous investigation and collaboration with the ecosystem players, especially Deep and Galen and the governance team, to really substantiate these types of suspicious behaviors. On top of that, we believe that actors could also adapt to evade any type of detection.

A

If you imagine this as sort of a game theory problem, actors could adapt to whatever governance behavior we introduce to the ecosystem. But I think we can combat this by designing risk scores so that any deviation they make to move their risk scores would be against their incentives.

A

So we're trying to design risk scores in a robust way, so that they still hold up against any kind of adaptation from the players.

A

And finally, we are still working only on on-chain behaviors, because we are using the data that's recorded on chain. In the future we do think about expanding to any type of off-chain anomaly detection, including using NLP techniques on, for example, DataCap application data or any deal information that's recorded on GitHub.

A

Finally, last but not least, we're also building another product called Fil+ Watch, which is a visualization layer on top of Canvas. One of its functionalities is that you can do sub-community graphs, which visualize the DataCap flow from notaries to clients and ultimately to storage providers. So it's another interesting thing we're working on. I think I'm at time.

A

Thank you, everyone.

B

So, David, is this available in public right now?

A

It's still in stealth mode. We're still collaborating with Galen and Deep at the moment. I think we do have a pretty good prototype already, but we want to test it iteratively with them to make sure that the risk scores are capturing actually suspicious behaviors and not other things, because these things are very sensitive. Calling some people an anomaly or suspicious is a very strong thing to say, so we really want to test it out before making it public.

B

And it will be public, right? Meaning it will be accessible for anyone in the ecosystem to see?

A

Yeah, that's the intention, yes.

C

Plus one to the above: the goal is definitely to have this be public.

D

I think my question is just a quick follow-up: when you do make this public, are you also going to make public the criteria for what is considered a suspicious client?

A

That's an interesting product design question. Yeah, I guess maybe, Deep, do you have anything to say about this?

C

I think it's a very good, topical question, and a difficult question. I think it's kind of up to the community to define that, and maybe even the word suspicious is not a good word. Maybe we need a better word.

C

There are also cases where projects themselves could be defined as not being compliant, quote unquote, with certain guidelines or policy, but the same client or the same entity could, in a different world, be compliant with policy. So we probably need to do a little more thinking about how we want to term these things, and that's also part of why there's still brainstorming and back and forth happening.

C

If you have suggestions on terminology that we could use that is more sensitive to the topic, more appropriate, and recognizes the complexity of the situation, we'd love to hear it. I think our operating terms are more for simplicity of communication in how we iterate on the projects today, as opposed to being correct, given the nature of the situation.

C

So today, it's more like exhibiting behavior that we consider to be potentially abusing DataCap, and that's really how we define suspicious: leveraging it for self-dealing, leveraging it for disproportionate deal making with a small cluster of storage providers, profiteering from DataCap, or generally bypassing the system to get disproportionate allocation.

C