From YouTube: ProbeLab 2023 Roadmap - Yiannis Psaras
Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
Okay, hello everyone, thanks for coming around. This is the second part of the track on measurement and performance. A quick recap of yesterday: it was about recent results and studies that we have done with members of ProbeLab and with collaborators of ours. Today it's going to be more of an interactive session, where we are going to look into the future and at what we want to do in the coming quarters.
What we're then going to do is have another talk, which came in at the last minute, by a colleague who just joined; he's very interested in letting us know what they're working on. And then we have a bunch of breakout sessions. They are on the IPFS Camp website.
The timings are a little bit messed up, so don't pay exact attention to them. The sessions are going to be 45 minutes each, and we'll use the tables to run two at a time in parallel. When I finish, we can vote: if we want to drop some session, we can do it, and if we want to add one that someone wants to propose...
...we can do that as well. So today it's really time to get involved and influence what we are going to be discussing for the next three to four hours, until five or six o'clock. Okay, so that's the brief logistics for this afternoon. I'm going to start by going through the milestones that we have prepared.
The first one is about the Hydras. We had a talk yesterday by Dennis with some results on whether the Hydras are providing value to the network in terms of performance, and as you understand, this is an ongoing study. The milestone that we want to reach is to be able to answer the question of whether the Hydras are needed in the IPFS network, and what performance improvement they bring in several different situations. That is going to be the end goal, and it's pretty important in my opinion.
Now, we also went through and referred to several tools that we're using. Primarily that has been the crawler, the Nebula crawler, but Mikhail also mentioned the CID Hoarder, which we use for the provider record liveness study. So we do have several tools that we use to do active and passive measurements in the network. What we want to do next is set up some continuous measurement infrastructure, or continuous monitoring infrastructure, with a catchier name that we still need to come up with.
But what we want to do is have those deployed. As I mentioned, all these tools are already open-sourced and documented; people can use them. It's just that they're scattered around; they are not in one place. So what we want to do is the instrumentation work so that all these tools live in one place, perhaps one GitHub organization or something, with a backend where they're deployed and run continuously.
We do have another one that I briefly touched on when I was talking about the DHT lookup latency, or the retrieval latency more generally, which is the Bitswap provider delay. If you remember the plot: what happens in IPFS is that when a user asks for some content, they go and ask their immediately connected peers and wait for one second to receive a reply back, and if they don't get one, that's when they go to the DHT and try to resolve the content through there.
So this means that if content is popular and it is within the swarm of a peer, then you've got a chance of getting responses very quickly. But if not, which could be the vast majority of content, then you're waiting for one second and the success rate is very low. So in that case you're just waiting for one second for no good reason, and only afterwards do you go to the DHT. This is an ongoing set of experiments that we're doing, with somewhat controversial results, or at least results that we cannot fully explain right now. But basically the target of this milestone is to answer: is it worth having Bitswap ahead of the DHT lookup in the retrieval process?
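The flow just described (Bitswap broadcast first, a fixed one-second wait, then the DHT fallback) can be sketched roughly as follows; the function names and stubbed peer structures are ours for illustration, not Kubo's actual API:

```python
import time

# Illustrative sketch of the retrieval flow described above. The function
# names and the stubbed peer structures are hypothetical, not Kubo's API.
BITSWAP_PROVIDER_DELAY = 1.0  # the fixed one-second "magic number"

def ask_connected_peers(cid, peers, deadline):
    """Broadcast a Bitswap WANT to directly connected peers and wait up
    to `deadline` seconds for any of them to answer."""
    start = time.monotonic()
    while time.monotonic() - start < deadline:
        for peer in peers:
            if cid in peer["blocks"]:   # a swarm neighbour has the block
                return peer["id"]
        time.sleep(0.05)                # poll; the real code is event-driven
    return None                         # nobody answered in time

def resolve_via_dht(cid):
    """Placeholder for the slower DHT walk that finds provider records."""
    return "peer-found-via-dht"

def retrieve(cid, connected_peers):
    # 1) optimistic Bitswap broadcast to the local swarm
    provider = ask_connected_peers(cid, connected_peers, BITSWAP_PROVIDER_DELAY)
    if provider is not None:
        return provider, "bitswap"
    # 2) only after the full one-second wait does the DHT lookup start
    return resolve_via_dht(cid), "dht"
```

The question the milestone asks is essentially whether step 1 earns its keep for unpopular content, or whether starting step 2 immediately or in parallel would be the better default.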
Yeah, we call those magic numbers; they are fixed in the IPFS and libp2p codebases. So the bigger project is to clean up those magic numbers and have a more rigorous investigation of whether they're set correctly. Ideally, if we could have non-magic numbers, parameters that adapt dynamically based on something like the network load or the node load, depending on what the parameter is, that would be great. It would make everyone's life much easier: you wouldn't need to go and check periodically whether, now that the network size has doubled, it is still the right parameter.
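As a purely illustrative sketch of what a "non-magic" parameter could look like, here is a lookup timeout derived from a measured network size rather than hard-coded; the per-hop budget and the log2 scaling are our assumptions, not anything currently in the codebase:

```python
import math

# Hypothetical illustration: a Kademlia-style DHT walk touches O(log n)
# peers, so a per-lookup time budget could scale with the measured
# network size instead of being a hard-coded constant.
PER_HOP_BUDGET_S = 0.5  # assumed cost budget per hop, not a real default

def lookup_timeout(network_size: int) -> float:
    """Scale the lookup timeout with the expected number of hops,
    roughly log2(n) for a Kademlia-style DHT."""
    expected_hops = max(1, math.ceil(math.log2(network_size)))
    return expected_hops * PER_HOP_BUDGET_S
```

The point is that when the network doubles, the parameter adjusts itself by one hop's worth of budget instead of silently going stale.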
So that's part of a bigger project that we have. Another one, where the project is pretty much complete but we just need to do the final touches, is on the provider record intervals: the provider record expiry interval and the provider record republish interval.
Again, Mikhail presented this in great detail yesterday, so we do have some new recommendations for the engineering teams, and it's a matter of basically including those in a new release. But most of the work has been done, which is great news; we already have a box to tick.
Then we have another colleague of ours who isn't around in Lisbon these days; his tool is called Thunderdome. We might get lucky and have a pre-recorded video by him explaining what Thunderdome is. It's a very useful tool, and of course it's open source. What it does, in brief, is instrument a node, trying to replicate the IPFS gateways; more generally it's an IPFS Kubo node, basically, and you can use it to interact with the network and set up tests and experiments.
You can do whatever you want to do in an easy way and interact with the network. It also replays the traffic from the gateways, though of course this can change: it's pulling the stream of requests from the gateways and replaying it through the node, so that the experiment runs under a realistic traffic load, which is useful in some cases.
Now, what we want to do with this one: of course it's very interesting for measurements more generally, and we're using it already, but what Ian wants to do is have it automatically test the new releases of IPFS, so that you can check for any regressions in what the IPFS team wants to push out and verify that everything behaves as expected. This is done today, but somewhat manually.
We want to have that as part of our infrastructure. Then we have this study that is called optimistic provide. We didn't present that; if you're interested, we can point you to several very detailed documents that we have, or you can talk to Dennis, who is right here.
But what this study does: as I showed yesterday, the provide process, when someone wants to publish content in the IPFS network, is very slow. It takes tens of seconds and sometimes hundreds of seconds. This is because of the internals of how IPFS works, which are out of scope for this particular presentation. But what we found out is that you can optimistically choose which peers you want to pick to store the provider record with, and once you choose them, you can preemptively finish the process, which then completes much faster, in much less than tens of seconds. So this is also developed; it's close to being finalized.
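A rough sketch of the idea, with invented names and a simplified density estimate (the actual study is far more careful): instead of walking the DHT until the closest peers are proven, publish to any peer encountered during the walk whose XOR distance to the key falls under a threshold estimated from the network size, and stop early.

```python
# Rough sketch of the optimistic-provide idea; names and the density
# estimate are ours, not the actual implementation.
REPLICATION = 20  # DHT replication factor: copies of each provider record

def xor_distance(a: bytes, b: bytes) -> int:
    return int.from_bytes(a, "big") ^ int.from_bytes(b, "big")

def estimated_threshold(network_size: int) -> int:
    # With n uniformly distributed peer IDs, roughly REPLICATION/n of the
    # 256-bit keyspace lies within a key's REPLICATION closest peers.
    return (2 ** 256) * REPLICATION // network_size

def optimistic_provide(key: bytes, discovered_peers, network_size):
    """Store the record with peers that are probably among the closest,
    and finish early instead of completing the full walk."""
    threshold = estimated_threshold(network_size)
    chosen = []
    for peer_id in discovered_peers:        # peers seen during the walk
        if xor_distance(key, peer_id) < threshold:
            chosen.append(peer_id)          # "close enough": store here
            if len(chosen) == REPLICATION:
                break
    return chosen
```

The trade-off is probabilistic placement against the tens of seconds a full, proven walk costs.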
We do have results that are very encouraging, so to speak, and we've also tested it comparatively against other approaches for how it could possibly be done, so there is a fairly robust study behind it. The next step is to test it a little bit more.
We'll do some other things that I'm not aware of, because they were discussed just a few days ago, and basically put it in an experimental release before it goes out more generally. Another magic number is the DHT timeout: there are several such timeouts in the DHT, used while walking the DHT to do several different things. So that's one study that can potentially increase performance a lot, because in many cases, as you might have experienced, some things just hang forever.
You start requests that never finish, and you just hang there forever. These could be small, easy wins, low-hanging fruit so to speak: if something is not working, just kill it early and restart the process; it's going to be much faster anyway.
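The "kill it early and restart" pattern could look something like this toy sketch; the simulated request and all timings are made up for illustration:

```python
import random

# Toy sketch of bounding each attempt and retrying, rather than letting
# one request hang forever. The simulated request is invented.
def attempt(rng):
    """Simulated DHT request: usually fast, occasionally 'hangs'."""
    return rng.uniform(0.1, 0.5) if rng.random() < 0.8 else float("inf")

def request_with_retries(rng, per_try_timeout=2.0, max_tries=3):
    for n in range(1, max_tries + 1):
        latency = attempt(rng)
        if latency <= per_try_timeout:
            return n, latency        # succeeded on try n
    return None, None                # give up instead of hanging forever
```

A bounded retry usually beats waiting on a stalled request, at the cost of occasionally killing a request that would have eventually succeeded.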
So, a very interesting study; we haven't started on that one yet, and it will require a lot of digging into the internals of how the DHT works. Next, NAT hole punching.
There is a talk tomorrow, during the libp2p day, about NAT hole punching and how it works. The libp2p team has put out a very interesting technique to overcome this eternal problem in peer-to-peer networks, and what our team has offered to do is measure how successful it can be. Having a solution is a different thing from having a good solution. So we're not questioning anything; it seems that it's working very nicely, but again, more testing is needed. A reminder on this: you should sign up and become part of this experiment.
If you scan this QR code, you're going to be redirected to the sign-up form for the experiment. We're expecting to run it during December, pessimistically for a week, optimistically for a month; we'll see. You'll be asked to basically download the binary, which is going to show up on your taskbar, like that, and indicate when it's running and when it's not. And from the infrastructure that we have built, we are going to be attempting to connect to your node from the outside, from other networks. So it's very interesting and it's going to give us lots of results, but in order to get as many results as we can, it would be great to have as many people running it as we can.
There is documentation; this is the GitHub repository there, I think. No, that's not it, but anyway, there are links to the GitHub repository showing how everything is instrumented. Obviously it's all open source, so you can check internally what it's doing, and it's going to be great to have this experiment running. With that, we're hopefully going to land this milestone of being able to say hole punching succeeds with this or that percentage. Quite likely we'll need a follow-up study on that.
If we need to do any optimization, we'll look at it then, but that's the initial target that we have. Another one is on libp2p privacy guarantees. It has been mentioned a few times, even during the morning session, and the entire afternoon tomorrow is on libp2p privacy, so come along to see what we have been working on.
There is an approach that is called double hashing, and we arrived at it after doing quite a bit of research on what would be the best thing to do as a first step. There is no one-size-fits-all for privacy, even in protocols that we use today, and even more so for decentralized protocols that are based on DHTs and pubsub and peer-to-peer and so on. So we chose the approach of double hashing.
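In very rough terms, double hashing means the DHT operates on a second hash of the content identifier, so DHT servers that route or store a record cannot tell which CID it refers to unless they already know that CID. A toy sketch (ours, not the actual spec, which covers much more, such as encrypted provider records):

```python
import hashlib

# Toy sketch of double hashing: the DHT key is a *second* hash over the
# (already hash-based) CID, so a DHT server only ever sees SHA256(CID).
def dht_key(cid: str) -> bytes:
    """What the DHT server sees instead of the CID itself."""
    return hashlib.sha256(cid.encode()).digest()

records = {}  # stands in for the distributed record store

def provide(cid, peer_id):
    """Provider side: advertise under the double-hashed key."""
    records.setdefault(dht_key(cid), []).append(peer_id)

def find_providers(cid):
    """Reader side: a client that knows the CID derives the same key."""
    return records.get(dht_key(cid), [])
```

A server holding `records` can serve lookups without learning which content they are for; only parties that already hold the CID can compute the key. (The CID string below is a made-up placeholder.)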
Again, there is a super detailed document about how it works, what we should do, and what we are going to be doing, so we can link you to that if you want. But yeah, come along to hear what it means. We have started the development, and the work is pretty close to completion, at least in some parts.
A tricky part there is the transition, because there are some non-backward-compatible changes, or ways someone could do it this way; how it's going to land through the existing DHT network is not very clear right now, so there are some challenges there. Discussions are ongoing; get involved if you want, or if you have other ideas. But it's exciting work. What else? We have lots of things, so we want to focus on GossipSub as well.
GossipSub is one of the protocols in the libp2p stack; it's the main pubsub protocol, I would say, although there are others in the libp2p library. Now, the interesting thing is that GossipSub is used in the Filecoin blockchain to transfer the blocks around, from one storage provider to the others, well, to everyone. So it's pretty central that it works as expected, and that's what we want to measure.
That's one very basic thing that we want to do, that we need to do. The second thing we want to do is see whether the security properties of GossipSub are working as expected. As you understand, blocks in the Filecoin blockchain are essentially transferring value around.
Basically, if someone wants to attack the network in one way or another, that is not going to be great news. So GossipSub was instrumented with some security guarantees a few years ago, before Filecoin launched. These revolve around a score function that runs for every peer and basically kicks out the peers that seem to be misbehaving. There are several nice talks about how this works; I can point you to them.
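A toy version of such a per-peer score function; the weights and threshold here are invented, while real GossipSub v1.1 scoring combines many weighted counters (time in mesh, message delivery rates, invalid messages, and so on):

```python
# Toy per-peer score: reward useful behaviour, punish invalid messages
# more heavily, and cut off peers that fall below a threshold. All
# numbers are illustrative, not GossipSub's actual parameters.
GRAYLIST_THRESHOLD = -10.0

class PeerScore:
    def __init__(self):
        self.first_deliveries = 0    # messages this peer delivered first
        self.invalid_messages = 0    # messages that failed validation

    def score(self) -> float:
        # useful deliveries count for, invalid messages count 5x against
        return 1.0 * self.first_deliveries - 5.0 * self.invalid_messages

    def misbehaving(self) -> bool:
        """Peers below the threshold get pruned or ignored."""
        return self.score() < GRAYLIST_THRESHOLD
```

The measurement question is whether thresholds like this actually trip for real attack behaviour without penalizing honest peers.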
Some of them are on YouTube, if you want to get into the details. But basically we want to check whether those mitigation strategies that we have put into the protocol are actually working as we expect them to. A very important study, and quite a heavy one, I would say; it's going to take some time, but we're going to take it step by step.
Don't forget to breathe, I don't know... and that's it. That's about nine or ten upcoming milestones that our team has got for the next few quarters. So yeah, thank you very much. Any questions? Yep.