From YouTube: Bitswap and IPFS real-time metrics analysis - Leo Balduf
Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
My name is Leo, and I will be talking about real-time metrics, mostly about Bitswap and some other IPFS basics.
So, as we heard earlier, all content retrieval in IPFS starts with Bitswap, so it's quite nice that I'm first. So why do I want to do this, or why do we want to do this?
We have a bunch of fancy Grafana dashboards which display real-time information about the network. You probably can't read this, but it is real-time.
Maybe you can read the link at the bottom, which is grafana.monitoring.ipfs.trudi.group, where you can go and look at this stuff in real time. All right.
We currently have a project with PL which has two parts. The first part is about this: collecting metrics, analyzing them, and visualizing them, mostly in real time wherever possible. That's this talk. The other part is about tracking content throughout its lifetime, which is not this talk, so just to set your expectations.
I will start a bit at the beginning: I will introduce some basics about IPFS, libp2p, Bitswap and whatnot, so we have a little bit of background to work on. Then I'm going to talk about real-time analysis, mostly of Bitswap and other IPFS stuff, and we're going to have a lot of time for Q&A at the end. All right, the basics.
We already heard that IPFS is built on top of libp2p, and libp2p does so-called stream multiplexing.
Then there is a NAT hole punching protocol, which I won't talk about today, and then there's Bitswap, which I will talk about. On this layer already, so with just the identify protocol, we can ask a bunch of interesting questions: for example, how many nodes support some protocol. I know that the new hole punching approach is being rolled out, and an interesting question would be how many nodes support it at the moment, and of course, how does this change over time?
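To make that concrete: identify results land in the local peerstore, so this kind of question is a simple lookup. Here is a minimal sketch in Go, assuming a plain go-libp2p host; exact peerstore signatures vary between go-libp2p versions, so treat this as illustrative rather than the talk's actual tooling.

```go
// A minimal sketch: count how many connected peers advertise a given
// protocol, using the data the identify protocol put into the peerstore.
package main

import (
	"fmt"
	"log"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/host"
	"github.com/libp2p/go-libp2p/core/protocol"
)

// countSupport tallies the connected peers that advertise proto.
func countSupport(h host.Host, proto protocol.ID) int {
	n := 0
	for _, p := range h.Network().Peers() {
		protos, err := h.Peerstore().GetProtocols(p)
		if err != nil {
			continue // peer not identified yet
		}
		for _, pr := range protos {
			if pr == proto {
				n++
				break
			}
		}
	}
	return n
}

func main() {
	h, err := libp2p.New()
	if err != nil {
		log.Fatal(err)
	}
	defer h.Close()
	// "/libp2p/dcutr" is the protocol ID of the new hole punching approach.
	fmt.Println("peers supporting DCUtR:", countSupport(h, "/libp2p/dcutr"))
}
```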
Those are a few of the things we can ask at this layer, but we can also look deeper into the protocols themselves and think about whether we can use these protocols to extract metrics that are interesting to us.
So I mentioned earlier that there's the Kademlia protocol, which does the DHT stuff. There are a bunch of DHT crawlers using this protocol to find DHT servers on the network, infer the graph of DHT servers, and thereby basically the network core.
So we can do that. There's the identify protocol, which I mentioned earlier, which is about agent versions, supported stream protocols, transport protocols and all that, so we can use that to extract metrics about the network. Then there is Bitswap, which I deal with a lot. Bitswap is used, as we heard earlier, to initiate requests: every request is broadcast on Bitswap first, so we can collect these requests and learn about data requests on the network.
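As a sketch of what collecting these requests can look like: recent go-bitswap versions expose a tracer hook (the bitswap.WithTracer option) that sees every incoming and outgoing Bitswap message. Import paths and signatures vary between go-bitswap and Kubo versions, so this is an illustration of the idea, not the actual plugin code.

```go
// An illustrative sketch, assuming go-bitswap's tracer hook: log every
// WANT entry of every incoming Bitswap message.
package wanttracer

import (
	"fmt"
	"time"

	bsmsg "github.com/ipfs/go-bitswap/message"
	"github.com/libp2p/go-libp2p/core/peer"
)

// WantLogger implements go-bitswap's tracer.Tracer interface; an instance
// would be passed via bitswap.WithTracer(...) when constructing the exchange.
type WantLogger struct{}

func (WantLogger) MessageReceived(p peer.ID, msg bsmsg.BitSwapMessage) {
	for _, e := range msg.Wantlist() {
		// Timestamp, origin peer, request type (want-have vs. want-block),
		// and CID: the same fields as the traces shown later in this talk.
		fmt.Printf("%s %s %v %s\n",
			time.Now().Format(time.RFC3339Nano), p, e.WantType, e.Cid)
	}
}

func (WantLogger) MessageSent(peer.ID, bsmsg.BitSwapMessage) {}
```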
So, do we do this? Yes, of course. We are running a distributed setup to collect these metrics. We run multiple large monitoring nodes with unlimited connection capacity. That means: a normal IPFS node has some 600 to 900 connections at any time; we run nodes that have something like 20,000 connections at any time, because they have unlimited connection capacity.
We run unmodified Kubo, so we run unmodified software, with a plugin that we developed to extract these metrics for us. Then we have a client that connects to the plugin, collects these metrics in real time, and analyzes them. Finally, we display it on Grafana. The entire setup is a distributed system. It's a bit complicated: it's set up with Ansible and needs a bunch of servers which are connected through a VPN, but the end goal is to have this deployable by anyone.
As I said, the monitors are passive. This is easier to run, and in the end it also leads to a uniform sample of peers that each monitor is connected to. We analyzed this, and it's actually true, which is quite nice: each monitor gets a uniform sample of all peers on the network, so we can run statistics on those.
You need a custom client; it works, but it's not great, and for some things we had to reinvent the wheel, for example for backpressure and things like that. So on the agenda is to move this to another system that does the pub/sub for us, but it needs to deal with many thousands of requests per second, and I still have to figure that out.
All right, I will go into Bitswap in most detail in a second, but let's first maybe talk about this identify protocol. I mentioned earlier that every IPFS node runs this and exchanges information about the stream protocols that are supported, the agent version, the public key, and stuff like that. Any running IPFS node, any running Kubo node, keeps track of this and knows it locally, and our plugin exports this and makes it available for real-time analysis. Stuff that we get is, for example: for all the peers that we're connected to, what protocols do they speak?
What can they do? What agent versions are they running? What transport protocols do they support? We can also see the number of DHT servers, so the network core, as well as the number of DHT clients, and surprisingly, we are connected to DHT clients much more often than to DHT servers.
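For illustration, the agent version is just another peerstore lookup; a small sketch in the same spirit as before, assuming current go-libp2p, which stores the identify result under the "AgentVersion" key.

```go
// A small sketch: tally the agent versions of all connected peers, reading
// the value the identify protocol stored in the peerstore.
package identmetrics

import "github.com/libp2p/go-libp2p/core/host"

func agentVersions(h host.Host) map[string]int {
	counts := map[string]int{}
	for _, p := range h.Network().Peers() {
		v, err := h.Peerstore().Get(p, "AgentVersion") // set by identify
		if err != nil {
			continue // nothing recorded for this peer yet
		}
		if s, ok := v.(string); ok {
			counts[s]++ // e.g. "kubo/0.16.0/" or "go-ipfs/0.12.2/"
		}
	}
	return counts
}
```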
So what we can do with this passive monitoring setup is look at the fringe of the network, whereas DHT crawling can look at the core of the network. For example, another thing we can do is estimate the size of the network using multiple vantage points.
You can also do this for more than two monitors, and we model this as a modified coupon collector's problem, which is not so simple anymore, and I'm quite glad that someone else did that and not me. We have it in our paper. It works, yeah. In practice this is not completely real time: we just sample our monitors in intervals, at the same time, and then basically compute this.
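The paper has the full estimator; as a toy stand-in to convey the idea, the two-monitor case is essentially a capture-recapture (Lincoln-Petersen) estimate, where the overlap between the two peer samples plays the role of recaptures. Names and numbers below are made up for illustration; this is not the estimator from the paper.

```go
// A toy illustration: estimate total network size from two uniform peer
// samples via the Lincoln-Petersen estimate N ~= n1 * n2 / overlap.
package main

import "fmt"

// estimateNetworkSize takes the peer sets seen by two monitors sampled at
// the same instant; the overlap plays the role of "recaptures".
func estimateNetworkSize(n1, n2 map[string]bool) float64 {
	overlap := 0
	for p := range n1 {
		if n2[p] {
			overlap++
		}
	}
	if overlap == 0 {
		return 0 // no overlap, no estimate
	}
	return float64(len(n1)) * float64(len(n2)) / float64(overlap)
}

func main() {
	a := map[string]bool{"p1": true, "p2": true, "p3": true}
	b := map[string]bool{"p2": true, "p3": true, "p4": true}
	// 3 * 3 / 2 = 4.5 estimated peers in total
	fmt.Printf("estimated network size: %.1f peers\n", estimateNetworkSize(a, b))
}
```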
We can also do geolocation, I showed this on the first slide or so. We can geolocate peers: for the size estimates, for example, we could see where the peers are located. But we can also do this for requests, and we are currently doing this for all the requests we get. We get, I don't know, a few thousand up to ten thousand or so requests per second, and we geolocate the origin of those.
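Conceptually this is an IP-to-location lookup per request origin. A minimal sketch, assuming a local MaxMind GeoLite2 country database and the oschwald/maxminddb-golang reader; the database file name is an assumption, and so is whether the talk's setup uses this exact library.

```go
// A minimal sketch: map a request's origin IP to a country code using a
// local MaxMind database.
package main

import (
	"fmt"
	"log"
	"net"

	"github.com/oschwald/maxminddb-golang"
)

func main() {
	db, err := maxminddb.Open("GeoLite2-Country.mmdb") // hypothetical path
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var rec struct {
		Country struct {
			ISOCode string `maxminddb:"iso_code"`
		} `maxminddb:"country"`
	}
	ip := net.ParseIP("203.0.113.7") // example address from TEST-NET-3
	if err := db.Lookup(ip, &rec); err != nil {
		log.Fatal(err)
	}
	fmt.Println("origin country:", rec.Country.ISOCode)
}
```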
We then pick one of them and get the content from them. That's how Bitswap works; every node does this as the first step when they request data, so every request for data goes through Bitswap. Now, we are not P1 anymore, we're P3: we're not requesting data, we're only listening to other people's data requests. We are basically eavesdropping, if you want, sniffing the network for science. You will notice that we don't reply with a HAVE message, because we don't have any content.
That's simple. We also get a bunch, a lot, of CIDs. Of course, we get many millions of unique CIDs every day, and we can analyze those. We can look at the codec that is listed in every CID as a proxy to estimate the usage of the network.
I will show results for that in a second. We can derive content popularity distributions, which we did, with interesting results (read our paper), and we can, for example, also download the content, sampling it, to estimate what is being requested, for example the MIME types.
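The codec sits right in the CID prefix, so extracting it is cheap. A minimal sketch with the go-cid and go-multicodec libraries; the example CID is the well-known Kubo readme directory, which is dag-pb.

```go
// A minimal sketch: decode a CID and read the multicodec of the content it
// addresses, the field used here as a proxy for network usage.
package main

import (
	"fmt"
	"log"

	"github.com/ipfs/go-cid"
	"github.com/multiformats/go-multicodec"
)

func main() {
	c, err := cid.Decode("QmYwAPJzv5CZsnA625s3Xf2nemtYgPpHdWEz79ojWnPbdG")
	if err != nil {
		log.Fatal(err)
	}
	codec := multicodec.Code(c.Prefix().Codec)
	fmt.Println("codec:", codec) // dag-pb, i.e. UnixFS-style data
}
```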
Looking at the request rates: we get some many thousand requests or so per second per monitor, which is quite nice, but it generates a lot of data which we have to store somewhere, which is not so nice. This is one of the reasons why we ultimately want to do all of this in real time.
Janus usually introduces me with "I want to be the Google of IPFS and store all the data", but I really don't. This is many, many terabytes of data that we're storing already with the traces, and it's too much. We don't want to store it. It's terrible.
We want a live view of the network, still having all of those metrics extractable from the traces, but in real time. For that we're running multiple monitors. It would be nice to have one unified request stream of the entire network, but, as I said, we have to deal with multiple monitors and concurrent requests arriving on those monitors. And, I put it here: oh God, the horrors. These are, maybe readable, traces from two monitors, and we have timestamps, origin peer, request type and CID.
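To make the merging problem concrete, here's an illustrative sketch with made-up types, not the project's actual pipeline: merge the traces of two monitors into one time-ordered stream, and flag entries for the same origin peer and CID that appear again within a short window, i.e. the same broadcast request seen by both monitors.

```go
// An illustrative sketch: merge two monitors' trace streams and flag
// near-duplicate entries.
package traces

import (
	"fmt"
	"sort"
	"time"
)

type Trace struct {
	TS     time.Time
	Origin string // origin peer ID
	Type   string // e.g. WANT_HAVE or WANT_BLOCK
	CID    string
}

func mergeAndFlag(a, b []Trace, window time.Duration) []Trace {
	merged := append(append([]Trace{}, a...), b...)
	sort.Slice(merged, func(i, j int) bool { return merged[i].TS.Before(merged[j].TS) })

	lastSeen := map[string]time.Time{} // key: origin|CID
	for _, t := range merged {
		key := t.Origin + "|" + t.CID
		if prev, ok := lastSeen[key]; ok && t.TS.Sub(prev) < window {
			fmt.Println("duplicate:", t.TS.Format(time.RFC3339Nano), t.Origin, t.Type, t.CID)
		}
		lastSeen[key] = t.TS
	}
	return merged
}
```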
All of these metrics, we're feeding them into Prometheus, and we experience a curse of, I would call it, cardinality, but it's usually referred to as dimensionality. Every counter that we feed into Prometheus, we have to annotate with a bunch of labels, and all of those have some cardinality: for example, which monitor received the message. A bunch of possibilities.
Are they duplicates? Are they matched between the monitors? Origin country: absolutely terrible, there are hundreds of countries. Origin group: is this a gateway, is it a DHT server, is it whatever? Entry types, multicodecs. So we have all of these labels on every time series, and this is a product, not a sum, which ends up with a giant cardinality in these time series.
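In Prometheus terms: each distinct combination of label values becomes its own time series, so the series count is the product of the per-label cardinalities. A minimal sketch with the official Prometheus Go client; the metric name, label names, and the numbers in the comment are made up for illustration.

```go
// A minimal sketch of the cardinality problem: one counter, seven labels,
// and the series count multiplies across all of them.
package main

import "github.com/prometheus/client_golang/prometheus"

var bitswapMessages = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "bitswap_messages_total",
		Help: "Bitswap messages received, by monitor, origin, and type.",
	},
	// Worst case: |monitors| x 2 x 2 x ~200 countries x |groups| x
	// |entry types| x |codecs| series from this one counter. With, say,
	// 5 x 2 x 2 x 200 x 4 x 3 x 50 that is already 2.4 million
	// potential time series.
	[]string{"monitor", "duplicate", "matched", "origin_country", "origin_group", "entry_type", "codec"},
)

func main() {
	prometheus.MustRegister(bitswapMessages)
	bitswapMessages.WithLabelValues(
		"monitor-1", "false", "true", "DE", "gateway", "want_have", "dag-pb",
	).Inc()
}
```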