From YouTube: Pinset orchestration with IPFS Cluster - Hector Sanjuan
Description
IPFS Cluster makes distributing a pinset across a scalable set of Kubo peers easy. In this talk, we will explore the basic features, setup and monitoring for home and production deployments.
The first thing to understand is that cluster peers are sidecars to Kubo peers and, as cluster peers, they're fully independent entities. There is one cluster peer per Kubo peer and they're usually collocated. Cluster peers have their own identity, they have their own configuration, and they communicate with each other using a private libp2p network.
All communication with Kubo is done through Kubo's HTTP RPC API. The IPFS Cluster software comes with two main binaries. Unlike Kubo, where the ipfs command is both the server and the client, in cluster there is an application that runs the daemon and an application that runs the client, and the client application uses the REST API exposed by the cluster daemon to talk to it and perform operations on it. The daemon is run with ipfs-cluster-service, and the client is ipfs-cluster-ctl.
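As a rough sketch of what that looks like in practice (commands as I remember them from recent releases; exact flags and defaults may differ between versions):

```sh
# Initialize and start the cluster daemon (a sidecar to a running Kubo daemon).
ipfs-cluster-service init
ipfs-cluster-service daemon

# In another shell, talk to the daemon through its REST API.
ipfs-cluster-ctl id        # show this cluster peer's identity
ipfs-cluster-ctl peers ls  # list the peers in the cluster
```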
I mentioned cluster peers form a private network, and they use that private network to communicate with each other, basically using libp2p pubsub and an internal RPC API that they share.
Cluster peers are tracking and modifying what we call the cluster pinset. The cluster pinset contains all the pins that the cluster should be tracking, along with their pin options. It is a key-value store, a big database with all the pins and their options, and this database is replicated to all the peers that are participating in the IPFS Cluster.
So in Kubo, a pin is just a CID and a pinning mode that can be either recursive or direct. In cluster it is more. The pin in cluster includes custom metadata that the user can provide, desired replication factors, creation date, expiration date (so that pins can be removed or unpinned from IPFS at a given date), origins information, and other options.
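A minimal sketch of pinning with several of those options set (flag names are how I remember them in recent ipfs-cluster-ctl releases, so check pin add --help for your version; <cid> is a placeholder):

```sh
# Pin a CID with custom options: a name, replication factors,
# an expiration, and user-provided metadata.
ipfs-cluster-ctl pin add \
  --name "my-dataset" \
  --replication-min 2 --replication-max 3 \
  --expire-in 720h \
  --metadata source=ingest-job-42 \
  <cid>
```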
Each individual cluster peer, of course, is tracking all these pins, and they can complete this static information that is stored in the cluster pinset with dynamic, up-to-date information that they extract from their own runtime state and from Kubo.
That is: the state or status of the pin, whether the pin is pinning, or queued to be pinned, or has errored while pinning, or has successfully completed pinning; the addresses of the allocated peers, in this case the IPFS peers; the timestamp of the last status change; etc., etc.
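Both views are available from the client; roughly like this (illustrative only, and output formats vary by version):

```sh
# Show the pin as stored in the shared pinset, including its pin options.
ipfs-cluster-ctl pin ls <cid>

# Show the live status of that pin on every peer: pinned, pinning,
# queued or error, plus allocations and last-change timestamps.
ipfs-cluster-ctl status <cid>
```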
When you send a new pin to the cluster, so when you ask the cluster to pin a CID, a complex process is triggered which decides, based on the pin options I showed you and based on the state of the peers in the cluster, where to allocate the pin: that is, which peers in the cluster should be the ones asking Kubo to pin the item.
Every cluster peer can optionally expose three different APIs, potentially all together, so you can enable and disable them at will. The REST API is the native API. It has full feature parity with what cluster has to offer and is the most performant one, because it's built exactly to fit how cluster works. There's a second API called the IPFS Proxy API. The proxy API mimics Kubo's RPC API, except for a couple of methods like pin add or pin rm.
Instead of doing what a single ipfs daemon would do, it interprets those requests as a cluster pin, a cluster pin add, or a cluster pin removal, thus turning some specific calls to IPFS into cluster-wide operations. This is designed so that you can essentially drop a cluster peer where before there was a single ipfs daemon and not notice a difference, because the full RPC API offered by Kubo is kept unchanged.
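In practice that means you can point the regular ipfs CLI at the proxy instead of at Kubo. Assuming the proxy listens on port 9095 (the default as far as I recall; check your configuration):

```sh
# This looks like a normal Kubo pin, but the proxy intercepts it
# and turns it into a cluster-wide pin operation.
ipfs --api /ip4/127.0.0.1/tcp/9095 pin add <cid>

# Non-intercepted methods are forwarded to the underlying Kubo daemon.
ipfs --api /ip4/127.0.0.1/tcp/9095 id
```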
The third one is the Pinning Services API. This is experimental because it's very new, but it offers compatibility with essentially anything that supports the Pinning Services API, Kubo itself included, and it's very good for pinning and unpinning, among other things. One very easy thing this allows you to do is to have a local Kubo daemon on your machine with a pinning service configured as a remote backend for that Kubo daemon, where the service is actually a cluster running somewhere else.
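Kubo's remote pinning commands can talk to such an endpoint. A sketch, assuming the cluster's Pinning Services API is enabled and reachable locally; the port and the (empty) key are placeholders that depend on your setup:

```sh
# Register the cluster's pinning-service endpoint with Kubo.
# "mycluster" is just a local label for the service.
ipfs pin remote service add mycluster http://127.0.0.1:9097 ""

# Pin through the remote service: Kubo asks the cluster to pin the CID.
ipfs pin remote add --service=mycluster <cid>
```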
Unfortunately, list-operation flags like pagination and filtering are not well supported, because pagination is very difficult to do in the way cluster stores the state.

Let's open the second part of our talk and discuss a bit about scale. The performance and size of a cluster can be related to different dimensions, and these dimensions may matter more or less to cluster operators.
It depends on what you're doing with the cluster, what applications you have implemented on top of it, and what type of pins you're putting in: whether they're big or small, whether you have many of them or not so many, etc., etc. But the dimensions that matter here are how well the Kubo daemons perform when they try to pin something, and how fast the cluster peers can commit new pins to the cluster.
So, diving into each of these: a cluster with many small peers might speed up the overall rate of pinning, because you get more IPFS, or Kubo, daemons pinning at the same time, so you can pin more. But at the same time this translates into more information flowing inside the cluster, since all of the peers need to replicate the pinset, which may make the cluster's ingestion throughput smaller, depending on the available bandwidth and so on.
The type of disk that you use, the layout, whether you're using LVM volumes or some RAID configuration, SSDs or spinning disks, the Kubo datastore configuration, the available memory on the machine, and particularly the IPFS configuration, including internal bitswap settings, etc., etc., are very important if you want to make Kubo pin more and faster. This is not related to Cluster at all; this is about configuring Kubo in such a way that when cluster tells it to pin, Kubo pins faster.
You can always scale Kubo machines vertically: you can always add more RAM and CPU and tune the configuration accordingly, and we operate Kubo nodes with 20 million pins in them, with 50 terabytes of data in them. But they are expensive and they need a huge amount of RAM; we're using something like 192 gigabytes of RAM for these nodes. Instead, it's more sustainable to use smaller machines for Kubo, because Kubo not only needs to pin, it also needs to provide that content to the network.
In terms of how many pins you can ingest into a cluster: the minimal number of pin requests that a single cluster peer can track, replicate and sync to a network of 25 peers is about 250 pins per second. This is the base, the lowest number that you can get by just doing things naively: sending requests to the API of a single node.
Even so, this number means about one million pins ingested every hour (250 per second over 3,600 seconds is roughly 900,000), and as I said, a cluster is made of multiple peers: you can write to different peers at the same time, you can parallelize your requests to the API, etc., so these numbers can only go up. I'm giving you the very base, the very lowest figure.
This works because cluster peers use Conflict-free Replicated Data Types (CRDTs) to sync their state, and we've seen that we can batch pin requests and actually get huge performance gains by doing batching and only sending batched updates to the rest of the cluster every few seconds, or when the batches get big enough, etc., etc. It's a matter of configuration, of finding the right balance between the usage that you give to the cluster and the performance that you want to get out of it.
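If my memory of the service.json layout is right, the batching knobs live under the crdt consensus section; a sketch with made-up values, worth double-checking against the configuration reference:

```json
{
  "consensus": {
    "crdt": {
      "batching": {
        "max_batch_size": 500,
        "max_batch_age": "5s"
      }
    }
  }
}
```

The idea is that a batch is committed and broadcast when it collects max_batch_size pins or becomes max_batch_age old, whichever happens first; leaving both at their zero values keeps batching disabled.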
Therefore, the size of the pinset will be heavily influenced by the machine performance, by the Badger datastore backend that cluster uses, by how it's configured and how it optimizes the available RAM usage. Our biggest cluster has 100 million pins in it. It is made of 25 peers, and each of those peers is taking its portion of the 100 million pins, making sure that it's actually pinned, retrying those pins that are not getting through, etc.
In fact, the Badger datastore that stores those pins, and that provides the backend for the CRDT structures that are used to coordinate the syncing of the pinset, does not have 100 million keys in it but 300 to 350 million keys. But in general, the size of this datastore is only about 0.3 percent of the actual size of the content stored on IPFS.
So even if this grows big (I think this datastore is now, all together, about three terabytes in that cluster, across all the peers), that's only a small portion of the 940 terabytes that the cluster is storing, which is where the 0.3 percent figure comes from. So yeah, these figures that I'm giving you are not made up.
They come from actual cluster setups, which means that in your own deployments you can expect, at the very least, this type of performance, in these ballparks, and if your deployments have lighter requirements than this, you can expect very reliable operation with cluster. IPFS Cluster has received lots of improvements in the last year to get where it is. But of course, we always need to note that the heavy lifting in these machines is done by IPFS.
So IPFS, Kubo in this case, is the real resource hog on any of these machines, in the sense that the work of retrieving the content, the work of writing it to disk, the work of announcing and providing the content, and of maintaining the bitswap sessions to all the other peers, are all things that fall on IPFS's shoulders. And again, it will depend on usage: depending on how many people are requesting that content and how many things you are pinning at the same time, the machines will behave differently.
So, a good property of this CRDT synchronization used by cluster peers is that cluster peers are always fully operative regardless of the state of the other peers. This is very different from other distributed key-value stores, like Raft-based ones, where replica and peerset management is a special operation. In IPFS Cluster, any new peer just needs to be given the multiaddress of some other peer in the cluster, and as soon as it connects to it, it will discover the cluster and be fully operative. You don't need to do anything else.
You don't need to do a special operation, you don't need to run a cluster command to add the new peer; this all happens automatically. So it's actually very easy to add new peers to the cluster, in the sense that you just start the peer and off you go: you can use that peer from that moment on to write and store new content.
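Bootstrapping a new peer looks roughly like this (the --bootstrap flag is how I remember it; the multiaddress below is a placeholder, with 9096 being, I believe, the default cluster swarm port):

```sh
# On the new machine: create an identity and configuration...
ipfs-cluster-service init

# ...then start the daemon pointing it at any existing cluster peer.
# It discovers the rest of the cluster and syncs the pinset by itself.
ipfs-cluster-service daemon --bootstrap \
  /ip4/192.0.2.10/tcp/9096/p2p/<existing-peer-id>
```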
Obviously, you can send more pins to a cluster than it can actually pin at once. That means that these pins will be queued, because we don't send all the pins to IPFS directly as they come. There's a configurable number of concurrent pins that you can set, so that you're never going to be pinning more than, for example, 10 things at a time. Everything that cannot be pinning at that moment goes into a pinning queue, and cluster peers use those pinning queues to signal when a peer is overwhelmed by pins.
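Those knobs sit, as far as I recall, in the stateless pin tracker section of the service configuration; key names and defaults are worth verifying, and the queue bound here is just chosen to match the roughly 25,000 cap mentioned later:

```json
{
  "pin_tracker": {
    "stateless": {
      "concurrent_pins": 10,
      "max_pin_queue_size": 25000
    }
  }
}
```

concurrent_pins caps how many pin requests are sent to Kubo simultaneously, while max_pin_queue_size bounds how many pins may wait in the queue behind them.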
So, for example, when pinning, the cluster can ensure that your pins are replicated in different regions, so that a pin is replicated in, say, three different regions, and within those regions it will choose the peers that have the lowest pinning queues: the peers that are not overwhelmed by the things that are pinning, in the sense of falling behind and not managing to pin everything that they should be pinning.
And of those that are not overwhelmed, it will choose the ones that have the most free space. This ensures that the cluster capacity is used in a balanced fashion. So normally what you see is that everything converges in storage usage: the storage used in all the peers will tend to be the same, they will end up filling up at the same level, and you will get your pins distributed in your cluster in a very balanced fashion; and if not, it usually equalizes over time.
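This selection behavior is driven by the balanced allocator, configured, if I remember the shape of it correctly, with something like the following; "region" is a made-up tag name that each peer would advertise via the tags informer, and the pinqueue metric assumes the pin-queue informer is enabled:

```json
{
  "allocator": {
    "balanced": {
      "allocate_by": ["tag:region", "pinqueue", "freespace"]
    }
  }
}
```

Read left to right: partition candidate peers by region tag, prefer the ones with shorter pinning queues, and among those pick the ones with the most free space.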
I think the graph here shows a pinning queue. You see it doesn't go very high, because it's configured not to go over 25,000 or so, and the moment it gets over that, cluster will stop sending pins to that specific peer, so that the peer can actually work through its queue, and the queue just doesn't grow indefinitely.
You can scrape the rates of pinning, you can scrape how many things are queued, and this, together with the metrics that the Kubo daemons export themselves, can give you very good insight into the state of the cluster: whether the cluster can pin everything that you send to it, whether the cluster is having a lot of errors when pinning and things are not getting through, how long it is taking to pin something, which peers are faster, how well synced the peers are in terms of the cluster pinset, etc., etc.
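Cluster can expose these metrics in Prometheus format when enabled in the observations section of the configuration; a sketch, where the endpoint address is the default I remember:

```json
{
  "observations": {
    "metrics": {
      "enabled": true,
      "prometheus_endpoint": "/ip4/127.0.0.1/tcp/8888"
    }
  }
}
```

Point your Prometheus scraper at that endpoint and combine it with Kubo's own metrics to get the full picture described above.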
That's it, that's all I have to say, but I don't want to leave without pointing you to the documentation. The documentation goes much more in depth than I can do here.
It explains, for example, exactly what the configuration options are in Kubo and in cluster when you're deploying at scale, which settings you have to touch and which settings you want to adjust, and I hope you can go home with a better idea of how IPFS Cluster operates.
There are some features in IPFS Cluster that I haven't talked about, particularly collaborative clusters that use IPFS Cluster followers. I'm happy, if you reach out to me, to discuss them later; we can talk about it. And coming up there's a presentation about the IPFS operator, which is how we're making cluster deployments really painless and automatically optimized by doing them on Kubernetes, having your full cluster essentially come up from nothing very easily.
Also, I told you that the pinset synchronization layer in cluster peers is powered by Merkle-CRDTs, and I'll be diving into how they work in 40 minutes in the IPFS 201: App Design Patterns and Developer Tools track. So if you want to come and see me there, I will be very glad to have you as well. Thank you very much.