Description
Ipfs-Embed is a small embeddable version of a subset of IPFS, written in Rust. It is used in production in the German manufacturing industry.
This talk presents a novel API for pinning, caching and garbage collection, designed for the use case of highly interactive and dynamic applications with soft real-time requirements.
Afterwards there will be a discussion of the advantages and disadvantages of this approach, and how it compares to the Kubo API.
ipfs-embed is an implementation of IPFS in Rust which we have developed. It was started by David Craven in 2020, and it was built by David and me specifically for the needs of the company Actyx. It is open source, and it has some Kubo interop, but that was not a primary design goal.
So if you connect to Kubo, it might work, it might not. It has worked at some times for some cases, but interop wasn't really our goal, basically. It is intended to be used in small private swarms.
So it's not meant to be used on the global IPFS network, but in a small swarm of, let's say, up to 100 devices, and there it is used in production in factories in Germany.
Due to that, there's a lot of bulletproofing in terms of peer handling and all kinds of things that you can only figure out when you are in production and things start breaking at 2 am in the morning.
Okay, so Actyx is basically a private IPFS swarm in a factory, which connects humans with a tablet in hand, machines, and nodes in the data center. Each of them builds an event log, and you want to get the events from each device onto every other device.
Okay, so features: it has low and, most notably, bounded resource usage, in particular memory. This is running on small devices; the smallest we have is a Raspberry Pi with 500 MB of RAM. You cannot exceed the memory, or the thing stops and people will be very angry, so that's just not good. And the API is guided by the principles of local-first development.
So basically you work from local data when offline. You have to clearly know, when you do an operation, whether it will be local or whether it will connect to the network, and you never rely on the network being available. Obviously you have to rely on the network sometimes to get some data, but the application should always be in a usable state. You should at least be able to tell the user "sorry, your data is not there", but you should never get a spinner. And it's soft real time.
What that means is that it should not block for a long time, because there is stuff depending on it, but we are not using it to actually control machines which could kill somebody. So it's not used in a safety-critical way, but it is used in a way where you really, really don't want pauses of more than half a second or so. Therefore the API might not be appropriate for cloud usage.
It's just very specialized. Actyx is published for basically all platforms: Linux, macOS, Windows and Android, and Linux on basically every ARM architecture there is, because there's quite a bit of variety in these machines. Okay, the API.
So this talk is not about the internals; that would take too long. I just want to talk about the API, because we did a few things differently than Kubo.
It is based on rust-libp2p and SQLite, and other than that, let's just see what the API looks like.
So this is probably the most controversial thing; I'll start with that. We have local I/O: some operations which are always strictly local. You know, when you call these operations, that you will never touch the network, so they are strictly separate from the network operations, and they are blocking. Boo. People really look down on having blocking calls these days.
It's seen as 20th-century tech, but I find it very practical to have blocking calls occasionally. Async is not without performance cost and not without mental overhead, and being sync, with no async calls, simplifies writing complex logic on top of these things. I have another talk tomorrow about one of the things we built on top. And that means existing local data should be consistently fast.
If you want to get a block from local data, you will get the block or you will not get the block, but you'll have an answer in less than a millisecond, typically a microsecond. So there's just no point making this async; it's not like you're going to wait for a second or whatever. My opinion is that this is the only sensible thing to do in an embedded use case.
People might disagree, but anyway. It might be different if you have a cloud deployment, like we've seen with other use cases, where you are deployed in the cloud and your storage is not local to your peer-to-peer node; then this obviously doesn't apply. But this is for this use case.
So this is our local API: you can get, you can insert, you can check whether a block is there, and you can list all the blocks that you have. That's it.
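The shape of such a strictly local, blocking API can be sketched like this. This is a toy model, not the actual ipfs-embed signatures: strings stand in for CIDs, and the method names just mirror the four operations described above.

```rust
use std::collections::HashMap;

/// Toy sketch of a strictly local, synchronous block store.
/// Every call answers immediately from local state and never
/// touches the network.
struct LocalStore {
    blocks: HashMap<String, Vec<u8>>, // key stands in for a CID
}

impl LocalStore {
    fn new() -> Self {
        Self { blocks: HashMap::new() }
    }
    /// Insert a block into local storage.
    fn insert(&mut self, cid: &str, data: Vec<u8>) {
        self.blocks.insert(cid.to_string(), data);
    }
    /// Get a block if it is present locally; a miss answers instantly.
    fn get(&self, cid: &str) -> Option<&Vec<u8>> {
        self.blocks.get(cid)
    }
    /// Check whether a block is present.
    fn contains(&self, cid: &str) -> bool {
        self.blocks.contains_key(cid)
    }
    /// List all locally known CIDs.
    fn list(&self) -> Vec<String> {
        self.blocks.keys().cloned().collect()
    }
}

fn main() {
    let mut store = LocalStore::new();
    store.insert("bafy-a", b"hello".to_vec());
    assert!(store.contains("bafy-a"));
    assert!(store.get("bafy-b").is_none()); // miss: an answer, not a spinner
    println!("{} block(s) stored", store.list().len());
}
```

Because all four operations are in-memory or on-disk lookups, making them synchronous costs nothing and keeps the calling code simple.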
And this is the mental cost of async: a method in Rust with lots and lots of lifetime parameters. I mean, I've been doing Rust professionally since 2018, but even I have a hard time understanding things like this.
So it's not to be ignored. And the performance cost is that whenever you have async in Rust and you have abstraction, you often have to box, and that means you have an allocation. Even if you have a very, very cheap operation, you need to hit the allocator, which is not good. In Rust they have this catchphrase, abstraction without cost, and I say in async Rust, abstraction is no longer without cost, and that's bad. But surely not everything can be sync.
So there's an embedded database called sled, and I think they do it right. They have cheap local interactions which are synchronous, and then they have a call called flush, which basically writes the state of the database to disk, and that is async. ipfs-embed follows this: we have a bunch of operations which are sync, which you've just seen, and we have sync and fetch, which are async. So those are the things that take a long time and involve a lot of machinery, I think.
Okay. There are some very funny rants by spacejam, the developer of sled, about Rust async, and also by tomaka, the main developer of rust-libp2p, about Rust async. If you want to have some fun, you can read them; they're quite hilarious, and I agree with almost everything there. Okay, so that was the controversial part out of the way.
Now let's take a look at the rest of our API: pinning. Pinning is, in our case, completely independent of whether you have a block or whether you don't have it. Pinning just expresses something like: if I had this data, I would want to keep it. But it doesn't mean that you have to have the data before you can pin it. You can say "I want to pin this hash" even though you don't have the data for the hash.
I mean, why wouldn't you? You're just expressing: if I had Wikipedia, I would be cool with that. And that means, for example, you can pin Wikipedia, and then you look at some pages, and everything you ever looked at in Wikipedia will stay on your disk, because you expressed the desire to keep it by pinning the root of Wikipedia before you even got it. And all pins are recursive.
Now, we have two different kinds of pins to think about. If you build a DAG locally, you always build the DAG from the bottom up. That's the only way it works, because you cannot have the hash of the root before you have built the stuff below it. And if you sync something from somewhere else, you always build the DAG from the top down, because you sync the root first and then you pull the blocks all the way down. For these two use cases,
we have two different mechanisms. Temp pins are what you use while you yourself are building a DAG. They are incredibly cheap; you can consider them basically free. They use a pattern known from Rust and C++ called RAII, meaning that you create a temp pin, and as soon as it goes out of scope, the temp pin gets deleted and the CIDs are free to be collected again. The intention is to say: GC, please leave me alone while I build this thing.
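The RAII idea can be sketched with a toy guard type. This is not the real ipfs-embed `TempPin`; it just shows the mechanism: dropping the guard automatically releases the protected CIDs.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

/// The shared set of CIDs the GC must currently leave alone.
type Pinned = Rc<RefCell<HashSet<String>>>;

/// Toy RAII temp-pin guard: protects CIDs while alive,
/// releases them automatically on drop.
struct TempPin {
    pinned: Pinned,
    cids: Vec<String>,
}

impl TempPin {
    fn new(pinned: Pinned) -> Self {
        Self { pinned, cids: Vec::new() }
    }
    /// Adding a CID to the temp pin is cheap.
    fn pin(&mut self, cid: &str) {
        self.pinned.borrow_mut().insert(cid.to_string());
        self.cids.push(cid.to_string());
    }
}

impl Drop for TempPin {
    /// Going out of scope releases every CID this pin protected.
    fn drop(&mut self) {
        let mut set = self.pinned.borrow_mut();
        for cid in &self.cids {
            set.remove(cid);
        }
    }
}

fn main() {
    let live: Pinned = Rc::new(RefCell::new(HashSet::new()));
    {
        let mut tmp = TempPin::new(live.clone());
        tmp.pin("bafy-node");
        assert!(live.borrow().contains("bafy-node")); // GC leaves it alone
    } // tmp dropped here
    assert!(live.borrow().is_empty()); // free to be collected again
}
```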
So in our case the GC is constantly running in the background. So if you start building something and it takes, let's say, five seconds, there is a decent chance that while you build it, it would be collected. To prevent that, you have this temp pin mechanism. And they are ephemeral: when you restart your node, your temp pins are gone. But if you are interrupted in the middle of building a DAG, you probably want to start from scratch anyway.
Okay, and this is quite easy to implement in our case, because the IPFS node lives in the same process as the application, so if the application dies, IPFS also dies. If you have two different processes, you have this whole microservices dilemma and it's much more complex.
So with this API, you can create a temp pin, and then you can add a CID to the temp pin, and you can add as many CIDs to the temp pin as you want, and it is very cheap. And then the other thing we have is named pins; they're called aliases, and you usually have only a few. You might have just one named pin per application. And what's the name of a named pin, anyway? It is a blob; I'm not a believer in restricting things like that to UTF-8.
And this is what the API looks like: you say alias, and then you have a blob and an optional CID. If you want to pin something, you set it to Some of your CID, and if you want to clear it, you set it to None. And this is an example: you build something, you create a temp pin, then you build your stuff, then once you're done building, you set an alias, and then you delete the temp pin, which you don't even have to do; it happens automatically once it goes out of scope. So this is a clean way to build a big thing locally.
Now, until now we've only talked about local operations; everything I've talked about so far is just local. You know, local first, in the talk as well. Now we're talking about the network. So what do we have in terms of network operations? We have fetch, which is exactly like get in Kubo: it gets from local storage or from the network, whichever comes first. Basically, it races fetching from the store and getting from the network. And then we have something else, which is called sync.
So once you have finished your sync, you can then use only the local API. Also, we've got gossipsub, publish and subscribe, and we've got broadcast. Broadcast is something to just send a message to all your current peers, which in a local scenario is quite helpful, because those are the ones which you can easily reach without multiple hops.
Okay, this is the block API. You've got fetch, which is async, and sync, which returns an object which is both a future and a stream. So you can either just await it, which means you're waiting without progress, or you can treat it as a stream, and then you get some nice progress updates and you can show that it's syncing. And now, this is an example of how you would sync.
One important thing here: when you sync, you have to make sure that your data is safe from GC while you're syncing, because otherwise the GC might collect it while you sync it. So what you do is you create a temp pin, then you sync, and then once you're done with syncing, you switch it over to an alias and delete the temp pin. So even for data which is currently being synced, you have to protect it from GC.
So, okay. This is the messaging API. It's nothing special: you can publish on pubsub, you can listen on pubsub, and you can broadcast; broadcast just sends to all the neighbors, basically. And now we get to the store. The most interesting thing that the store does is garbage collection. It's based on SQLite. I was going to use sled, but the author of sled advised against it. He said: if reliability is your primary constraint, use SQLite. So, well.
I did that, and I'm currently working on the future store, which is a radix tree with a custom storage backend, where you might use a storage backend from spacejam, the author of sled, or a very simple storage backend which would work in wasm, using IndexedDB or whatever. But wasm is never going to be the primary target, because it's really hard to have something which works well both in wasm and natively. I tried hard, but I think this is a good compromise anyway.
So, you see, it uses a recursive SQLite query to find the live set, the set of all things which are currently pinned. It uses SQLite's advanced WITH RECURSIVE feature, and then it basically drops all the blocks that it has determined to be orphaned. And the notable thing is that it does this incrementally, so your GC pauses are not that long.
This is the SQL, by the way. It's quite complex, but the great thing about it is that, while it is not that fast, it is bulletproof, because it runs inside a single SQLite transaction, and it will just work. Don't worry about it; it's one less thing to worry about.
Okay, so you can set multiple limits on the store. You can set GC time limits: a target GC time, which is the maximum, or rather the target, time that the GC is allowed to run. So if you set that to one second, it means the GC will try to run for only one second at a time.
Of course, the GC might take longer than one second in total, but then it will have to split the work up into multiple runs, and then you can get a problem where you are not really making progress, and you need to make sure that that doesn't happen.
You can set a minimum number of blocks to be collected per run, and if you don't reach this minimum amount, the GC will exceed its time budget. That's to make sure that you are always collecting enough, basically. Okay, that's that.
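The interplay of the target GC time and the minimum blocks per run might look roughly like this. A toy model: the parameter names are made up, and actual deletion (which in the real store happens inside SQLite transactions) is reduced to popping from a list.

```rust
use std::time::{Duration, Instant};

/// One incremental GC slice: collect orphans until the time budget
/// is used up, but never stop before `min_blocks` have been
/// collected, so every run is guaranteed to make progress.
fn gc_run(orphans: &mut Vec<String>, target: Duration, min_blocks: usize) -> usize {
    let start = Instant::now();
    let mut collected = 0;
    while let Some(_cid) = orphans.pop() {
        collected += 1; // stand-in for actually deleting the block
        if start.elapsed() >= target && collected >= min_blocks {
            break; // budget spent and enough progress made
        }
    }
    collected
}

fn main() {
    let mut orphans: Vec<String> = (0..10).map(|i| format!("cid-{i}")).collect();
    // Even with a zero time budget, at least `min_blocks` get collected.
    let collected = gc_run(&mut orphans, Duration::ZERO, 3);
    assert_eq!(collected, 3);
    assert_eq!(orphans.len(), 7);
}
```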
And then you've got another thing, which is the limits on the store size. We've got limits both on the size and on the number of blocks.
The reason for that is: if you have lots of small blocks, it doesn't matter how small they are, they will cause trouble. If you have Kubo configured with 10 gigs and I just put in 10 gigs of small directories, I promise you it will fall over, because it is just too much stuff. So you have to have limits both on the total size and on the number of blocks. It's like in the Unix file system, where you have an inode limit or something like that.
These limits do not apply to pinned data; they're only about the stuff which is cached. And the config is here; you can see you set these two values. And now, caching. Caching concerns itself with the part that is not pinned. So if you have 10 gigs in total and you have one gig pinned, then caching is about how to allocate the remaining nine gigs. And there you can have a custom caching strategy. There's one which is built in, which is an in-memory LRU.
Basically, whatever is there that has been accessed last gets kept in case of a GC. So for example, you have a gateway which has a lot of files, and the files which have been accessed most recently are kept. Then the next one is a persistent LRU, and in that case the access information is in a separate database, because this database is not as important as the main database.
If it breaks, no big deal; you just lose the information about who has accessed your stuff, and who cares. And then you can even have custom strategies: if you have UnixFS, for example, you might preferably keep directories, because they are more important for finding the structure, and so on. Depending on your application, you could come up with a different strategy for what to keep in the cache.
Is there some more here? Yeah. So this is the caching trait, basically. It's just a bunch of methods which get called on every access, which allow you to keep track of what has happened, so you can figure out what to keep. There are two default implementations, and you can also roll your own.
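The idea of a pluggable cache tracker, methods called on every access plus a hook that picks eviction candidates, can be sketched as a trait. The trait and method names here are illustrative, not the real ipfs-embed trait.

```rust
use std::collections::HashMap;

/// Toy cache-tracker trait: gets told about every access and,
/// when the cache is over its limits, picks what to evict first.
trait CacheTracker {
    /// Called whenever a block is accessed.
    fn on_access(&mut self, cid: &str);
    /// Given the cached CIDs, return up to `n` eviction candidates.
    fn evict_candidates(&self, cached: &[String], n: usize) -> Vec<String>;
}

/// A simple LRU strategy: evict whatever was accessed least recently.
struct LruTracker {
    clock: u64,
    last_access: HashMap<String, u64>,
}

impl LruTracker {
    fn new() -> Self {
        Self { clock: 0, last_access: HashMap::new() }
    }
}

impl CacheTracker for LruTracker {
    fn on_access(&mut self, cid: &str) {
        self.clock += 1;
        self.last_access.insert(cid.to_string(), self.clock);
    }
    fn evict_candidates(&self, cached: &[String], n: usize) -> Vec<String> {
        let mut by_age: Vec<String> = cached.to_vec();
        // Least recently (or never) accessed first.
        by_age.sort_by_key(|cid| self.last_access.get(cid).copied().unwrap_or(0));
        by_age.truncate(n);
        by_age
    }
}

fn main() {
    let mut lru = LruTracker::new();
    let cached: Vec<String> = ["a", "b", "c"].iter().map(|s| s.to_string()).collect();
    lru.on_access("a");
    lru.on_access("c"); // "b" was never touched
    assert_eq!(lru.evict_candidates(&cached, 1), vec!["b".to_string()]);
}
```

A UnixFS-aware strategy would be another implementation of the same trait that, say, sorts directories after leaf blocks so the structure survives longest.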
So, peers: we do a bunch of things about peers. If you run on restricted hardware, you really cannot afford to have 600 peers; that's just a no-go.
So you have to be very careful about which peers you keep. If you exceed the limit of peers that you want to have, you need to figure out which peers to throw out and which peers to keep. So what you do is you have a kind of hierarchy of peer value. For example, your bootstrap peers you want to keep forever, in our case, because that is probably a box somewhere in a cabinet which you know is always going to be there.
If it's not there for an hour, it doesn't matter; once you get back onto the wireless LAN, it will be there. So you always keep the bootstrap peers. And then we've got stuff like manually connected peers: you want to keep them longer than others, because somebody made an intentional choice and said, I want to connect to this peer.
So please keep this peer, because I told you to, you know. And then the next thing is mDNS peers. Remember, local first: mDNS peers come from your network environment, and so they are probably more valuable to you than something far away, because it's more likely that you will encounter them again at some point. And then, I mean, the exact details don't matter. The main thing is that you have to have some kind of logic about which peers to keep. You cannot keep them all.
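The hierarchy of peer value described above might be sketched like this: rank peers by how they were obtained and evict the lowest-value ones when over the limit. The variant names are taken from the talk; everything else is a toy.

```rust
/// Peer value hierarchy from the talk: bootstrap peers are kept
/// forever, manually connected peers outlive mDNS-discovered ones,
/// which in turn outlive peers learned from far away.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum PeerValue {
    Other = 0,     // e.g. learned from far away
    Mdns = 1,      // discovered in the local network environment
    Manual = 2,    // somebody intentionally connected to this peer
    Bootstrap = 3, // the box in the cabinet: always keep
}

/// Keep the most valuable peers, evicting the rest down to `max_peers`.
fn prune(mut peers: Vec<(String, PeerValue)>, max_peers: usize) -> Vec<(String, PeerValue)> {
    peers.sort_by(|a, b| b.1.cmp(&a.1)); // most valuable first
    peers.truncate(max_peers);
    peers
}

fn main() {
    let peers = vec![
        ("random-dht-peer".to_string(), PeerValue::Other),
        ("box-in-the-cabinet".to_string(), PeerValue::Bootstrap),
        ("neighbor-tablet".to_string(), PeerValue::Mdns),
    ];
    let kept = prune(peers, 2);
    assert_eq!(kept[0].1, PeerValue::Bootstrap);
    assert_eq!(kept[1].1, PeerValue::Mdns);
}
```

A real implementation would break ties within a tier, for example by most recent contact, but the tiered ordering is the essential part.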
So you have to do something. Okay, so we've got some open API questions, and a bunch of things internal to the Rust code. We have implemented this, and as soon as we had something that worked, we had to move on to the next thing, so we couldn't spend a lot of time designing the perfect API.
So we have a bunch of questions left, but I think we will be able to come up with something better at Number Zero, now that we actually have time to focus on this. And one of the biggest questions I have is: how do you do an incomplete sync of a graph? How do you say, I want to sync, but I don't want to sync the entire DAG, because it's giant; I want to sync only a certain subset?
And there's a bunch of heuristics you could use: limit the depth, or use GraphSync, or send a predicate over the wire, or anything, basically. But we don't know how to do that yet. Okay, and that's basically it, I think.