IPFS IPFS þing 2022, 9 Aug 2022

From YouTube: Peergos private filesystem - @ianopolous - IPFS Implementations

Description

Peergos private filesystem presented by @ianopolous at IPFS þing 2022 - IPFS Implementations - https://2022.ipfs-thing.io

A

G'day everyone. I'm going to give you a whirlwind tour of Peergos and some of the cool stuff we've built on top of IPFS.

A

So what is Peergos? It's a global peer-to-peer private file system, an application protocol designed for the average person to use safely. It's a file system, so everything has a unique human-readable path which begins with your username. You can share individual files or folders, read-only or writable, with constant-time sharing with a group, or revocation.

A

So the usernames are unique. How do we do that? You need a PKI. This is basically just a mapping from username to a list of signed claims, and each claim is essentially two public keys: your identity public key and your home server public key, which is an IPFS node ID. This is slightly more secure than DIDs in the general case, because with DIDs you can have DNS leaking in via the service endpoints, which are URLs. We avoid that via the home server node ID.

A

This is all stored in a CHAMP, which is a compressed hash-array mapped prefix trie, a super cool data structure. It plays well with CRDTs, it's insertion-order independent, and a bunch of other stuff. And so for the PKI, the only consensus we actually need is just time ordering.
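To make "insertion-order independent" concrete, here is a minimal sketch of a hash-keyed trie whose shape, and therefore whose root hash, depends only on the set of entries and not on the order they were added. It is a simplification written for this transcript, not the Peergos CHAMP: a real CHAMP uses bitmaps and compact arrays, and the 4-bit branching, node layout and hashing scheme here are all assumptions.

```python
import hashlib

def _key_hash(key: bytes) -> bytes:
    return hashlib.sha256(key).digest()

def _slot(hashed: bytes, depth: int) -> int:
    """Return the depth-th 4-bit nibble of the key hash (16-way branching, assumed)."""
    byte = hashed[depth // 2]
    return byte >> 4 if depth % 2 == 0 else byte & 0x0F

def insert(node, key: bytes, value: bytes, depth: int = 0):
    """Return a new trie containing key -> value; existing nodes are never mutated."""
    if node is None:
        return ("leaf", key, value)
    if node[0] == "leaf":
        _, old_key, old_value = node
        if old_key == key:
            return ("leaf", key, value)
        # Two different keys meet at this position: push both one level down.
        branch = insert(("branch", {}), old_key, old_value, depth)
        return insert(branch, key, value, depth)
    children = dict(node[1])
    i = _slot(_key_hash(key), depth)
    children[i] = insert(children.get(i), key, value, depth + 1)
    return ("branch", children)

def root_hash(node) -> bytes:
    """Merkle-style hash of the whole trie (toy encoding, not IPLD)."""
    if node is None:
        return hashlib.sha256(b"empty").digest()
    if node[0] == "leaf":
        _, key, value = node
        return hashlib.sha256(b"leaf:" + _key_hash(key) + value).digest()
    h = hashlib.sha256(b"branch:")
    for i in sorted(node[1]):
        h.update(bytes([i]) + root_hash(node[1][i]))
    return h.digest()

def build(pairs):
    root = None
    for key, value in pairs:
        root = insert(root, key, value)
    return root

pairs = [(b"alice", b"1"), (b"bob", b"2"), (b"carol", b"3"), (b"dave", b"4")]
same = root_hash(build(pairs)) == root_hash(build(list(reversed(pairs))))
print("same root regardless of insertion order:", same)
```

Because equal contents give equal roots, two mirrors can compare their whole maps by comparing a single hash.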

A

So two people can't claim the same username, but you get efficient lookup, comparison and merge. All this CHAMP data is mirrored on every instance, so you get private local search if you're trying to connect with a new friend, which is super important for something that's social. And once you've got the identity of the person whose data you're trying to get, or log in as, or whatever, from the PKI, what do you do next? You need to get a mutable pointer.
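The PKI described above can be sketched as a map from username to a list of claims, where the first accepted claim fixes who owns the name. This is only an illustration: the class and field names are hypothetical, signature checking is left out entirely, and the rule that later claims from the same identity may change the home server is an assumption, not something stated in the talk.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class Claim:
    username: str
    identity_key: bytes    # identity public key
    home_server_id: bytes  # public key (node ID) of the home server
    signature: bytes       # placeholder only: NOT verified in this sketch

class UsernameTaken(Exception):
    pass

class Pki:
    """Username -> ordered list of claims; time ordering decides who owns a name."""

    def __init__(self) -> None:
        self._claims: Dict[str, List[Claim]] = {}

    def submit(self, claim: Claim) -> None:
        chain = self._claims.setdefault(claim.username, [])
        # The earliest claim wins: a different identity cannot take the name later.
        if chain and chain[0].identity_key != claim.identity_key:
            raise UsernameTaken(claim.username)
        chain.append(claim)

    def lookup(self, username: str) -> Optional[Claim]:
        """Local lookup of the latest claim; no network request is needed on a mirror."""
        chain = self._claims.get(username)
        return chain[-1] if chain else None

pki = Pki()
pki.submit(Claim("alice", b"alice-id", b"node-1", b""))
pki.submit(Claim("alice", b"alice-id", b"node-2", b""))  # same identity, new home server
print(pki.lookup("alice").home_server_id)
```

Since every instance holds a full mirror, `lookup` never has to reveal to anyone else which username is being searched for.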

A

So in our case the mutable pointer is a mapping from a public key to a signed pair of CIDs.
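One way to picture such a pointer is as a tiny key-value store with compare-and-swap updates. The sketch below assumes the pair of CIDs means (expected current value, new value); the talk does not say what the two CIDs are, so treat that reading, the class name and the error type as hypothetical. Signature verification is omitted.

```python
from typing import Dict, Optional, Tuple

class StalePointerUpdate(Exception):
    pass

class PointerStore:
    """Maps an owner's public key to the CID it currently points at."""

    def __init__(self) -> None:
        self._current: Dict[bytes, str] = {}

    def get(self, owner_key: bytes) -> Optional[str]:
        return self._current.get(owner_key)

    def update(self, owner_key: bytes, pair: Tuple[Optional[str], str]) -> None:
        # ASSUMPTION: pair = (CID the writer believes is current, CID to move to).
        # A real implementation would first verify the owner's signature over the pair.
        expected, new = pair
        if self._current.get(owner_key) != expected:
            raise StalePointerUpdate("pointer moved since the writer last read it")
        self._current[owner_key] = new

store = PointerStore()
store.update(b"alice-key", (None, "cid-v1"))
store.update(b"alice-key", ("cid-v1", "cid-v2"))
print(store.get(b"alice-key"))
```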

A

And how do you get this? With a peer-to-peer RPC call. These are just standard HTTP calls over peer-to-peer streams, and I think this is an undersold feature of IPFS, which is amazing because you can totally avoid any dependency on DNS or the TLS certificate authorities. You just say: I want to dial this node by public key, and send whatever you want. So yeah, it's awesome.

A

So this gives us fast retrieval and fast remote updates, but you could still fall back to actual IPNS for a slower read backup if your server is offline for whatever reason. And so this is the basic architecture. Peergos installs and runs an IPFS instance itself. As I've just mentioned, it's DNS-free and trustless. And so say Alice logs in on one instance and tries to modify something. Alice has a home server.

A

As we mentioned, all those writes get proxied over a peer-to-peer stream, and so the data ends up initially on the home server.

A

So with a file system, especially a social one, you need access control, and we do that with a thing called Cryptree+. You've heard Cryptree several times today already, so what does the plus mean? Well, Cryptree itself was invented in 2008, and we've added a bunch of things on top of that initial version, including metadata privacy and ciphertext privacy, and made it post-quantum.

A

So it's pure capabilities, so you don't need to rely on a server to enforce access control. It's fine-grained. It's also stored in a CHAMP; we like CHAMPs. The ciphertext access control is a relatively new thing, as of January this year. We do that with things called block access tokens, or BATs, which I'll talk more about later. Another super cool thing that we get is zero-I/O seeking. So if you have a huge file, I don't know, gigabytes, maybe even terabytes, and it's encrypted, but you want to...

A

You want to be able to seek to somewhere down there really quickly. You've got the start of the file, say. How do you do that? Obviously, if you encrypt the entire file at once, you would have to download the entire file and decrypt it, which is not going to work. So the first part of that is you chunk the file, obviously, and each chunk is independently encrypted, so you can get whichever bit you want and decrypt it. But the other...

A

The other key thing is how you get from the location of the first chunk to the location of some later chunk, and that's the zero-I/O thing, which, if you want to hear more about, just talk to me later. And as you'd expect with IPFS, you get efficient modification. So if I modify a byte of a terabyte file, I don't have to re-encrypt and upload the whole thing.
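The talk does not explain how the later chunk locations are found, so the following is only a guess at the general shape: a minimal sketch assuming each chunk's location is derived from the previous one with a keyed hash under a per-file secret. The function names, the use of HMAC-SHA256 and the secret itself are assumptions; the 5 MiB chunk size is the one mentioned elsewhere in the talk.

```python
import hashlib
import hmac
from typing import Tuple

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB per chunk

def next_location(file_secret: bytes, location: bytes) -> bytes:
    """Derive the next chunk's location label from the current one (assumed scheme)."""
    return hmac.new(file_secret, location, hashlib.sha256).digest()

def locate(file_secret: bytes, first_location: bytes, offset: int) -> Tuple[bytes, int]:
    """Return (location of the chunk holding `offset`, offset inside that chunk).

    Only local hashing happens here: no block is fetched until the caller
    asks for the returned location, hence "zero I/O" for the seek itself.
    """
    index, within = divmod(offset, CHUNK_SIZE)
    location = first_location
    for _ in range(index):
        location = next_location(file_secret, location)
    return location, within

secret = b"per-file secret known only to readers"
first = hashlib.sha256(b"label of chunk 0").digest()
location, within = locate(secret, first, 3 * CHUNK_SIZE + 42)
print(location.hex()[:16], within)
```

Without the secret, a storage server sees only unrelated-looking labels, which fits the claim that it cannot link chunks of the same file.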

A

So this is how it looks. You've got your internal CHAMP nodes.

A

Then you have a cryptree node for each chunk of your file or directory, and that can have links to the encrypted file fragments. The keys in this CHAMP are basically random.

A

Subsequent keys in a file are not random, but they're still not deducible by the server, so the storage, your home server, can't link the different chunks of the same file. We use that to hide the size of the file, among other ways. The read cryptree is pretty simple. It's been discussed earlier, but it's a tree of symmetric keys. If you have one key, you can follow the arrows, follow the links. It also gives everything a well-defined path.

A

So if I just give you access to this file, you can follow the parent links to get the names, so you have a path, but you still can't see if there are any other files in that directory, any siblings or anything like that.

A

The write tree is even simpler: there's just one key for each file or directory. These are all symmetric keys, by the way, in the previous slide. Also, the top ones here are symmetric keys, and these at the bottom are obviously key pairs. And the metadata that we protect: file names, file name sizes if you care about that, and the file sizes I've mentioned. So there's the chunking part, which gets you down to modulo 5 MiB.

A

We also pad pre-encryption to a multiple of 4 KiB, so you end up with 5 MiB over 4 KiB, or 1280, possible chunk sizes in the entire world. So that's cool. The IPLD format for cryptree that we use makes files and directories indistinguishable. So the server can't tell what's a file, what's a directory, who has access, or even the directory topology.
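The arithmetic behind the 1280 figure can be checked with a short sketch. It assumes binary units (the only reading under which 5 meg over 4K gives 1280) and that a non-empty chunk is always rounded up to at least one 4 KiB block; the function name is made up for illustration.

```python
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB, the largest plaintext chunk
PAD_BLOCK = 4 * 1024          # chunks are padded to a multiple of 4 KiB

def padded_length(plaintext_length: int) -> int:
    """Round a chunk's plaintext length up to the next multiple of 4 KiB."""
    if not 1 <= plaintext_length <= CHUNK_SIZE:
        raise ValueError("length must be between 1 byte and one full chunk")
    blocks = -(-plaintext_length // PAD_BLOCK)  # ceiling division
    return blocks * PAD_BLOCK

# Every padded chunk is one of CHUNK_SIZE / PAD_BLOCK sizes: 4 KiB, 8 KiB, ..., 5 MiB.
POSSIBLE_SIZES = CHUNK_SIZE // PAD_BLOCK
print(POSSIBLE_SIZES)          # 1280
print(padded_length(4097))     # 8192
```

An observer who only sees padded ciphertext sizes therefore learns at most which of those 1280 buckets a chunk falls into.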

A

So this is what the cryptree format looks like. This is the cryptree node itself. It's a dag-cbor node, and there are basically three independently encrypted bits. The first two are quite small and are more to do with the structure of the cryptree, and this one is the actual data, like the children...

A

If it's a directory, or the data of the file itself. And there are these BATs, these things that I keep mentioning. Minor optimizations: everything here is padded as well, which I mentioned, but if a file or directory is under 4 KiB, which most directories are, we just inline it, so you don't have to do any other network requests.

A

So back to BATs. What is a BAT? The important point here is you shouldn't be relying just on encryption for privacy. If you make your ciphertext public, that matters in a whole bunch of threat models.

A

So with the BATs we've got post-quantum access control at the block level in IPFS. It's again pure capability based, and the cool thing is it manages to maintain the auto-scaling properties of IPFS. In IPFS, if one node retrieves a block, it can then help to serve it up, and the way we've done it is the same: they can help to serve it up and continue to apply the same auth to it.

A

And what actually is a BAT? A BAT is just 32 random bytes. The auth we use over libp2p is S3 V4 signatures, which are time limited and tied to the source IPFS node making the request. That means we don't have to worry about these auth tokens. We could just broadcast them to the DHT.

A

There's no such thing as a replay attack. This whole auth token, with a signature and its wrapping, is 89 bytes, so about two and a half CIDs, and we have one of those, not for every block, since some blocks are still public, but for the ones that actually have ciphertext in them. So it's quite a low overhead. But of course you need a modified Bitswap to be able to handle this, so we've added a Bitswap which sends this auth string.

A

You can see it there; there's a URL. One super important thing, which I kind of just mentioned: whatever scheme you use, you need to check it against the actual source node ID coming into Bitswap, the one which made the request.
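The properties listed for BAT auth (a 32-byte random secret, a time-limited signature, and binding to the requesting node) can be illustrated with a plain HMAC. This is a simplified stand-in and not the S3 V4 signing format the talk refers to; the message layout, the 300-second validity window and all function names are assumptions.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

VALIDITY_SECONDS = 300  # assumed validity window

def new_bat() -> bytes:
    """A BAT is just 32 random bytes."""
    return secrets.token_bytes(32)

def sign_request(bat: bytes, cid: str, requester_node_id: str, timestamp: int) -> str:
    """Build an auth string binding the block, the requesting node and the time."""
    message = f"{cid}\n{requester_node_id}\n{timestamp}".encode()
    mac = hmac.new(bat, message, hashlib.sha256).hexdigest()
    return f"{timestamp}:{mac}"

def verify_request(bat: bytes, cid: str, source_node_id: str, auth: str,
                   now: Optional[int] = None) -> bool:
    """Check the auth string against the node that ACTUALLY sent the request."""
    now = int(time.time()) if now is None else now
    try:
        ts_text, mac = auth.split(":", 1)
        timestamp = int(ts_text)
    except ValueError:
        return False
    if abs(now - timestamp) > VALIDITY_SECONDS:
        return False  # expired, or dated too far in the future
    expected = sign_request(bat, cid, source_node_id, timestamp).split(":", 1)[1]
    return hmac.compare_digest(mac.encode(), expected.encode())

bat = new_bat()
auth = sign_request(bat, "example-cid", "node-A", timestamp=1_000)
print(verify_request(bat, "example-cid", "node-A", auth, now=1_010))  # True
print(verify_request(bat, "example-cid", "node-B", auth, now=1_010))  # False
```

Because the signature covers the requester's node ID, an auth string overheard by another node is useless to it, which is why broadcasting these strings is harmless.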

A

And we use this in a thing called IPFS Nucleus, which is a stripped-down IPFS implementation that has all the stuff we need. It basically just has the block API, so we call it an IPLD daemon. You can see those are all the API calls we have, as well as, obviously, the peer-to-peer HTTP proxy.

A

And this IPFS Nucleus thing has a customizable block allow API. This is the thing that Bitswap hooks into, and this is basically the function signature: allow is passed the CID, the actual data of the block, the source node ID, and then the auth string it received over the network, and it just returns whether or not Bitswap should release this block. So again, you can check out IPFS Nucleus.
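A hook with the four inputs described (CID, block data, source node ID, auth string) might look like the sketch below. It is not the IPFS Nucleus code: the block layout used to find a block's BATs is a toy invented for this example, and the auth check is a bare HMAC without the time limit, so every name and format here is hypothetical.

```python
import hashlib
import hmac
from typing import List

BAT_LENGTH = 32

def extract_bats(block: bytes) -> List[bytes]:
    """Toy layout (hypothetical): one count byte, that many 32-byte BATs, then payload."""
    count = block[0] if block else 0
    return [block[1 + i * BAT_LENGTH: 1 + (i + 1) * BAT_LENGTH] for i in range(count)]

def expected_auth(bat: bytes, cid: str, source_node_id: str) -> str:
    """Simplified auth string: HMAC over the CID and the requesting node's ID."""
    message = f"{cid}\n{source_node_id}".encode()
    return hmac.new(bat, message, hashlib.sha256).hexdigest()

def allow(cid: str, block: bytes, source_node_id: str, auth: str) -> bool:
    """Decide whether the block exchange may release this block to the requester."""
    bats = extract_bats(block)
    if not bats:
        return True  # no BATs attached: treated as a public block
    return any(
        hmac.compare_digest(expected_auth(bat, cid, source_node_id).encode(), auth.encode())
        for bat in bats
    )

bat = bytes(range(32))
protected = bytes([1]) + bat + b"ciphertext..."
good = expected_auth(bat, "example-cid", "node-A")
print(allow("example-cid", protected, "node-A", good))  # True
print(allow("example-cid", protected, "node-B", good))  # False
```

Any node that holds the block can run the same check, which is how other nodes can keep serving a block while applying the same auth to it.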

A

If you want. And there were two things I haven't had time to talk about. One is the GC implementation. We have a fully concurrent GC. You might have noticed that there's no pin API here. We don't actually have a pin API.

A

Pins are implicit for us, basically from the mutable pointers, so the GC just grabs the mutable pointers and you've got an implicit pin set, and you can proceed from there. And the other thing, which I'll talk about tomorrow in another talk, is that we've just released an application sandbox, which lets you run private applications over private data in an untrusted way, so that the application, if it was malicious, couldn't steal your data or exfiltrate it. So yeah, thank you. If you have any questions, come find me.
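The implicit pin set mentioned for the GC can be sketched as a mark-and-sweep pass whose roots are the current targets of the mutable pointers. This is a single-threaded toy, not the fully concurrent collector from the talk, and the in-memory dictionaries standing in for the blockstore and pointer table are assumptions.

```python
from typing import Dict, Iterable, List, Set

def reachable(roots: Iterable[str], links: Dict[str, List[str]]) -> Set[str]:
    """Mark phase: every stored block reachable from the roots by following links."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        cid = stack.pop()
        if cid in seen or cid not in links:
            continue  # already visited, or not held locally
        seen.add(cid)
        stack.extend(links[cid])
    return seen

def collect_garbage(blockstore: Dict[str, List[str]],
                    mutable_pointers: Dict[str, str]) -> Set[str]:
    """Sweep phase: delete every block that no mutable pointer leads to."""
    live = reachable(mutable_pointers.values(), blockstore)
    garbage = set(blockstore) - live
    for cid in garbage:
        del blockstore[cid]
    return garbage

# blockstore maps each CID to the CIDs it links to (toy stand-in for real blocks)
blocks = {"root-v2": ["a", "b"], "a": [], "b": ["c"], "c": [], "root-v1": ["a", "old"], "old": []}
pointers = {"alice-key": "root-v2"}
print(sorted(collect_garbage(blocks, pointers)))  # ['old', 'root-v1']
```

No separate pin list is needed: moving a pointer to a new root is what makes the old root's unshared blocks collectable.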