Status SWARM Orange Summit, 1 Aug 2018

From YouTube: Encryption in Swarm

Description

In this presentation from Day 3 of the #SwarmOrangeSummit, Daniel Nagy gave a talk titled “Privacy on Swarm”. The talk gives a basic overview of Swarm’s layered structure and how data chunks work, then explains the encryption algorithms and processes Swarm uses to protect data.

A

So can I begin? Okay, hello, everyone! So today, I am going to talk about what facilities Swarm provides for protecting the privacy of data stored in Swarm.

A

Okay, is it audible now like this? Okay. So today, I'm going to talk about the various facilities that Swarm provides, or is going to provide, for protecting the privacy of data that are stored in Swarm. In order to do that, I will do a really, really brief overview of how Swarm in general works in the unencrypted version. Many of you are already familiar with it, but I think this is worth repeating and explaining for those who are not familiar with the low-level details. Then I'm going to talk about symmetric encryption of content.

A

This is something that has already been implemented, and you can try it out and play with it. And then, in the end, I am going to talk about more fine-grained and even, to some extent, identity-based access control, which is work in progress and is currently not yet available for playing with, but we're hoping to get there really soon.

A

Okay, so Swarm is a layered stack where, on the bottom layer, we have a network which forwards and stores four-kilobyte chunks. So they are stored locally and they are passed around in the network, and this particular layer has no idea how these relate to each other. So it treats chunks as units of information that are entirely independent of each other. Above that we have a layer that handles arbitrary-length binary files without any metadata, and above that we have a layer which deals with collections of files which all have a URL, very similar to web pages.

A

So basically, it's a virtual content-addressed web server, which has metadata attached to collections of files. And above that we have the Swarm-hosted distributed applications, which can access the layer below them and even the raw layer, which deals with files without metadata. But applications typically cannot access data on the chunk level.

A

So, what's in a chunk? It has 64 bits, which we call the span, which describes the length of the file encoded by the subtree at the root of which this particular chunk is, and at most 4 kilobytes of payload. So if the span is smaller than 4 kilobytes, then this means that this is a leaf chunk, meaning that the payload information in this chunk is actually the content of the file. Otherwise, the chunk is called intermediary, and it contains references to subtrees.

A

So here's an example of how a 10-kilobyte file is stored in Swarm. It is stored as 4 chunks. In one chunk, which is the root chunk, we have a span field, which tells that the subtree encodes 10 kilobytes of data, and then it has three references to three chunks. These references are, in the unencrypted case, just hashes. And then we have three chunks.

A

Two of them are full: they're four-kilobyte chunks which have parts of the payload. And the last chunk is just two kilobytes, having the last 2 kilobytes of the payload. So that's how a 10-kilobyte file is actually stored in Swarm. And on top of this we have these manifests, which are a Merkleized key-value database.
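The 10-kilobyte example can be sketched in a few lines of Python. This is only an illustration of the layout described in the talk, under stated assumptions: `hashlib.sha3_256` stands in for Swarm's actual chunk hash (Ethereum's Keccak-256 differs from NIST SHA3-256 in its padding), the little-endian span encoding is an assumption, and `chunk_hash` and `store_small_file` are hypothetical names. The sketch always creates a root chunk and only handles files that fit under a single root.

```python
import hashlib

CHUNK_SIZE = 4096  # maximum payload of one chunk, in bytes
REF_SIZE = 32      # an unencrypted reference is a 32-byte hash


def chunk_hash(span: int, payload: bytes) -> bytes:
    # Stand-in chunk hash: SHA3-256 over the 8-byte span followed by the payload.
    return hashlib.sha3_256(span.to_bytes(8, "little") + payload).digest()


def store_small_file(data: bytes, store: dict) -> bytes:
    """Store data as leaf chunks plus one root chunk; return the root reference."""
    refs = []
    for offset in range(0, len(data), CHUNK_SIZE):
        payload = data[offset:offset + CHUNK_SIZE]
        ref = chunk_hash(len(payload), payload)  # leaf: span equals payload length
        store[ref] = (len(payload), payload)
        refs.append(ref)
    if len(refs) > CHUNK_SIZE // REF_SIZE:
        raise ValueError("file too large for this single-level sketch")
    root_payload = b"".join(refs)  # intermediate chunk: payload is a list of references
    root_ref = chunk_hash(len(data), root_payload)  # span covers the whole subtree
    store[root_ref] = (len(data), root_payload)
    return root_ref


# A 10-kilobyte file with non-repeating content, so no two chunks are identical.
data = b"".join(i.to_bytes(4, "big") for i in range(2560))
store = {}
root = store_small_file(data, store)
span, payload = store[root]
leaf_sizes = [store[payload[i:i + REF_SIZE]][0] for i in range(0, len(payload), REF_SIZE)]
print(span, leaf_sizes, len(store))  # 10240 [4096, 4096, 2048] 4
```

The root chunk carries a span of 10240 bytes and three 32-byte references, and the store ends up holding four chunks in total, matching the example in the talk.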

A

So the key is the path, the URI, conforming to the same standard as the ones that you use on the web. And the value, as a minimum, is HTTP metadata. So as a minimum, it's the content type, but it might contain other metadata as well, basically HTTP headers, if you will. And it also has a reference, which in the unencrypted case is a simple hash, to the raw content file, which is just the binary data without any further metadata.
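As a rough illustration of the manifest idea, the following Python sketch models a manifest as a flat mapping from path to metadata plus content reference. The field names `contentType` and `reference`, the flat (non-trie) layout, and the helper names are assumptions made for this sketch, and hashing a canonical JSON serialization with `hashlib.sha3_256` is only a stand-in for the Merkleized structure mentioned in the talk.

```python
import hashlib
import json


def manifest_reference(manifest: dict) -> str:
    # Stand-in for the Merkleized structure: hash a canonical serialization.
    encoded = json.dumps(manifest, sort_keys=True).encode("utf-8")
    return hashlib.sha3_256(encoded).hexdigest()


def resolve(manifest: dict, path: str) -> tuple:
    """Return (content type, content reference) for a path in the collection."""
    entry = manifest[path]
    return entry["contentType"], entry["reference"]


manifest = {
    "index.html": {
        "contentType": "text/html",
        "reference": hashlib.sha3_256(b"<h1>hello</h1>").hexdigest(),
    },
    "img/logo.png": {
        "contentType": "image/png",
        "reference": hashlib.sha3_256(b"fake png bytes").hexdigest(),
    },
}

before = manifest_reference(manifest)
content_type, reference = resolve(manifest, "index.html")

# Any change to an entry changes the reference of the whole collection.
manifest["index.html"]["contentType"] = "text/plain"
after = manifest_reference(manifest)
print(content_type, len(reference), before != after)  # text/html 64 True
```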

A

So that's what Swarm is, and that's what you need to know about it in order to understand the rest of the discussion. So symmetric encryption in Swarm is achieved using this so-called CTR mode, or counter mode. This is an illustration from Wikipedia. I think it is very illuminating, so it's a good illustration. That's why I haven't drawn my own.

A

This is a very popular mode of operation in modern cryptosystems, because it is actually very convenient to reason about it and, given certain assumptions, reach conclusions using just the methods of formal logic, so create formal proofs of security that are dependent on a clearly denoted set of assumptions. So how this counter mode works is that you have a key, and then you have, in this case it's a block cipher, but in fact it can be any one-way function, any cryptographic one-way function, where you have a nonce and the counter.

A

You combine them into a one-way image of these things, so the key, the nonce and the counter, and then you XOR it with the plaintext to get the ciphertext, or XOR it with the ciphertext in order to get the plaintext. So the encryption and the decryption operations are identical, and one benefit of this is that it allows for partial decryption of the content. Namely, if you only need this middle part, for example, so you have the ciphertext and you need the plaintext of the middle part.

A

You can do it completely independently of the rest, which is convenient for many use cases, one of which I'm going to elaborate on a little bit later.
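A minimal sketch of generic counter mode as described here, using a hash as the one-way function instead of a block cipher. The way key, nonce and counter are concatenated, the 32-byte block size, and the function names are assumptions for illustration, not a standardized construction.

```python
import hashlib

BLOCK = 32  # keystream bytes produced per counter value


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # One-way image of the key, the nonce and the counter.
    return hashlib.sha3_256(key + nonce + counter.to_bytes(8, "big")).digest()


def ctr_xor(key: bytes, nonce: bytes, data: bytes, first_block: int = 0) -> bytes:
    """Encrypt or decrypt: both directions are the same XOR with the keystream."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        pad = keystream_block(key, nonce, first_block + i // BLOCK)
        out.extend(b ^ p for b, p in zip(data[i:i + BLOCK], pad))
    return bytes(out)


key = bytes(range(32))
nonce = b"demo-nonce"
plaintext = bytes(i % 251 for i in range(96))  # three 32-byte blocks
ciphertext = ctr_xor(key, nonce, plaintext)

# Partial decryption: only the middle block, independently of the rest.
middle = ctr_xor(key, nonce, ciphertext[32:64], first_block=1)
print(middle == plaintext[32:64])  # True
```

Because each block's keystream depends only on the key, the nonce and that block's counter, the middle block can be decrypted without touching its neighbours.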

A

So in Swarm, we use a peculiar variation on the CTR mode. So we implemented one part of this. So this is for three separate blocks, and this is what we do for one particular block. In this case the block is 256 bits, and instead of the block cipher, we use SHA-3 twice.

A

First, as a compression function that compresses the key and the counter, and then just as a one-way function with the same input and output length. And then we XOR the result with the plaintext to get the ciphertext, or with the ciphertext to get the plaintext. So why are we doing this? Why are we using SHA-3, and why are we using it twice?
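The double-hash variant for one chunk might look roughly like the sketch below. It follows the description in the talk (first hash compresses key and counter, second hash is applied to that result, the output is XORed with a 256-bit block), but the counter encoding and function names are assumptions, and `hashlib.sha3_256` is NIST SHA3-256 rather than the Keccak-256 variant used in Ethereum.

```python
import hashlib

BLOCK = 32  # 256-bit blocks


def sha3(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def block_pad(key: bytes, counter: int) -> bytes:
    # First SHA-3: compress the key and the counter into 256 bits.
    compressed = sha3(key + counter.to_bytes(4, "little"))
    # Second SHA-3: a one-way function with equal input and output length.
    return sha3(compressed)


def crypt_chunk(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt a chunk payload block by block."""
    out = bytearray()
    for i in range(0, len(data), BLOCK):
        pad = block_pad(key, i // BLOCK)
        out.extend(b ^ p for b, p in zip(data[i:i + BLOCK], pad))
    return bytes(out)


key = sha3(b"demo chunk key")
payload = bytes(i % 256 for i in range(4096))
encrypted = crypt_chunk(key, payload)
print(crypt_chunk(key, encrypted) == payload)  # True
```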

A

So the reason we're using SHA-3 is because Ethereum in general uses SHA-3 for many purposes, and if SHA-3 turns out to be vulnerable, then the security of the cryptosystem in Ethereum is compromised, and we have much bigger problems than being able to attack Swarm, and we are going to be able to attack Swarm anyway. Therefore, in order not to increase the attack surface, we are kind of hiding behind the same door instead of putting another door on the same wall, which the attacker can also attack.

A

So the choice of SHA-3 as a one-way function is basically dictated by the fact that unencrypted Swarm and Ethereum in general also use SHA-3. So the only security assumption that you need to make in order to formally reason about the security of Swarm is the one-way nature, the random oracle nature, of SHA-3.

A

So why are we using it twice? The reason we're using it twice is because we want to allow for selective disclosure. So if we used SHA-3 only once, so the second box would not be here, then, in order to reveal the plaintext given the ciphertext for this particular block, you would need to reveal the input to this SHA-3 box, which is the key and the counter. But since you have the key, you would be able to decrypt the whole chunk, not just that particular block within the chunk.

A

Moreover, if you can decrypt the whole chunk, that means that, if that chunk is not a leaf chunk, then you can decrypt the entire subtree below that chunk, which means that selective disclosure becomes, well, not impossible, but inconvenient.

A

Whereas here, what you can do is you can reveal the result of this SHA-3. So you only reveal this, and these data are still protected by the one-way nature of the SHA-3 function. So you cannot recover the key and the counter from this data. However, you can still decrypt the plaintext from the ciphertext. So this allows for selectively disclosing 256 bits of any particular encrypted content. Of course, it is.
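Selective disclosure of a single 256-bit block can be sketched as follows: the key holder hands out the output of the first SHA-3 for one block, and anyone can hash it once more and XOR it with the matching ciphertext block. The helper names (`disclose`, `open_block`) and the counter encoding are hypothetical, and `hashlib.sha3_256` stands in for the hash actually used.

```python
import hashlib

BLOCK = 32


def sha3(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def inner_hash(key: bytes, index: int) -> bytes:
    # Output of the first SHA-3 box for one block.
    return sha3(key + index.to_bytes(4, "little"))


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    blocks = range(0, len(plaintext), BLOCK)
    return b"".join(
        xor(plaintext[i:i + BLOCK], sha3(inner_hash(key, i // BLOCK))) for i in blocks
    )


def disclose(key: bytes, index: int) -> bytes:
    # The key holder reveals only this value, never the key itself.
    return inner_hash(key, index)


def open_block(ciphertext: bytes, index: int, disclosed: bytes) -> bytes:
    # Anyone can apply the second SHA-3 and XOR with the ciphertext block.
    block = ciphertext[index * BLOCK:(index + 1) * BLOCK]
    return xor(block, sha3(disclosed))


key = sha3(b"demo chunk key")
plaintext = bytes(i % 256 for i in range(128))  # four blocks
ciphertext = encrypt(key, plaintext)

token = disclose(key, 2)
revealed = open_block(ciphertext, 2, token)
other = open_block(ciphertext, 0, token)  # the token does not open other blocks
print(revealed == plaintext[64:96], other == plaintext[0:32])  # True False
```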

A

Wait, wait a bit, let me get there. So, of course, you could reveal this here, but the problem with that would be exactly what I have shown you last time: for any ciphertext and any plaintext, I could come up with a value that would create that plaintext from the ciphertext.

A

If we move one step up, from here to here, and disclose what's here, then we can also show something that is not there, because we just take a random value, apply SHA-3, XOR it with the ciphertext, and then claim that that is the plaintext that we have encrypted. So this kind of thing in cryptography is called existential forgery, but existential forgery is much better than the previous one. When you reveal this and you can get any particular plaintext for any ciphertext, that is called universal forgery.

A

This is just existential forgery, meaning that you can claim for that data to be some random other data, but you cannot control it. So if there's any integrity protection on the plaintext, that imposes a constraint on the decrypted plaintext that only the real plaintext can meet, with a very, very high probability, without going into mathematical details.

A

If we're using 256-bit keys, as we are now, we get 128-bit security. This is because of the birthday paradox, but I really don't want to go into the mathematics, because that would take a long time. Basically, the attacker who actually wants to forge a plaintext given the ciphertext would need to make 2^128 attempts, which is still beyond the realm of feasibility.

A

So this means that if there's any cryptographic integrity protection on the plaintext, then revealing the preimage of the last SHA-3 and obtaining a plaintext that actually meets the constraint imposed by the integrity protection, be it a MAC, a hash or a digital signature, is a proof beyond reasonable doubt that it indeed is the actual plaintext that has been encrypted.

A

And, in particular, when we want to do inclusion proofs in a large file encrypted by Swarm, then the neat thing is that the fact that the plaintext is the hash of a chunk further down in the Merkle tree itself serves as an integrity protection. So if I reveal a plaintext and then in Swarm I find a chunk that actually matches that hash, that by itself proves that the plaintext that I have revealed is actually the plaintext that has been encrypted, because under existential forgery, the chances of finding a matching one are vanishingly small.
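The argument above can be illustrated with a small sketch: an honest disclosure opens an encrypted block to a hash that matches a stored chunk, while a made-up value opens it to an uncontrolled 256-bit string that matches nothing. The names and the simplified chunk hash (payload only, no span) are assumptions for this illustration.

```python
import hashlib


def sha3(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def check_disclosure(cipher_block: bytes, disclosed: bytes, store: dict) -> bool:
    """Open one encrypted block and accept it only if it references a known chunk."""
    candidate = xor(cipher_block, sha3(disclosed))
    return candidate in store


# A chunk further down the tree, addressed by its hash.
child_chunk = b"payload of a chunk further down the tree"
child_ref = sha3(child_chunk)
store = {child_ref: child_chunk}

# The parent chunk stores the child's reference in encrypted form (block 0).
key = sha3(b"demo key of the parent chunk")
inner = sha3(key + (0).to_bytes(4, "little"))
cipher_block = xor(child_ref, sha3(inner))

honest = check_disclosure(cipher_block, inner, store)
forged = check_disclosure(cipher_block, sha3(b"made-up value"), store)
print(honest, forged)  # True False
```

The forger can produce some plaintext for the block, but cannot steer it towards the hash of an existing chunk, which is the sense in which the hash reference acts as integrity protection.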

A

So in encrypted Swarm, references are more than the hash of the plaintext. They are the hash of the ciphertext plus the decryption key. So, instead of the 32 bytes in the unencrypted version, we have references that are 64 bytes in total. And in particular for the API, it means that you can use the exact same APIs that you use for the unencrypted Swarm.

A

You just need to use longer references, and if you're using longer references, then the Swarm client or the Swarm gateway will recognize that this is a request for encrypted content that needs to be decrypted. So the API remains the same, and the only cryptographic assumption that we have is the security of SHA-3.
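How a client might tell the two kinds of reference apart by length alone can be sketched like this. The function name `parse_reference` and the ordering of the two halves (hash first, decryption key second) are assumptions for this sketch.

```python
HASH_SIZE = 32  # bytes in a plain reference
KEY_SIZE = 32   # bytes of decryption key appended for encrypted content


def parse_reference(ref_hex: str) -> tuple:
    """Split a hex reference into (hash, decryption key or None) based on its length."""
    raw = bytes.fromhex(ref_hex)
    if len(raw) == HASH_SIZE:
        return raw, None                          # unencrypted content
    if len(raw) == HASH_SIZE + KEY_SIZE:
        return raw[:HASH_SIZE], raw[HASH_SIZE:]   # encrypted content
    raise ValueError("reference must be 32 or 64 bytes long")


plain_hash, plain_key = parse_reference("ab" * 32)
enc_hash, enc_key = parse_reference("ab" * 32 + "cd" * 32)
print(plain_key is None, len(enc_hash), len(enc_key))  # True 32 32
```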

A

So as long as we trust SHA-3 to be a cryptographically secure one-way function, this design is formally verifiable, and that assumption is so heavily woven into the fabric of Ethereum that, if that assumption breaks, we will have much bigger problems. And one other thing that makes SHA-3 attractive is that SHA-3 is available in the EVM, the Ethereum Virtual Machine. So it makes this encryption smart-contract friendly, so you can actually check inclusion proofs by a smart contract.

A

So you can make statements about the contents of encrypted files in such a way that you can prove everything using just SHA-3 hashes, without totally disclosing and decrypting the file or revealing the decryption key, but simply selectively disclosing only the data that you want to disclose. So that's where we stand now. This is already implemented, and I encourage everybody to try it out. All the example dapps that the Swarm team publishes together with Swarm are already encryption friendly. So you can mount encrypted volumes using the FUSE interface.

A

You can browse them using the Swarm Explorer, and you can create encrypted private photo albums, which means that the vision of creating a decentralized Dropbox-like functionality, where you can store your files and only share them with those with whom you actually want to share them, is really, really close. So it's going to be ready probably soon, this year, within a few months.

A

So currently in ENS, obviously, you can only register short references to unencrypted content, and it would kind of defeat the purpose if you put the decryption key on the blockchain. So there needs to be some kind of identity-based access control, which is the topic of the third part of my talk. This is precisely what's missing.

A

So typically, yes.

A

So I haven't reimplemented SHA-3. I have relied on pre-existing SHA-3 code, the same that Geth uses for every other purpose in Ethereum. But the counter mode encryption had to be reimplemented, well, from scratch, using SHA-3 as a primitive. It fits onto one screen of source code. It's a really short and clear implementation.

A

Right, so that part had to be reimplemented, because it's a variation on the counter mode with that double hashing, and so there are no pre-existing libraries. Pardon me? Right. So, yes.

A

Correct. So if you're not running the node yourself and you're using the gateway, then the gateway decrypts and re-encrypts using SSL, the Secure Sockets Layer, which means that you need to trust the gateway. That's true, but, you know, in the bright future you're going to run at least a light node on your mobile device or your desktop. So don't rely on gateways.

A

Right, so the asymmetric encryption that we're going to use is the elliptic curve version of ElGamal encryption. So the identity will be a secp256k1 public key, just like the keys that you use for Ethereum addresses. The reason for this is that there's already quite a bit of trust and confidence in the security of this system and, moreover, there are actually tools available for managing such keys, including very high security specialized hardware, like the hardware wallets that people are using to store their crypto funds.

A

I would like to single out Trezor and Ledger as the two most popular solutions. They are already able to handle this kind of key, and we would like to tie the identity management, the identity-based encryption of Swarm, to these same keys, so that you can use your hardware wallets or your already stored private keys the same way you use them for accessing your Ethereum funds and sending transactions to the blockchain.

A

Well, you don't have to. I mean, you will need to do that in case you want to decrypt sensitive data that justifies the use of a hardware wallet. The fact that you have a hardware wallet does not force you to use the hardware wallet for everything. You can still have private keys on your Ethereum node, right?

A

Well, it depends on which identity you use. If you use an identity that is on your hardware wallet, then yes. But actually I think that in the not-too-distant future, it would be a good idea to have a hardware wallet even for frequent-use transactions, because the fact that Ethereum transactions are rare is something that I think many in this room are hoping to change.

A

So the plaintext in this case is a symmetric key, denoted by M, which is 256 bits, and the way we use it has been discussed in the previous part. So the ciphertext and the encryption and decryption formulae are there. That's how you do ElGamal. And how do you use it? So access management is done through ACLs, which are referenced in the metadata part of the manifests.

A

So this means that for every subdirectory, and even for every file in a Swarm collection, you can have separate access control lists, and the access control list contains the public key of the ACL owner. So there's somebody who has the means of modifying these ACLs, and their public key is part of the ACL. And the ACL itself is also a manifest, so it's a key-value table, a mapping that maps Diffie-Hellman shared secrets, computed with the published public key on one side and the public key of the identity to which access is granted on the other, to the encrypted secret keys.

A

So the value is an ElGamal-encrypted content key, and the key is the Diffie-Hellman shared secret. And the point of doing this is that if you don't have the corresponding private key, that is, you're not the owner of the access control list, you do not know who is on the access control list.

A

You cannot read the access control list unless you're the owner. But nevertheless, if you are in the access control list, if you're one of the listed identities, you can find yourself quickly. So that's the point of this scheme, and the only thing a third-party observer can glean from this ACL is an upper bound on its size.

A

They cannot even necessarily know the actual size of the ACL, because there's nothing preventing the owner of the ACL from stuffing it with bogus data. So even if you have, like, three keys that are authorized to access a certain part of the collection, you can still have an ACL which is 100 entries long, and a third-party observer is none the wiser. They have no idea which entries are random data and which are actually access control entries.
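The ACL scheme described above can be sketched in plain Python with a tiny, unoptimized secp256k1 implementation. This is a toy under stated assumptions, not the Swarm design: the lookup key and the mask are both derived by hashing the shared secret with made-up labels (`b"lookup"`, `b"mask"`), the content key is protected by XOR with that mask in a hashed-ElGamal style, and all function names are hypothetical. The code is not constant-time and should not be used for real keys.

```python
import hashlib
import secrets

# secp256k1 domain parameters (curve y^2 = x^3 + 7 over the field of P elements).
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def ec_add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, P - 2, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, P - 2, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def ec_mul(k, point):
    """Double-and-add scalar multiplication."""
    result = None
    while k:
        if k & 1:
            result = ec_add(result, point)
        point = ec_add(point, point)
        k >>= 1
    return result


def keygen():
    priv = secrets.randbelow(N - 1) + 1
    return priv, ec_mul(priv, G)


def derive(shared_point):
    # Hypothetical derivation: one hash for the table key, one for the mask.
    s = shared_point[0].to_bytes(32, "big")
    lookup = hashlib.sha3_256(b"lookup" + s).digest()
    mask = hashlib.sha3_256(b"mask" + s).digest()
    return lookup, mask


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def grant(acl, owner_priv, grantee_pub, content_key):
    lookup, mask = derive(ec_mul(owner_priv, grantee_pub))
    acl[lookup] = xor(content_key, mask)


def pad_with_decoys(acl, total):
    # Bogus entries are indistinguishable from real ones to an observer.
    while len(acl) < total:
        acl[secrets.token_bytes(32)] = secrets.token_bytes(32)


def find_key(acl, my_priv, owner_pub):
    lookup, mask = derive(ec_mul(my_priv, owner_pub))
    value = acl.get(lookup)
    return None if value is None else xor(value, mask)


owner_priv, owner_pub = keygen()
alice_priv, alice_pub = keygen()
eve_priv, eve_pub = keygen()
content_key = secrets.token_bytes(32)  # the 256-bit symmetric key M

acl = {}
grant(acl, owner_priv, alice_pub, content_key)
pad_with_decoys(acl, 100)

print(find_key(acl, alice_priv, owner_pub) == content_key)  # True
print(find_key(acl, eve_priv, owner_pub))                    # None
```

A listed identity finds its own entry with one scalar multiplication and one table lookup, while an observer only sees 100 opaque entries.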

A

Yeah, you encrypt the same symmetric key with three public keys, correct. So this is actually the topic of this slide. We're using hybrid encryption, asymmetrically encrypting the secret key and then symmetrically encrypting the content. Modifying ACLs requires uploading a small number of chunks, recalculating the root hash, and a single transaction to ENS or a mutable resource update. So it's really cheap. And read access means that you have ownership of the key and you're able to decrypt the content. That's what read access actually implies.

A

And write access is implemented within the ENS resolver contract, which means that your key is somehow permitted to change the entries in the resolver. So that's the write access. But for the end user, if you will, you will be presented with a Dropbox-like interface where you can simply add read and write access to various parts of the Swarm-encrypted content, that is, Swarm-hosted content, with the same metaphors that people are already used to, and this is just how they're going to be implemented in a decentralized fashion.

A

So that concludes my talk, and thank you.