OpenZFS Developer Summit 2016, 10 Oct 2016


From YouTube: ZFS Native Encryption by Tom Caputi


A

So next we're gonna have Tom Caputi talking to us about a very exciting feature, which is native encryption in ZFS.

B

Hi everybody, I'm Tom Caputi, and today I'm gonna be talking about something I've been working on for the last nine months or so, which is adding encryption at rest to ZFS. I'm gonna start out with an overview of the implementation: what you're gonna see as a user and what you can expect from a sysadmin's point of view.

B

Then I'm gonna build up the encryption implementation as we go through the presentation, so you can get an idea of how it works at the block level. So, first of all, what is encryption, from a really basic 40,000-foot view?

B

Basically, we want to prevent somebody, known as the attacker, from accessing data that belongs to us, data that we consider private. Permissions are not good enough to protect against this because, no matter what, a root user can always change the permissions on pretty much anybody's stuff. So if you have files or directories and you don't trust the root user on your system, you can't really do anything to protect your data from them.

B

Kernel bugs have routinely been exposed in Linux, especially in the past couple of months, that allow privilege escalation exploits, which let untrusted users get to root, and from there they can access your data. So this really isn't good enough.

B

The other thing is, even if your data is on some machine where you know nobody's gonna be able to escalate, and maybe you are root on your machine, somebody can always move the disks to a different machine or install a new operating system, and they'll be able to read the disks no matter what. So permissions really are not good enough to protect against these kinds of things. The solution is to encrypt the data, which means that the data on disk should look pseudo-random to everybody except somebody who has the secret key.

B

The user is gonna be able to encrypt and decrypt data with that key in order to access it as they want to, but from an attacker's point of view, if you don't have that key, it's mathematically gonna be very hard to decrypt the data or even to write new data.

B

So let's talk about why we want to do encryption in ZFS, as opposed to above it or below it. Let's talk about above it first. There are a couple of solutions that deal with this; one of the big ones that comes to mind is eCryptfs.

B

One of the biggest issues with this from a performance standpoint is that ZFS gets a lot of performance out of the fact that it compresses data before it goes to disk. If you have encryption on, remember, your data is gonna look pseudo-random, and in pseudo-random data there are no patterns, so there's no compression that can really be applied to it. So you will lose a good amount of performance there.
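The compression point can be checked with a small experiment. The sketch below is not ZFS code: it uses Python's zlib as a stand-in for a ZFS compressor and `os.urandom` output as a stand-in for ciphertext (both are assumptions made for illustration), and compares how well repetitive data and random-looking data compress.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly repetitive bytes, standing in for compressible file contents.
plain_like = b"ZFS native encryption " * 6000

# Random bytes, standing in for the output of a good cipher.
ciphertext_like = os.urandom(len(plain_like))

print(f"plain-like data:      {compression_ratio(plain_like):.3f}")
print(f"ciphertext-like data: {compression_ratio(ciphertext_like):.3f}")
```

The repetitive input shrinks to a small fraction of its size, while the random input does not shrink at all, which is why compression has to happen before encryption to be of any use.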

B

You also lose dedup capabilities, because if your data truly is pseudo-random, two pieces of data that are the same are going to look completely different. The other thing is that it's gonna write out metadata headers, because encryption, as I'm going to show you in a little bit, does require some metadata that needs to move along with the data.

B

In eCryptfs and a few of these other solutions, that metadata is written in a header in the file, which can disturb the file alignment, and therefore things like databases might not perform as well.

B

That being said, there are even more restrictions if you do encryption at the disk level. Let's say you have some kind of RAID system set up, and then you have dm-crypt, for those of you who use Linux, set up on top of that. If you have multiple copies of any data, it's going to be encrypted multiple times, because again, each block is going to get encrypted a different way each time. There's no intelligence there, and so that's gonna mean extra CPU overhead for each encryption.

B

The other thing is that you can't do any kind of ZFS commands. You can't even recognize that this is a ZFS pool until you've loaded your keys and got the underlying block device decrypted. What this means from a user's perspective is that you're gonna have the keys loaded all the time, because if you want to use the data, they have to be there.

B

This also means that if the keys aren't there, you can't do basic pool operations like scrub or resilver, nothing like that. You also can't send data without having the keys loaded. And the last big disadvantage of both of these approaches is that they add more complex management. Right now, if you're using ZFS, you can manage all of your storage through the zfs and zpool commands, but with this you need to add another layer, either above or below, which just handles the encryption.

B

So this is a big win just from an administrative standpoint. For those of you who don't know, I thought I'd include a little slide on how we're planning on using this at Datto. To sum up one of our biggest products in a nutshell: we have a backup agent, which is a piece of software that lives on a client's machine.

B

The clients are gonna back up their data to a zpool, which sits on a separate machine on-site at their office or wherever it may be, and then that is going to back up to the cloud using zfs send. The advantages of native encryption here are that we're going to get a lot higher-performance encryption without losing compression, because it is baked into ZFS.

B

We're gonna have a much cleaner implementation than we currently do, because what we currently have is a bunch of stacked block devices that perform the encryption in between. And the last thing is that we're gonna be able to back up our customers' data to our off-site server without being able to decrypt it. From our users' perspective, they're gonna want that, because they don't know if they can necessarily trust us, and we also don't want the liability of being able to decrypt it.

B

So now, let's get into what is actually going to be encrypted in a ZFS encrypted volume or in a ZFS encrypted pool. Basically, all of the user data, file data and metadata, will be encrypted. That includes ACLs, names, permissions, attributes, directory listings, all of that; all zvol data; FUID mappings; and master keys. I'll get into how that works in a little bit, but your master keys are actually going to live on disk, encrypted separately.

B

All of the above will also be encrypted if it lives in the L2ARC, because that is still persistent, even though most of the code assumes that once you reboot it's gone forever; with encryption we really can't assume that. And all of that will still be encrypted if it's in the ZIL.

B

What's not going to be encrypted is basically anything that's related to the structure of your pool: dataset and snapshot names, pretty much anything that lives in the MOS, dataset properties, the pool layout, and the dedup tables. So the dedup tables will be in the clear, unencrypted, but the data itself will actually be encrypted, for those of you wondering. And everything in RAM will not be encrypted either.

B

To give you an idea of what this is going to look like from a sysadmin point of view, these are the new commands that are going to exist. There's a modification to zfs create that lets you set two new properties. One is encryption, which is your encryption algorithm, and I'll get into what we're gonna support for those of you who know encryption fairly well. The other one is keysource, which is just how you're going to provide your key to ZFS: your password, your hex key, your raw key, however it may be.

B

Along with that, we have a new subcommand, zfs key, which will help you manage your keys in ZFS so that you can encrypt and decrypt your data. There's gonna be zfs key -l, which will load your key into the system, allowing you to encrypt and decrypt anything that's there. -u will unload the key, basically doing the opposite.

B

-c will allow you to change your key, so that you can change your password if you feel like it may have been compromised, or if you just want the security of being able to rotate it. That will not re-encrypt all of your data, by the way. There have also been modifications made to a number of zfs and zpool commands so that you can load the keys as you're mounting and unmounting things.

B

The key thing to remember here is that, as long as the key is loaded, your datasets will be mountable, so they can be brought up and they'll look just like normal ZFS file systems to everything else. Your child datasets are going to inherit the encryption properties and the keysource by default.

B

You can also set a new encryption and a new keysource on child datasets, and then they'll be encrypted and need to be mounted separately. And, as I said before, the key and keysource will be changeable without having to re-encrypt the dataset, and it will still be secure. The last thing to talk about as far as administration is what we're going to support for encryption algorithms right out of the gate. We're gonna be supporting AES as our main block cipher, and I'll get into what that means in a minute, in CCM and GCM modes, with 128-bit, 192-bit, and 256-bit key lengths.

B

If you just take the default, encryption will default to off. But if you say encryption=on, that will default to AES-CCM with a 256-bit key. The keysource, again, is how you're going to give your key. You can specify that you want to do it from a prompt, so when you run zfs mount or zfs key -l or one of those other commands, it will prompt you for your key at the command line.

B

There's also a file option, so if you want to have your key on a separate flash drive, you plug in the flash drive and then your datasets are mountable. You can do that for greater automation. As far as formats, you can give the key either in raw bits, just as it actually is, in hex, or as a passphrase.

B

If you do specify a passphrase, we're going to have a variable number of PBKDF2 iterations. I'll explain what PBKDF2 is, but for those of you who know encryption, that will be important. We also have these new properties, which are encryption, keysource, keystatus, and pbkdf2iters, which will become self-explanatory in a minute.

B

So, as far as the things that had to change that aren't necessarily the best as far as ZFS encryption goes, the biggest compromise we had to make was that copies is gonna have to be limited to two, so you won't be able to do copies=3 with your encrypted data.

B

The dedup tables themselves, like I said, are not encrypted. That being said, the data referenced by a dedup table is encrypted, but what this means is that you'll still be able to see some patterns in your data: within an encrypted dataset, you're going to be able to see which blocks have deduped against each other.

B

The last thing, and I encourage people to look it up on Wikipedia if they're interested, is something called a CRIME attack. It's not applicable to 99 percent of applications. If you're really concerned about it, you can get around it just by turning compression off, but for most applications this won't really matter, so we're not making compression and encryption mutually exclusive.

B

So now I'm gonna walk you through the implementation of how the actual data gets encrypted on disk, and I'm gonna start from the ground up. What I mean by that is we're gonna start off by talking about the encryption scope: what is actually going to be encrypted?

B

Obviously it's going to be the user's data, but are we gonna do it at the file level or the block level? We're going to be doing it at the block level, and this is for a number of reasons. First of all, we're gonna be able to encrypt each block separately. In order to encrypt or decrypt anything, you need to encrypt and decrypt a whole unit, so even if you have a very, very large file, this limits that unit to the maximum block size.

B

So if your maximum block size is 128K, the most you will ever have to encrypt or decrypt at once is 128K. We're gonna store the encryption parameters in blkptr_t, and that's going to be important in a little bit; I'm going to show you how that's all going to work.

B

And by limiting the scope of the encryption to a block instead of a file, we can make sure that if encryption parameters are lost, the scope of that loss is only a single block as opposed to an entire file.

B

So now, the two types of encryption, and again this is from a 40,000-foot view, are asymmetric encryption and symmetric encryption. Asymmetric encryption, just for completeness, is the kind of encryption that's done for SSH and TLS handshakes. It's meant for verifying people and establishing trust between users, making sure that everybody who's talking is who they say they are. This is very slow, and usually what happens immediately after this is that they exchange a symmetric encryption key.

B

Symmetric encryption uses a single key for both encryption and decryption, whereas asymmetric encryption has a private and public key pair. Now, this is going to be way, way faster than doing asymmetric encryption for everything.

B

On an x86-64 architecture, if you have the AES-NI instruction set, which, if you're using an Intel processor, you probably do, it will actually be about a thousand times faster to do this encryption and decryption than it would be if we were trying to encrypt everything with RSA or some similar algorithm. So let's start off by talking about the basic unit of encryption. This is called a block cipher. A block cipher essentially takes one block of plain data and your encryption key and turns it into one block of output data.

B

Now, this is good, and this is what we're gonna be basing everything off of, but it has a number of really severe limitations, the biggest of which is that it only works on a fixed block size, which is 128 bits. Most of us want to encrypt more than 16 bytes of data, so we're going to need to modify this and turn it into a stream cipher.

B

The most basic stream cipher is called ECB mode, or electronic cookbook mode. Electronic codebook, excuse me. Anyway, what's going to happen is we're just going to run the AES algorithm on each block of the plain data, in series, one block at a time. That's going to actually encrypt all of our data, and we can now encrypt as much as we want. But this has a severe problem too.

B

For those of you who've studied cryptography a little bit, this is the picture that everybody knows and sees, and it explains the big problem with ECB encryption. The problem is that you can still see patterns in your data, because every block was encrypted the same way. What we really want is pseudo-random data. We don't want to be able to detect any patterns.
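The ECB pattern leak can be reproduced in a few lines. The sketch below is an illustration only: Python's standard library has no AES, so `toy_encrypt_block` is a hypothetical 4-round Feistel permutation built from HMAC-SHA256 that stands in for AES on 16-byte blocks. It is not secure and is not what ZFS uses; it only shares the property that matters here, namely that the same key and the same input block always give the same output block.

```python
import hashlib
import hmac

BLOCK = 16  # bytes, the same block size as AES


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    """Keyed round function: the first 8 bytes of an HMAC-SHA256 digest."""
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    """Toy 4-round Feistel permutation on one 16-byte block (stand-in for AES)."""
    left, right = block[:8], block[8:]
    for rnd in range(4):
        mixed = bytes(a ^ b for a, b in zip(left, _round(key, rnd, right)))
        left, right = right, mixed
    return left + right


def ecb_encrypt(key: bytes, data: bytes) -> bytes:
    """ECB mode: encrypt every block independently with the same key."""
    if len(data) % BLOCK:
        raise ValueError("data must be a multiple of the block size")
    return b"".join(
        toy_encrypt_block(key, data[i:i + BLOCK])
        for i in range(0, len(data), BLOCK)
    )


key = b"k" * 32
plaintext = b"A" * 16 + b"B" * 16 + b"A" * 16  # blocks 0 and 2 are identical
ciphertext = ecb_encrypt(key, plaintext)
blocks = [ciphertext[i:i + BLOCK] for i in range(0, len(ciphertext), BLOCK)]

print(blocks[0] == blocks[2])  # True: the repeated plaintext block is visible
print(blocks[0] == blocks[1])  # False: different plaintext, different output
```

An observer who never sees the key can still tell which plaintext blocks were equal, which is the pattern leak described in the talk.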

B

Everything should be completely random. So in order to combat this, we're gonna make this a confidential stream cipher.

B

What a confidential stream cipher means is that we are going to add in this thing called an IV. The IV acts as a salt for the AES algorithm. The first block is going to be encrypted, salted with this IV, and then every block after that is going to take the result of the last block and use that for the next block. This way we can't detect any patterns, because every block is dependent on the block before it.

B

This is going to be important for a number of reasons. The big thing is that there are a bunch of different ways to do this, and these are called modes of operation. Each one is going to have slightly different requirements for how we need to manage it.

B

The big requirements that you need to know about in order to get the rest of this presentation are that we're limited to 104 bits, or 13 bytes; that's the maximum size of an IV that we can allow for the two modes that we're going to be supporting. And 96 bits, or 12 bytes, is what's recommended by NIST because of a number of performance issues.

B

The other really important thing is that reusing an IV with the same key will result in a catastrophic failure of the encryption, which basically means that, even if you don't have the key, you can decrypt both blocks of data. This is really, really bad, and it means that we'd be lying to the people who we told that we encrypted their data. So obviously we don't want to do that.
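A small experiment shows why IV reuse is so damaging. CCM and GCM both encrypt by XORing the data with a counter-mode keystream derived from the key and IV. The sketch below imitates that with a hypothetical `keystream` function built from HMAC-SHA256 (an assumption for illustration, not AES-CTR and not ZFS code). With the same key and IV, the keystream cancels out when two ciphertexts are XORed together.

```python
import hashlib
import hmac


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy counter-mode keystream: HMAC(key, iv || counter) blocks, concatenated."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Counter-mode style encryption: plaintext XOR keystream."""
    return xor(plaintext, keystream(key, iv, len(plaintext)))


key = b"\x01" * 32
iv = b"\x02" * 12  # 96-bit IV, reused for both messages (the mistake)

p1 = b"attack at dawn!!"
p2 = b"retreat at nine!"
c1 = encrypt(key, iv, p1)
c2 = encrypt(key, iv, p2)

# The attacker has only c1 and c2, not the key.
leak = xor(c1, c2)
print(leak == xor(p1, p2))  # True: the keystream has cancelled out

# If the attacker knows or guesses one plaintext, the other follows directly.
print(xor(leak, p1) == p2)  # True
```

With distinct IVs the two keystreams differ and this cancellation does not happen, which is why each key and IV pair must be used for only one block.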

B

So the big things to take away from here are that the IV always needs to be unique for these two modes, and that we get 12 bytes of it. The last thing that we want to add to this basic block of encryption is what's called authenticated encryption. What this means is: once we've written our encrypted data out, how do we know that nobody's changed it when we go to decrypt it?

B

We want to make sure that what we're reading is exactly what we wrote out. With a regular checksum, like SHA-256 or any other kind of checksum, anybody can produce that checksum, and if you can change the ciphertext, you can get our applications to decode garbage. That's still not good. We want to be able to make sure that our data is secure, or at least to know that somebody was trying to alter our data.

B

So we're gonna add something called a MAC. A MAC is a message authentication code, and it's basically a cryptographically secure checksum, like SHA-256 or SHA-512 or one of the other ones, that requires a secret key to produce. In this case, that's also going to be the encryption key.

B

When we go to decrypt this data, we can check the MAC that results from decryption against the one we actually stored. Since nobody else had our secret key, if the MAC doesn't match up exactly, then we know that somebody tried to alter the data.
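The difference between a plain checksum and a MAC can be sketched as follows. In CCM and GCM the MAC is produced by the mode itself; here HMAC-SHA256 truncated to 128 bits is used as a stand-in, and the helper names `make_mac` and `verify` are hypothetical. An attacker who alters the ciphertext can recompute a plain SHA-256 checksum, but cannot produce a valid MAC without the key.

```python
import hashlib
import hmac

MAC_LEN = 16  # 128 bits


def make_mac(key: bytes, ciphertext: bytes) -> bytes:
    """Keyed checksum over the ciphertext (HMAC as a stand-in for the mode's MAC)."""
    return hmac.new(key, ciphertext, hashlib.sha256).digest()[:MAC_LEN]


def verify(key: bytes, ciphertext: bytes, mac: bytes) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(make_mac(key, ciphertext), mac)


key = b"\x07" * 32
ciphertext = b"pretend this is an encrypted block"
stored_checksum = hashlib.sha256(ciphertext).digest()
stored_mac = make_mac(key, ciphertext)

# The attacker flips one bit and also rewrites the plain checksum to match.
tampered = bytes([ciphertext[0] ^ 1]) + ciphertext[1:]
forged_checksum = hashlib.sha256(tampered).digest()

print(hashlib.sha256(tampered).digest() == forged_checksum)  # True: checksum fooled
print(verify(key, tampered, stored_mac))                     # False: MAC catches it
print(verify(key, ciphertext, stored_mac))                   # True: untouched data passes
```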

B

So the next thing that we want to add in here concerns this encryption key. If you remember, before I was talking about the IV, and what I said is that the IV is limited to 12 bytes. 12 bytes is a lot of room, but here at ZFS we promise to be able to store up to a zettabyte of data, theoretically, so we need to be able to use more IVs without the risk of reusing one.

B

So what we're gonna add is this HKDF function. An HKDF function is really simple. All it does is take a master key and a salt, which we're going to store somewhere too, and between the two of them they produce an encryption key. This encryption key is what's actually used to encrypt the data.

B

We can generate a new salt every once in a while, and what that'll enable us to do is get a fresh IV space, so we don't need to worry about colliding with previous IVs that we might have had before.
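The key derivation step can be sketched with the HKDF construction from RFC 5869. The talk does not say which hash function or `info` string is used, so SHA-256 and an empty `info` are assumptions here, and `derive_encryption_key` is a hypothetical helper name. The point of the sketch is only that one master key plus different salts yields independent encryption keys.

```python
import hashlib
import hmac
import os

HASH_LEN = 32  # SHA-256 output size in bytes


def hkdf_extract(salt: bytes, input_key: bytes) -> bytes:
    """HKDF-Extract: condense the input key and salt into a pseudo-random key."""
    return hmac.new(salt or b"\x00" * HASH_LEN, input_key, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand: stretch the pseudo-random key to the requested length."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_encryption_key(master_key: bytes, salt: bytes, length: int = 32) -> bytes:
    """Derive the key that actually encrypts data from the master key and a salt."""
    return hkdf_expand(hkdf_extract(salt, master_key), b"", length)


master_key = os.urandom(32)
salt_a = os.urandom(8)  # 64-bit salt, the size mentioned in the talk
salt_b = os.urandom(8)

key_a = derive_encryption_key(master_key, salt_a)
key_b = derive_encryption_key(master_key, salt_b)

print(key_a != key_b)                                      # new salt, new key
print(key_a == derive_encryption_key(master_key, salt_a))  # same salt, same key
```

Because each salt selects a different encryption key, an IV that was used under one salt can appear again under another salt without the same key and IV pair ever repeating.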

B

This is relatively quick to calculate, and like I said, the whole point of it is to prevent the master key from going stale because we're running out of IVs that we can use. That being said, this salt and the resulting encryption key are going to be usable for quite a while, so we don't need to change them on every single operation.

B

We're gonna be able to cache this for a while, and we will regenerate it every certain number of transaction groups, or every time that you re-import the pool. I'll get back to that at the end. So now this is what our diagram looks like. If you notice, we replaced the regular encryption key, and everything here that's highlighted in red is stuff that we added.

B

Now we have our master key, which is going to be used with the salt to produce the actual encryption key, and that's what's gonna be used to encrypt the data. So now we have these two values right here, the salt and the IV, and the question is: where are we going to get these from? The answer is that we're going to randomly generate them with a pseudo-random number generator. I can go over all the math if you guys really want to know about it.
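The kind of math involved is a birthday-bound estimate: how many random values can be drawn before the chance of any two colliding reaches a chosen probability. The sketch below uses the standard approximation p ≈ n² / 2^(bits+1), which holds for small p. The talk does not state which inputs it used for its own estimate, so the bit counts, probability, and block rate below are illustrative assumptions and the printed numbers should not be expected to reproduce the talk's figure.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def draws_for_collision_probability(random_bits: int, probability: float) -> float:
    """Approximate number of random draws at which a collision has this probability.

    Uses the birthday approximation p ~= n^2 / 2^(bits + 1), solved for n.
    """
    return math.sqrt(2.0 * probability * 2.0 ** random_bits)


def years_until_risk(random_bits: int, probability: float, blocks_per_second: float) -> float:
    """Years of continuous writing before the collision risk reaches `probability`."""
    draws = draws_for_collision_probability(random_bits, probability)
    return draws / blocks_per_second / SECONDS_PER_YEAR


# 96 bits: a random IV alone. 160 bits: a random 64-bit salt plus a 96-bit IV.
for bits in (96, 160):
    years = years_until_risk(bits, probability=1e-9, blocks_per_second=1e6)
    print(f"{bits} random bits: about {years:.3g} years")
```

The comparison between the two rows illustrates why adding a random salt on top of the IV extends the safe lifetime of a master key so dramatically.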

B

But basically, if you calculate out the numbers, what we're pretty much guaranteed is that, if we're encrypting 1 million blocks per second, we will have a 1 in 1 billion chance of reusing the same IV with the same key in 41 million years, and by then I'll be retired. So now this is our whole diagram. If you look at the things that we have to store, I highlighted them in blue.

B

They're basically the salt, the IV, and the MAC. The cipher data is just gonna go where the original plain data was gonna go anyway, so we don't really need to care about that too much; it's just gonna have the transforms applied to it. To talk about where these things are gonna go: we kind of made a bit of a union out of the block pointer. It's not a union in the code, for those of you who might have been concerned about that.

B

But basically we took a bunch of fields in the block pointer and started using them where we could. The salt is stored in the fill count. We can do this because we only encrypt level-zero data, and level-zero data always has a fill count of one. So we can assume that the fill count is one just because this block is encrypted.

B

Therefore we can store the salt there; we don't really need that field. For the MAC, we're going to store that in the checksum. It's going to be 128 bits long, which is half the checksum, and the reason that we can do this is because the MAC and the checksum serve the same purpose. The checksum is there to make sure that the data is exactly as we wrote it, and the MAC does the same thing, but it also protects against malicious users.

B

There are two reasons why we split it in half and calculate both. The first is that the maximum length of a MAC for both of the modes that we support is 128 bits, so that worked out. The other is that by keeping half of a regular checksum, we'll still be able to do resilver and scrub operations just as you normally would, using that half of the checksum for checking, even without the keys loaded.
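The split checksum field can be modelled as two 128-bit halves. The sketch below is a hypothetical model, not the on-disk layout: it assumes the plain half is a truncated SHA-256 of the ciphertext and uses truncated HMAC-SHA256 as a stand-in for the mode's MAC, and the names `ChecksumSlot`, `fill_slot`, `scrub_ok`, and `authentic` are invented for illustration. It shows that integrity can be checked without a key, while authenticity needs one.

```python
import hashlib
import hmac
from dataclasses import dataclass

HALF = 16  # 128 bits, half of a 256-bit checksum field


@dataclass(frozen=True)
class ChecksumSlot:
    """Model of a 256-bit checksum field holding two 128-bit values."""
    checksum_half: bytes  # plain checksum, verifiable by anyone
    mac_half: bytes       # keyed MAC, verifiable only with the key


def fill_slot(mac_key: bytes, ciphertext: bytes) -> ChecksumSlot:
    """Compute both halves over the ciphertext when the block is written."""
    checksum = hashlib.sha256(ciphertext).digest()[:HALF]
    mac = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()[:HALF]
    return ChecksumSlot(checksum, mac)


def scrub_ok(slot: ChecksumSlot, ciphertext: bytes) -> bool:
    """Key-free integrity check, as a scrub or resilver would need."""
    return hashlib.sha256(ciphertext).digest()[:HALF] == slot.checksum_half


def authentic(mac_key: bytes, slot: ChecksumSlot, ciphertext: bytes) -> bool:
    """Keyed check performed when the block is actually decrypted."""
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()[:HALF]
    return hmac.compare_digest(expected, slot.mac_half)


key = b"\x05" * 32
block = b"ciphertext of one data block"
slot = fill_slot(key, block)

corrupted = block[:-1] + bytes([block[-1] ^ 0x80])  # simulate on-disk bit rot

print(scrub_ok(slot, block))        # True, and no key was needed
print(scrub_ok(slot, corrupted))    # False: damage found without the key
print(authentic(key, slot, block))  # True once the key is loaded
```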

B

The last thing is the IV. Matt Ahrens and I talked about this for probably about a month and a half, about where we were gonna store this. We threw around a whole bunch of ideas. To sum it up, originally this was stored in the padding, which is right here, and that seems like a great idea until you realize that it would use three-quarters of the padding, and that would mean that we can't store any more uint64s in there in the future.

B

So this locks out the block pointer pretty much forever, and we didn't want to do that. We also toyed around with the idea of generating the IV from different fields in there. For instance, we could take the place on disk, but things get rewritten to the same place on disk quite often. We could have added in the birth time.

B

But the problem with adding in the birth transaction group is that if you hard shut down, ZFS can rewind a little bit, because it will write out this block, but that won't be the block that's in the uberblock. So now we'd need to kind of have this count, and it's not really secure. We can't really guarantee that the value is truly unique and that we never reused it.

B

We thought about using the bookmark, but the bookmark gets complicated, because when you have snapshots or dedup turned on, that block could be referenced from multiple different places. So none of these were really going to work out, and that's partially why it's randomly generated and partially why it's stored where it is. And this is, for those of you wondering from before, why we can't do more than two copies of encrypted data.

B

You will still be able to have more than two copies of the metadata above the encrypted data, but the data itself is limited to two.

B

So now, let's talk about dedup and how it will work with this, because we will still be able to dedup.

B

In order for dedup to work as it does currently, without a whole lot of modification, we're gonna need this MAC and checksum from back here to match, because that is the part that dedup is going to look at. In order for the MAC and checksum to match, we're going to need to write out the same ciphertext for equivalent blocks of data, and in order for that to work, we need to reuse the same IV and salt; otherwise we won't produce the same output.

B

Now, if you've been listening to me this whole time, I've been saying, hey, we really can't use the same IV with the same key, and that's exactly what I'm proposing here. But the difference is that we're now also using the same data. Because we're using the same data, instead of leaking everything, all we've really done is duplicate exactly what we had before, and we can use that to detect deduplication.

B

This is technically still a leak of information, because again, we want the data to be completely pseudo-random so that nobody can detect any patterns in it. But the dedup tables were going to leak this information anyway, and if dedup is important to you, this is something that you can be aware of and understand.

B

Yes, this is only with dedup, right. These are all considerations for dedup. But your data won't be completely random, and that's something that you should be aware of. If you're really concerned about it, you can simply turn dedup off on those particular datasets.

B

So how are we going to generate the same IV and salt for equivalent blocks of data? The only real way to do that is to take a checksum of the data, and because this is encryption, we're not going to take just a regular checksum of the data; we are going to take an HMAC. An HMAC is basically the same kind of thing as a MAC: it's a secure checksum that you need a secret key to regenerate. The only difference is that we don't also have to encrypt the data in order to make this happen.

B

This is going to result in a 256-bit HMAC, and that is going to be split up into 64 bits of salt and 96 bits of IV, and that's what's going to be stored. So now this is where you get your money's worth. This is the picture of how encryption is going to work completely.

B

I highlighted the parts that are gonna be different in red. For non-dedup, you're just going to have the plain data, and then you're gonna have the pseudo-random number generator generating your salt and IV. For dedup, it's just slightly more complicated: we're going to be using this HMAC function to generate your salt and IV. But that's really the only difference; other than that, everything's gonna work the same.
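The two ways of choosing the salt and IV can be sketched side by side. The talk gives the sizes (a 256-bit HMAC split into a 64-bit salt and a 96-bit IV) but not the hash function or which bytes of the digest are used, so HMAC-SHA256 and taking the first 20 bytes are assumptions here, and `choose_salt_and_iv` is a hypothetical name.

```python
import hashlib
import hmac
import os

SALT_LEN = 8   # 64 bits
IV_LEN = 12    # 96 bits


def choose_salt_and_iv(plaintext: bytes, hmac_key: bytes, dedup: bool) -> tuple[bytes, bytes]:
    """Pick the per-block salt and IV.

    Without dedup both values are random. With dedup they are derived from a
    keyed hash of the plaintext, so equal blocks get equal parameters.
    """
    if dedup:
        digest = hmac.new(hmac_key, plaintext, hashlib.sha256).digest()  # 256 bits
        return digest[:SALT_LEN], digest[SALT_LEN:SALT_LEN + IV_LEN]
    return os.urandom(SALT_LEN), os.urandom(IV_LEN)


hmac_key = os.urandom(32)
block = b"same user data" * 100

# Dedup on: identical blocks produce identical salt and IV, hence identical ciphertext.
print(choose_salt_and_iv(block, hmac_key, dedup=True)
      == choose_salt_and_iv(block, hmac_key, dedup=True))   # True

# Dedup off: every write gets fresh random values.
print(choose_salt_and_iv(block, hmac_key, dedup=False)
      == choose_salt_and_iv(block, hmac_key, dedup=False))  # False (random each time)
```

Because the derivation is keyed, someone without the HMAC key cannot compute the salt and IV for a guessed plaintext, while the same dataset still produces matching ciphertext for matching blocks.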

B

There are two last things to talk about as far as diagrams and actually getting data encrypted to disk. The first one is, I said before that we don't want to have to re-encrypt all of the data that's on disk just because somebody might have compromised their password, or just because the user wants to rotate their password. That would be really, really inefficient.

B

So we're gonna add one layer of indirection between the master key that I was talking about before, which is right here, and the actual key that the user gives you. The user is going to supply what we're gonna call the wrapping key, and the wrapping key is going to be used to encrypt the master keys and the HMAC keys on disk.

B

One second, we're getting to that. Okay, stay tuned. Now, as this gentleman pointed out, this is going to be the case when you're specifying a key via raw or hex.

B

So this is what it's gonna look like. All we're doing is taking the master key, which we're randomly generating, and encrypting that on disk in a separate object, which is called a DSL crypto key. Passphrases present a slightly different problem. Passphrases are variable length, and they're reputedly extremely weak.

B

You can come up with a table of the top 1,000 most-used passwords and try them against a dataset, and chances are you might find something.

B

So what we're gonna do to fight that is something called PBKDF2. PBKDF2 turns a passphrase into a usable key. What's gonna happen is we're gonna run this function, and the function is designed to be very, very hard to compute. It's gonna take a whole bunch of iterations, and the idea is that it generates this key at the end.
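The passphrase-to-key step can be sketched with `hashlib.pbkdf2_hmac` from the Python standard library. The talk does not specify the underlying hash, the salt handling, or a default iteration count, so SHA-256, a random 16-byte salt, and 100,000 iterations are assumptions for illustration, and `passphrase_to_key` is a hypothetical helper name.

```python
import hashlib
import os


def passphrase_to_key(passphrase: str, salt: bytes, iterations: int) -> bytes:
    """Stretch a variable-length passphrase into a fixed 256-bit key with PBKDF2."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


salt = os.urandom(16)   # stored alongside the dataset; it is not secret
iterations = 100_000    # higher values make every guess more expensive

key = passphrase_to_key("correct horse battery staple", salt, iterations)
print(len(key))  # 32 bytes, regardless of how long the passphrase was

# The legitimate user runs this once per unlock. An attacker working through a
# password list has to run all the iterations again for every single guess.
guess = passphrase_to_key("password123", salt, iterations)
print(guess == key)  # False
```

The iteration count is the tuning knob: the legitimate user pays the cost once per unlock, while a brute-force attacker pays it for every candidate passphrase.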

B

But if somebody were to try to do this over and over again, it would be very, very computationally expensive, and it would just take too long to actually do. For the person who's actually trying to decrypt their data correctly and is using the correct key, they only have to pay this price once, so it's not so bad. But anybody who's trying to brute-force the password will have to pay this price many, many times.

B

A couple of additional topics, just to kind of give you guys an idea of some of the things that are also coming with this. As far as ZIL encryption, ZIL blocks need some slightly different considerations, because ZIL blocks, first of all, are pre-allocated, and so we can't save encryption parameters into the block pointer after we've already written out the data block. So instead we're gonna have to do something a little different.

B

The first thing we're gonna do is we're going to store the MAC in the ZIL header, and then the IV we're gonna pre-allocate along with the block. I don't have a whole lot of time left, but I can talk to anybody who wants to know about it later.

B

The concept here is that we're going to leave the ZIL structure intact, but encrypt all of the important user data in it.

B

So all of the entries in it are going to be encrypted, but the blocks themselves and the pointers to other blocks within it are gonna be left unencrypted, which will allow the ZIL to be claimed without keys.

B

L2ARC encryption was changed a lot recently because of the compressed ARC changes that just recently came out. The idea of the new compressed ARC is that we're storing data exactly as it exists in the pool. So, basically, when you read data into the ARC, it will be compressed but decrypted, and the reason for that is so that we can reuse the data over and over again without having to decrypt it all the time, because there's no real benefit to that.

B

Once we have the data decrypted, when we go to write it to the L2ARC, we're going to re-encrypt it, and yeah, that's basically how it's gonna work. It's basically gonna be exactly the same as it is on disk, and when it gets read off of the L2ARC, it will use the blkptr_t's IV and salt to decrypt it.

B

The last thing is something that I'm kind of excited to talk about, and this is the idea of a raw send. So the idea is that, in addition to the compressed sends, which have just recently been merged, we're going to be able to take the data exactly as it is on disk and move it up to an off-site storage facility.

B

Even if it's untrusted, it will still be encrypted, and basically the idea here is that you can send data to an untrusted server without risking that it can be decrypted. So this will make ZFS kind of a true platform for end-to-end encryption, so that it could really be used for anything, including high-security, security-sensitive kinds of applications.

B

An admin will always be able to take backups of their pool efficiently, even without the person behind the data having to trust them, essentially. As far as the current status, this is completely implemented except for raw sends. That's gonna be coming in a separate pull request. It's ready to review. I'm begging, pleading for people to please review it and take a look and tell me what's wrong.

B

Tell me anything that you can think of that might be a problem with it. I've been kind of sick and tired of rebasing it on top of the current master for the past few months, so I'm definitely trying to get it pushed through. The primary pull request is on Linux, because that's where we at Datto are based, but there are pull requests out that are basically tracking the same changes on OS X and illumos.

B

I also wanted to give a quick thank you to Jorgen Lundman for helping to maintain the OS X and illumos ports, and by helping I mean doing it; Matt Ahrens and Brian Behlendorf for, basically, answering all of my questions about everything; and George Wilson and Dan Kimmel for helping me work through the ARC changes that just got merged in a couple of weeks ago.

B

Yes, that will work with the same ZFS send stuff, because the entire structure of the pool will still... oh, the question was: will send still be able to work with the data encrypted, and still be able to do everything incrementally? The answer is yes, because the entire pool structure is still left unencrypted, and that's kind of a big win, because we can do that and we can do scrubs and sends.

B

Embedded block pointers: it's not like, at the zpool level, these features are incompatible, but at the ZIO layer, if you have encryption turned on, embedded gets turned off, effectively, which is a bit of a performance loss for some applications. But I have to store the encryption parameters somewhere.

B

The question is: will a root user on the system be able to get to the master key regardless? Correct? As long as the system is running, the answer is the master key is never exposed to the users. It's completely kept within ZFS, and the keys, the wrapping keys, aren't really exposed through that either.

B

The only way that they'd be able to get to that is through some kind of /dev/mem or /dev/kmem interface that still exists on very, very old, very, very insecure systems. I'm sorry? Oh yeah, you could get it from a debugger. So one of the big things that I wanted to point out was that this is encryption at rest. Encryption is kind of broken into three different categories: in transit, at rest, and in use.

B

I can't really do much to protect encrypted data in use because, for instance, that will even live in the page cache in Linux or illumos or other things. So there's not a lot I can do there. This could be added, but this will definitely provide a big layer of security that will be very hard to overcome.

B

Yes, but you'd have to have also given them your key at the same time, because you would have had to enter your passphrase on the remote server. You can back the data up to there and be able to get it back, and it will be able to be integrity checked, but you can't decrypt it unless you give them the passphrase as well.

B

No, yeah, exactly.

B

Yes, there are weaknesses to everything, and people are finding different flaws in encryption algorithms. 128 bits, yeah. There are weaknesses in everything, and every day people are finding more and more. Like I said, the default is 128.

B

There are a couple of things like that. We do default to 256-bit CCM with "on", but this is also expandable. There's the possibility with this to add encryption algorithms as we find new ones, the same way we can add checksums.

B

I'm sorry, could you repeat that one more time?

B

The question is: will the raw send send the encrypted data as it exists on disk, or the unencrypted data? Correct? Or both? Yep, I think I have the answer, so tell me if this answers your question. Basically, I'm gonna be adding a flag which will do a raw send, and if you have that turned on, it will be exactly as it is on disk. If you don't have that flag turned on, then it will just do a regular send, and you'll have to have the keys loaded, just like it would normally.

B

But that way you can.

B

It will error out if you don't have the keys loaded normally, and if you don't have the keys loaded and you specify a raw send, it will do that.
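The three cases in that answer can be written down as a small decision function. This is only a model of the behavior as described in the talk: `plan_send` and its return values are hypothetical names for illustration, not ZFS code or a ZFS command-line interface.

```python
def plan_send(raw: bool, keys_loaded: bool) -> str:
    """Model which kind of send stream an encrypted dataset would produce.

    raw=True          -> blocks go out exactly as they are on disk (still
                         encrypted), so no keys are needed on the sender.
    raw=False, keys   -> a regular send of decrypted data.
    raw=False, no key -> the send fails, because the data can't be read.
    """
    if raw:
        return "raw"
    if not keys_loaded:
        raise PermissionError("keys must be loaded for a non-raw send")
    return "regular"
```

For example, `plan_send(raw=True, keys_loaded=False)` returns `"raw"`, which is the case that lets an admin back up a dataset without ever holding its keys.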

B

Oh, I can't hear you too well, but it sounds like you're just asking for performance information. We did some testing on performance, and basically what we found is approximately anywhere under a 5% CPU overhead. There wasn't a big overhead as far as throughput to disk is concerned, because that's kind of the bottleneck, but as far as CPU overhead, it could add as much as 5%. Usually we saw it at about 2%.

B

Are we storing the fact that the data is encrypted in the block pointer? Yes, I believe it's bit 67. If you look in the comments of blkptr_t, there is actually a bit that has been reserved for encryption since the Sun days, and I'm using that one.

B

So the two questions were... well, I'll answer the second one, because I remember that. The second one was: some systems like LUKS have the ability to have multiple keys, or multiple slots, to unlock the master keys. Currently, right now, no, but that could easily be added. The encryption keys are stored in a ZAP, so it'd be not too bad to add that functionality later in the future, but right now it's just a single key to decrypt. What was your first question? I'm sorry.

B

Have we had anybody with an extensive crypto background check all of the protocols and procedures that we're doing? Basically, the answer is not yet, but we're in the process of getting that done. There's a big thing any time you work with cryptography, where any time you ask somebody for their opinion, they say, "I'm not a cryptographer, but..." and then they give you their opinion, and we've hit a lot of that. But we're working to get that.

B

So for that, I didn't have time to cover this, but we're actually using the illumos crypto code, which we have ported to OS X and Linux. The reason for that is because we needed one crypto framework to kind of work across everything, so we made a separate module called the ICP, which will also support the new cryptographic checksums like SHA-512 and Skein, and a few other ones.

B

That's actually out for another pull request right now. I'm sorry? The ICP is kind of like a framework, and I ported over the stuff that we needed for that. But if that exists in illumos, then it can easily be brought in.

B

How does this compare to the Oracle Solaris encryption?

B

Basically, originally, when I first started working on this, it was kind of based on it, but in the time since then we found a couple of problems with it that I can get into offline if you want. Basically, we found a couple of minor flaws that we thought should be addressed. It's not compatible pool-wise at all, but that kind of wasn't a goal.

B

Are filenames encrypted, or are they leaked? No, none of that is leaked. If you have an encrypted dataset, any user information, pretty much, is fine, including file data, zvols, file names, ACLs, permissions, extended attributes; any of that is all encrypted.

B

Yes, basically object data, and also we have the bonus buffer. For those of you who know about the DMU, there's a bonus buffer, and we force the bonus buffers for encrypted objects off to the spill blocks, and then we encrypt the spill block.

B

Is it possible to list the snapshots without the encryption key? Yes, because that's all part of the pool structure, so the pool names and dataset names will not be encrypted, but everything within them will be.

B

Do we have support for any kind of external encryption stuff? Not at the moment, and the reason for this is basically because OpenZFS needs to cater to OS X, illumos, and Linux at minimum, and it's very hard to kind of get those things to work across all three. It could be added in the future if we could figure out how to get a framework to work with the hardware encryption devices.

B

Is FreeBSD on my radar? I've just written the Linux version. I really want to thank Jorgen Lundman, and I'm hoping I'm pronouncing that name right, because I've never actually talked to him; it's all been through emails. But he has been maintaining the illumos and OS X ports so that I don't have to. But it should not be incredibly hard to port at all.

B

Is there a way to protect a USB key? If you have the passphrase on a USB key, is there a way to protect the USB key's stuff?

B

There are plenty of ways to do that, including things like eCryptfs and other encryption things for that. That's kind of outside of ZFS. But the idea is that if you wanted to read it from a file, it was more gonna be automated anyway, because if not, then you can use a password, just like normal, and it will be generating pretty much the same thing that the encryption key on the USB would be.

B

You'll have to talk to me.

C

That afterwards.

B

Yeah, okay. Yeah. Oh, okay, gotcha.

B

Is there a way to list loaded keys? Yes, the keystatus property is a read-only property that will tell you if your key is loaded.

B

Basically, the three states that it can be in are a dash, which means your dataset isn't encrypted, available, and unavailable. Anything else? Thank you very much.