From YouTube: OpenZFS Office Hours with Matt Ahrens
Description
OpenZFS Office Hours with Matt Ahrens, October 11, 2013
Questions / topics:
Hole Birth Time
Send/Receive network buffer
NFS over RDMA
shareiscsi property
compacting ZAP objects
larger block sizes
Write smoothing & histogram changes in production
Linux port & cross-platform codebase
Device removal & bprewrite
Performance with lots of snapshots
Trivia question & t-shirt giveaway
A: Cool. So I'm coming to you guys live from San Francisco. I'm Matt Ahrens. I started the ZFS project with Jeff Bonwick back in 2001 at Sun Microsystems, and I'm here today to answer questions about ZFS, OpenZFS, and anything you might want to talk about. So it looks like we have just a few people here extremely promptly, so take it away, Derek, Karen, or Prakash: anything that I can tell you about OpenZFS?
A: So right now I'm working on a project with Max Grossman, who's another engineer at Delphix.

A: We've noticed a performance problem when you're doing a send of a stream that has lots of holes in it. For example, this happens a lot with sending zvols; we noticed this problem when we were sending zvols for remote replication from one machine to another. Zvols hold other file systems, and they start out basically totally sparse, but then the other file system, say NTFS, fills in parts of the volume.

A: The parts that it hasn't filled in are going to be holes. With OpenZFS currently, those holes need to be transmitted to the other machine every time that you do a send. So basically the problem is that we don't know whether a hole was newly created or has always been there.

A: So with the new changes that Max has done, we'll basically store the birth time of every hole. Just like when we write to a data block we store the birth time, and then use that birth time to know whether the block has been changed or not, and therefore whether we should send that data to the other system or not, we're going to do the same kind of thing with holes.

A: So when a hole is created, either by the application writing a whole run of zeros that we compress away to a hole, or by an explicit TRIM command, or by a truncate of a file, we will record when that happened.

A: Then we know whether we need to send it to the remote system or not. We've seen this give a drastic reduction in the number of holes that need to be sent, which has a big performance impact, primarily on the receiving system, which now doesn't need to go examine all of those blocks to figure out: are they already a hole? Do I need to punch a hole here or not? And this is really the result of several improvements to ZFS send and receive with holes.

A: We first discovered that there were a bunch of almost pathological performance problems with receiving these holes, and this is exacerbated by the fact that there are so many of them. So first we fixed the kind of pathological performance problems that you could sometimes see when receiving a file, like a zvol, with a lot of holes; that's in OpenZFS today. And then we realized, well...
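To make the scenario concrete, here is a minimal sketch of the replication pattern being described, with made-up pool, dataset, and host names; the hole birth improvement means the incremental send no longer re-sends (and the receiver no longer re-examines) every hole in the sparse zvol:

```sh
# Periodic snapshot-based replication of a sparse zvol.
zfs snapshot tank/vol@monday
zfs send tank/vol@monday | ssh backuphost zfs receive tank/vol

# Later, send only the changes since @monday. Before hole birth
# times, every hole was included in this incremental stream; with
# them, holes whose birth time predates @monday are skipped.
zfs snapshot tank/vol@tuesday
zfs send -i @monday tank/vol@tuesday | ssh backuphost zfs receive tank/vol
```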
A: Good. Any questions about that, or other questions about anything OpenZFS?

A: No? All right then, I'm going to start picking on you guys and asking you some questions. So Luke, I know you guys use ZFS at Hybrid Cluster, and you're using send and receive to send the data between the different nodes in the cluster.

A: How are you doing that? In other words, what are you using for a network transport? And have you noticed any kind of performance issues with that, or have you needed to do anything in particular to improve the performance?

B: Yeah, sure. So we've got a sort of group messaging protocol, which is built out of Twisted, a Python networking framework, that coordinates all of the nodes, but the actual zfs send and receive happens over an SSH transport. The group messaging code just sets up these SSH connections.

B: It actually uses a FIFO on the local machine in order to make one node start pumping data into another node, and then zfs receive gets initiated on the other node to start receiving it. Some fairly recent changes in FreeBSD improved the performance of SSH transfer over wide area networks, which has helped us, and we also use mbuffer in the pipe.

B: On both sides, and that's really useful, because mbuffer then reports to us what the speed of the transfer is, and we use that to detect stalled transfers and things like that.

A: mbuffer, as I understand it, is a utility that basically receives the stream into some fixed-size buffer and then sends the stream on. So it's basically just a very simple producer-consumer kind of thing, yeah?

B: Rather than getting stalled on the receive process needing to find some free space, which would stall the sender if the pipe size was very small, mbuffer allows you to smooth out that send performance over the network. mbuffer is a threaded C program that was in ports; I think it originated on Solaris and was intended for tape archives or something, but it does the job.

A: Yeah, I've seen that problem as well, and the fundamental problem there is that with zfs send the kernel is producing data and sending it into this pipe, but the pipes in the kernel have very small buffers, like dozens of kilobytes. So you end up with essentially one thread which is producing the data: it gets the data from disk, generates the send stream, and sends it into this pipe. Then, if it isn't being read extremely quickly by the network, it has to wait for those bytes to be sent over the network before it can get the next batch of data from the disk. So you end up either reading from the disk or sending over the network, and what you really want is to be doing both of them at once, which mbuffer achieves.
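A hedged sketch of the kind of pipeline Luke describes, with made-up host and dataset names and buffer sizes (mbuffer's -s and -m options set its block size and total buffer size):

```sh
# Buffer on both sides so that reading from disk and writing to the
# network can proceed concurrently; mbuffer also reports the transfer
# rate, which can be watched to detect stalled transfers.
zfs send tank/data@snap \
  | mbuffer -s 128k -m 1G \
  | ssh node2 'mbuffer -s 128k -m 1G | zfs receive tank/data'
```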
A: So I think this would be a pretty small project, to implement that buffering in ZFS itself. If anyone is going to be at the OpenZFS Developer Summit and hackathon, I think this would be a great hackathon project. It's probably doable in a day, or at least you could get it started in a day.

A: So we had a couple of other questions over on IRC. One was from Josh Simon, asking if there's any version of OpenZFS that supports NFS over RDMA. When I was at Sun I remember them working on this, but I don't know if it ever got integrated. George?

D: I think it actually did get integrated. I don't know if it was complete, but I think it should be in there; illumos has RDMA access.

A: Cool. Do you know, out of curiosity, is there also such a thing as iSCSI over RDMA?

A: Josh says yes, it's called iSER.

A: Cool. All right, so it sounds like that is in illumos; I don't know about FreeBSD or Linux.

A: All right, cool. And there's another question from someone on IRC, odk, asking: do we plan to implement the shareiscsi property? He recently checked the latest build of SmartOS and it was not there.
C: This is to actually do, like, zfs share...?

D: I don't know, because the current iSCSI share functionality, I think, is for the old iscsitgtd, which is the old implementation, not the COMSTAR implementation. So I don't think that anybody has gone in to actually look at what it would take to rip that out, because that's all pretty much dead as far as I know, and then re-implement it using the COMSTAR iSCSI kernel stuff.
A: Gotcha. So it seems like this would be specific to whatever platform you're running on. We're talking here about doing it on illumos. I imagine the way that it would work, hopefully, is that we would be able to do something where you could set shareiscsi=on on any platform, and it would hook up with that platform's specific way of sharing stuff over iSCSI. But that mechanism would probably be different on each platform, so it would need to be implemented separately.
A: Yeah, so Eric Sproul was also commenting the same thing that you said: that shareiscsi was removed when they switched to COMSTAR, which is the new iSCSI and Fibre Channel et cetera sharing mechanism that's in the illumos kernel.
A: So there's a question on IRC from Prakash asking how difficult it would be to support collapsing of fat ZAPs. In other words, when you have a ZAP object that has a lot of entries and then a lot of them are removed, it becomes really sparse, and it'd be nice to collapse the leaves into a more compact form.

A: So, interestingly enough, way back in the day, probably 10 years ago, ZAP objects would actually automatically shrink when the leaf blocks became sparse enough. This is a little bit tricky, especially in terms of the locking; I think back then I was trying to implement an even finer-grained locking strategy than we have today.

A: If you want to do it, I think it would be pretty doable, and probably easier to do today, with the infrastructure that we have now, than it was back when I removed that functionality a long time ago. The main tricky thing would be integrating with the locking of the ZAP. To give you an example, think about what happens when we add an entry to the ZAP.

A: If the leaf block is already mostly full, then we need to split the leaf block: take that one block, create another block to move half of its entries to, and then change the pointer table that points to it. Hopefully there are two entries in that pointer table pointing to this leaf block, and we change it so that one still points to the leaf block and the other points to the new block that we've created.

A: In terms of the locking, the fast path is that we take the lock on the entire ZAP object just as reader (it's a reader-writer lock), and then we lock the specific leaf block exclusively.

A: Now, if we discover that we need to split the leaf, then we need to change that pointer table, so we need an exclusive lock on the whole ZAP object; upgrading that lock and retrying is kind of the tricky part. We'd need to do something similar when shrinking: when you remove an entry, go and look at that leaf block. If it's only, say, 10% full, then look at its sibling and see whether this block and its sibling could be collapsed into one block that would still not be too full. Then I need to upgrade the lock on the entire ZAP object to writer, so that somebody else doesn't try to do the same thing at the same time. But I think that would be pretty doable.

A: So, on to larger block sizes. Currently the maximum block size in ZFS is 128k, and we might want to have larger block sizes, up to say one megabyte, four megabytes, maybe even bigger. The main motivation for this is to increase performance.

A: It would probably increase performance a tiny bit on mirrors or stripes, but there'd probably be a bigger performance improvement on RAID-Z. Currently, with a 128k block, you take it and split it up into a bunch of chunks, each chunk going to one of the devices in your RAID-Z group.

A: And so there are both space-efficiency issues there and performance issues, for example if you're doing a resilver, where we need to read every block of data off the disk.

A: If we have larger blocks, then we'll be able to read more contiguously from the disks, because resilver tends to be a very random-I/O type of workload, so having a smaller number of operations to do, because we have larger blocks, would potentially increase performance there a lot. The issues with actually doing this are mainly that, given the current infrastructure, every block, say a 128k block, is a single contiguous buffer.

A: I know I'm rambling a little bit here, but the performance issue is that doing a large allocation requires a lot of contiguous virtual memory in the kernel's address space, and the performance of that depends a lot on the kernel memory allocator, which has varying qualities across different platforms.

A: I know that on Linux, for example, they already have some performance problems as a result of having to allocate just 128k contiguously, so the Linux guys are looking at using the page allocator directly and basically creating a scatter-gather list, a list of pages, rather than a contiguous in-memory buffer.

A: It would be a list of particular pages of memory, and then, whenever we needed to copy in or out of that, the scatter-gather code would deal with accessing each page separately. That would enable some more scalability there, at least on the Linux side. So those are the kinds of issues that we need to address.
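For context, the block size under discussion is the per-dataset recordsize property; a quick illustration with a made-up dataset name (at the time of this recording 128k was the ceiling, and values above it assume the later large_blocks pool feature):

```sh
# Check the current block size ceiling for a dataset.
zfs get recordsize tank/fs
# NAME     PROPERTY    VALUE  SOURCE
# tank/fs  recordsize  128K   default

# On pools with the (post-2013) large_blocks feature enabled,
# larger records can be configured per dataset:
zfs set recordsize=1M tank/fs
```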
B: Yeah, I might as well ask it for the record, actually. So Andre has been working on merging two change sets from illumos into FreeBSD:

B: the write smoothing change that you described at EuroBSDcon, and also the metaslab histogram changes. I was just wondering how those are working out for you in production. Have you got them in production yet, and if so, what are the improvements, in particular with fragmented pools? How's that working out? And is there more work to come on that, or is that sort of the body of it?
A: First I'll talk about the write smoothing code, and then I'll ask George to answer the question about the histogram and the future of that. We're using both of those features internally at Delphix in production today. I don't think we have shipped that code to our customers yet.

C: It's actually at customers now.

A: Okay, so it is at a few customers. In terms of our internal use, we use it on a server that serves up virtual machine images for us to use for testing. That gets a lot of load, so it should shake out a lot of performance issues. We've seen basically no problems with the write smoothing code; it's been working very well for us in production so far, and I think that's basically done.

A: I don't think there's really more work that we have planned in that area right now. George, do you want to talk about the histogram and the fragmented-pool performance stuff?

D: Sure. So yeah, that's also going out to our customers; any of our new customers are actually deploying those changes.

D: What's out there today is kind of a fact-finding mission as well as some performance improvements. The histogram is meant to give us a good idea of how the space in a metaslab is actually composed.

D: There are some changes that I'm putting together that will take that same histogram and bubble it up, so that you'll be able to see a bigger view on a vdev and also on your entire pool. You'll be able to get an idea, across devices and across your entire pool, of that same kind of histogram: how your space is allocated, and how your free space actually exists.

D: The other changes that are part of that do some preloading of metaslabs, which we've seen has actually been beneficial at customers where there is a lot of fragmentation. It's tunable today; I think it defaults to three per device. At the end of every transaction group sync, when the spa sync completes, we asynchronously go and load the next three best metaslabs for that specific device.

D: The things we're working on now take some of the information that we're getting from the histogram: we're building up a fragmentation metric which we're going to expose to the user, so you'll be able to see how fragmented your devices are. And then we're adding additional logic so the code will actually select metaslabs that are, quote-unquote, better than other metaslabs, and we're also using that to determine how we select devices to allocate from.

D: The three main components of allocation have always been: select a device, select the metaslab, select a block. We have effectively made changes, or will be making changes, across all three of those. For the way you actually select the block, there's a new allocator that's out there today but isn't deployed, so that hasn't really changed, but there's new code that went in as part of that metaslab change.

B: That leads to my next question: is there any way that it would be helpful for us to submit data, and how can we do that, for instances where we have pools that have become badly fragmented and where we're getting performance problems? What's the format for packaging that up and sending it to you?

D: zdb now has the ability to dump out not only the on-disk histogram; you can also get a more accurate in-core histogram.

D: If you look at zdb with the -m option, m as in Mary, that will give you that information. Now, if your pool is live and heavily fragmented, it can be very difficult to get that information, because the pool is changing. So one of the next things that I'm doing, as part of my next change, is to actually be able to pull this out on illumos via mdb, and presumably on the other platforms you would also be able to pull it out from the kernel and dump it out that way. I recognize that zdb is great when the pool is pretty static, and when it's not, it's really hard to get some of that information out.
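A hedged example of pulling those space maps out with zdb (pool name made up; the exact output format varies by release):

```sh
# Dump each metaslab's space map; with histogram support this
# includes a free-space histogram per metaslab.
zdb -m tank

# Repeating the flag asks zdb to load the space maps and report
# more detailed (in-core) information.
zdb -mm tank
```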
B: Sure. One of the things that is helpful for us is that with Hybrid Cluster we can live-migrate all the applications off a node in the cluster and then use that tool to extract the data once the pool is quiescent, so that can help immediately. That's great. It would also be really awesome to have some instructions on the wiki, perhaps, on how to get that data, so you can follow along. That'd be awesome, thank you.

A: George, related to that: once your pool has already become fragmented, how much does the workload affect the kind of performance that you're going to see ongoing? In other words, when we gather the fragmentation information,

A: do we also need to know what the workload is going to look like in the future, in terms of the size of blocks that need to be allocated, in order to evaluate or predict what the performance will be like?
D: I know we've talked about trying to take some of the stuff that we did with the write throttle, and the knowledge we have ahead of time, to give an indication of what the allocator should do. So I think that's probably the next step: trying to get some guidance, based on the workload coming in, to affect how the allocator behaves when it's making allocation decisions. But it is definitely susceptible; different workloads are going to cause different issues depending on the fragmentation.

A: Yeah. One of the things that we have tried to avoid in ZFS so far is assuming that there's going to be some period where there's less load on the machine, and then going around and doing some activity during that time, like, for example, scrubbing. We've avoided doing that so far, but I know we've thought a little bit along those lines for the allocator stuff, in terms of asking what really is the best allocation algorithm. Because maybe I'm allocating a lot of small blocks, and I have a bunch of small holes and some big holes.

A: Should I just always be filling the biggest hole, getting the best performance, or should I somehow figure out that right now the load is really small, so now is a good time for me to be filling in all these little holes, even though the I/O will be more random and it'll take longer? It's okay that it takes longer, because the load is lower, because it's quiet at night, right?

A: Yeah, so it's very tricky to figure out what is a light workload and what is a heavy workload, and when I should be doing which. So there are some questions on IRC, a little discussion going on there, that I think I'll recap and address a bit for people on the video.
A: I think Prakash was asking about the Linux port, and whether there's anything we can do to help them keep more up to date on the changes that are happening on the other platforms, to speed up the process of Linux tracking the upstream. Well, I guess first the question would be: what are the problems facing people who are trying to pull changes into Linux, say from illumos, today?

A: One of the problems is that a lot of those changes don't apply cleanly, meaning that the diffs don't just apply in an automated fashion. An engineer has to go and look at a merge conflict and figure out which lines of code to take from which side, or make minor modifications, specific to Linux, to the code that's being brought in.

A: That problem is kind of mechanical. Another problem is being confident that the changes you've applied actually work and don't break anything. Hopefully we can trust that the changes work on illumos, but that doesn't necessarily mean they will work properly on Linux. One of the things brought up was the write smoothing, the write throttle, patch: something that's very performance-dependent like that is difficult to verify.

A: Regression tests can be used to get a lot more confidence around a set of changes, so that once the changes have been applied, you run the regression tests and at least have some confidence that they haven't broken anything too badly.

A: So I think one thing we need to work on is getting the tests that we have, which can be run on illumos and on FreeBSD, to also be able to run on Linux. John Kennedy just recently finished porting all of the ZFS test suite from the STF test framework to the new test-runner framework, which is much simpler and should be much easier to port to other platforms.

A: The other aspect is patches applying cleanly.
A: What we're working on is creating a common ZFS code repository. The idea would be that, rather than pulling and pushing changes between each of the different platforms, from illumos to Linux, from FreeBSD to illumos, from illumos to FreeBSD, we would have a common code repository that would be independent of all platforms. The goal would be that every platform would be able to pull the code from that repository directly, without any platform-specific differences to the files that are part of it. There are several challenges in accomplishing that. We need to be able to test the code that's in this repository on any platform, and be confident that it's actually going to work on every platform.

A: The way we would go about doing that is creating a userland framework, so that we can compile all of this code into userland and test it in userland on every platform. The idea would be that we create a definition of the ZFS kernel interfaces: functions that the kernel on every platform would need to provide. Then there would be a compatibility layer on every platform, including illumos, that would translate from those ZFS kernel APIs to that specific platform's way of implementing them, and we would have another kernel compatibility layer for running this code in userland. So that code would be able to run on any platform. What I'm working on right now is making more of the code able to be tested in userland.

A: Right now we have ztest, which tests mainly a bunch of SPA and DMU code in userland, and it's kind of a stress test. What I'd like to be able to do is run the full ZFS test suite against the userland implementation of ZFS. This would allow us to test libzfs, the zfs command-line tools, zfs send and receive, basically most of ZFS except for the ZPL. The ZPL is the POSIX layer.

A: That's the part that interfaces with each platform's VFS, the virtual file system layer, which tends to have a lot of differences between the different platforms, so it would not be a candidate for initial inclusion. But I think we could get to a point where the vast majority of the ZFS code can be compiled into userland, tested in userland, and then be taken verbatim to all the different platforms with confidence.

A: That will both increase test coverage of the code and make it mechanically easier to pull those changes in. Obviously the work to do there is both the kind of infrastructure that I'm working on right now, to make that code able to be compiled into userland, which is largely around creating an ioctl shim layer so that the zfs command-line tool, rather than talking to the kernel to do its tasks, can talk to this userland ZFS daemon,
A: which is the userland implementation. But the trickier, really more invasive, part of the work will be deciding what these ZFS kernel APIs should be, and then changing the code to use that abstraction layer. For example, we were just having a discussion on the mailing list yesterday about the use of, I think it was, cv_timedwait_sig_hires, or some combination of those words with underscores between them, which is an interface on illumos that allows a thread to go to sleep for a certain amount of time with a specified resolution. This is used as part of the write throttle code, and there's no exactly corresponding interface on Linux. So this is an example of something where what we should do is create a ZFS kernel interface that says: put this thread to sleep for this amount of time. On illumos it could be implemented using this cv_timedwait_hires mechanism, and on other platforms it could be implemented using whatever the platform-specific mechanism is.

A: Now, differences in how that's implemented would potentially have both correctness and performance implications. Obviously the routines need to actually do what they're expected to do, so we would need to document them very well, in terms of all the different error cases, side effects, things like that. For example, there's a question about what it means if the time we want to wake up at is in the past, or if we're sleeping for a negative amount of time; we'd need to define all those sorts of things. And there are also performance implications. If the amount of time that we can sleep has different resolutions on different platforms, then we would need to make sure that, on a platform with lower resolution, meaning you can only sleep in big granularities, say one-millisecond chunks at a time, the write throttle code still behaves the way we expect even at those coarser resolutions. That I actually tested when I was doing this on illumos, because you can just pass in a parameter for what the resolution is. But things like that are what we would need to keep an eye out for in these common interfaces.
A: I know I kind of rambled on there a bit; I hope that answers your question, Prakash. Ultimately, I think there's a bunch of coordination work to be done, and a bunch of just not very glamorous hammering-out of interfaces to make the code common, plus porting of test suites.
A: So, bp rewrite. This is a project that I was working on at Sun, and the idea was very all-encompassing. The idea was that we would be able to take any block on disk and manipulate it in whatever way we need to: allocate it somewhere else, change the compression, change the checksum, dedupe it or un-dedupe it, and then keep track of that change.

A: It's called bp rewrite because, in order to do this, we need to change the block pointer to point to some new block. You can imagine the straightforward implementation that you might do on, say, UFS, which would simply be to traverse all of the blocks: you look at each block, and if I want to reallocate it, I put it somewhere else, shove the new pointer in there, done. The tricky thing about doing this on ZFS comes in two parts. One is that there can be many pointers to a single block.

A: This is because of snapshots and clones: you can have one block on disk which is pointed to by, say, ten different clones. The reason we implemented ZFS that way is so that the performance of snapshots and clones is very good.

A: There's no difference in the performance of accessing a clone versus accessing the main file system; they're basically identical. There's just an administrative control which says which one is the file system and which one is the clone, and you can change that with the zfs promote command.
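For reference, a small sketch of the promote operation just mentioned, with made-up dataset names:

```sh
# Clone a snapshot; the clone shares all its blocks with the origin.
zfs snapshot tank/fs@snap
zfs clone tank/fs@snap tank/fs-clone

# Swap the administrative roles: the clone becomes the "real"
# file system and tank/fs becomes a dependent of it.
zfs promote tank/fs-clone
```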
A: So that's why we did it that way, but it creates this problem: if you have a bunch of instances of a block pointer, then when I traverse all the block pointers, I'm going to visit that old block pointer several times. So we need to remember that we changed from this old block pointer to this new block pointer. In other words, I moved this particular block from place A to place B, so that if I see another pointer to place A, I know to change it to place B.

A: This creates a performance problem, because you end up having to keep a giant hash table which maps from the old location to the new location. If you're familiar with dedup, then you're aware that it also involves a giant hash table, mapping from a block's checksum to the location on disk where it's stored, plus the refcount. And if you've ever used dedup in practice on very large data sets, then you're probably aware that the performance of that is not very great.
A: The second part of the trickiness is space accounting: the space used by a block is counted in a bunch of different places, which all have to be updated if the block changes size. For example, it's counted in the dnode, so that each file knows how much space it's using. That's used if you do ls -s, which tells you the number of sectors used, or df, which counts up the amount of space used by all the files. It's also used by user accounting, so if you have a quota on the files owned by a particular user or group, it's accounted for in both of those places.

A: You can access that with the zfs userspace commands; they'll tell you how much space each user is using, and you can set quotas on them.
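A hedged example of the per-user accounting commands mentioned here, with made-up names:

```sh
# Show how much space each user's files consume on a file system.
zfs userspace tank/home

# Limit the space that files owned by one user may consume.
zfs set userquota@alice=10G tank/home
```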
A: It's also accounted in the DSL layer, and in a bunch of different places. When you do zfs list, it tells you how much space is used by each file system, and the space used there impacts all the snapshots and then all the parent file systems, because the space used is inherited up the tree.
A: So there are a bunch of different places where space accounting needs to happen, and making sure those all get updated accurately when a block changes size is very tricky. For all of those reasons, as I was working on this, I was very concerned that this would be the last feature ever implemented in ZFS, because, as most programmers know, magic does not layer well, and bp rewrite was definitely magic. It definitely broke a lot of the layering in ZFS; it needed to have code in several different layers that had intimate knowledge of how this all worked. And on top of all that, I should mention,

A: we wanted to be able to do this live. In other words, while you're doing arbitrary things to your pool, you'd be able to do this bp rewrite, which is pretty essential when it's going to take weeks to accomplish because of the performance issues.

A: So it's also kind of like changing your pants while you're running: you're trying to move everything to different places while you're also looking at everything, and maybe you're deleting snapshots and creating snapshots while you're also changing what those snapshots reference.

A: I don't think anyone is attempting a full-on, do-everything bp rewrite implementation.

A: I know some people have looked at it from different angles: what if we just did this, what if we ignored that, which I'll get to in a moment. For example, Derek here is asking: would an offline bp rewrite be accepted into the upstream code base? I think it would depend on how exactly it was implemented.

A: A separate utility, potentially something based on libzpool but not adding more code to it, I think would be great. My concern with having even an offline bp rewrite in the codebase is the kind of far-reaching implications that it can have on all the different layering. So I would very much welcome something like a separate utility which allows you to, offline, bp-rewrite your stuff.

A: The issue would be how that is maintained: how much of the common code it needs to use, and how deeply its fingers are stuck into that common code. Certainly, doing it offline simplifies a good chunk of it; it simplifies all the interactions with the ARC and the DMU. But you still have the issues of the performance of having a giant hash table, and needing to update the accounting at every layer. It's certainly doable, in terms of streaming through the pool.

A: I think it depends a lot on how much it affects the other layers. Now, getting to something I talked about a little: someone on IRC is asking, does performance suffer as the count of snapshots grows over time for a given data set? In terms of the data path, reading and writing files,

A: the number of snapshots has no impact on performance. Performance is not going to get any worse as you create thousands or hundreds of thousands of snapshots, in terms of reading and writing your files. Obviously, operations that iterate over snapshots, like zfs list, are going to get a lot slower.

A: Deleting snapshots is going to get a little bit slower as you have zillions of them, because snapshot deletion is roughly proportional to the typical number of snapshots in your file system. But that's usually not a huge concern even so, because ten thousand is not a big number for computers.

A: So, getting back to device removal and bp rewrite: one of the things that bp rewrite was supposed to solve, or would have been able to solve, is device removal. I have a bunch of devices in my storage pool, and I want to shrink the size of the pool by removing one of those devices. bp rewrite would have allowed us to say: great, we'll find all of the blocks that are on that device and just allocate them onto a different device.

A: This is a problem that can also be solved in other ways, and we've talked a little bit about potentially doing this at Delphix. The idea would be, rather than changing the block pointers, to implement it as a virtual vdev. vdev stands for virtual device, so an actually-virtual vdev, or perhaps more sensibly named an indirect vdev.

A: The most naive way of thinking about this would be to take that device and just copy it into a file that's stored on the pool: create a zvol and then copy the device onto the zvol.

A: This basically says: okay, you can no longer allocate from this particular device that I'm going to remove. Now I'm going to create a new zvol, and I'm going to use dd to copy all the data from the device I'm removing onto the zvol. That will end up writing the data to all the other devices in the pool, and then I can essentially treat that zvol as a replacement.

A: You know how you can do a zpool replace to replace one device with another: basically, what it does is put them into a mirror, mirror all the data over, and then remove the first device. You would be doing something like that, but rather than mirroring to another actual device, mirroring to a zvol which is itself stored in the same storage pool.
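A heavily hedged sketch of that naive thought experiment, with made-up device and pool names. This is only the conceptual illustration from the talk, not a supported procedure; in particular, backing a pool's own vdev with a zvol from the same pool is exactly the kind of circularity the real indirect-vdev design avoids:

```sh
# 1. Stop allocating from the device being removed (conceptual step).
# 2. Create a zvol the same size as that device.
zfs create -V 1T tank/evacuate
# 3. Copy the old device's contents onto the zvol; the writes land
#    on the pool's remaining devices.
dd if=/dev/dsk/c0t3d0 of=/dev/zvol/dsk/tank/evacuate bs=1M
# 4. Swap it in, the way zpool replace mirrors data over before
#    detaching the original device.
zpool replace tank c0t3d0 /dev/zvol/dsk/tank/evacuate
```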
A: The essential trade-off is that with the bp rewrite scheme, once bp rewrite is done, the data structures are all as they were before. In other words, you could actually do a bp rewrite and then use that pool on old software, and you would have exactly the same performance that you would have had if the data had been laid out that way originally. So basically, bp rewrite is a huge performance impact while you're doing it, and then once you're done, no performance impact.

A: The indirect-vdev method of device removal would have much less performance impact while it's in progress, but then there would be some performance impact on reading data from that indirect vdev, perhaps forever. Obviously, as the data on that vdev is deleted, less and less of the data would have this additional layer of indirection.

A: We also think that for typical workloads the indirection table could be kept in memory, due to some tricks with how we keep track of where the space is allocated. And there are some additional tricks we can play if you access that data from a file system, as opposed to from a snapshot: when you access it from the file system, we can change the block pointer that's in that file system at the time you access it.

A: In other words, say the file system has a block pointer that points to vdev number three, which is the one that I removed and which is now an indirect vdev. We see: oh, I'm reading that one; I go through the indirection table and find that it's actually stored on vdev seven at this other offset. Well, let me just put that vdev and offset into the block pointer in the snapshot,
A: sorry, in the file system. We cannot modify snapshots, which is why those will still have to go through the indirection layer. But the takeaway is this: device removal is doable in a much lighter-weight way, both in terms of performance and in terms of layering violations, or the lack thereof. With the exception of those little tricks that I mentioned, it would be implemented totally in the SPA.

A: Upper layers wouldn't need to know about it, and there would be some amount of performance impact from continuing to access that data, but we think it would be small enough that it would not be measurable, at least in most use cases.
D: Just as an aside to that: for those that have been looking at bp rewrite as a solution for your specific problem, it might be best to look at what that problem is you're trying to solve and see if there are other ways to solve it. I know fragmentation comes up frequently, and things like rebalancing storage.

A: Yeah. So, for example, with LUN rebalancing, or device rebalancing, we would start by looking at why you need to rebalance the LUNs. Probably the reason is that performance sucks, so the question is: how can we improve performance when you have imbalanced LUNs?

A: One way is to rebalance them, which could be done using a variant of the indirect-vdev method. Basically indirecting: if you have three devices and then we add another one, so the pool is partially full, we can say, let me chop off the end of each of these three devices and indirect just the last third of them onto this new vdev. But in some use cases, like, for example, what our Delphix customers do,

A: these are problems that are created by ZFS and can be solved by ZFS, and George, I know, has done some preliminary work on that so far. The idea there is that we've added a new device and it's not as full as the others. If we just wrote to that device and ignored the ones that are really full, then performance would be fine, because all of those devices are actually LUNs that are backed by the same disks, or whatever, in some abstract storage fabric, and the problems we see are mainly with allocation trying to allocate all these little bits.

D: It's probably not as well known a feature, primarily because it is kind of a manual process, but there is, in the upstream, a tunable that you can enable that says: allow ZFS to allocate from those devices until there's no more than this much free space.

D: So the idea is, let's say you had four devices, where the first ones had 15% free space and the other device had 90% free space. You could set the tunable at 15, and it would start forcing all the writes to go to the one that is mostly empty, until they all come up to the same level, and then they all start writing again.

D: There's been code in ZFS for a long time to try to do that heuristically, but it moves the needle at such a very, very slow rate, and what we needed for our customers was to actually be able to quickly move to the devices that were empty. So I'm happy to talk more about that particular feature if it's something that you think might be useful. We've been looking at it as something that we intend to eventually make automatic,

D: so that it isn't a tunable. You'll see that coming forward probably in six months or so; that's when you should expect it.
A: Cool. There are a couple of questions about the discussion that we were just having. Luke was asking: what if I remove one device and it becomes an indirect device, and then I remove yet another device, which may itself have indirect mappings on it? Yes, that would work recursively, and we thought through the problems related to that. I remember that we thought about it, and I think we decided that it just works.

A: Basically, you would go through one indirection layer and find: okay, this device is indirect and the data is actually stored in this zvol. Then we go into that zvol at that offset and find, oh, it's stored at a block pointer which is on device number Y. Then we go look and find that device Y is also indirect, and we just keep repeating the process.

A: In practice, that shouldn't be necessary. One of the tricks that I alluded to would basically mean that if you do something like add a new device and then remove an old device (they could be different sizes or whatever), then what we can do is allocate a monstrous contiguous region, and the size of the indirection table is proportional to the number of these monstrous contiguous regions that we have to allocate.

A: So we use that one entry, plus the space map that we already have for that device, to figure out where the new location is. Cool. There's another question about the performance of snapshots.

A: He's asking: even if I'm taking a snapshot of a zvol every hour, and I'm doing lots of heavy reads and writes to it, will there really not be any performance impact due to all the accounting that has to happen?

A: Basically, no. When you do a write to that zvol, it's potentially overwriting an old block, and the block that gets overwritten is logically free, but it could still be referenced by some snapshots.

A: If it's referenced by snapshots, then its block pointer is put onto the zvol's dead list. The dead list is a list of blocks that are no longer referenced but were referenced in the previous snapshot.

A: That is independent of the number of snapshots that there are. You could have a bajillion snapshots, and it's still just a matter of: whenever I free a block, I write it to the zvol's dead list. Now, I made some enhancements to the dead list code.

A: If you go and read my really, really old blog post from 2005, it describes this process of how snapshots are implemented, and if you read the details of that, you'll see that it doesn't matter how many snapshots there are; we only care whether the most recent snapshot references the block. Which was true until I made some enhancements to the dead-list management code.

A: The change that I made makes the dead list a composite of many lists, and the number of lists is proportional to the number of snapshots there are. So it is true that when you have more snapshots, then as you logically free blocks from that zvol, those block pointers will need to be written, and those writes will be spread out over several different objects which hold the dead lists.

A: So there are some second-order effects there in terms of performance: rather than just appending to one object, when I free ten thousand blocks, instead of appending those ten thousand block pointers to the end of one object, I'm splitting them up across, say, a thousand different lists, putting ten things onto the end of each of those thousand lists. So there's some scalability cost; you don't get as good consolidation of the indirect blocks and things like that, so there's a little bit of impact there. I haven't measured it, but if you do see issues, I would definitely be curious to hear about it.
A: Yeah. So Richard Laager was mentioning that when you remove a bunch of different devices and have this chained indirection, it would be nice to eliminate that recursion. That would be eliminated by the trick I mentioned, of changing the actual block pointer in the file system.

A: When you read from the file system and we discover that the block was indirected, we'll go through all the layers of indirection to find the actual concrete location on disk, so that the next time you go and read that block from that file system, it will go directly to the location on disk.
A: There's a question about whether there is a ZFS internals book in production. Not that I know of. I think we're starting to try to document more of the ZFS internals, from an implementation point of view, on the OpenZFS website. Max has been doing a lot of that work; he actually just wrote up an article about dnode sync, which talks a little bit about freeing.

A: For example, when you truncate a file, or punch a hole in a file, we need to keep track of that and of how it gets written out, which is some stuff that he ran into while implementing the hole birth times for send and receive. Which kind of brings us back full circle to the beginning of this discussion.
A: Eric, George, do you know what the name of that tunable is? Oh, here, George has posted it. Cool. So I think we're kind of getting to the end of the questions; think about whether you have any more. And also, there's going to be a trivia question. Whoever answers the trivia question correctly, or whoever gets closest to the answer, is going to receive an OpenZFS t-shirt, which I will send to you through the mail.

A: So far, no one has gotten a t-shirt through the mail; a few people have requested one and I've been too lazy to actually send them out. Only people who don't have a t-shirt are qualified to enter this contest. But this is your opportunity to get a t-shirt without having to go to a conference that I'm speaking at.
A: Okay, all right, no questions. All right, so, the trivia question. You have to answer it quickly; you cannot go look this up. If I think that you have gone and looked it up, then you're disqualified.

A: The answer is in the form of a number, so you'll need to type in the number, and whoever gets closest to the actual number is going to receive a t-shirt in your choice of size, shipping included, to the United States. If it's outside the United States, then we'll have to work something out. All right.

A: So the question: I guess I should type this into IRC, because the people who are not actually in the hangout have a little bit of lag. All right.
A: So I just posted the question. In particular, I'm talking about the ZFS code in the .c and .h files that are part of the normal user-facing part of ZFS: the kernel code, libzfs, libzfs_core, the zfs command, the zpool command.
A: I'm going to give you guys a few more minutes, but if anybody looks it up, then I'm going to ridicule you. Jones says 440,000. Luke would have guessed thirty thousand, but he already has a t-shirt.

A: Anyone else want to guess? I know that most of you have not actually looked at the ZFS code, so you're definitely at a disadvantage. I was hoping that maybe some of the guys who have ported it to other platforms would have shown up, because I bet they would have a pretty good idea; they've kind of looked at all of it. All right, nobody else is going to guess.
A: So, this is what I measured on illumos; it was very similar on FreeBSD.

A: No, it's not 500,000, brewer. I meant to go and look up how many lines of code there are in some other file systems, but I didn't have time. But I know, George, when we were back at Sun, this is maybe 2007 or 2008, we actually broke 110,000 lines, and that was including all the code that we wrote.
A: So if you include kind of all the code that the ZFS team at Sun wrote, including ztest and zdb and stuff like that, it's now about 173,000. There was a time when we broke a hundred thousand, and I remember Jeff being really sad that our code had gotten so bloated that it was a hundred thousand lines of code.

A: You didn't guess that we would have broken that in the past.

A: Yeah, so, if I remember correctly, UFS was bigger, more than 110,000, at some point. Is that not right?

D: Yeah, I think that's correct. I think it was larger. I think there were even some device drivers that were larger.

D: I was trying to find it; I seem to recall that there was a thing that compared them.
A
And
just
the
dusty
files
in
zfs
is
is
ninety
nine
thousand
six
hundred.
So
if
you
just
compare
the
kernel
stuff,
then
we're
still
quite
a
bit
bigger
we're
more
than
double
the
size
of
of
ufs,
but
I
would
contend
much
more
than
double
the
functionality.
A: Well, I think what Jeff was comparing before was UFS plus SVM, the Solaris Volume Manager, which I'm sure is ridiculously huge.
A: Cool. All right, thanks, guys, for coming. We haven't determined when the next office hours will be, but I am proposing that it happen on October 31st, because that's always been kind of an auspicious date for ZFS; we've had a bunch of big milestones around that date. On October 31st, 2001,

A: we had the first tiny, tiny prototype, and then we open-sourced the code and integrated it into the Solaris kernel on October 31st, 2005. And I think some other big features also landed around October 31st, on Halloween.

A: It's also just a few days from my birthday. So hopefully in a few weeks we will get another ZFS expert to hold office hours. Thanks, George, for fielding like half of the questions that came at me today; hopefully you will have your own office hours at some point in the future.

A: So thanks a lot. And zsh, email me your mailing address and shirt size, and I'll send you a t-shirt. Thanks, guys.