From YouTube: Adding Logical Quotas to ZFS by Sanjeev Bagewadi
Description
From the 2021 OpenZFS Developer Summit
slides: https://docs.google.com/presentation/d/1o2bhd5eKNoJhzqyj9GjA7lsZPhATL6NxozvZ5XCOjUo
Details: https://openzfs.org/wiki/OpenZFS_Developer_Summit_2021
A: So earlier we were not using compression; we had not enabled ZFS compression because, for us, compression was done at the back end on the Nutanix AOS platform. But at some point we figured it might be better to enable compression in ZFS itself rather than doing it on the back end, for various reasons.
A: As part of that work, one of the requirements that came up was that quotas need to be applied on the actual size, not on the compressed size. If you look at the current default implementation in ZFS, quotas are applied on the compressed size: when compression is enabled, quotas are applied on the actual blocks written to disk, not on the size prior to compression.
A: So that came in as a requirement, and we were trying to figure out how to implement it. Before we jump into the actual implementation and the changes we made, let's take a look at what default ZFS has. Each dnode keeps track of the blocks consumed by that dnode in dn_used; if you look at dnode_phys, it is essentially keeping track of that in dn_used, and also in the block pointers.
A: The other thing is that the user and group consumptions are maintained in the special dnodes that we have, which are -1 and -2, where we keep track of the blocks consumed by each user. It's an indexed access; it's essentially a ZAP where we index based on the UIDs and GIDs (not exactly them, but that gives you an idea of what's happening there). And then there is the dataset consumption.
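The per-user and per-group accounting described above can be sketched roughly like this. This is a hypothetical Python model, not the actual C ZAP code; the constant names are illustrative:

```python
# Hypothetical sketch of ZFS userused/groupused accounting (illustrative,
# not the real C implementation): two "special objects" hold ZAPs mapping
# an id-derived key to bytes consumed, one for users and one for groups.
USERUSED_OBJECT = -1   # special object for per-user consumption
GROUPUSED_OBJECT = -2  # special object for per-group consumption

class SpaceAccounting:
    def __init__(self):
        # Each special object is modeled as a dict (standing in for a ZAP).
        self.objects = {USERUSED_OBJECT: {}, GROUPUSED_OBJECT: {}}

    def charge(self, uid, gid, nbytes):
        """Charge nbytes of on-disk consumption to a user and its group."""
        user_zap = self.objects[USERUSED_OBJECT]
        group_zap = self.objects[GROUPUSED_OBJECT]
        user_zap[uid] = user_zap.get(uid, 0) + nbytes
        group_zap[gid] = group_zap.get(gid, 0) + nbytes

    def userused(self, uid):
        return self.objects[USERUSED_OBJECT].get(uid, 0)

acct = SpaceAccounting()
acct.charge(uid=1000, gid=100, nbytes=8192)
acct.charge(uid=1000, gid=100, nbytes=4096)
print(acct.userused(1000))  # 12288
```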
A: For dataset consumption I think we are fairly covered, because at the dataset level we keep track of both the compressed and uncompressed counts; in fact ZFS uses these for the compression-ratio computation. The space accounting itself is updated as part of sync by two routines, dnode_diduse_space and do_userquota_cacheflush; those are the routines responsible for doing the updates there.
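The compression-ratio computation mentioned above reduces to a simple division of the two dataset-level counts. A minimal sketch (the function name is illustrative, not a ZFS identifier):

```python
# Hypothetical sketch: ZFS keeps both compressed and uncompressed byte
# counts at the dataset level and derives the compression ratio from them.
def compressratio(uncompressed_bytes, compressed_bytes):
    # With nothing written (or nothing compressed away), report 1.00x.
    if compressed_bytes == 0:
        return 1.0
    # Reported as a multiplier, e.g. 2.50 means 2.5:1 compression.
    return uncompressed_bytes / compressed_bytes

print(round(compressratio(1_000_000, 400_000), 2))  # 2.5
```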
A: So the changes that we needed were: one, as we saw earlier, the dnode needs some field to keep track of what its logical size is; and two, we also needed to keep track of the logical consumption for the UIDs and GIDs. Those are the two changes we needed. For the dnode, what we chose was the dn_pad3 field in dnode_phys.
A: We picked that field to maintain the logical size and, drawing parallels from dn_used, we call it dn_lused, the logical used of that dnode. That's a new field we added there. As for the user and group accounting: the userused/groupused was essentially one value that we maintained earlier, right? It was just one value that told us how many blocks, or how many bytes, each of those users consumed.
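The dnode-level part of the change can be sketched as follows. This is an illustrative Python model of the idea only; the real change adds a field to the on-disk dnode_phys structure in C:

```python
# Hypothetical sketch of the change described above: alongside the existing
# dn_used (physical, post-compression bytes), the dnode gains a dn_lused
# field tracking the logical (pre-compression) size.
from dataclasses import dataclass

@dataclass
class Dnode:
    dn_used: int = 0    # physical bytes actually allocated on disk
    dn_lused: int = 0   # logical bytes, before compression (new field)

    def account_write(self, lsize, psize):
        """Account a block write of logical size lsize stored as psize."""
        self.dn_lused += lsize
        self.dn_used += psize

dn = Dnode()
dn.account_write(lsize=131072, psize=32768)  # 128K block compressed to 32K
print(dn.dn_used, dn.dn_lused)  # 32768 131072
```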
A: That ZAP, which could have been a microzap if the number of users is limited, definitely switches to a fatzap in this case. But for all practical uses you may not really have a microzap there in a real-world case anyway: the number of users could be much larger, and hence it would already be a fatzap. So this was not a problem, and that was one of the reasons we decided to go down this path.
A: What we did in addition to these two things is added a dataset property, which is just a binary switch you flip on or off, called logicalquota.
A: If logicalquota is on, all the enforcement code (when you are trying to do a dmu_tx_assign, that is where you check against the quotas) starts using the logical sizes, whereas if logicalquota is off, it falls back to the current implementation that we have, which applies quotas on the physical block allocations.
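The enforcement switch described above can be sketched like this. A hypothetical model, assuming a single quota value and the two consumption counters; in the real code the check happens at dmu_tx_assign time:

```python
# Hypothetical sketch of the enforcement switch: when the dataset's
# logicalquota property is on, the quota check compares logical
# consumption against the quota; otherwise it compares physical
# (post-compression) consumption, as stock ZFS does.
def quota_exceeded(quota, physical_used, logical_used, logicalquota_on):
    used = logical_used if logicalquota_on else physical_used
    return used > quota

# 1 GiB quota; 1.5 GiB of data compressed down to 0.75 GiB on disk.
GiB = 1 << 30
print(quota_exceeded(GiB, physical_used=GiB * 3 // 4,
                     logical_used=GiB * 3 // 2, logicalquota_on=False))  # False
print(quota_exceeded(GiB, physical_used=GiB * 3 // 4,
                     logical_used=GiB * 3 // 2, logicalquota_on=True))   # True
```

The same data therefore passes the quota check under physical accounting but fails it under logical accounting, which is exactly the behavior change the property controls.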
A: So those were the changes that we did. A couple of other things that we modified are in the VFS, or in the vnode ops, where the back end for stat, which is zfs_getattr: depending on the dataset's logicalquota value, we switch to reporting dn_lused instead of dn_used.
A: I also made a small change in zdb so that it reports both sizes, including the dn_lused of the dnode. So those are the changes that we made.
A: Some challenges we had: we were not able to convert existing datasets. The reason I bring up compression is that ZFS does, of course, support enabling or disabling compression on the fly, but for our purposes logicalquota cannot be enabled on the fly for an existing dataset. So we switched to using it only on new datasets; that was one change, or rather, that was a challenge.
A: There were a couple of other things. We had an interesting issue where the sector size that we support was 4K, and if you had a smaller block, say 512 bytes, you'd still end up allocating a 4K sector for it. We had to fix that problem, because if you just went with the lsize for all the computation, things go wonky when you have a large number of small files. So we had to make it actually a max of lsize and asize.
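The small-file fix described above amounts to never charging less than what was actually allocated. A minimal sketch, assuming a fixed 4K minimum allocation (the function names are illustrative):

```python
# Hypothetical sketch of the small-block fix: with a 4K minimum sector,
# a 512-byte file still allocates a full 4K sector, so charging the raw
# lsize undercounts badly for many small files. The fix charges
# max(lsize, asize) instead.
SECTOR = 4096  # 4K minimum allocation size in this deployment

def allocated_size(lsize):
    """Round the logical size up to whole 4K sectors (at least one)."""
    return max(1, -(-lsize // SECTOR)) * SECTOR

def charged_size(lsize):
    return max(lsize, allocated_size(lsize))

print(charged_size(512))     # 4096: small file charged a full sector
print(charged_size(131072))  # 131072: large block charged its logical size
```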
A: So that's a fix, and I am yet to bring that into the port. So, at what stage is this work right now?
A: I haven't integrated this with the project quotas yet; I'm still working on that, trying to figure out how this should integrate with them. The changes I have put out on the link there, so you can go take a look at it. I'll be working on it for some time to make sure it integrates with the project quota, so I'll get that sorted out.
B: I think this is really cool. You know, I did all the quota stuff way back in the day, and it's cool to see it extended.
B: One question that I had was what your plans are, or whether you have plans, to change the approach. Right now you're talking about a logicalquota property that controls the behavior of all the other quotas, versus, I think we had talked offline about the possibility of keeping the behavior of the current quotas and adding new quotas that are logical.
B
One
gig
logical
refused
equals
whatever
logical
user
used
equals
whatever.
I
was
curious
if
you
had
thoughts
on
how
doable
that
is,
and
if.
A: Yes, it's definitely doable, because the way I would look at it for general consumption is that this brings in an additional field in the dnode, and now you have pretty much all the control: your block pointers tell you what the logical size is, the dnode now also tells you what the logical size is, and at the same time, for every user and group, we can now keep track of the logical consumption.
A: Once those fundamental counts are available, we could easily build logical quotas on top of them as individual properties. So it's definitely possible. For our use case today, we just wanted a simple switch to flip it, whereas it can easily be extended to have additional quotas applied. So it's doable.
B: Yeah, well, thanks. Alan, did you want to ask your question live?
D: Sure. I actually had a different question: do you ever run into problems with users having quotas and, if they're very near the quota, the performance being really bad? Like write performance.
A: Yes, yes, we do see that. In fact, it's a little tricky with our cases where, if you draw a correlation to what Jitendra was saying earlier about distributed shares, we have slightly more odd problems there, because the distributed share itself is actually scattered across datasets. So assimilating them sort of becomes a problem, and there are quota overruns that we see. But with respect to the usage:
A: Yes, we do see performance dropping, because every time you try to do a dmu_tx_assign, it sort of sees that you're close to the quota, and we overestimate as part of the assign.
D: I'm interested in whether other people are running into this problem, and I think the easiest way to tell is that there's a kstat under the dmu_tx that counts how many times the system stopped assigning because it thought the quota was full.
A: In fact, when we are computing the logical size that we are trying to assign, there are a couple of other tricky things that we have to deal with, such as: whom do you charge for the metadata or the indirect blocks, and how do you charge them?
A: How do we charge them, given that those are kept in duplicate? There's a compromise that we worked out: if it's not a data block, then you don't charge for the duplicates, and things like that.
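That compromise can be sketched as a simple charging rule. A hypothetical model, assuming metadata blocks are stored with extra ditto copies while data blocks are not (names are illustrative):

```python
# Hypothetical sketch of the compromise described above: data blocks are
# charged for what they occupy, while metadata / indirect blocks, which
# ZFS stores in multiple (ditto) copies, are charged only once rather
# than once per copy.
def charged(block_size, is_data, ditto_copies):
    if is_data:
        return block_size * ditto_copies
    # Metadata or indirect block: don't charge the extra ditto copies.
    return block_size

print(charged(131072, is_data=True, ditto_copies=1))   # 131072
print(charged(16384, is_data=False, ditto_copies=2))   # 16384, not 32768
```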
A: It's not directly in the context of your question, but yeah.
D: Maybe let Mark have his question first.
C: It's a simple question. You mentioned that logical quotas only work on new datasets, right? So you could have an environment where you had old and new datasets, where old datasets would have physical quotas and new datasets would have logical quotas. Is that correct?
A: That's true, Mark. In fact, we went a little further in our use case, but for general purpose, yes, you could still do that: on each dataset individually, you could turn on logical quotas.
A: Yes, because the property that I was talking about, logicalquota, is a dataset-level property.
D: My second question was about when you added the accounting for how much logical space has been used: does that extend to snapshots? Currently there is no logicalused property on a snapshot, but is there now, in your code?
D: Basically, where that unique value is, I think it's actually stored in the object as the field unique. When that's calculated, it'd be helpful to calculate the logical size of that data as well and store it, so that you can actually see the logical used on a snapshot instead of only the physically used.