From YouTube: Device Removal by Matt Ahrens
Description
From the OpenZFS Developer Summit 2018
Slides: https://docs.google.com/presentation/d/1u4gIGHJCbKAUxpFjU6VpUJaTX_pHAyT_HVpQJxUOFR8/edit?usp=sharing
A
So our next presentation is going to be about device removal, by Matt Ahrens, and probably most of you have already heard about this. It's a new feature which has been in the works for several years, and we've been using it at Delphix for several years as well. It made it to Linux relatively recently, I believe. So Matt is going to give a good deep dive into device removal and its internal functioning. So please welcome Matt.
B
Thanks, everyone. So yeah, as people mentioned, some of you might know what device removal is, but I'm going to start by explaining: what is the point of this? What I'm talking about here with device removal is, for example, in this case, we have a pool with three top-level vdevs, three mirrors; each one is a mirror of two disks. What I want to do is remove this whole mirror, reducing the total amount of space in the storage pool.
B
So this is in contrast to, for example, detach, where I have a mirror and I want to remove one side of the mirror. You've been able to do that since the beginning, but that doesn't actually reduce the total amount of space in your pool, and it's specific to mirrors. So what is the point of this? Why would you want to do this? Oh, sorry, before I continue on: I notice that this is not being rendered correctly.
B
All right; ah well, it renders differently there, fine. What is the point of this? Why would you want to do this? Well, when we were designing this feature at Delphix, back in 2012 maybe, one of the main use cases, which was coming from our field people as well, was: what if customers over-provision? They add too much storage to the storage pool, or they have a temporary project where they think, oh, I need a bunch more storage, just for the next month.
B
While we do this project, and then after that I want to, you know, remove that space and use it for something else: for a different storage pool, or something else entirely. Has anybody had this problem? No? Wow. Oh, one person has over-provisioned their pool, yeah. This turned out to not really be the real use case.
B
Another big use case is if you add the wrong disk, or add the vdev as the wrong type: like you meant to add it as a log device, but you added it as a regular device, and it's kind of stuck in there forever. Don't worry, I won't ask if anybody has experienced this use case. But the main use case that we've actually seen in practice is storage migration. So the idea here is: you have a storage pool.
B
Basically, you want to remove all of the disks and migrate the storage to all-new disks that might be a different size or a different number. So in this example, say I have ten 1-terabyte drives and I want to replace them with four 6-terabyte drives. There are some special cases where you could do this in a super hacky way before this project, but this makes it a really first-class operation. So, right: how do we do this?
B
My talk is about how we do device removal. All that we need to do is find all the allocated space in this storage pool, which is represented by these blue and purple squares, and then allocate new space for it on the remaining devices. In this case, I want to remove the device on the left here.
B
So, as I mentioned, we need to keep track of the mappings from the old locations to the new locations, and that uses memory. So in this case, we want to remove this device, c2t1d0. We run zpool remove with the -n flag for a no-op dry run, and it tells us: great, after you remove this, we think it'll use 37 megabytes of memory. Then you just run it without the -n and it kicks it off.
B
It runs in the background, and while it's running you can check the status; it shows up in zpool status, so it tells you, oh, I'm in the middle of removing this device. We also call it device evacuation, to be really explicit about the fact that, hey, something has to happen to remove this: it's not like you run zpool remove and then yank the device out. You run zpool remove, and then we evacuate the device by getting all of the allocated data off of it and onto the other devices.
B
If you want to cancel the removal, you can cancel it, and we just set everything back the way it was. You might want to do this if, say, you started a removal on Friday night, and then it's Monday morning and you realize: oh whoops, this is taking longer than I thought, people are going to be coming in to work and they need really good performance. So let's cancel it and rethink what we're doing here.
B
If you lose power or you reboot during the removal, it'll pick up where it left off; it remembers all of its progress. And you can do everything else while you're in the middle of the removal: you can add new devices, you can take snapshots, everything. Then, after you complete the removal, zpool status tells you about that as well, and it tells you how much memory is being used by the mapping.
B
Okay. So now, if you didn't buy my three-line explanation of how it works, this is how it really works, in detail, if you want to follow along later. The first thing I'm going to talk about is the removal process, and after that I'll talk about what we need to do, after removal has completed, to handle the new state of things. This is mostly happening in vdev_removal.c; there's a bunch of really huge comments in there explaining this stuff in more detail.
B
You can check that out later. So first I want to mention that we start by checking the removal type. You've actually always been able to remove devices from a pool, so this ioctl is not new; it's just that most removals that you might want to do would result in EINVAL. So first we check which type it is: you've always been able to remove inactive hot spares, remove cache devices, and remove log devices, but this talk is about removing top-level devices.
B
First of all, you need to have enough free space because, like I said, we need to allocate new locations for everything that's on that disk. And one little caveat that we ran into when doing some testing, accidental testing: we require that you have enough space plus a little bit. The little bit that we decided on is based on the slop space in the pool. The slop space is just three percent of the total pool size, but it's the total current pool size.
B
So what that means is that if you want to remove a device which, on its own, is more than 97% of the size of the pool, you can't do it, because of this check. We could relax that, but hopefully you don't hit it accidentally; hopefully you aren't testing and meaning to add a one-terabyte device but accidentally add a one-petabyte device. Yeah, that's how we discovered that.
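(As a rough illustration of that check, here is a minimal sketch in C; the names and the exact slop fraction are illustrative, not the actual OpenZFS code.)

    /*
     * Sketch of the free-space check: removal needs enough free space
     * on the *other* devices to hold everything allocated on the
     * removing vdev, plus the slop space.
     */
    #include <stdint.h>
    #include <stdbool.h>

    /* Slop is roughly 3% of the current total pool size. */
    static uint64_t
    slop_space(uint64_t pool_size)
    {
        return (pool_size * 3 / 100);
    }

    bool
    removal_space_ok(uint64_t pool_size, uint64_t pool_alloc,
        uint64_t vdev_size, uint64_t vdev_alloc)
    {
        uint64_t free_elsewhere =
            (pool_size - vdev_size) - (pool_alloc - vdev_alloc);

        /*
         * A vdev that is itself more than ~97% of the pool can never
         * pass: everything outside it is smaller than the slop.
         */
        return (free_elsewhere >= vdev_alloc + slop_space(pool_size));
    }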
B
Next, there can't be any known damage. This is just a kind of sanity check: if you're trying to remove the device and we know that the device is missing some data, we're not going to let you do that. And then, lastly, the blocks have to have the same on-disk layout, which means that all the devices have to have the same ashift; and, unfortunately, you cannot have any RAID-Z. I'll get to some possible future work there later, but for right now it works with plain disks and it works with mirrors.
B
It does not work with RAID-Z. Okay, cool. So now that we've decided we're actually doing this device removal, we start by disabling allocations to the device, so we won't have any new writes to that device while we're in the middle of the removal. And then, to hopefully make sure that we really, really don't have any writes to it, we do this spa_reset_logs(). Does anybody know what that means? What does this function do? No? Excellent. Oh, people know. So what spa_reset_logs() does is basically clear all the ZIL logs and then reallocate them. The reason we need to do this is that the ZIL is a singly linked list of blocks: we've always already allocated the next block that we're going to write to, but we don't know when we're going to need to write to it.
B
So we've already allocated it; it might be allocated on the device that we want to remove, and we don't want to have to handle, in the middle of the removal, a write to the device that we're trying to remove. So by resetting the logs, we make sure that they all get reallocated, and since we're doing it after we've disabled allocations to this device, they'll all get allocated on the remaining devices, not the one that we're trying to remove. And then we kick off a sync task.
B
A sync task is like a callback that runs from spa_sync while we're syncing out a txg, and that's going to initiate the removal. What that does is initialize the on-disk state that says we're in the middle of a removal and have made zero progress, and then it kicks off this new thread. This is pretty different from a lot of the other background operations. If you think about what kinds of background operations you might normally see in zpool status, like scrub and resilver...
B
Those do not work by creating another thread; they work by doing all of their work in syncing context. So when you're doing a scrub, spa_sync does all the normal writes, and then it's like, great, now there's some time reserved for doing scrub. And this is not really great for performance, because it means that you're basically taking some overhead out of your overall possible write throughput.
B
The MOS is the meta-object set, and that's where we store all the pool-wide metadata: for example, how much progress we've made on a removal, or what the mapping is between the old and the new locations. We can't actually modify that from this thread, so I'll get into how we do that later on.
B
Okay, so we kicked off this new thread. What does the thread need to do? First, we need to start by finding the allocated space to copy. The interesting thing here is that we're doing this by looking at the space maps, not at the block pointers. If you're familiar with zpool scrub or resilver, those go through and find all the block pointers, which means they have to traverse the whole tree of indirect blocks and everything in the storage pool.
B
I don't know if anybody's noticed, but in some circumstances the scrub and resilver don't have the best possible performance, and removal is something that you might want to do when you're not in a disaster scenario, so we wanted to make sure it performs well; it's already slow enough as it is, so I'm glad that we did it this way. We find the space by looking at the space map, and what that means is that we can find the allocated space in order by offset on disk.
B
So we're able to do the reads from the target device starting from offset 0 and then increasing, while skipping over the parts that aren't actually allocated. So we get fast discovery of the data to copy, and we get sequential reads; the caveat is that there's no checksum verification. The checksums are stored in the block pointers, and we aren't finding all the block pointers; we're just finding what is actually allocated. So we aren't able to verify the checksums during this, and what that means is that, for the most part, everything works great, and I'll explain how this works with mirrors and data integrity a little bit later on. But it does mean that transient errors can become permanent errors. So if I read from the device and it says, here's the data, we trust it; we write that to the new location. If it actually gave us the wrong data, then, well, most of the time, if it gave us the wrong data once, it's going to keep giving it to us wrong forever.
B
Okay, all right. So we found the space, we allocate a new place for it, and we keep track of the mapping from the old to the new locations; I'll get into the mapping in a little bit more detail later. In order to find the allocated space, we're iterating over the metaslabs in the device that we're trying to remove, loading each metaslab's space map into a new range tree, this svr_allocd_segs.
B
That tells us what we're working on copying right now, and then we find the next chunk to copy. The simplest way, and the first way that we did this, was to just say: find the next allocated region, and however big that allocated region is, that's what we're copying. But this could result in a large number of mappings, and thus a large amount of memory used, especially on very fragmented pools.
B
So in this example, the red blocks are allocated, the green blocks are free, and each one of these is one sector, one ashift-sized unit, four kilobytes in this example. Here, each of the runs of free blocks, or free sectors, is less than or equal to 32K, so I can actually allocate almost a whole 16 megabytes for this and copy this whole almost-16-megabyte region in one go, as you'll notice.
B
So we're going to read this whole almost-16 megabytes, we're going to allocate a new almost-16 megabytes for it, and then write it all to the new location, including the free space. So we're actually allocating space for those unused sectors, reading those unused sectors, and then writing them again. And again, the reason to do this is to reduce the number of mappings.
B
We could say, well, we know what's allocated and freed, so allocate that big chunk but just do the reads for what's actually allocated and just do the writes for what's actually allocated. But that actually tends to result in worse performance, because making the disk skip over little bits results in worse performance. And that's one reason; the other reason is that just having a lot more I/Os to deal with at the software level has overheads. And actually, at the vdev queue layer, we're already aggregating reads across spans of up to 32K. So I could have issued a whole bunch of reads for each little allocated bit, but then the layer below me, the vdev queue layer, would have just said: oh, I noticed that you read things with a little gap there; let me just do one big read and then copy out the parts that you want. Which would have just been wasted effort.
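(To make the chunking concrete, here is a minimal sketch of picking the next extent to copy: start at the next allocated sector, extend across free gaps of at most 32K, and stop at 16MB. The real logic operates on ZFS range trees in vdev_removal.c; the array representation and names here are illustrative.)

    #include <stdint.h>
    #include <stddef.h>

    #define MAX_CHUNK (16ULL << 20)  /* largest block ZFS can allocate */
    #define MAX_GAP   (32ULL << 10)  /* span free gaps up to 32K */

    typedef struct seg {
        uint64_t s_start;  /* inclusive */
        uint64_t s_end;    /* exclusive */
    } seg_t;

    /*
     * Given allocated segments sorted by offset, compute the extent
     * [*startp, *endp) to copy next, beginning at segment index i.
     * Returns the index of the first segment not included.  (In the
     * real code, the remainder of a segment clamped at 16MB would be
     * picked up by the next call.)
     */
    size_t
    next_chunk(const seg_t *segs, size_t nsegs, size_t i,
        uint64_t *startp, uint64_t *endp)
    {
        uint64_t start = segs[i].s_start;
        uint64_t end = segs[i].s_end;

        while (++i < nsegs &&
            segs[i].s_start - end <= MAX_GAP &&   /* small enough gap */
            segs[i].s_end - start <= MAX_CHUNK)   /* stays under 16MB */
            end = segs[i].s_end;

        if (end - start > MAX_CHUNK)
            end = start + MAX_CHUNK;
        *startp = start;
        *endp = end;
        return (i);
    }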
B
So, right: I said that we could have copied that whole 16 megabytes, but instead we're going to do a little bit less. Why do we want to do that? Well, the reason is that we want to minimize the number of split blocks. A split block is... so, I showed what's allocated and what's freed, but we don't actually know where the blocks are. For example, we might have this whole 16-megabyte region all allocated, but it's actually a bunch of 20-kilobyte blocks next to each other. So, from the logical point of view, I have a whole bunch of 20-kilobyte blocks, and they all happen to be allocated right next to each other, nice and contiguous; that's just what we want. But from the space maps we don't know that; we don't know where the block boundaries are. All that we know is: hey, this whole 16-plus megabytes is all allocated, and I need to copy all of it. Great. So here I've actually labeled the blocks.
B
So these split blocks: we want to avoid them when we can, but they're unavoidable in some cases, like the second case here, where 16-plus megabytes are all allocated. We have this constraint that says you can't copy more than 16 megabytes, so I guess I've got to do exactly 16 megabytes and just hope for the best. The reason that we have the 16-megabyte constraint is that that's the biggest block that ZFS can allocate.
B
So in theory we probably could have taught the allocator to be able to allocate bigger things, but in practice we felt that 16 megabytes is big enough, and even if we could extend it arbitrarily large, we know that we still have to deal with the case where we can't allocate it. Because, as we'll see later on, we might say, great, 16 megabytes, let me go allocate 16 megabytes, but there isn't 16 megabytes of contiguous free space, and so we have to chunk it up smaller anyway.
B
Okay, right, so that's kind of what I said: we may have to chunk it up into smaller pieces. If the allocation fails, we have to split it into two smaller allocations, and then we learn from that. So for the rest of this transaction group, we won't go back and try 16 megabytes every time: oh great, 16 megabytes; oh, that didn't work, let me go to half of that; oh, next allocation, 16 megabytes; oh, that still didn't work, surprise, surprise.
B
Let me go back to the smaller one. That would have been very wasteful, and we also wanted to make the back-off go to the exact size available rather than doing an exponential back-off. So it'll try to allocate 16 megabytes; if that doesn't work, it'll split it in half, so 8 megabytes and 8 megabytes, but next time we'll try 16 megabytes minus one sector. So we'll eventually find, okay, this is the actual largest segment that we can allocate in the pool in this txg, and then we'll be able to go really quickly allocating those. This kind of algorithm is very important when your pool is actually fragmented, when all the space is fragmented. If you just added a whole bunch of empty disks, you don't have this problem, but we wanted this to work in all cases, not just the ones that are easy.
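(A rough sketch of that back-off, assuming a hypothetical try_allocate() primitive; the real code remembers, per txg, the largest allocation size that has worked.)

    #include <stdint.h>
    #include <stdbool.h>

    #define SECTOR    (4ULL << 10)
    #define MAX_CHUNK (16ULL << 20)

    /* Hypothetical: true if 'size' contiguous bytes were allocated. */
    extern bool try_allocate(uint64_t size);

    /* Largest allocation size known to still work in this txg. */
    static uint64_t max_alloc = MAX_CHUNK;

    uint64_t
    alloc_chunk(uint64_t want)
    {
        uint64_t size = (want < max_alloc) ? want : max_alloc;

        while (size >= SECTOR) {
            if (try_allocate(size))
                return (size);
            /*
             * This size failed: start future chunks in this txg just
             * below it (exact back-off, not exponential), and satisfy
             * this chunk with a half-sized piece.
             */
            max_alloc = size - SECTOR;
            size /= 2;
        }
        return (0);  /* no free segment left */
    }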
B
Okay, cool. So then we read from the old location and write to the new, and then we need to free the unused parts of the new location. Because, like in this first example, I'm allocating that whole almost-16 megabytes, which includes all those free bits, but nobody's using those free bits; there aren't any block pointers referring to them. That's the definition of being free. So we need to free them from the new locations after we've done this.
B
Cool. So let me add one more wrinkle of complication to this before I explain the mapping structure. What if you have mirrors? Say you have this example, where I have a pool with three mirrors of two disks each, and I want to remove the one on the left.
B
The mirror is healthy, right, because we said we don't do removal while the mirror is unhealthy, but there might be unknown damage: there might have been some silent damage to one of the sides of the mirror. We want to be able to handle that in ZFS, and we can in all the other cases, so I figured we should be able to handle it here too.
B
The way that we do that is, rather than reading from the mirror normally, where the mirror would say, great, you want to read? Let me choose one side at random, and there's the data, because the two sides of the mirror are the same, right? They're supposed to be, but they might not be, because we don't trust the hardware. So instead, what we do is we read from both.
B
We read from both sides of the mirror and write each side to the corresponding side of the new location's mirror. So in this example, the dark blue and purple blocks we're reading from the left side of the removing device and writing to the left side of the new locations, and the light blue and purple we're reading from the right side of the removing device and writing to the right side of the new locations.
B
Okay, so how do we do that? Keeping with the example of the two-way mirror, we create a zio tree. This harks back to George's talk; hopefully you were all paying attention, and you know by looking at this diagram exactly what will happen. This tree is showing the zio dependencies, and remember, from George's talk, the children have to complete before their parents.
B
So here we have one side of this for each side of the mirror that we're accessing. The one on the left here is reading from the left child, child zero, and then writing to child zero of the new location; on the right here, we're reading from child one and then writing to child one of the new location. And then the null zio is the root of this whole thing.
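(In rough outline, the copy of one segment of a two-way mirror builds that tree as below. Every helper name here is a hypothetical stand-in for the real zio machinery; the key points are that children complete before their parent, and that each side's write is issued from that side's read-completion callback.)

    #include <stdint.h>

    typedef struct zio zio_t;

    extern zio_t *null_zio(void);                 /* the root */
    extern void read_side(zio_t *parent, int side, uint64_t offset,
        uint64_t size, void (*done)(zio_t *, int));
    extern void write_side(zio_t *parent, int side, uint64_t offset,
        uint64_t size, zio_t *data_from);
    extern zio_t *parent_of(zio_t *zio);
    extern uint64_t size_of(zio_t *zio);
    extern uint64_t new_offset_for(zio_t *zio);

    /* Called when the read of one mirror side completes. */
    static void
    copy_read_done(zio_t *rd, int side)
    {
        /* Write this side's data to the same side of the new
         * location, as another child of the null root. */
        write_side(parent_of(rd), side, new_offset_for(rd),
            size_of(rd), rd);
    }

    void
    copy_segment(uint64_t old_offset, uint64_t size)
    {
        zio_t *root = null_zio();

        for (int side = 0; side < 2; side++)
            read_side(root, side, old_offset, size, copy_read_done);
        /* The root completes only after both reads and both writes. */
    }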
B
The sync task is basically saying: I just need you to call this function from syncing context and pass these arguments to it, and I'm going to be sitting here waiting for you until you do it. Basically, it's just a context encapsulation mechanism, but you could almost think of it as: if we had a more powerful programming language than C, you could just say, here's a block of code, run it in this other context, great. That's essentially what we're doing with the callback and the arguments and whatnot. But critically, the arguments are normally static through this whole process. What we're doing here is kicking off this sync task, but we aren't waiting for it: we kick it off and we give it the head of a linked list that we're then continuing to add to, even though we've already dispatched the sync task. So there's some trickiness here, where we keep track of which transaction group the copy is going to complete in, and we make sure that we're adding to that one's list. We know that that txg is open, that it hasn't started syncing, so we know that we haven't started processing the sync task yet while we're modifying its linked list.
B
So this is a little bit tricky, but it actually worked out really well with the infrastructure that we had, and we found this paradigm to be really useful in a bunch of other scenarios, where you want to be doing something in the background from an open-context thread and yet have it modify things that are stored in the MOS in syncing context. So we've repeated this: I think the redacted send/receive stuff does this when you're creating the redaction list, and a couple of other things do as well.
B
So when we get a zio read, it's going to say: zio read, device ID 0. Oh, that's gone; what do we do? Well, we need to know where we should read from instead, so we keep this mapping. It maps from the old offset and length on the removed device to the new device and offset. I'm showing it here as a table, and this is how it's represented on disk and in memory.
B
It's just an array of structs but, critically, it's sorted by the old offsets. What that means is that when we want to do a lookup in it, we can do it with binary search, so we can look up in O(log n) time. And because we're generating the mapping by going through the space maps in offset order, we generate it in this sorted order naturally, so there's no post-processing step that we need to do in order to sort it.
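(A minimal sketch of that lookup; the field names are illustrative, the real structure being the indirect mapping in the OpenZFS sources.)

    #include <stdint.h>
    #include <stddef.h>

    typedef struct mapping_entry {
        uint64_t me_old_offset;  /* offset on the removed vdev */
        uint64_t me_size;
        uint64_t me_new_vdev;    /* where the data lives now */
        uint64_t me_new_offset;
    } mapping_entry_t;

    /*
     * Return the entry containing 'offset' on the removed device, or
     * NULL if that offset was never mapped (it was free).  The array
     * is sorted by old offset, so plain binary search works.
     */
    const mapping_entry_t *
    mapping_lookup(const mapping_entry_t *map, size_t n, uint64_t offset)
    {
        size_t lo = 0, hi = n;

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            if (offset < map[mid].me_old_offset)
                hi = mid;
            else if (offset >=
                map[mid].me_old_offset + map[mid].me_size)
                lo = mid + 1;
            else
                return (&map[mid]);
        }
        return (NULL);
    }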
B
So this sync task, all that it's doing is: I have a linked list of structs, where each row here is one of these structs, and it just says, great, take that struct and plop it into the object in the MOS; append it to that object in the MOS. Great, we're done copying.
B
We have everything in its new location; we know what the mapping is between the old and new locations. Now we just finish up. We free the space maps of the device we're trying to remove, because we don't need them anymore; there are some other little bits associated with the removing vdev that we get rid of; and then we replace the vdev, which in this example is a mirror.
B
Going back to the first example, the one where the gaps are less than or equal to 32K: maybe surprisingly, the first example here is 0% fragmentation, and the second example here is high fragmentation, like more than 70 percent fragmentation. Both of those work really well.
B
But the worst case possible would be where we have a run of free space that's just more than 32 kilobytes, and then we only have one sector allocated in between. In this case it could be really bad: 600 megabytes of mapping per terabyte of disk, and that's per terabyte of space on the disk, not of allocated space; the allocated space would be much less, some small fraction of that, because only about one out of every sixteen sectors is actually allocated. But the worst that we've actually seen in practice is about 100 megabytes per terabyte, and this is with fragmentation between those two examples, so less than 70 percent but, I think, more than 20 percent. We found that this is pretty tolerable. It had actually been much worse: we originally didn't implement this gap-spanning mechanism, and then it was a lot worse.
B
So, all right, we've talked about the main removal process, what that thread needs to do to do the removal. But what can happen while we're in the middle of the removal? Because remember, at the beginning, when I explained how you use this, I said everything just works: it runs in the background, and you can do whatever you want while you're in the middle of it. So what do we need to do to support that?
B
Remember, we've allocated new space for some of this data, but not all of it. So what if we need to do a free in the middle of a removal? Well, first we can free it from the old location; we know that's not needed anymore. And then there are a couple of cases that we need to think about; we might be in one of these three cases, or more than one, as it turns out. If we've already fully copied this region of space, we've already written it, so we free it from the new location as well.
B
If we haven't started copying it yet, then we don't have a new location for it, but we want to make sure that we don't copy it. Remember, I loaded the space map into this svr_allocd_segs range tree, so I might need to remove it from there, or I might not. It might be that I'm in the middle of copying this metaslab and I'm freeing something from over here, where it's not relevant to svr_allocd_segs; in that case we just ignore it. But it might be in svr_allocd_segs, which is telling us what we are going to be working on copying: we haven't started copying it yet, but we've figured out what we want to copy from this metaslab, so we need to remove it from there. Or it might be in flight, meaning that we've allocated a new place for it and we've issued the read for it, but the mapping hasn't yet been synced to disk, and the write might or might not have completed.
B
So we remember the range that needs to be freed in this svr_frees, which is a range tree that we index by transaction group, and then, when that transaction group syncs, as part of that sync task I mentioned, we're also going to free everything that is in that range tree. And you might be in multiple, or all, of these categories, because a free is a range.
B
It's like: free this one megabyte. And the thing that's going through and copying stuff is operating on whatever increments it chooses, so it might be that, of that megabyte, a little bit of it we haven't started copying, a little bit of it is in flight, a little bit of it is in flight in a different txg, and a little bit of it has already been fully copied. The routine that does all this is extensively commented.
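(A sketch of that dispatch; the helper names are hypothetical stand-ins for the real range-tree operations in vdev_removal.c.)

    #include <stdint.h>

    extern void free_old_location(uint64_t offset, uint64_t size);
    extern void free_mapped_new_locations(uint64_t offset, uint64_t size);
    extern void remove_from_allocd_segs(uint64_t offset, uint64_t size);
    extern void add_to_svr_frees(uint64_t offset, uint64_t size,
        uint64_t txg);

    /*
     * Free a range that lives on the removing vdev.  One freed range
     * can overlap several cases at once; each helper is assumed to
     * act only on the overlapping portion.
     */
    void
    free_during_removal(uint64_t offset, uint64_t size, uint64_t copy_txg)
    {
        /* Always safe: nothing will read the old location again. */
        free_old_location(offset, size);

        /* Already copied: free the corresponding new locations,
         * found via the mapping entries synced so far. */
        free_mapped_new_locations(offset, size);

        /* Queued but not yet copied: pull it out of svr_allocd_segs
         * so the copy thread never touches it. */
        remove_from_allocd_segs(offset, size);

        /* In flight: the mapping isn't on disk yet, so record the
         * range in svr_frees, indexed by the txg in which the copy
         * completes, and free the new location when that txg syncs. */
        add_to_svr_frees(offset, size, copy_txg);
    }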
B
So we cannot just go through the mapping and say, whatever the mapping points to, free that, for two reasons: one is that the mapping could span freed chunks, and two is that we could have had those concurrent frees happen, so something that was originally in use isn't allocated anymore, and if we tried to free it, we might be freeing somebody else's stuff. So instead we go through the device that we're trying to remove.
B
Okay, that's what happens while we're removing a device, and I still have some time, so that's good, because now I want to talk about what we need to do after the removal has completed. How do we deal with it? The first thing that you might need to deal with is opening your pool. When you open the pool, the first thing that we do, after we read some stuff off of the labels, is go into the MOS and read the MOS config object.
B
This tells us what devices are there and what the device IDs are: oh, vdev ID 0 is a mirror and it has two children, and the two children are device IDs x and y. Everything needed to get to that cannot be on indirect vdevs. I should say, all this stuff that I'm talking about here is in vdev_indirect.c. As I mentioned, the device that we remove we call an indirect vdev afterwards. It doesn't show up in zpool list or anywhere in the CLI interface, but under the covers there's still a vdev with that ID: you remove vdev ID 0, but afterwards you still have a vdev with ID 0.
B
The representation of it on disk is the same as in memory, which is really nice: it's just this array sorted by offset. But the tricky thing here is: what if I have removed multiple devices? Then the first device that I removed, its indirection table might be on an indirect vdev. So I need to load these in the right order, so that by the time I get to that first one, I already have all the mappings that I need in order to load its mapping. So we have to load them in reverse chronological order, because older mapping objects may be on more recently removed indirect vdevs.
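(The loop shape is just this, with a hypothetical loader:)

    /* Load indirect mappings newest-removal-first, so that a mapping
     * object living on a later-removed vdev can itself be remapped by
     * the time we read it. */
    extern int nremovals;
    extern void load_mapping(int removal);  /* hypothetical */

    void
    load_indirect_mappings(void)
    {
        for (int r = nremovals - 1; r >= 0; r--)
            load_mapping(r);  /* reverse chronological order */
    }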
Okay, cool. So we've opened the pool, everything's cool. Now, what operations might we need to do that would interact with this indirect vdev? Well, we might need to read from it; that's the most common one. When we read from an indirect vdev, we go through the indirection table.
B
That table tells us where we need to actually read from, and this is pretty straightforward: there's this function vdev_indirect_remap(), where basically you give it a callback, and it calls your callback telling it what the new location is, or what the new locations are. Because, remember, we might have split blocks, meaning that for one logical block, part of it we moved to one location and part of it we moved to another location. And it might not be just two locations, like in my examples;
B
it might be a thousand locations; in practice probably not a thousand, but it could be more than two. And so we did some work to make sure that we handle that. So you have this split, but then also, in addition to being a split block, it might be multiply indirected. So I had to come up with all these scenarios to test this.
B
One of the hardest scenarios is: I have two devices, I remove this one, and that copies everything over to here; then I add it back as an empty device and I remove this one, so it has to copy everything over here; and then I add it back as an empty device and I remove this one. So I'm just migrating the data back and forth, and back and forth, and back and forth.
B
So every single thing in the whole pool is indirected through as many indirect vdevs as I've done removals. The test suite has cases that do this, where it just does remove, add, remove, add, remove, add, and you can get hundreds of layers of indirection. So we needed to implement this function non-recursively. The obvious way to do it is: okay, great, I go to the indirect vdev; it tells me, here's the new location, and the new location is on some vdev, and it doesn't matter what kind of vdev it is; then I just do the read on that. But that one may also be indirect, and that could result in recursion. So we made this vdev_indirect_remap() aware that it might point to another indirect vdev. Oh geez, all right, I only have five minutes left, so we'll see how this goes.
B
So we made this actually work non-recursively, with an explicitly allocated stack of things to do. Okay. In the common case, where it's not a split block, we can just do a child I/O to the one new location and pass the checksum in, and then that child I/O handles the data integrity. So it might be a read from a mirror, just like a normal read from a mirror: it knows the checksum, it can try both sides.
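(Here is a sketch of that non-recursive walk with an explicit stack; helper names are hypothetical, the real code being vdev_indirect_remap() in vdev_indirect.c.)

    #include <stdint.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef struct seg {
        int         vdev_id;
        uint64_t    offset;
        uint64_t    size;
        struct seg *next;  /* stack link */
    } seg_t;

    extern bool vdev_is_indirect(int vdev_id);
    /* Split the segment via the vdev's mapping table and push one
     * seg_t per resulting piece; returns the new stack head. */
    extern seg_t *mapping_split_push(seg_t *stack, seg_t *seg);
    extern void issue_child_io(seg_t *seg);

    void
    indirect_remap(seg_t *first)
    {
        first->next = NULL;
        seg_t *stack = first;

        while (stack != NULL) {
            seg_t *seg = stack;
            stack = seg->next;

            if (vdev_is_indirect(seg->vdev_id)) {
                /* May push several pieces (split blocks), each of
                 * which may itself be indirect again; no recursion,
                 * just more stack entries. */
                stack = mapping_split_push(stack, seg);
            } else {
                /* Concrete vdev: actually read here. */
                issue_child_io(seg);
            }
        }
    }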
B
But what if we're reading a split block? Then we don't, because the issue is that we don't have sub-block checksums. We have the checksum of the whole block, but half of it is here and half of it is there. When I issue the read for this half, I don't know what the checksum of that half is, so I can't pass it down to the child I/O. So instead we need to handle the data integrity at the indirect vdev level, and this is kind of similar to what happens with RAID-Z.
B
In that case, we issue a child I/O for each segment of the split, targeting the top-level vdev that it points to. So if that's a mirror, it's going to read from some random healthy leaf, and then, when we get them all, we stitch it back together and we verify the checksum.
B
The checksum is good, and we say: great, now we have your data; we know we have your data. But what if you're reading a split block from the indirect vdev, and you have mirrors, and you have silent damage? In this example I said, okay, we read from a healthy leaf, we get the correct data, and the checksum is correct. But what if the checksum is not correct?
B
Again, we still want to be able to handle the cases where there's silent damage and the disk says, here's the data, and it's not actually the data. In that case, we don't know which part of it is damaged. In this case I'm just showing a two-way split, but there could be more splits: I could have five different parts of the split in five different locations, and I don't know which of those five actually has the damage.
B
So let's say first I try both left children, and the actual damage, which I happen to magically know, is on both of the left children, which I've marked here with those red do-not-enter symbols. We check the checksum, and nope, that's not the right data. So then I try: okay, what if the first part comes from the left and, for the second part, I try the right side of the mirror, the light purple there? Well, that's still not the right data. Okay, well then, I'm kind of doing binary counting here, right? I'm adding one, so I flip the next bit: now I'm doing the right part of the first mirror and the left part of the second mirror. Still not the right data. Geez, where is that data? So then I try the last combination.
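(That counting is exactly an odometer over the children of each split segment. A minimal sketch, with hypothetical assemble/checksum helpers:)

    #include <stdbool.h>
    #include <stddef.h>

    extern size_t nsplits;                  /* number of split segments */
    extern int nchildren(size_t split);     /* mirror children per segment */
    extern void assemble(const int *choice);/* build candidate from copies */
    extern bool checksum_ok(void);

    bool
    reconstruct(void)
    {
        int choice[16] = { 0 };  /* child used per segment; assumes at
                                    most 16 segments for brevity */
        for (;;) {
            assemble(choice);
            if (checksum_ok())
                return (true);   /* found the right combination */

            /* "Add one": binary counting across the segments. */
            size_t i = 0;
            while (i < nsplits && ++choice[i] == nchildren(i)) {
                choice[i] = 0;
                i++;             /* carry into the next segment */
            }
            if (i == nsplits)
                return (false);  /* all combinations exhausted */
        }
    }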
B
This is also kind of unique, and I think better than what we've done in a bunch of other cases like RAID-Z, because we actually compare the blocks. You might have a three-way mirror, for example, and we want to figure out which copy is right and which is wrong, and we might not have gone through every possible combination, but we can check and see: okay, here's the right data.
B
I know this is the right data, and I have every other copy; let me just see which of them differ from the correct data, and if they're different, then I'm going to issue the repair write. In this example I only have four combinations to try, but every additional split is exponentially more.
B
So if we'd split three ways, then instead of four it would be eight combinations, and if you have too many combinations, which rarely if ever happens in practice, but does happen when you're running ztest, then it might take until the heat death of the universe to try them all, because two to the hundredth is a big number: counting from one up to two to the hundredth takes forever, and then doing SHA-256 that many times is also not practical.
B
I guess that's what you get to do if it's your conference. So what if you get a write to an indirect vdev? Wait a minute: didn't we start out by saying there are no writes? We made sure that the ZIL wasn't going to write to it and whatnot. But it turns out you can get a self-heal write. The most common case where you might see this is that we discover there's some bad data via a ditto block.
B
So you have a block pointer, and it has two DVAs; let's say one of them is concrete and one of them is indirect. We read from the indirect one, and we say: I've tried all the combinations, and the data is just not here. But then the layer above us in the zio chain is going to try reading it from the other DVA, and maybe it finds out: all right, this copy is still good.
B
Okay, frees from the indirect vdev: pretty simple, we just free it from the new location. But the interesting thing is that now some parts of the mapping are no longer needed. I freed that, and a free, by definition, means nobody is ever going to read it again, so that part of the mapping is no longer needed; we call that obsolete. So maybe we could reduce the memory used by the mapping table, now that we know that some part of it is no longer relevant.
B
Maybe. All right, I'm going to fast-forward. The caveat to all this is that we implemented all of it before we did the large-mappings work, and when you have the large mappings, this managing of obsolescence to reduce the size of the mapping is much, much less necessary, because the mapping is much, much smaller to begin with, and because each mapping entry tends to cover many, many blocks: you have to wait until a lot of things get freed before an entry becomes obsolete.
B
Because I'm running out of time, I'm only going to show you the cool pictures that I did, because I spent a lot of time on them. And really, this is not so much to convince you that this is super cool; it's more so that when you're going and looking at this code and wondering what the heck all this is, you can understand, at a high level, where it's coming from.
B
Right. So what happens is, when you do a free, we append that space to the obsolete space map, which tells us everything that's been freed. Then, in the background, we do this condense operation that basically takes that obsolete space map and incorporates it into this obsolete count (which is not rendered correctly on the slide, but it sits essentially off to the side of the main mapping structure). Alongside the mapping, we also store this piece of information that tells us how much of each entry is obsolete. If that gets to be the entry's entire size, then when we rewrite the mapping we can omit that entry. And then we can also say: whenever we write an indirect block, and, kind of irrelevant to why I'm actually writing it, there's this other block pointer in it which points to an indirect vdev, maybe I can rewrite that block pointer to point to the new concrete location, and then maybe I could mark that part of the mapping obsolete.
B
But you can't if there are snapshots, because the snapshots might still reference it via the indirect block pointers. So we have this new on-disk structure called the remap deadlist, which keeps track of DVAs that are referenced in a snapshot but have since been remapped. Then, when you delete a snapshot, you can find everything that has been remapped and is no longer part of any snapshot. This algorithm is almost exactly the same as the regular deadlist one.
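(Pulling the obsolete-counting idea together, here is a minimal sketch of the condense step, reusing the mapping_entry_t shape from the earlier sketch: entries whose obsolete count has reached their full size are simply not carried over when the mapping is rewritten.)

    #include <stdint.h>
    #include <stddef.h>

    typedef struct mapping_entry {
        uint64_t me_old_offset;
        uint64_t me_size;
        uint64_t me_new_vdev;
        uint64_t me_new_offset;
    } mapping_entry_t;

    size_t
    condense_mapping(const mapping_entry_t *old, const uint64_t *obsolete,
        size_t n, mapping_entry_t *out)
    {
        size_t kept = 0;

        for (size_t i = 0; i < n; i++) {
            if (obsolete[i] == old[i].me_size)
                continue;          /* fully obsolete: drop it */
            out[kept++] = old[i];  /* still (at least partly) live */
        }
        return (kept);
    }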
B
We've been working on this for many, many years, and a lot of the other developers listed here have helped us with it. We've been using this in production at Delphix, in our product, since 2015, and it's been upstream in all the repos since early this year. So, future work: two really cool things that we'd like to do. One is being able to queue up multiple devices to be removed; the main point of this would be to mark them as ineligible for allocations and to do the space checking up front.
B
The big win there is that, when I'm removing a device, rather than moving that space to all of the remaining devices, I move it to the ones which will still be remaining after I've completed all the removals that I want to do right now. I have a prototype, but it needs some work. The other really cool thing that we would like to do is to be able to remove a RAID-Z group.
B
Well, you can ask one question and then we can talk later. No, the indirect vdev remains there forever. I mean, in theory we could say, once that mapping table gets to zero size we could remove it, but there's no cost to it. So, whatever; that's what you used your one question on. All right.
B
Yes, so the question was about spanning those free segments and the impact on SSDs. When we're doing the copy, it's good for SSDs, because we're doing big chunks. But then, when we do those frees, it's going to do the trims, if frees do trims: if you're running on a platform or a config where frees do trims, it will trim them, because it's just a normal free.
B
Well, that reminds me that there's a trade-off to be made here. Before, when we were not spanning the free chunks, it meant that if you have a fragmented device and you remove it, then all the free space gets compacted; everything gets compacted, and your fragmentation goes away. Versus, with spanning the free chunks, it basically preserves your fragmentation, which is not great, but the memory and performance benefits are really, really huge. So, if you need to, you can change that tunable.
B
Yeah, oh, I see. So the question was basically: what if I just have a regular, unsplit block, just a regular block; could I do this combination thing? You totally could, if, basically, what you're trying to protect against is: I have a mirror, I have a block, it's just a normal mirror, a normal block, the same thing on both sides, and the failure mode that I'm concerned with is that the beginning of the block got messed up on this side and the end of the block got messed up on that side. Then I'd want to be able to stitch it all back together on a sector-by-sector or byte-by-byte basis. Yeah, I mean, you could totally do that.
B
Yep, yeah. So the question was: what if my pool is not homogeneous and I have some three-way mirrors and some two-way mirrors? In that case, we basically do the best that we can in that scenario. If you're removing a three-way and you only have two-ways, we're just going to read from two randomly selected children and then write those over here. And if you're removing a two-way and I have a three-way, well, one of those children is going to get the same thing copied twice, yeah. But it does handle that, and it does about the best it can, given the situation. Great question. All right, I'm already way, way over, so thank you all for indulging me. This is my first full-length OpenZFS presentation in six years: I've been organizing this conference for six years, and this is my first full-length technical presentation. So thank you.