From YouTube: Sequential Reconstruction by Mark Maybee
Description
From the 2020 OpenZFS Developer Summit
slides: https://docs.google.com/presentation/d/1vLsgQ1MaHlifw40C9R2sPsSiHiQpxglxMbK2SMthu0Q/edit?usp=sharing
Details: https://openzfs.org/wiki/OpenZFS_Developer_Summit_2020
A
So yes, as Matt said, I am giving two back-to-back talks here. The first talk is going to be relatively short, and we'll have some time for Q&A between that talk and the second talk. This topic is sequential reconstruction.
A
There is some depth here that we can get to, or, if you're interested in going into the nitty-gritty details, feel free to reach out to me or to Brian Behlendorf to get more details about what's happening here. I really just wanted to talk about what this sequential resilver, or sequential reconstruction, really means, with a little bit of comparing and contrasting it with our existing healing resilver within ZFS. First, a little bit of history.
A
This feature actually originates out of the work that Isaac Huang did for the dRAID project. A few months ago, Brian Behlendorf pulled it out of the dRAID PR and created its own PR, primarily because that allowed us to get it into the code base before 2.0, and because it was really a separate feature that applied to more than just the dRAID feature.
A
So, first, a little bit of taxonomy. I'm putting this up here because the language around resilver, rebuild, and reconstruction is a bit confusing and sometimes conflated, and we do use some specific language in ZFS when we're talking about these processes. Traditionally, data reconstruction in ZFS has always used the term resilver.
A
I believe this probably dates from the time when we had mirrors as our primary redundant data representation, and mirrors use the term resilver to talk about moving data back onto a copy in drive replacements.
A
This, though, got carried over to be used in RAID-Z as well, because the same code base and the same processes were used to drive RAID-Z reconstruction as were used for mirror reconstruction. This process is driven by a traversal of the entire pool using the block pointer tree. The reconstruction that I just mentioned in the history was introduced much later with the dRAID work, and in that work it was actually called rebuild.
A
It used that term because rebuild is a term commonly used in the RAID community, particularly the declustered RAID community, to talk about the process of recovering your redundancy after you lose a drive by recovering onto a distributed spare drive. In ZFS, in our implementation and Isaac's implementation of this, it's actually driven by a traversal of the space allocation maps rather than the block pointer tree.
A
So as we were putting the PR together and getting this integrated back upstream, we came up with some new terminology to try to clarify what these two different processes were all about. We defined the reconstruction driven by the block pointer tree as a healing resilver, and I'll talk about the reason why in a second. Reconstruction driven by space maps we now call a sequential resilver, largely because it's being driven sequentially through the space maps, whereas a healing resilver is being driven by a block pointer traversal.
A
Yep, I just managed to go all the way through my talk. All right, healing resilver. It really gets its name, I believe, from the fact that it's based off of the read technology embedded in ZFS for self-healing reads. We call that process self-healing reads, and it's that same basic process that we leverage for data resilvering. That's what gives us the name healing resilver.
A
It works on all data layouts, or at least all data layouts that have some sort of data redundancy, specifically mirrors and RAID-Z. As I mentioned, it traverses the block pointer tree to find what to reconstruct.
A
What's nice about this process is that it can trim the traversal based off of the block pointer timestamp. As it's traversing and sees that it needs to resilver just this section of time, it can visit just the subsection of the block pointer tree that incorporates those blocks. The problem with it is that it's doing this in block pointer order, based off the block pointer tree. As the pool ages, your tree no longer necessarily has its data in chronological order, or in sequential order on the media, so visiting those block pointers becomes somewhat of a random process. In particular, in situations where you have a lot of small random I/O and an aged tree, you're going to find that resilver gets slower and slower. I'm sure that you've all experienced, or at least heard stories of, resilvers lasting weeks for pools, particularly RAID-Z pools with large stripe widths.
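The trimming described above can be illustrated with a small sketch. It is a toy model, not ZFS code: it assumes the "timestamp" is a birth transaction group (txg) stored in each block pointer, and that copy-on-write makes a parent at least as new as any of its children. The names `BlockPointer`, `birth_txg`, and `blocks_to_resilver` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BlockPointer:
    """Toy block pointer: a birth txg, an on-disk offset, and children."""
    birth_txg: int
    offset: int = 0  # only meaningful for leaf (data) blocks
    children: List["BlockPointer"] = field(default_factory=list)


def blocks_to_resilver(bp: BlockPointer, min_txg: int, max_txg: int,
                       out: Optional[List[int]] = None) -> List[int]:
    """Collect leaf offsets born in (min_txg, max_txg], in tree order."""
    if out is None:
        out = []
    # Assumption: a parent is rewritten whenever a child is, so if the
    # parent is old enough, nothing underneath it needs a visit.
    if bp.birth_txg <= min_txg:
        return out
    if not bp.children:
        if bp.birth_txg <= max_txg:
            out.append(bp.offset)
        return out
    for child in bp.children:
        blocks_to_resilver(child, min_txg, max_txg, out)
    return out


# A tiny tree: one old subtree and one recently modified subtree.
old_subtree = BlockPointer(10, children=[BlockPointer(5, 100),
                                         BlockPointer(10, 900)])
new_subtree = BlockPointer(30, children=[BlockPointer(30, 800),
                                         BlockPointer(12, 700),
                                         BlockPointer(25, 300)])
root = BlockPointer(30, children=[old_subtree, new_subtree])

# The device missed txgs 21..30, so only blocks born in that window matter.
result = blocks_to_resilver(root, min_txg=20, max_txg=30)
```

In this example the old subtree is skipped entirely, but the surviving offsets come back in tree order (800 before 300) rather than disk order, which is the random-access problem the talk describes for aged pools.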
A
So that is one of the real problems that we're trying to address with sequential resilver. There was an enhancement to the traditional block-pointer-tree-based traversal, the healing resilver, which does improve the situation. It batches lots of I/Os together by reading a whole bunch of block pointers, sorting them into a sequential list, and then traversing those block pointers, so that at least at the scale of that pre-read you're getting a sequential traversal.
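A minimal sketch of that batching idea follows. It is illustrative only: the batch size, the `sorted_batches` helper, and the `seek_distance` metric are assumptions made for the example, not the actual scan code.

```python
from typing import Iterable, Iterator, List


def sorted_batches(offsets: Iterable[int],
                   batch_size: int) -> Iterator[List[int]]:
    """Gather offsets in traversal order, then emit each batch sorted."""
    batch: List[int] = []
    for off in offsets:
        batch.append(off)
        if len(batch) == batch_size:
            yield sorted(batch)
            batch = []
    if batch:
        yield sorted(batch)


def seek_distance(order: List[int]) -> int:
    """Total head movement if the offsets are visited in this order."""
    return sum(abs(b - a) for a, b in zip(order, order[1:]))


# Offsets in the order a block pointer traversal might discover them.
traversal_order = [800, 100, 700, 200, 600, 300, 500, 400]

# Issue the same I/Os, but sorted within batches of four.
issued = [off for batch in sorted_batches(traversal_order, 4) for off in batch]
```

Each batch is sequential on its own, but the ordering resets between batches, so the benefit is bounded by how many block pointers are pre-read at once.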
A
So it can break up the allocation into chunks of the maximum efficient block size. You can then sequentially issue the appropriate rebuild operations, in this case the mirror copy operation, and very efficiently and very quickly traverse through what has actually been allocated on the drive.
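As a rough sketch of the chunking step, assume the space maps have already been reduced to a list of allocated `(offset, length)` segments. The function name `rebuild_chunks` and the chunk size used here are hypothetical.

```python
from typing import Iterator, List, Tuple


def rebuild_chunks(allocated: List[Tuple[int, int]],
                   max_chunk: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, size) copy operations in ascending disk order."""
    for start, length in sorted(allocated):
        end = start + length
        while start < end:
            # Never issue more than max_chunk bytes in one operation.
            size = min(max_chunk, end - start)
            yield (start, size)
            start += size


# Two allocated segments; free space between them is never touched.
allocated_segments = [(4096, 3000), (0, 1024)]
chunks = list(rebuild_chunks(allocated_segments, max_chunk=1024))
```

Because the chunks come from allocation ranges rather than block pointers, they carry no checksum and no knowledge of block boundaries; they only say which regions of the drive are in use.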
A
That isn't necessarily the end of the world, but it is something to be aware of when you're using sequential resilver. Another critical constraint is that, because it's basing its traversal off of the space allocation maps and synthesizing block pointers, this just doesn't work at all for RAID-Z. RAID-Z, as you should know, is an allocation that needs the block pointer data to be able to locate the actual data, size, and parity associated with the RAID-Z layout, and that's just not available when all you have to work with is the space allocation.
A
So why use sequential resilver? Well, obviously, it's going to be good if your recovery speed is critical, and when is your recovery speed not critical? This just reduces your window of vulnerability to losing a second drive. You get the data replicated back as quickly as possible.
A
Our measurements, particularly on fragmented small-block data, show that sequential resilver can be over twice as fast as a healing resilver at recovering from a failed drive or replacing a drive. And in the situation of a two-way mirror, if you think about this, it's actually pretty much always going to be a reasonable idea.
A
The fact that you were unable to verify the read data before you wrote it to the other side of the mirror is not hurting you in terms of your rebuild. If you had verified the data, you would have simply said, "All right, I don't have good data, so I can't copy the data." You're going to have bad data anyway, and for all the good data you'll get a good copy. So it's always a good idea in that situation. That said, scrubs are sort of critical with sequential resilver.
A
Once you have completed your sequential resilver, you really do need to go scrub: go through and ask, did I actually lose any data? Did I actually have a problem with any of those copies? Particularly with metadata, where you may have another copy squirreled away in a ditto block, you can then detect and correct that situation and actually clean up the other half of the mirror. So a scrub is not just recommended but automatically triggered by the sequential resilver code as soon as it completes its work.
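The copy-then-scrub flow can be sketched as below. This is a simplified model under stated assumptions: each device is a dictionary of offset to bytes, the checksums live only in the block pointers, and a ditto copy is modeled as just another place holding the same offset. The helper names `sequential_copy` and `scrub` are hypothetical.

```python
import hashlib
from typing import Dict, List, Tuple


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def sequential_copy(source: Dict[int, bytes]) -> Dict[int, bytes]:
    """Copy allocated regions as-is; nothing here can verify the data."""
    return {off: source[off] for off in sorted(source)}


def scrub(block_pointers: Dict[int, str],
          copies: List[Dict[int, bytes]]) -> Tuple[List[int], List[int]]:
    """Verify every copy against its block pointer checksum.

    Returns (repaired offsets, unrecoverable offsets).
    """
    repaired: List[int] = []
    lost: List[int] = []
    for off, expected in block_pointers.items():
        holders = [c for c in copies if off in c]
        good = next((c[off] for c in holders
                     if checksum(c[off]) == expected), None)
        if good is None:
            lost.append(off)  # no intact copy anywhere
            continue
        bad = [c for c in holders if checksum(c[off]) != expected]
        for c in bad:
            c[off] = good  # self-heal from the intact copy
        if bad:
            repaired.append(off)
    return repaired, lost


original = {0: b"data-0", 1: b"data-1", 2: b"meta-2"}
block_pointers = {off: checksum(d) for off, d in original.items()}

# The surviving mirror side has silently corrupted blocks 1 and 2.
survivor = dict(original)
survivor[1] = b"rot!!!"
survivor[2] = b"rot!!!"
ditto = {2: b"meta-2"}  # extra metadata copy stored elsewhere

new_disk = sequential_copy(survivor)  # the corruption is copied blindly
repaired, lost = scrub(block_pointers, [survivor, new_disk, ditto])
```

Block 1 was already unrecoverable before the rebuild, so copying it unverified lost nothing further, while block 2 is repaired on both mirror halves from the extra copy once the scrub runs.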
A
I believe that you should always scrub regularly regardless, and so it's really a moot point, but in this case a scrub is necessary.
A
So that can save you time. If you think about it, a healing resilver plus a scrub is going to be slower than a sequential resilver all on its own. Sorry, did I say that right? No, sorry: a sequential resilver plus a scrub is going to be slower than a healing resilver, if you're just thinking in terms of "when can I get back to a state where I know everything is absolutely correct?"
A
Finally, healing resilver is absolutely necessary if you're talking RAID-Z. We can't resilver RAID-Z with sequential resilver, so the only option you have there is a healing resilver at this time. Then the question is, what about dRAID? I'll answer that in my next talk, but for now let's go ahead and take any questions anybody might have.
B
So, first one, from Allan Jude: how does the terminology fit the previous semi-sequential resilver, where it uses the range tree to do a block-pointer-driven resilver but with fewer random IOPS? I think that's the one that you mentioned.
A
Yes, so the semi-sequential is the enhancement to what we call healing resilver. Healing resilver really covers that now, so semi-sequential is a variation on the healing resilver where we do the batching of the I/Os.
B
All right, second question, from Jan Bramkamp: is sequential resilver limited by the vfs.zfs.vdev.scrub_max_active setting?
C
Then you could, in theory, if you lose one of the disks, when you resilver it there isn't just one copy of the data. So in theory you could benefit from verifying the checksum when you read from the other side, or from reading both sides and saying, "Oh, if they're the same, they don't need the checksum," or something like that. Right.
A
You could at least compare the data you get off the other copies. If you have multiple copies available, you could potentially compare them, right. But then the question is, I guess, whether you had enough of a quorum to be able to determine that you had more copies that were this way versus that way.
A
Yeah. You know, I don't know that it would help until you had four-way mirroring.
B
Right. Question from Becky Ligon: how do you specify healing versus sequential resilvering?
A
So when you request a resilver there's a flag, and I think it's like -s for sequential and -h for healing, but don't quote me; do a man zfs and find out.
B
Thank you. And the last one we have here is another one from Jan: "As far as I know, the checksum is only stored in the block pointer." Okay, so it's not a question.