From YouTube: Improving “zfs diff” performance with reverse-name lookup by Sanjeev Bagewadi & David Chen
Description
From the 2020 OpenZFS Developer Summit
slides: https://drive.google.com/file/d/1t9O_KXa5aXxUwfXG6DvIMIetCOHzdH0Q/view?usp=sharing
Details: https://openzfs.org/wiki/OpenZFS_Developer_Summit_2020
David: Hi everyone, my name is David, and today my coworker Sanjeev Bagewadi and I are going to talk about a zfs diff optimization and also some problems that we fixed.
So what is zfs diff for? If you have two snapshots, you can run zfs diff on those two snapshots and it will show all the changes made between them.
For example, there's a sample output below. It shows that some files were added, some files were modified, and some files were removed or renamed.
We use zfs diff to allow third-party software to do incremental backups: we send this change information to the third-party software, and it copies the modified files for backup. It works pretty well for a small set of changes, but we found a problem: if you have a large number of changed files between snapshots, zfs diff takes a very long time. It's very slow.
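As an illustration of that pipeline, here is a minimal sketch of how a third-party backup tool might consume zfs diff output; the tool itself is hypothetical, but the one-change-per-line format with a leading +, -, M, or R is what zfs diff actually prints.

```c
/*
 * Hypothetical consumer of `zfs diff` output, fed via stdin, e.g.:
 *   zfs diff pool/fs@snap1 pool/fs@snap2 | ./backup-feeder
 * Each line starts with a change type (+ added, - removed, M modified,
 * R renamed) followed by whitespace and the path(s).
 */
#include <stdio.h>
#include <string.h>

int
main(void)
{
	char line[4096];

	while (fgets(line, sizeof (line), stdin) != NULL) {
		line[strcspn(line, "\n")] = '\0';
		const char *path = line + 1;
		while (*path == '\t' || *path == ' ')
			path++;
		switch (line[0]) {
		case '+': printf("copy new file:    %s\n", path); break;
		case 'M': printf("copy modified:    %s\n", path); break;
		case '-': printf("drop from backup: %s\n", path); break;
		case 'R': printf("rename in backup: %s\n", path); break;
		default:  fprintf(stderr, "unrecognized: %s\n", line);
		}
	}
	return (0);
}
```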
For example, we had a case with 1 million changed files in a directory, and it took about an hour. So we tried to dig into where the performance bottleneck is, and we found that it is mainly in the dnode-to-path translation.
That's where most of the time is taken. Another issue we found is that when you have hard-link changes mixed in, it will actually break the diff.
For example, say you have a file called file.6, you create a hard link to it in another directory, and then you remove the new link you just created. If you then run zfs diff, it will show that file.6 was removed. But you didn't actually remove that file; the file is still there, yet zfs diff reports it as removed. That's not the correct result we're expecting.
This issue has been reported on GitHub before; some people may have seen a message like "Unable to determine path or stats for object 107". This is something you might have run into. Next slide, please.
So, scanning the parent directory, it finds that there's an entry with the id 118 and the name of the entry is bob; that's how it figures out that the file name is bob. This operation has to be repeated all the way up to the root directory: you figure out the file name from the parent, then the directory name from the grandparent, and so forth.
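To make the cost concrete, here is a self-contained toy model of that walk; the structs and the sample namespace are invented stand-ins for the real ZAP and dnodes, but the shape of the work is the same: one linear scan of the parent directory per path component.

```c
#include <stdio.h>
#include <string.h>

struct entry { const char *name; int obj; };
struct dir { int obj; int parent; const struct entry *entries; int n; };

static const struct entry root_e[] = { { "home", 2 } };
static const struct entry home_e[] = { { "alice", 3 }, { "bob", 118 } };
static const struct dir dirs[] = {
	{ 1, 1, root_e, 1 },	/* object 1 is the root, its own parent */
	{ 2, 1, home_e, 2 },
};

static const struct dir *
find_dir(int obj)
{
	for (size_t i = 0; i < sizeof (dirs) / sizeof (dirs[0]); i++)
		if (dirs[i].obj == obj)
			return (&dirs[i]);
	return (NULL);
}

/* The expensive part: O(entries in the directory) per path component. */
static const char *
name_of(const struct dir *parent, int obj)
{
	for (int i = 0; i < parent->n; i++)
		if (parent->entries[i].obj == obj)
			return (parent->entries[i].name);
	return (NULL);
}

int
main(void)
{
	int obj = 118, parent = 2;	/* start from the changed file */
	char path[256] = "";

	while (obj != 1) {		/* repeat all the way to the root */
		const struct dir *d = find_dir(parent);
		char tmp[256];

		snprintf(tmp, sizeof (tmp), "/%s%s", name_of(d, obj), path);
		strcpy(path, tmp);
		obj = d->obj;
		parent = d->parent;
	}
	printf("%s\n", path);		/* prints /home/bob */
	return (0);
}
```

With a million changed files under a handful of deep directories, those scans repeat for every file at every level, which is exactly the cost described next.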
Next slide, please. Because of the linear search, each path component costs a big scan, multiplied by however many levels deep your file is, so it adds up a lot. And if a lot of the changed files are inside common directories, you're also repeatedly doing the translation for those common directories; basically you're doing the same thing multiple times. So if we can cache the results for the common directories, that will also speed things up significantly.
Also, with the linear search, when you search through a large directory you pull a lot of blocks into the ARC, so you're causing unwanted churn. Basically, with a large number of changed files, these problems significantly increase the total runtime of zfs diff. Next slide, please.
The second problem is the hard-link problem. Because we only store one parent id inside the dnode, consider what happens when we create a hard link. What the current code does: if you look at the graph, we have a file that's in directory id 5.
If you create a hard link to it in directory id 6, you now have two directory entries pointing to the same file. What ZFS's current code does is always change the parent id stored in the dnode to the new parent,
that is, to the directory where the new hard link was created. So now, when you remove that hard link in directory 6, you'll find that because the parent id stored in the dnode is still 6, it basically can't figure out the real parent anymore.
So that's why it broke the path translation. That's the issue with hard links. Next slide, please.
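A toy illustration of that failure mode, with an invented single-parent struct standing in for the real dnode layout: link() overwrites the stored parent, and after the new link is removed the stored parent no longer contains an entry for the file.

```c
#include <stdio.h>

/* Toy dnode: like ZFS, it remembers only ONE parent directory id. */
struct dnode { int obj; int parent; int nlink; };

int
main(void)
{
	struct dnode file = { .obj = 7, .parent = 5, .nlink = 1 };

	/* link() into directory 6: the stored parent is overwritten. */
	file.parent = 6;
	file.nlink++;
	printf("after link:   parent=%d nlink=%d\n", file.parent, file.nlink);

	/*
	 * unlink() of the NEW name in directory 6: the link count drops,
	 * but the stored parent still says 6, which no longer has an
	 * entry for object 7.
	 */
	file.nlink--;
	printf("after unlink: parent=%d nlink=%d\n", file.parent, file.nlink);
	printf("reverse lookup scans directory %d, finds nothing ->\n"
	    "zfs diff wrongly reports the file as removed\n", file.parent);
	return (0);
}
```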
For the rest, I'll hand over to Sanjeev.
Sanjeev: Thanks, David. As David explained, we have this issue of sequential search. One key thing I'd like to add to what David said is that the ZAP entries themselves are indexed using the name of the file or directory: the name is the key, and the value is the inode number. So although it's an indexed search, it's indexed on the name, not on the inode or dnode that we are looking for. That's where reverse-name lookup becomes a problem, because you're doing a sequential search.
The way we tried to address this: when you add an entry to the ZAP, we actually compute the hash, so we already know the hash value when we're adding it. What we thought was: pick up that hash and add it to the dnode.
The hash itself is 64 bits, so we could easily add it to the dnode, and we felt that the system attributes lend themselves pretty neatly to this usage. So we added a new system attribute, the link-name hash.
On a rename you would need to update the link-name hash, because the hash value itself could change; in addition, the parent directory could also change. But that's all right; the cost of that is not much, because you would anyway need to update the mtime and ctime of the inode.
So that was one change we made, and it addressed the problem of sequential search: it now reduces to a constant-time search, because once you have the hash value, you can directly do a lookup in the ZAP. You have the parent id and you have the hash, so it's a constant-time cost to look up into the ZAP. That was one solution.
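A minimal sketch of the idea; the bucketed table and the FNV-1a hash are arbitrary stand-ins for the real ZAP and its internal hash function, but they show the payoff: once the 64-bit name hash is stored with the file, the reverse lookup becomes a direct probe instead of a scan.

```c
#include <stdio.h>
#include <stdint.h>

/* Stand-in 64-bit name hash (FNV-1a); ZFS uses its own ZAP hash. */
static uint64_t
name_hash(const char *s)
{
	uint64_t h = 14695981039346656037ULL;

	for (; *s != '\0'; s++) {
		h ^= (uint8_t)*s;
		h *= 1099511628211ULL;
	}
	return (h);
}

#define	NBUCKETS	16
struct zent { const char *name; int obj; uint64_t hash; };
static struct zent buckets[NBUCKETS];	/* toy one-entry buckets */

static void
zap_add(const char *name, int obj)
{
	uint64_t h = name_hash(name);

	buckets[h % NBUCKETS] = (struct zent){ name, obj, h };
}

int
main(void)
{
	zap_add("alice", 3);
	zap_add("bob", 118);

	/* Object 118's dnode now carries: parent id + link-name hash. */
	uint64_t stored = name_hash("bob");

	/* Reverse lookup: jump straight to the bucket, no scan. */
	struct zent *e = &buckets[stored % NBUCKETS];
	if (e->hash == stored && e->obj == 118)
		printf("object 118 is named \"%s\"\n", e->name);
	return (0);
}
```

On rename, the stored hash is simply recomputed for the new name, which is the cheap update mentioned above.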
Second, here is a sample that we have from the zdb output. We have object 128, whose parent is 2, and we do have the link-name hash stored for it. What you could now do is, in the ZAP of dnode 2, quickly look for this hash, and the code itself lends itself to this pretty neatly:
you can pass in a hint. So we added that part, and we can now do a constant-time lookup. For the hard links it's a slightly longer solution, because what we technically need is a way to store multiple pairs of parent dnode and link-name hash. The storage needed depends on how many hard links are created, so you'd need a larger space. Initially we explored the option of storing it in the dnode itself,
but in our case we're using the standard 512-byte dnode, and there's not enough space there to store it. So what we decided to do was add a new ZAP object, and this is allocated only if the file is hard-linked, that is, if its link count is greater than one. Only in such cases do you have this links-ZAP dnode allocated, and in that ZAP you keep pairs of parent dnode and link-name hash.
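Again a toy sketch rather than the on-disk format: a per-file side table, allocated only when the link count exceeds one, holding one (parent object, link-name hash) pair per link so that every directory entry for the file stays findable in constant time. The hash values here are placeholders.

```c
#include <stdio.h>
#include <stdint.h>

/* One record per hard link: which directory, and the name hash there. */
struct link_rec { int parent_obj; uint64_t name_hash; };

/*
 * Toy "links ZAP" for one file; the real object is allocated lazily,
 * only once the file's link count exceeds one.
 */
struct links_zap { int nlinks; struct link_rec recs[8]; };

static void
add_link(struct links_zap *lz, int parent, uint64_t hash)
{
	lz->recs[lz->nlinks++] = (struct link_rec){ parent, hash };
}

int
main(void)
{
	struct links_zap lz = { 0 };

	add_link(&lz, 256, 0xdeadbeefULL);	/* original entry */
	add_link(&lz, 259, 0xfeedfaceULL);	/* second hard link */

	/*
	 * Enumerate every directory entry referring to this file; each
	 * (parent, hash) pair allows a direct probe, never a scan.
	 */
	for (int i = 0; i < lz.nlinks; i++)
		printf("link in dir %d, name hash 0x%llx\n",
		    lz.recs[i].parent_obj,
		    (unsigned long long)lz.recs[i].name_hash);
	return (0);
}
```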
So it's essentially a list of those, and we'll quickly see what that looks like. Once you do this, it's easy, because you have the entire list of parent directories that are pointing to this file, along with the corresponding link-name hash values.
You can easily list out the various directory entries that are pointing to this particular dnode. zfs diff itself right now lists the first entry that's found, but you can play around with it; we actually added a utility which can list the various entries. We'll get to that in a bit. If you see here, we have an example where object 5 is actually hard-linked and has multiple links on it.
In this case there are two links pointing to it. It has a parent id, 256, that's pointing to it, and we have a links ZAP, object 384; in the links ZAP there's the second entry, 259, which is also pointing to it. The other output you see here is the links-ZAP output.
It lists both the one that's present in the dnode itself, 256, and the second one, 259, so you can list all the hard links that are referring to this dnode. Those are the two changes we made with respect to the on-disk part. Moving on: we did some additional optimizations there, along the lines mentioned earlier.
The top-level directories are common to many paths as you get closer to the root of the dataset, and those would be dereferenced a large number of times, depending on how many entries have changed. For example, if you look at the change below: if there were 200 changed files, we would be dereferencing directories a and b multiple times, and c itself would be dereferenced about 100 times.
So the benefit of caching c is high, the benefit of caching b is even higher, and the benefit of caching a is really high, because it's at the top of the tree; the benefits go up as you move toward the root. So what we did was build a cache in userland. Just a note to remember here: we are not using the standard zfs diff.
Instead, we hooked into libzfs, and that's where we're using this, but the same can be applied to the standard libzfs as well; we could probably build a cache there. The other change we made, in order to utilize this, was to modify the ioctl that does the dnode-to-path translation: we pass in a flag saying don't translate the entire path, just fetch us the name, and that's sufficient.
Once you fetch just the name (and by the way, because of the earlier link-name-hash and links-ZAP changes, the fetch is quick, it's constant time), if you know it's a directory, you stash it into a cache. For files you don't need to, because files will just be looked up once, but directories you stash into the cache, and the next time around, if you want to walk up the path, you can just take it from the cache and use it. So that's the additional change.
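A sketch of that userland cache under the same toy assumptions as the earlier snippets; fetch_name() is an invented stand-in for the modified ioctl, and the point is that each common ancestor directory costs one kernel trip while every later path walk is pure memory lookups.

```c
#include <stdio.h>
#include <string.h>

struct cached_dir { int obj; int parent; char name[64]; };
static struct cached_dir cache[256];
static int ncached;
static int ioctl_calls;			/* counts simulated kernel trips */

/*
 * Invented stand-in for the modified ioctl: fetch just the name and
 * parent of one object (constant time thanks to the link-name hash).
 */
static void
fetch_name(int obj, int *parent, char *name)
{
	ioctl_calls++;
	switch (obj) {			/* canned toy namespace: /a/b/c */
	case 2: *parent = 1; strcpy(name, "a"); break;
	case 3: *parent = 2; strcpy(name, "b"); break;
	case 4: *parent = 3; strcpy(name, "c"); break;
	}
}

static struct cached_dir *
lookup(int obj)
{
	for (int i = 0; i < ncached; i++)
		if (cache[i].obj == obj)
			return (&cache[i]);
	return (NULL);
}

/* Build a directory's path, consulting the cache before the "ioctl". */
static void
dir_path(int obj, char *out)
{
	struct cached_dir *c;
	char prefix[512];

	if (obj == 1) {			/* object 1: dataset root */
		out[0] = '\0';
		return;
	}
	if ((c = lookup(obj)) == NULL) {
		c = &cache[ncached++];
		c->obj = obj;
		fetch_name(obj, &c->parent, c->name);
	}
	dir_path(c->parent, prefix);
	sprintf(out, "%s/%s", prefix, c->name);
}

int
main(void)
{
	char path[512];

	for (int i = 0; i < 100; i++)	/* 100 changed files under /a/b/c */
		dir_path(4, path);
	printf("%s resolved 100 times with %d ioctls\n", path, ioctl_calls);
	return (0);
}
```

Only directories are cached, matching the talk: a changed file's own name is needed once, but its ancestors are needed for every changed file beneath them.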
So what did this get us? Or rather, before we get to that: the summary of changes is shown here. But what did it really get us?
The initial implementation was in Python, which was fairly simple to get going, and it reduced the time from 60 minutes to about 10 or 15 minutes. Then we started hitting Python issues, so we stripped that away and switched to C++, and it has come down to about 15 to 20 seconds.
Even if we call it 30 seconds, going from 60 minutes to 30 seconds is a huge reduction. This has shipped right now: it's part of the backup APIs that we provide to HYCU, NetBackup, and other vendors, and it's running on customer machines, so it's fairly well tested.
What effort would it take to move this into the community? Of the changes we've made, there are two system attributes that we added, which I think are easily portable to the community code.
There's an extra dnode that we consume, but again only if the file is hard-linked, so that's the additional cost we pay. Then there's some change in the ZIL, where we had to handle the additional properties that we're passing in. So that's the summary of changes; I think we can open up for questions.