From YouTube: June 2023 OpenZFS Leadership Meeting
Description
Agenda: RAIDZ Expansion; GitHub Action Runners; next-gen Dedup
full notes: https://docs.google.com/document/d/1w2jv2XVYFmBVvG1EGf-9A5HBVsjAYoLIFZAnWHhV-BM/edit#
A
All right, welcome to the June 2023 OpenZFS leadership meeting. I have a couple of exciting topics on the agenda today. Don, why don't you kick it off?
B
Okay,
yeah
I
was
I'm
gonna.
Do
some
contract
work
to
see
the
raid
Z
expansion
to
completion
I
know
there
was
there's
been
somewhat
of
a
stall
in
recent
in
in
the
recent
past
and
I've,
so
I
just
want
to
give
that
update.
I
started
on
it
last
week
and
successfully
rebased
the
current
master
and
have
been
writing
tests.
B
I've observed a few issues that I'm working through right now, so hopefully I'll have a pull request open — I would guess this week, if I can resolve the issues — and then we can move forward and iterate it to completion.
A
Awesome, I look forward to working with you. If there's anything I can do to help out — walk through the code or anything — happy to help out.
B
That's great, yeah. Having been entangled with it while debugging these things up to now, I'm sure I'll have a big list of questions about how things work, or why things were made the way they were. I've been going through the test suite tests and annotating those, and I'll probably also have some questions as to why we're testing things the way we are. Those might be for — I can't remember his name, but it was v-stack, I think.
B
But it's a pretty good set of tests to start with, so I'm grateful that those tests are there, because they really exercise the code. And I think the ztest side — the last bit of work, those pull requests that you had mentioned to me, Matt, were all about ztest — so my plan is to get to that second, after I get all the test suite stuff working.
B
Makes sense. Because, you know, it's more of a hobbyist thing, I guess — so TrueNAS SCALE would probably pick it up.
A
Man, the FreeBSD Foundation sponsored the bulk of the work — or rather, they sponsored all the work that I did, yeah.
D
The first round, back in like 2016 or whatever — I think iX was part of that, then.
A
I mean, they were doing work — doing some development and debugging and stuff. So, you know, just contributing to the open source collaboration aspect, which is great.
A
Cool. Well, I know that work is long overdue, and I look forward to seeing it completed and put into production. You know, I regret that I wasn't able to get it all the way done, but if I can do anything to help out, I'm happy to do so. Cool.
E
Allan — just to add a quick comment on that too: iX has definitely had an interest in this for a long, long time. Yeah, as Matt said, the Foundation was the primary sponsor of the bulk of the work from the beginning; I think iX did contribute a small amount early on as well, right?
E
Yeah, but in any case, you know, iX has certainly had an interest in it for a long time, and the Foundation was interested in it as well. So I'm very happy to see that it's finally going to come to closure.
E
Yeah, I'll send you a note offline and we can see about it. I mean, I've got some interns and such as well that might be able to try out some testing at the same time, and things like that — so we'll see if there's a good way to collaborate on that.
A
Cool, that's great. Well, I look forward to seeing your new PR and closing my very outdated one.
A
All right, next item on the agenda: we have Tino, on testing.
F
Is it seen? Yeah? Yes. So, we got access to a lot of machines. This is an overview — I'm sorry, this is in German currently, but you'll see. I mean, we got access to PowerPC 64, little-endian or big-endian — that's up to us, whatever we want to use — and also ARM CPUs, and instances at a maximum of 10, and they already come with some different distributions.
F
They provide action runners for us in GitHub, so we can just define our YAML stuff and it will run on every machine, and after a run it will completely reset and then be free for the next one. Of course, this is the list which is currently active on my personal GitHub, in its settings; we can just pull this over to OpenZFS when it's really done, when it's ready for distribution. And I can of course show an action which ran just a few minutes ago.
F
So this is an ARM system with the beginnings of a setup — not currently finished.
F
Here we have the runner. It's a special runner — it's a different project, from ChristopherHX, and he has built it with the Go language.
F
Maybe we have a list? Yes — in the releases, we have all the different systems which are supported.
F
This is the full list, and you see we can run a GitHub action runner maybe everywhere with this thing — but currently no PowerPC on FreeBSD. PowerPC is only supported on Linux currently, because Go is the limit: the Go language isn't ported to FreeBSD on PowerPC yet. When this is done, OpenZFS can also be test-compiled and tested there, and all the testing can be done with GitHub as well.
F
This one is also running on PowerPC little-endian, CentOS 8, and so on and so on. So all these 25 to 27 boxes I currently have could be in this list, and they would run building and testing of OpenZFS. So I think this would be really cool, when this is finished and we have it upstream. Yes.
A
That's great. The testing that you're doing on these machines — is it just building ZFS, or is it running, like, the ZFS test suite, or ztest?
F
The full test suite, of course. Okay — so, the full thing. And we can do this: we have the okay from the director of the open source lab there.
A
Cool, that's great. And so the other question is about resource utilization. You have that list of runners; it looked like there was one of each type.
A
Yes — like, each OS and CPU architecture. Is the idea that when you open a new PR, it would kick off a build on all of those different ones, and then, if there are two PRs open at the same time, one of them would be running the tests and then, when the machine frees up — there's only one of each, like, there's one arm64 Alma Linux 8 machine, and that's going to run the tests for one PR at a time?
F
Currently it's limited by this number, 10: we have only 10 instances, and we have 10 different system types. But if we take away some distributions here — say we have two Debian 11 — then we can maybe have more power for the real tests. I don't know how long the full testing suite would run on all these machines. I did it one day with PowerPC, and the PowerPC machines are on NVMe.
F
They are really fast, so I think maybe two or three hours for the full testing, but not more. The ARM and the AMD systems are a bit slower. But AMD — of course, you may also take this over: this is the thing, how it's done and set up. Also some OpenStack with our own AMD machines, and then we can also start another cluster of new runners.
D
Amazon had offered to donate resources to run more of the arm64 testing on their Graviton instances as well. It's mostly, I think, a matter of coordinating that with them and actually hooking up the stuff to run those builds.
A
Yeah. Tino, do you mind stopping your screen sharing, so that it's not, like, going through a couple of different tops? Yeah — so, Amazon has given some resources to the OpenZFS project, and, Tony, do you want to talk about what you're working on, moving some stuff over to there? So far, I think we're just doing the x86 stuff, but it definitely could be possible to use that for the arm testing as well.
C
Yeah, yeah, I can go into it a little bit. So I've been working on a couple of different things with our OpenZFS AWS account. The first thing is moving over our S3 bucket that hosts all of our RPM repositories, from the Livermore National Lab S3 bucket to the OpenZFS one. The second thing would be moving buildbot to run on our OpenZFS AWS account. And then the third thing I'm working on is looking into the future.
C
So right now, as far as Linux goes, I think they just support, like, Ubuntu, but we need to test it on Fedora and CentOS. So I've been kind of testing that in the shadows. I've got it to basically run hello-world in an instance, but that's kind of like 80% of the work, because there's just a lot of setup you have to do.
A
It sounds like the solution of having a GitHub runner that then controls another machine is maybe similar to the thing that Tino is showing us — to, like, create a GitHub runner on this other architecture.
A
Cool. Well, I mean, that all sounds great. I think expanding our test coverage is wonderful, and so I look forward to seeing both of these projects move along. Questions for Tony, I guess? We got there by talking about arm64 testing. Tony:
A
Have you thought at all about using the Amazon Graviton stuff for testing?
A
Cool. Yeah, I mean, those are the ones that, personally, I'm interested in — making sure that those instances work. I think there are probably a lot of folks, at least people who are doing ZFS in the cloud — that's probably at least the second most popular architecture that they may be running on.
A
Yeah, yeah. I mean, it's something where, maybe, if we don't have the money or resources to test everything fully on every OS and platform combo, we can do the full tests on the cheapest ones, which might be AWS Graviton, and then do, you know, a more limited number of tests for every PR on the other platforms.
A
Cool. Other topics for today's meeting?
G
He mentioned to me that he'll be on vacation this weekend; he should return next week and start the branching process. Okay — he could have started before, but he was already traveling somewhere, so that makes sense. I can't wait for it to happen; we are stretching it quite a lot. In a related context: if somebody has time to review a few pages, I have a half dozen different optimizations, both bigger and smaller, open right now.
D
Does anyone happen to know — kind of similar to Alexander's point — what the cutoff will be? I'm guessing, in order to get into the next long-term release of Ubuntu, we'd have to have it out in time for the next non-long-term release of Ubuntu, since they have a release to test with before, or whatever. Like, how soon does 2.2 have to be ready to make sure it makes it into the next Ubuntu?
D
Right — that's when it'll be released, so it'll...
A
Sounds good. What else do folks have to talk about today?
G
There's one more thing, to see if somebody would want to comment: there's a PR open to increase dedup block sizes from the current 4K to something bigger, and there's an ongoing discussion of how high it could be. The current consensus is to try to go to 16K — I was wishing to go higher, others are trying to push it lower. But if somebody has other ideas: a small PR is open, practically just changing the defaults and making them tunable.
G
So, Allan — seeing as your team is also working in that area, I would appreciate your comments, if you already have any numbers. Like, I have mentioned you there already, from the point of view that maybe it doesn't make sense to touch it right now, while this development is already going on in parallel — but it seems like, yeah, he's interested in pushing it. So, if you could comment.
D
Yeah, and I can see the point: you know, the work we're doing won't be in 2.2 if it starts shipping in two weeks, and so do we want to at least have tunables by then? I don't know. But I agree with you — I don't know that changing the defaults is a good idea at this point. We've mostly been focused on the work to make the amplification not as bad, and then once that's done —
D
We expect that increasing those indirect and leaf block sizes will have much less of a penalty. But until we're there, we don't have as many measurements as we'd like, though we can do some in the interim to validate what it looks like. Because we saw — with a different customer, completely unrelated —
D
It turned out they were using dedup, and while doing some benchmarking we saw transactions stalling for multiple seconds, just iterating over the frees. Just the fact that a lot of blocks were overwritten in this transaction meant that the transaction sat there spinning at the end — and that was quite sub-optimal: you know, they're doing fio, and then there's just a pause for seconds where no new writes happen, because the transaction is flushing and the next transaction group is already full, or it hit the dirty max.
D
So yeah, there's a lot to be done there, but I'm with Alexander that, at the moment, I think increasing the leaf size would just make it worse.
D
If there's interest, I suppose — but right now, what Klara is working on is just a different way of doing dedup.
D
One that will not suffer as many of the problems — kind of taking some inspiration from Matt's original log-dedup idea, but trying to deal with some of the limitations it had: the whole DDT having to be in memory all the time, and the way Matt's version, when it condensed the DDT, had to rewrite the entire thing as one transaction group — which, again, could lead to that thing where the whole system sits there and waits until that transaction is done writing out.
A
Yeah, I think, at a high level, the stuff Allan's talking about is, like, an implementation detail of the current dedup property and how that all works — versus the block reference table, the BRT thing, which is a totally different mechanism.
A
I understand that it could be used for, like, after-the-fact dedup, but I don't know that anybody is working on that right now.
D
You know, it's a science experiment I've wanted to play with in the back of my head — especially for something like, "oh, I have these two VMDK files that are similar; go find the blocks that are the same and use BRT to fix them." But with our current work on dedup, maybe we can get dedup to not suck enough that you don't have to worry about trying to do offline dedup.
D
All right, so, just a quick slideshow here, talking about some of the problems with dedup, and then how we attempt to address them with a laundry list of design changes to dedup.
D
So the biggest thing is that, with dedup, you have to do the read before the write. Every time we get a write, if the DDT is not cached in memory, we have to do a random read to disk — or usually multiple, through the indirect blocks and then the leaf block — to find out if this particular hash is in the DDT. And, kind of to what Alexander was talking about before in that pull request, part of the issue is that we store those leaf blocks in very small records.
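To make the shape of that concrete, here is a minimal C sketch of the read-before-write flow. Every name in it (ddt_cache_lookup and so on) is a simplified stand-in, not the actual OpenZFS API; the point is only that each deduped write implies a DDT lookup, and a cache miss turns into synchronous random reads before the write can proceed.

    /* Hypothetical, simplified sketch of the dedup write path. */
    #include <stddef.h>
    #include <stdint.h>

    typedef struct ddt_entry {
        uint64_t dde_refcnt;        /* block pointers sharing this data */
    } ddt_entry_t;

    extern void checksum_sha256(const void *buf, size_t len, uint8_t out[32]);
    extern ddt_entry_t *ddt_cache_lookup(const uint8_t cksum[32]);
    extern ddt_entry_t *ddt_read_from_disk(const uint8_t cksum[32]);
    extern void write_unique_block(const void *buf, size_t len,
        const uint8_t cksum[32]);   /* allocate + insert a new DDT entry */

    void dedup_write(const void *buf, size_t len)
    {
        uint8_t cksum[32];

        checksum_sha256(buf, len, cksum);        /* hash the data first */

        ddt_entry_t *dde = ddt_cache_lookup(cksum);
        if (dde == NULL) {
            /* Miss: one or more random reads (indirect blocks, then
             * the leaf block) must complete before the write can. */
            dde = ddt_read_from_disk(cksum);
        }
        if (dde != NULL)
            dde->dde_refcnt++;                   /* duplicate: take a ref */
        else
            write_unique_block(buf, len, cksum); /* truly new data */
    }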
D
They're 4K each, and even the indirect blocks are also limited to 4K, instead of the default 128K for each — but that's because of the write implementation problem, which we'll get to in a second. The other problem is that the DDT is sorted by the hash. So if, during this transaction, we wrote 100 new blocks, the hash means that those are going to be roughly evenly distributed across the entire DDT.
D
So, for each hash that we have, the key is 384 bytes, but the body is even bigger, because we have the three DVA slots multiplied by four sets — so we actually store 12 DVAs for every dedup entry: one set of DVAs for a copies=1 block, one for copies=2, and another for copies=3. And then there's also still the deprecated support for ditto blocks, where, if we dedup a block a lot, we'd write an extra set of copies of it. So that means we're carrying a lot of these very large DVA slots in every entry.
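As a rough illustration of the layout being described, here is a simplified C sketch of the classic entry shape, modeled loosely on the ddt_key_t/ddt_phys_t structures in the OpenZFS tree; the names and sizes here are illustrative rather than exact:

    /* Illustrative sketch of a classic DDT entry, as described above. */
    #include <stdint.h>

    typedef struct dva { uint64_t dva_word[2]; } dva_t;

    typedef struct ddt_key {
        uint64_t ddk_cksum[4];   /* the 256-bit block checksum */
        uint64_t ddk_prop;       /* encoded lsize/psize/compression */
    } ddt_key_t;

    /* One "phys" per copies= class: the transcript's "four sets". */
    enum { DDT_PHYS_DITTO = 0,   /* deprecated ditto blocks */
           DDT_PHYS_SINGLE = 1,  /* copies=1 */
           DDT_PHYS_DOUBLE = 2,  /* copies=2 */
           DDT_PHYS_TRIPLE = 3,  /* copies=3 */
           DDT_PHYS_TYPES = 4 };

    typedef struct ddt_phys {
        dva_t    ddp_dva[3];     /* three DVA slots */
        uint64_t ddp_refcnt;
        uint64_t ddp_phys_birth;
    } ddt_phys_t;

    typedef struct ddt_entry {
        ddt_key_t  dde_key;
        ddt_phys_t dde_phys[DDT_PHYS_TYPES];  /* 4 sets x 3 DVAs = 12 DVAs */
    } ddt_entry_t;

Most entries only ever populate one DVA of one set, which is what makes this layout wasteful and motivates the changes discussed next.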
D
When, most of the time, almost all blocks that dedup are going to have copies=1 — and even if they don't, there are ways around that. And then there's the I/O amplification: especially if you're doing a zvol with, say, an 8K record size or volblocksize, then every time you write 8K for that — or for a database or whatever — you're also writing a 4K block of the DDT and its indirect blocks. Then you can end up with —
D
— more of your IOPS going to updating the dedup table than actually writing data to the disk, and that can really hurt. The same thing can happen with just raw write bandwidth. So, part of that: the first thing would be implementing a dedup quota.
D
But for that to make sense — you know, Klara tried to implement that in the past, but the problem is, without the ZAP-shrinking work, the dedup table never got smaller. So as soon as you ever hit the limit, no matter how many blocks you erased, you'd never get any new entries in the future. With the ZAP-shrinking work, we're now able to actually shrink the DDT and get room back.
D
So if a block's been in there and it hasn't deduped in the last hundred thousand transaction groups, or whatever criteria we come up with, then we could remove it from the table. That requires some special processing, so that when we free the block and we see the dedup flag, we don't assert that it must be in the dedup table. Instead, we know that if it's not in any of the dedup tables and the D-bit is set, it must have been on the unique table and gotten purged, and so it is safe to free it.
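A minimal sketch of that pruning idea, under the criteria described (entry still unique, and older than some cutoff); every name here is hypothetical, not the actual patch:

    /* Hypothetical pruning pass over the unique-entries table. */
    #include <stddef.h>
    #include <stdint.h>

    #define DDT_PRUNE_TXGS 100000   /* illustrative cutoff from the talk */

    typedef struct ddt_entry ddt_entry_t;
    extern ddt_entry_t *ddt_iter_next(void **cookie);  /* walk the table */
    extern uint64_t dde_refcnt(const ddt_entry_t *);
    extern uint64_t dde_birth_txg(const ddt_entry_t *);
    extern void ddt_remove(ddt_entry_t *);

    void ddt_prune_unique(uint64_t current_txg)
    {
        void *cookie = NULL;
        ddt_entry_t *dde;

        while ((dde = ddt_iter_next(&cookie)) != NULL) {
            /* Entries with more than one reference have actually
             * deduped something; they stay. */
            if (dde_refcnt(dde) != 1)
                continue;
            /* Old and never deduped: unlikely to dedup tomorrow. */
            if (current_txg - dde_birth_txg(dde) > DDT_PRUNE_TXGS)
                ddt_remove(dde);
        }
    }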
D
Even though it's not in the table, we know for a fact that it only ever had a single reference. So we're looking for ways to basically get rid of the oldest entries, because it's the newest ones that are most likely to dedup, right? If a block's been in your pool for a year and never deduped, it's probably not going to dedup tomorrow; but a block you wrote today has a higher chance of deduping tomorrow. But also, to deal with the cache performance effects —
D
Can we rewrite that section of it to make the blocks less sparse? That would also mean that a whole subtree of the ZAP will now be contiguous again, instead of being scattered all over the disk — meaning that prefetch will do a better job of getting us more cache hits on dedup. And I mentioned having four sets of three DVA slots for every block in the dedup table; one of the things we want to do relates to the reason for that.
D
Right now, that is there to deal with the different copies= settings. But we've got a concept where, if we have a copies=1 block in the dedup table and later somebody births a new block with copies=2, instead of having to store that in a completely separate slot of two DVAs, we upgrade the existing entry by just adding one second DVA to it. The downside to this is, if you write a copies=1 version and then a copies=2 version —
D
— when you free the copies=2 version, we're still going to keep two copies in the dedup table. We won't free the space until that hash goes all the way to zero references. But I don't see that as being a common case, where people are mixing the copies= property on a lot of the same data.
D
So it would save another couple of bytes in every entry if we only had to keep one DVA per entry — we'll have to consider that. And then we're also looking at doing better prefetch. For example, I said the frees were blocking that system for a long time. Especially with frees —
D
— we know which blocks they are; we have the block pointer. So we can maybe get more prefetching happening there, to pull that part of the DDT in, so that we're not waiting on a synchronous read later. And we're trying to look at: can we calculate the checksum of the block early enough in the write path to be able to prefetch that part of the DDT, and make a performance difference there as well for when we're actually syncing it out, especially if it's an asynchronous write?
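A sketch of that checksum-early idea; the helper names are hypothetical, and this only illustrates the reordering, not the actual OpenZFS write pipeline:

    /* Hypothetical: overlap the DDT read with the rest of the write path. */
    #include <stddef.h>
    #include <stdint.h>

    extern void checksum_sha256(const void *buf, size_t len, uint8_t out[32]);
    extern void ddt_prefetch_hash(const uint8_t cksum[32]); /* async read-ahead */
    extern void continue_write_pipeline(const void *buf, size_t len);

    void dedup_write_with_prefetch(const void *buf, size_t len)
    {
        uint8_t cksum[32];

        /* Compute the checksum as early in the write path as possible... */
        checksum_sha256(buf, len, cksum);

        /* ...so the relevant DDT leaf can be read in the background while
         * compression, allocation, etc. proceed; by the time sync context
         * needs the entry, it is hopefully already cached. */
        ddt_prefetch_hash(cksum);

        continue_write_pipeline(buf, len);
    }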
D
The other big thing is: can we fan out the ZAPs? Right now we have these two giant ZAPs, but if we split them, say, on a histogram of the record size — we know that a block that's 4K can't have the same hash as a block that's one meg — so can we look in a smaller ZAP and have more locality? More of the blocks of the same size will be in the same ZAP, and maybe that improves the performance of reading and writing there.
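A tiny sketch of that sharding rule — since entries for blocks of different logical sizes can never match, the size class can pick the ZAP. The constant and helper are invented for illustration:

    /* Hypothetical: route entries to a per-size-class ZAP shard. */
    #include <stdint.h>

    #define DDT_SHARDS 12   /* one per power-of-two block size, 512B..1M */

    static unsigned ddt_shard_for_size(uint64_t lsize)
    {
        unsigned shard = 0;

        /* 512B -> shard 0, 1K -> 1, ..., 1M -> 11.  Each lookup then
         * only ever searches its own, smaller and more local ZAP. */
        for (uint64_t sz = 512; sz < lsize && shard < DDT_SHARDS - 1; sz <<= 1)
            shard++;
        return (shard);
    }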
D
But the biggest part is the log dedup — a different version than Matt's original one.
D
The idea is: write an append-only log and maintain those changes in memory, until some criteria — like, it's been too many transaction groups, or the log's too big or too old — and then flush those out to the normal ZAP data structure. So it's kind of a hybrid between the log spacemap and Matt's original log dedup: we keep a log, so that creating — birthing — new blocks is just appending to this log object, but eventually we do flush it back to a normal ZAP, so that we have the option of not keeping the entire DDT in memory at all times. Especially if we have a dedicated fast device to store the DDT, we're not going to need the whole thing in memory all the time. It gives us back —
D
— this ability to, you know, fault in — to page out the DDT, basically, and be able to pull the entries off disk again. And once we reach a certain size or age or whatever, then we would flush these logs, so that we don't have to replay them at import — to make sure the time for import or failover doesn't end up getting long — while grouping a whole bunch of changes into, hopefully, a smaller ZAP, with the mix of the sharding and this logging.
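A compressed sketch of that hybrid. Every name is invented; the shape is the point — appends in the write path, an in-memory view for lookups, and a periodic batched flush into the ZAP:

    /* Hypothetical log-based dedup, loosely following the description. */
    #include <stdint.h>
    #include <string.h>

    typedef struct ddt_log_entry {
        uint8_t dle_cksum[32];
        int64_t dle_refdelta;     /* +1 on birth, -1 on free */
    } ddt_log_entry_t;

    extern void log_append(const ddt_log_entry_t *);     /* cheap, sequential I/O */
    extern void memtable_apply(const ddt_log_entry_t *); /* in-memory view of the log */
    extern int  log_too_big_or_old(void);
    extern void zap_apply_batch(void);  /* one txg of batched ZAP updates */
    extern void log_truncate(void);     /* nothing left to replay at import */

    void ddt_log_change(const uint8_t cksum[32], int64_t refdelta)
    {
        ddt_log_entry_t e;

        memcpy(e.dle_cksum, cksum, sizeof (e.dle_cksum));
        e.dle_refdelta = refdelta;

        log_append(&e);       /* birthing a block is just an append... */
        memtable_apply(&e);   /* ...and lookups consult this until the flush */
    }

    void ddt_log_maybe_flush(void)
    {
        if (!log_too_big_or_old())
            return;
        zap_apply_batch();    /* batch the accumulated deltas into the ZAP */
        log_truncate();
    }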
D
And then, by using separate logs, it means that if we have, say, a log of just 4K writes that are being deduped, we can collapse all those writes out to the ZAP and then truncate or remove that log entirely. Instead of having logs that go on forever, we'll be able to pick a certain type of writes, flush them all to the DDT, and truncate the log — instead of having logs where we have to have a ring-buffer kind of thing and we're always scattering around on it.
D
So yeah, the main advantage to this is that we don't end up having to fit the whole DDT in memory, so we don't have to have a limit on its size. If you have, you know, a two-terabyte NVMe to store your dedup table, you can have two terabytes of dedup entries, if you think that's a good idea. But it means most of the memory requirement of this is ARC.
D
So it's evictable — whereas in Matt's original log-dedup design, that memory was all kind of overhead, and you couldn't ever not have the dedup table in memory. So if the table wanted to be too big, you just had no choice but to have that much memory.
D
That's right. Right now, I think it's about 176 bytes of RAM for each entry that it keeps in memory, and something like 200-and-something on disk — I forget the exact numbers. But by doing some of the things like what Matt talked about — if we limit it to just one DVA in total — then we get very different numbers for how much memory it'll take. We can reduce that by a lot, and so we can get the entry smaller.
D
Then it will make a bigger difference in how compactly we can store it. And if we can reduce the amplification cost so that we can allow a larger record size, that will allow the dedup table to be compressed again — because right now, if you have 4K sectors and 4K blocks, we just disable compression. If we can get that compressed, then we could cache more DDT in the same amount of memory, thanks to compressed ARC.
A
Is there any other, like, look-up-able on-disk structure like the ZAP? Presumably you are rewriting that ZAP in its entirety: like, when you flush the log, you're basically going to say, okay, now we're reading the whole existing ZAP that corresponds to that log, and then we're writing out all the blocks of it — right?
D
No — basically, the log will be appending the things that we're going to change about the ZAP, so basically increments and decrements of refcounts. And then, when that log gets big enough or old enough, we will apply those changes as one transaction group to the ZAP. So we won't have to rewrite the whole ZAP.
D
Like, some of the ZAPs — some of the dedup tables we've seen in the wild are 100 gigabytes, yeah.
D
We should just be applying the ZAP changes as if you had made all those changes to the ZAP in bulk.
D
In summary — ideally, like you said, because we're touching so many of the blocks, we'd amortize the cost of rewriting those blocks and of the indirect blocks, and we would save a lot of IOPS by not having to rewrite so many. Yeah.
A
But in order to do that, you know, you have to be able to limit the size of the ZAP and of the logs such that the changes can fit in RAM — which I think is doable with some tweaks to the design outline. So I think maybe I'd like to understand that a little bit better. Where I was actually going was, you know:
A
If you do that, then I wonder if the ZAP is really the right data structure for the on-disk representation of this — because you're talking about making some improvements to the ZAP, which is, you know, probably non-trivial, and it's probably not an optimal data structure for this anyway. Especially given that, if you are writing the entire thing at once — say, for example, you're writing the whole thing at once — then you can pre-compute exactly how it should be laid out so as not to waste any space. As opposed to the existing hash-table structure, where you're going to have some amount of empty space in every leaf block, kind of by design. And it sounds like you're saying that the on-disk structure of the ZAP is what you're going to be caching in memory, and that's what's going to get you the fast lookups, right? So, like, when you're looking something up, you're going to go through the ARC —
A
— that's going to look in the ZAP, and that's going to give you the value from the lookup. But we don't need to be able to modify that, like, on the fly. We can just have some data structure that's on disk: we write it once, we might be doing a bunch of lookups in it, and then later on we're going to throw that one away and write out a new one.
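One concrete shape such a write-once structure could take — purely illustrative, nothing proposed in the meeting beyond "a different data structure might be more compact" — is a packed array of fixed-size entries sorted by checksum, searched with binary search and carrying none of a hash table's slack:

    /* Hypothetical write-once, read-many dedup table. */
    #include <stdint.h>
    #include <string.h>

    typedef struct dd_slim_entry {
        uint8_t  e_cksum[32];   /* sort key */
        uint64_t e_dva[2];      /* single DVA, per the earlier discussion */
        uint64_t e_refcnt;
    } dd_slim_entry_t;

    /* entries[] was written out once, fully sorted and fully packed. */
    const dd_slim_entry_t *
    dd_slim_lookup(const dd_slim_entry_t *entries, size_t n,
        const uint8_t cksum[32])
    {
        size_t lo = 0, hi = n;

        while (lo < hi) {
            size_t mid = lo + (hi - lo) / 2;
            int c = memcmp(cksum, entries[mid].e_cksum, 32);

            if (c == 0)
                return (&entries[mid]);
            if (c < 0)
                hi = mid;
            else
                lo = mid + 1;
        }
        return (NULL);  /* not present; caller falls back to the log view */
    }

Because the table would be rebuilt wholesale at each flush, there is no need for insert-in-place, and every on-disk byte is live data.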
A
— to be able to process one whole log in memory at once. And then I think the other kind of design decision is: is the on-disk structure also the in-memory structure? That's kind of what you're proposing: it's a ZAP on disk, and we're going to have that be the in-memory structure for reading; then there's a different structure for the log; and then, I guess, maybe there's a third data structure that's an in-memory version of the log that you can do lookups in, yeah.
D
That's how dedup normally is, and kind of our in-memory structure of what's in the log is a smaller version of that, because it has fewer fields and so on. But yeah, it makes sense to consider some other options there, as far as making it — yeah.
A
Just because, like, the ZAP is designed to be able to do lookups and modifications one at a time, with, you know, one disk access per lookup or modification. But you only care about lookups, basically, because modifications are going to the log, and —
A
Yeah — essentially, you're going to want to generate a whole new on-disk structure all at once, from the contents of the log. So, you know, a different data structure might be more compact than this —
A
— than a hash table, right. Yeah, yeah, that's the main thing: assuming that you're using the same data structure on disk and —
D
Yeah, we'll — yeah.
A
All right — any other last topics or mentions for today's meeting?