From YouTube: ZSTD Compression by Allan Jude
Description
From the 2017 OpenZFS Developer Summit:
http://www.open-zfs.org/wiki/OpenZFS_Developer_Summit_2017
All right, so my name is Allan Jude. I'm a FreeBSD committer, I co-authored FreeBSD Mastery: ZFS and FreeBSD Mastery: Advanced ZFS with Michael Lucas, and I run a video streaming company in my day job.
So the compression algorithm is called Zstandard. It was designed by Yann Collet, who also wrote LZ4, which we use heavily in ZFS; today he works at Facebook. The general concept is to get compression ratios closer to what you get with gzip, but faster, whereas with LZ4 you get less compression but more speed.
It's actually a combination of a number of different compression algorithms, including a finite state entropy encoder and a Huffman encoder. And like gzip has its nine levels, Zstandard currently has 22 levels (soon to be more), which gives you a much greater range of speed and memory trade-offs, and then there's also a dictionary training feature, which we'll talk about in a little bit.

So, just quickly comparing Zstandard to zlib (which is gzip) and LZ4: you can see that instead of about a two-to-one compression ratio, you can get closer to three to one. It's not as fast as LZ4, but it's four times faster than gzip, and these numbers are per core. So if you have a reasonable number of cores, then you're going to be much faster than your spindles, or maybe even your SSDs, and so the trade-off for having the compression is pretty low.
I originally started working on this when Zstandard 1.0 came out in the middle of 2016. The very beginning of it was quite easy.
ZFS has a nice clean API: you just add some functions to a table, and it's like, here's the buffer I want to compress, here's the buffer I want you to write the compressed version into, and here are the sizes. It was very straightforward to do, made easier because in the design of Zstandard they actually provide a way for you to specify your own memory allocator, instead of it using malloc and free.
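To make "add some functions to a table" concrete, here is a minimal userspace sketch of the shape involved. It is an approximation, assuming a simplified version of ZFS's compression table; the struct layout and the zfs_zstd_compress name are illustrative, not the actual ZFS declarations.

```c
/* Userspace sketch (cc ztab.c -lzstd); not the actual ZFS kernel code. */
#include <stddef.h>
#include <zstd.h>

typedef size_t zio_compress_func_t(void *src, void *dst,
    size_t s_len, size_t d_len, int level);

/*
 * Hypothetical wrapper with a ZFS-style contract: write the compressed
 * block into dst, or return the source length to signal "store this
 * block uncompressed" when compression fails or does not help.
 */
static size_t
zfs_zstd_compress(void *src, void *dst, size_t s_len, size_t d_len, int level)
{
	size_t c_len = ZSTD_compress(dst, d_len, src, s_len, level);

	if (ZSTD_isError(c_len) || c_len >= s_len)
		return (s_len);
	return (c_len);
}

/*
 * A table row pairing the property name with the callback, in the spirit
 * of ZFS's compression table (field names and order here are assumptions).
 */
static struct compress_entry {
	const char *ci_name;
	int ci_level;
	zio_compress_func_t *ci_compress;
} compress_table[] = {
	{ "zstd", 3, zfs_zstd_compress },
};
```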
So it was easy to adapt that to say: here, hook up to the FreeBSD kernel memory allocator. Although in the 1.0 version of Zstandard they also used a lot of stack space, which caused all kinds of grief; luckily, in later versions they offered a heap mode, much like LZ4 has, and that meant we could just allocate the memory that way.
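The hook Zstandard exposes for this is ZSTD_customMem, a struct of alloc/free callbacks accepted by ZSTD_createCCtx_advanced() in the advanced API. Below is a minimal sketch of wiring it to FreeBSD's kernel malloc(9), assuming a hypothetical M_ZSTD malloc type; the actual glue in the port may differ.

```c
/* Kernel-side sketch: route zstd's allocations through FreeBSD malloc(9). */
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>

#define	ZSTD_STATIC_LINKING_ONLY	/* ZSTD_customMem is advanced API */
#include <zstd.h>

MALLOC_DEFINE(M_ZSTD, "zstd", "zstd compression contexts"); /* hypothetical tag */

static void *
zstd_kmem_alloc(void *opaque, size_t size)
{
	return (malloc(size, M_ZSTD, M_NOWAIT));
}

static void
zstd_kmem_free(void *opaque, void *ptr)
{
	free(ptr, M_ZSTD);
}

static const ZSTD_customMem zstd_kmem = {
	.customAlloc = zstd_kmem_alloc,
	.customFree = zstd_kmem_free,
	.opaque = NULL,
};

/* Usage: ZSTD_CCtx *cctx = ZSTD_createCCtx_advanced(zstd_kmem); */
```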
It's already been updated a couple of times; I think when we imported it, it was 1.1, and now we're up to 1.3. And in the FreeBSD base system we actually install libzstd, although we install it as what we call a private library, meaning its name is namespaced off so that only applications that are part of FreeBSD use it, whereas third-party packages that users install won't be able to find it. If something depends on Zstandard, or they want to install Zstandard, they get the version of the library from ports.
There were a few other challenges with memory. Unlike LZ4, which has a fixed context size for compression and decompression (it's slightly tunable in LZ4, but the one we're using in ZFS is just a fixed 16-kilobyte memory allocation, so there's one kmem cache), with Zstandard you get different context sizes for the compression and decompression with the different levels and different record sizes.
So the approach I took so far is to create an array of kmem caches. I picked three of the compression levels (the minimum, the default, and the maximum) instead of implementing all of them, because there are only so many spots in the enum in the on-disk format for the compression types, and it turns out we don't actually want to put all of them in there anyway. A decompression context with Zstandard is 150 kilobytes, and then the compression context varies, starting with a 16K record at the minimum compression level.
So we have an array of the compression levels and the record sizes, and we use a function in Zstandard that estimates the context size, and we create a bunch of kmem caches that we can use. Initializing those doesn't really have a cost, and they only get used if you actually start using that block size and that compression level; so, you know, those 50-megabyte kmem caches won't actually take up any memory unless you actually start compressing the 8-meg blocks of data that there are in the newer version.
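The estimating function referred to here is presumably ZSTD_estimateCCtxSize() and friends from zstd's advanced API; in modern versions, ZSTD_getCParams() plus ZSTD_estimateCCtxSize_usingCParams() let you account for both the level and the record size. A sketch of the idea, assuming the Solaris-style kmem_cache_create() interface ZFS uses; the array layout and names are illustrative.

```c
/* Sketch: pre-create a kmem cache per (level, recordsize) context size.
 * The zstd estimation calls are real advanced API; the arrays and the
 * cache naming scheme are assumptions, not the actual patch. */
#define	ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>
#include <sys/kmem.h>		/* Solaris-compat kmem_cache_* used by ZFS */
#include <sys/systm.h>		/* snprintf */

static const int zstd_levels[] = { 1, 3, 19 };	/* min, default, max */
static const size_t zstd_recsizes[] = { 16 << 10, 128 << 10, 1 << 20 };
static kmem_cache_t *zstd_cctx_cache[3][3];

void
zstd_caches_init(void)
{
	char name[48];

	for (int l = 0; l < 3; l++) {
		for (int r = 0; r < 3; r++) {
			size_t sz = ZSTD_estimateCCtxSize_usingCParams(
			    ZSTD_getCParams(zstd_levels[l],
			    zstd_recsizes[r], 0));

			(void) snprintf(name, sizeof (name), "zstd_%d_%zu",
			    zstd_levels[l], zstd_recsizes[r]);
			/* A kmem cache costs ~nothing until first use. */
			zstd_cctx_cache[l][r] = kmem_cache_create(name, sz,
			    0, NULL, NULL, NULL, NULL, NULL, 0);
		}
	}
}
```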
So this led to a question. There are nineteen levels, or twenty-two if you count the ultra modes, in Zstandard, but in the on-disk format we only need to know that a block is Zstandard: when we go to decompress it, we can use that decompressor. We don't actually need to know which of the 22 levels was used to compress it in order to decompress it.
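In other words, the on-disk value only has to name the algorithm, with the level as write-side policy. A sketch of what that could look like, using hypothetical ZIO_COMPRESS_ZSTD* names rather than whatever the prototype actually uses:

```c
/*
 * Hypothetical additions to the on-disk compression enum: the level is
 * a write-side policy baked into the table entry, while decompression
 * only needs to know "this block is zstd".
 */
enum zio_compress {
	/*
	 * ... existing values: ZIO_COMPRESS_LZJB, ZIO_COMPRESS_GZIP_1
	 * through _9, ZIO_COMPRESS_LZ4, ...
	 */
	ZIO_COMPRESS_ZSTD_MIN,	/* zstd level 1 */
	ZIO_COMPRESS_ZSTD,	/* zstd default level (3) */
	ZIO_COMPRESS_ZSTD_MAX,	/* zstd level 19 */
	ZIO_COMPRESS_FUNCTIONS
};
```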
But I'm having trouble reasoning about how to handle that, because, you know, if you set the compression level for Zstandard to ten and then you switch to gzip, ten isn't a valid gzip compression level; and the same with LZ4, which doesn't really have compression levels, although it has something we could use like a compression level. I don't know how to have a property that's very tightly coupled with another property, where, you know, what happens when you change from Zstandard to LZ4?
So for now, instead of filling up the enum, I just created the minimum, the default, and the maximum compression levels in the prototype. I did a little benchmark here, compressing the Silesia compression corpus (a standard benchmark for compression): with the minimum level you can compress about 335 megabytes a second per core, with a compression ratio of 2.8 to 1.
Whereas looking at gzip: with the minimum gzip level you only get 2.7-to-1 compression at 77 megabytes a second, and at about that same speed you could get 3.4-to-1 compression with Zstandard. And with gzip-9 you're barely getting the compression ratio that Zstandard would get at 20 times the compression speed, so you can get a lot more throughput.
A couple of weeks ago I was in Paris for EuroBSDCon and was talking to one of the vendors that was there; they run a payment processor in Europe. It's mostly an append-only database, and I was helping them debug some performance problems they were having, and the first thing I noticed for their MySQL database
was that they were using a 128K record size. I assumed it was because they didn't know better, but when we discussed it with them, it turned out they actually do it on purpose, because they get a better compression ratio, and they have a 20- or 25-terabyte database that has to fit entirely on SSDs, and they can only afford so many SSDs. And since it's mostly an append-only database,
they don't get as much write amplification as you would with random access, but they're using the larger record size because it got them an extra, like, 0.5 to 1 on the compression ratio. So obviously stronger compression that would still be fast enough might be quite interesting to them. So I grabbed a database that we have at work, which is our ticketing database for a pay-per-view system; it's about 14.2 gigabytes with LZ4.
We get about 3.8-to-1 compression with the regular 16K blocks we actually use in the database, but if we scale that up to 1-meg blocks in the database, we actually get 5.4 to 1 even with just LZ4, and writing that data takes about 50 seconds, to write the 14 gigs and have it be compressed with gzip.
One of the other interesting features I touched on: part of the reason why Facebook is so interested in Zstandard is that it has this custom dictionary training compression mode. Their main goal with it is that someday browsers will support Zstandard, and they'll be able to send their, you know, ten JSON messages that are based on the same dictionary but with different content, compressed with a custom compression dictionary that is able to abstract out the repeated parts of the structure of the data. And so I was digging through the ZFS code and wondering:
if we wanted to actually offer this to end users, to be able to say "here's a dictionary, or some dictionaries, for the types of files I'm going to write into this dataset", what would the ZFS API for the user to load that dictionary into ZFS look like, and how would we manage them? I don't know what that would look like.
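For reference, the dictionary workflow inside the zstd library itself looks roughly like the sketch below; ZDICT_trainFromBuffer() and ZSTD_compress_usingDict() are real zstd entry points, but how a trained dictionary would be handed to ZFS is exactly the open question.

```c
/* Userspace sketch of zstd dictionary training and use (link with -lzstd). */
#include <zstd.h>
#include <zdict.h>

/*
 * Train a dictionary from a set of samples. samples points at all sample
 * bytes back to back; sizes[i] is the length of sample i. Returns the
 * trained dictionary size (or an error code; check with ZDICT_isError()).
 */
size_t
train_dict(void *dict, size_t dict_cap,
    const void *samples, const size_t *sizes, unsigned nsamples)
{
	return (ZDICT_trainFromBuffer(dict, dict_cap,
	    samples, sizes, nsamples));
}

/* Compress one small record using the trained dictionary. */
size_t
compress_with_dict(ZSTD_CCtx *cctx, void *dst, size_t dst_cap,
    const void *src, size_t src_len, const void *dict, size_t dict_len)
{
	return (ZSTD_compress_usingDict(cctx, dst, dst_cap, src, src_len,
	    dict, dict_len, 3 /* level */));
}
```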
Another interesting thing that Zstandard has recently grown, as a contributed project, is an adaptive compression feature. It was kind of designed before we had compressed send and receive, when people would pipe a zfs send into gzip or multi-threaded gzip or whatever: this Zstandard one will actually dynamically adjust the compression level based on how fast the output is being consumed. So if you're going over a slow network link, it will spend more time compressing, but only up to the point where it's not starving the network link.
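The core of such a feature is a feedback loop: watch how full the output queue is and nudge the level up or down. A toy sketch of that control loop, assuming a hypothetical queue-fill measurement; the real contributed tool's heuristics are more involved.

```c
/*
 * Toy sketch of adaptive level selection. fill is how full the output
 * queue is (0.0 empty .. 1.0 full); it is a hypothetical measurement.
 * A slow consumer lets the queue back up, which means we have CPU time
 * to spare, so compress harder; a fast consumer drains the queue, so
 * back off before we starve it.
 */
static int
next_level(int level, double fill)
{
	if (fill > 0.75 && level < 19)
		level++;	/* slow sink: spend more time compressing */
	else if (fill < 0.25 && level > 1)
		level--;	/* sink is starving: keep it fed */
	return (level);
}
```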
So it dynamically adjusts the compression level to keep up with how fast you can drain the output buffers. That might be very interesting in ZFS, where it's like: compress it as well as we can without slowing down our writes to the disk. Although in ZFS we're writing records that are, you know, even in the best case only 16 megabytes, and in most cases 128K or 16K, so you don't have much time to adapt in that kind of context. But I know [inaudible].
So if you have any ideas of what that might look like, or what extra features you would like from a compressor to make it more integrated or more useful in ZFS, let me know. For example, LZ4 has some interesting features for that, like early abort, where it will decide that it can't compress the data into that small a buffer and won't waste a bunch of time trying to compress it. Maybe something like that would also be nice.
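The way that works with LZ4 is that the caller passes a destination buffer smaller than the source, and the compressor returns zero and gives up early when the output cannot fit. A sketch of the calling pattern; the 12.5% minimum-savings figure is an assumption about ZFS's usual policy, not taken from the talk.

```c
/*
 * Sketch: give LZ4 a destination smaller than the source so it can
 * abort early instead of producing output we would discard anyway.
 */
#include <lz4.h>

int
try_compress(const char *src, char *dst, int s_len)
{
	int d_len = s_len - (s_len >> 3);	/* require >= 12.5% savings */
	int c_len = LZ4_compress_default(src, dst, s_len, d_len);

	/*
	 * LZ4_compress_default() returns 0 when the result cannot fit in
	 * d_len, bailing out rather than compressing the whole block.
	 * A zero here means: store the block uncompressed.
	 */
	return (c_len);
}
```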
Zstandard also has both a block compression API and a streaming compression API; there might be some use for that. And what would be nice is talking about whether there are things we could do to reduce the amount of memory it takes when we're only trying to compress, like, 8K blocks: we really don't want to have to allocate a hundred kilobytes of RAM for the context.
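For a sense of the two APIs being contrasted: a one-shot block call versus a streaming context, both real libzstd entry points. For fixed-size ZFS records, the one-shot form with a reused context is the natural fit, and that reused context is where the memory in question lives. A minimal sketch:

```c
/* One-shot (block) API vs. streaming API in libzstd (link with -lzstd). */
#include <zstd.h>

/*
 * Block form: whole record in, whole compressed block out. The reused
 * cctx holds the working memory (the hundred-kilobyte-plus context).
 */
size_t
block_compress(ZSTD_CCtx *cctx, void *dst, size_t dst_cap,
    const void *src, size_t src_len, int level)
{
	return (ZSTD_compressCCtx(cctx, dst, dst_cap, src, src_len, level));
}

/*
 * Streaming form: feed input incrementally; a better match for send
 * streams than for fixed-size on-disk records.
 */
size_t
stream_compress_chunk(ZSTD_CStream *zcs,
    ZSTD_outBuffer *out, ZSTD_inBuffer *in)
{
	return (ZSTD_compressStream(zcs, out, in));
}
```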