From YouTube: Saso Kiselkov - Compression - OpenZFS Dev Summit 2014
Compression (Saso Kiselkov from Nexenta)
And we're going to be talking about, essentially, what I've been working on: compression. So first, a quick primer on compression. Mostly, when you look at the world of compression algorithms out there, they fall into two categories: you've got the archiving guys and you've got the real-time guys. So, the archivers.
The nice thing about them, though, is that they get really good compression ratios, really good, as you'll see in a minute. And the real-time guys try to do just a good-enough job, but they'll be pretty fast about it. Those are the guys down at the bottom there, if you recognize them.
On non-compressible data and on decompression, LZ4 is pretty damn quick, so this is pretty much why we bother with the real-time stuff. It's not just for crazy things like compressing memory or doing CPU-to-memory transfers. It's also used for actual data, because the data savings are complemented by the fact that we get huge performance out of the ARC, and so there's a sort of middle ground.
Sometimes it's just worth it. Sometimes we don't really care about the initial CPU usage; sometimes we're just concerned with serving the data many, many, many times, so the initial CPU cost gets sort of diluted down, while the bandwidth can cost us a lot. So it makes sense to do it right the first time around. Certain stuff even compresses super-duper well: take your average web server's text files, logs, stuff like that; it compresses down really well, up to and over ninety percent, logs even better than that.
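The split the talk describes, text compressing past ninety percent while pre-compressed data saves nothing, is easy to reproduce. A minimal sketch using Python's zlib as a stand-in for the gzip/LZ4 compressors ZFS actually uses (the byte strings here are made-up sample data):

```python
import os
import zlib

# Typical "web server" content: highly repetitive, log-like text.
text = b"GET /index.html HTTP/1.1 200 1024\n" * 4096
# Random bytes stand in for pre-compressed media: incompressible by design.
noise = os.urandom(len(text))

# Fraction of space saved by compressing each buffer.
text_saving = 1 - len(zlib.compress(text)) / len(text)
noise_saving = 1 - len(zlib.compress(noise)) / len(noise)

print(f"text saves  {text_saving:.0%}")   # well over 90%
print(f"noise saves {noise_saving:.0%}")  # essentially nothing (can go negative)
```

The log-like text shrinks by more than 90%, while the random buffer actually grows slightly once compression framing is added.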
However, certain workloads do not compress much at all: pre-compressed stuff. You've got your multimedia, your compressed archives, stuff that's already getting crunched by some compression algorithm; that's pretty much a non-target. And so when we, as administrators, look at setting up compression on our datasets, what are we looking at? Initially we look at: are we going to pay for the CPU cycles? But usually we're pretty good on storage systems; these things have reasonably fast CPUs and they're underutilized. The second question is: am I going to be getting something out of it? We ask that question and we try to intuit the answer in some strange fashion, probably akin to mind-reading what our datasets might be composed of. So we try to gauge whether it makes sense to turn it on. The funny thing is, we should not even be thinking about that.
We should be letting the machine figure that out for us. And the beauty about file systems is, usually when you look at files, they're either one or the other: they're either compressible or they're not compressible, usually. So you've got your text, you've got your documents, your uncompressed audio; that's super compressible. And you've got some stuff that's never going to compress much at all. Unfortunately for us as administrators, though, compression settings are per file system, so there's not much we can do in a fine-grained approach.
Computer science has been all about them for the last 20 years, but so far nobody's actually built a CPU capable of running the damn thing. And decompression is really just grammar processing: you get a compressed stream, which is a complex grammar of various compression primitives, and you're just trying to reconstruct the original data. The more expressive the grammar, the slower it's going to be, pretty much. So these are sort of the two areas we can understand, and we cannot do much about the uncompressible stuff, though it has this range.
Why would that be slow? Usually, as I said, we can profile pretty well whether it's going to compress or not, so why the hell are you spending time trying to figure out the same bit of information over and over again? We're basically compressing a block and forgetting the history of what we've been doing previously.
Let the user decide? Yeah, good luck trying to keep track of a zillion files and setting the compression on them. There are tricks you can do with directories and stuff like that, but pretty quickly you're going to be sick and tired of it; you're just going to turn it on or off again, which gets back to the file-system-wide setting, which originally was a sensible decision. But this is not. And of course, frequently the administrator is not even the user who's actually using the storage system.
We did not modify the on-disk format; we just have ZFS make the decision based on historic performance. There's a simple heuristic: it just checks how often I've succeeded. If I've not succeeded much at all, I'm not going to try for a while; then I'm going to retry again, and it progressively either backs off or becomes more reluctant to back off.
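The back-off heuristic just described can be sketched in a few lines. This is a hypothetical illustration, not the actual ZFS code: the class name, counters, and the doubling-with-a-cap policy are all assumptions chosen to match the description of "don't try for a while, then retry, progressively backing off."

```python
class CompressHistory:
    """Hypothetical sketch of the back-off heuristic: track recent
    compression successes, and after failures skip actual compression
    for a progressively longer window before retrying."""

    def __init__(self):
        self.skip = 0      # blocks left to skip without even trying
        self.backoff = 1   # length of the next skip window

    def should_try(self):
        """Return True if the next block is worth trying to compress."""
        if self.skip > 0:
            self.skip -= 1
            return False   # history says it won't compress; don't bother
        return True

    def record(self, succeeded):
        """Feed back the result of an attempted compression."""
        if succeeded:
            self.backoff = 1                       # success: stay eager
        else:
            self.skip = self.backoff               # failure: skip a window
            self.backoff = min(self.backoff * 2, 256)  # grow it, capped
```

After a run of failures the tracker only retries every few hundred blocks, so incompressible files cost almost no CPU; a single success resets it to eager again.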
The beauty about this approach is it works for any file, any data. You don't lose much compression ratio at all; so far I haven't actually seen any loss. And it works even for composite stuff: take your VM disks, which are full of all manner of data, and we can sort of dynamically adapt to the various write patterns going on in there.
So, yeah, there's a couple of confusing lines here, but essentially the important lines are the top one and, sort of, the bottom one. This is compression performance with a certain amount of constant input data flowing in that's not compressible; essentially your garbage.
So you can see there's about, I don't know, a 20% performance increase. Actually it's even greater, because for some reason gzip-1, the regular compression path, is slower when it gets fed incompressible crap, so the performance difference there is actually much larger.
So the algorithm works essentially by remembering its results. We track the compression performance on a per-file basis; whether we compress a block depends on what the current state of the file is. So when it has been hit by a lot of incompressible data, chances are we'll skip it; it will just be accounted as a bunch of data we tried that didn't compress, and we'll try again later on. When the file has been getting good compression, we'll try and compress it.
So the thing with LZ4's test for whether something is compressible is that it is fast, that's true, but it's sort of a quick check, so it will only loosely determine whether something's good. So there's a good chance that it will impact your compression ratios pretty severely. It's actually tunable inside of lz4.c how hard it should try, and the effect of that is:
With LZ4, you'll see it's not there: the non-compressible performance is exactly, or almost, the same as the compressible one.
It's because if you do these quick checks, chances are you'll throw away data that you could have compressed. It's, unfortunately, a trade-off. But the beauty about the smart compression approach is that we don't even have to do that check anymore; we can remember our previous performance. That's the point of the compression history.
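The trade-off above, a fast check that can wrongly discard compressible data, can be illustrated with a sketch. This is not LZ4's actual early-abort logic (that lives in lz4.c); the function name, probe size, and threshold are assumptions, and zlib stands in as the cheap probe compressor:

```python
import os
import zlib

def quick_incompressible(block, probe=1024, min_saving=0.05):
    """Hypothetical stand-in for an early-abort check: cheaply compress
    only a small prefix and declare the whole block incompressible if
    the probe saved less than min_saving of its size."""
    probe_data = block[:probe]
    compressed = zlib.compress(probe_data, 1)   # fastest, cheapest level
    return len(compressed) > len(probe_data) * (1 - min_saving)

random_block = os.urandom(128 * 1024)
# A block whose first 4 KiB is random but whose tail is highly compressible.
mixed_block = os.urandom(4096) + b"A" * (124 * 1024)

print(quick_incompressible(random_block))  # True: rightly skipped
# The trade-off: the random prefix hides a compressible tail.
print(quick_incompressible(mixed_block))   # True: wrongly skipped
```

The mixed block is over 95% compressible, yet the probe only ever sees its random prefix and gives up, which is exactly the ratio loss the per-file history avoids by judging on actual past results instead of a prefix.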
So, yeah, there are no persistent changes in that; we do not modify the on-disk state. There's one property we have, which is certainly not really a format change; it's not incompatible, and it will turn on automatically. If you have compression turned off on the dataset, it's sort of a separate setting, but by default it's on: when you do use compression on your dataset, you're going to get this feature automatically.