From YouTube: Refining OpenZFS Compression by Rich Ercolani
Description
From the 2022 OpenZFS Developer Summit: https://openzfs.org/wiki/OpenZFS_Developer_Summit_2022
Slides: https://docs.google.com/presentation/d/1og6UY010exjAANYkkmZn9qrAlO9r0TN6/edit?usp=sharing&ouid=112595186103367032517&rtpof=true&sd=true
This is going to be a talk about, mostly, a couple of experiments that I did, some of which panned out and a lot of which didn't, and how I got there, which will hopefully be more interesting than the dry topics in the slides. Well, I mean, it's fine. So, quick disclaimer: the talk is independent. I am employed by Google.
So, rather than just picking up a random thing off a library shelf: trying to update lz4, because people keep asking about that; trying to update Zstandard, because people keep asking about that; trying to add Brotli, because I've had like three people ask about that; and then an experiment that did pan out more, adding an early abort function to Zstandard compression; and then a summary, and, if we have time, a bunch of random other experiments that I didn't go into in depth.

Why did I do this?
You know, it's nice to have some of the results, like the one I'm going to show later about going from two hours to 15 minutes to compress something. Great, people keep asking, and it's nice to have an answer besides "well, it's hard and we haven't tried." And more directly, I started contributing more actively because I spent a bunch of time on leave and I wanted to get back into the habit of actually working, so I started trying to reliably contribute a little bit, and the habit stuck,
and it stuck with me. So, unique requirements about compression here: in a couple of places, ZFS assumes that if it decompresses and recompresses something, it gets the same thing back.
The one I can recall offhand is L2ARC persistence: it always saves data compressed, but if you have an uncompressed ARC, then you may be sad if there's a mismatch. So swapping out just lz4 or Zstandard or whatever for a new version will produce different results; both can compress and decompress, and both can decompress each other's output, but ZFS will get sad, and when ZFS is sad, it will throw errors.
A
This
would
also,
for
example,
cause
problems
for
like
an
operator
dedupe,
because
you
know
different
result,
different
checksum,
that
time
technically
right
now,
that's
a
problem
with
using
gzip,
because
Linux
and
everything
else
use
different
z-lib
compression
versions
and
also
a
problem.
If
you
use
the
Intel
qat
offload
stuff,
but
nobody's
really
complained
about
it.
It
just
is
true,
so
maybe
it's
not
really
a
problem.
we have to worry too much about, or maybe nobody's using gzip, who knows. But more interestingly, ZFS has tiny records that we're compressing, right, like 128K, one meg, 16 meg if you really turn it up, and a lot of things are focused on large streams of data or large masses of data at once. So, for example, Zstandard's performance tests, as far as I can tell, are mostly focused on either parallel streams or large sets of data, like tens or hundreds of megs, whereas we're never going to get that large (well, I suppose if we got really creative, we could, but why), and ZFS won't save space in tiny units anyway.
Well, if you save like another 3K on 128K, it's not worth the trade-off, so it won't make a difference, even though, you know, 3K out of 128K across your whole data set is kind of a lot.
Each of the compression algorithms currently would get to implement handling that themselves, which is not impossible (like, I have branches for this), but it is additional complexity you have to deal with, so any time you want to update it, you would have to consider whether that complexity is worth it versus what you get.
And currently, as I alluded to earlier, Zstandard, for example, does a bunch of things about parallelizing compression, and the interface for doing that gets slightly worse compression sometimes but is parallelizable. But that also means it can be unreliable, in that it doesn't necessarily guarantee you'll always get the same compressed result if you do that; that is my understanding. So that would run into problems with caveat one.
So here's how I did the graphs that are coming up. I made a couple of data sets that I thought would be differently reflective: one of them is a 20-year-old maildir I've got, which is, you know, text, so mostly highly compressible, but tiny files; a bunch of firmware blobs that I have from updating various devices over the last decade, which are mostly incompressible; and then a snapshot of my root file system, because that's just a wild card of all sorts of things. Then I wrote them at different record sizes as ZFS send streams, sent them to receives backed by fast storage a couple of times, and averaged the results.
The space saving can be kind of variable, because you're also weighing how much the metadata, among other things, compresses, and that's going to vary based on where it puts things, or, you know, the phase of the moon at the time of writing, whatever; there's variation that's going to happen naturally even if you didn't do anything differently. So if you see results below like 30 megs or something, it is probably just noise. And I did this with my Ryzen desktop,
A
My
Intel
coffee
link
desktop
my
Raspberry
Pi's
and
a
Mac
Mini,
which
seems
like
a
reasonable
set.
I
would
have
also
done
like
a
spark,
but
I
tried
letting
that
run
for
12
hours,
and
it
still
wasn't
done
so
I
I
said
no
much
as
I
enjoy
running
things
on
The
Spar,
but
you
know
surprising.
Nobody
thinks
that
are
one
CPU
intensive
task.
Essentially
a
lot
are
going
to
vary
wildly.
There was a lot of spelunking involved to find this. I was trying to figure out a good way to add backward compatibility without a feature flag change, because it seems like a shame, since the formats are forward and backward compatible. Zstandard does this by putting a header at the front with versioning information, but doing that would break old lz4.
I... I can't... I'm apparently hearing things, great. You can stick a version field on the end, in the gap between how long lz4 thinks the record is and how long it actually is, and I checked, and the old code is perfectly happy with this; it doesn't care. Great, and you can handle all the rest however you like. It's very tiny; lz4 is basically lz4.c and lz4.h, and also lz4hc.c.
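To make that trick concrete, here is a minimal sketch of the idea as described, not the patch that actually landed. It assumes the usual ZFS lz4 record layout (a 4-byte big-endian compressed length followed by the lz4 stream) and uses hypothetical names for the tag value and helpers:

```c
/*
 * Hedged sketch of the version-in-the-slack idea, with illustrative names.
 * Old code only ever reads the 4-byte length header plus that many payload
 * bytes, so a byte stashed in the slack after the payload is invisible to it.
 * A real implementation would need to guarantee the slack starts out zeroed
 * so old records never look "tagged" by accident.
 */
#include <stddef.h>
#include <stdint.h>

#define LZ4_IMPL_VERSION_NEW	1	/* hypothetical tag value */

/* After compressing: tag the record if there is slack to hold the tag. */
static void
lz4_stash_version(uint8_t *record, size_t record_size, uint32_t lz4_len)
{
	size_t used = sizeof (uint32_t) + lz4_len;	/* header + payload */

	if (used < record_size)
		record[used] = LZ4_IMPL_VERSION_NEW;
}

/* Before decompressing: was this written by the newer implementation? */
static uint8_t
lz4_peek_version(const uint8_t *record, size_t record_size, uint32_t lz4_len)
{
	size_t used = sizeof (uint32_t) + lz4_len;

	return (used < record_size ? record[used] : 0);	/* 0: old/unknown */
}
```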
Basically, it's kind of really easy to just drop it in and go have a nice day. So I tried this on the maildir and the firmware blobs, and the space delta I got, even at a one-meg record size, was like, you know, 15 megs one way, five megs another way; nothing, it's noise.
A
So
I
tried
again
to
see
how
long
it
would
take
to
write
and
read
and
mailed
her,
and
you
know:
writing
was
noise
like
there
was
no
difference
really
but
reading.
On
the
other
hand,
like
the
newer
decompressor,
that
you
know
that
that's
a
pretty
good
Delta
and
like
the
range
on
it
was
pretty
large
across
a
bunch
of
different
data
sets
like
sometimes
it
was
a
little
faster.
Sometimes
it
was
a
lot
faster,
but
it
wasn't
really
ever
slower
so
that
Arrow
wasn't
supposed
to
show
up
that
way.
Oh well. After all that, the compressor wasn't really worth it and the complexity wasn't really worth it, but the decompressor was a good win. So why don't we just take that and go? And I did, and it landed after not very much review, because it was like: it works, we can always just pull it out if it doesn't work, and it's not in a stable release right now. But I have been running it since before it got merged and I haven't found anything that breaks, so it'll be in the next release. Great. The Zstandard update is the one a lot of people have been agitating about wanting. The one we're running was released in May 2020 and merged in August 2020; I believe that's not when the PR was opened, just when that version was merged. There were a bunch of different files kind of involved.
So originally it was all aggregated into one file, because Zstandard has a thing to do that for you, and it already has built-in versioning, so no fun on-disk format meddling to deal with. One unfortunate thing is that their testing was such that 1.5.0 was so much faster that they decided to turn up the compression settings for each of the levels in 1.5.1 and newer. So, as a result, those are all slower than they were, because they figured they had performance bandwidth to burn.
Negative numbers: that's a bunch of different versions that I bolted on in addition to 1.4.5, and the difference from them to 1.4.5. So, you know, Zstandard 5 is a nice improvement, 7 and 11 are okay improvements, and the rest is noise, is how I would interpret that; your mileage may vary. But then on a different data set, or actually the same data set with a different record size, if I recall, the space usage goes up or does nothing.
A
Isn't
so
you
know
going
from
sorry
again,
I'm
hearing
things
great,
you
know
going
up
by
like
30
seconds
out
of
120
or
so
not
really
a.
And then this result, which was the incompressible data being really, really slow. It's like: oh no, this is a bad idea. So, after all that, it would complicate things to have something to handle the Zstandard versioning properly (and I'd argue for that, because otherwise, you know, people with dedup and nopwrite would be very sad indeed), but then we'd need to keep it
forever. And yeah, it's sometimes markedly slower for no better results. The early abort thing that I mentioned earlier, and am going to talk about shortly, might be very helpful for this. I thought
I had something set up with that integrated already, but I didn't, and I ran out of time, because I discovered I had done half of these tests wrong: when I was rerunning them with updates, I had half-integrated the old version and half the new one, and that wasn't going to do anything useful. So I fixed that but did not have time to rerun this, so I'm probably going to do that during the hackathon tomorrow, unless I have a better project, and we'll see how that goes. All the graphs are from after fixing that, to be clear; the previous graphs were much worse results.
A
So
the
other
thing
another
thing
I
tried
was
adding
broccoli
because,
like
a
couple,
people
came
to
me
and
said:
hey
here:
have
you
heard
of
this?
Compression
thing
was
like
I've
heard
of
it
I've
not
heard
that
many
people
use
it
that
often
but
I've
heard
of
it,
and
you
know
it
was
already
in
like
self-contains
the
great
not
like
another
experiment:
I
did,
which
was
trying
Snappy,
but
snappy
is
written
in,
go
so
not
as
convenient
sure.
A
Let's
go,
I'm
not
intended
broadly
compression
goes
like
zero
to
nine,
and
so
you
know
that's
a
fairly
simple
range:
there's
not
anything
complicated
technically,
it
says
10
and
11
too,
but
those
are
a
very
different
thing
and
should
not
ever
be
used
interactively
do
not
do
it.
It's
bad!
It's
bad,
don't
do
it.
A
It
turned
out
to
be
a
little
more
complicated
because
it
turns
out.
Rotley
wants
to
do
floating.
Point
Mass
when
it
compresses
running
floating
Point
math
in
the
kernel
can
be
sad.
They
I
asked
on
mailing
lists.
They
were
considering
adding
fixed
point,
but
they
haven't
done
it.
A
So
you
know
we
get
to
do
that.
We
put
barriers
around
every
call
in
then
it
turns
out.
The
allocator
has
a
problem
where,
when
it
says
no
sleep,
it
actually
means
no
sleep
unless
I
want
to
and
sleeping
when
you
have
preemption
disabled
is
bad.
Don't
do
that.
Linux
gets
really
mad,
though
oddly
it
doesn't
mind
on
older,
x86
kernels
for
some
reason,
but
newer
ones
or
any
other
platform.
It
gets
real
mad.
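For context, a minimal sketch of what "barriers around every call" can look like on Linux/x86, not the actual patch: the kernel_fpu_begin()/kernel_fpu_end() pair makes FPU use safe in kernel context (OpenZFS has its own kfpu_begin()/kfpu_end() wrappers), and brotli_compress_block() here is a hypothetical stand-in for the Brotli encoder entry point.

```c
#include <linux/types.h>
#include <asm/fpu/api.h>	/* kernel_fpu_begin()/kernel_fpu_end() on x86 */

/* Hypothetical wrapper around the Brotli encoder; not a real symbol. */
extern size_t brotli_compress_block(void *dst, size_t d_len,
    const void *src, size_t s_len, int level);

static size_t
zfs_brotli_compress(void *dst, size_t d_len, const void *src, size_t s_len,
    int level)
{
	size_t c_len;

	kernel_fpu_begin();	/* FP math is now safe; preemption is off */
	c_len = brotli_compress_block(dst, d_len, src, s_len, level);
	kernel_fpu_end();

	/*
	 * Note: with preemption off, any allocation inside the encoder that
	 * decides to sleep anyway is exactly the problem described above.
	 */
	return (c_len);
}
```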
This is a bunch of comparisons of Zstandard, gzip, Brotli, and nothing. Brotli is red, Zstandard-fast is light blue, and Zstandard is dark blue. All of that is mostly to be referenced later, but the point I wanted to make is that it looks like Brotli at the lower levels can be better and faster than the other options that you might have, like you can see.
A
So
you
can
see
here
that,
like
broadly
at
its
lowest
level,
is
nicer
than
say
gzip
or
Z
standard
one,
while
also
being
pretty
fast
compared
to
them.
So
you
know
not
I,
don't
I,
don't
think
that
it's
worth
the
difference,
especially
with
the
complications
and
overhead
that
I
had
to
do
to
get
it
working.
Cool. So, early abort was a feature where I thought maybe this could work, because one thing that gets talked about a lot, that you can find in lots of blogs and people talking, is that one of the reasons lz4 is nice on ZFS is that it will bail out early rather than wasting time trying to compress things, and, you know, Zstandard is another compression thing originally by the same author.
A
So
surely
it
would
do
something
similar
or
have
some
similar
functionality?
Are
we
maybe
not
using
it
because
I'm
sure
if
you've
used
the
higher
C
standard
levels,
you're
familiar
with
how
unfortunate
it
can
make
your
system?
If
you
don't,
if
you're,
not
careful
and.
So it skips through small portions at a time, so you still get decent compression even if it's a mix of compressible and incompressible data, without burning all your CPU time on things you can't compress. That's my understanding from reading the code carefully; I did not go ask the author, so if I find out I'm wrong, I'll tell people, but that's my understanding.
So as an initial experiment, thinking I'd do something more refined after this if it maybe worked, I tried gluing lz4 on as, like, a pass filter to decide whether Zstandard should compress something or not, and the initial results looked really, confusingly good. That is a non-linear axis on the left, because otherwise it's just not readable. So, a notable thing here with the incompressible data: this was on my Ryzen, and on there it took 10 minutes to write the incompressible blobs at a one-meg record size
without this change, and with this change it took about a minute and a half. So, you know, kind of a difference. And the difference in the amount of space you use was, like, you know, 100 megs, or 10 megs, or something really small.
So, you know, I'll take a fifth of the time for 10 megs out of 45 gigs, you know, great. Except if I try this on highly compressible data, then the delta gets a lot bigger; it was like two gigs or so, if I recall. It didn't take much longer, so that's fine, but the delta was like losing two gigs of compression, and that's not really okay. Okay, so that first result was really good.
Yeah, me too. But so it turns out that all of them are really bad as a first pass compared to just using lz4, in terms of space savings; they all give up worse. And I don't show it here, but the amount of time they take is also sometimes worse; the higher you go, the closer it gets to lz4, because lz4 is really the king of what it does. It's really, astonishingly good at it.
A
But
what
if
we
try
doing
both
there's
no
way
like
running
two
compression
passes.
First
is
going
to
be
time
or
space
efficient
right
right.
You know, up to like using Zstandard 2, the delta between it and just running Zstandard 3 is tiny. So how much time does that take? Let's run that test again where we do the incompressible blobs, and, you know, it looks basically the same. I think I didn't run it quite as far out, because I didn't feel like waiting for Zstandard 18 to run. So, okay, that's still good savings, great. And on the highly compressible stuff you can see the delta is like nothing.
And the time difference as well, in addition to the space difference, is still, you know, pretty negligible; actually, until you get up to like Zstandard 15, it's still basically the same, which is pretty good for, you know, taking some of the data and trying two different compressors first, before you try the thing you were actually going to do. Okay, so this was all run on, like, my high-end Ryzen; I've got a lot of cores and a lot of computation per core. There's no way this should work on a Raspberry Pi.
Actually, funny story: it does. If you do the same incompressible-data test, it goes from taking, you know, well over... I'm sorry, I said two hours here, but I'm thinking... yeah, that math is okay, I did check. It goes from two hours to like 13 minutes, and the space delta is like nothing. Okay, you know, I'll take that; I will absolutely take not taking two hours to write this at that compression level. That's great. So, I skipped over playing with different record sizes and trade-offs to decide when to do this, because ultimately I picked at least Zstandard 3, and at least 128K.
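As a rough illustration of the scheme being described (cheap passes first, then the expensive level), here is a hedged sketch using the userland LZ4 and zstd APIs. The thresholds follow the "at least Zstandard 3, at least 128K" choice mentioned above, and the structure is my reading of the talk, not the code that was merged into OpenZFS.

```c
#include <lz4.h>	/* LZ4_compress_default() */
#include <zstd.h>	/* ZSTD_compress(), ZSTD_isError() */
#include <stdlib.h>

#define EARLY_ABORT_MIN_LEVEL	3
#define EARLY_ABORT_MIN_SIZE	(128 * 1024)

/* Returns the compressed size, or 0 to mean "store this block uncompressed". */
static size_t
zstd_compress_early_abort(void *dst, size_t d_len, const void *src,
    size_t s_len, int level)
{
	size_t ret;

	if (level >= EARLY_ABORT_MIN_LEVEL && s_len >= EARLY_ABORT_MIN_SIZE) {
		void *tmp = malloc(d_len);

		if (tmp != NULL) {
			/* Pass 1: lz4 is nearly free; does it fit in d_len? */
			int ok = LZ4_compress_default(src, tmp, (int)s_len,
			    (int)d_len) > 0;

			/* Pass 2: zstd-1 catches some data that lz4 misses. */
			if (!ok)
				ok = !ZSTD_isError(ZSTD_compress(tmp, d_len,
				    src, s_len, 1));
			free(tmp);
			if (!ok)
				return (0);	/* looks incompressible: bail */
		}
	}

	/* The real compression at the requested level. */
	ret = ZSTD_compress(dst, d_len, src, s_len, level);
	return (ZSTD_isError(ret) ? 0 : ret);
}
```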
A
I
also
skipped
over
finding
out
the
door
was
a
bug
in
how
the
ark
did
recompression
that
never
came
up
unless
you
ran
this
I
still
don't
understand
why
it
never
came
up
unless
you
ran
this,
but
it
sure
did
so
that
got
fixed
as
I
alluded
to
I
have
a
lot
of
graphs
and
I
can't
just
push
this
up
to
Z
standard,
because
you
know
they
don't
only
operate
in
tiny
chunks
like
this.
There's a backport, but, like, that's... the actual amount of code change is not that large, but, you know, it's kind of a significant change in what you might expect it to do. So it's not in a point release, is my understanding.
Am I not coming through audibly? I thought I did that. Oh. But I'm not suggesting we merge it for that reason, right; that's too much work for too little. But it's a really fun data point, and really strange, and, you know, I thought it would be somewhat entertaining to people who find this sort of thing funny. Laughing at compression is not necessarily what you were expecting, but here we are. So here's a summary, right. lz4 update: the decompressor is in.
A
If
you
ever
decide
you
want
to
do
it.
Let
me
know:
I
have
a
branch.
It
works
it's
great,
but
that
would
be
my
opinion.
The
standard
update
seemed
like
a
bad
idea.
I
realized
halfway
through
testing
I.
Did
it
wrong
and
did
not
have
enough
time
to
wire
it
up
properly
to
run
all
the
tests
again
before
this?
My
apologies,
but
so
far
it
looks
like
it
could
maybe
be
a
win
because
it
turns
out,
as
we
learned
with
early
abort,
a
lot
of
the
time.
You know, Brotli was fun, but, like, you know, it's a compression algorithm; it doesn't magically, I don't know, use neural networks to magically recreate your data in five bytes. Early abort, again, to reiterate: this doesn't... this should not work, but I cannot argue with it; as far as I can tell, it definitely does.
A lot of the time. So, I tried updating zlib, because, you know, integrating our own zlib copy would avoid the problems I mentioned earlier with different gzip versions, which nobody really runs into, but they're still there; we just don't hit them in practice. But a lot of the implementations that are faster than just baseline gzip, or zlib, I should be consistent, mostly rely on doing FPU instructions to be better, don't actually seem to be consistently better, and are often significantly worse at compressing in the limited testing
A
That
I
did,
and
some
of
them
are
really
hard
to
get
to
compiling
the
kernel,
because
they
are
not
remotely
similar
styles
of
code,
because
they've
reshuffled
everything
as
I
mentioned
I
found,
while
doing
this
Linux
actually
did
a
similar
thing
to
what
I
did
with
the
lz4
decompressor
and
merged
the
zlib
decompressor.
That
was
newer,
15
20
years
ago,
but
let
the
compressor-
because
it
had
this
tiny
regression
on
arm
and
nobody
ever
carried
again.
A
Really
matter
because
I
haven't
heard
lots
of
people
using
Giza
right,
like
lz4,
is
better
at
one
thing:
Z
standard
is
better
than
other,
so
you
would
really
only
use
it
if
you
were
trying
to
have
some
compatibility
with
things
that
don't
understand
either
of
those,
and
that
would
be
a
very
Niche
set
of
people
as
I
mentioned.
Oh
snap,
peoples,
oh
I,
remember
what
I
was
thinking
of
it's
S2,
which
was
written
and
go
Snappy
is
written
in
C
plus,
which
you
know
lobbing
that
into
the
kernel,
not
not
fun.
A
Don't
do
that,
but
one
of
the
Linux
Colonel
doves
wrote
A
C
implementation
to
consider
something
similar
a
few
years
ago.
So
I
could
just
use
that
my
experience
was
that
it's
bad
at
General
use
since
as
far
as
I
understand
it
was
really
intended
to,
like
my
understanding,
is
we've
basically
intended
to
compress
like
blocks
of
text
that
that
was
the
goal.
A
You
know
like
a
tiny
thing
to
compress
text
have
a
nice
day,
so
it's
not
too
surprising
that
throwing
General
results
or
general
sets
of
data
at
it
did
not
end
well.
A
S2
is
an
interesting
project
where
a
I
understand
correctly,
a
database
developer
decided
to
integrate
Snappy
decided
it
didn't
perform
well
enough
and
wrote
their
own
re-implementation
of
it.
That
is
backwards,
compatible
compression
and
decompression,
but
markedly
faster
and
better,
which
is
a
neat
trick,
but
the
their
implementation
is
written
in,
go
so
I'm,
not
lobbling
that
into
the
kernel,
I
I,
know
and
I
haven't
spent
time
trying
to
re-implement
it.
A
But
it's
an
interesting
thing:
if
anyone
wants
to
consider
it
and
I
tried
playing
with
the
Z
standard
memory
allocator,
because
as
anyone
who's
looked
at
the
code
knows
it
does
its
own
custom
pooling
allocation
thing
which
works.
But
you
know
it's
a
weird
custom
thing:
it
would
be
nice
if
we
didn't
have
to
have
this
custom
thing
over
here
when
we
have
all
these
other
things,
but
Linux
at
least
has
limits
on
its
own.
A
Like
caching
allocator
things
in
terms
of
how
large
the
thing
you
can
cache
is,
it
will
just
complain
if
you
try
to
make
like
a
32
Meg
allocation
or
something.
A
So
you
can't
build
a
pooling
thing
out
of
that,
because
it
won't
do
it
and
you
know
just
dynamically
allocating
on
demand
the
size
of
the
allocation
that
this
thing
needs.
Sometimes
it's
just
sad,
so
I
tried
using
the
Zio
allocators
after
seeing
a
patch
that
Alan
made
at
one
point
to
do
that,
and
it
seemed
slightly
faster
like
cold,
but
then
the
more
you
ran
it.
It
basically
became
noise
compared
to
the
other
outages.
So
I,
don't
really
think
that's
worth
trying
to
merge.
A
If
it's
going
to
not
be
better
belonging
about
it,
I
was
going
to
say
that
I
believe
it's
the
last
slide
so
now
I'm
happy
to
talk
about
anything
I
just
said,
or
lots
of
other
random
experiments.
I've
done
that
I
didn't
because
I
thought
well
when
I
practiced.
This
I
was
better
at
talking
slower.
That's something that Brotli does, and that's the thing it does that needs floating-point math: it does, at higher levels anyway, an entropy estimation calculation on the block you hand it, and that can be expensive. I think lz4 mostly doesn't do that; Zstandard does at higher levels.
Doing that fast is kind of a problem, though, right? Like, you can basically... my understanding, and I am not an expert in this field, is that what you want to do basically runs into a bit-counting problem, or something to that effect.
A
So
modern
CPUs
do
have
fast
instructions
for
that,
so
someone
could
probably
write
one
to
use
for
this
purpose.
That's
a
better
refinement
than
just
the
brute
force
of
running
multiple
compressors
great.
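For a sense of what such an entropy estimate looks like in its simplest form, here is a hedged sketch (not from the talk or from any of the libraries discussed): a byte histogram and a Shannon entropy figure in bits per byte, where values close to 8.0 suggest the block is effectively incompressible. The log2() call is exactly the kind of floating-point math that is awkward in kernel context, which is why a fixed-point or bit-counting variant would be preferable there.

```c
#include <math.h>
#include <stddef.h>
#include <stdint.h>

/* Estimate Shannon entropy of a buffer, in bits per byte (0.0 .. 8.0). */
static double
estimate_entropy_bits_per_byte(const uint8_t *buf, size_t len)
{
	size_t hist[256] = { 0 };
	double entropy = 0.0;

	for (size_t i = 0; i < len; i++)
		hist[buf[i]]++;

	for (int b = 0; b < 256; b++) {
		if (hist[b] == 0)
			continue;
		double p = (double)hist[b] / (double)len;
		entropy -= p * log2(p);
	}
	return (entropy);	/* near 8.0: likely not worth compressing */
}
```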
But aside from that, I don't know how fast you could be at it, other than having to iterate over the whole thing, or having data on it initially, like having data on what your input is before you came in, is my understanding.
It has, I believe, the compressed size of the data at the front, and that's it. So I couldn't just shove it in there, because there was already something there, and I wanted to maintain the backward compatibility as much as I could, because it seems like a shame to have a feature flag bump for no reason. So I could do that, but it's not necessary there, because there's no existing Brotli implementation I have to care about.
A
No
I'm
just
throwing
it
out
and
that
wouldn't
necessarily
work,
because
the
way
that
it
works
is
that
it
operates
on
tinier
chunks
than
the
whole
thing
I
hand.
It
so
I,
don't
remember
the
constant
self-hand,
but
you
know
like
it
if
it
gets
like
16,
it
gets
like
12K
into
16k,
and
it's
not
done
it.
It
just
will
skip
to
the
next
16k,
so
you
still
get
some
compression,
even
if
it
and
then
and
all
of
the
algorithms.
we have integrated, we hand them a smaller buffer based on the 12 and a half percent I mentioned earlier, and if they run out of space they give up rather than overrunning the buffer, because that would be bad. So, no, it's just running over the whole thing, or 87.5% of it, potentially, depending. I'm not using any dry-run flags.
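A hedged sketch of that policy as I read it from the talk (illustrative names, not the actual OpenZFS functions): the destination buffer is one eighth smaller than the source, so any compressor that cannot save at least 12.5% simply fails to fit and the block is stored uncompressed.

```c
#include <stddef.h>

/* Generic compressor signature: returns compressed length, 0 on failure. */
typedef size_t (*compress_func_t)(void *dst, size_t d_len,
    const void *src, size_t s_len, int level);

/* Returns the compressed length, or s_len to mean "store uncompressed". */
static size_t
compress_with_threshold(compress_func_t cf, void *dst, const void *src,
    size_t s_len, int level)
{
	size_t d_len = s_len - (s_len >> 3);	/* 87.5% of the source */
	size_t c_len = cf(dst, d_len, src, s_len, level);

	if (c_len == 0 || c_len > d_len)	/* didn't save 12.5%: give up */
		return (s_len);
	return (c_len);
}
```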
I did experiment with using the higher levels of lz4, which we don't expose, but all the code is there; it turns out turning the level up at all just made lz4 markedly slower and did not significantly improve the results.