From YouTube: Ceph Tech Talk 2020-06-25: Solving the Bug of the Year
A: Hello everyone, thank you for joining us for this month's Ceph Tech Talk, and thank you to Dan here for volunteering to provide us content for this month, especially at such short notice. So Dan has a nice talk for us: it's "Solving the Bug of the Year". I'll let Dan go ahead and take it away.
B: All right. Thanks for the chance to speak here at the Ceph Tech Talk. I will be talking about solving "the bug of the year". That's in quotation marks because it wasn't me that called it that, but it was kind of exciting that one of the main Ceph devs called it a candidate for bug of the year. So, I'm Dan, I'm from CERN IT.
B: So, a quick recap of Ceph at CERN. CERN, as you probably know, is the European center for nuclear research, in Geneva.
B: Ceph has been a key part of our IT infrastructure since 2013, notably for block storage and CephFS for OpenStack, but we also have an S3 service for object storage, and then we also have some RADOS clusters just for custom storage services.
B: Just to give some setting: in 2013 we started offering Ceph RBD via OpenStack Cinder block storage for our cloud. Ceph RBD proved to be incredibly reliable over the years; we had very few, short outages, mostly related to network connectivity, and over time more and more use cases moved onto our cloud.
B: Now, as of February this year (just to give some idea of the importance of Ceph at CERN) we had around 500 different shared OpenStack projects using RBD: audiovisual applications, databases, repositories, engineering and physics applications, and more, plus more than 1200 personal OpenStack projects making use of the block storage. In total we had more than 6,000 Cinder volumes, about four petabytes raw used, split into two pools in two rooms.
B: Here's the timeline of what happened on February 20th, this infamous day. At around 10:11, when we were at our coffee break, we got a message that the main RBD cluster suddenly had 25 percent of its OSDs down; the PGs were all inactive and all I/Os were blocked. The whole cloud is down, basically. We started investigating a few minutes later, and we noticed the OSD processes wouldn't restart and their log files were all showing CRC errors in the OSD map.
B: After a couple of hours investigating ourselves, at 12:30 we were checking with the community on IRC, the mailing list and the bug tracker. Around an hour later we understood the problem at a basic level and had a simple workaround, and a few hours after that we brought the subsequent rooms up, and we had the service up and running by the evening.
B: Now, Ceph maintains a small OSD map which describes the state of the cluster, and I've given an example here for a tiny cluster. It has some information about when the cluster was created, various flags, the list of your pools, and also information like the OSDs and when they've been up and running.
B: The OSD map has all of the information needed for clients and servers to perform I/O and recover from failures. So that's OSD state information, but it also contains the CRUSH map, which is a description of the infrastructure (the rooms, racks, hosts, etc.) and the data placement rules. An OSD map is shared by peers across the cluster, and an OSD can't do any I/O unless it has the latest epoch, or version, of the map. Each OSD is persisting (saving to disk) all of the recent epochs, maybe 500 of them.

B: A bit more detail: developers rely on some internal functions to encode and decode an OSD map, between the OSDMap C++ instance and its serialized form.
B: So the serialized form is the one that you share over the network, and then you can have it in a class as well in the code. Advanced Ceph operators are used to using tools like osdmaptool to print and manipulate an OSD map.
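(For illustration, not commands shown in the talk: this is roughly how an operator can pull a map out of a live cluster and inspect it with those standard tools.)

```sh
# Fetch the current OSD map (or a specific epoch) from the monitors
ceph osd getmap -o /tmp/osdmap
# Print it: epoch, flags, pools, OSD up/in state, and so on
osdmaptool --print /tmp/osdmap
```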
So, a bit into our outage now. How did the actual OSDs crash? At 10:11, 350 of the OSDs out of 1300 crashed at the same time, with this back trace.
B: So basically you have handle_osd_map, a function which (actually, looking into the code, you can see what this does) receives an incremental OSD map from a peer and then crashes when decoding it. Making matters worse, the OSDs wouldn't restart, and they have a slightly different back trace when they try to restart.
B: So we knew that the OSD map had been corrupted somehow, and this is what was causing the crashes. We reached out to the Ceph IRC channel, who very quickly pointed us to a related thread and an issue (I'll give the materials after the talk, but you can follow the links). This had been seen before, and there was a quick fix known for how to recover from the actual problem.
B: The fix is to overwrite the corrupted version in each failed OSD's object store. So we used the ceph-objectstore-tool to set the OSD map in the object store; we did this for every failed OSD, across several corrupted epochs, and that brought the cluster back online. This is the process that took a couple of hours for us to script and get right. After six or seven hours of downtime we were back online, but of course questions remained: why did the maps suddenly get corrupted, and is it going to happen again?
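(A hedged sketch of the shape of that scripted recovery; the tool and its set-osdmap operation are standard, but the loop details, paths and numbers here are illustrative rather than CERN's actual script.)

```sh
EPOCH=2900000    # illustrative: a corrupted epoch reported in the OSD log
OSD=42           # illustrative OSD id
# 1. Fetch a known-good copy of that epoch from the monitors
ceph osd getmap $EPOCH -o osdmap.$EPOCH
# 2. With the OSD stopped, overwrite the corrupted on-disk copy
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$OSD \
    --op set-osdmap --file osdmap.$EPOCH
```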
B: To help with that root cause analysis, we started by looking at a diff of the valid and the corrupted copies of one OSD map. If you look through this diff of a good one and a bad one, you see some bit flips: there's a bit flip, there's a second bit flip, there's a third, and there's a fourth somewhere, from a one to a zero. There are four bit flips, you can trust me. So we now had some different theories.
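(To make that concrete, here is one generic way to spot such flips between a good and a bad copy of a map; illustrative commands, not necessarily the ones used during the incident.)

```sh
xxd osdmap.good > good.hex
xxd osdmap.bad  > bad.hex
# Each differing hex pair is a corrupted byte; XOR the two values
# to see exactly which bits flipped.
diff good.hex bad.hex
```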
B: Why are there bits flipping in the OSD maps in this cluster? They can have a few different sources: they can be memory errors (uncorrectable ECC errors), it could be network packet corruption, or it could be software bugs. Let's go through these. Could it be a memory error? This was our first theory.
B: We searched all the servers' IPMI error lists and dmesg output, which would normally print something, but there was no evidence of any ECC errors in memory. And also, if you think about it, it's not obvious how a memory error could affect so many servers simultaneously.
B: So, could it be packet corruption? TCP checksums on the network are notoriously weak (there's a link you can follow to a paper on that). Ceph uses crc32 to strengthen the messaging layer, and it ends up quite reliable, so it would be extremely unlikely for packet corruption to hit all the servers at the same time and corrupt the OSD map in the same way; and it's not clear how a single checksum error could propagate across the cluster.
B: Confusingly, there was a spike of TCP checksum errors throughout this incident in our router and switch logs. But correlation is not always causation.
B: So the evidence pointed to a software issue, but what exactly? We had some different clues. It must be extremely rare, because looking at the tracker and the mailing list there were only two other similar reports across the thousands of Ceph clusters out there in the world. We had some initial clues: we have two types of on-disk formats in Ceph, FileStore and BlueStore, and only our BlueStore OSDs were affected.
B: The other bug reporters had mixed flash and HDD clusters, and only the SSDs were impacted. Sage pointed to a recently found race condition in the OSD map code, where multi-threaded access to shared pointers could cause a corruption, but it wasn't that; anyway, we'll get to that in a minute. There's also a feature called OSD map deduplication that seemed maybe worth looking into, and maybe compression could be it.
B: When we reported this, Sage got in touch pretty quickly and suggested disabling this OSD map deduplication. Just a little bit about that feature, because it plays a role later. Recall that each OSD is caching several hundred decoded OSD map epochs. Most of the time, nothing changes between different versions of an OSD map, so to save memory there's a feature in Ceph to deduplicate the OSD maps in memory.
B: So it just uses pointers to point at the previous version's copy of the data for each member. Suspecting that maybe there was a bug in there corrupting the OSD map, we set it to false (turned it off) very quickly, on the first day.
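(To illustrate the idea only, with made-up types rather than Ceph's actual classes: consecutive cached epochs can share their unchanged members through shared pointers, copy-on-write style.)

```cpp
#include <memory>
#include <vector>

// Stand-in for a big, rarely changing member of a decoded map.
struct CrushMap { /* buckets, placement rules, ... */ };

struct MapEpoch {
  unsigned epoch = 0;
  std::shared_ptr<const CrushMap> crush;               // rarely changes
  std::shared_ptr<const std::vector<int>> osd_states;  // rarely changes
};

// Build epoch N+1 from epoch N, sharing whatever didn't change.
MapEpoch next_epoch(const MapEpoch& prev, bool crush_changed) {
  MapEpoch e = prev;  // share everything by default
  e.epoch = prev.epoch + 1;
  if (crush_changed)
    e.crush = std::make_shared<CrushMap>(*prev.crush);  // private copy
  return e;
}

int main() {
  MapEpoch e1{1, std::make_shared<CrushMap>(),
              std::make_shared<std::vector<int>>(100, 1)};
  MapEpoch e2 = next_epoch(e1, false);
  // Two cached epochs, one CrushMap allocation between them.
  return e1.crush == e2.crush ? 0 : 1;
}
```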
But let's review the rest of the functions. So here's that handle_osd_map function in the OSD. An incremental message arrives and is processed in this handle_osd_map function, and it's pretty simple. It arrives; it checks the CRC of the message. That's one valid CRC.
B: It reads the previous full OSD map from disk and decodes it, which checks the CRC. That's two valid CRCs. Then we apply the incremental changes from the message to the previous full map, and then we encode a new OSD map, check the CRC again, and write it to the disk.
B: So that's three CRC checks. From the back traces we know that we receive an incremental map, apply it and store it with no CRC errors; we validate the CRCs at least three times. But then, for the next incremental map that comes one second later, we try to re-read the map that we just wrote to disk, and now there's a CRC error. So this means that something must be corrupting the OSD map after we encode it, but before the bits are written to the disk.
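(As a toy model of where that leaves the corruption window; stand-in logic using zlib's crc32, nothing from Ceph itself. The point is only where the three checks sit relative to the write.)

```cpp
#include <zlib.h>   // crc32(); link with -lz
#include <cassert>
#include <string>

static uLong crc_of(const std::string& s) {
  return crc32(0L, reinterpret_cast<const Bytef*>(s.data()), uInt(s.size()));
}

int main() {
  std::string full = "full-map-epoch-N";   // previous full map, from disk
  std::string inc  = "+delta-epoch-N+1";   // incoming incremental
  uLong full_crc = crc_of(full), inc_crc = crc_of(inc);

  assert(crc_of(inc)  == inc_crc);   // check 1: incremental message CRC
  assert(crc_of(full) == full_crc);  // check 2: decoding the previous full map
  std::string next = full + inc;     // "apply" the incremental (stand-in)
  uLong next_crc = crc_of(next);
  assert(crc_of(next) == next_crc);  // check 3: the newly encoded map
  // write_to_disk(next) would happen here; in the outage, the bytes that
  // reached the disk no longer matched next_crc.
  return 0;
}
```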
We studied that race condition theory that I mentioned, and it seemed unlikely to be related. We checked the OSD map dedup implementation; it's very simple, and it seemed very unlikely to be buggy. So we started to look further down, at BlueStore itself. Some more deep thoughts.
So let's look again at that hex diff. There was something that bothered me about this hex diff over the weekend after this incident. See if you notice anything strange here... okay, I'll tell you what I found strange: it was this part here.
B: The addresses where the corruptions were happening were at almost exactly a 128-kilobyte boundary in the object. And that reminded me of something, because maybe three months earlier we had enabled compression on the cluster, and we had tweaked some options to set the compression blob size in the OSD to 128 kilobytes. So this will sound familiar. Could it be compression?
B: That's what I was thinking. Here are our settings; could it be related? It still seemed very unlikely.
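(For reference, the configuration in question had roughly this shape. The mode and algorithm values are the ones discussed in the talk; the blob-size option name is the standard BlueStore knob, and its value here is the 128 KiB mentioned above.)

```ini
bluestore_compression_algorithm = lz4
bluestore_compression_mode = aggressive
bluestore_compression_max_blob_size_hdd = 131072   # 128 KiB
```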
But then I got in touch with the other bug reporters. Are you using compression? Yes. Are you using lz4, like us? Yes. Are you using aggressive, like us? Yes. Okay, which OS? CentOS 7, the same as us; and one of the others was using Ubuntu, a slightly old version of Ubuntu, a couple of years old.
B: So now we're sure what the problem is. Here's the story of lz4. lz4 is a lossless data compression algorithm, very famous; probably much of the web is compressed with this thing. At CERN we enabled it in December 2019, primarily because the block storage is highly compressible; we can save around 50 percent of our space. Five days... so, the bug happened on a Thursday; on the Tuesday...
B: So, let's try reproducing it. When we write, as I said a couple of slides ago, BlueStore breaks an object into 128-kilobyte blobs and then compresses and stores each blob individually. So we tried splitting our OSD maps into 128-kilobyte blobs, but we couldn't reproduce any corruption. Going deeper, we started learning (I started learning) about word alignment and fragmented memory.
B: When an OSD map is compressed, the thing in the code that gets passed to lz4 is actually not a nice, contiguous char* array. It's not a contiguous allocation of memory; it's actually jumping all over the place. The Ceph decoders and encoders are very memory efficient, partly because of that deduplication as well, so you can be sure that your OSD map is actually going to be a sort of linked list of random locations in memory, and those are often not word-aligned.
B: So, reproducing the bug: we eventually did it. There is a new unit test in Ceph, and we could reproduce the bug.
B: You need to take a good OSD map from the disk, decode it into an OSDMap class, then encode it back, and then compress that. The action of encoding a new serialized map, with this kind of memory scatter (the bytes scattered across memory), and then compressing it, triggers the bug. So there's a new unit test that reproduces the corruption bit for bit with what we saw in real life. zlib, snappy and zstd all pass this test with no changes, but lz4 fails it. So, compressing unaligned memory is complicated.
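(As a flavor of what such a test does, here is a standalone sketch, not the actual Ceph unit test: drive liblz4's streaming API with one logical buffer carved into scattered, oddly aligned chunks, the shape of input a fragmented encoder produces, then verify the round trip.)

```cpp
#include <lz4.h>   // link with -llz4
#include <algorithm>
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

int main() {
  // One logical payload standing in for a serialized OSD map.
  std::string payload(128 * 1024, '\0');
  for (size_t i = 0; i < payload.size(); ++i)
    payload[i] = char((i * 2654435761u) >> 13);

  // Carve it into irregular chunks, each in its own allocation, prefixed
  // with one pad byte so the chunk data starts at an odd address.
  std::vector<std::string> chunks;
  for (size_t off = 0, n = 1; off < payload.size(); off += n, n = n * 3 + 7)
    chunks.push_back('x' + payload.substr(off, std::min(n, payload.size() - off)));

  // Compress chunk by chunk with the streaming API, roughly the access
  // pattern a fragmented bufferlist gives the BlueStore lz4 plugin.
  LZ4_stream_t* cs = LZ4_createStream();
  std::vector<std::string> blocks;
  for (const auto& ch : chunks) {
    const char* src = ch.data() + 1;        // skip the pad byte: unaligned
    int n_in = int(ch.size()) - 1;
    std::string out(LZ4_compressBound(n_in), '\0');
    int n_out = LZ4_compress_fast_continue(cs, src, &out[0], n_in,
                                           int(out.size()), 1);
    assert(n_out > 0);
    out.resize(n_out);
    blocks.push_back(std::move(out));
  }
  LZ4_freeStream(cs);

  // Decompress everything back into one contiguous buffer and compare.
  LZ4_streamDecode_t* ds = LZ4_createStreamDecode();
  std::string result(payload.size(), '\0');
  char* dst = &result[0];
  for (const auto& b : blocks) {
    int n = LZ4_decompress_safe_continue(ds, b.data(), dst, int(b.size()),
                                         int(&result[0] + result.size() - dst));
    assert(n >= 0);
    dst += n;
  }
  LZ4_freeStreamDecode(ds);

  // On lz4 >= 1.8.2 this round-trips; 1.7.5 could corrupt such input.
  std::cout << (result == payload ? "round-trip OK" : "CORRUPTED") << "\n";
  return result == payload ? 0 : 1;
}
```

(The eventual Ceph workaround, covered below, does the opposite of the scatter step in this sketch: it rebuilds the data into one contiguous buffer before lz4 ever sees it.)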
The developer of the lz4 algorithm has a nice blog post from a few years ago explaining how he optimized compression of unaligned memory. So we suspected that maybe there was a bug in that area, and we reached out to him on GitHub (this is Cyan4973): hey, have you seen this kind of corruption before? And he replied: nope, that's a new one to me!
B: So we had a thread going with the developer; we tried some different configurations and different compilation options, but nothing changed the behavior. And then Sage kind of came to the rescue. Sage noticed that a newer version of lz4 on his development box didn't corrupt the OSD map, so he bisected the commits rather quickly and found the exact commit in lz4 that fixes the problem. So in the end this was, somehow, a known issue in lz4 that had shown up in some of the lz4 unit testing. Basically, what the fix amounts to is this: if lz4 is compressing from data scattered in memory, it can corrupt the output; if you consolidate the data into one single contiguous buffer, that works around the problem.
So lz4 version 1.8.2 and newer already includes this fix but, alas, CentOS 7 and Ubuntu 18.04 use 1.7.5, so Ceph needs a workaround, or the OSes also need to upgrade their version.
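(If in doubt, you can check what your distro ships; illustrative package queries.)

```sh
rpm -q lz4                         # RHEL / CentOS
dpkg -s liblz4-1 | grep Version    # Debian / Ubuntu
```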
B: This pull request here changes the lz4 plugin to rebuild the data buffers in contiguous memory. It has been merged to Nautilus and will come in the next release; it's already out in Octopus, but it's not yet backported to Mimic. So if you're using lz4 compression on these releases, just be slightly cautious. Now, some comments on the actual impact of this. It's the combination of the OSD compression mode aggressive and the lz4 algorithm that triggers this bug.
B: If you have OSD compression mode passive, then only the user data in a pool is compressed. When we use aggressive, BlueStore compresses everything, even the OSD maps or other metadata that the OSD might need to store. On top of this: client data was not corrupted. Of course, this was our concern: was client data corrupted?
B: RBD data that comes in from the OpenStack clients is always written from a contiguous buffer, and also we have hundreds of ZFS file systems on top of RBD that are doing their own independent checksums of the data; we would have noticed by now if data had been corrupted, and it hasn't been. The corruption was, anyway, incredibly rare: this cluster iterated through hundreds of thousands of OSD map epochs before it found a corruptible one that triggered the bug.
B: So we learned quite a lot from this. We learned, primarily, that all software libraries and services can fail, even though they seem to have worked since forever. Even the ones that have five nines of reliability can eventually fail anyway.
B: We also learned that too much reliability leads to unrealistic dependencies, which can then lead to a sort of disaster. We had zero major outages in six years of Ceph, and hundreds of CERN apps built dependencies on top of a single Ceph cluster, which was unrealistic. Thinking back to the Google SRE book, there's a story in there about the Chubby distributed lock service that had a similar story.
B: It almost never failed, a lot of Google services were built on top of it, and eventually, when it did fail, it took down much more than it should have. So there are lessons in the Google SRE book on how to avoid those kinds of things. In our case, we plan to introduce block storage availability zones: we're aiming for three or four different clusters, so that our users can spread their applications across the different clusters and be more available in case of any problems in the future.
So, the list of thanks and credits is quite long. We had Teo and Julian working directly with me on the issue on the day and the days following, and then lots of colleagues, whom you can read here, in the office building, that we were bouncing ideas off of to try to understand what could be the root cause.
B: I also want to thank, of course, the Ceph community, in particular dirtwash on IRC who, within 20 minutes of our outage, pointed exactly to the tracker ticket that was the issue; this got us on the right track right away. Troy and Eric were the ones that had seen this on their clusters before, and they helped out with clues and comparing symptoms. And then, of course, Sage and Yann Collet, the lz4 developer, for their expert input and fixes. So that's the end of my talk, and I'll be happy to answer any questions.
B: So, yeah, okay. So, how did I determine that? The mon stores all the versions of the OSD maps, so we knew the epoch. Through increasing the debugging on the OSD we knew which version the OSD was trying to load, so we knew the epoch number that was stored on each OSD that was corrupted, and we therefore knew which one to extract from the mon and inject.
B: I mean, the epoch number is kind of like the serial number of the OSD map, and it should be the same everywhere, on every disk of the cluster and on every mon of the cluster. In the midst of this bug we also wrote some tools to independently verify the CRC of OSD maps, so those helped along the way; but the short story is that we just extracted them from the mon, where we knew it was good.
B: That's a good question. I also wondered that; I don't think that's been measured by anyone. If you look at the fix, it simply checks whether the bufferlist is contiguous and, if it is not contiguous, it rebuilds the buffer.
A: Okay. And going back to the previous question ("how did you determine the uncorrupted version?"), an additional question came in: so you just went one back from the problem OSD map?
B: There it was. So, this is real; these numbers are not made up. This epoch number, 2.9 million something, was the first epoch that was corrupted in the object store of this OSD 666 (which is also real, by the way, though I comically included it there). So epoch 2808 was good on this OSD, and when this OSD 666 tried to start, you would get an error on the epoch it was loading.
B: I think that maybe the week afterwards we also opened a ticket with Red Hat to update it and, as far as I know, they will have 1.8.2 in one of the CentOS 7 releases very soon, if it's not already there. But at the same time, Nautilus didn't have a fix for this yet, right? So...
B: We did both in parallel. And then there's a question about compression in general: in Ceph, is compression something you would only run on SSD OSDs, or is the performance penalty small enough to run on HDDs?
In fact, I think that intuition is backwards. I mean, hard drives, HDDs, are so slow that the CPU can easily compress the data faster than what the hard drive can absorb and write anyway, and the same goes for decompressing the data.
B: So often, when we use compression in other areas, like CephFS or elsewhere, we see a performance increase in bytes written per second.
B: Brian's asking when it's expected to make it to Nautilus. It's been merged to Nautilus, so it'll be there in 14.2.10, yep. And then Victoria's asking: are all the clusters on Nautilus? Nope. Out of 10 clusters, two are still on Luminous. We have a CephFS cluster that we're still waiting to upgrade; we're waiting for some last-minute bug fixes before we upgrade it to Nautilus. And similarly, excuse me, S3: we're waiting for just a few little bugs to be fixed here and there.
C: [question not captured in the transcript]

B: I think, so, for RBD, I think that doing both is a good idea. There are a lot of gotchas along the way, though. First, the reason why I say "for RBD" is because you often have big four-megabyte or eight-megabyte objects that are easy to split into things that are still relatively large, and then you can compress them easily; likewise with CephFS or S3.
B: If you have smaller files, then maybe it doesn't pay off, because you have minimum sizes that you have to fill with zeros to pad out small files or small objects. Now, the other gotcha about compression on RBD is that (I haven't observed this myself, but there are theories) the actual RBD workload is normally four-kilobyte writes, or maybe 64-kilobyte writes, because it's usually small changes to the blocks; you're not in fact writing four-megabyte objects all at once, you're just writing small changes. So the fact that you're making those small writes might have an impact, and I don't know what the implementation does, whether it reads out the whole chunk and then writes it back, or whether it just somehow writes a small extent. But the fact that RBD does small writes anyway might, over time, kill any kind of space savings that you might have thought you were earning.
B: So, a question from Jared, just asking for clarification: it was the storage node's packaged version of lz4 that caused the problem, not the client side? Correct. It's the version of lz4 running on the OSD servers that's important, because in the case of BlueStore compression the data is compressed by the OSD itself. And Brian's asking if we're still using compression, on the object storage clusters too.
B: I think that... so, the mon uses RocksDB, and the mon can compress using RocksDB compression, but I think that's snappy in all Ceph configurations. I might be wrong; I hope somebody can correct me if I'm wrong. And I think it's off anyway by default. And by the way, I think that when the mon gets the OSD map, it also wouldn't use these encode/decode functions like the OSD does; I think it would just get the blob and write it. I'm not sure about that.
A: All right, Dan, thank you for your time and for sharing with the Ceph community for another Ceph Tech Talk. This will of course be recorded and put up on YouTube. So, thanks.