From YouTube: Ceph Performance Meeting 2023-02-02
Description
Join us weekly for the Ceph Performance meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contrib...
What is Ceph: https://ceph.io/en/discover/
A
I suppose we've got a decent number of people here, so let's just get started. All right, I see two new PRs this week. One is from Igor: "don't use the real wholespace iterator for prefixed access." Igor, do you want to talk about that a little bit?
B
Yeah. That's not the case which was originally brought up, the one about unbounded iterators. During my experiments I realized that we iterate over every column family for prefixes which do not belong to any specific column family: so, for instance, shared blobs and statfs, which are accessed once at startup or only for fsck.
B
That path uses the merged wholespace iterator, and in some cases, like my sandbox, that might take, well, tens of minutes if the DB is in that degraded state with tons of tombstones.
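A minimal sketch of the distinction Igor is drawing, written against the stock RocksDB C++ API; the helper is illustrative, not the actual Ceph code. Iterating a single column family directly avoids the merged wholespace iterator that has to visit every column family:

```cpp
#include <rocksdb/db.h>
#include <rocksdb/iterator.h>
#include <rocksdb/options.h>
#include <memory>

// Illustrative: when a prefix lives in its own column family, iterate just
// that CF. A wholespace iterator would instead merge across all column
// families, which in a degraded, tombstone-heavy DB can take a long time.
std::unique_ptr<rocksdb::Iterator>
prefix_iterator(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cf) {
  rocksdb::ReadOptions ro;
  return std::unique_ptr<rocksdb::Iterator>(db->NewIterator(ro, cf));
}
```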
A
Very good, very good. All right, next there's this PR from Adam K for BlueStore, improving the fragmentation score metric. Igor, I think you reviewed that. I haven't looked at it yet, but I think it's...
A
All right, let's see, moving on then: three closed PRs this week, all by the bot, unfortunately. So two different MDS PRs were closed. The first is this: removing the subtree map from the journal. I'm sad to see that one closed, but it's not surprising.
A
It's a really complicated PR, and I think, Greg, you might know more, but my understanding was that it was just too much. Maybe we couldn't really review it very effectively, and we ended up deciding to go a different route. Is that approximately right?
A
Sorry, this is, yeah, this is like an old one from Zhang to remove subtree maps from the journal. I think he was getting... I think he did maybe like...
D
I don't remember. Okay, so, Zhang has many PRs which haven't merged, that might get revisited as bandwidth and need open up, for subtree journaling in particular. There are like four, or possibly we're up to five now, that fit together in various ways. So I think we're probably going to go a different direction there, because Patrick and Xiubo both have some that I think work together, and that should be better, or at least simpler to understand and maintain. Okay, so that one, I'm guessing, is gone, but yeah.
A
And that actually leads into the next PR here, which is: Patrick has a PR that the bot closed, for skipping inode-with-cap iteration for empty directories.
A
Cool, all right. And the last one that the bot closed was "make _do_write_small never do buffered writes." I don't know who reviewed this one last. Igor, did you look at that at all?
A
Yeah, yeah, I know he was working on it quite a bit, but yeah.
A
Well, for now, since he was not here, I'll let Adam know separately when we talk, and see if he wants to reopen it or not. All right, for updated PRs this week: the RocksDB upgrade PR. I guess the author is having some issues with our Ceph rocksdb repo and needs additional help. I just briefly looked at it this morning, but I told him I'd try to figure it out, so one way or another we'll get the upgrade.
A
I think we really want that in for Reef. Next, adding primary balance scores: hey, do we have Laura or Josh Salomon today? Oh, you are... I don't even... where are you? Oh, there you are. Sorry, go for it.
E
We're just trying to get the balancer feature in for Reef, so we're mainly just waiting on the lab to be fixed. I think there are two PRs on there. One is for the read balance score, which is an addition to the OSD map that calculates a score for each replicated pool on how balanced the reads are; and the other one is mine, which is the overall balancer feature, and they're linked together.
E
They each depend on each other, but they're ready to go into testing as soon as the lab can handle it.

C
Awesome.
A
Yeah, yeah, absolutely, very cool. All right, let's see here: I think we took care of the last PR in the updated bunch, but the last one that I need to talk about is... oh, Igor, your faster BlueFS allocations in the AVL and hybrid allocators. I know Adam did an initial review of that and was not sure about it. I have not looked at it, and I apologize. Oh, if Adam can't help, I'll try to make time to look at it.
A
I have a bunch that fell off too, but yeah, I will try. Igor, if I don't, please feel free to email me and just, like, hound me until I do.
A
Okay, sounds good. Okay, let's see: lots of stuff in the no-movement category. I don't think there's anything super exciting here to talk about this week, more rocksdb stuff, but after we hear from David Orman's group we can maybe decide how much of this is still necessary or not. All right.
A
Well then, actually, on that topic: David or Cory, do you want to take over and talk a little bit about some of the experiments that you guys have made?
F
Yeah, that sounds great. Cory, I'll hand it over to you, but I'm happy to present details from our internal monitoring system, so feel free.
G
Yeah, so we had the cluster that had the issues we talked about a little over a month ago now, with PG movement and all the deletes in RocksDB causing iterators to be extremely slow, and then the OSDs crashing due to suicide timeouts after up to like 500 seconds. These OSDs, or this cluster, do have some spillover of the DB onto the spinning disks, and that was part of it. But, just over the past...
G
We started upgrading some of those nodes and playing with some ideas to work around that. One of the things we deployed to the OSDs was Mark's compact-on-delete filter options being passed through, and the other thing was something I...
G
...think I talked about a little bit a couple weeks ago, which is basically passing through max skippable keys, the RocksDB read option for that, from the BlueStore layer down to RocksDB when we're doing a collection list, so that I could bound the number of tombstones being iterated over; and then, at the OSD layer...
G
...just backing off and retrying whenever we see an excessive number of keys being iterated over, which was what was causing big latencies and, ultimately, the OSDs to crash. The results of deploying those two things in production were really good. We basically saw the compact-on-delete stuff, in and of itself, taking care of the issue, which we were really happy about.
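A minimal sketch of the bounded-iteration idea, against the plain RocksDB C++ API; the helper name and shape are mine, not the actual patch Cory describes. max_skippable_internal_keys is the real RocksDB read option that makes an iterator give up with Status::Incomplete instead of skipping tombstones indefinitely:

```cpp
#include <rocksdb/db.h>
#include <rocksdb/iterator.h>
#include <rocksdb/options.h>
#include <memory>
#include <string>
#include <vector>

// Illustrative only: list up to max_results keys under a prefix, but bail
// out early if the iterator has to skip too many internal keys (mostly
// tombstones) rather than stalling until the OSD suicide timeout.
std::vector<std::string> bounded_list(rocksdb::DB* db,
                                      const std::string& prefix,
                                      size_t max_results) {
  rocksdb::ReadOptions ro;
  ro.max_skippable_internal_keys = 1000000;  // bound on skipped tombstones

  std::vector<std::string> keys;
  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
  for (it->Seek(prefix);
       it->Valid() && it->key().starts_with(prefix) &&
       keys.size() < max_results;
       it->Next()) {
    keys.push_back(it->key().ToString());
  }
  if (it->status().IsIncomplete()) {
    // Too many tombstones skipped: the caller can back off and retry
    // later, as described above for the OSD layer.
  }
  return keys;
}
```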
G
Compactions were extremely effective in that scenario. We tested essentially by just re-weighting one of the OSDs down to 90% and letting like 10% of the PGs move off of it, and when the delete work happened, we did see those compaction filters work properly to clean up those tombstones pretty much immediately, and we never had any high latencies at all. So, one really positive result there.
G
That, I think, will be good, since that's already in main, so I think that will be a good point for those moving forward.
G
So, beyond that, the other thing we've realized recently: at the time it happened, we had set the TTL compaction setting down to like an hour, I think, trying to get things cleaned up faster, and we had left that there for, like, a month now, I guess. We recently realized that it was really causing a lot of bad performance issues for us with the spillover; David's got the graph up there showing since we set that down, and we've just removed it now.
G
What else... we've also been playing with this: we noticed that the NVMe volumes on these OSDs are like 60 gigabytes, and BlueFS was only using like five gigabytes of that, and the rest was all spilling over.
G
That was a very ineffective use of those resources, and we ended up finding that the volume selector had a few tunables we could adjust to make it use more of that; in particular, the BlueFS volume selection reserved factor that is pasted in the chat. That has a default of two; we set it down to one, and that works well for pushing more data onto the NVMes.
G
What we realized there is that the default logic of the volume selector basically assumes that BlueStore, or rather RocksDB, is doing compactions according to the normal level-compaction settings, and that each of the levels is filling up to its requisite amount, when it does its calculations to determine how much it can move over to the fast device.
G
Even if the whole rocksdb level doesn't fit on the fast device. And in our case, with both the TTL compactions and, even now, with the compact-on-delete filters, it turns out that we aren't getting anywhere close to filling up those other levels before things are compacted down to the last level, which is L4 in our case, and so it's choosing not to prioritize moving anything over to the fast device.
G
For that reason, there may be some other things we can do to be smarter for cases like this, detecting that scenario rather than having people adjust the reserved factor, but we talked about that a little bit with Mark in chat, and I think we need to think through that more.
B
Something that maybe at least somehow prevents the spillover from happening, you know, earlier.
G
Well, so we have the normal RocksDB level settings, I think 256 megabytes for level one, which makes level three twenty-five gigabytes, and we have, what, 100... yeah, I guess 104 gigabytes total. So we have a lot of spillover regardless.
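To make the spillover arithmetic concrete (a hedged calculation, assuming RocksDB's default level size multiplier of 10, which matches the 256 MB / 25 GB figures above):

```latex
L_1 = 256\ \mathrm{MB}, \quad
L_2 = 10\,L_1 = 2.56\ \mathrm{GB}, \quad
L_3 = 10\,L_2 = 25.6\ \mathrm{GB}, \quad
L_4 = 10\,L_3 = 256\ \mathrm{GB}
```

The first three levels together hold only about 28 GB, so a roughly 104 GB DB necessarily has most of its data in L4, which can never fit on a 60 GB fast device; hence the spillover regardless of tuning.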
B
So it would be interesting to get more details on this issue offline. Definitely.
G
Yeah, I'll share some of the stuff I was mentioning related to the actual calculations, and it would be nice to brainstorm what might be a better approach for some of these edge cases like ours.
G
And then the last thing I was going to mention that was really interesting for us: we tried compression, LZ4 compression, and it was really effective both for the OSDs for data and also for the rgw index OSDs, so mostly omap data. We got, I think, slightly more than 50 percent storage savings, which for us, with the spillover, was really significant, and it seems to have been a win-win from our perspective so far.
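At the RocksDB level, enabling LZ4 is a one-line column family option; a minimal sketch (Ceph actually carries this in its own rocksdb options string, which is omitted here):

```cpp
#include <rocksdb/options.h>

// Illustrative: compress SST data blocks with LZ4. LZ4 is cheap to
// compress and decompress, which is consistent with the negligible CPU
// impact discussed below; the ~50% space saving depends on how
// compressible the data (here, mostly omap/index keys and values) is.
void enable_lz4(rocksdb::Options& opts) {
  opts.compression = rocksdb::kLZ4Compression;
}
```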
A
Have you guys seen any negative performance impact with it, in any cases?
F
Yeah, no, I'm happy to answer that. If anything, we actually saw a positive improvement; we see more actual IOPS going to the devices we've applied these changes to. Now, keep in mind we have applied multiple changes, so it's not just LZ4 in isolation, but with the combination including LZ4, which positively impacted the amount of spillover, we've seen relatively consistent latency (actually, a little bit better on reads afterwards), and we're seeing more actual operations completed with the reduction in latency, if that makes sense.
F
So certainly it's been positive, but that could be a product of the situation, with the amount of spillover we had onto a slow drive; the compression advantage may outweigh the performance implication, if any. And as far as CPU and such go, we're nowhere near even coming close to touching the capability of a relatively meager Xeon CPU.
F
So it doesn't seem like the compression or decompression has had a negative impact there. Now, it might be a different story on NVMe-based storage, just because it's so much faster, but at least with rotational storage it seems like it's pretty much a win all the way around.
F
Yeah, it seemed like it was kind of a win-win all the way around for us. I'm trying to think, if it was 24 and 25, it might have balanced out, because it was pretty amazing on our index disks. Let me go look, actually.
F
The utilization: so these are basically the index pool only. When we enabled compression on these, I mean, it was an enormous change. We were near 60% of, you know, I think these were 700, 750 gigs, and we dropped... it's almost, yeah, it was more than half, just because of how much omap data is there. So anyway, that was all kinds of win on those things too, and again, you can see when we enabled it.
F
We actually see, if anything, latency is slightly better, and that's on NVMes.
A
So, Casey, I saw the best compression with this when I was doing testing last fall with rgw. I don't know if there's anything rgw writes that's just ridiculously compressible, but for whatever reason, it seemed like with rgw workloads we saw, or at least I saw, a really, really good improvement.
C
Yeah, that's great. I mean, I know that there are strings in the index, but I don't have a good sense of how much is string versus other fields. But potentially a lot of the other fields are, like, integers that default to zero, so that could help too.
F
You know, when it's NVMe... we don't have any pure-NVMe clusters to really mess with at this point in time, so it's going to be hard for us to tell. Now, just so everybody's aware of what our process is: we had to get all this stuff out of the way because we basically had OSDs crashing anytime...
F
...we tried to do any data shifts. We're going to take what's on this cluster, which, let me go look real quick how much data this is... it's right around 1.5 petabytes currently stored, so 2.1 petabytes used, because we have erasure coding, 8 plus 3. The intent is: we've just built out two new 21-node racks' worth of servers, and those have multiple 6.4-terabyte NVMes on them, so those will become the new DB/WAL devices.
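As a sanity check on those numbers: 8+3 erasure coding stores k+m = 11 chunks for every k = 8 chunks of data, so

```latex
\mathrm{raw\ used} = \mathrm{stored} \times \frac{k+m}{k}
                   = 1.5\ \mathrm{PB} \times \frac{11}{8}
                   \approx 2.06\ \mathrm{PB}
```

which matches the roughly 2.1 petabytes used quoted above.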
F
So we should have more than sufficient space on the DB/WAL side to have everything on NVMe, and we're going to add those into the cluster and shift all of the data off of this existing 21-node rack onto the other two that have the new device situation. We wanted to preempt that with all the fixes and mitigations, to let us do it without the OSDs going into those nasty little crash loops we ran into; we had a host down for three days. So that was kind of the plan.
F
The next step, after we finish this upgrade, is to verify and validate that everything looks good post-upgrade. And Mark, of course, we can share whatever information you'd like, and you're more than welcome to look at the logs and anything else that would be helpful to you; maybe you can glean some information that will help make the decision about what you switch to as a default more clear. And then, when we do the migration, we'll collect data during that process as well.
F
So we can see what it looks like as the PGs are being purged off of the source OSDs, and then we'll most certainly have better data available on the destination cluster, where our intent is to keep LZ4 enabled. So we'll be able to collect some data there with the proper mix of DB/WAL on NVMe to hold the entirety of level four.
F
Of course. Oh, CPU cores, let's see: so we have 48 threads, 24 actual cores, and we have 24 OSDs on these devices.
F
Yeah, no, well, I guess if we count the index, it's 26 OSDs: two NVMes that we use for the index pool, so that's two, and then 24 data OSDs, which are serving the data pool for rgw.
F
And if you want to look at our CPU consumption, keep in mind that we're in the middle of an upgrade; we have a fair bit of, I guess, spare CPU.
A
Right now I have the opportunity to do some testing on some very limited machines that are CPU bound, not at any kind of real scale, but I could do some low-level tests. So I think I'll try that out. But I suspect that the win in terms of reducing traffic to the hard disks is probably worth it. That setup has no NVMe, just pure hard drives.
F
Yeah, exactly. Again, we are in no way, shape, or form even remotely close to CPU bound with rotational drives. I'm not going to say we're idle, because this cluster is upgrading, so the customer traffic is not nearly as high as it can be, but we've never had high CPU load on this; like, I would be very surprised if we could saturate a CPU.
F
Even if the OSDs were at full 100% capacity, yeah. And we could probably look... I don't know, am I sharing the terminal, or am I sharing the graphs right now? The graphs, okay. Well then, you didn't see me putting all the information in the terminal showing our CPUs are currently at approximately four percent utilization per core. I think we've got some... we may have some metrics.
F
This was prior to the upgrade that puts the image we're talking about in place, and, oh my gosh, that's in the process of upgrading. So keep in mind, when our index NVMes go down and come back up (Cory's trying to dig into what's going on there), they're basically reshuffling all the data, so they're running at like a hundred percent CPU utilization non-stop for, you know, an hour or two while they shift data. But even with that, that's what we see.
F
I'm sure, sure, but I guess, again, my point is: from the perspective of rotational drives, I would be very surprised to see CPU consumption get so out of hand that the LZ4 compression really made a material difference from a CPU perspective. Yeah.
A
Yeah, I think probably the only cases are where people are, like, crazy, running, you know, 60 OSDs on like four cores or something ridiculous.
F
Yeah, exactly, I mean, if someone's super oversubscribed. But these are Xeons, and they're decent Xeons. What are these? They're 4214s. These are by no means super high-end, vertically scaled CPUs; these are more just lots of reasonable cores. So I would be surprised if there's a massive problem.
A
Yeah, yeah. David, Joshua was asking in the chat window if you guys have a list of the things that you did, the backports that you made or other fixes that you...
G
Yeah, I will have to look back through. I think there are quite a few patches on top of 16.2.10 that we have, but the ones jumping to the front of my mind are specifically related to improving index OSD performance. I mean, getting rid of the TTL that we had in place for TTL compactions definitely seems to have been an important step in improving performance there, and I'll look back through the list of patches that we do have on top of this.
F
I think a lot of this is really about mitigating the stuff that was causing us pain in terms of shifting data crashing OSDs, and it all kind of had knock-on effects. So as we were trying to address the root cause, we just saw that, oh hey, we can't have TTL-based compaction on when we have this deletion issue that causes the OSDs to lock up to where they hit the suicide timeout.
F
So, as we've implemented patches that have allowed us to... we started by reducing the TTL, or I should say increasing the TTL compaction interval: we were at every one hour previously, then we went to every six hours, I think it was, and now we're just turning TTL compaction off. So a lot of the performance increase is because we've been able to remove things that were actually punishing the OSDs rather than helping them, but were necessary in order to prevent the cluster from eating itself, if that makes sense. Yeah.
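For context, a hedged sketch of the knob in question, assuming the "TTL compaction setting" maps onto RocksDB's per-column-family ttl option (Ceph would carry this in its rocksdb options string; the exact plumbing is not shown):

```cpp
#include <rocksdb/options.h>

// Illustrative: RocksDB's ttl option schedules an SST file for compaction
// once it is older than this many seconds, even if nothing else triggers
// one. The progression described above: 3600 (1h), then 21600 (6h), then
// turned off entirely.
void set_ttl_compaction(rocksdb::Options& opts, uint64_t ttl_seconds) {
  opts.ttl = ttl_seconds;
}
```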
F
Yeah, and there were some changes we made, you know, that might have some performance implications, certainly, but a lot of it, I think, with Mark's patch...
G
...I don't think we even changed the other settings. We just backported the compact-on-delete stuff, and we did set those compact-on-delete settings a little bit lower than what you had, for now at least, while we're trying to move data around. I think we have both of them set at 512, so anytime there are 512 consecutive tombstones in an SST file, it marks it for compaction right away. We'll probably bump that up a little bit after we play with it a little more.
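The RocksDB facility behind that backport is the compact-on-deletion table properties collector; here is a minimal sketch with the 512/512 values Cory mentions. This is the raw RocksDB API, and Ceph wires it up through its own option handling, which is not shown:

```cpp
#include <rocksdb/options.h>
#include <rocksdb/utilities/table_properties_collectors.h>

// Illustrative: flag an SST file for compaction as soon as a sliding
// window of 512 consecutive entries contains 512 deletions, i.e. 512
// consecutive tombstones, matching the settings described above.
void enable_compact_on_deletion(rocksdb::Options& opts) {
  const size_t sliding_window_size = 512;
  const size_t deletion_trigger = 512;
  opts.table_properties_collector_factories.push_back(
      rocksdb::NewCompactOnDeletionCollectorFactory(sliding_window_size,
                                                    deletion_trigger));
}
```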
A
Sure. I just kind of picked those out of the blue, based on what seemed reasonable, but, you know, testing is much more important.
A
Guys, this is really great. I think Reef could potentially be one of the best releases we've made in quite a while, and in large part that's thanks to this testing you guys are doing, so I really, really appreciate it.
F
Yeah, and to be clear, we're running Pacific, right? Like, I think this is all great for Reef, but we should definitely give a lot of thought to where this lands when backported, because a lot of people would stand to benefit from this too. And I know it's really nice to be able to sell the story of, like, "hey, if you upgrade to Reef, it's gonna be way faster and way better," which is otherwise harder to sell.
F
You know, our intent is to move to Quincy next; I think actually after the next release is kind of the plan currently. But we certainly want to see all of this, at least as much as makes sense and is prudent, backported. Now, changing the defaults, that's probably a different discussion, and that might be something you could do Reef-specific or what have you, but I think a lot of users would definitely benefit from this just in general.
F
No, we're currently running the version that's distributed with Ceph; we're not running the new one. We're just running the one that's already in Pacific, so it's just Pacific plus, yeah, I think there's like 10 or 15 patches or something that we're maintaining for various things.
A
Yeah, they've got something in the new release that helps improve the behavior of tombstones in the memtables. That thing I did was just for the SST files; it doesn't help you at all with memtable tombstone accumulation, which, you know, maybe we don't hit, I don't know. But it sounds like the new version of RocksDB is definitely worth getting in, if we can, yeah.
F
I mean, I read through the changelog, and there are, like, so many thousands of bug fixes and other things, some of which sounded relatively disastrous, and, I know, strange behavior that we've probably seen in the past. So, you know, like everything, there's a little risk, but I think the reward on that might be worth it.
A
Yeah, we just need to get it baking in teuthology as soon as we can, so that for Reef we feel confident. And then, for the Pacific and Quincy backports, that's scarier for me, because they changed a lot of stuff in RocksDB. So it's maybe, maybe, but...
F
For Reef, with the rocksdb upgrade; and then, you know, now that we've got some data, and we'll continue to collect data and share whatever is useful to you, maybe we could look at just tuning some of the settings, or giving people the option to upgrade and tune the settings, to get at least some of the benefit. Yeah.
A
Exactly, exactly. I mean, it looks like with stock RocksDB and just a couple of these fixes, you're seeing dramatically better behavior in Pacific. So, right, I think it's worth it.
A
All right, well, Joshua, I moved your stuff up because it's more important than what I wanted to talk about, so why don't you take the floor.
H
Yeah, I mean, I can talk about it; it's almost more Igor's story to talk about now than mine. But thanks to insights from Igor last week, he and us have been digging further, and at this point I'd say we're fairly certain.
H
We know what the cause of the write amplification is, maybe not down to the very precise change, but basically what Igor has found is that there are gigantic inodes in BlueFS, like a 600-megabyte inode, for example (whoa; yeah). So the extent list for this inode is just gigantic, and because of the way the BlueFS log works, every single time that extent list extends, it's rewriting...
H
...obviously, the entire inode to the BlueFS log, and so the log is just being hammered, both by writes and also just by compactions, because the log gets so big. And the inode that's getting that big is one of the RocksDB WALs. In the particular case I looked at, it was for the L column family (I don't know which that is in BlueStore), and that's actually expected, because the Pacific setting says don't let the WALs exceed one gig in size.
A
So each buffer is supposed to only grow up to a maximum of 256 megabytes in Pacific; you're saying that inode was like 600 or 700?
B
I have a feeling that it might be relevant to a different column family subject; I think this was backported from Nautilus. Could be; let's skip it. Maybe that's somehow relevant, but it's just a hypothesis.
A
That's interesting.
B
But anyway, given 250 megabytes in the writer and a 64k allocation unit, it might trigger pretty large BlueFS log updates as well. In the main branch we use incremental updates, so that's not the case anymore.
H
Yeah, so Igor had recommended two experiments. Obviously, one is testing 16.2.11, which we are planning to get into the lab, hopefully in the next couple weeks or so; we don't have a hard time on that yet, but this might move it up. The other one was just an interesting one: what happens if we crank the BlueFS shared alloc size from the default 64k (when it's on the same device) to one meg? And so I ran that experiment this morning, and it does seem to make a pretty big difference.
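Back-of-the-envelope for why the allocation unit matters here: BlueFS rewrites an inode's whole extent list to its log whenever the list grows, and in the worst case (one extent per allocation unit, i.e. maximal fragmentation) a 600 MB WAL file gives

```latex
\frac{600\ \mathrm{MB}}{64\ \mathrm{KiB}} \approx 9600\ \text{extents}
\qquad \text{vs.} \qquad
\frac{600\ \mathrm{MB}}{1\ \mathrm{MiB}} \approx 600\ \text{extents}
```

so a 1 MiB shared allocation unit shrinks the extent list, and therefore each log rewrite, by roughly 16x in that worst case.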
A
I'm curious whether the new tunings that we currently have in main would help too, where we have a lot more buffers of smaller size, and we deal with the write amplification in the database in a different way, by having L0 and L1 more closely sized to each other. Yeah, I'd be curious if that changes it as well; it might be justification, in Reef, for leaving the new tunings in place rather than reverting them.
A
I suppose it's because each column family allows accumulation of 256 megabytes, and then you compact when you hit a gigabyte; I bet that's what's going on. Like, it used to be, right, that with one column family you would write into one memtable for everything, and then once you hit 256 megabytes, you would start the compaction process and start writing into the next one. And, you know, on an idle cluster like this, you'd compact it really fast.
A
Then you're writing to the next one; you'd never, in practice, exceed like 256 megabytes. But with column family sharding now, I imagine (if I remember right; I'm a little fuzzy on this) you have the ability to write into buffers for every single column family, and you might allow yourself to get very close to that one-gig limit without ever compacting any individual memtable, until you hit the global limit, and then you're like, "oh, crap." Gotcha. I think that's kind of how it works.
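A hedged sketch of the two limits Mark is describing, in raw RocksDB options; the values are examples in the spirit of the discussion, not Ceph's exact tuning:

```cpp
#include <rocksdb/options.h>

// Illustrative: per-column-family vs. global memtable accounting. With
// many sharded column families, each memtable can sit just under
// write_buffer_size, so total dirty data can approach db_write_buffer_size
// before any single memtable flushes: the "get very close to that one-gig
// limit" behavior described above.
void sketch_memtable_limits(rocksdb::Options& opts) {
  opts.write_buffer_size = 256ULL << 20;    // 256 MiB per memtable
  opts.db_write_buffer_size = 1ULL << 30;   // 1 GiB across all CFs
}
```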
A
Okay, do you remember Sage had that PR for RocksDB WALs to be able to reuse existing files, and that got blown away because it wasn't safe for the write-ahead log?
B
Again, that's not a big deal if we use incremental updates in the BlueFS log. Yeah.
A
Yeah, it's kind of interesting to me that it's not completely unrelated to, or different from, the issue in CephFS with encoding the subtree map for every single journal segment, right? Like, this feels like a repeated problem.
A
All right, well, Corey, or, sorry, Joshua: were there any other things there to talk about before we wrap up?
H
I guess, for myself: yeah, we'll report back on the ticket to say what 16.2.11 gives us in terms of improvement here. I mean, overall, despite this, I think we saw an improvement from a Pacific install, and this was probably across, like, two dozen production clusters or something like that.
H
I think we saw improvements in pretty much all of them except one. Good, though: like, despite this, in Pacific it's still really good, or better, from a latency perspective, and so hopefully, once this is solved, maybe that last one will come in line, and then it'll be just a straight-up improvement across the board. But we'll see.
A
Good, good. I'm very much hoping that when you guys look at Quincy, you'll see another improvement. There's a couple of things in there, but especially Gabi's work with... yeah, I was thinking of Gabi's.
A
Cool, all right. Well, I don't think we have time to talk about the hard drive stuff here, which is fine since Adam's not here anyway, so we'll push that out past next week. But any...
A
All right, well then: thank you, Joshua, and thank you, David and Corey. This was an excellent meeting; I'm really, really excited to see things getting fixed here. I think Reef could shape up to be really good. So... oh, I see David had one last comment.
F
We were in a pretty nasty place, and that tool made it a lot easier to deal with manual rebalancing to get ourselves out of hot water. So, seriously, that was like a massive help; just want to say thank you.
H
Awesome, I'm so glad to hear that. Yeah, we built that tool internally to get ourselves out of massive piles of hot water, like stuff I can't even talk about, unfortunately. But once we were done, we felt like it was worth open-sourcing, so I'm so glad to hear that, and I've been happy to see on the mailing list, too, that people are using it.
H
Yeah, so it's interesting you say that, Mark. I actually thought once or twice about rebuilding pgremapper as, like, a ceph-mgr plugin, essentially. But ultimately (and I actually had a Cephalocon presentation lined up for this last year, but with all the shuffles I just didn't get on the schedule) what I actually want to do is: I have a list of, I think, three or four points...
H
...that, if we fix those, then you actually don't even need the tool anymore, and most of it has to do with backfill scheduling. So, anyway, that would be an interesting topic. Building it into the distribution, sure, we could, but I'd almost much rather just say: hey, can we spend the time to fix the things that cause us to need pgremapper in the first place? If that makes sense.
A
Yeah. Are you coming to Cephalocon?
H
That seems unlikely. Like many places, travel budgets are tight, yeah. But I don't know; maybe I'll resubmit the talk and see if I can get some budget scrounged up for one of them.