From YouTube: Ceph Performance Meeting 2022-07-28
Description
Join us weekly for the Ceph Performance meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contribute/
What is Ceph: https://ceph.io/en/discover/
A
All right, I don't think we'll get them for a little while yet. Braddock hasn't joined yet, so I think they're going to be a little while, but we can get this thing started now and they'll show up when they do. Okay, so I'll confess, I did not really get through the PRs this morning; I started late.
A
It's a work-in-progress PR for kind of trying to retune our RocksDB settings, with the focus on keeping write amplification at least close to what we currently have, while hopefully both improving performance and, the big one, improving tombstone behavior: the performance when we are iterating, especially doing these kind of crazy iterate-delete cycles that we tend to do, which RocksDB is really, really bad at, right.
A
Now, when you do this, you start iterating over tombstones that have not yet been compacted, and everything slows way down and becomes awful. So right now it's just some settings that are changing, and adding the ability to use RocksDB's ability to compact on iteration, which they added at some point.
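For reference, this is roughly what enabling RocksDB's deletion-triggered compaction support looks like, assuming the NewCompactOnDeletionCollectorFactory utility; the window and trigger values here are illustrative, not the ones proposed in the PR:

```cpp
// Sketch only: enable RocksDB's deletion-triggered compaction on a
// column family. When an SST file being written contains at least
// `deletion_trigger` tombstones within any `sliding_window_size`
// consecutive entries, RocksDB marks that file as needing compaction,
// so tombstone-heavy ranges get cleaned up without waiting for the
// normal size-based triggers. Thresholds below are illustrative.
#include <rocksdb/options.h>
#include <rocksdb/utilities/table_properties_collectors.h>

rocksdb::ColumnFamilyOptions MakeTombstoneAwareCfOptions() {
  rocksdb::ColumnFamilyOptions cf_opts;
  const size_t sliding_window_size = 32768;  // entries examined per window
  const size_t deletion_trigger    = 16384;  // tombstones that mark the file
  cf_opts.table_properties_collector_factories.emplace_back(
      rocksdb::NewCompactOnDeletionCollectorFactory(sliding_window_size,
                                                    deletion_trigger));
  return cf_opts;
}
```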
A
I don't know when, but it had support for it. The big piece of this, though, would be to actually track deletes on a per-column-family basis, and then some locking and trickery to basically, once a transaction has been committed, increment a counter for the number of deletes that have been successfully committed on a per-column-family basis, and then only trigger a compaction and a flush for that column family.
A
The memtable flush is the one that we don't have anything for right now. Right now, in memtables in RocksDB you can accumulate tombstones, and if you have to iterate over them there's really no way to fix that, short of doing something like this where you actually track it yourself and then manually issue flushes, so...
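A minimal sketch of the kind of per-column-family delete tracking being described, with hypothetical names and thresholds; the actual work-in-progress PR may do this differently:

```cpp
// Sketch only (not the work-in-progress PR): count tombstones per
// column family and, once enough deletes have been committed, manually
// flush the memtable and compact just that column family so tombstones
// cannot pile up un-iterable in the memtable. The trigger value is
// purely illustrative.
#include <cstdint>
#include <rocksdb/db.h>

struct ColumnFamilyDeleteTracker {
  rocksdb::ColumnFamilyHandle* cf = nullptr;
  uint64_t committed_deletes = 0;   // deletes whose transaction committed
  uint64_t trigger = 64 * 1024;     // illustrative threshold

  // Called after a transaction that removed `n` keys in this column
  // family has been successfully committed.
  void note_committed_deletes(rocksdb::DB* db, uint64_t n) {
    committed_deletes += n;
    if (committed_deletes < trigger) {
      return;
    }
    // Flush first so tombstones still sitting in the memtable reach an
    // SST file, then compact the column family to drop them.
    db->Flush(rocksdb::FlushOptions(), cf);
    db->CompactRange(rocksdb::CompactRangeOptions(), cf,
                     /*begin=*/nullptr, /*end=*/nullptr);
    committed_deletes = 0;
  }
};
```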
B
Yeah, I think for some of the tables the use case for tombstones is limited. So for the onode, there are really few cases where we delete onodes. One of them is snap trim, and I mean deleting them en masse; the second one is if we do a remove for a volume or for a collection. Other than that, I don't think we delete many objects. I mean we could do it from time to time, but the cases where we do it on a large scale are when we delete a file system or a whole volume.
B
That's a different story, but I'm talking about the onode, because that matters also. But I think, for the onode, we could trigger the compaction from those services, which tend to be background, because deletion of a volume, a file system, or snap trim, all of these phases are not performance-critical. And so whenever we start or finish, we could from time to time just add some calls to the compaction; on the normal flow we shouldn't see this happen.
A
So right now we don't expose the ability to do per-column-family compactions or flushes outside of the RocksDB KV store.
A
We could maybe add that, but we end up in the same kind of situation, I think. Well, you might be able to avoid locking in that case, but in any event we don't have that exposed right now, so we'd need to add code into the RocksDB KV store to let you do a per-column-family compaction or flush. And then, still, I'm a little nervous about letting people just reach in and kind of issue those themselves. I don't know that we necessarily want that as part of the interface,
A
or if we want the RocksDB KV store code to kind of handle it itself.
A
So what we can do inside the glue code is... I think we can do this: I think we can basically register a listener with RocksDB to say, I want to know when this column family has been compacted, when, like, a compaction event ends, and then, if we are tracking deletes, we can basically reset the delete counter.
A
If another compaction comes in behind the scenes that we don't know about, we can kind of make this fairly clever. I don't know if it all will work or not, but that's kind of the current thinking I had: we can basically increment our delete counter and issue a flush or compaction depending on some criteria, whatever it is, however many deletes we want that have finished successfully, based on...
A
You know, knowing that the transaction finished. And then we register a listener with RocksDB, so that if RocksDB decides to compact the column family, then we don't just do it ourselves, you know, blindly; we reset our counters based on that and then start re-incrementing them, so that then, you know, we wait until more deletes have happened, because RocksDB compacted and all the tombstones are gone.
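A minimal sketch of the listener idea, assuming RocksDB's EventListener interface; the reset callback is a hypothetical hook into the delete-tracking structure, not code from the PR:

```cpp
// Sketch only: an EventListener that notices when RocksDB itself
// finishes a compaction or flush on a column family, so the delete
// counter for that column family can be reset instead of immediately
// scheduling a redundant manual compaction.
#include <functional>
#include <string>
#include <rocksdb/db.h>
#include <rocksdb/listener.h>

class TombstoneResetListener : public rocksdb::EventListener {
 public:
  using ResetFn = std::function<void(const std::string& cf_name)>;
  explicit TombstoneResetListener(ResetFn reset_deletes)
      : reset_deletes_(std::move(reset_deletes)) {}

  void OnCompactionCompleted(rocksdb::DB*,
                             const rocksdb::CompactionJobInfo& info) override {
    // A background compaction already removed tombstones in this CF.
    reset_deletes_(info.cf_name);
  }

  void OnFlushCompleted(rocksdb::DB*,
                        const rocksdb::FlushJobInfo& info) override {
    // Memtable contents (including tombstones) just reached an SST.
    reset_deletes_(info.cf_name);
  }

 private:
  ResetFn reset_deletes_;
};

// Registered before opening the DB, e.g.:
//   options.listeners.push_back(
//       std::make_shared<TombstoneResetListener>(reset_cb));
```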
A
I think if we wanted to do it the other way, we'd have to expose both the ability to do per-column-family flushes and compactions and also pass through the ability to register listeners with RocksDB, and that gets... I mean, it gets kind of messy. Then you're starting to, like, expose a lot of internal details about RocksDB to BlueStore, where theoretically this should be an abstraction. You could do it, it's just, I don't know, starting to feel kind of gross.
A
All right, so in general, though, that's all still work in progress, and, you know, we'll find out what's right or if any of this actually works, or what, I guess, but we probably need to fix it in multiple different ways and just try things. Okay, so next, an update from Igor: this is "get rid of statfs updates on each transaction."
A
This failed QA. That was the deal with this one: I think Adam had approved it, it got some updates, and I think Adam actually maybe wrote some fixes for this, but I didn't look too closely at it. So anyway, it's being worked on. And then after that, I confess, I didn't really get through the old PRs. It's possible some of these got updated, but most of them are probably still kind of just hanging around.
A
So, in terms of other updates this week...
A
So if folks are interested, the final version is out there. Josh, we talked a little bit about this in the PR, but you had mentioned write amplification as a real concern about using TTL, and one of the things in this article that I saw is that with RGW, the write load in RocksDB went way down when we started testing compression; it was huge.
D
And, yeah, I can't remember if I mentioned this in the past, but we actually did run compression on our RGW clusters internally, and I found that it was mostly fine until those OSDs became a backfill source, at which point the load spike was pretty significant.
D
Because of the high read traffic, we actually did get some people with workload slowdowns when that happened. Now, okay, there are things that could be done to tweak that, right; it's not like we have to backfill at full speed, but it was noticeable.
A
Do you remember if you were using Snappy compression, or...? It was Snappy? Okay, okay. I think they're pretty close, although I think LZ4 was a little better, from what I saw on the RocksDB Facebook thing and then also just in testing.
D
Yeah, and to be clear, the write amp that I'm concerned about is less... it's still on the index side, like that six-hour TTL; that was based off of what we found acceptable right now. So basically it's like, okay, assume you want the SSD to last for five years; we just did the math on write amp and everything was fine. We've got tons of write capability on those boxes, so it wasn't a performance issue, it was entirely a wear-level issue.
A
And I have a feeling that all of this stuff is going to increase write amp overall, right: the TTL, the compact-on-iteration, especially if you start tracking deletes and then, like, every, you know, thousand or ten thousand, who knows how many, deletes we issue compactions. You know, we're going to see that write amp go up, so I'm just trying to think of, like, how to kind of start balancing some of this out.
D
Well, and this is the balancing act with RocksDB, right. Yeah, yeah. Like, I was... I'm just catching up; I actually just came back from three and a half weeks of PTO here, but I did see one of my teammates had seen your PR and posted it, and I'm super happy to see a lot of those settings starting to be proposed to mainline, rather than staying in a blog post somewhere.
D
That's awesome. I think for most people it's just not going to matter, and then for the larger shops like us or others, we're probably just going to observe carefully and then tweak it ourselves anyway.
D
We actually... so I don't know if you've come across him at all, but one of my colleagues, Alex Marangone, I'm sure you know who he is. Yeah, yeah.
D
We had tracked down a PG log load issue, which I don't think we were the first ones to hit, the PG log blowup, which basically caused the PG log to be many, many gigabytes in size on a per-OSD basis, and it actually caused RocksDB to roll over into having an L5 of data. That was awful, because now your tombstone buildup is just that much bigger, right; the fewer levels you have, the fewer tombstones you have, because they get compacted out more regularly.
A
At what point in time was... was that issue that you hit the one where we're failing to trim if there is a bogus, basically, like, future update that ends up getting in? That one was obviously...
D
This was each entry getting big. Oh, that one, okay, yeah, yeah. Because of, like, the refcount update in PG log entries, like, every PG log entry was like 20 kilobytes plus or something like that, just for a refcount update. I don't remember the details, but it was huge, and it goes back to some, like, Hammer-era change, right; I don't know exactly how it works.
A
It's this one, actually... now I'm... it's actually a dups entry issue, where if you have a corrupt dups entry that looks like it's in the future, then we just don't trim anything.
A
Yeah, we went through and did a bunch of work on that a week or two ago, and that fix is good, so yeah, that should hopefully no longer be an issue.
A
Okay, so yeah, I guess I will be very interested in you guys' take on this, and especially if you guys have any kind of, like, test setup: when we get the deletion tracking in, I'd be very, very interested in having you guys review it and, if you can, you know, make sure it's not fouling anything up on...
D
Your end, for sure, yeah. I mean, we do have test setups; I'll have to see, because of the way they hook into the infrastructure, whether they would trigger those paths or not. So, I mean, we'll see once the PR is up how they integrate; we might have to tweak how we do our testing.
D
I don't know, I'm still catching up. I find that when I catch up from vacation, like, words just don't come to my head as quickly. But, like, there's a huge array of customer workloads and it's very hard for us to capture that perfectly, so usually, once we're sure it's not going to fall over, our best test is: let's just go put it in production for a subset and find out what happens.
A
Cool, all right. Well, thank you, definitely appreciate it. You know, Alex's tracker ticket really, really highlighted it. I think... I don't know if I would say it... we've known about how bad tombstone accumulation is for a couple of years now, but, you know, it's worse, I think, than we even realized.
B
I don't think tombstones are a real issue when you access an object by key, because I think we did the math. Say you have one million objects and you deleted objects a thousand times, so now you have one billion tombstones and one million objects. Are you with me? You have the one million objects, but after a thousand deletion cycles you now have one million real objects and one billion tombstones; let's assume there was no compaction whatsoever.
B
Okay, are you with me so far? So until now, when you did a search for an object, which is done in log n, it was taking 20 steps for one million objects: log2 of 1 million is 20, so 20 steps. When you grow to a billion entries because of all the tombstones, the search is going to take 30 steps, because log2 of 1 billion is 30.
B
So we really only increased the work that we do by 50 percent, and that's when we have a crazy amount of tombstones; that's not a problem. The problem is when people walk a range, because then they start searching, and when you search, you are linearly affected by the amount of tombstones. Does that make sense to you?
A
Yeah, yeah, agreed. I don't know for sure, but I think when you access something by key, RocksDB internally will go through and hit the bloom filters, right, to see if that key exists in a particular level, and only if it has to will it fall back to doing a search through the SST files. Is that right? Okay. Is that your understanding?
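For reference, a generic sketch of how a bloom filter gets attached to a RocksDB column family; the bits-per-key value is illustrative and these are not the Ceph defaults:

```cpp
// Sketch only: attach a bloom filter to a column family's table format.
// Point lookups (Get) can then skip SST files that definitely do not
// contain the key; iterators and seeks get no such shortcut, which is
// why range walks pay for every tombstone they pass over.
#include <rocksdb/filter_policy.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

rocksdb::ColumnFamilyOptions WithBloomFilter(rocksdb::ColumnFamilyOptions cf) {
  rocksdb::BlockBasedTableOptions table_opts;
  table_opts.filter_policy.reset(
      rocksdb::NewBloomFilterPolicy(10 /* bits per key, illustrative */));
  cf.table_factory.reset(rocksdb::NewBlockBasedTableFactory(table_opts));
  return cf;
}
```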
B
The... when we search by range, that is a terrible thing to do. We should never search by range.
B
In the snap mapper there was really no reason to search by ranges, but the way this thing was built forced us to walk by ranges, because we wanted to have a logical map from snap ID to a set of objects. So you can have one snap ID and, say, 20,000 objects, and what we have done is we broke it into 20,000 separate key-value pairs, where each time the key was the snap ID and the object ID. So now we start searching by ranges.
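A schematic sketch of the flattening being described; the key layout here is hypothetical and not the actual SnapMapper format:

```cpp
// Sketch only: the logical map  snap_id -> { object ids }  stored as one
// RocksDB key per (snap_id, object_id) pair, so "all objects in snap S"
// becomes a prefix/range scan that has to step over any tombstones
// sitting inside that range.
#include <cstdio>
#include <memory>
#include <string>
#include <rocksdb/db.h>

std::string snap_key(uint64_t snap_id, const std::string& oid) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "SNAP_%016llx_",
                static_cast<unsigned long long>(snap_id));
  return std::string(buf) + oid;  // hypothetical key layout
}

void list_objects_in_snap(rocksdb::DB* db, uint64_t snap_id) {
  const std::string prefix = snap_key(snap_id, "");
  std::unique_ptr<rocksdb::Iterator> it(
      db->NewIterator(rocksdb::ReadOptions()));
  // Every Next() here may have to skip tombstones left by earlier trims.
  for (it->Seek(prefix); it->Valid() && it->key().starts_with(prefix);
       it->Next()) {
    // it->key() encodes (snap_id, object_id); it->value() holds metadata.
  }
}
```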
B
So all the searches are done by ranges, and that is extremely expensive. All of this is going to go away with my code, because we no longer use omap in RocksDB whatsoever. And I also think that... my code is now going through some kind of optimization cycle, because I realized that a lot of what we do is handle the worst-case scenario, but we never really use that code.
B
For example, the way that we call the code, we have an operation, we call update, and the update gives you the old map. So if the clone object belonged to, say, three snap sessions, and now you remove one or two snaps, it gives you the new map and tells you you have to remove everything from the old map. So in theory you should walk over all the snaps and remove all the stuff, but in reality this is not how we do things.
B
In reality, what we do is we remove one snap session at a time; we only work on a single snap session. And when you're adding objects, that's a very simple thing: just add the object, and you can batch them to disk. When you do trimming, you do it on a single snap, so you just need to page this single snap map into memory. Maybe you need to sort it, I don't know.
B
I don't know, like, by the object ID order. If you remove something from the onode column family, does it make any difference if you do them in one order or if you do them randomly?
B
Okay, if it doesn't matter, then all we need to do is page this single snap entity into memory and then remove them one by one; you just walk over a vector. It's an extremely simple thing, and you never have to do any searches, you never have to do any sorting, any accesses to the database, or anything.
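A schematic sketch of the one-snap-at-a-time idea, with hypothetical keys; the in-progress code may look nothing like this:

```cpp
// Sketch only: once the object list for a single snap has been paged
// into memory, trimming is just a walk over a vector, batching the
// per-object deletions, with no range search while trimming.
#include <string>
#include <vector>
#include <rocksdb/db.h>
#include <rocksdb/write_batch.h>

void trim_one_snap(rocksdb::DB* db,
                   const std::vector<std::string>& object_keys_in_snap) {
  rocksdb::WriteBatch batch;
  for (const std::string& key : object_keys_in_snap) {
    batch.Delete(key);  // hypothetical per-object key
  }
  db->Write(rocksdb::WriteOptions(), &batch);
}
```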
B
If you remember, the way that we do things, it's one snap at a time. The code, in theory, can support operating on multiple snaps and removing multiple objects for multiple snaps, but we never do that. Actually, I take it back: there is one scenario in which you do that, but that scenario is not critical.
B
That scenario is when you remove a volume; then you need to remove the object from all the sessions. But that thing is low priority and it could be done easily. The way that we do snap trim is always one snap at a time, but the code is always operating as if everything could be modified at any time, and it's always iterating over all the snaps, when in fact we always work on a single snap.
B
This whole code is very general, which is a nice thing if, I don't know, you're trying to build a toolkit for somebody else; it's nice to be general. But when you're doing stuff for yourself, it's okay to say, you know what, we're going to remove snap sessions one by one: we start from one session and once we're done we move to the next one. There is really never a reason to walk multiple sessions.
B
So then the code is much easier to do and it's much cheaper. So that's the way to do things. But the other cases where we do ranges... I know in the PG log we tried to do range remove; I suspect that's another problem, and I think in the end we decided not to do that. In theory a range remove is a cheap operation, because instead of issuing multiple tombstones you just create one mega tombstone; in reality it seems to be more expensive, and I don't know this...
A
Right, and the more of that goes away, the less important some of this other stuff we're talking about becomes, like automatically triggering compaction and flush on deletion and that kind of thing, just to try to get tombstones out. I don't know if we're ever really going to completely eliminate it, though. I don't know, like, on the RGW side... do you know, like, under what circumstances we do a lot of, like, range-based iteration?
A
Yeah, Greg, I'm kind of trying to get a sense for, like, how important a general solution is in, like, the RocksDB store, for issuing compactions on deletion to get rid of tombstones, versus more, like, one-off, kind of, you know, allowing different things to issue compactions periodically. Like, do we need the general solution, or is it good enough to have, like, the ability to let something trigger a compaction here and there?
A
My inclination is a general solution: basically, in the RocksDB KV store code, for every column family, track the deletions and then have some cleverness there where we can reset our counters when RocksDB issues a background compaction, and not expose the ability to do, like, per-column-family compaction and flushing to other stuff; we just take care of it there. That's kind of...
C
Like, yeah. I mean, because, like, the upper layers, even if they're doing bulk deletes, like, they don't know where they're going, definitely. Like, they're not paying attention to what PGs things are in, they're certainly not paying attention to which OSD gets those PGs, and that all matters, I think, for this. Yeah, you know, like, the MDS stores all of its entries in omap, and, you know, sure, it can do a bulk delete of omap sometimes; it's not super common, but, like, if it does...
A
And, I mean, even inside the OSD and inside BlueStore, I don't really want, you know, people thinking about RocksDB-internal behavior, right. Like, we won't, but if we did put a different, you know, key-value store behind this, issuing compactions on deletion might not be what you want to do at all. So I think the key-value store code, or the RocksDB KV store code, is where this probably should live.
C
Sure, that makes sense to me. I mean, I think, if you want to track that, like, if you want to issue it based off the ratio of deletes or tombstones, then there's no reason to do that above RocksDB or our RocksDB handling, because, like, why would you? RocksDB knows, or we know, where the deletes came from.
A
I think that Gabi's idea earlier was that we could kind of try to be smart about, like, waiting until there's, like, a time period to do a compaction when nothing else is happening; like, maybe there's a trade-off that you want to make where you don't want to do the compaction immediately, you just want to wait for a while. But I don't think it's enough of a win.
B
Whatever we do... so if we do snap trim, we could stop the trim until the compaction is done; there is no reason to run them in parallel. If we do anything else, then... yeah, I know, but at least you could stop whatever is clearly generating more tombstones in the same column family.
C
That does not square with my intuition of how compaction and tombstones interact, but maybe you have tests; I haven't looked at tests. I mean, like, if you run a compaction and then write a bunch more tombstones, you just have to go through all the layers again, and I don't think you're reducing the number of levels by that much.
A
Greg, I think the case that I really worry about is where you're deleting and iterating at the same time, or, well, you're interleaving them, right. That's the one that we consistently see break, where you have lots of tombstones and you're not writing anything, so you're never compacting, so you end up with iteration just taking longer and longer, like, you know, seeking to the beginning of a range or seeking to the end of the range. That's the behavior that we've got to figure out how to fix.
B
Why do we do that? We know that we generate a lot of tombstones for others, and we can stop doing that. So in theory, if you run snap trimming very quickly, you would fill the system with tombstones, so you should in a way pace yourself and say, you know what, every N tombstones I create, I'm going to take a break, compact the system, and only then resume trimming, or volume remove, or whatever it is I'm doing.
A
I see... memory or whatever, but if you're only working on, like, one thing at a time, are you consuming a lot of resources if you end up blocking?
A
You won't... you're just submitting more work without bound, yeah, because even if it's not completing...
A
Greg, that's not... it's part of it, but the big, the real part of it, is that when you're, sort of, you know, finding the beginning of a range, you end up having to iterate over all of those tombstones to do it. So, like, as you go, because you're doing this kind of, like, iterate, find something, delete it, start over at the beginning again, iterate, find it, delete it, you end up doing, you know... it's not like a linear growth, it's a bigger growth in...
C
Okay, I mean, yeah, so that seems like something we should work on, because, I mean, yeah, I don't know what all the patterns are, but, I mean, just from how much work it takes to do the compaction and propagate things: you know, having all the tombstones in there at once means we have to do one pass when we come back, whereas doing it every thousand, if we have 20,000 tombstones, means we have to do 20 compaction passes, and that seems bad too. So, I guess, yeah.
C
We should look at our iteration patterns. We don't really want to have to, like, stop iterating; like, we can't remove all iteration from the...
A
Yeah, we... I think we need fallback code, because we can't have everything just kind of grind to a halt. But generally speaking, if we can, especially if we can avoid restarting and, you know, like, recomputing a lower bound, right... like, that's what a lot of times we see, that RocksDB is spending, like, a horrific amount of wall clock time seeking within lower bound, because we're restarting over and over again.
B
I mean, removing the object, all the processing, all the transactions, that's expensive; but if even the search is expensive, to the point that you say, I cannot do it for that long, then you are actually making it worse. Sure, you never block it for a very long time, but aggregated, you will block for a much bigger time.
E
In general, the smaller, more frequent compaction strategy has been applied pretty successfully in other LSM variants like Pebble, and in other academic papers, so I think that is worth looking into.
A
Yeah, Josh, that's kind of why I'm leaning towards this plan of tracking deletes, you know, per column family, and then, you know, we could tweak it, but generally speaking, issuing compactions... I'm afraid of the write amplification overhead, and we probably are going to increase write amplification doing this, but I think, probably, from the whole-system-working-well kind of perspective, it'll improve things.
A
Yeah, it gets tricky, but the gist of it is that it's probably going to involve locking, which I don't love, but we'd have, basically, like, a column family wrapper thing that's got a bunch of, well, a couple of different counters in it. We increment our counters when we issue rmkey or rmkeys or whatever we're doing, but we don't actually apply what we increment until the transaction completes through RocksDB; then, once we have that particular counter get high enough...
A
We don't want to issue a compaction or a flush if RocksDB did so recently. So, in addition to this, we would, I believe we can do this, register a listener with RocksDB, so that each column family would basically have a listener associated with it, or each wrapper would have a listener for its column family associated with it.
E
Yeah, I think so. I guess I wonder about the listening for compaction events and flush events; it seems like that may or may not be necessary. Are you sure that it is?
A
Yeah, that would probably be... it could be a speed enhancement, right. I mean, it's not the end of the world if you do a compaction immediately after RocksDB just did one; it's not great, but...
E
Too... I guess with these, like, delete-heavy workloads, it seems like it's not compacting that fast or that quickly to keep up with the... correct?
A
With the delete workloads, your... I don't know if eventually we'd even trigger a compaction if you literally fill the memtable with deletes; like, there's some extremely small size associated with that tombstone, I guess, I would assume there is, but yeah, it's either never compacting or just compacting extremely...
E
Okay, so what you're saying is, it implies that the compaction activity is triggered based on fullness, and because tombstones are so small, they're not hitting that fullness very quickly.
A
Yeah, or just never, right, and I don't actually know. But yeah, so potentially the bad case, right, would be if you've got some kind of, like, moderate or small write workload and then, like, a delete workload mixed in with it, and the writes are triggering compactions, but then we're also triggering compactions on deletes, and you end up, like, kind of colliding with each other and having, like, twice the number of compactions
A
that you maybe should have, right, because they're kind of stepping on each other's feet. Sure, sure, that would be, you know, the thing that you maybe want to try to listen for and avoid, but, like you said, it's probably not the end of the world; it would mean more write amplification and maybe be a little slower. But the big thing is actually the deletes: tracking the deletes and then applying what you see on a successful transaction to RocksDB,
A
on the, you know, successful write of the transaction to the write-ahead log, and then issuing the compaction, or, rather, the flush and the compaction, only for that column family, if you see however many deletes you want to trigger on.
E
Would it be worthwhile to think about a smaller range within a column family, too? Would that help?
A
I haven't thought about it that far, but you can do, like, a range for a compaction.
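For reference, a minimal sketch of a range-limited compaction using RocksDB's CompactRange; the key bounds are hypothetical:

```cpp
// Sketch only: compact just a sub-range of a column family instead of
// the whole thing. Only SST files overlapping [begin, end) are
// rewritten, so the cost (and write amplification) is limited to the
// keys that actually need it.
#include <string>
#include <rocksdb/db.h>
#include <rocksdb/slice.h>

void compact_subrange(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cf,
                      const std::string& begin_key,
                      const std::string& end_key) {
  rocksdb::Slice begin(begin_key);
  rocksdb::Slice end(end_key);
  db->CompactRange(rocksdb::CompactRangeOptions(), cf, &begin, &end);
}
```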
E
Or maybe longer term we could, like... if we see a compaction is taking a very long time for the full column family, we could try to reduce the scope a bit. Sure.
A
Documentation... I don't know if I want that to be in here.
E
Yeah, it's not a lot of English to...
A
But it's not worth trying to do this... so, yeah, maybe, but we don't need to do it to start out, right; we can start off just trying to flush the whole thing.
A
So, yeah, I've kind of been trying to lay it out in my head, and it seems like it all works; I'm afraid I'm going to have to protect the wrapper thing with the lock on deletion.
A
I was thinking the other day it could be... well, let me think about it again; I haven't thought about it in a couple of days. Yeah, atomics... I think that didn't work. Okay.
A
So, yeah, anyway. I mean, I'll try to take a look at that once a lot of these other, like, random fires get put out that we have right now with performance stuff. But yeah, that's kind of the plan for me, anyway, to see if we can do this. Yeah, I guess that's it, and we're at the end of the hour, but...