From YouTube: Ceph Performance Meeting 2022-10-20
Description
Join us weekly for the Ceph Performance meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contribute/
What is Ceph: https://ceph.io/en/discover/
A
I only have two new PRs this week that I saw. The first one is actually kind of exciting, because I think Adam was trying to start looking into this himself, but we've got a contributor from Canonical that found a race condition in the onode cache, Igor. It's related to all the same stuff that keeps harassing us over and over the years.
A
Well, maybe deceptively simple, but not a lot of code. So yeah, I added you as a reviewer. And yeah, if you have another one — is it a PR, or is it just a branch right now, your version?
A
Okay, now I've got to get back to my — hey, the YouTube chat window — okay. So there's that PR, and then we've got a new PR from Adam about improving the deferred write decision making, and Igor, I was hoping I might get you to just quickly look at that too. I reviewed it, and, Adam, those changes make sense to me — I like what he did.
A
The only thing I'm worried about is that you had added some code, maybe a year ago, regarding when you've kind of got multiple chunks that you're making decisions on, and there's this has-chunk-to-defer logic which, I confess, I don't totally understand what your current code does. So I was a little afraid of removing it with what Adam's doing now.
A
If you have time, maybe take a glance over it and see if you agree with Adam's solution here.
A
Sure. Sorry — I know I'm reaching out a lot. What it's doing, I think, makes sense to me; at least it seemed reasonable. I don't think Adam's here — yeah — he's making the kind of conscious decision to just treat everything that the allocator gives us as a contiguous region, which, you know, when the disk is full or you get lots of little fragments, maybe isn't a good assumption.
A
I don't know, but in the general case what he's doing seemed to be reasonable.
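The decision being discussed — treating whatever the allocator returns as one contiguous region and making a single defer-or-not choice — can be sketched roughly like this. This is a hypothetical simplification, not BlueStore's actual code; `prefer_deferred_size` stands in for the real per-chunk decision logic:

```python
def should_defer(extents, prefer_deferred_size):
    """Hypothetical sketch: instead of deciding chunk-by-chunk, treat
    everything the allocator handed back as one contiguous region and
    make a single deferred-write decision on the total length."""
    total = sum(length for _, length in extents)
    return total < prefer_deferred_size

# When the disk is full or fragmented, the "region" may really be many
# small pieces, which is where this assumption could break down.
small = should_defer([(0, 4096)], 65536)                      # defer
large = should_defer([(0, 65536), (131072, 65536)], 65536)    # write directly
```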
A
So yeah, if you get a chance — I think Yuri actually put it in testing already, so I'm not sure if the plan is to actually merge that right away.
C
Or not — that's just to get a head start on the testing, Mark. We still want an okay; we're talking about a plus-one for it if there's nothing obviously big. The reason being that I think the next Pacific point release relies on this, so we won't be doing the release until we have a fix for this, and there's a lot of interest.
A
Okay, well, anyway — there's that one new as well, and that was all I saw. I didn't see anything performance-related really closed or updated this week other than those two new ones. Was there anything I missed from anybody?
A
All right. I did not quite make it through the list of old PRs this morning — I got most of the way through, though. So there may be a couple things that got closed out here, but I'm guessing it's probably about the same as it was last week for discussion topics.
A
I didn't have anything listed here, but there is one thing I wanted to mention: we've got a user that is trying to use secure mode with cephx, and they saw significantly lower client performance than I saw doing the same RBD workload — small random reads of like 16K — without using secure mode.
A
I think what we're probably seeing is that that's a fairly significant source of overhead, especially now that Adam Emerson's boost::asio stuff merged — that gave us a really significant client-side performance improvement a couple of years ago — and it may be that cephx, and especially secure mode, is now a bigger bottleneck than it used to be. So, not 100% sure on this right now; it's just a hypothesis, but it looks like it could be.
B
So I saw your comments in the PR which uses recovery.
A
It was brought to my attention because he was expressing some concerns about it, and I guess the thing that I'm concerned about is — so the idea here, right, is that you've basically taken a snapshot of RocksDB, and if somehow your database gets corrupted or whatever, you basically just replace what's there with the old version and then use kind of our recovery mechanism to try to get things back into a stable state, right?
B
Instead of needing to perform a full clone of the broken OSD, you can recover from this database and then use deep scrubbing to fix just the broken onodes which have no matching checksums — not to mention that it might help in case you have multiple broken OSDs and you just don't have enough replicas to cover the PGs.
B
On top of this, I put a link in the original PR to one which actually reverts BlueFS to its initial state before doing the recovery, and while I've only played a bit with that, it looks like it's more stable and more reliable at bringing the OSD up.
B
But anyway, the approach doesn't guarantee that all user data are valid after the recovery. But it allows us to recover the OSD.
A
Yeah, I guess the thing that I'm most concerned about is just kind of the unknown, right? My first thought was: oh my gosh, how many corner cases could we hit where something's going to end up corrupted trying to do this approach? Maybe that's unfounded, I don't know — fear of the unknown, right — but that's honestly what I'm most concerned about. I don't have any specific examples.
A
Just, you know, a concern that you could end up with a really inconsistent state.
B
Sometimes — well, it's slower, and you might simply not have enough replicas if multiple OSDs are down.
A
So yeah, I guess — I don't know — Josh, Neha, do you have any feelings about this? If we were to approve the PRs and kind of make this available in some of the tooling that we have, how should we guide users on this? This still scares me, but maybe as a last-ditch thing it's worth having. I don't know.
C
Yeah, in general I think these kinds of tools can be pretty helpful, but they do need to have some big warnings around using them, because it's similar to the kvstore tool — like DB repair — where it has the potential to give you a kind of incomplete state that may not be so obvious from just the name of the command.
C
Yeah, my two cents: it's good to have such things in your back pocket when you have nothing else to rely on. We've been in situations where we were like: okay, if only we had something like this. But yeah, having that extra flag — you know, "there's no guarantee of full recovery" — is definitely something we should add.
B
…to perform recovery, but it's definitely not a 100% guarantee. And in that case, keeping a clone or a copy of the metadata on the main device — outside RocksDB, laid out differently — so maybe we should consider this as an option, maybe start working on some design for that.
A
Yeah, yeah — I wish we had done that from the get-go, just appended the metadata to, you know, a portion of the object or something. I'd push for that.
A
Yeah, I've always kind of thought of RocksDB — I mean, right now it's the authoritative source of metadata information, right, but you know, it's really only there for fast lookups.
A
Well, how nice would it be, then, for a variety of reasons? If you did something like that, people could replace their DB device easily with a new one; they could get rid of the DB and recreate it. That'd solve so many issues right now, beyond just people having a corrupt DB — say they decide they want to move it from a slow device to a faster device, or have fewer on one device, or something. It'd be nice.
B
Well, migration is available at the moment, so it's not a big deal. But recovery — that's the option that I'd highly appreciate.
A
I know there's a way you can do it, but I thought — at least it used to be — that moving the database was pretty... I don't know if "complicated" is the right word, but just a little bit of an intense process for users.
A
All right, well, cool. Yeah, Igor, if you're interested I'd be happy to talk to you more about that. I think it would be really, really nice if we had the ability to recreate the database from the block device — that'd be a killer feature.
A
But maybe for this meeting we can move on for now, I guess — unless people want to talk about that.
A
One thing actually not related to that topic that I did want to talk to you about, Igor: Adam and I have been talking a lot about deferred writes lately, and one of the things — maybe the heretical position I've been taking — is that we should just get rid of them entirely and move to a model where we write to a small portion of the flash device as a non-deferred write, and then migrate the data over to the slow device after the fact — not as part of the actual write process, but as kind of a later, deferred process.
A
I think Adam's not totally on board with it yet — he's still trying to think about ways to make deferred writes better — but I was hoping, since you're here, I could get your take on things. What do you think?
B
…as well. So at this point I dislike RocksDB quite significantly, so the less load we have there, the better, to me.
A
I started looking into the KernelDevice and the block device interface, and I'm wondering if there's some way I can basically create a new kind of abstracted block device implementation that takes two kernel devices and then makes decisions about where to write data. And then, from there, maybe behind it BlueStore could even kind of migrate data from one to the other — but you'd probably want some kind of hinting through that block device interface.
A
Regarding, you know, when you should defer — if it's something that's short-lived, these kinds of things.
B
I think — I'm afraid you might need some additional metadata to keep that stuff, which is not present at that level at the moment. I'm just speculating so far, but, well, abusing BlueFS for this purpose looks more attractive to me.
B
Well, at least this looks like maybe the primary option to try.
A
I was also thinking — well, one of the reasons I wanted to talk to you about it is that I was thinking about your write-ahead log work, and about how, once we've written data into the write-ahead log, I wonder if we could do that scheme we kind of talked about in the past, where maybe you can almost treat the part that you've written into the write-ahead log like an extent — you've already got the data there; it's already on the fast device.
A
Maybe you could actually treat that as kind of an intermediate copy of the data that you can read from, and then slowly, as you merge things over to the slow device, you can clean up the old logs behind you — otherwise, you just leave the existing logs in place until you're done with them.
A
It'd be really nice if, when you've already written to the fast device for the log and you've already got the data sitting there, you didn't have to rewrite it into the slow device — or into some kind of fast layer that then later gets moved into the slow device. It'd be nice if you could just leave it in place and treat that data as a new extent.
A
All right, well, that was all I wanted to bring up. There's just a lot of work, with different people looking at different things, but there's this kind of unifying desire, I think, to try to make all this simpler and make it make more sense. So, yeah.
B
Yeah, but as we discussed before, all these modifications look pretty dramatic. So at some point I'd like to fork the store implementation — keep the legacy one around and maybe go ahead with the new one — because making such changes to the existing code is dangerous.
A
It's probably a good point that at some point we have to decide what's reasonable to do in BlueStore and whether or not we should make, like, a BlueStore 2 or something.
A
At one point I was even thinking that in BlueStore it wouldn't be impossible to adopt a sharded write path, where we have multiple instances of RocksDB and specific shards handling specific PGs. It didn't look impossible to me, but that's almost recreating SeaStore and Crimson at that point — not entirely, but sort of. So I don't know.
A
Yeah, I had started working on a branch like that — sharding. I think there are a couple of little irritating areas where you'd have to change things to make it work, but it didn't actually seem as bad as I thought it would be at first. I don't know if that was your experience or not when you were looking at it.
A
I think beyond the KV sync thread, the way that the sharded op work queue works right now — and especially the way that we try to fill things in with the messenger threads and then let the worker threads go to sleep and wake back up based on the status of the queue — I've seen some evidence that this kind of model is not ideal.
A
That whole side of it too — if we can figure out a better way to make that whole pipeline work faster.
B
Well, maybe one more topic — I'm not sure if we have anything to discuss, but we might want to. Well, we definitely need some solution for the former.
A
There — I had a PR where I was starting to kind of talk about some of that stuff.
B
Well, it's often hard to say why RocksDB is unable to perform its regular routines, like automatic compaction. What I definitely know is that RocksDB might get into a state where it performs badly, and highly likely this is related to previous bulk removals and the manual compaction fixes it does — yeah, I saw that with backfilling.
A
There are two separate issues, right: one is fragmentation at the SST level, and the other is the tombstones in the memtables themselves.
A
The PR I linked in the chat window adds the ability to set a sliding window so that, when you're doing iteration, too many tombstones will trigger compaction.
A
So that helps in the case of the SST file tombstone behavior — if you have too many tombstones and you're iterating, it will force compaction — so that PR will help there. But it doesn't help with the memtable problem, if you have too many tombstones in the memtable.
B
Okay. Do we have any ideas how to at least indicate that a cluster is exposed to this issue — maybe something like how many tombstones we have, or whatever? Right now we even lack any diagnostic tools, so yeah — we can see that a cluster is in a bad shape, but in fact we are unable to…
B
…to even say if something is wrong with the cluster. I mean, we don't have any metrics.
A
Yeah, and I don't think RocksDB even gives you that for memtables — the memtable side and the SST side are really separate. I wanted to see if I could re-implement this capability that they have for the sliding window in RocksDB to do the same thing for memtable tombstones, and it's not the same — like, it would be…
A
You'd have to do it differently, I think. Maybe we can still do it there, or maybe there's some way we could expose more information from RocksDB — like whether or not you've created a tombstone in the memtable, so we can track it or something — but RocksDB doesn't seem to give you a lot for this kind of problem.
A
And if we've issued too many deletes before, say, a compaction has taken place — because I think we can ask RocksDB to do a callback when compaction happens — we could watch for compactions, and if we've done too many deletes before a compaction or a memtable flush (that's the other one, memtable flush), then we can say: oh, we might be in a bad state.
A
Do you have any tests that hit the problem right now? I'd love to know if this PR helps.
A
I could take the changes for the settings out, but some of those settings actually do seem to help a little bit with certain behaviors, so I'm tempted to leave them in. But I don't know — it's okay either way, really. We could split this into two separate PRs and try to test each individually.
A
So yeah, anyway — I'd be happy to get back into this again if we want to try to make it better.
B
But yeah, definitely I'd like to be able to reproduce this issue and be able to diagnose it.
A
I do suspect that this PR is good. I don't think it's bad — we're exposing functionality in RocksDB, and we don't need to change the defaults; we can remove that if we want to. I think it's helpful, but we don't need to do that — we could just expose these other options from RocksDB. I think it's a win; it's just that no one's approved it, because no one is able to test it right now.
A
If not, then thanks for coming, everyone, and have a great week.