From YouTube: CDS Reef: Performance Meeting
Description
The Ceph Developer Summit for Reef is a series of planning meetings around the next release and some community planning.
Schedule: https://ceph.io/en/news/blog/2022/ceph-developer-summit-reef/
A
All right, well, that's good. I didn't miss anything, but I think everyone's still trying to wrap things up from Quincy and now the Pacific 16.2.8 release, so that's not totally surprising.
A
There was one closed PR this week. This is Adam's great PR, "I've got the world on a string, sitting on a rainbow." Let's see, is Adam here? I don't think so right now. The gist of this one was that a lot of it got moved out into other PRs and other work, and the stuff that remained wasn't really necessary anymore or is being added to something else.
A
So that got closed by Adam. We did have a couple that were updated this week. Corey's excellent PR has had a lot of discussion; both Casey and Igor have reviewed it, and it looks pretty good. The only comment Igor made was that maybe we should have a configuration parameter to switch it back to the old behavior, in case we cause some kind of odd corner case, which is probably not a bad idea.
A
You know, it's a little crazy to be adding options to revert performance improvements or bug fixes, really, in this case, I think, but this one might be warranted. Igor, otherwise, did you feel pretty good about it? Or you, Casey, either of you? I've only briefly looked over it and know the general idea of what it does. Casey, I know you had a couple of things you wanted to change earlier.
A
Yeah, I can understand your concern there, Igor; sometimes RocksDB can be really temperamental. I don't think this should cause issues, just given what I understand about how it works, but I do get the concern. And Casey, are you satisfied now with the PR?
C
Yeah, code-wise I like it, and it does what it says it does. But I just don't understand BlueStore well enough to know how it fits in and what the repercussions could be.
A
Yeah, I mean... oh good, Corey, you're here. You know what, let's talk about it more once we get through the PRs, because then we can have a bigger discussion. So, let's see, other updated PRs: my time-based algorithm for the AVL allocator. The big win was one we already did.
A
That was to fix the repeated searches from the same cursor position: every time you try to do an allocation of the same size, you start at the same offset, you fail over and over and over again, until you actually do a different search for a different size. You just waste time in the fit search.
A
So that was the big win. This is still a win, I think, and it makes it easier to understand what's going on, but we can be a little more lazy about it now; we don't need it immediately for Quincy. So Igor and I have been having a discussion in there, mostly about getting rid of the debug code, but it's marked do-not-merge right now, so we should be okay for the moment.
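The repeated-search problem described above can be sketched with a toy free-list allocator. This is illustrative Python only, not Ceph's actual AVL allocator; the class and field names are invented. The idea is simply that once a search for a given size has failed, the identical retry is skipped until the free space actually changes.

```python
# Toy illustration (not Ceph's AVL allocator): remember sizes whose search
# already failed, so repeated same-size requests skip the futile rescan.
class CursorAllocator:
    def __init__(self, free_extents):
        # free_extents: list of (offset, length) pairs, kept sorted by offset
        self.free = sorted(free_extents)
        self.failed_sizes = set()   # sizes known to have no fit right now
        self.scans = 0              # extents examined, just for demonstration

    def allocate(self, size):
        if size in self.failed_sizes:
            return None             # skip the search we know will fail
        for i, (off, length) in enumerate(self.free):
            self.scans += 1
            if length >= size:
                # carve the allocation out of the front of the extent
                if length == size:
                    del self.free[i]
                else:
                    self.free[i] = (off + size, length - size)
                return off
        self.failed_sizes.add(size)
        return None

    def release(self, off, size):
        self.free.append((off, size))
        self.free.sort()
        self.failed_sizes.clear()   # free space changed; retrying is worthwhile

alloc = CursorAllocator([(0, 4), (8, 4)])
assert alloc.allocate(16) is None      # a full scan fails once...
scans_after_first = alloc.scans
assert alloc.allocate(16) is None      # ...and the repeat is skipped entirely
assert alloc.scans == scans_after_first
assert alloc.allocate(4) == 0          # a different size still searches
```

The real fix tracks search cursors rather than a set of sizes, but the effect is the same: the allocator stops re-walking the same region for a request shape it has already proven cannot be satisfied.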
A
Let's see. In terms of allocators, I think we still have a lot of work to go back and look at behavior, but not for today. Okay, so let's see, next is "maintain free list type in NCB mode." I asked Gabby about this a little while ago, and he said that this is more or less now just a refactoring PR.
A
It shouldn't have any performance impact at all, so it probably could be removed from our list here, but I'll just leave it until it merges. Okay, and then finally, the doc PR for rewriting the hardware docs; there's been a little bit more discussion on that.
A
A couple more suggestions about wording, and I think the latest was looking at clock speed versus number of cores in terms of the monitor, the manager, or something. So anyway, people are looking at that and talking about getting that settled. All right, so that was it; that was all I had. I didn't quite make it through the whole file; I've been trying to do too many things at once, but usually the stuff at the end doesn't see a whole lot of updates.
A
All right, well then, Corey, there has been a lot of work on your PR fixing our slow iteration issue. Would you be able to talk a little bit about the latest and what you guys have been talking about and doing?
D
All right, so yeah, I guess let me just start with where we ran into this issue, so that everybody has the context, because I think that's important, especially in terms of deciding how we push it into Pacific, behind a feature flag or not, and so on.
D
So basically, we had a customer on a pretty new cluster. We didn't have too much going on there yet, but we had one customer using Veeam that started hitting our cluster really hard, doing like 50 to 100 megabytes per second of writes for a few weeks. And then we started noticing that it was pausing like every six minutes and then restarting again, and we ended up finding out why.
D
It was because the reshard operation, the dynamic resharding, was trying to reshard every six minutes, and it kept failing due to one of the bugs that has existed in Pacific for versioned buckets.
D
And so we saw this in our monitoring, and we looked into it, and we ended up finding that the issue was related to a bug that had been introduced earlier, about non-ASCII keys in omap causing issues, and there was a fix for that. So we basically patched our production cluster with that fix, and right afterwards the bucket resharded successfully from 11 to 397 shards, and then performance throughput completely tanked.
D
So the fix that we ended up finding, and that is in that PR, is basically that the omap iterator at the RocksDB level was trying to search for a key in all three column family shards.
D
Even though, based on the hashing paradigm, it would only ever exist in one of the column families. And when it tries to search for it in one of the column families the key doesn't actually exist in, it may come up against one of these delete-range tombstones, and the way that works in RocksDB is that it essentially has to iterate over every single key in that range, which in our case was like millions of keys, and do the decoding and all that, which took a significant amount of time.
D
So the PR basically avoids that: first, by checking whether the range passed down from BlueStore can only be on a single column family shard, due to the hashing paradigm; and second, it uses the upper-bound setting in RocksDB to prevent iterating over anything above the omap of that particular object.
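Those two fixes can be sketched in miniature. This is a toy Python model, not Ceph or RocksDB code; the key layout, helper names, and `NUM_SHARDS` value are invented for illustration. The essential points match what Corey describes: the hash placement means one object's omap keys live in exactly one column family shard, and an upper bound lets iteration stop at the end of that object's key range instead of wading into a neighboring range.

```python
import hashlib

# Toy model of the two fixes: omap keys are hash-sharded across column
# families, so a lookup for one object's omap only ever needs one shard,
# and an upper bound ends iteration at that object's key range.
NUM_SHARDS = 3

def shard_of(object_key: str) -> int:
    # Stand-in for the hashing paradigm that places an object's omap keys.
    return int(hashlib.sha1(object_key.encode()).hexdigest(), 16) % NUM_SHARDS

def list_omap(shards, object_key):
    """shards: list of dicts {full_key: value}; full keys look like 'obj.k'."""
    prefix = object_key + "."
    upper = object_key + "/"          # first key past the '.'-prefixed range
    out = {}
    s = shards[shard_of(object_key)]  # fix 1: consult only the owning shard
    for k in sorted(s):               # sorted() stands in for ordered iteration
        if k >= upper:                # fix 2: the upper bound ends the scan
            break
        if k.startswith(prefix):
            out[k[len(prefix):]] = s[k]
    return out

shards = [{} for _ in range(NUM_SHARDS)]
for obj, key, val in [("a", "k1", 1), ("a", "k2", 2), ("b", "k1", 3)]:
    shards[shard_of(obj)][f"{obj}.{key}"] = val

assert list_omap(shards, "a") == {"k1": 1, "k2": 2}
assert list_omap(shards, "b") == {"k1": 3}
```

In the real PR the bound is expressed through RocksDB's read options on the iterator, which is what keeps the iterator from ever touching the millions of tombstoned keys past the object's own range.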
D
I think there's a very real concern that somebody else is going to run into the same problem that we did and have a really bad time, if they happen to have a client that's doing a lot of bucket listings and a bucket that is up against one of these ranged tombstones, and they don't know about the manual compactions or about this flag specifically and enabling it.
A
Corey, I think if we were to put in a configuration option, this would be the default behavior, and it would only be a short circuit for people to be able to turn it off if it broke something. Would that be reasonable, do you think, or do you think we just have it blanket on?
D
I certainly understand the sentiment there, of putting it behind a configuration flag when we're trying to put this in rather quickly at the end of this release cycle and trying to get it out. That makes sense to me. I don't know what the extent of the testing is that gets done prior to release anyway, and whether it may or may not catch it. That's how I feel, from my perspective, in terms of the risks of things that could go wrong.
A
Yeah, I mean, on one hand it feels ridiculous to put a configuration option in for what to me feels like a bug fix, right? This is a major issue that you've figured out how to solve. But there is this nagging voice in the back of my head, because we've seen RocksDB do things we really didn't expect, and I don't know if your PR could result in that, or break something else.
A
I
don't
really
think
so,
but
that's
and
just
on
the
surface-
I
don't
think
so,
but
I
guess
that
would
be
the
the
question
or
the
concern.
B
Yeah, I completely agree with that. It should be enabled by default, but I have hit the case where I wanted to disable some stuff in the field multiple times, so yeah, I definitely want a switch for such a new feature.
D
Yeah, I'm just thinking through that right now. From the upper-bound and lower-bound standpoint, that's no problem for sure, because I can just not set those options on the RocksDB read options; a simple condition around that seems very trivial.
I'm trying to think now; the idea of isolating it to a single column family should also be easy to base on a configuration flag too, because I can just gate that condition as well. So it should be a minor lift.
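The gating Corey describes can be sketched as a small planning function. This is illustrative Python, not the actual PR's code, and the flag, function, and dictionary names are all hypothetical: with the flag off, the scan falls back to the old behavior of consulting every shard with no iterator bounds.

```python
# Sketch of config-flag gating (all names hypothetical, not a real Ceph
# option): when disabled, fall back to the old unbounded, all-shard scan.
def plan_omap_scan(enabled: bool, owning_shard: int, num_shards: int,
                   lower: str, upper: str):
    if not enabled:
        # old behavior: no bounds, consult every column family shard
        return {"shards": list(range(num_shards)), "bounds": None}
    # new behavior: single owning shard, bounded iterator
    return {"shards": [owning_shard], "bounds": (lower, upper)}

assert plan_omap_scan(False, 1, 3, "a.", "a/") == \
    {"shards": [0, 1, 2], "bounds": None}
assert plan_omap_scan(True, 1, 3, "a.", "a/") == \
    {"shards": [1], "bounds": ("a.", "a/")}
```

Structuring it this way is what makes the revert switch cheap: both code paths stay intact, and the flag only chooses which plan the iterator is built from.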
A
Yeah, and it might be something like a dev option: if this is working well in, you know, six months, maybe we don't even keep it around. Maybe it's something that we keep in the back of our heads; we don't really need it in the long run.
E
I just would like to say that I love that PR, and I'm sorry that it's not been there from the beginning; that kind of improvement really should have been, Corey. Thank you for making that PR.
D
Okay, well, I'll go ahead. I can certainly add configuration flags; I don't think that's a big lift at all, just after briefly thinking through it. So why not? I guess, do I just add them like any standard configuration, or do we do anything special for this kind of option?
D
I'll do that here in the next few hours then and update the PR. Awesome.
A
Yeah, do you want to run this through master QA first, before we do the backport? What's your timeline looking like for 16.2.8?
F
Yeah, I think this issue is urgent enough for us to fix, or, you know, block 16.2.8. I would like to get the test run started for master as soon as possible. Given that the new behavior is on by default, the config option change should be minor, and it doesn't hurt to run it through a first round of testing.
A
Yeah, I just put my approval on it, just saying let's get the option to disable the behavior, default on, and test it and merge it. So yeah, it looks good. Thank you, Corey; that was excellent, excellent work, and I can't highlight that enough. I suspect that this is going to reduce the workload on our consultant folks quite a bit, so this is a big win. Thank you.
D
Yeah, by the way, thanks to Casey also for giving me some pointers on the C++ stuff. It's been a while since I've used C++ a lot, so I'm just kind of getting back into it, and I appreciate the patience with some of the details there.
A
All right, well, this has been a success, so fantastic. I don't have anything else for this week. Last week we were talking about pg_log and some other things; Gabby's not here, so maybe we'll just wait on continuing that discussion.
A
I think the only other thing I had is that I'm looking at Crimson again and running into some issues. I've got a pull request that I'll finish later today: all the Seastar command-line options were no longer being parsed correctly, due to a change in Seastar itself, which we were passing stuff to through kind of a janky thing we were abusing.
A
So I'm fixing that, and I've got a way that I think will work fairly well. But beyond that, it seemed like I was seeing a memory leak in Seastar, or sorry, in Crimson. Mostly I saw memory growth during writes, so I think it's on the write path. Radek made a good observation this morning that it potentially was the way the alien store works: we may be leaking memory when we issue new threads through AlienStore's thread pool. Radek, do you want to repeat what you were saying earlier?
G
It's not even a hypothesis, it's just a long shot, based on my experiments from when we were trying to run AlienStore and BlueStore with the Seastar allocator. At the moment the situation has changed significantly, because Seastar got a patch that basically bypasses the Seastar allocator for alien threads, like the bunch of threads we have in AlienStore.
G
This should work, but there is some gap between "should" and "does." From my personal experience, the Seastar allocator is really speedy, it's really fast, but there are a lot of limitations.
G
The most important one is that there is a limit on the number of shards, and a shard is created when the allocator sees a new thread for the first time.
G
So we could be creating new shards if the bypass somehow stopped working, and actually in BlueStore, in RocksDB, I have sometimes seen the creation of short-lived threads.
G
So it turned out that running BlueStore directly with the very limited Seastar allocator is not our way to go; it's not a good idea. Since then, of course, as I said before, Seastar got a patch that basically bypasses the Seastar allocator and goes directly to the system one. In short it should work, but basically this is just a hint on what might be worth checking first.
A
Yeah, and related to this topic, I want to go back and see if we can again try to use tcmalloc for all the system allocation.
G
That would be super cool, because at the moment Crimson is not linked with tcmalloc, so it uses the libc-provided allocator, which is called ptmalloc2 or something like that, and it's always lock-ish.
G
There is no notion of lock-free, perfect allocation; every single time you need to go through some locking, and we have multiple POSIX threads in the AlienStore pool. So it might really be worth it; first of all, of course, profiling.
G
But if I recall correctly, we were doing that, and I will try to find it soon. The difference was pretty big between using the Seastar allocator and the system allocator, because initially, before Seastar got the bypass for the system allocator, we were disabling the Seastar allocator entirely and, just for the sake of operating BlueStore, switching back to the libc-provided one, and the difference...
A
Sorry. No worries, no worries, I believe you. And also, with the libc allocator we see significantly more memory fragmentation and higher memory usage overall. So it's probably a big win, if we're using AlienStore, to use tcmalloc.
A
Yeah, we'll also be able to then use all the priority cache work for BlueStore as well, which would be nice.
A
So in the last communications I had with them, no one knew what firmware we should use. They asked me which one I wanted to use, and I told them I had no idea which ones I should use, because I don't know what the differences are. So they were trying to figure out which firmware they thought I should install, and that's kind of where we left it.
A
Hopefully I can talk to someone a little bit more and figure it out. I think he was of the opinion that we should get the latest, just whatever the newest was, and put that on there, but I think they were trying to wait to talk to some of the engineers in South Korea to verify that that was the right way to go.
A
The good news, though, is that the cursor change basically got us back to where we were before.
A
The one thing that's kind of interesting is that their drives, which are basically the same things we have, only with newer firmware and a slightly different driver revision, are still like 10 to 15 percent faster than ours. So I don't know if that's due to the firmware or if there was anything else that changed, but they just consistently get slightly higher results than we see, even with our fixes, and they weren't suffering the same issue that we were. So definitely it seems like something is different.
A
Oh, I see your guess here, Radek.
G
Yeah; first of all, it's not about BlueStore, actually, it's about Seastar. But well, the difference is pretty significant.
A
So yeah, I think it'd be really interesting to see how tcmalloc compares. I assume the Seastar memory allocator is what we really want anywhere we can use it.
G
I'm afraid we cannot; it's simply too limited. It's not a POSIX-compliant memory allocator. The most important limitation is that it can support only a limited, constant, compile-time-defined number of threads, and it doesn't even recycle the resources, the shards. So it boils down to the number of all the threads it has ever seen.
G
So if you have, let's say, an HTTP server that would like to use POSIX threads with this allocator, it will be able to spawn, if I recall correctly, 256 threads, and that's all.
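The limitation Radek describes can be modeled in a few lines. This is a toy Python simulation, not Seastar's actual internals, and the cap of 4 is illustrative (he recalls the real limit being around 256): one shard is created per thread ever seen, the cap is fixed, and shards are never recycled, so short-lived threads permanently consume them.

```python
# Toy model of a per-thread-sharded allocator with a fixed, never-recycled
# shard table (numbers illustrative; not Seastar's real implementation).
MAX_SHARDS = 4   # stands in for the compile-time limit (e.g. 256)

class ShardedAllocator:
    def __init__(self):
        self.shard_by_thread = {}

    def shard_for(self, thread_id):
        # A shard is assigned the first time a thread is seen.
        if thread_id not in self.shard_by_thread:
            if len(self.shard_by_thread) >= MAX_SHARDS:
                raise RuntimeError("out of allocator shards")
            self.shard_by_thread[thread_id] = len(self.shard_by_thread)
        return self.shard_by_thread[thread_id]

    def thread_exited(self, thread_id):
        pass  # shards are never recycled: short-lived threads leak them

a = ShardedAllocator()
for t in range(MAX_SHARDS):   # short-lived threads, e.g. spawned by RocksDB
    a.shard_for(t)
    a.thread_exited(t)
exceeded = False
try:
    a.shard_for(MAX_SHARDS)   # the next new thread pushes past the cap
except RuntimeError:
    exceeded = True
assert exceeded
```

This is why the bypass patch matters: if alien threads ever stop being routed around this allocator, every short-lived thread eats one of a fixed pool of shards until allocation fails outright.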
A
Braddock,
I
thought
that
we
could
use
it
for
the
any
memory
that
we're
allocating
on
the
the
c-stars
side,
though
not
on
the
alien
side.
Right,
isn't
that
the
whole
idea
is
that
we
can
use
the.
G
At
the
moment,
after
the
changes
in
systar,
we
can-
and
we
do
use
of
the
sister
allocator
in
the
sister
part
of
of
of
crimson
yeah
before,
if
we
had
to,
we
were
enforced
to
research,
the
ellipse
one
only
because
only
for
the
sake
of
hosting
bluestar.
Even
if
actually
somebody
picked
up
seanster
via
the
configuration
the
runtime
configuration
machinery
we
we
had,
we
were
using
since
the
the
ellipse
allocator
for
entire
process
for
all
parts,
also
the
sister
written
ones
after
the
change.
A
So right now, if we compile without the Seastar default allocator, then we'll use the Seastar allocator for everything, right?
G
If we compile without the Seastar allocator, it will use libc for everything. And we could switch the libc allocator, overriding it with, let's say, tcmalloc.
G
My impression is that the performance of tcmalloc will be somewhere between the Seastar allocator and ptmalloc2, the default libc one. So there's still a benefit in not disabling the Seastar allocator: running all the Seastar work with the Seastar allocator, while letting the alien threads in AlienStore use tcmalloc, just for the sake of BlueStore.
G
I prefer not to rely on my memory about the Seastar default allocator; let's grab the Seastar repo. Okay, core/memory.cc. Yes, there is that default allocator.
A
Yeah, mostly I was just trying, Radek, to figure out whether the behavior has now changed with that flag. At one point was it using the default allocator for everything, and now it uses it only for alien threads? That's what I was trying to work out. Okay.
A
Yeah, once I get done wrapping up this other PR, I'll try to start looking at the memory usage again and see if I can dig into exactly what's going on here, and what we can do to switch over to tcmalloc, because that would be, I think, a low-hanging-fruit big win.
H
You cut out in the middle. Could you repeat the last part?
A
Sure. You're thinking that it's fine to use the Seastar allocator for CyanStore and for SeaStore, and...
A
All right, well, that's all I've got, so thanks everyone, thank you for coming. And I think we have a RADOS meeting coming up in 15 minutes, right? Yeah.
A
Yes, yes, exactly. So why don't we wrap this up? Everyone can get a 20-minute break, and then I'll see a number of you guys in about 20 minutes.