From YouTube: Ceph Performance Meeting 2021-07-15
A: One of our guys on our side has been playing with it, and he wanted me to help him a little bit with the performance debugging since it was going kind of slow. So we ran gdbpmp on it, and it looks like there's a worker thread pool associated with it that looked really busy to me, given the throughput rate that he was seeing.

A: I don't know very much about it, so I wasn't sure exactly what the expectation was.
A: Okay, well, I had to send it off to the developers, and to Ilya on our side. So hopefully smarter brains will take a look.
A: He ran both fio tests with librbd, the rbd engine, and then also ran rbd bench and some Python. He wrote a little Python program that just used rbd directly for, you know, doing stuff, and all of them kind of seemed to show similar issues: it was actually a little slower than just running rbd natively.
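For reference, an fio job using the librbd-backed rbd ioengine, along the lines of the testing described, might look something like this (the pool and image names are placeholders):

```ini
; Sketch of an fio job for the rbd ioengine; pool/image names are hypothetical.
[global]
ioengine=rbd
clientname=admin
pool=rbd
rbdname=testimage
direct=1
time_based=1
runtime=60

[rand-write-4k]
rw=randwrite
bs=4k
iodepth=32
```

A roughly comparable rbd bench invocation would be `rbd bench --io-type write --io-size 4K --io-pattern rand rbd/testimage`.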
A: So maybe it's a configuration issue, or something else strange was going on, but that seemed to be the prevailing theme, at least on the test setup that he had seen.
B: I've been focusing on both rbd workloads and a little bit of CephFS, actually kind of getting my hands dirty with IO500 testing and working with the former MUSIC guy, Dennis Nujab. We've been kind of looking at ways to tune CephFS.
B: You know, in the entire system, kind of improve the performance, as well as kind of figuring out where Optane fits in all of this, either the SSD form factor or the persistent memory version. So it's kind of been a lot of experimenting and a little bit of hacking, but yeah, that's kind of what I've been working on lately.
A: Yeah, if you see anything interesting, let me know. I was the one that submitted our IO500 results for Red Hat.
B: Actually, I was wondering about that: was that with replication, or...?
B: Yeah, yeah, that's what we figured, and we were thinking about that too. It's interesting.
A: But yeah, there's lots of... I had to do a lot of crazy stuff to get those results. Like, honestly, our numbers were real, but that was probably taking the best of like ten runs, and there's a lot of variability with the way this FS works. You can end up in a situation where, early on, if you get good distribution through the dynamic subtree partitioning, you avoid having it, like...
A: So if you can't acquire locks, with the way that it works early on, you might end up in a situation where it can, like, never acquire locks as it's trying to distribute subtrees across all the different MDSes, and if that happens then your score can just tank. But if you happen to luckily grab them early on, then you can actually get, like, a score.
A: So honestly, if you can fix that... I tried, and I sort of made it better, and some of the work that was being done by u-kernel looks like it could really be relevant here. If you can fix that, you'll get, like, way better scores than we have already, even.
A: All right, I don't think that call actually ended; they must be chatting about stuff over there, so maybe we'll just get started. I imagine they'll get in here sooner or later. All right. I was surprised: we have, like, a ton of PRs that came in from a guy at IBM in the research group, and this is like out of the blue, but it's like catnip for me, because he's looking at memory allocation optimization and, like, avoiding memory copies and all kinds of other random stuff all over the place.
A: Most of it is focused actually in librbd, but some of it is relevant for other stuff, so I won't get into each one of these individually, but go take a look. I'm, like, impressed and very interested in what he's doing. Ori, I think his name is; I hope I got it right. Oh, and then I've got my stupid little PR that increases the osd client message cap, which is, you know, like a one-liner that now needs to be rebased.
A: But that's, you know, based on the stuff that we talked about last week. So that's... yeah, fine, that must be closed. Kefu's AVL allocator merged; I think it got some reviews, good. So that's neat.
A: Oh, sorry, that was not the AVL allocator; I think that was just changing some options here. Sorry, he had something else, a btree thing maybe, that already merged; that's what I was thinking of. Okay, updates to the mgr TTL cache: there were results that were posted a little while ago, then there wasn't a whole lot going on with it, and then it just got some updates recently. I don't know what those are, but that seems to be just kind of moving along, and hopefully it'll make the manager faster.
A: So that's great. Igor's caching removal optimization PR: it looks like that may have passed Kefu's testing; the failures he saw, I think, were unrelated. So that's really good, and it got approved by someone I don't know, but that's good. So hopefully that merges soon. Yeah, that's excellent!
A: No movement on Adam's BlueFS fine-grained locking; I don't think he's worked on that much. Adam, is that still "do not merge"?
A: My sharded object cache in RGW still hasn't gotten a review. I need to go bug Mark and the other RGW guys over there and see if I can get somebody to look at it. It's not great, but I mean, I think it's probably better than what we've got. I know that Matt really wanted to completely rewrite the cache, but this, you know... it's just a stupid wrapper around the existing one, so it's pretty simple.
C: Let's see, a lot of this other stuff is old. My age binning PR still needs to be updated.
A: And merged. I need to work on that, but it's still there. It was updated a couple months ago or something, when Adam and I were working on his stuff, trying to see if we could get it running on top of his stuff.
A: Okay, yeah, I mean, there's other stuff here that we need to look at again. Like, Adam and... Igor's not here, but we still need to make a decision on what to do about the pinning in the onode cache.
A: But we'll wait for you to be around, I guess. So, yes, I mean, that's really about it, I think. Okay, anything I missed, guys? It's possible I missed some closed stuff, because I don't know that I actually got time to look over it.
G: I have a quick general question: there's that big BlueStore change that basically drops the continuous tracking of allocations and rebuilds it on restart. What's the...
A: Okay, hey John. It looks like we kind of oscillate, depending on what we've been doing and what we've been optimizing, on whether or not the kv sync thread is a bottleneck for writes. So, like, we keep oscillating back to it being a bottleneck again, and this looks to me like it's really the real advantage here, right? We're just doing much less work there.
G: Yeah, yeah. I was just thinking whether it makes sense to, like, point people at that, but I'm guessing it won't matter that much, because the people who wouldn't benefit from it would be those who are writing large objects, and they won't be penalized by it either, because the restart rebuild will be quick.
H: I think we capped it so that, like, the max possible system would take less than one hour to recover, like a four-terabyte OSD.
H: It's correlated to the number of extents.
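The idea being discussed, rebuilding the allocator's free-space state at mount time by walking every object's allocated extents instead of persisting a free list continuously, can be sketched roughly like this (a simplified illustration, not the actual BlueStore code; note the rebuild cost scales with the number of extents, as mentioned above):

```python
# Simplified sketch (not BlueStore's implementation): reconstruct the free
# (offset, length) runs of a device from the set of allocated extents,
# the way a mount-time rebuild would, instead of tracking frees continuously.
def rebuild_free_map(device_size, extents):
    """Return sorted free (offset, length) runs given allocated extents."""
    free = []
    pos = 0
    for off, length in sorted(extents):   # O(n log n) in the extent count
        if off > pos:
            free.append((pos, off - pos))  # gap before this extent is free
        pos = max(pos, off + length)
    if pos < device_size:
        free.append((pos, device_size - pos))  # tail of the device is free
    return free
```

Large-object workloads produce few, large extents, which is why (as noted above) they are barely penalized: the rebuild walk is short.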
G: Yeah, okay. Yeah, it'll be interesting to get some real-world numbers on this, I guess.
D: ...when I get this first part merged, and then I also make the fast shutdown behavior work.
A: All right, cool. Any other PRs, guys?
A: Okay then, let's move on. Let's see, so this week the only thing I've got is that I've been spending some time looking at crimson again, just because we're trying to keep track every quarter of how we're doing with performance, and just generally getting it going, compared to last quarter.
A: Had some problems with crimson segfaulting, seemingly due to the most recent Seastar update, but got it working eventually by going back and then cherry-picking a couple of fixes on top of it. We're a little slower than we were before; that's not super unexpected, but a little irritating if we're trying to report how we're doing on it. I went back and did a bunch of wall clock profiling again; we're still completely bound by the reactor thread. Oh, and this is alienstore...
A: BlueStore, not seastore, so yeah, keep that in mind: some of this is alienstore related. What I saw is that we're spending a fair amount of time in the sharded work queue. This is both before and after the lockless queue implementation by Kefu: before, we were spending a lot of time in notify_one; after, we're spending a lot of time dealing with the semaphore being used by the lockless implementation.
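The wakeup pattern being profiled can be illustrated with a toy worker pool (this is just an illustration, not Ceph's ShardedThreadPool): producers hand work to blocked workers, and the wakeup mechanism, a condition variable's notify in one design, semaphore posts in the other, is exactly where wall clock time accumulates when the queue itself is cheap:

```python
# Toy work queue (illustration only, not Ceph's implementation).
# queue.Queue wakes blocked consumers via a condition variable on put();
# a lockless design would replace that with semaphore post/wait, but the
# wakeup cost still shows up in wall clock profiles either way.
import queue
import threading

def run_workers(tasks, num_workers=4):
    """Drain `tasks` with a pool of workers; return sorted results."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()      # blocks in the condition variable wait
            if item is None:    # sentinel: shut this worker down
                return
            with lock:
                results.append(item * 2)  # stand-in for real work

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in tasks:
        q.put(item)             # each put may notify a waiting worker
    for _ in threads:
        q.put(None)
    for t in threads:
        t.join()
    return sorted(results)
```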
A: There seems to be very little performance advantage; well, maybe a little bit, but not much. So it's about the same as it was before.
A: There's a fair amount of time being spent in malloc and free in the reactor; a fairly big contributor to that is just dealing with buffers and bufferlists, but there's other stuff too, sometimes within Seastar itself, especially in the networking code. Then there's the mgr updates: the reactor is spending a lot of time, like maybe up to 11 or 12 percent, just dealing with updating the manager.
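The copy-avoidance theme (both in the reactor's malloc/free time and in the librbd PRs mentioned earlier) comes down to referencing buffer memory instead of duplicating it. In Python terms, purely as an illustration (Ceph's bufferlists are C++), the difference looks like:

```python
# Illustration only: copying vs. referencing buffer memory.
# Slicing a bytearray into bytes allocates and copies the payload;
# a memoryview slice references the same underlying memory, zero-copy.
data = bytearray(b"x" * 1_000_000)

copy_slice = bytes(data[:4096])       # allocates + copies 4 KiB
view_slice = memoryview(data)[:4096]  # no payload allocation, no copy

# The view reflects later mutation of the underlying buffer; the copy doesn't.
data[0] = ord(b"y")
```

The same distinction in C++ is a deep buffer copy versus a reference-counted `ptr` into shared memory, which is why buffer handling dominates the malloc/free profile.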
A: So that's unfortunate, and maybe something that we need to talk about a little bit more: how that happens and how much time we really want each reactor spending doing that. There's also lots of time spent dealing with eventfd; we were talking about that this morning. A little bit of Seastar uses eventfd for communication.
A: Okay, one thing that we were talking about this morning: he was kind of bringing up the point that BlueStore, maybe, in alienstore, is not necessarily the best thing to be using to measure efficiency gains with crimson, just because of the amount of time spent in BlueStore itself. But as I was thinking about it: in fact, like a hundred percent of the CPU time is being spent in the reactor, and only about 50 to 100 percent is being spent in BlueStore.
A: I think actually memstore's write path may be less efficient than BlueStore's at this point. So that's something that needs to be looked at more, and I think I need to go back and finally...
A: ...update the changes I was making in kstore to just apply to memstore instead, and pull the best of that stuff out and get it in, since kstore is basically dead, and has been for years. So it's not that much, but the biggest benefit will be, like, a vector-based object implementation in memstore that really seemed to be quite helpful. So anyway, that's what I've been looking at largely this week. Any questions or comments on that stuff?
A: I'll put out, like, a Q2 deck, like we did for Q1, hopefully in the next week or two here, so people can see some of the results of this testing and just kind of a general update.
H: And my code added a check against this; fsck, I don't think, is checking for this, and you only saw this complaint, this assert, coming from my code. So I just edited this check to be executed on mount without my code, and I saw the same thing happening. It just took me time to realize that the tests which didn't finish, it was because of this failure.
F: Wow, good job. Wait, where do you catch stuff?
A: What's a little interesting is that you're going to push the classic OSD's performance higher than crimson's with this. Like, the classic OSD is going to be faster yet than crimson with your change, because your change is going to make the classic OSD faster for small random writes, and crimson won't change, because we're bottlenecked by the reactor thread.
A: It'll just make it even a little more tough to make crimson... we have more work to do.
A: There is one, actually, yes, because we're wrapping BlueStore in alienstore, basically. So we still have the kv sync thread, we still have worker threads, but a lot of work that is being done in other threads in the classic OSD is being done in the reactor thread in crimson. So until we get multiple reactors, crimson is not going to change much when we do BlueStore improvements.
A: Which, yeah, without Radek and Kefu here, I guess it's not worth talking about it, but the more I look at this, the more I think we need to get the multi-reactor work going, at least if we want to be able to show anything that looks, you know, remotely like performance parity in the tech preview.