From YouTube: Ceph Crimson/SeaStore 2021-07-21
A
So, let's start. Last week I've been reviewing Ronen's change to extract the scrub-related logic into its own individual queue, so that it can be reused by Crimson, and I also addressed a couple of regressions introduced by the refactoring, which tried to adapt to the new submit API. And I think that's it from me.
B
I was working on the SeaStore patch, version 3 of the SeaStore patch, adding the ioctl and control interfaces to SeaStore. So version 3 has been submitted; maybe no comments on it yet. So yeah, that was the update for me.
B
Last week I reviewed some PRs, and when I was doing the testing I got an assert, but only once; I can't reproduce it. It reported that the parent had been invalidated, or rather not invalidated, but that something dirty happened, but I can't reproduce it; I tried several times. Also, that tree still has not used the interruptible-future feature, and a conflict was reported for the extent, so I will work on converting the other tree to use interruptible futures.
C
Hello. Okay, let's start from the backend. Last week a bunch of fixes have been merged, targeting the problems with handling peering events in the Stray and Reset states.
C
Here is the gist that describes the investigations. Well, I won't be looking at them here; there were a few. Basically, there were multiple problems around OSD activation: in many situations we started crunching peering events in the booting state.
C
Misunderstanding
basically
between
the
peering
and
messenger
components.
It's
basically
it's
in
lock.
I
think
it
could
be
a
misdirection
of
of
the
pg
notify.
Second
message
looks
like
the
peering
state
was
wanted
to
send
the
message
to
one
osd,
but
messenger
somehow
send
it
to
to
another,
just
paste
it
and
just
pasted
a
link
to
the
gist.
C
The first one is about some optimizations for bufferlist::c_str(). There was an inefficiency there: unnecessary rebuilds have been spotted during the encoding of the message's protocol header, basically because of appending an empty buffer pointer at the end. I tweaked bufferlist to account for that. Another thing I'm working on right now is the profiling of Crimson; here is a link to the gist.
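The rebuild cost described above can be illustrated with a toy model. This is not Ceph's actual `bufferlist`; the class, the `rebuilds` counter, and the `skip_empty` flag are all illustrative assumptions sketching why a trailing empty segment can force a needless flatten in a naive `c_str()`:

```cpp
#include <cassert>
#include <list>
#include <string>

// Toy chained buffer list: c_str() must return contiguous memory, so with
// more than one segment it has to flatten ("rebuild") into one copy first.
class toy_bufferlist {
  std::list<std::string> segments;
public:
  int rebuilds = 0;  // how often we had to flatten

  void append(std::string s) { segments.push_back(std::move(s)); }

  // skip_empty models the tweak discussed above: an appended empty ptr
  // should not count as a second segment and trigger a rebuild.
  const char* c_str(bool skip_empty) {
    std::size_t n = 0;
    const std::string* only = nullptr;
    for (const auto& s : segments)
      if (!skip_empty || !s.empty()) { ++n; only = &s; }
    if (n == 0) return "";
    if (n == 1) return only->c_str();   // already contiguous: no copy
    std::string flat;                   // otherwise rebuild by copying
    for (const auto& s : segments) flat += s;
    segments.assign(1, flat);
    ++rebuilds;
    return segments.front().c_str();
  }
};
```

With the naive count, appending an empty segment after "header" forces a copy; with the tweak, no rebuild happens even though the list has two segments.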
C
Well, with CyanStore we look pretty good in 4k random reads: when using two instances of rados bench sending requests to a single OSD with an almost empty CyanStore, we are seeing around 35 thousand cycles per operation. I'm continuing, and this is because of my last big talk with Mark Nelson, who is testing with BlueStore. I was very curious how it would look if we used AlienStore with BlueStore, but a BlueStore with very small, tiny, almost empty content inside. The idea is to not focus on the object data; it's about exposing the internals, the layers of the OSD, the components we rewrote with seastar. And I'm seeing an extremely interesting pattern. In the classic OSD, if I switch from an almost empty MemStore to an almost empty BlueStore...
C
Basically, there is no harm: similar efficiency is preserved. However, that's absolutely not the case for the Crimson OSD.
C
The hit is around two times: instead of thirty-five thousand cycles per op, I started seeing 77 thousand cycles per operation, so this called for profiling.
C
Well, there is a huge impact from the semaphore plus eventfd. I'm not sure it's everything; perhaps, well, I'm speculating. In the profiling I'm seeing the direct costs of the syscalls, but maybe there are also indirect costs, like trashing CPU caches, because the IPC, the instructions-per-cycle metric, has been severely hit as well. In my CyanStore configuration Crimson does around one and a half instructions per cycle, which is pretty good, because classic does something like half of that; so we are utilizing the CPU about two times better. But when using AlienStore, the IPC drops to somewhere around one. In other words, still sniffing; hopefully we will know more today. That's it from me.
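As a back-of-the-envelope check on the numbers above (35k vs. 77k cycles per op, IPC of roughly 1.5 vs. 1.0; taken from the discussion, not fresh measurements), the instructions actually retired per op can be compared:

```cpp
#include <cassert>

// instructions retired per op = (cycles per op) * (instructions per cycle)
constexpr double insns_per_op(double cycles_per_op, double ipc) {
  return cycles_per_op * ipc;
}

// CyanStore path:  ~35k cycles/op at ~1.5 IPC -> ~52.5k instructions/op.
// AlienStore path: ~77k cycles/op at ~1.0 IPC -> ~77k  instructions/op.
// Cycles grow ~2.2x while retired instructions grow only ~1.5x, which is
// consistent with stalls (syscalls, cache trashing) on top of the extra
// work, rather than extra work alone.
```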
C
Me too. My idea is to compare MemStore to MemStore: I want to tweak AlienStore to host not only BlueStore but also MemStore, so I could compare CyanStore, I mean the vanilla tiny object-store implementation, with almost the same store exposed through the alien component. That would hopefully allow us to judge the real overhead that is being imposed by AlienStore.
A
Just a side note regarding the overhead introduced by the semaphore: I did some investigation last night after reading Mark Nelson's comments, and I realized that we could use a spinlock-based implementation.
C
But in common we do have an implementation of a spinlock, and we were optimizing it, but I'm not sure we merged the patches; there were a bunch of them. I will take a look at that, because, you know, if you're doing a busy wait in userland, you want to take care of things like pausing the CPU, etc.
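A minimal sketch of the idea, assuming nothing about the actual implementation in Ceph's common/: a generic test-and-test-and-set spinlock that pauses the CPU while busy-waiting, which is exactly the care mentioned above.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Test-and-test-and-set spinlock with a CPU pause in the busy-wait loop.
class spinlock {
  std::atomic<bool> locked{false};
public:
  void lock() {
    for (;;) {
      // Attempt the atomic exchange only when the lock looks free.
      if (!locked.exchange(true, std::memory_order_acquire))
        return;
      // Busy-wait on a plain load; hint to the core that we are spinning.
      while (locked.load(std::memory_order_relaxed)) {
#if defined(__x86_64__) || defined(__i386__)
        __builtin_ia32_pause();
#else
        std::this_thread::yield();
#endif
      }
    }
  }
  void unlock() { locked.store(false, std::memory_order_release); }
};
```

The pause hint reduces pipeline flushes and power draw while spinning and gives the sibling hyperthread a chance to run; spinning on the relaxed load instead of the exchange keeps the cache line in shared state until the lock is actually released.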
C
I'm looking at the... sorry, I need to restart the SSH. I will post, as a comment to the gist, a paste from the perf report, but from memory I can say...
C
Sure, that's still under profiling, the overall difference between them.
C
Yeah, but there is one problem, you know: Mark is going to prepare some slides for management about the efficiency of Crimson, and he does that based on AlienStore with BlueStore, and at the moment on the slides we are maybe 25 or 30 percent better than classic, and Mark complains that it's not enough. So I'm trying to point out that this kind of...
C
Yes,
because,
even
because
of
slow,
because
the
bitcoin
itself
is
not
fast,
yes
exactly,
I
think
it
I
I
speculate.
I
bet
that
in
real
workloads,
the
blues,
the
objects
are,
the
bluester
will
take
most
of
the
cycles
being
being
burned
by
entire
osd.
C
Okay,
but
this
will
this
requires
a
convincing
mark.
C
Anyway, I'm still curious whether the semaphore can entirely explain the results.
E
Hi. First, regarding the conversation we just had, some reading material: you might want to take a look at a performance book that a guy named Paul McKenney wrote. He has been doing performance and SMP work for many years; he was working for IBM, I think now he's at Intel, but he's the guy who worked for 20 years to get RCU into the Linux kernel.
E
Yeah, I know, we know that, but the numbers he has: for example, take a look at this page later on. He has lectures with numbers, measured times of various operations, like taking a semaphore or spin-locking, with some discussion of them, which is pretty nice.
E
Basically we all know everything here, but it's good to see the numbers and be reminded of some of the considerations, and he's pretty much one of the best experts in the field. Now, apart from that, I didn't do much on code in the last week, as I was actually doing other things, but I'm still working on scrubbing.
E
I have a bug in the backend refactoring, with one of the counters, which I'm trying to solve, and I have introduced a bug into the scheduling somewhere during the latest fixes, and I'm trying to solve that too. Two things, basically crashes. That's it for me.
D
Yep. I am doing some refactoring to the LBA btree to use more of an iterator-based approach, which will hopefully eliminate the find_hole insertion bug that Chunmei may be running into, and will make some later improvements for supporting clone easier. What's up?
D
There's a bug in the LBA manager where, when you insert an extent, the place it tries to insert it in the tree is incorrect after a split, because of a bug in the way find_hole works; it's just not correct, so I'm reworking it to fix that. It's probably the cause of most of Chunmei's crashes.
F
This week I modified the extent placement manager PR and pushed it. I also added the extent scan mechanism for the extent placement manager. It is all in the PR right now and I think it's ready for review. I'm also working on adding support for the new device type to SeaStore right now. That's all for me.
B
Yeah, last week I added metrics at the cache layer, and I'm trying to review the extent placement manager design, which seems to have impacts on many SeaStore components, so I think it might be worth it for Sam to also look at it.
A
I just recalled the offline discussion: you mentioned that you're not quite sure about the target device of the new device work. Is that true, that you're not sure what the exact device is?
D
I still wanted to go through the extent placement manager and the way the transactions should support it; it is very much like the out-of-line extents. That's why. But other than that you're right: it doesn't need to be cleaned; it doesn't have segments as such.
A
So it's more like a generic SSD device, right? Or RAM, pmem?
D
SSDs, maybe; it depends. That will depend on whether it's more efficient to treat those as being segmented or not, because their internal garbage collection may be so slow that it's more efficient to write to them sequentially. But that's just a call we'll make after benchmarking and measurement.
A
Anything else? As well, I will update the pad right after the meeting.
C
I forgot, sorry, I unmuted myself late. Okay, I posted a comment with the pastes from the perf report, and it's on the AlienStore side, sorry, on the alien threads' side, as well as on the reactor side. However, I'm not entirely sure it's all, that it's solely, because of the semaphore.
C
But from the gist, from the comments, it's clearly visible that the overhead of conveying the requests is pretty significant, pretty large in comparison to the real work performed, actually doing BlueStore things. Just compare the BlueStore read and BlueStore getattrs, which are 6.6 and 2.2 respectively, with the fact that the thread-pool loop takes around 25 to 30 percent of all cycles burned by the entire process.
C
It's
not
time
it's
about
cycles,
cycles,
yeah,
so
so
be
aware,
if
you,
if,
especially
if
you
compare
with
with
the
results
from
mark's
gdp
pmp,
which
is
about
profiling
in
work
in
what
time
world
club
yeah
exactly.
C
Somewhere close to the CyanStore results, yes.
D
You know, I'm just asking about the test construction itself, not the results. It looks to me like it's two random reads, or two rados bench instances which are both sending queue size one, right?
C
At
the
exact
command
comment,
I
used
to
collect
the
data
to
read
to
record
this
stuff.
Okay,
updating
the
comment.
C
They were two rados bench instances doing a 30-second random read over a single OSD instance.
C
That's a good question, actually. I think that...
C
The queue size for each? No.
C
I
doubt
I
think
it
sends
up
to
16
requests
in
parallel.
C
Well, we could just limit the...
C
We can verify this hypothesis just by lowering the number and seeing whether it's the queue size.
A
No, we don't do the wait before that. Way before that we were using a single queue, but then the sharded queue was added to improve the parallelism, and after that, I believe...
A
The lock, the lock was too time-consuming. And I don't really want a lock to exist in a seastar thread; I mean a mutex, a POSIX mutex. So I traded it for the semaphore, in hopes that it would be faster by removing the mutex.
C
AlienStore with... we are comparing the following setups.
D
We've never allowed AlienStore to busy-wait, so whether we were using a pthread mutex to wake up or whatever doesn't really matter. I will point out, however, that the sharded work queue concept will make this problem worse, not better. In the classic OSD, you may recall back in the day when the sharded work queue was originally added: for OSDs that were frequently idle, the sharded work queue increased latency, because the individual worker queues had to go to sleep.
C
Okay, the point would be that we won't be able to saturate it, because the reactor part would saturate first.
C
One would be too small. Okay, now I recall; I've introduced that variable.
C
And
I
did
that
because
one
was
not
enough
there's
if
there
is
against
any,
I
need
to
dig,
but
there
is
against
comparing
multiple,
multiple
values.
However,
it
was
before
introduction
of
sharding.
F
Oh, I didn't do random read; what I did was random write. It's about... oh, okay... it's about fifteen thousand IOPS, if I recall correctly.
A
Were
you
using
the
redis
bench
or
some
other
test,
some
other
tool.
D
So
that's
that's
the
other
thing.
Radius
bench
is
kind
of
terrible
at
this,
which
is
why
I
was
asking
about
total
number
of
attributes.
Rbd
or
fio
is
going
to
be
able
to
generate
more
concurrent
reads
than
greatest
natural.
Well,
given
the
same
well,.
C
Raido's
bench
also,
but
we
will
need
to
bump
up
a
proper,
a
proper
parameter.
There
is
it's
comfortable.
C
Exactly. Okay, because my main focus is not even on whatever regression we have or don't have in AlienStore; I'm curious why introducing AlienStore lowers the cycles per op so much, two times. So it's not about eight percent or twenty percent; it's two times slower.
F
Well, I think the default setting is to pin the BlueStore threads to the last five or ten CPU cores, so it'll never...
F
Yes, but I think that's not the seastar thread; that's not the CPU core that the seastar threads are running on. Got it.
D
Yeah, I know, I'm looking. Well, we sort of expect it to, right? Every time you go to sleep, the thread itself needs to do some cleanup: it calls back into the kernel, which puts it to sleep; upon wake-up it has some restoration work to do before it finally gets, you know, back to...
F
So
it
costs
more
cpu
cycles
because
it
is
relatively.
D
Free,
yes,
exactly
and
it's
not
that
it
costs
more
cpu
cycles.
It's
that
a
larger
percentage
of
the
cpu
cycles
that
it
spent
not
sleeping
were
spent
going
to
sleep,
but
that
doesn't
mean
that
most
of
the
time
was
spent.
That.
D
Way,
you
see
what
I
mean,
so
in
other
words,
if
the,
if
the
threat
actually
spends
50
of
its
time,
sleeping
just
literally,
not
scheduled
at
all
and
then
of
the
remaining
time.
Five
percent
is
spent
going
to
sleep
and
coming
back
or
eight
percent
here,
and
the
remaining
forty
percent
is
spent
doing
real
work.
Then,
in
this
performance
graph
we'd
see
five
percent
over
fifty
percent
as
its
cpu
cycle
spent
in
stem
weight,
but
it
would
still
be
fully
almost
completely
underutilized.
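The worked example above can be written down directly. The 50/5/40 split is the hypothetical from the discussion, not a measurement: perf samples only while the thread is on-CPU, so a path's share of reported samples is its share of on-CPU time, not of wall time.

```cpp
#include <cassert>

// Share of perf samples attributed to a path, given its percentage of wall
// time and the thread's on-CPU percentage of wall time. perf never samples
// a sleeping thread, so the denominator is on-CPU time only.
constexpr double perf_share(double path_pct_of_wall, double on_cpu_pct_of_wall) {
  return path_pct_of_wall / on_cpu_pct_of_wall * 100.0;
}

// Thread asleep 50% of wall time, ~5% entering/leaving sleep, rest real
// work: perf attributes the sleep path 5/50 = 10% of samples, even though
// the thread is idle half the time.
```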
D
Like, what perf tells you is percentages: it samples cores at different times, attributes those samples to threads, and then gives you the ratio of samples that happened in each bit of the call stack to the total set of samples. But it doesn't take any samples when the thread is asleep.
C
It's because I filter it at the level of the perf report. However...
C
The whole program; I'm pretty sure I just filter it.
C
The
view
knows
the
data
being
used
to
do
the
math,
in
other
words,
27
percent
spent
in
the
loop
is
about,
is
in
comparison
to
the
all
cycles
recorded,
which
means
all
threads
from
the
entire
process.
I
was
just
take
a
look
on
the
perfect.
D
I think that... yeah, I think we're rate-limited here. Either it's the case that the Crimson reactor, the seastar reactor, is, as you pointed out, saturated and it can't keep BlueStore full, or neither is saturated, because 32 concurrent reads wasn't enough. Either way, I don't think this thread-pool thread is saturated; I think it's sleeping about half the time.
C
To saturate the sharded, the sharded alien threads, still...
A
The bottleneck is on the seastar side.
F
The IOPS of the classic OSD is about 30 percent higher, and at the time the seastar thread is fully using the CPU core that it is running on, so at the time I think it was the seastar thread that was the bottleneck for the whole test.
C
Okay,
so
you
had
saturation,
have
you
checked
the
psych
spell.
F
Oh, I didn't increase the AlienStore thread number for better performance; I just wanted to compare it with the classic OSD.
F
So
I
I
I
just
set
the
the
alien
store,
thresh
number
to
the
same
as
the
blue
store
in
store,
and
then
I
run
the
test
and
once
I
found
out
that
the
sister
thread
was
the
bottleneck,
I
I
didn't
do
any
more
tests.
So
I
don't
know
if
I'm
making
myself
clear.
F
Oh,
no,
I
I
actually
I
I
just
thought
it,
because
sooner
or
later
we
we
will
run
crimson
osd
on
multiple
cores,
so
I
think
it
would
be
better
to
have
the
edit
store
work.
Queue
started.
F
That's
possible,
but
I
think
that
possibility
is
not
supported
by
our
bio-radix
test,
because
the
test
shows
that
it
the
the
same
weight
on
small
cpu
cycles,
because
it
is
relatively
free.
So
so.
C
The test was constructed with CyanStore and MemStore in mind; using it on BlueStore is, well, just incidental.
F
I still think, if...
B
Yeah, by submitting the request from the seastar core to the BlueStore core, right? So that process... Yes, exactly.
F
So
the
performance
degradation
we're
talking
about
is
in
terms
of
ipc
right,
not
exactly.
C
I
think
the
reason
the
main
reason
is
because
not
only
because
of
doing
more
work
like
conveying
between
the
stuff
between
multiple
threads.
It
also
because
of
lowering
the
efficiency
of.
C
Well,
I
haven't
put
the
number
into
the
gist
yet,
but
just
to
recall
just
to
recall
when
using
c
and
stir
crimson
is
able
to
hit
around
one
and
half
instruction
per
second
with
with
bluester
plus
alienster.
C
That is also a possibility, but it could be judged pretty easily: perf stat has an option to profile a specific thread only.
A
Yeah, I think we can also try to revert the sharded queue change, to see if that helps with the performance. Actually, because there are like six threads in this picture, they are trying to swap in and out when there's a drop in the queue and when there's not, so a single thread might help, if the load is not enough to saturate the sharded queue.
D
You are worried about the impact of the client-side library on the total testing environment? Yeah. Not really: if you make sure you put the client in a resource-unconstrained place and give it enough total parallelism, I don't see why it would matter. Also keep in mind rados bench does almost everything librbd does.
C
The extra stuff, like the classes, is only on cold paths.
B
Yep. Oh, I mean testing in one environment.
C
And everything is under the assumption that you have enough cores to dedicate to your rados bench instances. Okay, if you are testing on a laptop, that's a constraint, but on any reasonable server it should be fine.