From YouTube: 2018-May-24 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
A: All right, I guess I can start going through pull requests. Let's see — there's one in flight from Mohammed; I think it's mostly good. That's the one that switches to monotonic clocks. All good, but we want to avoid the coarse ones, I think, for anything that's measuring short latencies. That's under discussion.
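The clock point above can be sketched with `std::chrono::steady_clock`, which the C++ standard requires to be monotonic. This is an illustrative snippet, not the PR's actual code; the comment about coarse clocks refers to the POSIX `CLOCK_MONOTONIC_COARSE`-style clocks, which only tick every few milliseconds.

```cpp
#include <chrono>
#include <cstdint>

// Measure the wall duration of a callable, in nanoseconds, using a
// monotonic clock. steady_clock never jumps backwards (unlike
// system_clock), so it is safe for latency measurement. The coarse
// POSIX clocks (e.g. CLOCK_MONOTONIC_COARSE) are cheaper to read but
// have millisecond-scale resolution -- too imprecise for short latencies.
template <typename Fn>
int64_t time_ns(Fn&& fn) {
  auto start = std::chrono::steady_clock::now();
  fn();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start)
      .count();
}
```

With a monotonic clock the measured interval can never come out negative, even if NTP steps the wall clock mid-measurement.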
We're gonna review a couple that merged. Radek had one that changed the encrypt/decrypt calls to have variants that don't use bufferlists — that use just pointers. That sped up the signature calculations notably.
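As a rough illustration of why the pointer variants are cheaper: a segmented buffer forces per-segment iteration and indirection in the hot loop, while a raw pointer/length pair works on one contiguous run. The names and the FNV hash below are stand-ins, not Ceph's real signature code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Pointer-based variant: no buffer abstraction inside the loop.
uint32_t sig_compute(const uint8_t* p, size_t len) {
  uint32_t h = 2166136261u;  // FNV-1a, a stand-in for the real MAC
  for (size_t i = 0; i < len; ++i) { h ^= p[i]; h *= 16777619u; }
  return h;
}

// Bufferlist-style variant: same result, but walks a list of segments,
// paying for the extra level of iteration on every byte run.
uint32_t sig_compute(const std::vector<std::string>& bl) {
  uint32_t h = 2166136261u;
  for (const auto& seg : bl)
    for (unsigned char c : seg) { h ^= c; h *= 16777619u; }
  return h;
}
```

Both variants produce identical results over the same bytes; the win is purely in how the data is traversed.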
Let's see — Peter had one that made the log subsystem buffer the writes, instead of doing a separate syscall for every single log entry. That helps a lot when you're logging a lot of stuff.
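The buffering idea can be sketched like this — a toy model, not the actual PR. Here `writes` counts simulated syscalls; the point is that many entries collapse into a single flush.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Instead of one write() per log entry, append entries to an in-memory
// buffer and hand the whole thing to the kernel in a single call.
struct BufferedLog {
  std::string buf;                // pending, not yet written
  std::vector<std::string> sink;  // stands in for the log file
  size_t writes = 0;              // simulated syscall count

  void log(const std::string& line) {
    buf += line;
    buf += '\n';
  }

  // One "syscall" covers however many entries accumulated.
  void flush() {
    if (buf.empty()) return;
    sink.push_back(buf);
    ++writes;
    buf.clear();
  }
};
```

An unbuffered logger would have issued one write per entry; the buffered version issues one per flush interval, which is where the savings come from under heavy logging.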
Both of those merged. Mark's priority caching thing, I think, is ready — it's ready to go through QA.
A: Yeah, I think it's tagged; I just haven't run another batch through. It's mimic related, so let's wait till next week. Along the same category is the new bitmap allocator from Igor. It needs to go through QA. It looks pretty promising: it swaps out the old implementation for a new implementation, and the pull request also just switches the default to the new one, since in his testing it was better on every dimension.
A: That's it — let's see, there's one that I don't know anything about. There's Adam's one that deals with the hashing thing. The good news there is that just adding the other, non-string fields into the hash seemed to fix the problem the user was seeing — so things stop spinning forever. That pull request updates the normal default hash method for each object, instead of having to special-case each container.
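The fix being described — mixing the non-string fields into the hash — can be sketched as follows. The object and field names are hypothetical, and the boost-style `hash_combine` is a stand-in; the point is just that hashing only the string field makes objects differing in their numeric fields collide.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical object: a name plus a numeric field (e.g. a snapshot id).
struct Obj {
  std::string name;
  uint64_t snap;
};

// Boost-style combiner: folds one hash value into an accumulated seed.
inline void hash_combine(size_t& seed, size_t v) {
  seed ^= v + 0x9e3779b97f4a7c15ULL + (seed << 6) + (seed >> 2);
}

// Hash the string AND the non-string field. With only the string,
// objects that differ solely in `snap` would land in the same bucket.
size_t hash_obj(const Obj& o) {
  size_t seed = std::hash<std::string>{}(o.name);
  hash_combine(seed, std::hash<uint64_t>{}(o.snap));  // the added field
  return seed;
}
```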
A: Let's see — so that's just crypto stuff. I don't know what this key handler... oh, this is the old one, right? This right here — which one? Yeah, this is the old one: 21179, the crypto AES key handler cleanups — because we already merged the thing that changed OpenSSL; this is the one that ended up not working.
A: Update that or close it. Let's see — there's the periodic discard thing; it still has some problems. There's updating the EC local read thing — that caused failures in QA, although it has been updated since then. Probably — I don't know how big of a deal this is — yeah, there were other scrub issues around the same time, so I think, you know, try rebasing this.
A: That one doesn't turn on tests — removing memory... and the startup work queue? Let's see — one nine nine...
C: It depends, actually, on the variant of division that is being replaced. If you are going to replace division by an 8-bit integer, or even a 32-bit integer, the cost of that division is pretty low — around nine cycles. However, when you are doing division by 64 bits, which isn't unusual in Ceph, you can expect — based on the instruction tables — you can expect a latency of between 50 to even 90 cycles. It can be — it depends, as usual.
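The width point can be illustrated like this. The cycle counts quoted above come from instruction-latency tables and vary by microarchitecture; the code only shows that, when the operands are known to fit in 32 bits, narrowing the types lets the compiler emit the cheaper divide while producing the same quotient.

```cpp
#include <cstdint>

// 64-bit hardware divide: on many x86 parts this costs tens of cycles
// (roughly 50-90 per published instruction tables).
uint64_t div_wide(uint64_t n, uint64_t d) { return n / d; }

// 32-bit divide: much cheaper (single-digit cycle counts on recent
// cores). Safe to substitute whenever both operands fit in 32 bits.
uint32_t div_narrow(uint32_t n, uint32_t d) { return n / d; }
```

For divisions by a constant, compilers already replace the divide with a multiply-and-shift; the expensive case is a variable 64-bit divisor on the hot path.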
H: The data is written, on the write path, to two places: it's written to HDD and then also written to some fast device — I mean NVMe or SSD — and we are then waiting for the flush to be done. The waiting is done only for the commit of the data on the SSD. So actually, for as long as the buffering is done on the SSD, there is no waiting for a flush on the hard disk — only when there is a need to make space for the next batch of write operations.
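A toy model of the write path just described — acknowledge on the SSD commit, and touch the HDD only when the cache must make room. This is illustrative only; the real PR deals with actual devices, flush ordering, and crash consistency.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Each write "commits" to the fast device and is queued for the HDD.
// The caller returns as soon as the SSD copy is committed; HDD work
// happens lazily, and only forces a wait when the cache is full.
struct WriteCache {
  size_t capacity;                 // cache slots on the fast device
  std::deque<std::string> ssd;     // committed on SSD, pending on HDD
  size_t hdd_flushes = 0;          // times we had to drain to make room

  explicit WriteCache(size_t cap) : capacity(cap) {}

  void write(const std::string& data) {
    if (ssd.size() == capacity) {  // must make room: this is the only
      ssd.pop_front();             // point where the HDD is on the
      ++hdd_flushes;               // critical path
    }
    ssd.push_back(data);           // SSD commit done: ack the client
  }
};
```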
H: There is, of course, the wait for all the data from the cache to be moved to the hard disks. I mean, the pull request is — obviously it's not complete, but it's enough to measure the potential performance improvement that can be achieved, and here is the graph for the performance boost. It looks like in the best case it gets like plus 30 percent IOPS.
H: ...needs an extension; that's later. I mean, the only thing is — the performance boost is in the range of plus 30 percent. But what is a mystery for me is why, when there is like 90 or 100 percent of writes... because when there is zero percent of writes, of course, having the cache turned on or off doesn't really affect anything. But I —
H: — don't know why, with 100 percent writes, it makes it work less efficiently, and I intend to test it, because my intuition is that I have overtaxed the SSD drive that I used for the cache — but that needs to be tested. And I guess that's it for the entire solution. It doesn't require a large space for the cache to work efficiently: in my testing it was 60 — only 60 megabytes was used as the cache. Just that. Okay.
A: So I think — I mean, this is kind of bringing back something that we had with FileStore, where all data went to the SSD and then the writes to the hard disk could be more lazily ordered — fewer syncs on the hard disk. I think the trade-off is that it means more burnout or whatever on your SSD — like, it won't last as long, because you're sending all the writes there.
A: I guess one nice thing: if BlueStore is gonna do it, then it could carve out a region of the SSD to do it, without having to have a separate partition or whatever and configure the dm-cache layer underneath. I think there's this larger question of, like, how well is bcache or dm-cache or whatever going to help in the BlueStore case, given that BlueStore is already putting all this metadata on the SSD.
B: I think, generally speaking, the kind of low-hanging fruit would be just to even randomly place data on the SSD wherever space is available, and not try to do anything too complicated with promotions or demotions — just kind of put some portion of it there, and gain whatever benefit we can by that, without trying to be too smart.
A: Yeah — I mean, this would be a great area for somebody to, like, sit down and do some comprehensive testing, because the best results we've had with iCAS and with dm-cache with FileStore were when it managed to just get all of the XFS metadata on the SSD — that and nothing else, without doing any other fancy tiering. That seemed to have the best impact, you know — and BlueStore already...
E: Yeah, I think the other one was about a module called dm-writeboost, which was doing the writeback... but I think it's not a mainline module at this point. Okay.
B: One thing worth noting is that we were already, in BlueStore, kind of pushing — we're not getting significantly more improvement, without some code modifications, by running BlueStore on faster devices. You know, the devices are fast enough already that we're not getting a whole lot of gain by going to faster hardware; I think we probably have other things that we need to worry about first. Right — that's...
B: There was a graph uploaded yesterday to Google Drive — the link was on IRC, though, if anyone has the backtrace. Anyway, the gist of it is that there's no silver bullet here — in terms of, you know, "we've got a big memory leak" or "one thing is really bad." When you kind of take all of the different things that use memory — the PG log, the BlueStore metadata cache, the BlueStore data cache, the KV cache —
B: — those combined were set to use about 1.8 gigabytes of memory, and, correctly, that was what was allocated in the mempools, and the stats showed that that's what was being consumed. But the RSS memory usage of the process actually increased by around 3.2 or 3.3 gigabytes instead of 1.8. That kind of amplification also appears to happen — maybe to a lesser extent — with the PG log, and also in RocksDB with the block cache.
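Mempool-style accounting, in miniature, looks like this (a sketch, not Ceph's real mempool). The gap between the counter and the process RSS is exactly the amplification being discussed: heap fragmentation and allocator overhead are invisible to the counter, which only sees bytes requested.

```cpp
#include <atomic>
#include <cstddef>

// Tagged allocation: every allocate/deallocate adjusts a counter, so
// the code can report "bytes requested". What the allocator and kernel
// actually hold on top of this (padding, size-class rounding,
// fragmented pages) never shows up here.
struct PoolStats {
  std::atomic<size_t> bytes{0};

  void* allocate(size_t n) {
    bytes += n;
    return ::operator new(n);
  }
  void deallocate(void* p, size_t n) {
    bytes -= n;
    ::operator delete(p);
  }
};
```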
B: It's probably — Sage, I think you're probably right that this is heap fragmentation; I am not sure if there could be other things going on as well. I looked at the heap stats from tcmalloc, and it wasn't like there was a whole ton of RSS memory to be freed. I think yesterday, when we looked, it was about close to seven gigs of full OSD memory usage, and maybe 600 megs of it could be freed back to the OS. I mean, there's some, but maybe not a ton.
B: The free lists — the various free lists in tcmalloc — were pretty small; they were not consuming the total memory in the cache there. So overall, it looks like we are consuming most of that memory, and we are seeing an amplification effect for all these things. So probably the next step is to figure out under what circumstances we see what kind of amplification for what kind of data, and then, I guess, try to come up with factors for all of these.
B: Unfortunately, though, it's pretty easy to get to like three gigs of RSS memory without even having the BlueStore caches doing much of anything at all. We can shrink other stuff down and kind of massage things to loosen up some cache, but this is where I think having some kind of mechanism for trying to shift memory around, based on where it's needed, is probably going to be really important — especially if we're trying to restrict caches beyond kind of what we do currently. So that's what I've got right now.
A: I mean, yeah — I don't know if there's a way to, like, have different arenas, or whatever you call them, for different things. So if, like, the BlueStore cache used a totally independent heap from everything else, then they wouldn't interfere. Not really sure.
A: You know — I mean, the thing that worries me is that that seems to change so wildly between an RGW and an RBD workload, and I'm not sure that that's because we're actually caching different data. I'm afraid that it's because we're also using the heap for other, unrelated things, and that has sort of an interfering effect on how things get laid out. Yeah.
H: My observation with memory allocators, from the embedded world, is that if we have some unusual expansion of memory — I mean, RSS versus what we observe to have allocated — it's usually because of realloc. We tend to have some allocation of, let's say, four kilobytes for some reason, and then we trim it by realloc to some smaller part, and this causes fragmentation, because on the next iteration the allocator still behaves as if you had asked for four kilobytes.
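The realloc pattern can be mimicked with a toy size-class model (hypothetical numbers; real allocators use much finer class spacing). A request is served from the next power-of-two class, and a realloc-shrink usually leaves the block in its original class — so the process keeps holding the large chunk while the application only counts the small one.

```cpp
#include <cstddef>

// Round a request up to its power-of-two size class (minimum 16 bytes).
size_t size_class(size_t n) {
  size_t c = 16;
  while (c < n) c <<= 1;
  return c;
}

// Toy heap that tracks both views of memory: what the allocator holds
// (approximating RSS) and what the application thinks it requested.
struct ToyHeap {
  size_t resident = 0;   // bytes held by the allocator
  size_t requested = 0;  // bytes the application accounts for

  void alloc(size_t n) {
    resident += size_class(n);
    requested += n;
  }
  // realloc-shrink: the requested count drops, but the block stays in
  // its original size class, so resident memory does not shrink.
  void shrink(size_t from, size_t to) { requested -= (from - to); }
};
```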
H: Similarly to what the kernel is doing — it uses different pools within the arena for small allocations, and totally different mechanisms for larger allocations. Sorry — so all the small allocations will actually not take up more space than, like, five or ten percent extra, because of some alignment in these buckets of allocations. I mean, when you allocate and then reallocate the small chunk — I mean, the realloc thing — it does not fit into that scheme.
A: I'm just wondering if this is a case of tcmalloc sort of knowing — acknowledging that it's not using memory efficiently — or if there are just a whole bunch of allocations that we're not paying attention to, that aren't being tracked by mempools — because we have like a 2x delta between those, right? Between that and the mempools... I wish I could find that document they shared.
A: Well, yeah, okay — seems like the next step is to look at the tcmalloc dumps and see if it thinks that it is only allocating what the mempools think is allocated, or if there really is significantly more than that allocated from the user's perspective. And if so, then the path is to go figure out where we're using the heap in ways that aren't being tracked by mempools. If not, then the question is, well — why is tcmalloc using —
A: — is it fragmented to the factor of a hundred percent, or 2x usage, or whatever? And then I think the question is, like: who can we talk to that works on memory allocators and understands how they work better, to make a recommendation on ways to use them more effectively, or something? Yes.
B: That drops us by another gig, and now, actually, I'm in the middle of running a test where we eliminate — more or less eliminate — the KV cache, or set it to 32 megs. So we drop it from 1.2 gigs down to 32 megs, and that test is running super slow, because now, you know, everything is hitting the disk for reads during writes. But at the moment — we're about halfway through the test — according to top, anyway, the RSS usage of the OSD process is about 840 megs, so under a gig.
A: Okay, yeah — I mean, it looks to me here like we've accounted for most of the per-PG stuff — the stuff that's proportional to the PG count — because between those second two tabs we go from, what, like eight hundred megs in the mempools to basically zero, or close to it, and the total drops down by about a gig.
B: The write buffers — I imagine that we're filling one, and then we're flushing — or we're compacting — that once we fill it, and the next one starts filling up. So my guess is that, if we don't fill the next one before the compaction finishes, we're probably sitting between 256 megs and 512 megs, typically, for, you know, memory consumed by the write buffers. We could be backed up farther than that — we could be — but I don't think any of this is unreasonable if you assume that there's an amplification effect for just about everything. Yeah, okay.
B: It seemed, you know, smeared here in the RSS memory usage, which makes sense, because, you know, it's going up. But then, if you look at the third graph, we still see this increase — these bumps — in the RSS memory usage. But we've only got eight PGs — I mean, it doesn't make any sense to me why, with such a smaller PG count, we still see that big of an increase in our resident memory usage like we did with the other ones. So...
A: Sounds good. I might put the mutex stuff here on the list — I don't know if there's a lot to discuss there or not. We talked about this a lot several months ago, about making a mutex class that compiles away without all the lockdep stuff, but I think Radek made a change to the OP tracker that did about the same stuff. One of the things was changing the mutex out, and that overall made a big impact, so I'm optimistic that just compiling lockdep away is going to help everywhere.
C: It could be. What we actually did — the changes are related to the request handling of the OP tracker. The branch deals with multiple things; one of those is that it's using our abstraction over mutexes, and it seems it's costly — not only because of the lock/unlock operations, but also because of the construction costs: they are much, much bigger in comparison to a plain mutex, and it's painful in the case of the OP tracker, because for each single operation we are constructing a new instance.
C: We were constructing the tracked op, registering it inside the tracker, and just after that filling in some events inside it. This means that, because — at the moment it was registered, we had to operate under the mutex, and we were doing that: we were locking and unlocking the same mutex many times, around five. I have just removed that, and also I've killed the seq field of the tracked op, to eradicate some cache-bank contention.
C: That was between messenger threads. In very, very initial tests — on a classic deployment, not with CBT — I'm seeing up to ten percent of change, but still, it's not our official methodology, so take it with a grain of salt.
C: It's not the official testing, which is always done using CBT, okay. And also, more importantly, it's a scenario — it's a scenario with one hundred percent cache hits, which exposes the latency — which exposes the overhead of, especially, the front end.
A: Make our mutex implementation, whatever — C++-compliant or whatever, where you can use std::unique_lock and std::lock_guard — so implement the lowercase lock and unlock, and then introduce a compile option that compiles all the lockdep stuff out of the current class. And then, you know, maybe rename it to be a lowercase thing, just to be less annoying about everything.
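The proposal can be sketched as a wrapper that satisfies the standard BasicLockable requirements (lowercase `lock()`/`unlock()`, so `std::lock_guard` and `std::unique_lock` work with it), with the lockdep bookkeeping behind a compile-time switch. `CEPH_DEBUG_MUTEX` is a hypothetical macro name for the compile option being discussed.

```cpp
#include <mutex>

// A mutex usable with std::lock_guard / std::unique_lock. When the
// debug macro is off, the class is a zero-overhead wrapper around
// std::mutex -- the lockdep hooks compile away entirely.
class ceph_mutex {
  std::mutex m;
#ifdef CEPH_DEBUG_MUTEX
  // Lockdep state (ordering checks, owner tracking, registration)
  // would live here in the debug build.
#endif

 public:
  void lock() {
#ifdef CEPH_DEBUG_MUTEX
    // lockdep_will_lock(this);  // hypothetical hook
#endif
    m.lock();
  }
  void unlock() {
    m.unlock();
#ifdef CEPH_DEBUG_MUTEX
    // lockdep_unlocked(this);   // hypothetical hook
#endif
  }
  bool try_lock() { return m.try_lock(); }
};
```

Because the lowercase names match the standard's named requirements, call sites need no changes to use RAII guards.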
C: There was — there was a thing related to buffer recirculation in the huge pages for our file writer in BlueFS. Great — I would like to extend the PR with the buffer recirculation. I'm a bit nervous that, at the moment, we will go to the kernel each time, for each — you know, mmap and munmap — it's painful.
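Buffer recirculation, in miniature: keep freed buffers on a free list and hand them back out, instead of returning them to the kernel and re-mapping on every use. This is an illustrative sketch, not the PR; `std::malloc` stands in for the real mmap/huge-page allocation path.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Fixed-size buffer pool: get() prefers a recycled buffer (no trip to
// the kernel) and only allocates when the free list is empty; put()
// returns a buffer to the free list instead of freeing it.
class BufferPool {
  size_t bufsz;
  std::vector<void*> free_list;

 public:
  explicit BufferPool(size_t sz) : bufsz(sz) {}
  ~BufferPool() {
    for (void* p : free_list) std::free(p);
  }

  void* get() {
    if (!free_list.empty()) {  // recycled path: no syscall
      void* p = free_list.back();
      free_list.pop_back();
      return p;
    }
    return std::malloc(bufsz);  // stand-in for mmap of huge pages
  }

  void put(void* p) { free_list.push_back(p); }
};
```

The win is exactly the one being discussed: the map/unmap (and its indirect costs) is paid once per buffer, not once per write.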
C: In the profiling I saw — there was around 2% of the cycles of the heavy sync-read path spent in mmap-related work, and what makes me nervous is not only the direct cost of the syscall: when you are dealing with mmap, there are a lot of indirect costs, like TLB pollution, yeah.