From YouTube: 2017-MAR-22 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
For full notes and video recording archive visit:
http://pad.ceph.com/p/performance_weekly
A
There's a PR that was based on an earlier PR, I think from Ma Jianpeng, to separate the kv_sync_thread in BlueStore into two parts. As we saw last week in the slides, it looks like that has a pretty substantial impact on performance. Unfortunately, it is not compiling right now, at least with certain versions of GCC. Possibly it works with newer ones, but we need to refine it a little bit more before we can start testing it.
A
This is really, really exciting, because it lets you, on a per-pool basis, say that a distribution looks bad or doesn't look optimal, and after the PGs have already been mapped for that pool, you can go in and say: okay, now I want to remap these PGs in some kind of user-specified way. And there's a little script.
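A minimal sketch of the idea described above, not Ceph's implementation: the mapping computed for a PG is treated as a default, and a user-specified list of (from, to) OSD pairs overrides individual entries. The function name `apply_remap` and the pair format are assumptions made for illustration.

```python
def apply_remap(computed_osds, remap_pairs):
    """Return the OSD list for a PG after applying user-specified overrides.

    computed_osds: OSD ids produced by the normal placement calculation.
    remap_pairs:   (from_osd, to_osd) tuples supplied by the operator.
    """
    result = list(computed_osds)
    for from_osd, to_osd in remap_pairs:
        # Never place two replicas of the same PG on one OSD.
        if to_osd in result:
            continue
        for i, osd in enumerate(result):
            if osd == from_osd:
                result[i] = to_osd
                break
    return result
```

With this sketch, a PG mapped to OSDs `[1, 2, 3]` and an override `(2, 7)` ends up on `[1, 7, 3]`, while an override whose source OSD is not in the set leaves the mapping unchanged.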
A
This is something we've needed for a long time. We had been thinking about something different, along the lines of making it so that different CRUSH buckets could occupy a place in multiple hierarchies at the same time, which I think could have accomplished something like this. But frankly, I think Sage's solution is probably cleaner and nicer in a lot of ways. So yeah, it's very, very exciting.
A
What else is here? Apparently we had forgotten to set the min_alloc_size for SSDs in BlueStore to 16 KB. We had planned to, and we thought we had done that last fall, but apparently we had forgotten, so we just did that now. It might be that we can reduce that down to 8 KB or 4 KB as we reduce the metadata load in RocksDB, or if we switch over to something else like ZetaScale, but at least for the moment we're bumping it to 16 KB because there's too much metadata, both in memory and in RocksDB.
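As rough arithmetic behind this trade-off, here is a sketch that counts how many allocation units a single object can be split into. The 4 MiB object size and the worst-case assumption that every allocation unit ends up as its own extent are illustrative assumptions, not figures from the call.

```python
KIB = 1024
MIB = 1024 * KIB


def max_alloc_units(object_size, min_alloc_size):
    """Upper bound on allocation units (and so extents) for one object."""
    # Ceiling division: a partially used unit still has to be tracked.
    return -(-object_size // min_alloc_size)


def worst_case_table(object_size, sizes):
    """Map each candidate min_alloc_size to its worst-case unit count."""
    return {size: max_alloc_units(object_size, size) for size in sizes}
```

Under these assumptions a 4 MiB object can fragment into at most 1024 units at 4 KB, 512 at 8 KB, and 256 at 16 KB, which is why a larger minimum allocation size bounds the amount of per-object metadata.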
A
The PR with the fixes for deferred writes merged last night. In some cases it seems to be improving performance a little bit, but in other cases it might actually be a little slower. It also fixes a whole bunch of issues that Sage noticed once he started looking into it, so it's definitely needed. We should try to understand why the performance is not universally better, so we'll probably look into that a little bit more. And then there's this RCU locking one, from Dan Lambright a while back.
A
It turns out that it actually doesn't appear to be very much faster, or any faster, than our current locking mechanism, and that has all been moved out of the hot path now, from what I understand, so it might not really matter in the end. Anyway, there was a note in the PR that some folks are trying to resurrect some code that the CohortFS guys had for instrumenting the OSD to get benchmark or latency stats out of certain parts of the code. So that would be a nice effort.
D
I think it was two parts. One part, like you were saying, was to introduce a whole lot of LTTng tracing into the OSD, and then also to have the direct messenger / MemStore thing, so you could basically just have the OSD running more or less by itself, without confounding variables, and then just focus on finding latency in those areas, from the top right down to the bottom, more or less. Oh, fantastic.
A
Yeah, that's great. I think, at least in my opinion, we've needed to do that kind of thing for a long time: separate out the code as much as possible so you can just do microbenchmarks like this. Very good. Any idea on the timing? Do you think that's going to happen anytime soon, or is it...
D
So it is in our interest. Right now we're working on trying to finish up a bunch of RGW-related stuff, but we are going to be trying to move into improving RGW performance, and I think we want to focus on improving OSD performance as a means to improve RGW performance. It's something that we're just starting right now. We've been thinking about it, and we are hoping to reallocate people and push on that effort.
B
If I may interrupt, on the matter of RADOS Gateway performance: I personally think that RADOS Gateway is the type of guy that simply orchestrates data transfers between two file descriptors. One descriptor is dedicated to the RADOS layer, towards the OSDs. The second one is for the HTTP client. At the moment we are doing a lot of memory copying there, a lot of bulk memory copying, and maybe we could...
B
This is quite timely, because at this moment we are writing a proposal to ceph-devel about extending the interface of bufferlist to get even more savings during the management of data transfers. I think this might be quite interesting in many situations. RADOS Gateway tends to move a lot of data, maybe combining it, maybe pulling that data from the kernel only to put it back. Maybe we could try to avoid that, I think.
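To illustrate the general point about avoiding bulk copies when relaying data, here is a small sketch that hands out views onto one buffer instead of copying each chunk. It uses Python's built-in `memoryview` as a stand-in; it is not bufferlist, and the helper name `iter_chunks` is made up for this example.

```python
def iter_chunks(buf, chunk_size):
    """Yield zero-copy views over consecutive chunks of a buffer.

    Each yielded memoryview references the original bytes, so forwarding a
    chunk (for example to a socket) does not duplicate the payload.
    """
    view = memoryview(buf)
    for start in range(0, len(view), chunk_size):
        yield view[start:start + chunk_size]
```

Because the chunks share storage with the source buffer, a change to the source is visible through a chunk obtained earlier, which is exactly the property that makes the approach copy-free.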
D
It's a very interesting idea; I'd certainly like to see it. The parts that I had been wanting to look at the most myself are in the OSD client, since I'm aware of a bunch of inefficiencies there, from allocation to having very large critical sections where a bunch of stuff happens. I was also hoping to improve the implementation.
B
Some time ago there was another initiative related to that: static_ptr. Actually, I would say the code is in place, but to even start thinking about productizing this concept, we will need to write a lot of unit tests.
D
Well, I actually have a bunch of unit tests for something similar, but for a different purpose in a different project. static_ptr is an interesting idea, and it works very well for polymorphism in things like the allocator case. For most of the stuff I was thinking about in the OSD client interface per se, I don't think that we actually have the kind of polymorphism that would necessarily require that, as far as I can recall, but it does make a lot of sense in some of the other parts of RGW that I'm aware of. Okay, got it. I would certainly be happy at some point to help try to get static_ptr production-ready.
A
We can do reads and writes, and then also more FileStore-specific things, although that's also kind of a filesystem issue: the way that we split PGs and the number of objects that get created. Especially if you look at something like erasure coding with RGW, where you have lots and lots of chunks created, potentially with smaller objects in RGW, it's really nasty.
D
Do those persist in the brave new world of BlueStore? I had gotten the impression, from other episodes of the Ceph performance call, that we didn't want to spend too much time working on FileStore-specific stuff. I mean, if we do that... I don't know.
A
We're not fixing FileStore, to be honest. In BlueStore, those issues are not nearly as bad. Well, I should rephrase that: in BlueStore some of those issues basically just go away, specifically the PG splitting issue, but we do inherit new ones. Specifically, as the amount of metadata in RocksDB increases, you have a lot of compaction overhead, so we'll just need to be mindful of the fact that, if you've got lots of omap going on, RocksDB could start becoming a problem. And in general, if you have lots of objects and lots of xattrs and other things, you could potentially have RocksDB again becoming kind of a bottleneck. So, in my mind, that's the reason for all the work that's going on to reduce the amount of metadata in BlueStore: anything that we can do to reduce metadata size helps. I know Igor has been working on that, and folks from SanDisk have been looking into that too.
D
One thing that Matt has been talking about was wanting to try to partition out either very large sets of omap-prefixed keys, or something like that, into a separate database. I don't know how that would handle consistency, or if there was some way that we could hint it to basically have multiple compaction domains. Perhaps. I'm not an expert on RocksDB.
A
There was actually a PR where we were doing it, and it worked. I don't know why we ended up never really merging it, but that's totally doable, and it might be that it actually does help us in the long run, especially in situations like you just mentioned, where we get tons of omap traffic.
D
I'd have to look at the key schema, but it sounds like if there were some way, somewhere in the OSD or in the ObjectStore method calls that we have there, to have some sort of hinting to be able to create a column family for a part of the key space that we expect to be large, a bucket index, say, that might actually be good.
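One way to picture the hinting idea is a lookup that routes a key to a column family based on the longest matching prefix hint. This is a hypothetical sketch: the function `pick_column_family`, the hint table, and the example prefixes are all invented for illustration and do not reflect BlueStore's actual key schema or any RocksDB call.

```python
def pick_column_family(key, hints, default="default"):
    """Choose a column family name for a key from prefix hints.

    hints maps a key prefix to a column family name; the longest matching
    prefix wins, and keys without a matching hint use the default family.
    """
    best_prefix = None
    for prefix in hints:
        if key.startswith(prefix):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix = prefix
    return hints[best_prefix] if best_prefix is not None else default
```

A caller that expects one part of the key space to grow very large would register a hint for that prefix, so that its keys are compacted separately from everything else.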
A
Yeah, I would think that will probably help with the locking issues, because, as I understand it, in RocksDB you still only have a single thread for level-zero compaction, while all the other compaction, level one to whatever, can happen in other threads. I don't know what happens when you have column families. I don't really know.
A
So it might end up being that the biggest limitation is write amp and read amp, where you're just kind of hammering the device. That's kind of why the SanDisk guys were so vigorously looking at trying to get ZetaScale working, just because you're not doing that kind of compaction anymore.
A
So yeah, anyway, lots and lots of stuff. What else is on here? I think there are a couple of other things in here that I don't know too much about. There's this crc32c one for PowerPC architectures; it looks like Kefu reviewed that. And then Sage is now testing this rm_range_keys operator interface for RocksDB, which hopefully is better. I don't really remember, I guess, but RocksDB has some weirdness regarding some of these interfaces, where they're slower than they should be.
A
It looks like somebody answered James's question here in the chat about what Sage's PG remapping work is. To me, at least, that's super, super exciting. This is something that we've needed a solution for for years, and it ended up being way simpler than, I guess, any of us ever realized.
A
It's pretty amazing. I was hoping that he'd be able to join to talk about it, but he's not able to make it to the call. The other thing I was hoping to get Sage here for would be to talk about the deferred write changes. I did go through them, and I think I mentioned that I tested that, and it is not really doing quite what we had hoped we'd get out of it, but there's more testing to be done on that, I think. That's basically all I have this week.
B
One thing, basically touching the same area we were talking about last week: the data structures we are using in BlueStore. We had some discussion about sequential structures, which require some copy operations, some memory moving on writes, versus dynamic structures like red-black trees and similar stuff we have at the moment. I was looking for some benchmarks and found a very interesting page. Guys, just take a look at that.
B
I already put the link here, and I intend to also post it at Friday's BlueStore meeting, but I guess it could be interesting for a broader audience. What is interesting is that using sequential, memory-copy-based data structures for handling small data items, like 24 bytes, as our extent instances are, might be faster even on random insert. That's terribly interesting, I think.
B
The page is pretty old, unfortunately. The script, I guess the script for making those graphs, is located outside the HTTPS domain, so you will need to tell your browser to get content from an untrusted source. In Chrome, I have a small icon on the right of the address bar: "This page is trying to load scripts from unauthenticated sources", with "Load unsafe scripts" right under it. Yep.
B
For small items, there is nothing better than that simple, really stupid structure that preserves memory contiguity. This is because nowadays CPUs are all about caching. Main memory is slow, so you have multiple levels of caches. You have protected mode, which requires a lot of address translations, a lot of page walking, so you have TLBs: instruction TLB, data TLB, shared TLB, and other stuff, all about caching.
B
Unfortunately, when someone is using dynamic structures that scatter data around the whole memory, the processor's prefetchers, for example, disable themselves. You are losing the cache's assistance, and you are alone with the terribly slow main memory. I guess that's the case with encode_some. I was poking at it for some time, without changing the data structures.
E
I can corroborate this. I took the previous patch for encode_some. Its author introduced some kind of slab allocators related to each onode, so the extents were kept together. I tried to play with that and make one big slab allocator for the entire BlueStore, and that caused it to allocate extents in effectively random places after some time, and I've seen performance degrade tenfold. So I guess I will follow that.
B
Guys, this is especially true in the case of extent_map_t. extent_map_t is a typedef over boost::intrusive::set. boost::intrusive::set is based on red-black trees, and we are using this powerful machinery for a data structure just 24 bytes long, including the padding between members. I would...
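A minimal sketch of the alternative being argued for here: keeping small extent records sorted in contiguous arrays and locating them with binary search, rather than linking tree nodes. The class `FlatExtentMap` and its fields are hypothetical and much simpler than BlueStore's extent map; Python's `array` module is used only because it stores values contiguously, and the sketch says nothing about real performance.

```python
from array import array
from bisect import bisect_left, bisect_right


class FlatExtentMap:
    """Extents kept sorted by logical offset in contiguous arrays."""

    def __init__(self):
        self.offsets = array("Q")  # logical offsets, unsigned 64-bit
        self.lengths = array("I")  # extent lengths, unsigned int

    def insert(self, offset, length):
        # Binary search for the slot, then shift the tail with one memmove.
        i = bisect_left(self.offsets, offset)
        self.offsets.insert(i, offset)
        self.lengths.insert(i, length)

    def find(self, offset):
        """Return (offset, length) of the extent covering offset, or None."""
        i = bisect_right(self.offsets, offset) - 1
        if i >= 0 and offset < self.offsets[i] + self.lengths[i]:
            return (self.offsets[i], self.lengths[i])
        return None
```

Inserting in the middle costs a shift of the tail, which is the "memory moving on writes" mentioned earlier in the call; the argument is that for small records this shift is cheap compared with chasing pointers through scattered tree nodes.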
B
Very good idea, but this means we would also need to change the serialization format we have right now. That is certainly a problem, but we can simply bump the version by one and preserve the current code for the legacy path. Yeah, we can try. The idea is to serialize blobs first, then extents, then the mapping between them.
B
I don't know on which machine yet, but I guess, guys, nowadays CPUs are really, really powerful beasts. They are superscalar, able to execute many instructions per cycle. All you need to do is feed them with data, and dynamic structures are bad at that.
A
And I won't speak for Sage, I guess, but there's always the concern too that it is much more difficult to follow what's going on, right? Right now we have all these things laid out in memory in a way that sort of makes sense, and with a lot of the things that we're talking about, potentially, maybe not substantially, it could be a bit harder to understand what's going on.
A
All right, well then, have a good week, guys. Hopefully next week I will have the multi kv_sync_thread stuff worked out, or rather the split kv_sync_thread stuff, working better, and we can get some feedback on that. And also, I think I mentioned to you, I did some testing on your bitmap allocator work and it definitely is improving things.
B
Cool. We haven't finished with the allocator yet. We definitely want to preserve the allocation scheme and also change the internal structure, at least of the bottom layers, which are very densely populated with data. We want to have that data contiguous in memory, because right now we are jumping from area to area just to take a look at whether a place is empty or not.
B
By the way, Mark, there are some controls you can use. You can feed the system so as to show or hide some particular costs we have. For example, if you want to test the memory allocator overhead itself, without too much cost from traversing data structures in encode/decode, then going with smaller RADOS objects is reasonable. By small, I mean around 40 kilobytes.
A
Yeah, and at least from all the testing that I've been doing, the two things that always stuck out at me are the bitmap allocator locking issues, which you jumped right in on, and then also encode_some and all the associated things that come with our metadata overhead, ranging from memory scattered all over the place to just the amount of metadata we're shoving into RocksDB and how much work RocksDB ends up doing because of it. So, you know, you're doing good; keep up the good work.
E
So now it is just checked once after decoding whether we somehow got additional superfluous bytes, and then we throw; we're not trying to throw on every byte. I also realigned it a bit, so it became shorter. But that was not the only thing. I squeezed out of the extent the reference to the onode, because all the extent maps that we have are always located in an onode.
E
So now we just subtract the bytes and find where it was. That's one thing. The second one: I made the creation of shared blobs lazy. Now, when you create a blob, you just have a blob, and the first time you actually need to do something with the shared blob, it gets created. It made about an eight percent improvement in random write tests using fio, so I was satisfied with that.
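The lazy-creation change described above can be pictured with a short sketch. The classes `Blob` and `SharedBlob` here are simplified stand-ins invented for illustration, not BlueStore's types; the `created` counter exists only to make the deferred allocation observable.

```python
class SharedBlob:
    """Stand-in for the shared part of a blob (hypothetical fields)."""

    created = 0  # counts how many shared blobs were actually allocated

    def __init__(self):
        SharedBlob.created += 1
        self.ref_map = {}


class Blob:
    """A blob whose shared part is only allocated on first use."""

    def __init__(self):
        self._shared = None  # nothing allocated at construction time

    @property
    def shared(self):
        # Create the shared blob the first time somebody needs it.
        if self._shared is None:
            self._shared = SharedBlob()
        return self._shared
```

Blobs that never touch their shared part never pay for it, which matches the intent of the change; the eight percent figure quoted in the call comes from the speaker's own fio runs, not from this sketch.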
E
That's a trade-off that I would like to take note of, and if that's acceptable, then maybe we can do this. All right. I think that we can go with having a segmentation fault instead of throwing an exception, especially since we actually don't catch them very much.
B
Anyway, guys, I think we have two separate items we can work on: the write path and the read path. At the moment the read path, even in BlueStore, is not so well optimized, but we typically don't see it because of the presence of huge metadata caches. However, in production deployments, having such large caches might not be acceptable. I suppose that in production we tend to have many OSDs, even ones based on NVMe devices.
A
I would have expected that, to be honest, just because you've been spending time optimizing the write path. Yes, yeah, crazy. All right, well, keep up the good work, guys. It's definitely good to have multiple eyes on this kind of thing.