From YouTube: 2018-JAN-18 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
For full notes and video recording archive visit:
http://pad.ceph.com/p/performance_weekly
D: Let's see, there's a pull request in flight that looks pretty good. It's just switching around the reads so that you don't send messages to yourself to do the reads, but just do them inline. It didn't do that before, for simplicity, but the fix to make it do it efficiently is actually pretty straightforward.
D: So I think he has one more small change to make, and then if we can go and test it, that'll work. There's still an open pull request for doing tiering inside BlueStore. I'm still not sure if we want to do it or not; that's just sort of sitting there.
D: That's mostly it, as far as new stuff goes.
D: So, Haomai: did you want to talk about your stuff with VPP this week, or would you rather do it next week?
E: Can you see my screen? So, I tried to do a POC to use VPP, the user-space TCP/IP stack. VPP provides some interfaces for the application to use its user-space stack. It has several interfaces, I think; the popular one is called VCL, the VPP communications library. An application using it has two modes.
E: One is the preload mode, where the application doesn't need to be modified: as a native application, you just use the LD_PRELOAD mechanism to preload the VCL library. The library hooks the socket layer, so when you call the POSIX socket APIs, the calls go to this preload library, and the library uses messages over shared memory to talk with VPP's binary API.
E: In the software stack, the binary API calls the session API, and then TCP/IP goes down to DPDK. Another mode of using VCL is that you can build the VCL API into your application. So it's not the preload mode: you just call the VCL API directly, built into your application, and then the VCL library uses the same method, sending messages to VPP to set up the session connection.
E: Another one I'm not very familiar with is that they have a plugin in VPP called memif, a shared-memory virtual network interface. There is a memory-interface plugin, and a library that can be built into the end application, so the application uses that memif interface to communicate with the plugin, just like talking to a virtual network interface. What I tried first while using VCL is the preload mode, so I didn't need to modify anything.
E: The Ceph code just uses the POSIX stack, so I think it should work, because in preload mode VCL hooks the socket layer. When the native application calls the POSIX socket APIs, things like create socket and bind socket, the calls go through the VCL library hook; that is the interface VPP provides. So another question is that I don't think...
E: Okay, so for an application using VCL: on the server side you need attach and bind, and on the client side you need attach and connect, and there is shared memory between the application and the VPP process. When data is transferred there is a one-time copy: the client application needs to copy its buffer into the shared-memory FIFO, and when VPP transfers a packet it received, it also needs to copy it into the shared-memory FIFO.
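The one-copy-in, one-copy-out behavior described here can be sketched with a toy byte FIFO. This is an illustrative model only, not the actual VPP session-layer FIFO:

```python
# Toy model of a shared-memory FIFO between an application and a
# network-stack process: each side pays exactly one copy.
class SharedFifo:
    def __init__(self, size):
        self.buf = bytearray(size)  # stands in for the shared-memory region
        self.head = 0               # read position (stack side)
        self.tail = 0               # write position (application side)

    def enqueue(self, data: bytes) -> int:
        """App side: copy the app buffer into shared memory (copy #1)."""
        n = min(len(data), len(self.buf) - self.tail)
        self.buf[self.tail:self.tail + n] = data[:n]
        self.tail += n
        return n

    def dequeue(self, n: int) -> bytes:
        """Stack side: copy out of shared memory toward the wire (copy #2)."""
        n = min(n, self.tail - self.head)
        out = bytes(self.buf[self.head:self.head + n])
        self.head += n
        return out

fifo = SharedFifo(64)
fifo.enqueue(b"payload")
print(fifo.dequeue(7))  # b'payload'
```

Each direction of data transfer crosses the shared region once, which is the "one-time copy" cost being discussed.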
E: The latest VPP, and Seastar as well, include the DPDK initialization. So no matter whether we use Seastar or VPP, the DPDK code can be removed from the Ceph messenger, I think, because the DPDK device initialization and memory pool initialization are done in the VPP stack or the Seastar stack.
E: If we use VPP, we can remove the DPDK stack and connect to the VPP stack. If we want to use the built-in VCL API, we just need to write a similar VPP stack, like the PosixStack now: where it currently calls the system calls to create the socket and connect, we need to call the VCL library directly; otherwise we can't build in the VCL library.
D: But the thing that confuses me is that if you're using Seastar, or even if you're using VPP, then it seems like that whole thread pool model that AsyncMessenger has shouldn't be necessary, because whenever you're initiating something you can just queue it with VCL, and whenever you get a completion, that's already going to be executing in some other VCL thread context, whatever that is, and that should be able to just trigger the async event again. Is that right?
G: I'm not proposing or promoting that, but the inversion-of-control version of that might be more typical, I mean. In other words, it might be something that eventually happens. I'm not sure what the real reason would be that they wouldn't want to support that.
D: So I was trying to ask this question when we were talking to the VPP folks last week, and my understanding was that that's not a model that they plan to support. Although, you know, in principle maybe they could, but they're not planning on doing that, and the model is very much that there's this dedicated VPP process that owns the world, and then they have sharing mechanisms so that you can utilize it over shared-memory channels.
D: A more likely model, Matt, if we're thinking about Ceph stuff with VPP, would be something like RGW being a VPP plugin. Though I'm sure we can look at it differently, because the types of things that are plugins today are, you know, network translation layers, routing layers; all the SDN stuff runs as those plugins, and things like load balancers. And I think something like RGW, since it is a gateway function, is a good fit for that, where you're redirecting and sending traffic around or whatever.
D: So the takeaway that I ended up with from that discussion with the VPP folks was that VPP is doing a really good job of building reusable, accessible infrastructure. It's being used for load balancers; it wires into virtualization; it's orchestrated via Kubernetes. But it is really a replacement for the kernel network stack, right? It's just a pluggable one that happens to be running in user space, but faster. So having Ceph be able to plug into it, when you're on hyper-converged nodes, like an OpenStack or Kubernetes node where Ceph is one of many services, would make a lot of sense: wire into VPP directly instead of using the normal sockets layer.
D: But that's different from a situation where Ceph wants to own the world, or own the box. So if you imagine a 1U server packed with NVMes, doing nothing but storage, and you want your Ceph OSDs in there, it doesn't seem like it's a good choice there, where you want to eliminate that shared-memory overhead. So I think my big question is how this relates to Seastar.
D: So the thing that I don't understand is that there's all this stuff in AsyncMessenger with the EventCenter and the event driver, and my recollection is that this is just a thread pool, a set of workers. If we move to Seastar, then that's all going to go away. So it seems like, if I'm understanding correctly, this is sort of an interim piece, with AsyncMessenger being the bridge between a threaded model and all that, and the asynchronous one.
E: The messenger will create the NetworkStack, and the NetworkStack will create a worker pool, where each worker thread has an EventCenter. Each worker thread will handle, how to say it, the socket that has been bound and is listening. So when a connection comes in, the worker threads, there is a poll mode, they poll the events from the EventCenter and handle them.
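The pattern being described, a pool of worker threads where each owns its own event center and polls it for events, can be sketched like this. It is an illustration of the pattern only, not Ceph's actual AsyncMessenger code:

```python
import queue
import threading
import time

# Each worker thread owns one "event center" and polls it in a loop.
class EventCenter:
    def __init__(self):
        self.events = queue.Queue()

    def dispatch(self, event):
        self.events.put(event)

    def poll(self, timeout=0.1):
        try:
            return self.events.get(timeout=timeout)
        except queue.Empty:
            return None

def worker_loop(center, handled, stop):
    while not stop.is_set():
        ev = center.poll()
        if ev is not None:
            handled.append(ev)  # e.g. accept a connection, read a socket

centers = [EventCenter() for _ in range(2)]
handled = [[] for _ in range(2)]
stop = threading.Event()
threads = [threading.Thread(target=worker_loop, args=(c, h, stop))
           for c, h in zip(centers, handled)]
for t in threads:
    t.start()

# A new connection event is dispatched to one worker's event center;
# only that worker picks it up.
centers[0].dispatch("new-connection")
time.sleep(0.3)
stop.set()
for t in threads:
    t.join()
print(handled[0])  # ['new-connection']
```

The key property is the affinity: an event dispatched to a given center is always handled by the worker that owns it, which is what avoids cross-thread handoffs in the common path.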
E: So that means, with Seastar, will it be coupled with DPDK? But I don't know how they transfer the packets from the user-space stack to the AsyncMessenger, because if Seastar initializes DPDK itself, it uses huge pages and zero copy, so how do you manage the buffers?
D: Okay, well, thanks for sharing that; I think it gives us something more to think about. Josh and Greg, do you guys have any questions before we continue?
H: We did that testing solely using the FIO objectstore plugin, which is quite far from the OSD, because it doesn't even try to mimic the PG-lock-related traffic. We were also testing only using random writes, while, for instance, the deduplication and reordering branch we have might still be useful for sequential writes. It still needs a lot of testing, definitely inside the OSD, yeah.
H: I don't think it will be a breakthrough, to be honest. To get more from RocksDB we would need to make the batching inside it much, much deeper; at the moment it ends very quickly, it's shallow. It's around the iterate method of the write batch, so yeah, okay.
D: It makes me nervous when we don't see an effect with fio, because with fio most of the time is being spent in BlueStore, and so usually any benefits we have will be even less than before once you add in all the OSD overhead. So then, what's the...
A: What else? There's a bunch of other stuff floating around. It's all kind of messy right now, because there are multiple different patches for multiple different architectures: there are some AMD-specific patches for Spectre, multiple different versions of the Spectre patches for Intel, and then also Meltdown patches for Intel. So all of this gets kind of confusing too when people are looking at benchmarks, but the gist of it is, it looks to me like, when the smoke clears...
B: This is the tool which actually mimics the behavior of Mark's wall-clock profiler, which was based on the gdb API. The only functional difference is that it uses a sleep and unwinds directly, and it forms an almost standalone binary. It's much faster: I was able, without problems, to sample a running OSD a hundred times per second, all the threads, without noticing any performance drop, so that was nice. The bad side is that I wasn't able to make it a fully standalone tool; it's required to preload some shared library into the target before it runs. So that's an inconvenience, a huge inconvenience, and I intend to fix it. But what is currently difficult for me is that with the default linker tools I am NOT able to get them to produce a binary, a really okay binary, which is properly linked.
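The sleep-and-unwind sampling idea can be sketched in Python, using interpreter introspection in place of native stack unwinding. This is only an analogy to the tool being discussed, which samples native stacks:

```python
import collections
import sys
import threading
import time

# Minimal sketch of a wall-clock sampling profiler: periodically wake up
# and record the call stack of every thread.
def sample_stacks(interval=0.01, samples=20):
    counts = collections.Counter()
    for _ in range(samples):
        time.sleep(interval)                          # "sleep"
        for frame in sys._current_frames().values():  # "unwind" each thread
            f = frame
            while f is not None:
                counts[f.f_code.co_name] += 1
                f = f.f_back
    return counts

def busy_worker(stop):
    while not stop.is_set():
        sum(range(100))

stop = threading.Event()
t = threading.Thread(target=busy_worker, args=(stop,))
t.start()
profile = sample_stacks()
stop.set()
t.join()
print(profile["busy_worker"] > 0)  # True: the worker shows up in the samples
```

Because the sampler only wakes briefly per sample, the overhead on the profiled threads stays small, which matches the "no noticeable performance drop" observation above.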
B: Maybe it's just confined to a badly constructed _start function, because it's constructed by the linker, and that is of course broken; I'm sure of that. The rest of the code looks more or less fine, but I wasn't able to test that, so it will require time. I've currently suspended it; it's for some free time.
B: I will resume that and try to understand why it's not linking properly. Actually, the gold linker is doing a much better job, because it is able to produce proper calls, but the relocations of data in the _start function are bad, and the original LD even gets jump addresses wrong. I mean, they are relocated okay, but to some weird places, and that's broken. So that's it; but I think that tool can be used. If there are some problems, please inform me and I will fix anything that's broken.
A: I'm very excited, though, to add support for your profiler into CBT, because if it's not affecting the performance results, that means we can run it during tests. Maybe we'll go through lots of things to check and see, but there's a good chance we can add it and start getting wall-clock profiles on all the different runs, which for me would be amazing.
A: Also, one of the things that came out when I was doing that is just how much the read-ahead, when using fragments on the file system, helps when you have no client-side read-ahead. That is one case where NewStore looks a lot more like FileStore, and both are better than BlueStore. But on the write path, all of the work we did to make BlueStore's writes faster really helped; I mean, it's like twice as fast for small 4K random writes.
A: This is RBD with, you know, some reasonably decent amount of I/O depth. So nothing real mind-blowing or anything here, but it's nice to see that, in reality, BlueStore on the write path is actually doing really well, both for large writes and for small writes, and then just the confirmation that, yeah, we're kind of hurting from not doing read-ahead. Anyway, that's all.
A: Maybe I'll throw in here, not related to that, but there's an interesting post by Christoph Hellwig: he's mainlining the old AIO fsync code, Sage, that you saw from like three years ago, and he's working on some other stuff for the ScyllaDB guys, yeah.
D: I think what's going to be more exciting for us is that it eliminates a syscall when you're polling for AIOs. I'm not sure if we would actually use the async fsync or not, at least not at first, but yes, it's exciting that we're moving forward there. This wasn't on the pull request list, but I want to mention it; it came in, I just noticed it this morning, and it's one of the top ones: the UnitedStack folks have made a new cache tiering mode that works way better for them.
D: They get like a 90% reduction in flushes with a Zipf distribution of random writes, and with a similar hit rate to the normal writeback mode, so that's pretty big. What they basically did was devote a bunch of memory to it: instead of using the hit sets and sequential hit sets to estimate temperature, they just went ahead and did a map from a hash ID to a counter that is the temperature, and they add like an exponential decay, and they're doing that instead.
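The temperature scheme described, a map from an object's hash ID to a hit counter with exponential decay, might look roughly like this. Names and the decay parameter are illustrative, not taken from the pull request:

```python
# Rough sketch of the temperature estimate described above: a map from
# an object's hash ID to a counter, with exponential decay applied so
# that old hits fade over time.
class TemperatureMap:
    def __init__(self, decay=0.5):
        self.decay = decay  # multiplier applied once per decay period
        self.temp = {}      # hash ID -> temperature counter

    def hit(self, hash_id):
        self.temp[hash_id] = self.temp.get(hash_id, 0.0) + 1.0

    def decay_all(self):
        # Exponential decay: every object's temperature shrinks each period.
        for k in self.temp:
            self.temp[k] *= self.decay

    def hotter(self, a, b):
        return self.temp.get(a, 0.0) > self.temp.get(b, 0.0)

tm = TemperatureMap(decay=0.5)
for _ in range(4):
    tm.hit(0xabc)          # hot object: hit repeatedly
tm.hit(0xdef)              # cold object: hit once, then decayed
tm.decay_all()
tm.hit(0xabc)
print(tm.hotter(0xabc, 0xdef))  # True: recent, frequent hits dominate
```

The memory cost is one counter per tracked hash, which is why this uses a lot more memory than hit sets, but it yields a per-object temperature instead of a coarse membership test.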
D: So it uses a lot more memory, but it can actually create an accurate temperature estimate. The other thing it does is, when you read, it does that temperature estimation, I think on the, I'm not sure, handing it to the replica; I'm not sure exactly why they did that, but I guess that's the main difference. I have some questions on there for them. Let's see.
D: So they took the opposite approach: they fixed it so the hit rate is the same, but they have one-tenth of the flushes. I see, okay; that's what they did. Anyway, it's a remarkably complete pull request that adds all the bits and all the rest of it; it's in pretty good shape. I had some questions, and I'm going to ask for more performance testing, but that's kind of exciting. And then the last thing I'll mention: an EC bug regression started me thinking about the onreadable callbacks again in the ObjectStore layer.
D: These are really an artifact of FileStore, because you couldn't read what you wrote until after FileStore had journaled it and then written it to disk, and so it has these dual callbacks. None of the other backends need that, because they maintain an in-memory cache of what they write. Maybe KStore does, but we can just delete it if we don't really care about that. MemStore and BlueStore don't need it, because they manage their own cache.
D: The thinking is basically to eliminate that from the ObjectStore interface, and instead add a kludge inside FileStore that keeps track of the writes that are in flight. So if you try to read something that is in the process of being written and applied, it'll block that read until it's applied and you can read it back. That sort of pushes the burden onto FileStore, and then we eliminate all the code in the OSD for all those callbacks and locks and whatever; there's a ton of it.
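A toy single-process model of the kludge described, tracking in-flight writes per object and blocking reads until the apply completes, could look like this. The real change would live in FileStore's C++, with its own locking; names here are illustrative:

```python
import threading

# Track writes that are in flight per object, and block a read on that
# object until the write has been applied.
class InFlightTracker:
    def __init__(self):
        self.lock = threading.Lock()
        self.in_flight = {}  # object name -> Event set once applied

    def start_write(self, obj):
        with self.lock:
            self.in_flight[obj] = threading.Event()

    def apply_write(self, obj):
        with self.lock:
            ev = self.in_flight.pop(obj, None)
        if ev:
            ev.set()

    def wait_readable(self, obj, timeout=5.0):
        with self.lock:
            ev = self.in_flight.get(obj)
        if ev:
            return ev.wait(timeout)  # block the read until applied
        return True                  # nothing in flight: readable now

store = InFlightTracker()
store.start_write("rbd_data.1")

results = []
def reader():
    results.append(store.wait_readable("rbd_data.1"))

t = threading.Thread(target=reader)
t.start()
store.apply_write("rbd_data.1")  # the read unblocks once the write applies
t.join()
print(results)  # [True]
```

The trade-off being debated is visible here: every read pays a lookup (and possibly a wait), while the callers no longer carry any callback machinery of their own.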
D: It's unclear what the performance penalty of doing that extra tracking in FileStore will be, or whether we'll get a benefit from removing the complexity everywhere else, because if it speeds up BlueStore by 20% and slows down FileStore by 20%, that might still be worth it. But we might want to wait one more release before we do it, as the balance shifts.
D: Good, nothing? Okay. I'm kind of tempted to do the simple thing and just rip out all of the callback code, without actually fixing FileStore first, and just see what the effect is on BlueStore workloads, to know whether this is worthwhile.
D: But we'll see. The main thing that needs to happen is to take the Sequencer concept and the collections and sort of combine them, because on reads we need to wait for in-flight writes, and in reality there's already a one-to-one relationship between those two, but there are still two different structures. So I need to combine those in the code and the interface, so that the reads will know what to wait for. But once that happens...
A: I guess one thing to keep in mind is that FileStore performance for anything fast might be going kind of downhill with all the Spectre/Meltdown stuff anyway; it's certainly not helping. So if that's the case, we might need to move people who care about performance over to BlueStore sooner rather than later.
A: The other thing is that BlueStore is already faster, right? So even if we take a hit: right now BlueStore is probably at least fifty to a hundred percent faster for small random writes than FileStore. So, you know, FileStore taking a, I don't know, 30% hit versus BlueStore taking a twenty-five percent hit, you end up in a much better place.
D: So what I'm trying to figure out is how to answer the question of what that performance penalty would be on FileStore without actually doing all the work, but I think there might not be a way to do that. I guess the upshot is that the initial refactor is actually the hard part, and that we can do; there are no performance implications there.