From YouTube: 2017-FEB-08 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
For full notes and video recording archive visit:
http://pad.ceph.com/p/performance_weekly
A: [inaudible]
C: Okay, I'll go through the pull requests really quick. The main one that's still in flight is the prefer-WAL-size change. I finally got hard drives set up on my local box today, so I can actually test it in sort of its intended environment; I'll try to do that today. A couple of others merged: the partial reshard for BlueStore merged, and the blob map change merged, which just reduces the amount of metadata and simplifies some code. So that's better, and the RGW sizes are tuned better.
C: That also went in. The other main thing that's in flight, which doesn't have a PR open right now, is that we realized the fast dispatch stuff was way more complicated than it had to be because of the way that we were dealing with PG splits. So we have a big long series of patches that fixes that and simplifies a bunch of code. The locking is less complicated, and it should be much faster. That's in the compiling-and-running state, but not really tested. I reviewed it, and it looks promising so far.
C: We talked about it very briefly, but I guess the short version is that it looks very promising.
A: I will definitely be interested in taking a look at that once you've got code that works. For the past couple of days and over the weekend, I went back and finally crossed something off my to-do list that had been on there for a while: getting Mark Seger's getput benchmark into CBT and kind of fixing it up. The RGW stuff in there sort of worked already, but not in a really convenient way, and now you can use CBT to stand up a cluster and run getput benchmarks against RGW.
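For anyone who wants to reproduce this, a CBT job file for a getput run would look roughly like the sketch below. This is a hypothetical minimal configuration; the host names, PG counts, and the exact getput parameter names are assumptions rather than confirmed CBT plugin options:

    cluster:
      user: ceph
      head: "head01"
      clients: ["client01"]
      osds: ["osd01", "osd02"]
      mons:
        mon01: "head01:6789"
      rgws: ["client01"]          # gateway host(s) the benchmark targets
      use_existing: false
      iterations: 1

    benchmarks:
      getput:
        op_sizes: [4194304]       # 4 MiB objects
        procs: [4]                # concurrent getput processes per client
        ops: ['p', 'g']           # p = put phase, g = get phase
        runtime: 300              # seconds per phase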
A: They were using EC 6+2, with bucket indices on hard drives only, but they were using NVMe journals, on a very small cluster of hard drives. I can easily get better latencies than they were seeing: they were seeing like four to ten seconds, and I've never seen above about a second and a half now that I've switched over to actually making the OSDs use the NVMe. I'm now working on replicating their EC setup; I think it's four plus two, not six plus two, but I'll check on that. Anyway, I'm seeing quite a bit lower performance there.
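For reference, recreating an EC data pool like that is just a profile plus a pool. A minimal sketch, assuming k=4, m=2 and a host failure domain; on pre-Luminous releases the profile key is ruleset-failure-domain rather than crush-failure-domain, and the pool name and PG count here are illustrative:

    # define the erasure-code profile
    ceph osd erasure-code-profile set customer-ec k=4 m=2 crush-failure-domain=host
    # create the RGW bucket data pool on that profile
    ceph osd pool create default.rgw.buckets.data 256 256 erasure customer-ec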
A
I've
also
now
replicating
the
index
cool,
though
for
nutrient
replication
in
the
index.
Full
so
also
seemed
very,
very
high
latency,
so
something's
going
on
only
two
big
into
it
more,
but
definitely
something
interesting
there
in
the
war
next
case,
though,
I
can
push
about
3.7
gay
bytes
per
second
through
one
gateway.
So
that's
good
news.
It's
really
really
cpu
hungry!
I
was
reading
like
one
course.
A
lot
of
that
seemed
to
be
in
ye-e-ep
layer,
so
purpose
this
kind
of
hard,
not
to
mention
triples
cookie,
saying
so.
A: And then, with the craziness with the meeting URL today, somebody was going to be on here to talk about seeing unexpected improvement with [inaudible], but I don't actually know what that means; just adding those onto an existing setup gave them better improvements. But anyway, maybe next week, so I'll talk about that then. That's about it from me this week. Does anyone else have anything they would like to talk about this week?
E: Hi, this is Jen. I'd like to talk about some findings about krbd and NBD in our testing environment.
E: Yeah. Basically, what we wanted to do was compare krbd and NBD. Initially we used krbd in our experimental environment, and our kernel is 3.10, which is quite old, so we wanted to use some new features in librbd. Then we tested librbd with NBD, and we found out, for our test case, that NBD with librbd is actually about ten percent faster than krbd.
C: It's going to be related to the workload. The main difference is that krbd has no client-side caching, so the I/Os are going to go straight through to the OSDs, whereas if you do NBD to librbd, librbd has a client-side cache, and so some of those writes will return immediately and only block when you have a flush. So, depending on the workload or the benchmark that you're using, you might get better performance because of that client-side caching.
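For context, the librbd cache being described is configured on the client side in ceph.conf. A minimal sketch; the values shown are just the shipped defaults, not tuning advice:

    [client]
    rbd cache = true                            # enable the librbd write-back cache
    rbd cache size = 33554432                   # 32 MiB of cache
    rbd cache max dirty = 25165824              # writes block once this much is dirty
    rbd cache writethrough until flush = true   # stay write-through until the guest flushes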
E: The question is about memory caching: for both NBD and krbd, you know, both of them on the client side you have the page cache, so yeah.
C: But if there's like a filesystem sitting on top, then you're probably going to have pretty similar behavior, I'd think, but it really depends. It depends on the timing and the overlap of the I/Os that are being passed to RBD, whether they're blocking or not, because in the kernel RBD case those I/Os will tend to block, whereas in the librbd case they won't block until you get a flush. So it has to do with the I/Os, the writes versus the flushes and so on.
F: On that point, oh sorry, also: on a lot of the older kernels there were problems with TCP_NODELAY, which were fixed in the later ones and which had a large impact on performance. And then there are also a lot of readahead improvements in later kernels as well, which help some read workloads.
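As an aside, readahead on a mapped krbd device can be checked and raised per device; a quick sketch, with the device name assumed:

    blockdev --getra /dev/rbd0        # current readahead, in 512-byte sectors
    blockdev --setra 8192 /dev/rbd0   # raise it to 4 MiB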
E: Well, we saw that, yeah, but we didn't notice any big difference from the TCP_NODELAY issue. Yeah, we're not sure why. We are using 3.10, supposedly, which has the fix, so we should not see the TCP_NODELAY problems, and we didn't see them. Very interesting.
C: I think the main thing is that NBD is probably not going to be the most common choice, and it's not the primary focus for the RBD team. Instead, they're looking at using the LIO tcmu-runner, which is a different sort of user-space passthrough that's part of the LIO stack, and attaching librbd to that.
C: It's the same basic concept, where you have librbd running in user space and it's just exposing a kernel block device, but instead of using NBD it's using the LIO tcmu-runner thing. I'm not sure exactly what the status of that is; maybe Josh knows more. But that's the path forward. That's what they're going to use for doing iSCSI support, and in general.
C: Notably, in the Red Hat and CentOS kernels, NBD is not supported at all, whereas LIO is well maintained and more featureful. The other reason is that the LIO passthrough is SCSI-based, which means that all the reservation information gets passed through, which means that we can actually do proper HA failover and fencing stuff with iSCSI and multiple iSCSI gateways. That is impossible with NBD, because it's sort of a dumb protocol.
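To make the LIO path concrete: with tcmu-runner's rbd handler installed, an image is exported through a user-space backstore in targetcli and then wired into an iSCSI target. A rough sketch; the user:rbd create syntax varies across tcmu-runner versions, so treat the arguments as illustrative:

    # expose rbd/disk0 through the tcmu-runner rbd handler (arguments approximate)
    targetcli /backstores/user:rbd create disk0 10G rbd/disk0
    # create an iSCSI target to attach the backstore to
    targetcli /iscsi create iqn.2017-02.com.example:ceph-gw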
C: But we have to sort of solve the first problem first. So that's the first step toward the sort of future full dmclock QoS, where you can do reserved IO for clients and so on. There are other efforts that are trying to manage this sort of thing at a macro level, just by carefully controlling how IO is provisioned: limiting at the client side and making sure that you don't oversubscribe the cluster. That happens in, like, Cinder, basically, and I think those are totally separate from Ceph.
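As an illustration of that macro-level, client-side limiting: in OpenStack, Cinder QoS specs cap IOPS or bandwidth per volume type at the hypervisor front end, with no Ceph involvement. The spec name and the numbers below are made up:

    cinder qos-create rbd-capped consumer="front-end" total_iops_sec=500 total_bytes_sec=104857600
    cinder qos-associate <qos-spec-id> <volume-type-id>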
G: We have a situation where we have probably more than one thousand or two thousand clients accessing one pool at the same time, and the requirement is for each client to kind of have their own, you know, provisioned limits in terms of IOPS, and I'm not so sure we can do that [inaudible].
A: I don't have anything else here, unless you want to briefly mention the CERN thing?
C: Yeah, that's sort of related. So there's this CRUSH issue with small OSDs, or small racks or hosts or whatever, getting more PGs than they're supposed to, and it's sort of a thorny statistics problem. Dan presented it to one of his, like, weekly groups at CERN, and a couple of people came up with a solution, a proposed solution, so that's promising. I'll share the link, I guess. I haven't figured out yet whether it actually works or not, but hopefully it's promising.
C: Let's see. I guess the other main thing really going on right now is the fast dispatch refactor, which I could talk about in detail. I'd mostly want to talk about it with Greg, and he's not here right now, but Josh, if you want, I could go into the details.
C: Good, okay, all right. Let's see, I guess I can post the branch on GitHub.
C: I'll just put the branch in the chat; I'm not sure there's a way to show it. So this branch is built on top of the feature stuff that's ready to merge, and then there's a second batch, the WIP PG-split stuff, which basically makes the changes so that clients will resend: if it's a new OSD with the new feature bits, clients will resend requests on a PG split, whereas previously they didn't do that, and it adds the bits to the monitor so that it will force old clients to resend on a PG split.
C: So that's sort of the second half of this, and there's some cleanup in MOSDOp to make it all work, and a bunch of other stuff. So that's nice. The other main takeaway is that the MOSDOp message gets an spg_t, which is the placement group ID and the shard number, encoded explicitly in the message.
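To make the spg_t point concrete, here is a toy C++ sketch of the idea; it is illustrative only, not Ceph's actual declarations:

    #include <cstdint>

    struct pg_t {        // raw placement group id
      uint64_t pool;     // pool the PG lives in
      uint32_t seed;     // PG number within the pool
    };

    struct spg_t {       // "sharded" PG id carried explicitly in the new MOSDOp
      pg_t   pgid;       // which PG
      int8_t shard;      // which EC shard; -1 ("no shard") for replicated pools
    };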
C: Now the client is putting that information in there, and so the end result is that every message that's coming in to fast dispatch now has the spg_t, the exact placement group shard that it's destined for, right there, and so we can just pass it directly into the queue. The queue is basically just a search-and-replace refactor to use PG IDs instead of the placement group pointers, and so that's great. The fast dispatch path is, like... let me see if I can find the actual code.
C: Here, that's dispatch, so fast dispatch; that's basically it. This is coming directly off the messenger, and as long as it's either not an OSD op (because all the other messages already had the explicit ID in there) or it's a new client that understands that it has to set this explicitly, then we just queue it directly, with no further checks or work or anything like that.
C
Otherwise,
if
it's
a
licensed
legacy
client,
then
we
have
to
do
some
checks,
so
we
have
to
look
up
a
map
and
do
that
mapping
and
we
do
this
in
order
to
preserve
ordering.
We
have
to
do
this
waiting
on
map
thing
and
off
which
we
kind
of
very
similar
to
what
we
did
before,
but
basically
those
functions
get
fed
through
here
this
this
dysfunction
and
we
basically
say
if
it's
you
know
if
it's
an
older
declined
as
an
old
map,
we
maybe
share
it.
If
it's.
C: ...we have to look up the current OSD map and we have to do that mapping. And I think, actually, I'm missing a lock here; there's supposed to be... this is like the one place where... no, I know, it's here, it's in the caller. So we actually do the OSD service's get_nextmap_reserved and release_map, just this annoying, complicated handoff that's in the OSD service, and it's really just to run this one function, and this one function is, like, not doing any blocking work.
C: Well, yeah, so we'd have to make sure it actually helps. But it's a much smaller scope of work that's being done under the protection of that map, and so I think it's more hopeful than it was before. But yeah, it probably doesn't matter anyway, because it's [inaudible].
C: Yeah, so anyway, fast dispatch gets really simple. It's just this waiting-for-map on sessions, and the wake-up stuff is simpler as a result, and so I'm pretty happy with that. But what it means is that everything that gets queued, if it's a legacy request, always has an older map than we have, or the same map. But if it's a new client, it could be a newer map or an older map.
C
We
don't
know
which
means
that
we're
going
to
have
things
coming
out
of
the
bottom
of
this,
just
queue
that
like
rpgs,
that
might
not
exist
that
we
need
or
their
old
for
old
pgs
that
you
know
went
away
a
long
time
ago
and
so
the
other
the
second
half
of
this
has
to
be
much
smarter.
So
that
is
in
what
started
work
queue
process
which
I
cleaned
up
a
little
bit.
So
basically
it
it
comes
through
it
peaks
at
the
item.
C
It
says
it
basically
puts
it
on
the
ordered
list
of
things
that
it's
going
to
process
and
then
it
drops
the
lock
and
tries
to
lock
the
PG,
because
I
have
to
do
that
first,
so
that
it's
ordered
within
the
PG
lock.
So
it
tries
to
look
it
up.
It
sort
of
that
this
may
or
may
not
happen
because
we
may
or
may
not
have
the
PG,
as
we
know,
item
that
versus
now
PG
ID,
instead
of
a
pointer
to
the
pg.
So
this
may
or
may
not
work,
but
we
don't
really
care.
C
Yet
we
have
to
go
grab
that
items
that
we
had
cute,
so
we
were
ordered.
So
if,
if
we
have
the
lock,
then
we
have
this
in
order
if
we
don't
but
I
the
way
we
have
the
item,
DQ'd
are
ready
to
go
work
on
it
and
we
have
to
handle
the
case
where
the
PG
doesn't
exist,
and
so
this
is
a
little
bit
different.
Sometimes
this
is
a
an
op
that
came
out
of
fast
dispatch,
sometimes
it's
not,
which
is
also
sort
of
annoying
and
confusing.
C
But
in
this
case
we
have
there's
a
new,
a
new
set
of
wait
lists.
That's
just
called
waiting
for
PG,
so
it's
a
mutex
that
protects
just
the
waiting
for
jeep
for
PG
map
and
it
is
a
map
of
SPG
to
a
list
of
PG
q,
able
I,
think
the
thing
that
comes
through
the
queue.
And
so
we
look-
and
we
say
if
the
if
the
sharted
placing
group
in
the
current
epoch
that
is
protected
by
this
block.
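A rough C++ sketch of the wait-list structure as described; the names mirror the discussion, but the types are simplified stand-ins rather than the real code:

    #include <cstdint>
    #include <list>
    #include <map>
    #include <mutex>
    #include <tuple>

    using epoch_t = uint32_t;

    struct spg_t {
      uint64_t pool; uint32_t seed; int8_t shard;
      bool operator<(const spg_t& o) const {
        return std::tie(pool, seed, shard) < std::tie(o.pool, o.seed, o.shard);
      }
    };

    struct PGQueueable { /* the op as it travels through the sharded queue */ };

    std::mutex pg_map_lock;                                  // protects the two fields below
    epoch_t    pg_map_epoch = 0;                             // kept in sync with the OSD's map
    std::map<spg_t, std::list<PGQueueable>> waiting_for_pg;  // ops parked until their PG exists

    // On each new OSDMap: bump pg_map_epoch under pg_map_lock, requeue entries
    // whose PG now exists, and drop entries this OSD should not host as of
    // the new epoch.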
C
So
this
this
this
weightless
is
basically
always
maintained
in
synchrony
with
of
specific
osteen
up
so
the
Sochi
map.
Reference
is
protected
by
the
lock
also
so,
if,
as
of
that
epoch,
that
PG
should
exist
on
the
Sault
Ste,
but
just
doesn't
then
we'll
just
wait
for
it.
We
just
haven't
like
created
it
yet
or
we
haven't
gotten
to
notify
to
instantiate
the
PG
or
whatever,
if
the,
if
it's
the
fastest
patch
off
and
the
client
has
a
newer
map
and
we
have
then
maybe
it
doesn't
exist
yet.
C
As
of
the
OC
map
we
have,
but
it
will
in
the
future.
So
we
also
should
put
it
on
that
list
and
wait
for
us
to
catch
up
to
that
map
and
then
maybe
we'll
decide
to
create
it
or
not,
and
otherwise
it's
like
an
older
map
or
whatever.
Then,
if
you
shouldn't
exist
and
we
just
dropped
on
the
floor
and
that
goes
away
and
then
there
are
a
couple
other
text,
then,
when
we
process
a
new
map
we
go
through
and
look
at
all
these.
Where
is
that.
C
To
find
this
here
somewhere
at
dispatch
sessions
waiting
on
matt.
Well,
that's
the
other
thing.
Oh
here
it
is
yeah.
So
we
look
at.
We
go
look
at
the
people
after
we
have
our
new
map
that
we
just
processed
so
that
osu
is
catching
up.
We
go
take
a
lock.
We
update
the
one
that
it's
sort
of
synchronously
matching
and
then
we
go
through
all
the
things
that
are
waiting
and
if
it's
something
that
we
shouldn't
have,
then
we
throw
it
out.
H: Yeah, okay, all right. I can't see the source for this at the moment, but what I would be worried about is that we want to make sure we're not losing the thread-locality properties we used to have, and I couldn't tell from the description.
C: And then the last result is that all of the checks that used to happen before we queue something, about whether the OSD or the client had, like, an old map and needs a map update, or whether the request is old and should get thrown away, all of that happens now after queueing. There was a whole waiting-for-map thing that happened before things got queued, and now that's all in the PG instead of in the OSD. So the PG has a new waiting_for_map, and...
C: Yeah, so the first thing now in request handling is checking whether we have to wait for a map, and the behavior is equivalent to what happened before: if there are any ops already waiting for the map, then new ops also wait for the map, so that we preserve ordering, and we only start waiting if we get a request that has a map in the future, in which case we just start populating that list and wait. And these other tests were already here, yeah. So really it's basically only the new checks...
C
Now
that
are
happening,
post
post
queue
and
then
and
then
I
changed
recue
ops,
so
that
if
we
are
sort
of
pushing
stuff
back
up
into
the
queue
backwards,
if
their
stuff
that's
waiting.
When
we
have
to
ask
go
on
that
list
and
order
to
preserve
the
order,
instead
of
going
back
up
and
all
the
way
up
into
the
OC
work,
you
it's
pretty
much
pretty
similar
to
what
happened
before.
C
But
the
ordering
sets
a
lil
bit
different
because
it's
happening
post
instead
of
pre
pre
queue
and
as
part
of
that,
because
it
was
so
freaking
confusing,
I
actually
went
and
doc
documented
all
of
the
or
map.
Oh,
this
is
it
it's
not
in
this
particular
version,
but
I
wouldn't
documented
all
the
different
places
that
ops
can
wait
and
the
properties,
because
some
of
the
lists
sort
of
we're
waiting
and
then
once
we
stop
waiting.
We
never
wait
on
them
again,
like
the
waiting
for
active
and
waiting
for
peered
and
other
wait.
C: So you can understand why the code requeues everything in a specific order, and in the places where it does, which is better, I guess, than what it was. It's still kind of gross, so I'm sort of trying to figure out if there's a better way to structure the code, so that it's a more verifiable, safe framework for handling requeues that is better able to preserve ordering. But we'll see.
E: I have a question about the PG distribution, which is attributed to the OSD weight, which you were talking about. James posted an issue about it on the mailing list. So, basically, we found in our cluster that the PG distribution is not that even across the OSDs, and on the OSDs which had more PGs, we saw a lot higher latency.
E: So that's probably why we saw the somewhat higher latency overall as well, and now we are thinking about how to adjust the PG numbers on each OSD manually, and whether we're supposed to use the CRUSH weight to adjust the PG count. [inaudible] So that's it.
C: I wouldn't worry about that yet. That happens if you have, like, racks where each rack has, you know, whatever, a thousand OSDs, but then one rack has, like, two; then those OSDs are going to get overloaded. But that's not usually the situation, so I wouldn't worry about that for now. The main thing you want to do is: there's a command that will adjust all the OSD weights automatically in order to even out the distribution. It's "ceph osd reweight-by-utilization".
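Roughly, usage looks like this; the optional argument is the utilization threshold percentage, and the dry-run variant exists on Jewel and later:

    ceph osd test-reweight-by-utilization    # dry run: report what would change
    ceph osd reweight-by-utilization 110     # reweight OSDs above 110% of mean utilization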
C: Generally, you only need to do it if there's a change to the cluster, like if you're adding or removing OSDs; that's when you need to repeat the process. But the easiest thing would probably just be to run it from a cron job daily, because if it's already been done, it's a no-op, so there's no reason not to just keep doing it.
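That could be as simple as a root crontab entry on an admin node, with the path and schedule assumed:

    0 3 * * * /usr/bin/ceph osd reweight-by-utilization    # daily; a no-op if already balanced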
C: The plan is to move that functionality into the manager and make it more sophisticated, and one of the Google Summer of Code projects is actually going to be to do this in the manager, because various people have written offline scripts to do the same thing better and smarter, basically, and so the hope is that we'll do that. [inaudible] Sorry, I've got to run. Bye, all.