From YouTube: 2016-DEC-07 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
For full notes and video recording archive visit:
http://pad.ceph.com/p/performance_weekly
A: Alright, I think I'm just going to go ahead and start, and you can catch up when he gets here. So, what we have going on this week: there are a couple of new PRs that came in adding tracepoints for critical functions in the IO path. That's always welcome. I'm hopeful that we will, in general, be able to start doing better tracepoint analysis soon; I think that would actually help us quite a bit. A couple of different things going on.

There's a new ObjectStore performance benchmark from SanDisk. There were some questions as to why we would want to use that versus something like fio with the ObjectStore backend, and it sounds like it's mainly for ease of integration into test suites and also potentially lower CPU overhead. So there's a good discussion going on there; if you're interested in ObjectStore benchmarking it's a good one to look at.
B: Those are really old pull requests; there's a newer version of the code.

A: Oh, fantastic.
B: Yeah, it's orthogonal. His pull request is just splitting it in two, so that there's one thread doing the submissions and another one doing the completions, and the multi-threaded work is doing n submission threads, so we could have n times two, I guess, but I'm not sure yet whether that's going to be a good idea or not. We'll see.
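(Editor's sketch, for readers of these notes: a minimal, hypothetical Python illustration, not Ceph code, of the pattern being discussed above: one thread submitting transactions while a separate thread handles the completions, instead of a single thread doing both. All names and the queue contents are invented.)

```python
# Toy illustration of a submitter thread plus a separate completion thread.
# Not Ceph code; transaction layout and callbacks are made up.
import queue
import threading
import time

submit_q = queue.Queue()    # transactions waiting to be submitted
complete_q = queue.Queue()  # finished transactions waiting for their callbacks

def submitter():
    """Drain the submit queue and issue the (simulated) I/O."""
    while True:
        txn = submit_q.get()
        if txn is None:              # shutdown sentinel
            complete_q.put(None)
            return
        time.sleep(0.001)            # stand-in for the actual KV submit
        complete_q.put(txn)

def completer():
    """Run completion callbacks without ever blocking the submitter."""
    while True:
        txn = complete_q.get()
        if txn is None:
            return
        txn["on_commit"]()           # e.g. ack the client

threads = [threading.Thread(target=submitter), threading.Thread(target=completer)]
for t in threads:
    t.start()
for i in range(8):
    submit_q.put({"id": i, "on_commit": lambda i=i: print(f"txn {i} committed")})
submit_q.put(None)
for t in threads:
    t.join()
```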
A: Alright, moving on to discussion topics. There were two I had today that I just want to quickly mention. We just started looking at RBD with EC overwrites testing, and the results are still coming in, but so far the gist of it seems to be that this is going to be really good for sequential write performance.

So at least for, you know, four-megabyte sequential writes, both on hard drives and on NVMe, it looks really, really good. For basically all other cases there are kind of mixed results, with, you know, lots going on, and in some cases it's quite a bit slower, but you'd expect that in some cases just given the extra overhead that's required. So next week I'll have more, but for right now the gist of it is that for large sequential writes it's looking really good from a performance perspective.
C: Mark, I have a quick question. You said that the reads did not perform well, and I can understand why small random writes are going to perform worse with EC, but for reads, it seems to me I can't think of a reason why they should. Can you explain that?
B: So there's a sort of long-standing feature on the roadmap that would let the clients read directly from the various shards — opportunistically at least try to do that before falling back to reading from the primary — because in the case where there are no racing writes that'll work just fine, but that hasn't been implemented yet.

C: Okay, thank you. Nice — we'll come back to that later, yep.
A: So the other thing I wanted to hit before we move on is that our friends at SanDisk have been doing that kind of work and have BlueStore on ZetaScale available for people to test as of a couple of days ago. They have the initial branch here that uses a single KV sync thread, and then a new version, I think, was just released maybe very early this morning — all of this is experimental — but an even more experimental version that uses multiple KV sync threads.
E: He's here too — he actually did the ZetaScale side of it as well — but yeah, I'm writing this multi KV sync thing that I actually mentioned on the mailing list. The point is basically parallelism, which is what we actually need from BlueStore, and I am seeing promising early results with that, but I still need to do a lot of benchmarking there.
E: A lot of optimization still needs to be done on the shim layer, so we are doing that as well, and hopefully in a week or the next two weeks we should be able to present to the community the entire difference between the ZetaScale-backed BlueStore and the RocksDB one, and hopefully it will
do well. The difference is mostly on the bigger volumes: as we know, if there is less metadata RocksDB will be great, but as you actually put more data down, RocksDB will start degrading because of the compaction and whatnot. So right now I am trying to find out what the crossover point is.
E
For
example,
we
can
say
that,
okay,
if
for
the
volumes
times,
for
example,
lower
than
100
gig
so
rocks
like
you
will
be
probably
very
performing
better,
but
bigger
than
hundred
dig,
it
will
be
get
a
scale,
will
be
always
better
because
a
more
volumes
be
cut
the
volume
sizes,
so
the
performance
more.
So
so
that
crossover
point
I
am
trying
to
find
out.
So
actually
it
will
be
helpful
for
the
for
everybody,
and
also
we
are
working
like
as
part
of
the
complete
implication.
We
need
to
run
virtually
rocks
TV.
We need to come up with a solution for that — performance is the first thing; if we see a significant performance benefit all the way through, then the next thing is how to integrate it for a production deployment, by sharing the space in between. So that's what else we've been working on. Hopefully next week — I will target next week's performance meeting — we should be able to provide detailed benchmarking results. Yep, see ya.
F: So last time, when we presented the performance data to Mark and Sage, we saw ZetaScale with a larger dataset performing better than RocksDB, but even in that test we saw that there was IO bandwidth left on the system, on the flash drives, because a single thread is submitting the bunch of transactions in the ZetaScale layer and they get applied one by one; so, not having enough parallelism, we are not able to saturate the flash bandwidth.
F: So we think that by applying these transactions through multiple threads in parallel — and we have seen that with various other ZetaScale tests — so we are experimenting with multiple threads: if we can push these transactions in parallel, we should be able to saturate and use all the bandwidth left in the system, in the device, and expose that bandwidth to the client. That's the expectation, and early results suggest we are getting something from it, but we need to do more runs and performance analysis to make sure that it's explainable.
B: With the pull request as it is now, I think we can do this a little bit more efficiently — things are sort of being divvied up across the threads on the receiving end, and I think we can just separate it out into n different queues — but we can clean that up later. I think the key is showing that this is actually helping and then going from there.
H: So that's what I'm actually showing on the screen for those two tracepoints. The second one is — so this gives us the function-level latency, and we are also looking into end-to-end latency. If you follow the flight path of an OID from the beginning of the operation submission, whether it is a read or a write, you should be able to track the different places it passes through and what the latency is. So that is essentially the focus of the OID tracing.
H: The OID tracing has two sets of LTTng events. One is a generic OID event that has essentially the name of the object ID and a tag that I normally overload with different things for a given event — so it is an event, or it could be something else — and then it has some information about what file this event is originating from, what the line number is, and so on.
H: I also added one additional thing called OID elapsed. There are certain places where I really cannot tag an event, whenever the event is conditional. If you look at the AsyncMessenger, I am only tagging the MOSDOp and the MOSDOpReply, and I won't be able to know which one it is until the message is actually decoded. So in that case, what I do is take a timestamp right before the message arrives, and when the decode completes — when it really is the MOSDOp or the MOSDOpReply — then I take the timestamp and dump it into the OID elapsed event. So there are several events where you may want to just use the OID elapsed as opposed to marking an event with a certain timestamp and a tag in it, yeah.
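(Editor's sketch: a rough, hypothetical example of how per-OID LTTng events like these might be post-processed from a babeltrace text dump into per-object latencies. The event name, field names, tags, and line format below are invented for illustration; the real Ceph tracepoints and the speaker's own scripts may differ.)

```python
# Hypothetical post-processing of a babeltrace text dump of per-OID events.
# Event/field names ("oid_event", "oid", "tag", "submit", "reply") are assumptions.
import re
from collections import defaultdict

# e.g. [12:00:01.123456789] ... oid_event: { ... oid = "obj_7", tag = "submit" ... }
LINE_RE = re.compile(
    r'\[(?P<h>\d+):(?P<m>\d+):(?P<s>\d+\.\d+)\].*oid_event.*'
    r'oid = "(?P<oid>[^"]+)".*tag = "(?P<tag>[^"]+)"'
)

def ts_seconds(m):
    return int(m.group("h")) * 3600 + int(m.group("m")) * 60 + float(m.group("s"))

def latency_per_oid(path):
    """Return {oid: reply_ts - submit_ts} for every OID that has both tags."""
    first = defaultdict(dict)
    with open(path) as f:
        for line in f:
            m = LINE_RE.search(line)
            if not m:
                continue
            # keep only the first occurrence of each tag per OID
            first[m.group("oid")].setdefault(m.group("tag"), ts_seconds(m))
    return {
        oid: tags["reply"] - tags["submit"]
        for oid, tags in first.items()
        if "submit" in tags and "reply" in tags
    }

if __name__ == "__main__":
    for oid, lat in sorted(latency_per_oid("trace.txt").items()):
        print(f"{oid}: {lat * 1e6:.1f} us")
```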
H: Okay, so — if you look at those events once you capture them, I have essentially three sets of Python scripts. What they do is skim through the entire set of function trace events and essentially recreate the stack from them, based on the events that are happening within each thread and when a function is entering and when it is exiting.
H: So if you look at the profile of the output, what you will normally see is a text tree. For example, if you take aio_operate as the function call, you get a detail of which function, in what file and at what line number, it is originating from; that function is calling op_submit, which is calling a method in the Objecter, op_submit_with_budget, and then it goes through a series of calls in the different stages, and I can actually compute the latency of each one of them.
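(Editor's sketch: a simplified illustration of what such a post-processing script might do — rebuild the per-thread call stack from function entry/exit events and accumulate per-function latency. This is not the speaker's actual script; the event tuple layout is assumed.)

```python
# Rebuild call stacks from (timestamp, thread, enter/exit, function) events and
# accumulate inclusive latency per function. Event layout is an assumption.
from collections import defaultdict

def per_function_latency(events):
    """events: iterable of (timestamp_us, thread_id, 'enter'|'exit', func_name),
    assumed sorted by timestamp and properly nested within each thread."""
    stacks = defaultdict(list)       # thread_id -> [(func, enter_ts), ...]
    inclusive = defaultdict(float)   # func -> total inclusive time in us
    calls = defaultdict(int)         # func -> number of completed calls
    for ts, tid, kind, func in events:
        if kind == "enter":
            stacks[tid].append((func, ts))
        else:
            entered_func, enter_ts = stacks[tid].pop()
            assert entered_func == func, "unbalanced trace"
            inclusive[func] += ts - enter_ts
            calls[func] += 1
    return {f: (inclusive[f], calls[f]) for f in inclusive}

# Tiny synthetic trace: aio_operate calls op_submit on thread 1.
trace = [
    (0.0, 1, "enter", "aio_operate"),
    (5.0, 1, "enter", "op_submit"),
    (45.0, 1, "exit", "op_submit"),
    (60.0, 1, "exit", "aio_operate"),
]
for func, (total_us, n) in per_function_latency(trace).items():
    print(f"{func}: {total_us:.1f} us over {n} call(s)")
```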
H: You can selectively turn it off just by disabling that specific function trace, and the way you disable a function trace is just a very simple macro. If you really want to trace a function, you just add a macro called FUNCTRACE; that essentially instantiates an object that takes a timestamp at the entry point, and then, when it goes out of scope, it takes the timestamp at the exit, and that's how you essentially get the trace.
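(Editor's note: the macro described above is effectively scope-based, RAII-style timing — an object records a timestamp when it is constructed and reports the elapsed time when it goes out of scope. Below is the same idea as a Python context manager, purely as an analogy; names are invented.)

```python
# Scope-based timing analogue of an entry/exit trace macro.
import time
from contextlib import contextmanager

@contextmanager
def functrace(name):
    start = time.perf_counter()            # "constructor": record entry timestamp
    try:
        yield
    finally:                               # "destructor": record exit and report
        elapsed_us = (time.perf_counter() - start) * 1e6
        print(f"{name}: {elapsed_us:.1f} us")

def do_write():
    with functrace("do_write"):
        time.sleep(0.002)                  # stand-in for the traced work

do_write()
```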
H: There's a shell script to comment all of those out and then selectively turn them on if you want to, but at the end of the day, once you get the trace, you can essentially paint the stack of all the operations and then look at a breakdown of the latency for each one of those operations. The intent of actually submitting the pull request for the different function traces is that, based on the testing we did, there are certain critical functions where we really want to do that tracing, a couple of layers down.
H: We want to look at the breakdown, but once you do that analysis, if you really want to zoom in — because it captures gigabytes' worth of data, you really don't want to do this tracing when you want to run a test for a longer period — once you hone in on a specific set of functions where you see the bottleneck, you want to filter out everything else except those few functions where you want to collect the trace, and then you can run it at volume, so that you are able to compute the latencies without doing this indiscriminately. So you really want to look at this detailed trace as the starting point, then start trimming it down to the functions that you really want to focus on and optimize, and then just collect those at volume.
B: To give a sense of what the overhead is like: if you have a trace on an outer function and you look at what your average microsecond measurement is, and then you add a bunch of traces in the functions that it calls, have you looked to see what the effect on the outer function's runtime is?
H: No — you mean just the LTTng tracing overhead itself? Yeah.
H: Okay, so that's essentially the list of functions and the latency breakdown — you get very minute detail — and the OID tracing is the one that I normally use as a way to figure out how long things are taking. There are two specific sets of events that I normally focus on when I look at the OID tracing. One is: if I submit a message from the client, how long does it take from the client to the OSD that actually picks up that message, and how long does it take to reply to that message? Those two are the latency vectors that I normally care about from a network latency perspective, so I use the OID event as a way to track that. The second one is thread switching: there are different places where you do a thread switch, and the function-level latency breakdown is not going to give you the thread-switching latency, so I use the OID event to figure that out — you submitted, say, a message into the dispatch queue, and then you dequeue that operation, and there is a thread switch in between.
H: So this one is actually coming from an OID event; essentially it's saying how long — let me go through this tagged one. This is the rados bench write sequence: you are submitting a write operation through the aio_operate call — that's your entry point — the AsyncMessenger picks it up and writes the message, and then there is a delay: how long it takes from the moment a message gets submitted on the rados client to the time the OSD picks it up.
H: These two things overlap a little bit, because I am overlaying the OID events as well as the function events together, but essentially there is the dispatch function latency — this is the breakdown of that function's latency — and then the dispatch-to-dequeue-op thread-switch latency.
H: That is where the thread switch happens, and the question is how long it takes, and then how long the dequeue_op function takes, and then, once you do the dequeue_op, there is a thread switch from dequeue_op completion to op_applied. Of course there is a series of things that happen in between, but through the Finisher it comes into the op_applied function.
H: It's a very similar pattern here for the read. As you can see, the major component is the rados-to-OSD network latency. As you dial up the queue depth, you are going to see that pattern repeat quite a bit: if you look at queue depth 1 all the way to queue depth 16 with a single client, you can see the latency just keeps going up and up, and the same thing on the reverse — you will see the latency going up and up on the rados side, on the receiving end,
when the response gets back. So you will see that pattern as the queue depth goes up, and this is one of the reasons that, when the full-blown scale-out performance data is shown, with more clients and increased queue depth, you can see a little bit of this latency pattern. It may be useful to clarify that this measurement was done on a single node, with a single NVMe.
H: Everything is contained and isolated locally, because I can get the LTTng trace events from rados and the OSD with the same timestamps, in the same sequence, so I don't have to do acrobatics to sort them out, and I can control the number of parameters and keep the noise level to a bare minimum, just to look at the latency of the OSD and rados layers. Typically, what you want to do, once you do this latency analysis across—
H: Yeah, this is not a completely maxed-out scenario. I'm kind of torn between the number of LTTng traces I have to collect — it starts crapping out depending on how long I need to run the test — but in general the point I was making at the beginning was that once you get a handle on it, you really need to shortlist the functions you want to focus the latency analysis on.
H: Then you can actually go full-blown — multiple OSDs on NVMe SSDs — and stretch it to the maximum and look at where the bottlenecks are. That's the right thing to do, but with the number of events that you are capturing — I'm really only running for five or six seconds and the amount of data is around 20 gig or so — it gets mind-boggling from an analysis perspective. But really, that's the next step.
G: What I'm saying is — I suspect librbd is only opening one rados context, and that is where the bottleneck is. We saw exactly the same problem in RGW, and we went in and modified RGW to allow it to have multiple rados contexts, and that basically eliminated this problem. It's trickier here, maybe, because of the sequential/parallel interlock issues that you have to deal with — consistency is simpler when you only have one pipe — but I'm wondering if that's really what this is showing.
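(Editor's sketch of the idea G describes, using the librados Python bindings: open several independent cluster handles and IO contexts from one client so that requests are not all funneled through a single rados context. The pool and object names are made up, and whether this actually helps librbd is exactly the open question being discussed.)

```python
# Hypothetical: spread writes across several independent librados cluster handles
# instead of a single shared one. Pool/object names are invented.
import rados

NUM_CONTEXTS = 4
POOL = "rbd"

clusters, ioctxs = [], []
for _ in range(NUM_CONTEXTS):
    c = rados.Rados(conffile="/etc/ceph/ceph.conf")
    c.connect()
    clusters.append(c)
    ioctxs.append(c.open_ioctx(POOL))

completions = []
for i in range(64):
    ioctx = ioctxs[i % NUM_CONTEXTS]            # round-robin ops across contexts
    completions.append(ioctx.aio_write_full(f"bench_obj_{i}", b"x" * 4096))

for comp in completions:
    comp.wait_for_complete()                    # wait for all in-flight writes

for ioctx in ioctxs:
    ioctx.close()
for c in clusters:
    c.shutdown()
```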
B: I'm not sure. I guess my question is: does it matter? In the case where you have a single client — is that actually a case that we would care to optimize? Or are we more concerned that, because in general an OSD is going to be sharing its work between a bunch of clients, you're going to get the parallelism there anyway?
H: So, one interesting point before I actually wrap up and hand the floor over: this is probably a philosophical discussion more than anything else. If you look at the core IO into the media and exclude everything else — take out the locking, the network latency, and everything — it's a very interesting data point. You are essentially
H
Around
you
know,
maybe
around
200
microseconds
again,
this
is
for
the
right
place
will
be
a
little
bit.
You
know
faster
right,
even
if
I
increase
in
the
queue
depth
and
increase
in
the
number
of
clients
it
is
ranging
anywhere
between.
You
know,
200
to
1
50.
It
keeps
changing,
but
look
at
that
as
three
hundred
microseconds
for
the
time
being,
and
if
you
look
at
the
into
and
latency
you
know
you
are
essentially
looking
at
probably
the
core
I
was
taking
our
twenty
percent
of
into
and
latency
the
rest
of
that
is
pretty
much.
H
You
know
the
the
whole
context,
management
network,
latency
and
threading
management.
To
me,
that
is
really
really
high.
If
you
really
want
to
look
at
for
the
envy
mes,
LTS
and
optimize
it
that's
where
we
need
to
be
focusing
on
from
an
optimization
perspective,
the
core
I
was
taking
longer,
then
it's
a
different
problem
and
that's
probably
the
case
with
the
hard
disk
drives,
but
it
the
problems
which
is
completely
with
the
SSD
drives.
H: Okay, so there is a pull request. I do want to drop in the Python scripts as well, the ones that parse the function traces. I don't know, Mark — let me know what would be the right place for them. It doesn't look like they belong in the actual Ceph repo; it looks like they should go somewhere else. If you let me know, I can drop in those scripts and maybe put in a nice README file so people can use them.
B: One quick note on that last comment: you were looking at the issue_repop piece, basically, and adding that up — that's 200-some microseconds — but that's actually only the front half. That's basically preparing all the metadata, getting everything ready, and queueing it to the device; it doesn't actually block waiting for the IO in any of those functions.
B: I think the actual I/O number that you're seeing is line 41, where you have the dequeue-op-to-applied thread-switch latency, because there some other thread is going to pick up the I/O completion and then wake up and trigger it, so that includes all the context switches too, but the actual cost of the I/O is in that number, I believe.
B: But I think there's work to do there in a couple of different places. One is that we can spend less time preparing the I/O, if we can optimize that path, and then we probably need to narrow in more on the latency between when the aio is queued to the device and when we get the completion, and try to separate that from the various thread context switches that happen along that path.
B: I think the other ones get at the things that worry me — like, you know, line 26, find_object_context: that's 71 microseconds just to get the object that we're going to operate on. And prepare_transaction is somewhere we can probably make some improvements, and things like finish_ctx — all of these are adding up, right, a few microseconds here and there.
H: Of course, in the lab environment I have all the CPUs I can use at my disposal, so I basically put in whatever happened to fit. But yes, it is too high. Okay — but again, keep in mind, I'm really looking at this as just a means to get a latency breakdown and an optimization focus.
H: We need to look at the BlueStore latency, from the time the op is submitted to the time it completes — which I don't have here — to really look at the I/O latency; that's another optimization point. And then I think the third one is what Sage and Sam were talking about: going after each one of these, you know, find_object_context and the few operations in the critical dequeue_op flow that we could potentially optimize, I think.
H: The setup here is essentially one OSD and librados clients running on the same host machine, just to keep it at a micro level — there is absolutely no replication, nothing, just one OSD. You are going to see more and more latency breakdown once you start expanding the scope to, let's say, two OSDs on two different nodes, where the client is running on a different host than the OSD; that is essentially a broader
scope, where you're going to see a lot more other problems and you'll get lost in the system. Yeah, okay — we're going to run out of time, and I do want to give some time to the others too. Yeah, sure — so, basically, what we have shared in the past — and especially on the question about cluster size — we actually now have results on a larger cluster.
H: In the past we have been sharing data with multiple OSDs per NVMe on an all-NVMe node, and, as we had discussed with Sage, we do want to move away from that model, which is sort of a band-aid. So we actually ran some data, literally last night, on the latest Ceph master as of yesterday, on a five-node cluster. Let me see if I can put up the picture here.
H: It's basically a five-node cluster with six NVMe OSD drives each — P3700-class SSDs — with one OSD per NVMe; the data, WAL, and DB partitions are separate on each NVMe, and we have seven clients with ten RBD volumes each. And basically, with this test setup, you know, for reads:
H: Yeah, so the chart is basically a latency-versus-IOPS chart, with the markers being the queue depth, and these are the queue depths as seen by fio RBD from the client side, so they do need to be translated to queue depths at the device level. The way the configuration is set up, at, say, a queue depth of 16, the queue depth at the OSD is right about 32.
H: But the point here is that, from an efficiency standpoint, if you look at a typical P3500-class NVMe SSD today — which is in fact on the lower end of performance when we look at the 2017 range of SSDs — it's about 450K 4K IOPS, and the IOPS per NVMe that we are seeing are right around the 10 to 12K range. So we are
right at about three percent utilization today from a read standpoint, and we basically just wanted to give you guys this snapshot of where we are today. As you can see, after a queue depth of 16 — and in fact some analysis has already been done on that; do you want to tell them about the five- and ten-client experiment? Yeah.
H: So we're right at about forty percent CPU utilization somewhere between that queue depth of eight and sixteen, and at 128, as you can see, the CPU stays under fifty percent. And from an NVMe utilization point of view, these IOPS numbers per OSD — or, sorry, per NVMe, which is the same thing with one OSD per NVMe here — do correspond to the backend read utilization.
H: You know, with one OSD per NVMe and a six-NVMe system today — basically the dense configuration that we are looking at for 2017, which a number of people have expressed interest in — in this class of cluster we were not able to scale beyond a queue depth of 8, and I think the effective queue depth at the OSD level is right around 16.
A: Okay — that is a very low IOPS-per-OSD number compared to what I've seen on P3700s, but we can talk about that; that's fine, yeah.
H: Yeah, I think we should definitely compare notes. But, with the background that was provided, this is where we're marching: the goal is to really get the performance up in this case, where you're hosting one OSD per NVMe, right.
H: Yeah, some of it is — yes, I think we have tried a few configs, but what we are showing is basically the defaults right now. We have tried, I think, Mark's config, which I think performed similarly, so the data that you are seeing right now — some of it is actually with the RocksDB default settings.
H: Correct, yeah. And in fact, this past week we ran into some issues — some OSD crashes when we changed the settings — so that exploration is still pending. We actually started looking at the various options that are being shown here, but we've run into some problems; you can probably talk about the behavior you saw, yeah.
A: Okay, okay — and, I don't know, this is probably going to be difficult to do off the top of your head, but do you happen to know approximately how much of that was due to parallelism?
E: It's a brand-new microphone, but somehow the audio isn't great. So, you say that you correlated those numbers with what's coming from the device — so that means there is no read amplification going on from the DB on top of the data, and you're still getting this number? Have you actually checked that? Is there any compaction or anything else going on, or is it only data reads, so that we—
A: Okay — everyone, feel free to join us in half an hour for the monthly Ceph developer meeting, if you're able, and we'll reconvene again here next week, hopefully with more ZetaScale BlueStore testing done and also more data on RBD EC overwrites. See you in half an hour, guys.