From YouTube: 2019-08-15 :: Ceph Performance Meeting
A: All right, let's get this started. So, four new pull requests this week. One is from Igor; it's an evolution of his previous work looking at trying to figure out how to better utilize device space for fast devices in BlueStore. I have not looked at how this is different than the earlier version, but — oh yeah, you're here. Do you want to say just how this kind of improves on the old one?
B: Yeah, sorry. So actually it simplifies things a bit by utilizing the current path model for RocksDB, which is just the slow and DB paths. Instead of getting the level from RocksDB or using some tricks, it uses just the slow and DB levels — virtual levels — and makes the semantics per-level, per-device usage. It tracks how space is used and then tries to allocate extra space from the DB volume if possible. So we don't need any changes to RocksDB; we don't need any changes to the existing path model and APIs.
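The per-level placement idea described here can be sketched roughly like this — a toy model with invented sizes and function names, not the actual BlueFS/RocksDB code:

```python
def place_levels(level_sizes_mb, db_capacity_mb):
    """Greedily assign RocksDB levels (L0 upward) to the fast DB volume,
    spilling a whole level to the slow device once it no longer fits.
    Returns the per-level placement and the DB-volume megabytes used.
    Hypothetical helper for illustration only."""
    placement = {}
    used = 0
    for level, size in enumerate(level_sizes_mb):
        if used + size <= db_capacity_mb:
            placement[level] = "db"    # fast device has room for this level
            used += size
        else:
            placement[level] = "slow"  # level lives on the slow device
    return placement, used
```

With RocksDB's usual 10x level fanout, for example, `place_levels([256, 2560, 25600], 10240)` keeps L0 and L1 on the DB volume and spills L2 to the slow device.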
A: So, let's see, okay. A couple things closed: CBT stuff for Crimson testing. This deferred aio-notify unlocking optimization — that didn't actually do a whole lot, it sounds like, but every little bit helps. And then this other one from Igor removed some lock acquisitions in the Finisher.
A: One is in testing; that's basically an evolution of a couple of different other ones that he had been working on. This one is different in that it leaves the larger alloc sizes for RocksDB where it's on the shared device; we use a smaller alloc size to avoid fragmentation-related space usage — or space amplification, maybe I should say.
A: Let's see. This MDS cache memory limit work — I think Patrick is trying to figure out what's going on with it. Hopefully we can still get it in for Octopus, but it's moving a little slow, I think, or just hasn't gotten as much pace recently. BlueStore allocator aging test: this is a really old PR that Adam was working on a while back, and it looks like he's —
A: — maybe, sorry, the other Adam is working on, and I think he's maybe resurrecting that to hopefully get it in. And then this week, no movement on the sharding work, but hopefully that will continue to progress. I'm watching that very closely to see if I should try to get my double-cache avoidance — no, double-caching avoidance — PR merged now or after that. In any event, we'll see how it goes.
E: Okay. So what I've been looking into so far — well, the end goal is we want the QoS stuff in the OSD to work, right? So for that to work, BlueStore needs to not accept more than the amount of I/O it can actually serve, essentially. So the throttles that currently exist in BlueStore — there are two of them. One throttles the amount of I/O that is accepted into BlueStore but not committed yet, so the part that hasn't gone through the kv sync thread.
E: The question is what those throttles should be set to, to ensure that BlueStore gets reasonable throughput at different I/O sizes, but does not accept so much I/O that the next thing taken out of the throttle has a lot of latency. Because, with some architectural changes in the OSD, we can modify it so that we —
E: — when we get a high-priority I/O from a higher-priority client, for instance, we'd like the latency of serving that to be relatively low; but BlueStore would have already committed to doing a bunch of low-priority stuff. So to get better data on this, I added a whole bunch of trace points to BlueStore that allow tracing, or sampling, of specific I/Os — specifically, all of the initial conditions for the I/O when it starts: what the current throttle values are, a bunch of stats from RocksDB, and a few other things.
E: So each of these data points is a single I/O, taken from running random 4k writes against 512 gigs of data, using the fio ObjectStore backend. And it looks to me here like, at least for 4k, we only need about 100k of kv throttle. The sizes here are misleading —
E: — the one point out here is 10 megs, but the throttle cost per I/O — the hard disk value for BlueStore defaults to a little more than half a meg. So the latency curve here starts to go up substantially after essentially two I/Os. And similarly, if you look down at the throughput graph, we already have most of the available throughput at one or two I/Os, so we actually get very little benefit from queueing up a lot of stuff for our disks.
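Since the throttle is denominated in bytes but charges a fixed per-I/O cost, the byte budget translates into an effective queue depth. A back-of-the-envelope version (illustrative values, not the shipped defaults):

```python
def effective_queue_depth(throttle_bytes, cost_per_io, io_bytes):
    """Number of concurrent I/Os a byte-denominated throttle admits when
    each I/O is charged cost_per_io plus its payload size."""
    return throttle_bytes // (cost_per_io + io_bytes)

# With a ~0.67 MB per-I/O charge (HDD-like), a 10 MB throttle admits
# only a handful of 4k writes at once.
hdd_depth = effective_queue_depth(10 * 1024 * 1024, 670_000, 4096)  # 15
```

This is why the x-axis sizes are "misleading": the per-I/O charge, not the payload bytes, dominates the budget for small writes.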
E: We do benefit from having a deferred throttle. So if you look at the second column of graphs, we don't see big spikes in latency until, like, 30 megs of deferred backlog, and there is some modest increase in throughput associated with that. So somewhere in the middle there is more appropriate. The only challenge is that it's hard to — well...
E: If we look at the 512k numbers, we do see some increase in — some benefit from — having multiple things queued, probably because of overlapping the write with the kv sync write. But similarly, low concurrency is the order of the day here. There's a little bit of a trick in that it's hard to get the right throttle cost to match up between 4k and 512k here, because in both cases the throttle cost per I/O actually dominates. So we'll have —
E: All right. So NVMe is a little more interesting — concurrency actually matters. It looks like, up to about 500k of throttle — which is about 60 I/Os here, because the cost per I/O on SSD is set to 8k in this case — we get pretty much the full throughput we have available to us, which in this case is about 27,000 IOPS, and there's a large jump in latency past that point. So that's probably what we want, and that's much, much lower than the throttle that's set in the defaults.
E: In fact, the throttle values set in the defaults are, I believe, lower for NVMe than for hard disk; it should be the other way around, since NVMe actually benefits from parallelism. With 512k, same kind of theme: some additional throttle value is valuable, up to perhaps three I/Os' worth of throttle, but not much past that, because 512k is enough to largely saturate the device.
E: So that's all I've got — questions? I think the next stage is to embed these trace points into CBT, get better data, and very likely improve the defaults. But the longer-term goal is to make these things self-tuning, because I think the actual values will differ substantially between different device types. For instance, with hard disks, this curve looks completely different if you don't have an NVMe DB device — like, it's not similar at all; they're essentially not the same device type.
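One plausible shape for that self-tuning is a knee search over measured samples: keep the smallest throttle that preserves most of the peak throughput without a latency blow-up. A sketch with made-up thresholds and names:

```python
def pick_throttle(samples, tput_frac=0.95, lat_slack=1.5):
    """samples: (throttle_bytes, throughput, latency) measurements.
    Return the smallest throttle whose throughput is within tput_frac of
    the best observed while latency stays under lat_slack x the minimum.
    Thresholds are placeholders, not tuned values."""
    best_tput = max(t for _, t, _ in samples)
    min_lat = min(l for _, _, l in samples)
    for throttle, tput, lat in sorted(samples):
        if tput >= tput_frac * best_tput and lat <= lat_slack * min_lat:
            return throttle
    return None
```

On the NVMe data described above, this would land near the ~500k point: throughput has plateaued but latency has not yet jumped.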
E: The evidence here is that any experiments we've done in the past with the default throttles were not gonna work, because BlueStore accepts way more I/O than it should. So the next step is to see how the QoS stuff works if the throttles are set to something appropriate.
C: Okay, got it. That makes sense.
E: Yeah — I guess the way I'm gonna do that is by setting up CBT, and then wiring up a couple of different RBD devices with different priorities, and evaluating whether one of them, with extremely low queue depth, gets decent latency while the other is generating a saturation-level workload. That, I think, should be the test.
E: That's where I started, and that's why, if you look at the pull request, there's a lot of additional instrumentation. I picked out all of the RocksDB properties that I thought were interesting — it exposes the number of current compactions in progress, the amount pending flush, all that stuff — because I was hoping that that stuff, along with the current in-flight operations, would be predictive of latency. And as far as I could tell so far, BlueStore just has a lot of internal variance.
E: It's either device level or RocksDB level — like, I suppose that's how I'd look at it. Per-level? Yeah, no, you're right, it could be that too. That is another direction I could go next, but I think finding out whether the current QoS stuff works at all is probably a better next step.
C: You've already consumed gateway resources. Yes, but you need some level of QoS at the OSD to keep those kinds of service levels. Right? Certainly.
E: — if you wanted to interact with other, non-RGW workloads. Could you repeat the preamble to that question?
C: Oh, I was just asking: are we primarily interested in QoS in the context of RBD, or are we also interested in it for RGW?
E: Well —
E: — gonna integrate the trace points into CBT and do the two-client experiment. Okay, cool. That is, now that I have values for what the throttles should probably be set to, at least on these devices, I'll set up two RBD clients: one sending one-queue-depth writes, the other sending a saturation workload. And the idea is to get the QoS stuff to preserve the one-queue-depth write latency as though the other client were not present, right?
E: And I expect that the degree to which it is some level higher is a trade-off against where you put the throttle values. If you put them so that the latency remains very low, I think you'd give up a little bit of throughput, but that would be an appropriate trade-off for some people. So I think you can actually get very close to the idle latency value.
E: And that makes sense if you're benchmarking, right? But it's not so useful if you're actually trying to do QoS. If you're not doing any QoS, then it sort of doesn't matter, right — the next layer up is sending what it's sending, and it doesn't care about BlueStore's latency; it only cares about its own application-level latency.
E: Whether you do the throttling at the interface level, as we eventually will, or at the BlueStore level, or just at the device level because it's slow — it doesn't matter; it just changes where the stuff is queued. But if you're doing QoS and you're trying to give differential latency to different clients, it actually does matter, and you can't send that stuff to the device until the device is ready to use it. So I think —
G: So originally, when we had these recovery sleep options, they were blocking — they were not asynchronous — so we always kept them at zero. So I don't know; I mean, since they were all zero, people probably never ran into it, since they were blocking and also defaulting to zero. So I don't know what the impact was, but with the stuff that I did —
A: It was the 4k min alloc size, I said, that Igor was investigating.
G: Yes, that's right.
A: I was gonna — I've got — this is the same document I shared last week. This is kind of an offshoot of the 64k BlueFS alloc size work, but also looking at 16k versus 4k min alloc sizes, but in the context of NVMe drives instead of spinners like Igor was looking at. The last tab has the most stuff in it, but even in the first one there you can see that there was some advantage for a 4k min alloc in that testing; but the last tab —
A: But the result of that was basically that it looked like — definitely in some cases — having a 4k min alloc size did show benefit. Not necessarily in all cases, but for certain kinds of write workloads, especially like the 128K sequential write workload.
A: I think the gist of it, at least from what I looked at, is that it's not necessarily worse, and it sounds like for Igor's tests it was better. So the big advantage, right, is that regardless of performance, you don't end up with a lot of space amplification for 4k objects, like with RGW — and that's really the big win, I think, in all of that. So if we can do that, it would be really nice.
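The space-amplification win from a smaller min alloc size is just round-up arithmetic; for small RGW-style objects it can be illustrated like this (numbers purely for illustration):

```python
def alloc_overhead(write_bytes, min_alloc_size):
    """Disk bytes consumed when an allocation is rounded up to
    min_alloc_size, and the resulting space-amplification factor."""
    units = -(-write_bytes // min_alloc_size)  # ceiling division
    allocated = units * min_alloc_size
    return allocated, allocated / write_bytes

# A 4k write with a 64k min alloc size burns 64k -> 16x amplification;
# with a 4k min alloc size it is 1x.
```

So even if the 4k setting were performance-neutral, small-object workloads would still reclaim most of this rounding waste.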
C: Yeah, I mean, the settings are stored persistently, so that wouldn't be an issue. I believe — Igor can correct me if I'm wrong — but I believe BlueStore just reads that setting off the disk, and a new configuration wouldn't actually take effect unless you recreated the store entirely. Yeah.
C: What you're saying — I think any kind of upgrade path could probably do the same thing that a BlueStore conversion upgrade in general would do, which would be disruptive re-provisioning, as it is for any of these kinds of major format changes where we can't just change a portion of things. But —
C: I guess I'm curious how this would interact with the recent changes for the 64k BlueFS alloc size; I'd appreciate data on that as well.
A: I don't have it for just the shared device, but if you look at that last RBD iteration test tab, one of those test results is for a 32k BlueFS alloc size, which, you know, is probably pretty similar to 64k — maybe a little bit more overhead, possibly. Sure, okay, cool. And it looks really similar to the others, at least — maybe a little slower than using a much larger one for, like, random writes.
C: I guess I'm wondering, more than the performance, whether we could get into a situation like we'd gotten into before with the stupid allocator, where BlueStore fragments enough that 64k chunks would no longer be available for BlueFS to use — I see — or whether the new allocator is good enough at avoiding fragmentation that that's unlikely.
C: I was gonna agree with Mark here. If there is, like, a very large performance hit or space-amplification hit, then we might want to consider, with the current scheme, making the BlueFS alloc size the same or similar for the shared device.