From YouTube: 2019-06-20 :: Ceph Performance Meeting
B: Let's see, there's a... [inaudible] ...that merged. Also, there's the PG mapping cache, which merged last week, I think, which is nice. It basically keeps in memory all the CRUSH calculations for a given OSDMap rather than repeating them, and it shows a noticeable improvement — it's like 10%, 1%, something like that — but it's one thing that's especially important for flash clusters.
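[A minimal sketch of the caching idea described above, with hypothetical names — not the actual Ceph implementation: memoize the CRUSH result per PG for a given OSDMap epoch and throw the whole cache away when the epoch changes.]

    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    // Hypothetical PG -> OSDs mapping cache keyed by OSDMap epoch.
    struct MappingCache {
      uint64_t epoch = 0;  // OSDMap epoch the cached entries are valid for
      std::unordered_map<uint64_t, std::vector<int>> pg_to_osds;

      // Placeholder standing in for the expensive CRUSH calculation.
      static std::vector<int> compute_crush_mapping(uint64_t pgid) {
        return {int(pgid % 3), int((pgid + 1) % 3), int((pgid + 2) % 3)};
      }

      const std::vector<int>& lookup(uint64_t map_epoch, uint64_t pgid) {
        if (map_epoch != epoch) {  // new OSDMap: every cached mapping is stale
          pg_to_osds.clear();
          epoch = map_epoch;
        }
        auto it = pg_to_osds.find(pgid);
        if (it == pg_to_osds.end())  // miss: run CRUSH once, remember the result
          it = pg_to_osds.emplace(pgid, compute_crush_mapping(pgid)).first;
        return it->second;
      }
    };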
B: There's also one that bumps up the priority for recovery ops that are triggered by a client op. I'm surprised we weren't doing that already; I think that was an oversight. And then Igor finally addressed an issue with BlueStore where, in some weird extreme cases, it could run out of IDs for shared blobs. It was an awkward one to fix, but it finally got sorted out.
B: I haven't looked at this Objecter stuff from Adam and Casey; it's all the stuff refactoring librados to tie in more closely, or more appropriately, with RGW. And there's the op create one — I think it's about to merge — which is an optimization for the case where the OSD knows that the object doesn't already exist and can therefore skip a lookup in BlueStore. So it's a slight improvement; the OSD just has to be more careful about how it crafts transactions.
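[A rough illustration of the kind of optimization being described, using made-up types rather than the real OSD/BlueStore interfaces: when the op's semantics guarantee the object is new, the store can skip the existence lookup it would otherwise pay for.]

    #include <memory>
    #include <string>

    struct Onode { std::string oid; bool exists = false; };

    struct Store {
      // Stand-in for the (potentially slow) KV lookup an open would normally do.
      std::shared_ptr<Onode> lookup_or_load(const std::string& oid) {
        return std::make_shared<Onode>(Onode{oid, /*exists=*/true});
      }

      std::shared_ptr<Onode> get_onode(const std::string& oid, bool known_new) {
        if (known_new)  // e.g. an exclusive-create op: the object cannot exist yet
          return std::make_shared<Onode>(Onode{oid, /*exists=*/false});
        return lookup_or_load(oid);  // otherwise pay for the lookup
      }
    };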
B: Yeah, except for the new stuff, the Crimson things — that's where things stand there. Still working on getting BlueStore to work with it okay, I think, in fairness.
D: I have something small; it's about the LTTng stuff again, the work that I revived. I hadn't touched it in a long time, but now I've had a cluster to benchmark it on. I ran some dashboards on BlueStore with debug logging at 20 and with varied queue depths, and if I can just share my screen quickly... yeah.
D: What I've done is: I have a small cluster with five nodes, five OSDs per node, and I cranked up the debugging to 20 for BlueStore only — and ta-da, that is the baseline. Then I changed the code of BlueStore so that basically all log statements were converted to LTTng tracepoints, and ran the same thing at multiple queue depths — I have QD ranging from one to eight — and here's what I got. I'm looking at the IOPS and the latency for each queue depth.
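[For context, converting a log statement into an LTTng-UST tracepoint looks roughly like the following; the provider and event names here are invented, not the ones in the actual patch. The call site then becomes tracepoint(bluestore_bench, queue_write, off, len), which is cheap when no tracing session is recording.]

    // bluestore_tp.h -- hypothetical LTTng-UST tracepoint provider sketch.
    #undef TRACEPOINT_PROVIDER
    #define TRACEPOINT_PROVIDER bluestore_bench
    #undef TRACEPOINT_INCLUDE
    #define TRACEPOINT_INCLUDE "./bluestore_tp.h"

    #if !defined(BLUESTORE_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
    #define BLUESTORE_TP_H

    #include <lttng/tracepoint.h>

    // Equivalent of a dout line recording a write's offset and length,
    // emitted as a binary tracepoint instead of a formatted string.
    TRACEPOINT_EVENT(
      bluestore_bench, queue_write,
      TP_ARGS(uint64_t, offset, uint64_t, length),
      TP_FIELDS(
        ctf_integer(uint64_t, offset, offset)
        ctf_integer(uint64_t, length, length)
      )
    )

    #endif // BLUESTORE_TP_H

    #include <lttng/tracepoint-event.h>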
D: Well, basically, you know, as the queue depth increases you can get a lot more out of it. Comparing the two runs here, it's really a major, major performance difference — it's about twenty percent, and that's for both 4-megabyte and 4k chunks, which is really significant. And if I'm looking at the tail latency, you can basically see how much of a difference it makes to basically be writing to a file.
D: What I think this tells us is — first of all, the fact that the difference really grows with the queue depth: I imagine that at the larger queue depths more threads are involved on the OSD nodes, but there's probably more contention somewhere, right? The fact that multiple threads are writing to the same file — I mean, there has to be a lock on that write, or a lot of contention.
D: Because, I'm sorry, this is just the tracing infrastructure. There's the whole, you know, business of converting strings and copying — unless that's extremely efficient, that's something else — but I'm pretty sure that part is the same in both runs, because even with LTTng I'm still using strings: these are all objects that have a string operator, and they get converted to strings and copied, so that overhead is there in both the LTTng and the baseline runs. What I'm talking about is the actual tracing mechanism and the buffering in memory.
D: This is what the traces show us, and so I would definitely think there's something to do there. Honestly — I mean, Mark, you and I, we've talked about this — I don't know if really using LTTng is the right approach, because there's a lot of overhead for the developers; it's not as easy as just a dout statement.
A: I guess my first thought is just that, at the least, it showcases that there are big gains we can make, right? Yeah.
A: Okay, yeah, write it down if you think of it; I'll start on some of this other stuff then. So I wanted to just talk a little bit about the BlueStore cache refactor PR. It's funny — when I started this I didn't even really know... performance was kind of a secondary thing. I was really only trying to make it so that the buffer cache was behaving a little bit nicer and not exceeding its allocation, but I kind of wondered if there might be some benefits to it.
A: We have fewer locking events overall anyway, because we're just taking the lock on the way in, and anywhere else — like where we're doing trimming — we've already grabbed the lock. So, you know, less blocking, less contention on locks, good overall. When I did that, what I saw was that performance improved, but lock contention actually went up. That was weird.
A: The reason I think lock contention went up is that, in the buffer cache, adding buffers and looking up onodes in the tp_osd_tp threads were contending with each other. You know, when you're adding buffers, that's happening in, like... I think the kv_sync thread, if I remember right — no, sorry, the onode thread — and I wanted to see whether or not basically splitting those apart would improve things further. So that's what the second part of this PR does.
A: It basically rips the cache apart so that you have independent buffer and onode caches that don't share a lock anymore; they each have their own independent lock. And that improved things dramatically — much, much less lock contention. All of this overall is improving tail latency pretty significantly: it's like half the tail latency, a third of the 99.9% I/O latency for write I/Os. So yeah, no, it was really good.
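[The split being described, reduced to a toy sketch with stand-in types — not the real BlueStore code: one mutex per cache instead of one shared mutex, so onode lookups stop contending with buffer insertions.]

    #include <cstdint>
    #include <mutex>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct SplitCaches {
      std::mutex onode_lock;
      std::unordered_map<std::string, int> onodes;  // oid -> onode (stand-in)

      std::mutex buffer_lock;
      std::unordered_map<uint64_t, std::vector<char>> buffers;  // offset -> data

      int* lookup_onode(const std::string& oid) {
        std::lock_guard<std::mutex> l(onode_lock);  // contends only with onode ops
        auto it = onodes.find(oid);
        return it == onodes.end() ? nullptr : &it->second;
      }

      void add_buffer(uint64_t off, std::vector<char> data) {
        std::lock_guard<std::mutex> l(buffer_lock);  // contends only with buffer ops
        buffers[off] = std::move(data);
      }
    };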
A: All right, so, next thing: recovery settings. I have been looking at recovery for the last couple of days — primarily what happens in CPU-constrained scenarios — but the takeaway I found from this is that our current recovery settings are very, very much optimized toward not impacting client I/O on hard drives, and on NVMe it's really, really awful. There's kind of an upfront cost: as soon as you go into recovery, it's like maybe a ten or fifteen percent hit on what the drive's got, but recovery is so slow compared to the rate at which you can ingest data that you can almost make it stall. It doesn't quite stall, but for all intents and purposes it's recovering so much slower than data is being ingested that it's not good.
A: But one of the problems, Josh, is that on NVMe you can write so many objects that people end up with a lot of objects, right? You could easily write 16k RADOS objects at fifty thousand write ops to a small cluster, if you've got the CPU available. You can end up with a cluster that's got, you know, maybe a hundred million or 200 million objects. With default recovery settings on this cluster that I've got — it's like twelve NVMe drives across three nodes — recovery is very slow.
A: You can improve it dramatically, though. Right now, the settings I've got, after just playing around with it: I think I've got osd_max_backfills set to 16 and the max active recovery set to 64, and with those settings the average ratio of client I/O to recovery I/O split like 70% client, 30% recovery, roughly — that's what I was seeing when I was recovering objects with quite a write workload happening.
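[For reference, those settings correspond to the following config options; the values are the ones from this experiment, not recommended defaults.]

    # Values used in the experiment above (not general recommendations).
    [osd]
    osd_max_backfills = 16
    osd_recovery_max_active = 64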
B: Maybe an aside, but I would expect that we would want to keep max backfills at a small number and proportionately increase the max ops instead: rather than having ten parallel backfills doing two ops each, one backfill doing twenty ops — the same number of ops in flight, but you're working on one PG, so it completes faster.
A: The only way to push that further was increasing osd_max_backfills. I guess that sort of makes sense — I didn't really fully understand how this worked when I started — but, you know, not all OSDs can participate with each other right now. So not all OSDs will participate in the overall recovery with our current settings.
A: What I'm thinking is: we look at the overall time spent on ops in the last interval, for client I/O and recovery I/O, and then we try to tune toward some ratio of those times — but if there's not enough client I/O to hit some kind of threshold that we've set, we always give recovery a little more than it wants.
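[A toy sketch of the kind of heuristic being floated here — entirely hypothetical, not something in Ceph: compare client vs. recovery time over the last interval and nudge the recovery limit toward a target ratio, with a floor so recovery never starves when clients are idle.]

    #include <algorithm>
    #include <cstdint>

    // Hypothetical recovery throttle: adjust the max concurrent recovery ops
    // toward a target client:recovery time ratio over the last interval.
    struct RecoveryTuner {
      double target_client_share = 0.7;  // aim for ~70% client / 30% recovery
      int min_active = 3, max_active = 64;

      int tune(double client_time, double recovery_time, int cur_active) {
        double total = client_time + recovery_time;
        if (total <= 0 || client_time < 0.1 * total)  // little client I/O this
          return max_active;                          // interval: let recovery run
        double client_share = client_time / total;
        // Client share above target means recovery is squeezed: allow more ops.
        int next = client_share > target_client_share ? cur_active * 2
                                                      : cur_active / 2;
        return std::clamp(next, min_active, max_active);
      }
    };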
C: There are really some other heuristics we could do there, or something simple like that. The thing is, it's a tough road to go down to try to really measure exactly how much time or resources we're spending on each particular op.
C: Something simple that we could potentially backport as a first step, and then maybe we could make it a little bit more intelligent as a second step. Like, maybe the simple thing could just be increasing max backfills and max active to some higher threshold based on some very simple measurement of how busy the OSD looks right now.
A: Make the OSD smart — my crusade. So, yeah, the only other thing I have on this: I have been doing a lot of CPU-limited testing, and what happens when you're CPU-limited is that not only is the client I/O slower during recovery, but the recovery itself takes longer. So it's like a multiplicative effect, right: you're spending two or three times longer in recovery, but your throughput is also like half of what it would be if you had more CPU.
A: So this is kind of a commentary, I guess, on why it really is irritating to be CPU-limited on the OSDs; that's part of it. But having said that, when you're CPU-limited — regardless of whether you're using, like, half a core, one core, or two cores or whatever on the OSD — I saw roughly the same ratio of client to backfill traffic. It was really close.
B: Well, anyway, it's all been enabled in master for, like, two months, three months now.
B: However long it's been — and we fixed our first bug, which explains the sporadic failure we'd been seeing, but otherwise everything looks good. I think the only thing really blocking us from turning this on by default — making it not experimental in Nautilus and changing the default so that your connections to the monitor are secure — is that we don't have any numbers about how much slower it is when you use secure mode versus non-secure mode. I really want to talk about this; let's wait a second here.
B: Sorry... all right, okay. So the secure mode was merged for Nautilus but marked experimental in Nautilus. It's been enabled in master, and it's all been fine — at least we thought — except there were these sporadic "mon down" errors we were seeing in QA that turned out to be due to a secure mode bug. That's fixed and they now go away, so we're feeling generally very good about it.
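[For context, msgr2 mode selection is controlled by the ms_*_mode options; a hedged example of what defaulting monitor connections to secure might look like — check the documentation for your release before copying:]

    # Prefer secure mode (vs. crc) for connections involving the monitors.
    [global]
    ms_mon_cluster_mode = secure
    ms_mon_service_mode = secure
    ms_mon_client_mode = secure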
B: Sure, yeah. Even when you're talking v1, I think — I'm not sure if it affects v2, or if it's only v2 — either way, there was a big refactor. So I think we also just need to do whatever setup it is that basically pushes as much data over the wire as possible, and then do some perf work to figure out where the CPU time is being spent — look at some profiles, whatever — and see what optimizations we can make.
B: But it's not going to reflect what a real workload is going to look like, and what people really want to know is: on my cluster, if I just turn this on, how much slower is it? That's a challenging question to answer, but we can pick something that's sort of, yeah, in a way representative, at least. So I guess I wouldn't get caught up on trying to have the perfect micro-benchmark; I would start with just doing something simple and then, yeah, get closer to a real workload from there.
A: The first step will be getting the CBT messenger stuff working — figuring out why that was breaking, yeah, going back and doing that — and from there, once that's done, it'll be easy: once we can just switch between them, it'll be easy to do comparison testing and just see how it does.
B: Yep, as long as CBT can still run Mimic, because we'll probably want to do a Mimic vs. Nautilus v1 comparison too. Sure, you could even start with that, just because that might be step zero: see whether there's a regression in v1 performance between Mimic and Nautilus, and whether you can repeat what Roland saw.
A: I'd be a little concerned, though — there are enough other changes between the two releases that it's going to be hard to strictly... yeah, that's right, I mean, yeah. We wouldn't be doing a regression just on this. I guess what I would maybe want — it seems to me like an easy, straightforward comparison first would be just Nautilus with v1 vs. v2, yeah.
E: You know, material I can take a look at, to understand what's happening from the amplification perspective, and also any particular things that might be worth tuning for an all-flash environment that I may not have thought about. — What's the workload? — So I've got a... [inaudible] ...it's focused around media and entertainment, but, you know, this shows up also in the IO500, for which we actually have a result out there that I posted this week.
A: So I guess I don't know exactly what IO500 is doing yet.
A: ...hundreds of thousands of files — okay. So I'm not as much of an expert on CephFS; we might want to talk to Patrick about it some. But what I would be concerned about is, you know, if we have hundreds of thousands of files that you're writing to simultaneously, what the effects of that would be. That might be kind of a first place to start.
E: You're somewhat right about what I'm seeing, but the pattern is the same, because I do have some tests with small numbers of files with random 4k I/O, and I still see that massive mismatch in IOPS from what I would expect. And that's one where I don't want to get wrapped around the axle: what kind of write amplification factor should I be looking at and expecting to see, and then what am I missing — that kind of thing.
E: So, if you just break it down to the rough math on what's happening at the 64k level, it's like 2.8 million IOPS, and when I look at it from a 4k perspective, we're only seeing about two hundred eighty-five thousand IOPS for doing roughly the same thing. I know the numbers don't exactly line up here, but it's a much bigger delta than I fully expected to see. Okay.
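[Taking those roughly remembered numbers at face value, the implied op-count amplification is about 2,800,000 / 285,000 ≈ 9.8x, before accounting for any difference in I/O sizes between the two views.]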
A: Sorry — how many OSDs?
E: You would ask that; I don't [have that offhand].
A: Sure, yeah. At some point you'll hit, like, the SATA or SAS limits. It's been a while since I looked at that on, you know, fast devices; I don't remember really how far they go, yeah.
A: They should lower latency at the very least, and in scenarios where you get lots and lots of traffic... I don't know that they'll necessarily improve CPU utilization much, but maybe. I still saw, like, on our server nodes with P3700 NVMe drives in them: to get 40,000 write IOPS out of an OSD, it's using 10 cores. So it's still, you know, burning a lot of CPU if you pull that off. Yep.
A: Cool, sounds good. Oh — as you mentioned, you could try running one of the OSDs through either Adam's wallclock profiler or my wallclock profiler. Adam's is faster; mine is GDB-based and it's harder to get running.
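[If neither tool is at hand, a crude stand-in for a wallclock profile is periodic stack sampling with plain gdb — a generic technique, not either of the profilers mentioned:]

    # One stack sample of every thread in a running ceph-osd; run it in a
    # loop and tally the stacks to approximate where wall-clock time goes.
    gdb -p "$(pidof ceph-osd)" -batch -ex 'thread apply all bt'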
A: I'll put it in the chat window — got it. So, that will make everything, like, super slow, but it might tell you where in the code things are waiting. And then, like, if you see that the kv_sync thread isn't, you know, super busy, that might tell you you're waiting on I/O — you can look at the bstore_kv_sync thread to see if it's, like, 100% busy.
A
Also,
if
you
see
I,
always
submit
taking
a
lot
of
time,
then
that
means
I
Houston
is
blocking
and
also
usually
means
us
like
saturated
the
device
and
the
queue
depth
has
been
kind
of
like
reached
and
now
I
wasn't.
That's
just
blocking
so
it'll
give
you
some
indication
of
like
what
kind
of
is
going
on
that
is.
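[A quick way to check per-thread busyness like that without a profiler, using standard Linux tooling:]

    # Per-thread CPU usage for a running OSD; watch whether bstore_kv_sync
    # or the aio threads sit near 100% CPU.
    top -H -p "$(pidof ceph-osd)"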
E: Pretty cool, yeah.