From YouTube: 2019-11-07 :: Ceph Performance Meeting
A: We changed it a couple of months ago, and Igor has actually been documenting some issues that users have been having with RocksDB corruption. We don't know that this is it, but it's one of the changes that we made, so right now I have a PR in flight marked do-not-merge, and it might be one of the things that we revert to see if that makes the problem go away.
A: It's measured in how we use it, and once we do use it, we should probably start thinking about issuing a memtable flush. We can issue a number of them, I think, before we actually need to flush, but if we have hundreds of these things happening quickly, it can very quickly start making everything very, very slow. So that's basically the high-level view of it. All right, updated PRs.
A: Let's see, this one, the RBD write log: I think there's still a request for Jason to review that. He had kind of given a general thumbs-up about splitting that away from a much bigger PR that they had previously submitted, but no specific review yet. I think Sage did review this enhanced OSD affinity PR, mostly with just some comments, and then some questions about how they're doing it and whether they need to do it that way.
A: The per-shard entry count guard during bucket listing: I don't think that got any real major updates, just rebasing to keep it working. I'm going to try to do some additional testing with that as time permits. And the objecter work is still, still desperately being worked on; there are some new updates, but testing, I think, was oddly not passing Jenkins or something, so he's working on that, and then "objection sustained" as well, so I'm glad. Okay, CID, any news on that?
A: Good deal, hopefully soon. Agreed. All right, so that's all I've got for updated stuff this week. I've got a couple of things myself in the no-movement list that I need to get back to and look at again, but I don't know when I'm going to get to them. I think that's the case for a lot of folks, but we do have a lot of outstanding performance PRs.
D: Right now the current settings that are not default are the scrub sleep at 0.1; nodeep-scrub, which I turned on; and then, also to try and affect the scheduling, I set the deep scrub interval to a month instead of a week. Those are all changes since we had the issue, not before.
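(For illustration, a minimal sketch of applying those non-default settings with the standard Ceph CLI; the option names are the usual OSD scrub knobs, and the values are the ones just mentioned.)

```python
import subprocess

for cmd in [
    # Sleep 0.1 s between scrub chunks to throttle scrub I/O.
    ["ceph", "config", "set", "osd", "osd_scrub_sleep", "0.1"],
    # Cluster-wide flag that should suppress deep scrubs.
    ["ceph", "osd", "set", "nodeep-scrub"],
    # Deep-scrub interval: roughly a month instead of the one-week default (seconds).
    ["ceph", "config", "set", "osd", "osd_deep_scrub_interval", str(28 * 24 * 3600)],
]:
    subprocess.run(cmd, check=True)
```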
E: Okay, okay, then that's something we definitely need to investigate, whether we've hit similar issues in our latest releases. The other issues that we did receive were in earlier releases, and they fixed a bunch of them. I think David, who was also on the call, can speak more about it. David, are you here? Yeah.
G: That's what we were most surprised by. We were hoping that once we set nodeep-scrub they would trickle down to zero, but we definitely saw, for sure, say we were at 62, then we definitely saw 63, and we...
G: I guess the question would be, in the meantime, with the bug as it is: are there any kind of thoughts? We obviously need to try to avoid this, so our thoughts were, like, maybe we just write a script that would kind of manually start scheduling these deep scrubs ourselves. We've...
G: Like, is the model basically to disable deep scrub and then schedule them yourself? Or we saw other people say, you know, make the interval very long, so long that it hopefully never actually gets scheduled, and you schedule them yourself. So we're just trying to get any type of pointers, because basically we wrote a script.
G: I was going to do something along the lines of: schedule the oldest five, but make sure there are no more than, say, 30 deep scrubs happening at a given time over the whole cluster, or something like that. I mean, that's what we were thinking about doing, but we're open to any suggestions of what we could do for the time being to avoid cluster chaos.
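(For illustration, a rough sketch of that kind of script. It assumes the pg dump JSON carries pgid, state, and last_deep_scrub_stamp per PG, which it does on recent releases, though the exact nesting varies; the batch size and cap are just the numbers mentioned above.)

```python
import json
import subprocess

BATCH = 5            # kick off the N oldest PGs per run
MAX_CONCURRENT = 30  # cluster-wide cap on in-flight deep scrubs

out = subprocess.run(["ceph", "pg", "dump", "--format", "json"],
                     capture_output=True, text=True, check=True)
dump = json.loads(out.stdout)
pg_stats = dump.get("pg_map", dump)["pg_stats"]  # nesting varies by release

# How many deep scrubs are already running?
active = sum(1 for pg in pg_stats if "deep" in pg["state"])

# Oldest deep-scrub timestamps first.
candidates = sorted((pg for pg in pg_stats if "deep" not in pg["state"]),
                    key=lambda pg: pg["last_deep_scrub_stamp"])

budget = min(BATCH, max(0, MAX_CONCURRENT - active))
for pg in candidates[:budget]:
    subprocess.run(["ceph", "pg", "deep-scrub", pg["pgid"]], check=True)
```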
D: I mean, we saw those settings, but we don't really have a nice off-hours window, just because when people aren't here during the day, we have other scripts running that are actually, a lot of the time, more demanding on the file system. So the off-hours scheduling like that, I don't think, really fits our use case. And I didn't see any high OSD loads, if that's what was causing the long deep scrubs or...
F: Actually, it's sort of a little subtle here. Every time a regular scrub happens, so that's normally daily, it checks to see whether it's now past the deep scrub interval. So if you set the regular scrub to a month and the deep scrub to a day, if you inverted them like that, it would still not deep scrub every day; it's only scheduling the regular scrubs.
F: It wouldn't make a difference as far as the flag goes. So with the nodeep-scrub flag set, every day it checks: okay, should I run the scrub? And then, even though it's been a week or two weeks, whatever your setting is, it's supposed to look at the deep scrub interval and the nodeep-scrub flag and then just say: okay, well, I'm not going to do it anyway. So that check only happens if you have regular scrubs.
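(For illustration, a simplified paraphrase of that decision in code; the names here are made up, and the real logic lives in the OSD scrub scheduler.)

```python
import time

def should_deep_scrub(last_deep_scrub, deep_scrub_interval, nodeep_scrub):
    """Paraphrase of the check described above: the deep-scrub decision is
    only evaluated when a regular scrub is being scheduled, and the
    nodeep-scrub flag vetoes it even if the interval has elapsed."""
    overdue = (time.time() - last_deep_scrub) > deep_scrub_interval
    return overdue and not nodeep_scrub
```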
I: The ticket, or whatever the card in the backlog was, was to fix the scheduler so that it gets rid of that minimum, so it's a minimum of one; that's smarter scheduling, so that any OSD is only participating in a single scrub at a time. But I think in reality, what you want, in order to reduce visible client impact...
G: If there were a way for us to set the max number of total scrubs, like percentage-wise, like you just said, like 20%, then I could basically dial that in to what's acceptable for my cluster, and that way I'd have the granularity to say: I never want more than this much of my cluster scrubbing, or something like that. If that was implemented, it would do exactly what we were kind of thinking of doing with the script anyway, yeah.
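(There is no single percentage-of-cluster knob like the one being asked for here, but a per-OSD concurrency cap does exist; a minimal illustration, with the value shown being the usual default.)

```python
import subprocess

# osd_max_scrubs caps concurrent scrub operations per OSD (default 1).
# It limits per-OSD overlap only, not a cluster-wide percentage.
subprocess.run(["ceph", "config", "set", "osd", "osd_max_scrubs", "1"],
               check=True)
```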
G: We're trying the sleep, and we're trying another one, whether that helps or not, and we will definitely put a ticket in about the deep scrubs still being scheduled. We'll send our emails, like you said, and see if we can actually get into the ticket system. All right, great; again, we always appreciate the help.
A: All right, should we move on to DeleteRange? Yeah, so I guess we went over the gist of this before, but the deal here is that when we started defaulting to using this, it doesn't impact us most of the time. But there's one test case where it shows up very easily, which is to create a large number of RGW buckets on one OSD and then delete all of them. And you can...
I: It's just weird that it's implemented that way, yeah. It seems we have no...
A: We probably at some point need something that will issue a flush, and not just, you know, wait around letting these pile up. Even the RocksDB folks have talked about implementing some kind of automatic flushing behaviour if there are enough range-delete tombstones; they've got the same idea.
A: How many is a really good question, right, and how big they should be. If you've got a million keys to delete and you do that over 200 range deletes, then is that the appropriate point to flush? Should you do it sooner and then have multiple flushes? I don't know that there's a clear answer to that, and it probably depends on the underlying hardware and how much of the DB you want to churn versus how much you want to be in compaction.
A: Yeah, basically everything that's in the current buffer for the write-ahead log that you're writing into gets flushed into level zero, and then you're issuing compaction. That's why it makes it fast, right: because now anything new coming in is coming into a new buffer. You're basically taking what you've got and moving it into, you know, the compaction workload, but now you've got a brand-new buffer that you get to write into quickly.
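(For reference, that flush-plus-compaction cycle can be triggered on an OSD's RocksDB by hand; a minimal sketch, with the OSD id purely as an example.)

```python
import subprocess

# Ask osd.0 to compact its RocksDB (BlueStore's metadata store). This
# flushes the active memtable to L0 and compacts, which is what clears
# piled-up range-delete tombstones, at the cost of a burst of I/O.
subprocess.run(["ceph", "tell", "osd.0", "compact"], check=True)
```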
I: Do you know if we actually have an API to check that? Can't we ask, so that every time we do a range delete, we could say: is this the same buffer as last time? If so, increment the counter; if not, reset it to zero or to one. And then, if it hits, you know, 20, or whatever we decide the magic number is, then force a flush.
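(A toy sketch of that heuristic, purely illustrative: the real change would live in BlueStore's KeyValueDB layer in C++, and both the buffer-identity query and the flush hook here are hypothetical stand-ins.)

```python
class RangeDeleteFlusher:
    """Count range deletes per write buffer; force a flush past a threshold."""

    def __init__(self, db, threshold=20):
        self.db = db                # stand-in exposing active_buffer_id()/flush()
        self.threshold = threshold  # the "magic number" discussed above
        self.last_buffer = None
        self.count = 0

    def on_range_delete(self):
        buf = self.db.active_buffer_id()  # hypothetical identity query
        if buf == self.last_buffer:
            self.count += 1
        else:
            self.last_buffer, self.count = buf, 1
        if self.count >= self.threshold:
            self.db.flush()               # memtable to L0; start a fresh buffer
            self.count = 0
```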
A: That's maybe why we want to set the threshold really high, right? Like, if you're deleting a million keys, maybe it doesn't matter that much that you're issuing flushes a lot; maybe you really want to be in front of it, I don't know. In the case where you've got an object that has 2,000 omap entries, we probably don't want to range-delete on it anyway, right? You probably...
I: Well, maybe. My suggestion is to look at the RocksDB API and see if there's a quick way to get the current number for whatever the thing is that you're writing into, because if so, then it's really straightforward to just count per write buffer and put the threshold that way.
A: Well, we've seen stuff. I mean, over the last three years, people keep talking on the mailing list about random RocksDB corruption issues, but it comes up like once every four or five months: someone has some kind of CRC checksum-type error in RocksDB, corruption in the API, and we...
B: Yeah, well, actually, I don't have any evidence of this either. Let's go over the part we changed on our side: we have some additional support in BlueFS for prefetching. And, well, actually, I can see two branches in the RocksDB code which call checksum verification: the first one is regular reading, and the second one is retrieving data from the prefetch buffer, which might actually not be accounting for prefetching.
B: But, well, the offsets that are reported in the logs are completely unaligned with the information in the SST file. Well, it doesn't complain about reads at page boundaries; the offsets are not aligned with pages, and actually the checksum values it's presenting do not correlate with the data that's located in the SST files.
B: But yeah, the checksums do not correlate with the actual data in the SST file. And, well, at least I have just a couple of such files, and there are zeros at those offsets in the SST files, but the retrieved value is definitely not zero. And what is very interesting about the retrieved value: it's always the same for all the different clusters that are reporting the issue. So I suppose, well, it doesn't look like a read error; I mean, it doesn't look like data corruption on persistent storage. It's... I don't...
I: Yeah, so the thing I think we need to be careful of is that when we have more PGs, it increases the probability that a double-OSD or triple-OSD failure will lose data, for example when you have OSDs randomly corrupting themselves because of a RocksDB bug. So in general, I think we want to try to keep that PG count as low as we can, provided it's not, you know, trading off against other issues, I guess.
A: So the very specific reason I brought this up this morning is because there was a QE case where they had a pool spread across 8 PGs, and they were trying to expand the amount of space in the cluster, and backfill was complaining that it didn't have enough space to finish backfill for a particular PG. I assumed it was because they were trying to backfill onto an existing drive that didn't have much space...
E: The fix should have that handled, though, but we should double-check. So, Mark, can you send me a link to the BZ or whatever you're looking at, and I can try that? Yeah.
A: If we can first change the PG log lengths and have more PGs up front, that will avoid rebalancing and, you know, all the backfill work and everything else. If we could start out with more PGs but kind of tweak the PG log lengths on a per-pool basis... essentially, that's what the autoscaler is really kind of doing, right, is you're...
A: No, the other way around. So right now we have, like, a total of 100 PGs on an OSD, right? So you can very easily end up in a situation where you've got a lot of pools and the autotuner is saying: all right, this pool gets 16, this pool gets 8, and this pool gets 4. What I'm saying is, well, maybe we can start out with a larger pool of PGs to work with, if we are willing to change the PG log lengths of the PGs in each pool.
A: You know, say 3,000 is what everything gets: instead of having the number of PGs shrink in that pool, maybe we say that the PG log length for the PGs in that pool shrinks instead. You still have the same number of entries overall; it's not like you're shrinking the total number of log entries you've got, you're just distributing them across more PGs instead.
I: I think the PG logs are sort of a red herring. Separately from whatever the autoscaler is doing, we should be choosing a PG log length per PG that's a little bit smarter than just a fixed value per PG. I think the real question is how many PGs we should have, and that's... you don't want a lot of empty PGs.
I: Now there's a hard-coded value, it's like 4, I think, by default, and we could make that higher or something. Or we could ask the user to say: this pool I expect to have high performance, and so therefore I'm going to set a minimum of something. Or we could try to make the cluster intelligently choose a minimum based on the size of the cluster; so if there are only a few OSDs, then there's no reason to have more than four PGs...
I: Anyway, I guess what I'm getting at is that there are cases where a pool is small and it's not performance-sensitive, and you really want, like, one PG or four PGs or something like that, right? And there are cases where a pool has data that you expect to get reasonable parallelism to. And we can do one of two things. We can either say that if it's a pool that you want performance on, even though it's small, then you set a minimum on it, set the min PGs to whatever you want, 128, something, whatever; or the other way:
I: If it's a pool that you don't expect performance on, then you set the min PGs to be a small number, and you make the default min PGs something larger. I don't really care which one we do. Right now we're basically defaulting to four, and so if you need more than that, you're expected to set it higher. But we could flip that around: we could make the min PGs default to 64 or 32, or something; 32 is probably sufficient.
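(The per-pool floor being discussed already exists as a pool property that the autoscaler honors; a minimal illustration, with the pool name and value as examples.)

```python
import subprocess

# Keep the autoscaler from shrinking this pool below 32 PGs.
subprocess.run(["ceph", "osd", "pool", "set", "mypool", "pg_num_min", "32"],
               check=True)
```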
I: That's the name of the game if you want to do it automatically, unless the user tells you what the pool is going to be for. So we should encourage users to set the target ratio or the target size for the pool, and then we can just start with a number that's reasonable. But if they give you no information, then you have no idea whether it's going to be an empty pool or a big pool.
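(Those hints are the standard autoscaler pool properties; a minimal illustration, with pool names and values as examples.)

```python
import subprocess

# Hint the autoscaler that this pool will hold ~20% of cluster capacity...
subprocess.run(["ceph", "osd", "pool", "set", "rbd_pool",
                "target_size_ratio", "0.2"], check=True)

# ...or give an absolute expected size in bytes instead.
subprocess.run(["ceph", "osd", "pool", "set", "small_pool",
                "target_size_bytes", str(100 * 1024**3)], check=True)
```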
I: I mean, I'm just thinking that every time there's a customer issue where they just have way too many PGs in the cluster, it makes things dicey. It's not the memory, well, it is the memory usage, I guess, at some level, but it's not necessarily the log; it's dealing with the past intervals and the peering, and when they get a cluster that's dug itself into a hole and can't climb out again, that's...
H: The question I had wanted to bring up is, because a lot of those things, like with heartbeats and interconnections, have been, I don't know, more of an issue with larger clusters. With a cluster of, like, three or six OSDs, does it matter that much if we have, like, 300 PGs or something? I don't know, I don't...
A: Yeah, I mean, I don't know when the last time I ran one was, but I've run tests where I've had, like, a thousand PGs on the OSD, on multi-OSD clusters. And, you know, this is small scale, it's like four OSDs or whatever it was, but it works just fine, there's no problem. But I'm not really stressing them in weird ways either, I guess.
H: What I wanted to bring up was that Ben had done some testing with the autoscaler and was seeing some effects on performance, due to, it looked like, primarily recovery in his case. But, Mark, you had mentioned that you'd in the past run tests with a single OSD and seen some impact from just plain splitting and merging, even with BlueStore. Yes...
I: It has to quiesce I/O to the two PGs that are going to be merged for, like, a full OSDMap epoch cycle, so there's enough known stable state to sandwich them together, and then they get unblocked. That's why the merging goes one PG at a time, so only one PG gets blocked for, like, three seconds or whatever, and then it releases. Okay.
I: Yeah, I mean, I'd have to go back and look at it to remember exactly what would be needed in order to get rid of that, but at the time it was so much simpler just to do the pause.
I: Oh, I mean, this should be rare, right? It's unusual that you're going to be merging, because it should only happen if you have a pool that had a lot of data and then you deleted it all, so the pool is shrinking, or the cluster is shrinking, or something, unless...
I: They don't make room for each other; pools size themselves based on their size relative to the total cluster. The autoscaler looks at each pool independently: how much data is this, what fraction of the total data in the cluster is it, how many PGs does it have relative to the total target number of PGs in the cluster, should it get bigger or smaller? And if it's off by more than 3x, then it will bump it up or down, yeah.
I: Or if you manually set it really high and you overshoot, then it will scale it down for you; or if you blow up a pool so that it gets really big and then you delete a bunch of data, it has to get really small. But again, it only makes a change if it's off by more than 3x, so you have to, like, quarter the amount of data or whatever before it's really going to... yeah.
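(A toy sketch of that 3x rule, simplified from the description above; the real autoscaler logic lives in the mgr module and handles more cases.)

```python
def autoscaler_decision(current_pg_num, ideal_pg_num, threshold=3.0):
    """Only change pg_num when the ideal is off by more than `threshold`
    in either direction; round down to a power of two for simplicity."""
    ratio = ideal_pg_num / current_pg_num
    if ratio > threshold or ratio < 1.0 / threshold:
        return 1 << max(0, ideal_pg_num.bit_length() - 1)
    return current_pg_num  # within 3x either way: leave it alone

# A pool at 128 PGs whose data share suggests 16 gets shrunk to 16,
# but one whose share suggests 64 (only 2x off) stays at 128.
```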
H: I think after he set the, uh, the sleep setting, then, yeah, it was working there, I think. The one thing we did notice, though, was that he's running on AWS, and the disks there, even though they'd show up as non-rotational, don't behave the same way as the regular physical SSDs that we'd see in a data center; they're burstier and a lot slower, yeah.
A: Well, it's kind of like... I don't exactly know how AWS works, but my understanding is that you kind of use up your credits and then you regain them slowly over time. So if you invoke a background workload all of a sudden, you burn through all the credits that you have right now, and then everything else is slow, and then you slowly get them back. But, you know, if you have something you need to do right now, it doesn't help you, right?
I: I have to run, I've got another meeting, but yeah, I think I have two main things. One is that if we want to change the behavior so that, instead of setting a high minimum floor for high-performance pools, we default to a higher number and you have to set it lower for a non-performance pool, if we want to make that change, we can do that; I don't really care either way, and people should weigh in. And then the other thing is...