From YouTube: Ceph Performance Meeting 2023-01-26
Description
Join us weekly for the Ceph Performance meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contrib...
What is Ceph: https://ceph.io/en/discover/
A: Hey folks, I think core standup is just starting to wrap up, so hopefully we'll get those folks in a minute or two here.
A: So, as we're waiting for them: every year I take the Etherpad and try to archive it into the previous year's pad and start over with a new one, and it turns out that we have, not exactly a bug, but also a configuration setting where, when I tried to copy and paste into the previous year's Etherpad, it basically silently fails. It looks like it worked, but behind the scenes the buffer was too big for what's allowed on the server, and it silently fails.
A: And so, unfortunately, if you look at the Etherpad, there are these comments, "empty" and "empty": we lost the 2021 record. Unfortunately, I think that had happened previously and I didn't know about this, so it looked like it was fine, but it never correctly put anything into it. I did happen to notice this for the 2022 one though, and I have a locally saved copy of that Etherpad.

A: I think I just saw that Adam Kraitman adjusted it, so I'll try to get that in there so that we'll have the record from 2022; but I think, unfortunately, 2021 is gone. The good news is that now we should have larger buffers, so hopefully this won't happen again.
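A cheap way to catch that kind of silent failure is to round-trip the archive through Etherpad's HTTP API (getText/setText) and compare the text afterwards. A minimal sketch, assuming a standard Etherpad API endpoint; the server URL, API key, and pad IDs below are placeholders:

```python
import requests

BASE = "https://pad.example.com/api/1"   # hypothetical Etherpad instance
APIKEY = "changeme"                      # from APIKEY.txt on the server

def api(method, **params):
    params["apikey"] = APIKEY
    resp = requests.post(f"{BASE}/{method}", data=params)
    resp.raise_for_status()
    return resp.json()["data"]

def archive_pad(src_pad, dst_pad):
    text = api("getText", padID=src_pad)["text"]
    api("setText", padID=dst_pad, text=text)
    # Read it back: a silent server-side rejection (e.g. a payload larger
    # than the configured socket.io buffer) shows up as a mismatch here.
    echoed = api("getText", padID=dst_pad)["text"]
    if echoed != text:
        raise RuntimeError(f"archive failed: wrote {len(text)} chars, "
                           f"pad now holds {len(echoed)}")
```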
A: All right, so I guess they're still in core standup, but I can maybe get started here while we wait. There were two new pull requests this week that I saw, both from Igor. The first one is "Do not reset prefetched buffer when doing multi-chunk..."; I assume "reads" is the rest of that statement.
A: We talked about this a little bit this morning, and Igor and I met... oh, Igor, you're here, never mind; you can talk about this better than I can.
B: It skips through the file and then proceeds with reads of the same chunk using a much smaller block size, and it appears that we have some issues in our internal prefetching, which resets these prefetched blocks too early; the PR addresses that. So I could see some improvements in statistics, in the performance counters that we shared at first, but not much in the real performance numbers, to be honest.
B: The issue that we still have is that the buffer is in memory and, instead of going to disk, I believe the kernel already has its own copy in cache, and that's why the improvement is not that big.
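For intuition only, here is a toy model of the behavior the PR targets: keeping a prefetched buffer valid for subsequent smaller reads that land inside it, instead of resetting it on every read. Hypothetical names; this is not the BlueFS code:

```python
class PrefetchReader:
    """Toy read-ahead buffer; the real BlueFS logic differs."""

    def __init__(self, file, prefetch_bytes=1 << 20):
        self.file = file
        self.prefetch_bytes = prefetch_bytes
        self.buf = b""
        self.buf_off = 0  # file offset of buf[0]

    def read(self, off, length):
        # Serve from the prefetched buffer when the request is inside it.
        if self.buf_off <= off and off + length <= self.buf_off + len(self.buf):
            start = off - self.buf_off
            return self.buf[start:start + length]
        # Miss: refill one large chunk. Crucially, do NOT throw the buffer
        # away on every call, or multi-chunk reads re-fetch the same data.
        self.file.seek(off)
        self.buf = self.file.read(max(length, self.prefetch_bytes))
        self.buf_off = off
        return self.buf[:length]
```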
A: Okay, well, do you also want to talk about your other PR here?
A: Yeah, I was going to ask: if you're doing a lot of experimentation right now, it'd be really interesting to know whether or not the PR made to enable the ability to do compaction on iteration helps here or not.
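For context, RocksDB exposes this capability through its compact-on-deletion table property collector (NewCompactOnDeletionCollectorFactory), which flags an SST file for compaction when a sliding window of entries contains too many tombstones. A sketch of just the windowing idea, with made-up parameter values:

```python
from collections import deque

def needs_compaction(entries, window_size=128, deletion_trigger=50):
    """Return True if any window of `window_size` consecutive entries
    contains >= `deletion_trigger` tombstones. Mirrors the idea behind
    RocksDB's CompactOnDeletionCollector; parameters are invented."""
    window = deque()
    deletes = 0
    for is_delete in entries:          # entries: iterable of bools
        window.append(is_delete)
        deletes += is_delete
        if len(window) > window_size:
            deletes -= window.popleft()
        if deletes >= deletion_trigger:
            return True
    return False
```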
B: Yeah, I haven't looked at it, at least from this point of view, but I'll do my best to.
A: Sure, sure; just if you happen to be doing other experiments, I'd be very interested whether that shows any improvement or not.
A: All right, I think yours are the only two new PRs I saw this week, although I apologize to anyone if there's something I missed. Let's see: I saw one closed PR, and this is also from Igor; you're the person doing all the work this week, I think. This is the one enabling 4K allocation units for BlueFS. It looks like you merged that, finally.
B: That's primarily intended to fight unexpected allocation failures on highly fragmented disks. So right now we don't care about the amount of contiguous chunks on the drive, because we are able to fall back to a 4K allocation unit. Hopefully this fallback wouldn't happen very often.
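A rough sketch of the fallback idea as described (a hypothetical allocator interface, not Ceph's actual API): try the preferred coarse allocation unit first, and retry at 4K granularity when fragmentation defeats it.

```python
def allocate(allocator, want_bytes, prefer_au=64 * 1024, fallback_au=4 * 1024):
    """allocator.allocate(size, alignment) -> extents or None (hypothetical).

    Try the coarse allocation unit first; on failure (no contiguous
    aligned chunks left on a fragmented disk), retry at 4K instead of
    failing the whole operation."""
    extents = allocator.allocate(want_bytes, alignment=prefer_au)
    if extents is None:
        extents = allocator.allocate(want_bytes, alignment=fallback_au)
    if extents is None:
        raise RuntimeError("ENOSPC: even 4K-aligned allocation failed")
    return extents
```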
A: Okay, let's see, moving on to updated PRs: another one from Igor, the one using bounded iterators on omap range keys. It looks like Cory reviewed that and approved; we just need QA on it now.
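As I understand it (treat this as an assumption about the PR), the underlying RocksDB mechanism is the iterate_lower_bound/iterate_upper_bound pair in ReadOptions, which stops the iterator at the bound instead of letting it walk past one object's omap keys into a pile of tombstones. A toy equivalent:

```python
import bisect

def bounded_scan(sorted_keys, lower, upper):
    """Yield keys in [lower, upper) without ever stepping past `upper`,
    the way a bounded RocksDB iterator avoids walking tombstones that
    sit beyond the range being listed."""
    i = bisect.bisect_left(sorted_keys, lower)
    while i < len(sorted_keys) and sorted_keys[i] < upper:
        yield sorted_keys[i]
        i += 1

keys = ["obj1.a", "obj1.b", "obj1.c", "obj2.a"]
print(list(bounded_scan(keys, "obj1.", "obj1.\xff")))  # only obj1's keys
```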
B: Yeah, and that's actually an issue which one can face, and it's in the same scope as many of the issues we have got with RocksDB being impacted by tons of tombstones.
B: That's a sort of incremental improvement related to RocksDB.
A: Right. Yeah, I'm hoping that we're kind of nuking this issue with tombstones from orbit. We've got multiple PRs coming in from multiple people all trying to solve this, and hopefully, through a combination of lots of different attempts, we'll fix it.
B: At least in specific cases; but generally we might still hit the issue again.
A: There was a really easy way to generate the effect of tombstones causing slow iteration a couple of years ago, by using RGW with the delete-range configurable that we have set to allow delete-range even with a small number of deletes; that could pretty quickly cause issues.
B: Right; I tried RGW deletes in my lab for recent experiments, and I switched to range deletes, but I don't know, perhaps it somehow depends on the workload sources. In my case this issue is not easily reproducible. I managed to do that a couple of times, but I can't say that it's pretty easy to hit.
A: Okay, if you're still struggling, let me know and I can try to reproduce the experiment that I did a couple of years ago, and see if I can still hit it when we change the settings for the threshold for delete range, and just see if it happens again.
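If memory serves (treat the option name as an assumption), the knob involved is rocksdb_delete_range_threshold, which decides how many keys a batched deletion needs before BlueStore issues a RocksDB DeleteRange rather than individual deletes; lowering it makes range-delete tombstones easy to generate. A sketch of that setup:

```python
import subprocess

def set_delete_range_threshold(value=0):
    # Lowering the threshold (assumed option name, verify on your release)
    # forces BlueStore to use RocksDB DeleteRange even for small batches
    # of deletions, which made tombstone-driven slow iteration easy to
    # reproduce in the earlier experiment.
    subprocess.run(
        ["ceph", "config", "set", "osd",
         "rocksdb_delete_range_threshold", str(value)],
        check=True,
    )
```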
B: But no; recently I learned that Speedb released an open-source version of their drop-in replacement for RocksDB. You might want to try to use it as an alternative engine at some point.
A: Let's see... it's been a long time since I've looked at it. I know Adam has also looked at it, but from what I recall, I didn't think that the improvements they made would help us a lot. Maybe that's incorrect, and they may have also improved it dramatically in the last year or two; but the last time I looked at it, I wasn't convinced it was a huge improvement for us.
A: Igor, you may know more about what they're doing than I do, but I thought that they were doing a lot of work very similar to, like, the WiscKey improvements for write amplification with certain workloads.

B: Well, actually, I'm not aware of this stuff in detail.
A: I think so; I mean, I believe it's just basically very similar to RocksDB with modifications, but I'm not entirely sure. Adam, you probably know the most about it of any of us; do you remember what it looked like, or what the behavior was?
D: Not really. I just remember I got Speedb integrated with Ceph and ran the tests that I actually did have at the time, so I could get comparable results, and the results weren't amazing. Let me try to find a link. There were some improvements, some degradation overall; I don't even claim that my tests were the best. It was some 4K random write, some mixed reads/writes. I remember there were no omap operations, so maybe on those it would be better.
A: Sure, sure; with it being open source, we can certainly do more to try to understand what it's doing differently.
A: I still think, though, that the biggest improvements that we're likely to see immediately are in the write-ahead log behavior, which, Igor, I think maybe makes your work more important, frankly; and then also the behavior of tombstones with iteration, but we're already kind of trying to fix that anyway.
B: Yeah, and I'm definitely planning to come back to this alternative write-ahead log implementation, and I've got one more idea.
B: It might be helpful for absorbing the write stalls which RocksDB triggers during compaction from time to time. So if we have a large enough external write-ahead log, we might want to use it to ride out the stalls that compaction raises.
B: But anyway, I've seen periodic latency peaks in the field multiple times, and again I have the feeling that they're caused by RocksDB. If we can absorb that with an external write-ahead log, it would be great, I believe.
A: Yeah, I was very impressed by the numbers that we were seeing from your experimental work earlier, I think in the spring, when you were looking at that. It's definitely worth continuing effort, I think. With the improved settings, I think we were seeing like 122,000 write ops for a given OSD, which is really, really good.
A: Yeah. Okay, let's see, moving on: the next PR is upgrading to the latest RocksDB from Facebook. I'm reviewing that one. The gist of it is that we can't switch over to the RocksDB version directly; we have to use our existing branch that has a fix that was implemented a couple of years ago for upgrade scenarios.
A: The author was trying to push a branch directly to our RocksDB repo and didn't have permission to do so. So just this morning I asked if they could fork that repo, branch directly from our master version, update it to RocksDB's latest, and then create the pull request from their own branch on their own fork. We'll see if they have success doing that.
A: Otherwise we'd really need to look at access permissions, but I don't think that will be necessary. So anyway, that one's still in the works. And I think the last PR I have here is that Laura made an additional review, I think, of this balancing-score PR from Josh Salomon, the one he kind of described for us last week.
A: So just a little movement on that one; we'll see if it finishes full review soon or not. In the no-movement category: I made it about, not even quite, halfway through, but I think I looked at the most recent stuff, so there may have been a couple here that were closed by the bot; otherwise I don't think there's a whole lot to talk about there. All right, so that's it for the pull requests.
A: Let's see, Josh: we didn't finish your discussion from last week. Would you like to continue talking about the latency spike issues that you were seeing?
C: Yeah; actually, since the last time we met, we've pretty much tracked this down. I filed the tracker there; I'm not sure if folks have actually looked at it or not, but I'll very briefly say what we were seeing, and then I'll talk a little bit about what we found to be the cause.
C: So we have this monitoring software that we use; we wrote it ourselves. It gives us things like a histogram of latencies, including max latencies.
C: Over time periods it's, of course, a histogram, so it's bucketed: if we see a five-second spike, that could be anywhere between two and five seconds, just due to how our monitoring works, that sort of thing. But we also set a ten-second timeout in the software, and what we saw after the Pacific upgrade was that, on multiple clusters, we started to see this timeout fire where it had almost never fired before; and we'd see these irregular five-second, ten-second bucketed latency spikes, versus before, when some of these clusters were better behaved.
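For reference, a toy version of that kind of bucketed max-latency reporting (our own sketch, not their monitoring code); the bucket edges here are invented:

```python
import bisect

# Upper edges of the latency buckets, in seconds (made-up layout).
BUCKETS = [0.01, 0.1, 0.5, 1, 2, 5, 10]

def bucket_of(latency_s):
    """Return the bucket upper edge a sample falls into: a 3.2 s sample
    reports as the 5 s bucket, i.e. 'somewhere between 2 and 5 seconds'."""
    i = bisect.bisect_left(BUCKETS, latency_s)
    return BUCKETS[i] if i < len(BUCKETS) else float("inf")

counts = {}
for sample in [0.004, 0.2, 3.2, 11.0]:
    b = bucket_of(sample)
    counts[b] = counts.get(b, 0) + 1
print(counts)   # {0.01: 1, 0.5: 1, 5: 1, inf: 1}; inf = over the 10 s timeout
```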
C: They would just not report it. Once we put all those clues together, Alex and I thought: aha, I wonder if this is a messenger-level throttle. And the issue is that, due to a decision made years ago by our predecessors, all of the throttling perf stuff had been disabled, so we couldn't actually inspect the throttles. So we turned them on in one of our clusters; we've now since turned them on everywhere, because there's just no reason to have them off.
C: There was a performance concern years ago, but I don't think it was ever substantiated; it was like, oh, some vendor told us you should just turn this thing off, and so we did. There are lots of those sorts of things floating out there in the ether that just are not good ideas. So anyway, with the counters on, it was very clear that the client message throttle was triggering every single time.
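Once the throttle perf counters are enabled, they show up in the admin-socket perf dump under throttle-* sections; a climbing wait count alongside a latency spike points at the throttle. A sketch (the exact counter names vary by release, so check your own output):

```python
import json, subprocess

def throttle_counters(osd_id=0):
    """Dump throttle-* perf counters for one OSD via the admin socket.
    Section names like 'throttle-osd_client_messages' are assumptions;
    verify against your own `ceph daemon osd.N perf dump`."""
    out = subprocess.run(
        ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"],
        check=True, capture_output=True, text=True,
    ).stdout
    perf = json.loads(out)
    return {k: v for k, v in perf.items() if k.startswith("throttle-")}

for name, stats in throttle_counters(0).items():
    # The 'wait' stats climbing alongside latency spikes implicate the throttle.
    print(name, stats.get("wait", {}))
```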
C: We saw one of these huge latency spikes, and so of course we went and did some digging, and found out that it had been re-enabled in Octopus and Pacific. It would have been re-enabled in Nautilus 14.2.23, had that ever been released, but it was zero in Nautilus. And the funny thing is, we actually had this throttle explicitly disabled previously; when we upgraded to Nautilus, we went and looked, and we were like:
C
Oh,
it's
set
to
zero
now,
so
we
don't
have
to
have
an
explicit
disable
anymore,
so
we
got
rid
of
it,
and
so,
when
we
upgrade
to
Pacific
now
the
throttle
got
enabled
for
the
first
time
essentially
for
our
Blockbusters,
where
it
had
never
been
enabled
before.
C: We actually had a case where our monitoring suite was starved for three minutes straight; it could not get any IO through to an OSD. And that's not surprising, because all it takes is one customer who does keep getting through to the OSD over and over and over again, and all these connections that get delayed just keep getting delayed, delay, delay, delay; they can never get their IO through.
C: We tried setting this throttle higher; the default in Pacific is 256. We tried setting it to 4096, and it wasn't high enough. We set it to 16384 and it mostly helped, but... I mean, I understand the reasoning that I eventually dug out of some Red Hat documentation somewhere: basically, you're trying to prevent an OSD flapping when there's too much client traffic, right? Is that the base reason for this?
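For the record, the throttle being described sounds like osd_client_message_cap (treat the name as an assumption, since the recording never spells it out), which caps in-flight client messages per OSD and defaults to 256 in Pacific. Adjusting it is a one-liner:

```python
import subprocess

# 0 disables the client message throttle entirely; 4096 and 16384 were the
# intermediate values tried in the discussion above.
subprocess.run(
    ["ceph", "config", "set", "osd", "osd_client_message_cap", "0"],
    check=True,
)
```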
C: So the setting of 256 probably makes sense for a spinner; I don't know that it really makes sense on flash, where you could get, you know, 30,000 IOs done a second in some cases, right? Yeah, so when I did some digging around the internet, of course, everyone has their own setting for this. I did see someone set it as high as 64k. But I'm sure most people are just setting it to random stuff, and then, once their problems go away...
C: ...they never really think about the implications of how they're setting it. So anyway, like I said, we've turned it off, because that's the only thing that we found was safe, at least for our block workloads; our block workloads are much happier with it off than on at pretty much any level.
C: And it's hard-coded. I actually got really scared, because when I was paging through this, the units for that timer value are actually wrong in the header.
C: This goes back to, I don't know, 2016 or something like that: the units were actually changed in this timer from milliseconds to microseconds, and so it actually looks like it's a one-second delay until you dig into the implementation and find out it's actually microseconds underneath the covers. They updated the units in the .cc file but not in the header, so I do have a PR up for that somewhere, to fix the header file...
C: ...so the unit is documented correctly. But no, okay, it's not configurable. At the end of the day, the only way to really avoid starvation here is, you almost want to have a fair throttle mode where, as soon as you're throttling, every single connection has to go in a queue and you're always choosing from the head of the queue. Then you get a FIFO across connections, and you avoid the starvation problem.
C: Even better would be to actually have the connections, sorry, the messages in a queue, because then you get first-come-first-serve on the messages; but that destroys the whole point of the throttle, which is: don't pull the messages off the connections until you're ready to, right. So the next best thing is: all the connections go on a queue; you FIFO the connections, and every time...
C: ...you serve the head, put it at the back, and keep doing it. Yep, yep. So this would have to be done for every throttle implementation, to make them all fair, right; it's not just, oh, it's just this one throttle that has this problem. Basically anything in the messenger has this problem. I haven't looked at how the other throttles are implemented, but at least the ones on the connection path, for inbound messages, are all susceptible to this problem.
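A sketch of the "FIFO across connections" scheme described above (a toy model, not the messenger code): throttled connections park in a queue, and capacity is always granted strictly at the head, so one busy client cannot starve the rest.

```python
from collections import deque

class FairThrottle:
    """Grant message slots first-come-first-served across connections."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self.in_flight = 0
        self.waiters = deque()          # connections blocked on the throttle

    def acquire(self, conn):
        # Anyone arriving while others wait must queue behind them;
        # otherwise a hot connection keeps winning the race forever.
        if self.in_flight < self.max_in_flight and not self.waiters:
            self.in_flight += 1
            return True
        self.waiters.append(conn)
        return False                    # caller parks until woken

    def release(self):
        self.in_flight -= 1
        if self.waiters:
            conn = self.waiters.popleft()   # strictly head-of-queue
            self.in_flight += 1
            conn.wake()                     # hypothetical resume hook
```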
B: But how does this explain the...

C: Write amplification? It does not; those are two separate problems. We still don't know what's causing the write amplification, so if we've got time to talk about that one, we can talk about the digging we did over the last week on that too, absolutely. Yeah, with the p100 thing, I'm glad we figured it out, glad we were able to tweak it with a setting to make the problem go away. I did file that tracker; it's one of those things where I would love to just go and work on it.
A: Yeah, I've been kind of feeling like we've been ignoring the messenger too much. There's a lot of stuff there that I'm kind of scared of, and this maybe reiterates that we need to go through and look at it a little bit more closely again.
C: Yes; continue on, continue. Okay, so: write amplification. We were all excited last week, because I think you and others were thinking, what if it's the deferred writes thing? So we spent a lot of time trying to chase that angle internally, and I think we've pretty much concluded from the available stats from perf dump that there is no increase in the number of deferred writes between Nautilus and Pacific on our systems; it looks pretty much equivalent.
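A hedged sketch of that kind of before/after comparison; the deferred_write_ops / deferred_write_bytes counter names under the bluestore section match my memory of perf dump output, but verify them on your release:

```python
import json, subprocess

def deferred_write_stats(osd_id):
    """Pull BlueStore deferred-write counters from one OSD's perf dump."""
    perf = json.loads(subprocess.run(
        ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"],
        check=True, capture_output=True, text=True).stdout)
    bs = perf["bluestore"]
    return bs["deferred_write_ops"], bs["deferred_write_bytes"]

# Sample before and after an upgrade (or across an interval) and compare
# the deltas: equal deferred-write rates rule out "more deferred writes"
# as the source of the extra device traffic.
```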
C: I did want to correct one thing I said last week: I thought all our OSDs were 4K min_alloc_size. That's not true; anything that was deployed under Luminous and before is actually 16K min_alloc_size. And so, as you would expect, those OSDs show higher deferred write rates than the 4K ones in general, and we can see that. But there's no change to either the 16K or 4K ones in terms of how many deferred writes are happening per second once we do the upgrade.
C: So it's not a change in the count of deferred writes. Could it be the deferred writes are more expensive? I mean, that would probably take some digging, if that's exactly what's happening there; I don't know. The one thing I did notice when I filed the ticket... this is where you start to get into: you can go and slice the statistics to give you all sorts of interesting numbers, but whether they're actually relevant...
A: Okay, interesting. Is it more absolute, or more the ratio of the IO size, or both?

C: Let me bring up the ticket.
C: So it looks like, on the 4K min_alloc_size hosts, the average IO size jumped from, I don't know what you want to call it, like 13 KiB up to like 16.
C
Okay
and
then
you
can
like
this
is
you
can
also
see
there's
that
yellow
line
that
kind
of
transits
between
the
top
group
and
the
bottom
group
yeah
that
host
we
were
actually
reconditioning
all
like
rebuilding
all
the
osds
on
sorry.
A: Right, yeah.
C: I'd have to go and dig. These systems also have a mix of at least two different generations of hardware in them, and then also deployments across, in this case, Luminous, Nautilus, and obviously now Pacific. So they have all these interesting histories, where we almost have to dig on a per-host-group basis to go and figure out, okay, what is the exact history of how we got here.
A: Yeah, I was curious if that would help explain any of this; like, you could understand the difference between those, but I don't know if it does or not. Maybe. Okay, but the big difference, right, is that we see the ones with a 16K min_alloc_size are definitely showing higher write sizes than the small ones, as you would expect, although it is interesting that they're not that far apart.
C: Very small, yeah; I mean, to my eyes there is a small jump. But the other weird thing is, you can see the average IO size was actually falling in the recent history of Nautilus, up until the upgrade, and then it looked like it started increasing after that. And the unfortunate thing is that the cutoff there is as much history as we have. We have monthly cycles, we have weekly cycles; there are all sorts of interesting cycles that happen in our system, so it's entirely possible...
C: What I did attach to the ticket is: we did get a perf dump across three different OSDs on a different system that we upgraded, both before and after the upgrade; the same three.
A: It's also kind of interesting that it almost looks like, at the very beginning of this trace, at least some of these hosts were in the same ballpark as after the Pacific upgrade; like, of the ones that had the 4K min_alloc_size, it looks like half of them maybe started out pretty close to where Pacific ended up.

C: Yeah, I mean, not quite; they were definitely a little lower there. But like you said, there could be some fluctuations over time.
C: Bringing that up now: the bimodal distribution actually still exists.
D: We would, but last week Jeff said that the total amount of data transferred is also increased, and that cannot be explained by fragmentation.
A: I was wondering more about whether there's any possibility that we end up with additional writes due to...
C: [crosstalk]
C: As Alex says, yeah, we actually did this upgrade recently, and we actually collected logs with some debug levels; correct, I don't remember which ones. I'll check which ones we have from that.
A: If you want to, you can try running... there's a script in CBT for just making those statistics look nicer; I've mentioned it before. You can try running this Python script on the before and after ones, and that will give you a whole bunch of information, like summary statistics on the number of input and output records and the write rates for RocksDB.
A
That
might
actually
tell
you
whether
or
not
rocksdb
is
like
writing
out
more
data
before
and
after
maybe
it'll
help
explain
it
if
it
is.
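In the same spirit, a small stand-in for that script (the "Cumulative writes" line format is from memory of RocksDB's periodic stats dump in the LOG file and may differ by version):

```python
import re, sys

# Matches RocksDB LOG lines like:
#   Cumulative writes: 813K writes, 4942K keys, ... ingest: 3.52 GB, 1.08 MB/s
PATTERN = re.compile(
    r"Cumulative writes: (\S+) writes, (\S+) keys"
    r".*ingest: ([\d.]+) GB, ([\d.]+) MB/s")

def last_cumulative_writes(log_path):
    """Return the final (writes, keys, ingest_gb, rate_mb_s) tuple seen in
    a RocksDB LOG, or None; compare before/after upgrade to spot extra
    write volume at the RocksDB level."""
    last = None
    with open(log_path) as f:
        for line in f:
            m = PATTERN.search(line)
            if m:
                last = m.groups()
    return last

print(last_cumulative_writes(sys.argv[1]))
```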
A: This is all super interesting, Josh; thank you especially for digging in on that issue with the messenger. We kind of tested that, and we were like, okay, it's looking good, let's merge it; and then we didn't hear any issues from anyone, and that's unfortunate.
C: We have not attributed any user complaints to it, although we know that users were seeing it; like, I've gone and run fio in a VM over multiple days, and I saw fio see like 10-to-20-second latency sometimes.
C: I know it was hitting people, but nobody complained about it, and we wouldn't have noticed either, other than the fact that we had started paying a lot more attention to our p100s starting about a year ago. So yeah, I wouldn't be surprised if this is widespread; especially if people have been running Luminous, they would have been seeing the same thing, I'm assuming. I didn't look at the V1 messenger; I'm assuming its throttle implementation is the same, but maybe it's not.
C: All right, well, we'll try to dig at those RocksDB logs a little bit and see. But I'll definitely admit, on our side we're kind of out of ideas at this point as to what to investigate, other than, right now, the path of really heavy IO tracing and trying to pick that apart, which we don't have time to do.
B: Yeah, just briefly, from the perf counters for the ticket: specifically, what I can see is a pretty high difference in log bytes, like the BlueFS log.
C: What's the exact stat name that you're looking at?
B: The log_bytes and logged_bytes counters. log_bytes, if I remember correctly, is the log size, and logged_bytes is the amount of bytes written to the log over the period; unfortunately, the same stat isn't always present. It's something like 13 megabytes versus 53 megabytes for Pacific.
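Those counters can be sampled the same way as the perf dumps above; here is a sketch that turns two samples taken an interval apart into a log/WAL write rate (counter names as recalled in the discussion, so verify them on your build):

```python
import json, subprocess, time

def bluefs_counters(osd_id):
    """Return the bluefs section of one OSD's perf dump."""
    perf = json.loads(subprocess.run(
        ["ceph", "daemon", f"osd.{osd_id}", "perf", "dump"],
        check=True, capture_output=True, text=True).stdout)
    return perf["bluefs"]

before = bluefs_counters(0)
time.sleep(60)
after = bluefs_counters(0)
for key in ("logged_bytes", "bytes_written_wal"):
    if key in before:   # counter availability differs across releases
        rate = (after[key] - before[key]) / 60
        print(f"{key}: {rate / 1e6:.1f} MB/s")
```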
C: One of the things that we do have in Prometheus is bluefs bytes_written_wal.
B: Some increase in bytes written to the WAL; well, it's like 25 or 30 percent, not...
D: There is also one change now: BlueFS reports much more space available for storage, and it's consistent with the change we made. Previously we had separated storage for BlueFS and for block data, and we had a mechanism that regularly gifted or reclaimed large portions of the main device for BlueFS, so BlueFS could get contiguous space underneath.
A: Well, guys, we are a little bit over; is that a good place to wrap up, do you think?
B: Yeah, I'm curious; I'd suggest checking logged bytes at larger scale, to see how it behaves. Okay.
C: Yeah, we export a bunch of these perf dump things to Prometheus; I don't think we export those ones yet. So we can do that, and then observe across one of our upgrades, for example, and see what happens long term.
A: See you later, okay. And guys, I think it's probably a good point for us to wrap up too.