From YouTube: Ceph Performance Meeting 2022-04-07
Description
Join us weekly for the Ceph Performance meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contribute/
What is Ceph: https://ceph.io/en/discover/
A: Okay, max 29k. Okay, so yeah, once I've got some of this Quincy stuff taken care of, I can maybe try to take a look and help debug what's going on there.
A: It is interesting that you saw it with a second NVMe drive as well. Who knows, maybe master is messed up or something, but I'll see, if I can.
A: All right, so what Gabby's talking about is that he's had trouble getting good test results out of one of our fast NVMe test nodes internally with master for the last couple of weeks.
A: In the past we've seen around 70 to 80 thousand small random write IOPS, and he's seeing more like 20 to 30 thousand right now. I've tested something fairly recently that I thought was based on master on the Mako nodes and still saw high performance, but I should go back and verify that it was actually on master. I think it was, but anyway, things to figure out and look at. Okay.
A: So this week I didn't quite get through all the old PRs, but I don't think there have been a ton of updates on them, so I'm not going to worry about it too much. We do have two new PRs that I made this week, and these both relate to the AVL allocator topic that we've been discussing. So, a reminder:
A: Last summer we changed the way that we determine when to go into best-fit mode in the AVL allocator, instead of continuing on in near-fit mode. The changes we made basically set limits based on the number of bytes of distance we have to search, and also on the number of iterations that we search.
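As a rough illustration of the idea (a minimal sketch with made-up names and limits, not the actual Ceph allocator code), the capped search gives up once it has either traveled too far through the free extents or examined too many candidates:

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sketch of a near-fit search with byte-distance and
// iteration caps, loosely modeled on the change described above.
// Assumes free_extents is sorted by offset. None of these names
// come from the Ceph source.
struct Extent { uint64_t offset; uint64_t length; };

std::optional<Extent> near_fit_search(const std::vector<Extent>& free_extents,
                                      uint64_t want, uint64_t start_offset,
                                      uint64_t max_byte_distance,
                                      uint64_t max_iterations) {
  uint64_t iterations = 0;
  for (const auto& e : free_extents) {
    if (e.offset < start_offset) continue;        // search forward from the hint
    if (e.offset - start_offset > max_byte_distance)
      break;                                      // traveled too far: give up
    if (++iterations > max_iterations)
      break;                                      // too many tries: give up
    if (e.length >= want)
      return Extent{e.offset, want};              // first (near) fit wins
  }
  return std::nullopt;  // caller falls back to best-fit mode
}
```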
A: The gist of it right now (sorry, you should look at the "avl adaptive enough" tab) is that on these Samsung drives, whenever we use these limitations, we see a fairly big slowdown in large sequential writes. It appears to be because the allocation pattern changes dramatically. Instead of doing really linear allocations straight across the drive like we did previously, we now see this kind of flurry of I/O spread across the disk, all 64K. We're not fragmenting smaller than that; we're really consistently writing 64K I/Os. But this pattern of spraying the entire block device with I/Os makes the Samsung drives unhappy.
A: I did try increasing the parameters that we'll have in that PR, kind of 4x and 8x. Those are on lines 8 and 14 in that first Mako set of columns. That helps, but it doesn't eliminate the issue.
A: The second PR makes it so that, instead of deciding when to switch into best-fit mode based on, you know, the cycles and bytes, it just does it based on the amount of time you've spent in near-fit mode; when you exceed that time, then it switches. The default I've got right now is one millisecond. That was enough to keep the behavior in the fast mode, so that seemed to work.
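A minimal sketch of the time-budget idea (hypothetical names; the PR's actual implementation may differ): the allocator records when the near-fit search starts and bails out to best-fit once the elapsed time passes a configurable budget, with the 1 ms default mentioned above.

```cpp
#include <chrono>

// Hypothetical sketch: switch from near-fit to best-fit based on elapsed
// search time instead of bytes/iterations traveled. The 1 ms default
// matches the value mentioned in the discussion; everything else is
// illustrative.
class SearchBudget {
  std::chrono::steady_clock::time_point start_;
  std::chrono::microseconds budget_;
public:
  explicit SearchBudget(std::chrono::microseconds budget =
                            std::chrono::milliseconds(1))
      : start_(std::chrono::steady_clock::now()), budget_(budget) {}

  // True once the near-fit search has used up its time budget and the
  // allocator should fall back to best-fit mode.
  bool exhausted() const {
    return std::chrono::steady_clock::now() - start_ >= budget_;
  }
};

// Usage inside a search loop (pseudostructure):
//   SearchBudget budget;
//   while (more_extents_to_scan()) {
//     if (budget.exhausted()) return best_fit_fallback();
//     ...
//   }
```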
A: So yesterday, at the CLT meeting, we kind of decided that we want to look at more tests, especially on hard drives, and see what the impact is.
A: David Galloway very kindly got one of the older test nodes set up with CentOS Stream for me really quickly, and so I did start getting results from those. They have Intel P3700 NVMe drives and they have hard drives. I looked at the P3700s bare, and I don't have results yet for my PR, but I do have results for Quincy with the kind of default current behavior, and then with it reverted back to the older Pacific behavior, and the effect is minimal there.
A: There seems to maybe be a slight effect, because it's always a little slower with the Quincy defaults. It's not dramatic, maybe like one percent, but it does seem to be consistently a little slower in that mode. Whereas the hard drive case is really interesting. The tests that I run are really short; this was kind of the minimal set of tests I could run to showcase the behavior on the Samsung drives.
A: It looks almost identical; there's very, very little difference. But there haven't been a lot of writes, like small random writes, to the drive, and I can verify that I still see the same patterns I see on the NVMe drives, though it's a little different in this case. We still see these 64K I/Os, but they're lumped together in groups of 64. In the Pacific case, the old behavior, we're in reality writing 64K writes completely sequentially.
A: Then in the current default Quincy behavior, we see 64K writes grouped into blocks of 64 that are written sequentially, and then those groups are scattered around the disk. And when I looked at longer-running tests, where instead of running these tests for like 30 seconds they're running for 30 minutes, it actually looks like this change was maybe slightly better.
A: When I disabled it, I saw lower numbers in some of the tests, at least initially; by the third iteration maybe less so, but it was definitely different. So it's possible that the current behavior in Quincy is actually doing better on hard drives. Maybe; it's hard to say. That's where I'm at right now. I probably need to do a lot more testing on this, but we do need to make a decision on what to do for Quincy.
A: So, really consistently, we've seen that this behavior on the Samsung drives is bad. These drives hate the change that we made, and we're able to get into the mode pretty easily where they're showing fairly significantly degraded performance. Whereas on other NVMe drives, the Intel drives, there's very little difference; they don't seem to care one way or another. Now, on hard drives, we'll have to see how this plays out. But yep, that's basically it.
A: So, one question, since you and I have been discussing this quite a bit. I wanted to ask you about the test that you've been running, that workload test. Do you have any idea what's happening in that test where you're showing the current behavior giving, you know, faster allocations? It makes sense, and I agree with you that it would do that, but do you know what the workload is there?
B: Well, I think the issue is not the payload itself but the fragmentation of the disk space. The replay payload is pretty trivial; I'm trying to replicate the payload coming during DB compaction.
B: The user allocation unit is 16K, and it's a single volume that the DB shares, and hence, in highly fragmented space, it might be tricky for BlueFS to get 64K contiguous blocks. And it looks like, without these limits, the AVL allocator might take a pretty long time to search for such contiguous blocks.
B: Well, again, it's not exactly regular operation; it was compaction. For regular operation, I can share some latency graphs from before, with the default hybrid allocator, and then after switching to the stupid one, and the difference is crazy, yeah. And on the same cluster I performed DB compaction, and for the hybrid allocator it took something like 50 minutes, versus 5 minutes on the stupid allocator, yeah.
B: Right, right. But what compaction requires from the allocator is the repetitive allocation of these 500K chunks of 64K contiguous blocks. So it's probably the worst pattern in the case of fragmented space. So yeah, every allocation
B: needs to look up a contiguous block, which is a tricky scenario. But again, the regular operation was crazy as well. And actually, if I compare allocation durations for the stupid and hybrid allocators, I can see something like two microseconds versus one or two milliseconds.
B: So that's actually a great difference as well, and it might have a pretty significant impact on the overall performance, because while the allocator searches for this contiguous block for two milliseconds, it's locked, so other operations are not able to proceed with allocation. So, in fact, with a two-millisecond allocation duration, you can get something like 500 allocations per second.
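The back-of-the-envelope here is just the reciprocal of the per-allocation latency, since the allocator lock serializes the calls:

```latex
% Throughput ceiling when a lock serializes allocations:
% at most one allocation at a time, each taking t seconds.
\[
  \text{allocations/s} \le \frac{1}{t}:\qquad
  \frac{1}{2\,\text{ms}} = 500/\text{s}
  \quad\text{vs.}\quad
  \frac{1}{2\,\mu\text{s}} = 500{,}000/\text{s},
\]
```

which is the contrast between the hybrid and stupid allocator durations just quoted.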
B: So if you have pretty fragmented space, again, it might be tricky to find contiguous blocks. That's the key point here.
A: Yeah, but the reason I was asking is, when you're seeing some of these big numbers come back, that's for a 512K allocation?
B: Well, this compares the new patched hybrid allocator and the original one. The original behavior is much worse than the stupid one or this patch; I mean the patch which introduces these search limits, not your new patch.
B: So you can see, yeah, you can see the duration, then want, unit, max, and hint in hex. So the first number is the requested block size, then the allocation unit, and then some additional parameters. So it was 128K chunks.
A: Okay, okay. So in this case we could either get a 128K contiguous chunk back, or two 64K chunks, or, were you saying, 16K? Yeah.
B: In your case, it might get even 4K chunks. Since we now have a 4K allocation unit for both SSDs and HDDs, potentially you can get down to 4K blocks
A: Blocks back from the allocator, mm-hmm. I don't think we are, because it looks like I'm always writing out in units of 64K.
A: Well, yeah, the fact that you see this huge difference when you switch... basically, I think what it means is that we're only spending a very small amount of time in best-fit, sorry, near-fit, and then switching to best-fit right away. Almost to the point where I wonder how often we're even using near-fit in these tests that you used.
A: Yeah, in my tests it's looking like we're really quickly going into best-fit as well, almost to the point where I wonder if you'd be better off just not even bothering doing the near-fit search at all.
B: Because, as far as I remember, there are some tunings in the allocator which enforce switching to that second mode in the case of a pretty high full ratio, like when ninety percent of the disk is full.
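As a toy version of the kind of tuning being recalled here (illustrative only; the real thresholds, names, and switching logic live in the Ceph allocator code), the mode choice might hinge on a full-ratio check:

```cpp
#include <cstdint>

// Illustrative only: force best-fit when the device is nearly full,
// since near-fit over mostly-allocated space wastes time. The 0.90
// threshold echoes the "ninety percent" figure from the discussion.
enum class FitMode { near_fit, best_fit };

inline FitMode choose_mode(uint64_t free_bytes, uint64_t capacity_bytes) {
  double full_ratio = 1.0 - static_cast<double>(free_bytes) / capacity_bytes;
  return full_ratio >= 0.90 ? FitMode::best_fit : FitMode::near_fit;
}
```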
B: But you're saying that the full ratio for the disk in your case is pretty low.
B: So we need to revise this logic for mode switching, and maybe take something from there, or something.
A: Yeah. So, in terms of the performance problem, I actually recorded the blktrace data so that I could then play back the same workload using fio, ignoring timestamps and that kind of thing. And if I just replay that write workload on the drive in isolation, it does it fine.
B: Well, for the replay I don't need the drive at all. The replay tool just tries to call the allocator and then measures how long it takes, and those are the results. Well, we can extend it, but no, no, there's no need for specific devices for that; everything happens in memory.
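A minimal sketch of that kind of in-memory replay (a hypothetical stand-in interface, not the actual tool): feed the recorded allocation requests straight to an allocator object and time each call, logging the want/unit/max/hint fields in hex as described earlier.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Hypothetical sketch of the replay idea: no block device involved,
// just allocator calls timed in memory. `Allocator` is a stand-in
// interface, not Ceph's actual class.
struct Request { uint64_t want, unit, max, hint; };

struct Allocator {
  virtual int64_t allocate(const Request& r) = 0;  // returns bytes allocated
  virtual ~Allocator() = default;
};

void replay(Allocator& alloc, const Request* reqs, size_t n) {
  for (size_t i = 0; i < n; ++i) {
    auto t0 = std::chrono::steady_clock::now();
    alloc.allocate(reqs[i]);
    auto t1 = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0);
    // Log duration plus the want/unit/max/hint fields mentioned above.
    std::printf("dur=%lldus want=0x%llx unit=0x%llx max=0x%llx hint=0x%llx\n",
                (long long)us.count(),
                (unsigned long long)reqs[i].want,
                (unsigned long long)reqs[i].unit,
                (unsigned long long)reqs[i].max,
                (unsigned long long)reqs[i].hint);
  }
}
```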
A: Since I've got this window shared for people, I'll show what these patterns look like. So this is basically what we're seeing with the current defaults. This looks a little different depending on the drive that you're running on, because some are faster than others, so it might show slightly different oscillation behavior, but this is kind of what we're seeing in master right now. And it's this kind of behavior, where it's just going straight.
A: And actually, I'll change to the other PR. Let's see.
A: Doing a little bit better, though. Same thing with an even bigger search space, eight times what the default is: still seeing this pattern. And then this is the change I made to do it based on time, with a small, well, sort of small, 500-microsecond search time allowance. And then when we move up to one millisecond, that's when we go back to this behavior, where we stay in near-fit instead of falling back to best-fit.
A: We see this linear allocation and high throughput on the Samsung drive. And I did two milliseconds as well, but it's the same behavior. So yeah, Igor, what perplexes me is that, on a workload like this with a very, very lightly filled disk, we could have near-fit take a millisecond, right? I mean, doesn't that seem crazy?
B: To get to allocate a four-megabyte chunk.
A: Yeah. So anyway, hey, Casey mentioned he's gotta go soon. Igor, would you have time to continue this offline? Maybe we can talk more about it.
A: Okay, cool. Maybe we can continue this; we're really in the weeds here. I don't know if people care about this that much or not, but...
A: Maybe we can wrap this one up for now. So, Casey, the tracer PR: did you have anything quickly you wanted to mention about it?
C: Cool. I think that with Ombre, initially he was running the test for like 30 seconds, so what he really observed was whatever was happening at the beginning, and it was really fluctuating. So now he's running it for two or three minutes, and it's much more stable.
A: Okay. So, honestly, I probably won't look at it until this is all done; we're trying to do that specifically, yeah.
A: Maybe moving on. Gabby, we were talking before about the performance issues you were seeing.
A: Yeah, I don't know if it's master that's slow. I don't think it is, but it could be. Maybe... have you tried testing Pacific, out of curiosity?
A: Yeah, that should do it. I just linked it, or put it in chat, but here, I can give you a link to it too.
A: There you go. So yeah, just check that out, and it'd be really interesting to see if that fixes it or not. If it doesn't, then yeah, we definitely need to figure out why that node is going slow now.
E: Yes, so when I tried to change from... to nvme1, the thing just stayed.
E: And BlueStore and so on, so I tried to change all of them from...
A: Interesting. There appear to be writes in this test that I did; I got this when I was on it earlier with you, and it appears that there are writes both to nvme0 and nvme1 in that test.
A: Okay, Gabby, my thought is maybe we take this out of the performance meeting, and I think we'll wallclock-profile the OSD while it's running, take a look, and find out what it's doing. Let's start our own chat after the meeting here and then we'll go.
A: Okay, sounds good. All right, anything else that anyone wants to bring up this week before we wrap up?
A: All right, well then, thank you everyone for coming. It might have been a little dry this week, sorry about that, but thanks for attending, and we'll see you next week. Bye.