From YouTube: Ceph Performance Meeting 2022-07-14
Description
Join us weekly for the Ceph Performance meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contribute/
What is Ceph: https://ceph.io/en/discover/
A
All right, well then, for discussion topics, there's one here that I want to talk about with Laura regarding this BlueStore zero block detection PR. It turns out that's already been committed to Quincy, it's already there, but we turned it off in 17.2.1 and actually saw a performance regression relative to previous versions of Quincy. So we want to talk about that and try to understand what's going on there. But until Laura gets here...
A
Maybe we can talk about something else. One other topic that we've been looking at recently is memory usage in the way that we track dup ops for the PG log. It turns out that when you have a corrupted dup entry that looks like it's in the future, we stop trimming, and that allows dups to accumulate, basically until that corrupt entry is eventually trimmed, which could be very, very far in the future.
A
So we've done a lot of testing and quite a bit of work to try to figure out the right way to fix that. The current idea is basically to detect it and then, when an OSD is rebooted with the fixed version of the code and writes are issued, we slowly, well, somewhat slowly, trim, I think 10,000 entries at a time for every write, if I remember right, and then get back into a stable state. There was an earlier version of this PR that tried to trim everything on OSD boot, so that we didn't consume the memory at all on reboot, but I think there were some concerns about the safety of that approach. So this is the one I think we're looking at right now. Neha, do you know, do we have a final version of that PR that Radek did?
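A minimal sketch of the incremental trimming idea described above, in Python for illustration: the 10,000-entry per-write batch size comes from the meeting, while DupTrimmer, MAX_DUPS, and the deque representation are assumptions for the sketch, not the actual Ceph implementation.

```python
from collections import deque

MAX_DUPS = 3000          # assumed steady-state cap on retained dup entries
BATCH_PER_WRITE = 10000  # per-write trim batch size mentioned in the meeting

class DupTrimmer:
    """Illustrative model of draining a dup-op backlog gradually."""

    def __init__(self):
        self.dups = deque()  # oldest entries at the front

    def record_dup(self, entry):
        self.dups.append(entry)

    def on_write(self):
        # Instead of draining a huge backlog all at once on OSD boot,
        # trim at most BATCH_PER_WRITE excess entries for every write,
        # so memory is reclaimed gradually.
        excess = max(0, len(self.dups) - MAX_DUPS)
        for _ in range(min(excess, BATCH_PER_WRITE)):
            self.dups.popleft()
```

The earlier, safety-questioned approach mentioned above would correspond to draining the whole backlog at startup instead of a bounded batch per write.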
B
Yeah, I'll paste it. Why don't you continue; I'll paste it offline.
A
Right here. Okay, sounds good. So I think we're basically fairly convinced that that's the right way to go at this point. I've done some testing on it and it looked good in my tests. I think there are some more official tests of, you know, downstream releases or other packages that are being tested now. So as far as I know, that looks like it's pretty good and we're going to move forward with it. Does anyone have any other thoughts on that topic?
B
No, I think we are trying to do our due diligence from all sides to validate this, and at this point what we are doing is also validating it on HDDs, which is something, you know, we haven't done upstream. So that's where we stand. If things look good, this will be shipped in the next corresponding releases. This bug goes as far back as Octopus, and with Octopus at end of life, we don't want to risk the upgrade approach.
B
We've already shipped, I mean, we will be shipping the offline tool method in Octopus, but from Pacific onward we'll be shipping this fix.
A
Cool, all right. So then, now I see Laura, you're here. Do you want to give a quick update on what you guys found regarding the BlueStore zero block detection PR performance improvement?
C
Sure, so yeah. In the DFG storage team we've been studying the performance differences between 17.2.0 and 17.2.1, and we saw that 17.2.1 had a performance decrease from 17.2.0. We were trying to better understand that, so we checked the differences in commits and found that there were two major changes to BlueStore between the two versions, and one we ruled out.
C
But the other was BlueStore zero block detection being disabled by default. This was a feature added in 17.2.0, where it was on by default. Essentially this feature detects zero buffer lists and skips writing them to BlueStore, and the goal was so that we could perform some large-scale tests, mainly in teuthology, with many OSDs without filling the devices. But we found in 17.2.0 that this caused some unwanted side effects, so we disabled it for 17.2.1, so we're not seeing, for instance, side effects with RBD thick-provisioned images being thinned.
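Conceptually, the feature boils down to an all-zero check on the incoming buffer before the data write. A rough sketch, with hypothetical function names rather than BlueStore's actual code:

```python
def is_all_zero(buf: bytes) -> bool:
    # Comparing against a zero-filled buffer of the same length is a
    # simple way to test that every byte is zero.
    return buf == bytes(len(buf))

def write_maybe_skipping_zeros(buf: bytes) -> bool:
    """Return True if the data write was skipped."""
    if is_all_zero(buf):
        # A real implementation would record the range as unwritten
        # rather than issuing the write. As discussed later in the
        # meeting, BlueStore currently can't mark the extent as
        # logically used on this path, which is part of why the
        # feature was disabled by default.
        return True
    return False  # fall through to the normal write path
```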
C
However, in 17.2.0 it caused a performance boost, maybe in quotations, because, you know, there were obviously side effects that came with it. So essentially Mark was interested in understanding where the performance boost came from, and seeing if we can get those numbers back in a safer way in the future.
C
But for now this makes a lot of sense, because comparing the numbers between the last Pacific point release and 17.2.1, those numbers match, and the only thing in between is 17.2.0, where the BlueStore zero block detection feature is enabled, and that's where we're getting this quote-unquote performance boost. That was just kind of discovered this morning, so since I headed that PR, I'm working with the DFG storage team to, you know, further understand it. But now we understand where that performance boost in 17.2.0 was coming from. That's the gist.
A
Were those gains primarily in RGW workloads, or were they RBD workloads as well?
C
It was in the hybrid, it was a hybrid workload, I believe. I don't know exactly what that workload is, but I can check.
A
Yeah, Casey, I noticed that when I turned on RocksDB compression, for RGW tests we saw a huge reduction in write amplification.
A
I wonder if there's a bunch of fields where, if there's a bunch of zeros being written out, they aren't being compressed in any way, but this thing just kind of avoided writing them.
D
Yeah, I'm not sure exactly what integer fields would be unused and default to zero. There are a couple of strings which can potentially get long, so I could see some benefit from compressing those.
A
I'm wondering if there's any intersection between what the DFG team and Laura saw with this and the thing I was seeing.
C
Mark, those results that you were talking about, do you have them anywhere, or is this, you know...
A
It's in that big draft RocksDB thing, which you may not have access to. Let me get you on that. I'm going to publish it pretty soon, but I'll give you access to read it now.
A
And if anyone else wants access to it as well, I'm going to try to get it into the blog format, probably next week, so that we can post it. But if folks are interested, let me know and I can give you access to look it over; I'm happy to get feedback. But yeah, I guess what I'm trying to understand is why the zero block detection seemed to improve performance to the extent that it did.
C
Yeah, I think understanding that will help us potentially get those performance numbers back in the future.
C
The main topic of conversation around that, in terms of enabling it on clusters and not just in synthetic teuthology testing: Adam, I believe, said that there are some BlueStore limitations where we are currently unable to mark the extents as logically used, so when skipping writing a zero buffer list, we can't logically mark that the extent has been used. That's what we would need to look into doing, and that would make it safer to actually have it enabled on clusters. But yeah, there are just some limitations right now in what BlueStore is capable of.
A
C
No, I haven't checked into that. So you want to know if the workload is using zero buffer lists?
E
Yep, it's fine as long as the engine wasn't optimizing for it first, but well, now BlueStore is.
C
Yeah, so that is something to check into. I think, since we just kind of made this discovery this morning, the goal now, the next action item, is to better understand what's going on under the hood and figure out what the tests are using. But that's certainly a thing to check.
G
So the snap mapper here that I wrote, which skips RocksDB, doesn't seem to impact Paul Cuzner's performance testing, which is a mystery to me.
G
So I was wondering if you'd be available to help me construct and run appropriate, sorry, targeted testing to show, or maybe to find out that it's not there, the impact on snap trimming and on clone creation, any clone object creation. I cannot see how this thing would have zero impact, so there might be something else hiding the performance issue.
A
It is strange. I've been trying to keep up with Paul's emails that he sent out, and it looks like the last thing I saw was that he said that, through this constant workload, he just sees the OSDs consuming more and more CPU. Is that right?
G
Yes, he says the CPU percentage consumption is going up over time. I assume that's because they have more and more snaps accumulated, and the amount of work you do is related to the number of snaps.
G
Yeah, so that's what I need your help for. If you could set up a meeting with me where we could design a test, try to outline the steps that need to be done, and decide what would be considered an improvement, or what would prove that there is no improvement.
G
So, if you have many small objects and then there is a write touching all the objects... I know it's not a very normal flow, but that would be the best-case scenario for my code. So let's say the objects are very small, 64k each, and then you have some sequential or random writes jumping around doing 64k I/Os, so every write would touch an object.
G
Then every write would create a clone object, which in my code should be more efficient. Then I want to measure timing and CPU utilization, and if there is no difference, then I need to really understand what I'm doing wrong; I do expect to see a difference. And the other thing is, say you have a system with many objects, and now you run snap trim.
G
Sorry, a system with a lot of clone objects, and you run snap trim. So I would measure CPU utilization, just snap trimming and nothing else: CPU utilization and the time it takes to do the trim, with my code and with the base code. Again, I expect to see a difference. And last, in both cases I would also check write amplification, because I don't touch the disk, unlike the snap mapper today, where an entry is created in RocksDB for every object and then a tombstone has to be created.
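A sketch of the kind of targeted test described above, using the python-rados bindings. The pool name, object count, and snapshot name are assumptions; timings would be compared between the patched and base OSDs, with CPU utilization and write amplification sampled externally on the OSD hosts:

```python
import time
import rados

OBJ_SIZE = 64 * 1024   # very small objects, 64k each, as suggested
NUM_OBJS = 10_000      # assumed object count

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('snap-bench')  # hypothetical pool

payload = b'x' * OBJ_SIZE

# 1. Create many small objects.
for i in range(NUM_OBJS):
    ioctx.write_full(f'obj-{i}', payload)

# 2. Take a pool snapshot, then overwrite every object once, so each
#    64k write touches one object and forces a clone to be created.
ioctx.create_snap('bench-snap')
start = time.monotonic()
for i in range(NUM_OBJS):
    ioctx.write_full(f'obj-{i}', payload)
print(f'clone-creating writes: {time.monotonic() - start:.2f}s')

# 3. Remove the snapshot and let snap trim run with nothing else going
#    on, watching OSD CPU and the trim duration in both runs (e.g. via
#    top and `ceph tell osd.* perf dump`).
ioctx.remove_snap('bench-snap')

ioctx.close()
cluster.shutdown()
```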
A
So it kind of seems to me, based on what you just said, that it might be a good idea to write a benchmark that's really specifically tailored to do this, and then maybe some kind of more generic one. I know the ODF team at Red Hat has run benchmarks where they're looking at the impact of taking snapshots during a background workload, but they were basically DDoSing the OSD with snapshots. So I don't know if that's really what we want, but it seems like you could write a targeted test that maybe would give you an idea of how your code is impacting things. Gabi, what would you think about something like the gtest suite, just making something that really specifically targets this behavior?
A
Yeah, well, and the fact that the overhead increases over time... I mean, on one hand, sure, you could expect that in RocksDB things would slow down; maybe there's some fragmentation involved at the BlueFS layer.
A
You know, there are maybe some explanations for why it would get slower, but understanding that exact behavior would be good. I mean, it seems like Paul's making some kind of slow progress on that now; he's starting to figure things out, in the last email I saw. So maybe it's just going to take some time to understand that.
A
But yeah, in terms of advice, I don't have anything real specific on this, other than: try crafting something that you think would showcase the difference between what you wrote and what's already there, and in the process of doing that you'll probably figure out what makes sense or doesn't make sense. Certainly profile as you go, right, try to analyze why things are degrading over time. I think that will tell us a lot.
A
Sure, okay, sounds good. All right, one other thing I forgot to mention to the group: there's a chance that we may get funding for replacing the smithi nodes.
A
And there's a real tight deadline on this; it sounds like we need to have quotes submitted by tomorrow. I worked with David yesterday, David Galloway, to try to nail down potential configurations that might let us keep either systems or VMs that look a lot like the smithi nodes. The biggest concern, I guess, is power usage: in the community lab there is no available additional power.
A
And that's not per rack; that's the entire data center. So our hands are a little bit tied in terms of what we can do, but the current iteration of the design of these replacement nodes is very much based on trying to maximize power efficiency.
A
If anyone cares about this or has opinions on it, you know, please let me know ASAP, because we're trying to get this nailed down today so we can submit tomorrow. But we're running out of time very quickly on this.
A
So just let me know if you care. The current iteration of this hardware is basically 1U servers with, I think, 32-core, 64-thread AMD processors and a bunch of NVMe drives, which we'll try to split out into something like eight VMs per node. That looked like it gave us kind of the maximum speed for the lowest power envelope. I doubt that we'll be able to do a straight one-system-to-one-system swap with smithi, since they will take more power than the smithi nodes, but maybe we can do somewhere between half and three quarters of the same node count, with a lot of VMs per node. That's kind of what I'm hoping we'll be able to do, and I don't know if each VM will be quite as fast as a smithi, but I'm hoping it'll be pretty close, with a big increase in the count.
A
So anyway, that's kind of the current rough plan on this. Yeah, let me know if you care; that's basically it for that.
A
Yeah, Sam, that was kind of what I was thinking too. If we could get even close to the current test throughput with smithi, but then have more nodes available, or more VMs available, so that people aren't waiting in queues as long, that would probably be a win.
A
The current jobs, I think, are still spending a fair amount of time waiting on things like compression of logs or other random weird stuff that takes a long time on a single thread or single core. So increasing the number of VMs, even at a slight decrease in performance per VM, is probably a big win.
A
So anyway, that's basically it for that, unless anyone else has thoughts or questions on it.
A
All right then, does anyone else have anything they want to discuss this week?
A
All right, well then, have a great week, everyone, and see you next week.