From YouTube: Ceph Performance Meeting 2023-02-09
Description
Join us weekly for the Ceph Performance meeting: https://ceph.io/en/community/meetups
Ceph website: https://ceph.io
Ceph blog: https://ceph.io/en/news/blog/
Contribute to Ceph: https://ceph.io/en/developers/contrib...
What is Ceph: https://ceph.io/en/discover/
A
I have a feeling that we're probably not going to get them for at least another five or six minutes, so let's just get started on stuff.
A
All right. I have two new PRs this week that I saw. Igor, tell me about this allocator format version 2 PR from you. It looks huge.
B
It potentially might grow with fragmentation, especially for high-volume, heavily fragmented drives, and it can also overcome the four-digit-gigabyte maximum file size for BlueFS, which actually requires around 256 million allocation units.
B
With one bit per allocation unit, it uses more space on almost empty disks, but if fragmentation grows the space usage remains fixed: the same tradeoff as we had with StupidAllocator versus the bitmap one. Not to mention a bunch of cleanup and some performance improvements around this stuff.
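The space tradeoff being described can be sketched with some back-of-the-envelope Python. The sizes and encodings here are illustrative assumptions, not the actual BlueStore formats: an extent-list encoding of free space grows with fragmentation, while a one-bit-per-allocation-unit bitmap stays fixed.

```python
# Hypothetical sizing model (not Ceph code): compare the serialized size
# of a free-space map stored as an extent list (offset/length pairs that
# multiply with fragmentation) versus one bit per allocation unit.

def extent_list_bytes(num_free_extents, bytes_per_extent=16):
    # e.g. an 8-byte offset plus an 8-byte length per free extent
    return num_free_extents * bytes_per_extent

def bitmap_bytes(capacity_bytes, alloc_unit=4096):
    # one bit per allocation unit, independent of fragmentation
    return capacity_bytes // alloc_unit // 8

disk = 8 * 1024**4  # an assumed 8 TiB device

# Nearly empty disk: a handful of free extents, so the extent list wins.
assert extent_list_bytes(10) < bitmap_bytes(disk)

# Heavily fragmented disk (200 million free chunks, as in the unit test
# mentioned above): the extent list balloons, the bitmap stays fixed.
assert extent_list_bytes(200_000_000) > bitmap_bytes(disk)

print(extent_list_bytes(200_000_000), bitmap_bytes(disk))
```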
A
So it might be timely. Adam and I were just talking earlier this week about trying to revisit how the hybrid allocator works, maybe doing it a little differently than the current method, maybe something that combines trees and bitmaps in a little bit different way than we currently do it.
B
Yeah, so, well, generally this thing works like it dumps...
B
Well, it actually depends on the size of the allocation map, actually on the amount of free space chunks. So if it's large enough, then it provides some benefit.
B
There is a unit test attached which one can play with, and you can specify the amount of free extents in the allocator to save and restore. With around 200 million of these chunks, the difference is that the original implementation runs in 16 seconds, well, at least on my machine, versus the new one running in nine or so.
A
So it's a big, big pull request. We'll have to... I don't know if Adam maybe is the right person to review it. I mean, I can try, but this is a lot.
B
Well, yeah, you could put some optimization...
A
I figure Adam's probably the one that it will fall to to review. But, you know, if I can round up time, I'll try to take a look. I know I have another one that you've made where I said if Adam couldn't look at it, I'd try to, and that was like a week or two ago.
A
Yeah, absolutely, we should see if we can rope him into doing a review.
A
Cool, all right, let's see, next PR here. You contacted me earlier about this one. Do you want to talk about your pull request?
A
I'm not sure if it was maybe just on my mind or not. Was it...?
F
Looking at the PR, I'm just curious: does anybody know why the image context needs a copy of the config proxy? It's storing a CephContext, so it could just get the config through the pointer.
G
Okay, I think that maybe I can help a little bit, okay, but at any moment you can interrupt me, okay. So, what this whole request is about: basically, we need to provide some kind of configuration of custom labels for Prometheus metrics. Okay, when the user introduces custom labels on RBD images, we have the problem that recovering these images' labels, sorry, takes a long time.
G
Okay, basically, what we have done is to profile the different calls in the retrieval of the images, and what we have found is that the current image context, the ImageCtx, is taking a lot of time to be constructed for each image. Okay, so basically what we have done is to replace this big context, for some kinds of internal users, with a proxy that is getting a reference to the big one. Okay, and that is much cheaper to manage.
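The refactor being described, replacing a full per-image copy with a proxy holding a reference, can be sketched in plain Python. Class and option names here are hypothetical, not the actual librbd types:

```python
import copy

class SharedConfig:
    """Stands in for the process-wide configuration (many options)."""
    def __init__(self):
        self.options = {f"opt_{i}": i for i in range(1000)}

class HeavyImageConfig:
    """The expensive approach: every image deep-copies the whole config."""
    def __init__(self, shared):
        self.options = copy.deepcopy(shared.options)  # O(#options) per image

class ImageConfigProxy:
    """The cheap approach: reference the shared config, store only overrides."""
    def __init__(self, shared):
        self._shared = shared      # just a reference, O(1) to create
        self._overrides = {}       # per-image metadata overrides

    def set_override(self, key, value):
        self._overrides[key] = value

    def get(self, key):
        # A per-image override wins; otherwise fall through to shared config.
        return self._overrides.get(key, self._shared.options[key])

shared = SharedConfig()

# The heavy path duplicates everything, even for read-mostly users.
heavy = HeavyImageConfig(shared)
assert heavy.options == shared.options and heavy.options is not shared.options

# The proxy path keeps one shared copy and still honors per-image overrides.
proxy = ImageConfigProxy(shared)
proxy.set_override("opt_1", 99)
assert proxy.get("opt_1") == 99   # overridden per image
assert proxy.get("opt_2") == 2    # falls through to the shared value
```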
A
All right, cool. I don't think we have any of the RBD folks here today, well, that I can see, so we'll want to get their input too, but sounds good.
H
I'm just catching up here, but I think you're talking about having a per-image-context copy of the configuration. I think one reason that's done is to be able to override configuration with the image-specific metadata.
H
So there are ways to specify per-image configurations that RBD uses, and I think that's why it's getting that copy there.
H
Oh, it wasn't the question? Okay. So, Stephen, I think the reason why it was making copies was that we have per-image configuration: you can store extra metadata on an image that overrides some of the configuration options.
H
It's been some time since I looked at this, so that was before the context was turned into a whole ConfigProxy object as well. It may have been simpler before that point.
H
I'm
I'm
just
saying
that
the
I
I
think
the
behavior
here
might
have
changed
a
bit
because
when
it
was,
this
was
originally
implemented
with
RBD
config
proxy
didn't
exist,
but
then
it
was
changed
later
so
that
maybe
but
the
reason
we
didn't
notice
this
before,
like
the
the
performance
piece
that
you're
looking
at.
A
All right, well then, let's see, moving on. I did not see any closed pull requests this week. If I missed anything, let me know.
A
Otherwise
updated
this
week,
Igor
looks
like
Adam
reviewed
your
avoiding
using
whole
space
iterators
for
prefixed
access,
PR.
A
Ide
and
looks
like
he
actually
approved
it
and
just
had
one
comment:
I
think.
A
So
yeah,
hopefully
we'll
get
that
in
soon
this
rock
Stevie
one
we
should.
We
should
figure
this
out.
It
should
not
be
a
hard
problem,
but
for
whatever
reason,
there's
a
hang
up
here.
A
I
asked
radic.
If
you
could
take
a
look
at
it
earlier
this
week,
so
I'll
go
I'll
come
again
just
to
see
since
he's
updated
or
actually
be
in
the
past,
he
should
be
able
to
help
out.
If
not-
and
we
can
we
can
do
it
ourselves.
Up
like
this
is,
should
be
super
hard
to
get
done,
but
we
we
do
really
want
to
get
this
in
for
Reef,
see
next
that
rocksdb
iterator
bounds
for
blue
star
collection
list.
A
Adam found some bugs, so he requested some fixes for that, but otherwise I think people are liking it, so it looks good. Finally, there's an older PR here. This is a really simple PR: it's just disabling busy polling in QAT. This is from someone at Intel.
A
Kefu
had
marked
himself
to
review
it
like
a
month
or
two
ago,
I
think-
or
maybe
it's
even
longer,
I
wonder
if
we
should
just
merge
this.
It's
it's
a
really
simple
change:
qat's
Intel,
stuff
they're
recommending
that
we
disable
busy
polling.
Does
anyone
have
a
strong
opinion.
A
All
right,
if
I'm,
not
hearing
a
certain
opinion
on
this
I
think
I
might
just
I
might
just
merge
this.
It's
they.
They
give
performance
results
that
made
it
look
like
it's
an
improvement,
a
kind
of
trust,
their
judgment
on
it,
since
the
this
is
their
technology.
So,
oh,
oh
I'll,
probably
just
merge
this
one
other
than
that
I
didn't
just.
F
A
quick
question
about
qat
in
general:
do
you
know
if
there's
been
any
discussions
about
being
able
to
test
that
stuff
in
in
our
own
infrastructure,
because
it's
it's
hard
to
review
and
support
that
stuff
without
being
able
to
run
it.
F
No, but they have been doing some other QAT stuff that relates to RGW's compression, okay.
A
Yeah
I
completely
understand
I
mean.
Can
we.
C
Now, I think the functionality you got from the add-in cards got moved into the CPUs on the die, you know, on the socket, but I don't know what generation. I know the CPUs we have are one or two generations old, so I'm not sure if they have it, but they might well have the functionality built in without needing the add-in card.
A
I
suppose
our
our
homework
is
to
figure
that
out
and
then
theoretically
Casey,
if
they
support
it,
I
suppose
you
guys
have
one
of
those
right.
A
Yeah
that
might
that
might
be
a
path
forward
if
those,
if
they'll
support
it
now,
let
me
know
I
can
try
to
help
help
figure
it
out.
F
I'm
gonna
link
the
qat
AR
PR
that
they've
been
working
on
that
touches.
Rgw
put
it
in.
A
All right. For "no movement", I don't think there's anything interesting in this right now.
A
All right, I think that's it for PRs. Anything I missed, anybody?
A
All
right
for
discussion,
topics,
I,
don't
think
Corey
or
David
Orman
are
here,
but
I'll
just
give
a
quick
update
since
they
they
said.
I
could
share
some
of
the
things
that
they're
seeing
their
their
cluster
is
doing
really
really
well.
After
a
couple
of
things.
A
They
they
applied,
the
pr
that
I
had
for
basically
compacting
an
iteration
when
tombstones
were
encountered,
and
that
was
was
huge,
I
guess
for
them
they're,
seeing
a
dramatic
reduction
in
in
how
much
time
is
spent
in
iteration
and
that
allowed
them
to
remove
their
TTL
attempt
to
live
optimization
that
they
put
in
place
to
deal
with
this
in
the
past,
which
made
other
things
much
better
and
on
top
of
that,
they
they
applied
lz4
compression
to
roxdb
and
that's
been
a
huge
win
for
them
for
space
amplification.
A
So
all
these
things,
combined
together,
they're
seeing
dramatically
higher
performance
and
lower
disk
utilization
cluster.
So
this
they
talked
about
it
last
week,
a
little
bit
but
they're
continuing
to
see
a
lot
of
really
good
behavior
over
this
past
week
with
it
so
yeah
really
really
good
I'm,
hoping
that
that
reef
is
going
to
be
a
really
really
good
release
for
a
lot
of
people
and
that
that
was
all
I
had.
But
I
I
wanted
to
ask
Joshua.
If,
if
you
guys
have
any
update
on
on
your
stuff.
E
Yeah, I can briefly talk about our findings in our staging cluster. So after we met last week, there were three different paths to try. This is referring to the write amplification we've been witnessing in Pacific, so tracker issue 58530, not that anybody can get to the tracker right now, it looks like it's overloaded at the moment. Three different options. The first one we tried, per your suggestion, and this was just out of curiosity, because there's no way we're actually going to apply it in prod.
E
You
have
to
re-roll
all
your
osds
is
changing.
The
blue
FS
share,
download
sized
one
Meg
that
seems
to
work
as
expected,
reduces
the
inode
size,
because
the
extent
list
is
smaller
because
you're
doing
one
Mega
allocations
instead
of
64k
Mark's
suggestion
was
try
the
new
rocksdb
tunings
from
Maine
and
so
I
applied
that
to
the
system.
It
had
the
same
effect
in
this
case.
It's
because
the
wall
is
kept
smaller
and
so
the
eye
node
just
never
gets
that
big
again.
The
extent
list
is
shorter
because
the
wall
is
smaller.
E
That
was
positive.
We
we
have
an
internal
item
to
go
and
evaluate
those
settings
in
like
against
our
performance,
metrics
and
benchmarks,
and
that
sort
of
thing
and
then
we'll
see
if
we
want
to
actually
start
rolling
those
more
widely
in
our
infrastructure
ahead
of
that
quarter.
Reef
but
we'll
see,
and
then
finally,
it
suggested
that
the
bluefest
incremental
log
update
patch
that
landed
in
16-11
could
fix
this
as
well.
And
it
does
seem
to
be
the
case
too.
So
really
like
any
of
those
any
of
those
options.
E
...helps. I mean, and they're all improving things completely differently: either by keeping the WAL smaller, or by making inode size increases less expensive on the log, because we aren't rewriting the entire inode every single time, or by keeping the inode itself smaller. I did not then try these in combination, because, like, it would be interesting to see what happens if we keep the WAL smaller and also have the incremental inode update mode, but it's going down to, I...
A
That
point
so
yeah
it's
it's
kind
of
funny,
I
feel
like
we.
We
get
these
problems
and
then
we
kind
of
like
Nuke
them
from
orbit
from
like
four
directions
at
once.
Yeah
well.
E
And
the
thing
is
like
I,
think,
the
combination
of
the
the
roxdb
settings
and
then
the
incremental
blue
FS
updates
is
a
valid
thing
right,
because
the
fact
the
fact
that
wall
is
still
getting
so
big
is
I
mean
even
if,
in
the
steady
state
it's
not
showing
problems,
it's
bound
to
cause
some
sort
of
delay
somewhere
like
a
startup
delay
or
something
right.
So.
A
Yeah
yeah
and
it's
it
it.
It
took
a
long
time
for
us
to
figure
out
how
to
avoid
having
like
crazy,
Ray
amplification
in
rocksdb,
without
keeping
those
big.
You
know
with
the
way
that
we
do
PG
log
updates
it
just
it.
Yeah
it
and
I
have
to
give
credit
I
think
it
was
to
either
Intel
or
Micron.
That
came
up
with
like
seemingly
well
working
tunings
that
let
you
keep
it
smaller,
but
no
one
understood
why
so
yeah
it's
every
time.
E
Right, yeah. My naive sense is it's probably because, at level one or bigger, stuff just gets deleted at those levels internally. Is that why? Like, my understanding is, if you keep the WAL big, stuff is getting added but deleted within the WAL, so it just never gets actually committed to level zero. Yeah.
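That intuition can be modeled with a toy LSM front end (illustrative Python, not RocksDB): with a large memtable/WAL, a key that is put and then deleted before any flush never reaches level zero at all; with a small one, the put is flushed and the delete later lands as a tombstone for compaction to clean up.

```python
def run_workload(ops, memtable_limit):
    """Toy LSM front end: apply put/delete ops, flushing the memtable
    to 'level 0' whenever it holds memtable_limit entries."""
    memtable, level0 = {}, []
    for op, key in ops:
        memtable[key] = "v" if op == "put" else None  # None = tombstone
        if len(memtable) >= memtable_limit:
            level0.append(dict(memtable))
            memtable.clear()
    return memtable, level0

# Short-lived entries, e.g. pg-log-like keys: put, then delete soon after.
ops = [("put", "k1"), ("put", "k2"), ("del", "k1"), ("del", "k2")]

# Big memtable/WAL: the deletes cancel the puts in place, so nothing is
# ever committed to level 0.
mem, l0 = run_workload(ops, memtable_limit=10)
assert l0 == [] and set(mem.values()) == {None}

# Small memtable/WAL: the puts get flushed, and the deletes have to be
# flushed later as tombstones for compaction to chase.
mem, l0 = run_workload(ops, memtable_limit=2)
assert len(l0) == 2
```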
E
Hey, yeah, so that's our update. I am, like, literally right now evaluating the 16.2.11 patch list for anything that might concern us for upgrading in our environment. So, I mean, I won't know what this looks like in prod probably for another week or two. At that point I can, like, finally comment back on the ticket and say yes or no, this actually fixed what we're observing at the production level.
E
To observe the difference separately? Yes, yes, yeah. So I would like to, but that's going to come after. Okay.
A
We
have
a
decision
to
make
for
Reef
whether
or
not
we
blanket
tune
everything
to
the
new
tunings,
including
old
clusters,
so
that
they
automatically
start
using
those
new
tunings
when
they
run
out
new
SSD
files
and
and
use
the
wall
or,
if
we
like
kind
of
Flinch
and
and
make
it
so
that
only
new
clusters
that
are
deployed
use.
Those
tunings
and
old
clusters
continue
to
use
the
the
existing
tunings.
E
Yeah
I
mean
looking
at
the
tunings
and
especially
based
off
of
what
the
11
11
folks
were
talking
about
last
week.
I
wonder
if
it's
actually
worth
dropping
the
TTL
Edition,
yes
default,
tunings
yeah,
yes
other
than
that
I
mean
yeah.
I
I
can
understand
it's
kind
of
hard
to
say,
but
it
wouldn't
like
I
mean
it's
not
the
first
time
that
a
major
Ross,
TV,
Behavior
change,
has
landed.
I
mean
we've
upgraded
rocks
to
be
across
major
versions
right,
yeah.
A
You could always change it back; it shouldn't be a problem. And the SST files are going to look different: they're going to be sized differently and they're going to behave a little bit differently. But, you know, this can be a gradual thing. It's not like a, you know, incompatible data format or something, right? Yeah.
E
Exactly, yeah. It's like, I think, again, evaluating the changes, my personal concern, and we won't know this until it's rolled more widely, would be: is level zero, level one, big enough, or is some configuration going to exceed it and then also have spillover that causes write-out, right?
A
All right, that's all I had, guys. Was there anything else anyone wanted to bring up this week?
A
Well, all right then, thank you all for coming. It was a good talk, and we'll meet again next week. See you guys.