From YouTube: Ceph Performance Meeting 2022-01-13
A: All right, so this week has been a busy week. No new PRs that I saw, but a whole lot of closed PRs, and updated PRs for that matter. So everybody's trying to get stuff in for Quincy. Let's go through the list.
First PR, RGW zipper: this is a bug fix PR. We noticed that RGW in master was very, very slow in some age binning tests I was running for the age binning PR that also just merged. It turns out that was due to an older PR that was mostly cosmetic, but did have a change that ended up loading stats for every bucket load in RGW. It turned out to be a fairly easy fix, so we got it in quick, which is good because we won't have that issue going into Quincy now. The fix basically returned us to previous performance levels, so that was really good.
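A minimal sketch of the kind of bug described here, assuming the problem was eagerly loading stats on every bucket load. The class and method names below are invented for illustration; this is not the actual RGW zipper code:

```python
# Hypothetical sketch of the regression: loading stats on every bucket
# load vs. fetching them lazily. Names are invented, not RGW zipper code.
class Bucket:
    def __init__(self, name, backend):
        self.name = name
        self._backend = backend
        self._stats = None  # not fetched yet

    def load(self):
        # The regression did the equivalent of also calling
        # self._backend.read_stats() here, on *every* bucket load.
        self._backend.read_metadata(self.name)

    @property
    def stats(self):
        # The fix: only pay for the stats query when stats are asked for.
        if self._stats is None:
            self._stats = self._backend.read_stats(self.name)
        return self._stats
```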
A: The TTL cache implementation PR merged. I didn't see a whole lot of further discussion on it; I think it just needed to get through some testing. Josh Solomon's primary balancer PR merged. Laura, I think you merged that? Anything new to report on that one?
B: Nothing new. I guess the main thing is, we decided the original PR had a documentation commit on it too, where Josh had written a new document in the developer guide that explains the primary balancer feature. But since that part hasn't been implemented yet, we moved that over to a different PR that will be merged after Quincy branches off. The refactoring code was merged.
A: Anything new on that one? I think we saw some performance advantage with it, but I have trouble remembering.
C: Yes, there were performance advantages, but none of the tests we are running as a quick performance check are showing that, so I'm waiting to see them in more elaborate Quincy testing.
A: Cool, cool. Let's see, next: the cache age binning, also a multi-year PR that finally merged. Like Adam's tests for the fine-grained locking, we don't actually see a huge performance gain with this, which was very disappointing; in fact, though, the rados bench perf test that we run through Jenkins actually did show a little improvement with it, which was nice. So the bigger benefits we get from this are more fine-grained control over the caches in BlueStore, and also much better information regarding the relative ages of the items in our different caches, which we present through the performance counters. So overall, I think it's still a good win, just not the big win that I was hoping for.
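Roughly, age binning groups cache items into coarse age buckets so relative ages can be tracked and reported cheaply. A toy sketch of the idea, with invented bin boundaries and names; this is not BlueStore's actual implementation:

```python
import time
from collections import defaultdict

# Toy age binning: instead of tracking exact per-item ages, items are
# grouped into coarse age bins, which makes "how old is what's in this
# cache?" cheap to report via perf counters.
BIN_EDGES = [1, 5, 30, 120, 600]  # seconds; invented bin boundaries


def age_bin(inserted_at, now=None):
    """Return the index of the age bin an item falls into."""
    age = (now or time.time()) - inserted_at
    for i, edge in enumerate(BIN_EDGES):
        if age < edge:
            return i
    return len(BIN_EDGES)  # oldest bin


def bin_histogram(cache):
    """cache: dict of key -> insertion timestamp. Returns items per bin."""
    now = time.time()
    hist = defaultdict(int)
    for ts in cache.values():
        hist[age_bin(ts, now)] += 1
    return dict(hist)
```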
A: Let's see, next: Igor's PR on handling onode pinning and shard trimming, and also Adam's PR about refactoring the onode reference counter and pinning. Both of those were closed by the stalebot. So these are two different paths we could have gone down. I know there have been some other pinning code changes recently.
C: Yeah, they are outdated and it's fine, because we have a new one in the updated category, so they are clearly closed and they will remain closed forever now.
A: Excellent, all right. So we've got a path for that, which is very good. Then the other one that was closed by the stalebot: auto-tuning of the MDS cache memory based on RSS usage. I have not seen updates to that one in a very, very long time.
A: Personally, my feeling on this is that we should be using the priority cache for this anyway. The problem you hit when you try to use RSS memory to judge how much memory to use for caches is that it can result in very, very nasty swings that you really don't want to deal with. The priority cache kind of gets around that.
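A toy model of that failure mode, under the assumption that the swings come from the lag between trimming a cache and RSS actually dropping. All names and numbers below are invented for illustration:

```python
# Toy model: RSS reflects the allocator's view of *past* decisions (freed
# pages are returned late), so a naive RSS feedback loop keeps over- and
# under-correcting. A priority-cache style approach instead divides a
# fixed memory target among caches, with no RSS in the loop.

def rss_driven_resize(cache_size, rss, rss_limit, step=0.25):
    # Naive feedback: grow when RSS looks low, shrink when it looks high.
    # The lag between cache trims and RSS dropping causes big swings.
    if rss > rss_limit:
        return cache_size * (1 - step)
    return cache_size * (1 + step)


def priority_cache_assign(memory_target, requests):
    """requests: list of (priority, wanted_bytes); lower number means
    higher priority. Hand out the fixed budget in priority order."""
    remaining = memory_target
    grants = []
    for _prio, wanted in sorted(requests, key=lambda r: r[0]):
        grant = min(wanted, remaining)
        grants.append(grant)
        remaining -= grant
    return grants
```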
A: Okay, next, updated PRs. Use thread-local pointer variables to save the shard pointer: that was approved by Ronen. So, Ronen, are you here? I don't see Ronen today. All right, I didn't look closely at that, so I guess Ronen was happy with it. Next: make shared blob fsck much less RAM-greedy. Igor is not here. Adam, you looked at it?
C: I looked at it a little bit. I think, assuming it tests out fine, we should just get it in.
C: There was a question of hashing, and of using two separate bucket lists for hashing. It's just a matter of performance and of the granularity of checking and verification, which I waved through because it will work as it is. It's just an improvement thing, so, not thinking it was that important, I just accepted it and left that discussion for the future.
C: Basically, osd_memory_target takes care of our caches, and the data structure — the bitmap data structure for shared blobs — is a fixed allocation of fixed size, so it really stays within the memory target. I'm pretty satisfied with that. Of course, we might pay for that with extensive unneeded blob rebuilding, but that's what we have to do; there is no other way around it.
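A sketch of that trade-off, assuming the fixed structure works roughly like a one-bit-per-slot table where hash collisions can force redundant rebuilds. This is purely illustrative, not the actual BlueStore fsck code:

```python
# Track shared blobs during fsck with a fixed-size bitmap instead of an
# unbounded per-blob table. Collisions are possible, so "maybe seen" can
# trigger an unneeded blob rebuild, but memory use stays constant.
class FixedBitmap:
    def __init__(self, nbits):
        self.nbits = nbits
        self.bits = bytearray(nbits // 8 + 1)  # fixed allocation up front

    def _slot(self, blob_id):
        return hash(blob_id) % self.nbits      # many blobs may share a slot

    def mark(self, blob_id):
        slot = self._slot(blob_id)
        self.bits[slot // 8] |= 1 << (slot % 8)

    def maybe_seen(self, blob_id):
        # False positives possible (collision), false negatives are not.
        slot = self._slot(blob_id)
        return bool(self.bits[slot // 8] & (1 << (slot % 8)))
```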
C: So if there are no errors, there is no need to give it more memory — I would even say the less memory, the faster it will work. But if you had a very broken state, with a lot of shared blobs requiring repair, then having more memory would help.
A: Yeah, that was one of the reasons I was advocating for using the priority cache for this, because we could then basically inform fsck of up to how much memory it should try to target. But it's a minor point. You know, let's get this merged and then we can hash that kind of thing out later.
A: Skip onode with caps iteration for empty directories — oh no, sorry, this is MDS. I got myself way off there. Where am I? Okay, I think, next... did we talk about — oh, this is the pinning logic. This is the other pinning logic PR from you, Adam. Yes?
C: I mean, that's actually both our work, but in my PR this time, and that's why the other solutions were closed and will stay closed. Okay, okay — it's a simplification after we merged Igor's fix to onode pinning, so now we could make some more simplifications.
A: Nice, nice. Yeah, I am very happy with what you guys came up with compared to what we were doing earlier, especially the one I tried to do. This is excellent.

Okay, next: Radek's PR to introduce huge-page-based read buffers. Is Radek here? No, I don't think so. Igor reviewed that and I think he approved it, so that looks good. Next: optimize PG removal, by Igor, which previously was failing tests. It's had more reviews and discussion and updates.
A: Yeah, yeah. I understand it did have an approval by someone early on, I think, but it would probably need someone in addition to approve it. Adam, are you planning to look at that?
A: Last one: the test/objectstore store_test PR. Oh, this is my old omap thing. Neha just marked this as not stale, because we do still want it in some form — probably not in the format it's in, though. Basically, these tests take a while, and we probably don't want to make store_test take that long. Also, we can't easily change the parameters of the tests that way with the gtest suite.
A: So maybe this becomes its own separate benchmark or something, and maybe we don't tie it directly to the ObjectStore but try to go through an actual OSD. But anyway, those are big changes — lots of work, not going to happen for Quincy.
A: So I think that's all I had in the updated categories. Did I miss anything from anybody?
A: Nothing anyone can think of? Okay, sounds good, then moving on. The only real discussion topic I wanted to bring up this week: we've talked previously a little bit about Quincy performance tests — Neha brought it up, I think, last week. I think for NVMe tests I can probably take this on; we've got quite a few templates for different tests that we want to run. So my thought is: let's use Mako for this.
A: We can actually do a fairly decent-sized cluster on that hardware for testing. We've already got some fairly straightforward fio tests that we can run against RBD and CephFS, and possibly also look at tcmu-runner and NBD — so, basically, iSCSI and NBD. We'll see if that still works; we did have it working at one point through CBT, so theoretically it may still. And then hsbench for RGW. You know, cosbench is kind of bit-rotting and has been for some time.
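For reference, one of those straightforward fio tests might be driven roughly like this. The fio rbd ioengine and its options are real, but the pool and image names and job parameters here are made up, and CBT's actual wrapper does considerably more:

```python
import subprocess

# Rough sketch of an fio run against RBD, similar in spirit to what the
# CBT fio wrapper drives. Pool/image names are hypothetical.
cmd = [
    "fio",
    "--name=rbd-randwrite",
    "--ioengine=rbd",
    "--pool=rbd",             # hypothetical pool
    "--rbdname=bench-image",  # hypothetical image
    "--rw=randwrite",
    "--bs=4k",
    "--iodepth=32",
    "--runtime=300",
    "--time_based",
]
subprocess.run(cmd, check=True)
```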
A: So that's by far the easier route to go. We don't have any real tooling yet for IO500 MDS performance. I have some stuff to make that sort of automated, but it's not real straightforward, and the results we get from it are usually really chaotic for a variety of different reasons. We'd probably have to run a lot of IO500 tests to see what kind of trends we'd see between Pacific and Quincy.
A: So I'm thinking probably we should just leave that out; it's also very time-intensive. Oh, omap bench — it might not be a bad idea to run it. It doesn't take too long, and there have been some fairly significant differences we've seen between releases in the past, so it might be worth looking at. And then they have said that the DFG team can do their own hard drive tests, so I'm thinking let's let them do that.
A: I also really like having multiple people running tests, because sometimes it shows things that one group overlooks and the other group happens to catch. So, what else? Neha, you brought this up — how does this sound?
A: Cool. The good news is that I've been doing a lot of testing as part of the age binning PR, and I'm not seeing anything right now on the RGW or the RBD side that shows any kind of major regression versus Pacific, based on previous numbers I've gotten. So I don't think we're going to see any real surprises for kind of the common-case workloads. It's possible we could still see some things in other areas, but at least based on what we've got in master.
D: That's good to know. I'm just curious — for the tests that you plan to run, how many OSDs are you using?
A: What I've liked to do in the past, when I have the resources to do it, is both kind of single-OSD tests, because you're able to push an individual OSD much more aggressively that way, and then also a big cluster test. On Mako, if we target just a single OSD per NVMe drive, we can do 60 OSDs. Otherwise, if we go with, like, four, we could do 240.
A: So, you know, I'm open either way. If you want to just try to go for, like, the maximum number of OSDs, we could probably do 240 — memory gets a little tight, but we can do it. Otherwise, we could give the OSDs a very, very comfortable amount of memory by just using one OSD per drive.
D: Yeah, in any case, I think we can go on the safer side and give the OSDs a good amount of memory.
A: Yeah, we could also very easily do two of these per NVMe drive, and then you could easily have four gigs for each one and plenty of space for other daemons and clients at the same time. That would be like 120 OSDs. Yeah, that sounds good.
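The sizing arithmetic here, spelled out, assuming Mako has 60 NVMe drives total (which is what the 120 and 240 figures imply):

```python
# Assumed: 60 NVMe drives total, implied by the 120 and 240 OSD figures.
drives = 60
for osds_per_drive in (1, 2, 4):
    print(osds_per_drive, "OSD(s) per drive ->",
          drives * osds_per_drive, "OSDs")
# -> 60, 120, 240 OSDs. With 2 OSDs per drive at a 4 GiB osd_memory_target
#    each, the OSDs alone budget 120 * 4 GiB = 480 GiB across the cluster.
```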
D: So you're going to be using CBT. I was just wondering, maybe if we have time we could do the recovery testing as well. I know there were some improvements that Sridhar made to the recovery tests, so maybe we can do that as well.
A: Yeah, yeah, certainly we could. I'm not sure — I haven't looked closely at the changes that they made beyond, you know, reviewing that. So the one, I think—
E: Yeah, can you hear me now? Yes? Yeah, there have been quite a few call drops on my end, sorry about that. So, as far as the new recovery test in CBT goes: it essentially creates a couple of pools, one pool dedicated to the client and the other pool dedicated to recovery-specific operations. The way the test works is to basically populate the client pool with a bunch of objects.
E: All this time, the client I/Os are going on. That way, we measure how the recovery proceeds against the client I/Os that are running in parallel, and the test basically collects all the stats related to client IOPS and the recovery items. That helps — and I've also written a simple tool to graph the statistics, the client IOPS and how the recovery proceeds to completion in that time frame. So, in a nutshell, that's what the test does.
E: In this case, the recovery pool is created much later compared to the client pool. The initial test already existed — I call it the blocking recovery test — and the other one that I created now essentially helps in creating background recovery.
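Putting the described flow together, a rough harness sketch. The cluster methods below are placeholders for whatever CBT actually calls; this illustrates the shape of the test, not the real implementation:

```python
import threading
import time

# Sketch of the two-pool recovery test flow: client I/O runs against one
# pool while recovery is induced via a second pool created later, and both
# client IOPS and recovery progress are sampled over time.
def run_recovery_test(cluster, duration_s=600, sample_s=5):
    client_pool = cluster.create_pool("client")
    cluster.populate(client_pool)                  # pre-fill with objects

    stop = threading.Event()
    io_thread = threading.Thread(target=cluster.run_client_io,
                                 args=(client_pool, stop))
    io_thread.start()                              # client I/O in parallel

    recovery_pool = cluster.create_pool("recovery")  # created much later
    cluster.trigger_recovery(recovery_pool)        # e.g. mark OSDs down

    samples, start = [], time.time()
    while time.time() - start < duration_s:
        samples.append({
            "t": time.time() - start,
            "client_iops": cluster.client_iops(),
            "recovery_ops": cluster.recovery_ops(),
        })
        time.sleep(sample_s)

    stop.set()
    io_thread.join()
    return samples  # graphed afterwards by a separate plotting tool
```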
A: Well, yeah, if you want to, we can. We'll just have to see how long it takes to get through all these others, but we could definitely try to see if we could get that in for Quincy too, if you want to try a much larger cluster.
D: Cool. I mean, ideally, creating a separate recovery pool and all that may be fine, but maybe we can also just have a regular setup, like an RGW cluster, and have the recovery test do some kind of bringing nodes down and capturing recovery stats, etc., in the existing cluster — instead of having to create a separate pool for recovery and all that stuff, which was mostly for QoS performance purposes.
A: Yeah, that's how I think the original ones worked. Basically, you pre-filled in some data — you know, you do maybe like an RGW put workload — and then at some point you have the recovery triggered, where it marks down a bunch of OSDs, and then CBT's monitoring the behavior while the cluster is healing.
D: Okay, I don't really recall what you used to do, but in general, I think it's a good time to revisit it.
E: Yeah, if I recall, the earlier test used a single RBD image, whereas the one that I have introduced essentially creates two separate images with two separate pools. That's the basic difference, I think. Okay.
A
I've,
never,
I
don't
think
I've
ever
tried
a
recovery
test
with
rgw.
If
I
remember
right,
the
the
aegis
bench
wrapper
supports
it
in
the
same
way
that
the
fio
wrapper
does,
but
I
don't
think
I've
ever
tried
using
it.
A: Cool. So, Neha, I want to ask you: what do you think about iSCSI, tcmu-runner, and NBD? We might be able to get them for free, essentially, other than the time it takes to run them. Previously we've done that through CBT and it's worked.
D: I would s— okay, I am biased. I would say lower, because I would really like to prioritize recovery because of the mClock changes — making sure recovery works fine, and we have the high-recovery profile with the mClock changes; I'll try that out as well. So that's my personal agenda, but yeah, as a global project, we can talk to Ilya and the RBD folks to figure out, you know, what's the priority in their minds.
A: Shoot, I'm blanking on his name — he left for Oracle a little while ago, on the RBD team.
A: But anyway, I don't think it's changed much in, like, the last year, just from talking to different people. We can test it, but I don't think it's probably going to look a whole lot different than it did a year or two ago, when we last looked at it.
A: He would probably know for sure, but that's the impression I get.
A: All right, well, I think we've got a plan then for Quincy testing here. That was the only topic I had for this week. I don't have anything else — does anyone have anything they want to talk about?
A: All right, well, if not, then — I know everyone's really busy trying to get last-minute PRs in here. So, yes, we'll wrap this up.