From YouTube: Ceph Performance Meeting 2021-05-27
A: All right, so this week I only saw one new performance-related PR. Maybe someone else has some other ones, but the only one I saw this week was related to the manager, to implement a TTL cache, and impressively there were actually some performance results and graphs included with it. That's always really nice to see, so it looks good. Well, I don't know if it really looks good; I only glanced at it. I'm hoping it looks good, because that will be good if it can be approved. I also did not see any closed performance-related PRs this week. I think everybody's getting ready for Memorial Day weekend in the US, so I have a feeling that may be related. But there are a couple of updated PRs.
A: The D3N cache changes for upstream RGW: it looks like after lots and lots of testing and updates this is passing tests. I don't see Casey on the call; he had responded to it. Maybe Adam, have you been tracking this at all? Well, it looks like maybe it's making good progress, so that's excellent. Hopefully that will be able to be merged.
A: So maybe exciting, we'll see. Next, Gabi is still working on removing allocations from RocksDB. He mentioned this morning in the core standup that he's hitting some kind of bug on his laptop, but not in the test cluster. It sounds annoying, and I think he's trying to track down exactly what's going on, but there's continued work on making sure that it is really stable before we merge it. For reference, though, that's a really big performance win on the write side.
A: The kv-sync thread in RocksDB, er, in BlueStore is quite busy again on very fast hardware, where we kind of go back and forth on whether or not it's a bottleneck. We make it better, and then we get faster hardware, and so on. This change primarily benefits us by reducing the amount of work that the kv-sync thread, well, that RocksDB has to do, and by virtue of that keeping the kv-sync thread less busy.
A: Okay, next, Adam's PR to set a new compression blob size, specifically to 64 KB. There was some discussion on that, and testing and things. At one point a while back I looked at it and thought it was good, so I approved it, and Igor just very recently, this week, looked at it and also approved it. I think we're all in agreement that this is a good idea, a good decision; it just needs to be tested and approved, and well, it has been approved.
A: I guess it just needs to pass tests and then we can get it in. Otherwise, no movement on a lot of things.
A: I owe people a new version of the WIP cache binning PR that's been rebased several times in several branches. It all more or less seems to work, but we need to get a real PR in place now that column family sharding has made it in. So at some point in the coming weeks or months I'll get back to that and actually update it, and hopefully we'll get it in. Let's see.
A: Related to the ISC 21 topic below, there is a PR for subtree map removal from the MDS journal, and I've talked to Patrick a little bit about that. I think he thinks it's too big to merge in one go; he wants to break it up into smaller pieces. But that, I think, is actually still a really critical PR, especially since it seems like some of the other work that changed the ephemeral pinning code to distribute dirfrags rather than subdirectories has not dramatically improved things.
A: So at some point I really want to focus on helping get that PR in. Otherwise, there's a number of other things here that are just sitting on the back burner. The onode pinning work from both Igor and Adam: we have to make a decision on that and figure out what we're going to merge, since they each have their own versions of that PR.
A: And yeah, just other random stuff; nothing real interesting to update for now. Oh good, folks have made it. So, I just went through the PRs. Gabi, I mentioned you earlier: you said this morning you're seeing an issue just on your laptop, but not in the test cluster, was that right?
A: All right, moving on, the only discussion topic I have for today is that Patrick had come to me a couple of weeks ago wanting to know if we were going to do another submission to the International Supercomputing Conference for CephFS. We did this last year for ISC 20. We didn't do badly; we did okay, though not nearly as good as we could have done, I think, if we could nail down a couple of things in CephFS. But we did okay, so the question was: do we do another one of these?
A: I don't have the full official cluster like we did last year; we've distributed those nodes to developers to work on. But there is a chance that we might be able to do another Amazon EC2 submission, or possibly, since it looks like David Galloway has made some progress in getting our new performance nodes set up, I might be able to do it on that cluster. In the meantime I still had a couple of the official nodes that I could use.
A: I did some smaller-scale testing, and specifically we wanted to know whether or not some of the changes that have been put in for ephemeral pinning, to distribute dirfrags rather than subdirectories, have improved the test results where we have multiple clients all writing files to a single directory. That has historically been a really tough use case for CephFS, and the thought was that this might do better.
A: So I ran some tests over the last week looking at this. The link is in that window and also in the etherpad, and unfortunately the results so far have been that, no, in fact we are not seeing better performance in that scenario with the new ephemeral pinning code. These are specifically the mdtest-hard results.
A: Oh no, I'm sorry, I must not have shown it properly.
A: So, yes, the columns E through H are the ones that we wanted to improve. On the easy test results we can do better, much better than we're doing now, but even in the last round of testing a year ago the ephemeral pinning helped fairly dramatically, as you're seeing here, as long as you have the distribution value set at 1.0 so that you're always doing it.
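As a side note, here is a minimal sketch of what "setting the value to 1.0" looks like in practice. It assumes a kernel-mounted CephFS at a hypothetical /mnt/cephfs and uses the documented ceph.dir.pin.random / ceph.dir.pin.distributed vxattrs; this is illustration only, not the exact commands used in the test.

    import os

    # Hypothetical mount point and test directory; adjust to your setup.
    test_dir = "/mnt/cephfs/mdtest"

    # Equivalent to: setfattr -n ceph.dir.pin.random -v 1.0 <dir>
    # A probability of 1.0 means every newly created subdirectory gets an
    # ephemeral random pin, i.e. the "always doing it" case described above.
    os.setxattr(test_dir, "ceph.dir.pin.random", b"1.0")

    # The distributed (dirfrag) variant discussed in the meeting would be:
    # os.setxattr(test_dir, "ceph.dir.pin.distributed", b"1")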
A: So as long as you're fully using the ephemeral random pinning, then for the easy tests we do better. Not perfect, and I'll get back to that later. But for the hard test, in fact, we're twice as fast for writes if we just pin everything to one MDS. If we pin that really hot directory, with lots of clients all accessing it and trying to write files out, just pinning it to a single MDS is like twice as fast for writes than trying to do active-active MDS balancing, no matter whether we're doing dynamic partitioning or random dentry or dirfrag distribution to the MDSes.
A: So it's not that; it's something else. Maybe it's journaling on the authoritative MDS, maybe it's client caps that are not being relinquished fast enough, or maybe it's the locking in the distributed cache. There's lots of nastiness this could be, but the MDSes are all very idle, the OSDs are all even more idle, and very little work is actually getting done. So that's unfortunate. With active-active distribution we do see better read performance, so it's very much related to the write path. It's not great.
A: The problem is that it's very clumpy. In this case we only have 96 clients, each of which has a directory that it's writing to, and we have 24 MDSes. If you think about the way random distributions work, that's going to result in a very uneven distribution: lots of clumpiness, lots of cases where one particular MDS might have more directories associated with it than others.
A: The spread is actually quite large, more than I think you'd expect if you haven't done a lot of work with this kind of thing; a uniform distribution would look much better. We've done tests like that in the past, a year ago, and saw that performance could double by having a uniform versus a random distribution in this kind of scenario.
A: So if we could figure out a way where, instead of doing a random distribution for ephemeral pinning, we did a uniform, round-robin distribution, with shared state between the MDSes that just iterated so that every new directory lands on a new MDS, I suspect we'd do much better in this test. Maybe not much better in reality, I don't know, but at least in this particular test case it would be far, far better.
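To make the clumpiness point concrete, here is a rough sketch in plain Python (not Ceph code, and the seed is arbitrary) comparing 96 directories randomly assigned across 24 MDS ranks with a shared round-robin counter.

    import random
    from collections import Counter

    N_DIRS, N_MDS = 96, 24
    random.seed(0)  # arbitrary seed, for a repeatable illustration

    # Random ephemeral pinning: each directory independently picks a rank.
    random_pins = Counter(random.randrange(N_MDS) for _ in range(N_DIRS))
    random_load = [random_pins.get(rank, 0) for rank in range(N_MDS)]

    # Round-robin with shared state: directories land on ranks in order.
    round_robin_load = [len(range(rank, N_DIRS, N_MDS)) for rank in range(N_MDS)]

    print("random:      min %d, max %d dirs per MDS" % (min(random_load), max(random_load)))
    print("round-robin: min %d, max %d dirs per MDS" % (min(round_robin_load), max(round_robin_load)))
    # Random assignment typically leaves some ranks nearly empty and others
    # with roughly twice the average load; round-robin gives exactly 4 each.

The catch, as noted above, is that round-robin needs shared, iterating state between the MDS ranks, which random pinning avoids.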
A: So I may look at trying to do something like that, we'll see. The gist of it is that there's a lot of low-hanging fruit here. We could probably double our scores in the easy test fairly easily, and if we can figure out why the write tests are so bad, maybe we can do better there too; we should do much better on both of these. So anyway, we've got work to do.
A: To get there, the first thing I want to do is wall-clock profiling of the MDS to really try to figure out what it is sitting around waiting on, though apparently across multiple MDSes in this case, lots of them. I think the other PR, the one that changes the way that we encode subtrees in the MDS journal, has some potential there.
A: I think that has a lot of potential to help if we're still seeing that as a big bottleneck, which we saw previously. Otherwise, CephFS is really complicated. The code, at least for me, and you know I'm not the best coder in the world, is kind of intense, so I'm hoping that maybe we can clean some of that up. We'll see.
A: All right, well then, that was the only topic I had. Josh or Adam or anyone else, is there anything you guys would like to talk about?
E: No, not really. I mean, the previous week and this week I'm basically trying to switch BlueFS from having one huge lock into smaller locks, which is not going very badly, but I'm getting very large changes in the code base and I actually have to redo the work to make it in small steps that we can verify, because I don't want to break BlueFS. So I don't really have anything to talk about yet.
A: Adam, I think that's actually really interesting, because one of the things when I was trying to look at sharding, like implementing sharding in BlueStore by kind of hijacking that whole transaction state machine, the big question was what happens at the BlueFS layer, right? So if you can do finer-grained locking there, maybe that helps.
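For illustration only, here is a minimal sketch of the general finer-grained locking idea being discussed, written in Python rather than the actual C++ BlueFS code and using hypothetical names; the real change may shard very differently.

    import threading

    class ShardedLocks:
        """Replace one coarse lock with a fixed pool of per-shard locks."""

        def __init__(self, num_shards=16):
            self._locks = [threading.Lock() for _ in range(num_shards)]

        def lock_for(self, key):
            # Map a key (e.g. a file or inode id) onto one shard's lock.
            return self._locks[hash(key) % len(self._locks)]

    locks = ShardedLocks()

    def write_path(file_id, do_write):
        # Only operations that hash to the same shard contend with each
        # other; everything else proceeds in parallel, unlike with a single
        # global lock protecting the whole structure.
        with locks.lock_for(file_id):
            do_write()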
F: Well, I'm a bit nervous when massive changes are going into BlueFS, and hence maybe we should consider a parallel implementation, at least for some time. So let's have BlueFS version one, which is the current stable version, and then fork to another one with the additional stuff, and provide an ability to switch between them, or at least to come back to the original implementation.
E: Go ahead with the bigger changes, but still preserving the data structures and the on-disk format?
F: Well, yeah, I don't know, since I haven't seen your changes, but I believe it's hard to come back from multiple locks to a single one. So at least let's be able to enable one or the other: once we have the new version, let's still be able to use the old one. Maybe not reverse the existing data structures, but something like we have with the messengers.
E: Well, at least nothing I have in mind currently will involve changes to the data structures on disk, only the in-memory data structures. So, so far so good.
A: I agree with you, Adam, but my question for you then, to be a little devil's advocate I guess, is: does the locking, the single lock, actually impact performance on spinners?
C: If you give it a queue of 32 IOs versus four IOs, it will execute about the same amount. Sorry, it will take almost the same amount of time to do the same number of IOs, because for every IO you would move the head, and moving the head takes about 10 milliseconds, an extremely slow operation. So on spinners...
C: You have the cost of the head movement, and then you have to wait for a spin, a complete spin. Now, with the 2.5-inch drives the spin is not that bad; I mean, you know how long it takes for a single rotation, but the rotation is not that bad. The biggest problem is moving the head. If you're able to get more IOs out of the same rotation, then you get them virtually for free.
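A rough back-of-envelope sketch of that argument, assuming a 7200 RPM drive and the ~10 ms head-seek figure mentioned above; the numbers are illustrative, not measurements.

    SEEK_MS = 10.0                # head movement, per the discussion above
    ROTATION_MS = 60_000 / 7200   # one full rotation at 7200 RPM, about 8.3 ms

    def total_time_ms(n_ios, ios_per_seek):
        # Each batch pays one seek plus (at most) one rotation; IOs picked
        # up on the same rotation are nearly free.
        batches = n_ios / ios_per_seek
        return batches * (SEEK_MS + ROTATION_MS)

    print(total_time_ms(32, 1))   # one IO per head move: ~587 ms
    print(total_time_ms(32, 8))   # eight IOs per rotation: ~73 ms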
E: No, I'm really not thinking that speeding up BlueFS will significantly improve IO depth utilization on spinners, but it will be better, I mean, even if only fractionally for spinners; it definitely will improve things for SSDs, I hope.
A: All right, well, good luck with it. Hopefully we can do it safely. I share Igor's concerns about how scary this can sometimes be to change, so just be careful, I guess.
A: All right, well then, everyone in the US, have an excellent Memorial Day weekend. For everyone else: at Red Hat, at least in the US, tomorrow is a recharge day and then Monday is a holiday, so it's a long weekend. I guess we will see everybody next week. Take care.