From YouTube: Ceph Performance Meeting 2021-09-16
A: There we go. So, just a reminder: sometime, possibly the day before if you're like me, try to read the article. All right, so, pull requests this week: not a whole lot of new stuff.
A: I didn't actually see anything that looked particularly relevant that was new, but we did have a pull request from Radoslaw that closed, regarding optimizing CRC handling and bufferlist c_str. Ilya did the review on that one, approved it, and merged it. That's good. We had four PRs that have been updated, at least in the ones that I was looking at.
A: There's this ongoing OSD compression bypass PR. Casey had done a review on that, and it looks like now Eric has been testing it and said that during testing he was seeing a lot of errors, so that is going to be ongoing; I think it's not quite ready.
A: Then there's this BlueFS fine-grained locking from Adam; I think that was no longer DNM.
A: There are PRs that I don't know if I've got listed here; did you happen to...? That's... that probably explains why I never added it in here. Okay, very good. Do you have somebody to review that one?
D: I think we should try to rope Igor into looking at it, or Sage, if you want.
A: Which one is this?
D: This is 42099; I'll put the link in the chat right here.
A: If I remember right, Adam, this came out of majianpeng's attempt at doing something kind of similar, right?
D: Exactly. That just made a forced release of the lock inside read, you know, to be able for others to go, and that really had nasty side effects. We decided that we really should fix the locks, because it really gives an improvement for BlueFS buffered IO when, at the same time, there are compactions and some other actual writes; you can see it in the performance.
A: Yeah, the first attempt at this was not safe. This one might be safe.
D: Okay. I mean, it's safe enough that it is now possible: if we hit a corner case where we do not have space, or runway space, for the BlueFS log, we can just stop, and, still holding the proper locks, we can now allocate and rewrite the BlueFS log from scratch, which previously was just impossible to do safely.
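To make that concrete, here is a minimal standalone sketch of the general idea being described: replace a coarse lock that readers previously had to drop mid-read with per-file reader/writer locks, so buffered reads can proceed concurrently with compactions and other writes while every path still holds a proper lock. This is not Ceph's actual BlueFS code; the types and names are all illustrative.

```cpp
// Illustrative sketch only: per-file reader/writer locks standing in for
// the single coarse lock readers used to hold (or unsafely drop) mid-read.
#include <algorithm>
#include <iostream>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

struct File {
    std::string data;
    mutable std::shared_mutex lock;  // fine-grained: one lock per file
};

// Readers take a shared lock for the whole read; they no longer drop and
// re-take a global lock mid-read (the earlier, unsafe approach) to let
// writers in, because writers only contend on the same file.
std::string read_file(const File& f, size_t off, size_t len) {
    std::shared_lock l(f.lock);
    return f.data.substr(std::min(off, f.data.size()), len);
}

// A writer (e.g. a compaction rewriting the log) takes the exclusive lock
// on just the file it touches; reads of other files are unaffected.
void append_file(File& f, const std::string& chunk) {
    std::unique_lock l(f.lock);
    f.data += chunk;
}

int main() {
    File f;
    append_file(f, "hello ");
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([&] { append_file(f, "x"); });
    for (int i = 0; i < 4; ++i)
        threads.emplace_back([&] { (void)read_file(f, 0, 6); });
    for (auto& t : threads) t.join();
    std::cout << f.data << "\n";  // "hello " plus four x's, in some order
}
```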
A: We agreed that we also don't want to update RocksDB yet, but that PR so far looks like it's fairly benign. It still works with the current version we're on, and then potentially allows us to upgrade RocksDB. I've been running some perf tests on that, just to make sure that there are no regressions introduced; it looks fine so far. I did also try upgrading to RocksDB 6.23.2 based on Kefu's commit; it's having compilation issues with the io_uring stuff that it now includes.
A: I'm sure it's probably just a bug in our CMake stuff, but I didn't look too closely at it. So I think we'll just wait on the RocksDB upgrade, since I don't want to rush into that anyway, and it's not like anyone else does either.
A: More generally, related to that PR, Josh recommended talking to the Ceph maintainers. This all resulted from a maintainer wanting to use kind of a cutting-edge RocksDB to compile stuff.
A: I don't know what they're going to use it for, but given the issues we've had in the past with adopting RocksDB too fast, even releases, like with the dramatic changes that they made in their write path, or how they read from cache versus how they read from disk with readahead; I mean, that code has changed multiple times in the last couple of years. So I think we need to start getting a little bit more careful about how quickly we just jump on new RocksDB releases. But anyway, that's a different discussion.
A: So probably try that on the ceph-maintainers list. The only other one here that updated is the cache binning, my cache binning PR. Neha mentioned whether or not we want to get this in for Quincy. I talked to Adam a little bit about this about a week ago, maybe; I just got a couple of other ideas for doing something a little bit different, but you know, we're all kind of in a time crunch here.
A: This code already more or less exists; it just needs to be rebased again and then probably go through a lot more testing and analysis. So I think I'm going to try to make it happen. That might be the next thing I work on after finishing up Kefu's thing; well, also looking at the wall clock profiling stuff. But anyway, that one I don't want to...
A: ...let go yet. I think there's still value there, so I'll probably try to get back to that soon. Lots of stuff under "no movement". Adam, I know you've got a couple of different things here, but getting reviews is always tough.
A: Igor's not here, right? Oh, so Adam, I'm gonna pick on you a little bit: what do you think we should do about cache trimming? We've had both your PR and Igor's PR sitting there for a couple of months.
D: Okay, so let's make it a topic for the next performance weekly. How about that?
A: Sure, that sounds good.
A: Then we've got the TCMalloc thread cache settings PR. Do you know, was there anything that prevented us from doing that? Was there anything broken about it?
A: Let's see... oh, the sharded cache for RGW. I think Mark Kogan is taking that over; he said he was interested in doing that. There's been no more discussion on my PR, but that doesn't mean he hasn't been working on it in the background, so I'll try to find out what the status of that is; unless, Casey, you know?
B: Yeah, he did sign up to take it over. He's been making a lot of progress looking into the performance regressions in the beast frontend, though.
B: There isn't a PR for it, but he's added a ton of details to the tracker issue. If you want to track that in the Etherpad, I can find a link.
A: Cool. Well, I'll look through that later, unless people are interested in looking through it now.
A: All right. Oh, new pull requests; oh good, the MDS ones made it in. I was actually just thinking about Zheng's PRs; I probably missed those, but that's good.
A: Hey Patrick, do you have a working...?
A: Oh okay, yeah, I see that now. Cool. Okay, so yeah, I did notice on the mailing list that Zheng sent a big email talking about some of this. One of the things I was trying to understand, based on his last emails...
A: He said that this doesn't apply to situations like the IO500 mdtest-hard tests, where you have multiple clients putting many files into a single directory, and I was trying to understand how that reconciled with this PR 43125 that we've got under the new section now, where we're randomly distributing dirfrags to multiple MDSes. I thought that would be kind of exactly that case, where you have lots of files from multiple clients, with multiple dirfrags, you know, all in one directory, and then those dirfrags are distributed to many MDSes. That's actually kind of the situation I thought this would apply in, so I wanted to ask Zheng or Patrick if they could explain that more.
H: I'm here now.
A: Oh, hey Patrick.
H: Hey. I don't know; for the IO500 hard test, how large is the directory in that case?
A: So usually in the hard test you've got one directory, you've got an arbitrary number of clients, and each client can write out an arbitrary number of files. The test is timed; it is kind of up to you to define how much to try to write out within that time limit. Actually, even "timed" is a little bit incorrect: it's a minimum time but no maximum time, so you can let it run for as long as you want, but it has to run for at least five minutes.
A: Yeah, it's possible I'm misremembering, because it's been a while. Let's just say ten; is ten reasonable, or five? I know I've hit... yeah, that was number one.
H: ...more reasonable. This PR may help if we also add some tricks to pre-fragment the directory, but I think that requires hints from the client that may not be allowed by the IO500 testing framework.
H: Quite possibly we can. We can.
A: Yeah, it looks like with the IO500 they let you do anything you want to the parent directory. So whatever directory all this stuff is going to end up in, the parent, you can set whatever xattrs or flags you want on it, or, you know, other things you can do. But it's the individual subdirectories that they don't want you touching, from what I've seen.
H: I'm sure there are some hints we could provide to the MDS through some xattrs that would improve our performance there, preemptively spreading the dirfrags. But just doing this, sharding the metadata by spreading the dirfrags out randomly across MDSes, is, I don't know, antithetical to the early MDS design for CephFS. I don't know how Sage really feels about it, but it could...
H: ...be a config option. I was also thinking what would be nice, although I've gotten pushback from Zheng in the past on this, is just providing a config change for a subtree, similar to what we're using for ephemeral pinning, just to say: you know, I want the metadata sharded this way. The random ephemeral pins were kind of one idea in that regard, and so were the distributed ephemeral pins.
H: You know, that would be something I'd be more willing to merge, because I'm a little wary of having such a large change in here; plus, I haven't gone through any of this code. I assume there must be some kind of config for turning this on, because you wouldn't want it in the general case.
A: You know, the dynamics of tree pinning: if we could figure out how to make it not get, almost, DDoSed when you get so big... That seems to be where it really falls apart: it can't actually distribute subtrees properly; you end up failing lock acquisition and it all just falls apart. If we can fix that, it might work better; I mean, it might even work well. But that seems to be where it keeps...
H: Yeah, we still have not had anybody who's really dug into the balancer recently, so I'm sure there are lots of little things that can be done to improve it.
E: Yeah, I missed the beginning of the discussion until you mentioned my name, so I wasn't sure what... whatever, I missed the first part. But are we looking at the... we're just looking at the list of...?
H: The PRs from Zheng. One of them adds... well, I haven't looked at the code yet; I haven't seen how it's configured or turned on, but one of them randomly distributes every dirfrag in a subtree across them. Yes, it's using the new consistent hashing we have for ephemeral pinning, like the nested...
H: But I think it avoids making subtrees, because everything is distributed. But yeah, I think it probably has some nice performance characteristics for this AI/ML workload; for general-purpose file system use, though, it's really not very good.
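For context, a rough sketch of the consistent-hashing placement mentioned above follows. It is purely illustrative, not the MDS's actual ephemeral-pin code; the ring structure, virtual-node count, and hash function are all assumptions. The property that matters is that every node computes the same frag-to-rank mapping with no coordination, and adding or removing a rank only moves a small fraction of dirfrags.

```cpp
// Illustrative hash ring mapping dirfrags to MDS ranks; all names invented.
#include <cstdint>
#include <iostream>
#include <map>

using mds_rank_t = int32_t;

// splitmix64 finalizer: a decent stand-in mixing hash.
uint64_t mix(uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
}

struct HashRing {
    std::map<uint64_t, mds_rank_t> ring;  // hash point -> rank
    void add_rank(mds_rank_t r, int vnodes = 64) {
        // several virtual points per rank smooth out the distribution
        for (int i = 0; i < vnodes; ++i)
            ring[mix((uint64_t(r) << 32) | uint64_t(i))] = r;
    }
    mds_rank_t lookup(uint64_t key) const {
        auto it = ring.lower_bound(mix(key));
        return it == ring.end() ? ring.begin()->second : it->second;
    }
};

int main() {
    HashRing ring;
    for (mds_rank_t r = 0; r < 4; ++r) ring.add_rank(r);
    // Key each dirfrag by (inode number, frag id): every client and MDS
    // computes the same placement with no shared state.
    uint64_t ino = 0x10000000000ULL;
    for (uint64_t frag = 0; frag < 8; ++frag)
        std::cout << "frag " << frag << " -> mds."
                  << ring.lookup((ino << 8) | frag) << "\n";
}
```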
H: Well, so, Sage: this work is based on another PR, which is also in the list on this Etherpad, called pulling the subtree map out of the MDS journal. That's something Zheng was working on to improve the performance of the MDS when we have hundreds or thousands of subtrees, because the subtree map is written out with every journal segment, which can get prohibitively expensive. So here's a PR to pull that out, and this other stuff is based on that.
H: I think he was in the middle of reviewing the PR to remove the subtree map from the journal, but we haven't made progress on it since, because of its size; and with Zheng leaving, I was a little concerned about merging it, because it really just changes everything. Oh my god, it's huge.
E: Yeah, I mean, it seems like a deeper question of which direction we want to go. If we really want to go in a direction where we have a bazillion subtrees, then something like this is necessary. But if, instead, the thinking is to find a way to still keep the subtree map modestly or reasonably concise, then...
A: And in those IO500 tests that we ran, it was subtree map encoding, or journaling, on the authoritative MDS, and that's with just one directory with billions of files from lots of clients in it. It's just awful.
A: Okay, I had one really minor PR that helps us a little bit; it merged a while back. It's not even directly related to this: it was about the appender in bufferlist, sorry, implementing a dynamic append length. This actually might help a little. It's kind of stupid, but it's just taking care of some of the work that was being done over and over again every time we were encoding the subtree map.
H: Why does it need to be in every segment?
E: Yeah, I think it's just so that the trimming logic doesn't have to be super careful. I mean, when you start, really, when you replay, you have to have a subtree map, so as you are replaying...
E: ...you know what's authoritative, whatever, so that you can rebuild your cache appropriately. It was just simplest to write the whole thing, because I assumed it was going to be small. But as long as you don't trim too much, so that when you start over you still have enough context; as long as you have enough context to do replay, it'll be fine. It would just be a matter of working out what that context is.
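What Sage describes, trimming only as far as replay context allows, might look roughly like the toy model below if the subtree map became an occasional checkpoint instead of part of every segment. The segment and checkpoint layout are invented for illustration; this is not the MDS journal code.

```cpp
// Toy model: journal segments with occasional subtree-map checkpoints and
// a trimmer that never trims past the newest checkpoint at or before the
// trim target, so replay always starts from a subtree map.
#include <cstdint>
#include <deque>
#include <iostream>
#include <optional>

struct Segment {
    uint64_t seq;
    bool has_subtree_map;  // checkpoint segments carry a full map
};

struct Journal {
    std::deque<Segment> segments;

    void trim_to(uint64_t target_seq) {
        std::optional<uint64_t> last_ckpt;
        for (const auto& s : segments)
            if (s.has_subtree_map && s.seq <= target_seq)
                last_ckpt = s.seq;
        if (!last_ckpt) return;  // no checkpoint yet: cannot trim safely
        while (!segments.empty() && segments.front().seq < *last_ckpt)
            segments.pop_front();
    }
};

int main() {
    Journal j;
    for (uint64_t s = 1; s <= 10; ++s)
        j.segments.push_back({s, s % 4 == 1});  // checkpoint every 4th seg
    j.trim_to(7);  // keeps seq 5 (newest checkpoint <= 7) onward
    std::cout << "head after trim: " << j.segments.front().seq << "\n";
}
```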
E: They're still working on it? Yeah. I mean, maybe that is the direction to go in. I guess my assumption was always that the map would be small, but I think even if you have a pretty conservative view of things, you would have many directories in the file system that are hot, and so you're hashing even at that level, and so that would be N metadata servers for every one of those M directories.
E: That's already a lot, so maybe this makes sense, but...
E: It's not too terrible; it's not too bad. I mean, you could throw it away and rewrite it, and then nothing would change.
E: This is fundamentally changing the way that the replay dirty state, I think, is being tracked.
H: So I'm planning to go through these PRs, especially once my time frees up in the next month. But yeah, we can't merge these without someone fully understanding them who is regularly upstream. And maybe something like that, then, upstream; I mean, Zheng is working on stuff, but he's not really...
H: ...you know, regularly doing... he's not at standup, he's not triaging bugs. Yeah, I mean, it'd be nice...
E: ...if he showed up at standup to support these and move them through. And maybe also, I don't know where the testing stands, but having a set of tests that have pretty large numbers of MDSes, and a workload with thrashing or something, just to really push the boundaries here: aggressive trimming, or, I don't know, whatever it is that we think is gonna...
E: The risk here, I would assume, not having actually read any of the pull requests, would be: if the total subtree map state is spread over lots of different log segments, are we tracking that correctly, so that we don't trim something such that we can't rebuild that state, or end up without the important state before we need it?
H: Yeah, I think it's just inherent in... yeah, it's inherent with the distributed cache, with caps, the MDS's replication of trees, renaming; all that stuff just adds to it. But yeah, what Sage said: if we wanted to make it simpler, we'd probably have to go back to the drawing board with the architecture and think about how we might do it differently.
F: Patrick, while you're here, I was curious about the CephFS QoS efforts. I see that there was some activity on this one in the last month, where they say that they're actually using this in production now.
A: Another one to add, and maybe, Patrick, I can help out on this, is implementing the memory auto-tuning for CephFS. I know there's that outstanding PR that has been there for a couple of years, but I could maybe try to actually get it using the same one, the priority cache, that we're using for the other daemons.
A: Not really, actually. This is all written with the idea that you're kind of making suggestions more than demanding things; does that make sense? The whole architecture is based on: you ask the particular cache what it wants at different priority levels; it gives you back what it wants; then we go through this whole process saying, okay, here's what you should get, but please, please do this. It's not, you know, immediately trying to revoke everything.
A: It's: okay, here's what you should try to target. And then we go through this iterative process where we look at how much memory we're using; if we're still not under that, then we make new suggestions. You might end up kind of starving something that can release memory, but the whole goal is to keep things as much as possible below some memory threshold.
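As a rough illustration of that negotiate-and-iterate flow (hypothetical names, not Ceph's actual PriorityCache API), a tuner might hand out a global memory budget by priority level like this, treating the result as a target suggestion rather than an immediate revocation:

```cpp
// Hypothetical priority-cache negotiation; not Ceph's PriorityCache API.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct Cache {
    std::string name;
    std::vector<uint64_t> wants;  // bytes wanted per priority (0 = highest)
    uint64_t target = 0;          // a suggestion, not an immediate revocation
};

void tune(std::vector<Cache>& caches, uint64_t budget) {
    for (auto& c : caches) c.target = 0;
    // Satisfy priority 0 for every cache before spending on priority 1, etc.
    // (A real tuner would also split fairly within a level and re-run this
    // loop periodically, comparing targets against observed memory use.)
    for (size_t pri = 0;; ++pri) {
        bool any = false;
        for (auto& c : caches) {
            if (pri >= c.wants.size()) continue;
            any = true;
            uint64_t grant = std::min(c.wants[pri], budget);
            c.target += grant;
            budget -= grant;
        }
        if (!any || budget == 0) break;
    }
}

int main() {
    std::vector<Cache> caches = {
        {"inode_cache", {64 << 20, 256 << 20}},
        {"dentry_lru", {32 << 20, 128 << 20}},
    };
    tune(caches, 300ULL << 20);  // 300 MiB global budget
    for (const auto& c : caches)
        std::cout << c.name << " target: " << (c.target >> 20) << " MiB\n";
}
```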
H: Yeah, okay. I mean, it's worth looking at, and mostly it would just be, I assume, changing the LRU that we use for dentries to the priority cache instead. Most of the memory tracking is done through a mempool, which, as I said, works pretty similarly to what we already do with the OSD.
A: Most likely what we'd do is the same thing we do in the OSD, which is basically just to make a really thin wrapper around your existing cache: either add the interface to it, or just make a thin wrapper around it that implements the priority cache calls that need to be made. It's fairly non-intrusive.
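A thin wrapper of that sort could be as small as the following sketch. The PriCache interface here is invented for illustration and differs from Ceph's real one; the point is just that the existing structure needs no rewrite, only an adapter that answers what it wants and acts on the target it is given:

```cpp
// Invented PriCache-style interface plus an adapter over an existing LRU.
#include <cstdint>
#include <list>
#include <string>

struct PriCache {
    virtual uint64_t request_bytes(int priority) const = 0;  // what it wants
    virtual void set_target_bytes(uint64_t target) = 0;      // the suggestion
    virtual ~PriCache() = default;
};

// A pre-existing cache we do not want to rewrite.
struct DentryLRU {
    std::list<std::string> entries;
    uint64_t bytes_used() const { return entries.size() * 256; }  // rough cost
    void trim_to(uint64_t bytes) {
        while (bytes_used() > bytes && !entries.empty()) entries.pop_back();
    }
};

// The thin wrapper: forwards priority-cache calls to the existing structure.
struct DentryLRUWrapper : PriCache {
    DentryLRU& lru;
    explicit DentryLRUWrapper(DentryLRU& l) : lru(l) {}
    uint64_t request_bytes(int priority) const override {
        // e.g. report current usage as high priority, extra headroom as low
        return priority == 0 ? lru.bytes_used() : lru.bytes_used() / 2;
    }
    void set_target_bytes(uint64_t target) override { lru.trim_to(target); }
};

int main() {
    DentryLRU lru;
    for (int i = 0; i < 100; ++i) lru.entries.push_back("dentry");
    DentryLRUWrapper w(lru);
    w.set_target_bytes(w.request_bytes(1));  // trim toward the suggestion
}
```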
H: It's certainly worth looking at. Right now, the one PR we have outstanding for the memory target is something Sidharth was working on, the mds memory target, which is supposed to be analogous to the osd memory target, and that was really just some logic to set the MDS cache size according to what the MDS thinks it needs to be in order to stay at its target.
H: You know, he had some remaining work to do on that, and the PR became stale, so it just really needs to be revived. That's low-hanging fruit as far as getting that worked on, and then I think the priority cache would be a good next step.
A: So the priority cache basically incorporates generic code for doing the same kind of thing. That's what we use in the OSD and in the mon, and I was hoping I might be able to get the RGW guys to use it as well. I don't know; in the MDS it might be more complicated and wouldn't work, but it might be a nice way to avoid having lots of independent implementations of the same thing.
A: Well, while I'm talking to you about that: do you guys, besides the one cache, have other caches or buffers or anything that need to be regulated to keep the client...?
H: Sure. The only way that we can actually force the kernel to release its references to an inode is to actually remount the FUSE mount. So it's got some cute logic: when the MDS revokes an inode capability, ceph-fuse will actually remount itself, causing the kernel to release all of its references.
H: And you know, this has actually been a long-standing problem with CephFS, because there's no great mechanism to tell the kernel, "hey, I need you to drop this reference to this inode."
H: We used to have some special API call, internal to the kernel, through a FUSE ioctl, to release a reference, but that got deprecated for kernel reasons, and I think that left us with remounting. Although I think there's been some work in the next version of FUSE to add some kind of support to do this again; I haven't looked into that carefully.
A: Was someone on the CephFS team... I think I remember someone was looking at FUSE in general, like trying to update stuff, yeah?
H: Actually, that sounds like a good project for our new CephFS team member, who was an intern elsewhere at Red Hat and just joined the CephFS team. I'm eager to pitch that as a startup project for him.
A: Cool. All right, well, let's see: is there anything else? I don't think I've got anything else. Guys, I'll open it up: anyone have anything they want to talk about in the last 15 minutes here?
F: I guess just going back to that QoS PR; and yes, Patrick, they also mentioned that for testing they implemented a round of MDS thrashing, essentially, it sounded like. I don't know; if you look through their presentation about this, it looks pretty interesting.
A: I will say that if we can fix some of the QoS problems in CephFS... The other big thing I noticed in the IO500 tests is that we were having some clients completing much, much earlier than others, and I think not necessarily strictly due to, like, weird balancer issues or other things, even with ephemeral pinning.
A: That was also a separate problem, but I think QoS also played into this: the more we can make things even, the better we're gonna do on that test.
H: If the MDS needs to ask the client to release something from its cache, or release a capability, so that it can do work on behalf of another client, it may first have to chew through a number of messages from that client before it can get to that cap release that the client is giving back to the MDS. And so bolting QoS onto the MDS is tricky business, because you have the potential for creating deadlocks.
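A toy model of that ordering hazard follows (not MDS code; the message names and one-message-per-tick budget are made up). The cap release that another client is waiting on cannot jump the queue, so throttling the noisy client's queue directly delays the victim:

```cpp
// Toy model of the per-session ordering hazard; names are made up.
#include <deque>
#include <iostream>
#include <string>

struct Session {
    std::deque<std::string> queue;  // messages must be processed in order
    int budget;                     // QoS: messages processed per tick
};

int main() {
    // A noisy client has a backlog queued ahead of the cap release that
    // another client's request is waiting on.
    Session noisy{{"getattr", "getattr", "getattr", "cap_release"}, 1};
    int ticks = 0;
    while (!noisy.queue.empty()) {
        for (int i = 0; i < noisy.budget && !noisy.queue.empty(); ++i) {
            if (noisy.queue.front() == "cap_release")
                std::cout << "cap released at tick " << ticks << "\n";
            noisy.queue.pop_front();
        }
        ++ticks;  // the throttled budget stretches out the wait
    }
}
```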
H: So in some ways we need to, you know, take a detailed look at the clients, and they definitely need to participate in this QoS nicely, so that they're not creating these deadlock situations by, for example, giving the MDS too much work. And we have to provide legacy support, and it's not simple.
H: But I have not, again, taken a detailed look at this PR, so I'm not exactly sure what they've done yet.
F: Yeah, it's definitely a hard problem, even for a simpler protocol like the OSD's. I mean, we're only just now starting to get to the point where we can implement and test the client-versus-client QoS in RADOS.
H: We often see that the data pool is set on hard disk drives, but these days it's very common to see all-SSD clusters, and so there are no guarantees anywhere; you can't make assumptions. That's why we just have the default advice of putting the CephFS metadata pool on its own set of SSDs that are exclusively for CephFS.
A: So here, let me rephrase it. Say you've got a bunch of NVMe drives, and you can either dedicate all of those to both data and metadata, or you could split them so that only some of them are serving data and some of them are serving metadata.
H: I haven't seen numbers for NVMes, so I can't say whether it's necessary in that particular case, but I imagine that even with NVMes the OSDs could be overwhelmed by clients doing large read and write workloads, especially on a smaller cluster, in which case the MDS is not getting any kind of priority treatment from the OSDs. So even simple things like writing to the journal to record file opens and closes, and cap updates, would be slowed down.
F: A good argument for making sure that RADOS QoS works well for CephFS clients, to be able to control the priority between the metadata server activity and the client activity.
H: Yeah, especially for edge clusters, where you only have a few OSDs, but they do have cutting-edge hardware like NVMes. It's even more important there to make sure that we don't need to carve out a number of OSDs for the metadata pool. Right, right.
A: What I was seeing during the IO500 tests is that in the really hard tests the OSDs seemed to not actually be doing a whole lot. It was kind of like we had a lot of contention on a single MDS, like an authoritative MDS. Even in the ephemeral pinning tests, where you are doing this round-robin stuff, the OSDs were working harder, but not as hard as they can work.
A: It looked to me more like what we saw was that you'd end up with certain MDSes hitting their kind of inherent limit, which is maybe 20,000 ops or something around that level of performance, and then others end up with a lower number of subdirectories, or, maybe, I don't remember how you've changed it at this point, a smaller proportion of the work to do, just by random, you know, bell-curve distribution, and end up... I think for...
H: When I do testing on Linode, I still put the metadata OSDs... I have a separate set of metadata OSDs, because I actually do see the problems when I'm doing something with a 16- or even 32-OSD cluster on enterprise SSDs. Even though they're VMs, I can easily create slowdowns by doing a large workload with 64 or 128 clients all hitting the CephFS cluster.
H: In that case, all of the metadata... no: the CephFS journal is all plain objects, but the directory objects are all omap, and then there are a few other data structures which I believe are omap; the open file table is another example of an omap store that the MDS uses.
A: Did you happen to notice what looked like it was slowing down more, whether it was omap or data accesses?
H: Yeah, I mean, the nice thing about Linode is it's easy to make even a large cluster for cheap. You can make VMs with a lot of memory, but then you have to start shelling out more money, which I try to avoid doing. So it's easier to just take a few OSDs and use them for metadata.
H: I don't want us to go down a huge rabbit hole of optimizations we can do. I think mostly we just need to try out the new QoS features of the OSD in this particular scenario of a small Ceph cluster, and if that works, then we can make appropriate recommendations in the documentation. But otherwise I don't think it requires a lot of...
H: ...you know, a lot of detail. Sorry; I don't think we need to investigate this too much. Just separating off a few OSDs for metadata in a large cluster is not a huge ask for the large production clusters we know of; it's certainly...
H: Anyway, I've gotta run to a conflict, so hopefully I'll see you all next week, or sometime in the future, depending on whether that other meeting conflict continues.
A: Okay, sure.