From YouTube: Ceph Performance Meeting 2021-07-01
A
New pull requests this week: I only saw one new one that showed up in my list, and that is Adam on the core team has a new PR, marked do-not-merge, implementing fine-grained locking in BlueFS, and I suspect that is maybe based on some work that I think Majian Peng did earlier.
B
Basically, it is working: it passed tests and it passes various testing, but still, I'm not really satisfied about the possibility of deadlocks. I have to find some systematic way of demonstrating that this code will not cause any deadlock, because basically one global lock has been switched into four locks for different parts of BlueStore, and it's not really a progression from least locking to most locking like in kernel layers.
B
Different domains of action require different locks. So that's why deadlocks are really possible.
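For illustration of the lock-ordering problem being discussed: when one global lock is split into several, a standard way to rule out deadlocks is to acquire whatever subset of locks an operation needs atomically, for example with C++17's std::scoped_lock, rather than nesting them in varying orders. A minimal sketch, with hypothetical lock names rather than the PR's actual members:

    #include <mutex>

    // Hypothetical stand-ins for finer-grained locks carved out of one global lock.
    std::mutex dirs_lock;   // directory metadata
    std::mutex log_lock;    // journal/log state
    std::mutex nodes_lock;  // file node state

    void update_dir_and_log() {
        // std::scoped_lock uses a deadlock-avoidance algorithm when locking
        // multiple mutexes, so two threads taking overlapping subsets in
        // different textual order cannot deadlock on these locks.
        std::scoped_lock guard(dirs_lock, log_lock);
        // ... mutate state protected by both locks ...
    }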
So I would like to review it more before actually pushing it on anyone. I mean, it's free to look at; mostly I put it there because Gabi wanted to review it. So it's free to look at, and please comment if you see anything, but it's not ready for rigorous review.

Out of curiosity, have you tried running it through Teuthology yet?

No, I did not run it through Teuthology.
A
It tends to hit stuff like that pretty well, but not always. Well, excellent. That was the only new PR that I noticed this week for performance. For closed ones, let's see: there was a little PR for changing a long double in common to double. I forget even what code that was in, but yeah, it's fine, makes sense, maybe a little bit of a win.
A
Patrick merged something in the MDS to flush the MDLog when requesting the read lock. I do remember looking at this a couple weeks ago; I don't remember very much about it, but it must have been good because Patrick merged it, so that's excellent. And then the btree allocator that Kefu wrote: Adam, it looks like you reviewed it and liked it, and Kefu was satisfied and he merged it. So that's excellent.
B
Yeah, we are now waiting for a continuation, which will be merging the btree allocator with the bitmap allocator, forming a hybrid. But for that I'm owing Kefu rigorous testing of the possible scenarios and the threshold at which we should switch from btree to hybrid mode.
A
Ah, that's actually sounding quite a bit like the path that some of the file systems ended up going down. So I think we're on the cutting edge, or close to it, now. That's good.
A
All right, let's see. The manager time-to-live cache implementation: that PR has updated testing results, which is nice to see. They included some graphs, and it looks like the cache is helping, so that's nice to see. I think there were more discussions on the AVL allocator PR, which has not merged yet. And then...
A
Of course, the PG removal optimization PR: it looks like that's maybe ready for testing, or no, it is in testing; Kefu has it in testing now again. So that's good. Maybe we'll see that merge soon.
A
And that was all I saw for new and updated PRs here. Anything I missed from anybody?
A
All right, well then, let's see. The only real topic I have for today is that for a while Josh and Neha were waiting on me to run some tests looking at the OSD client... what's it called... the osd_client_message_cap parameter.
A
We previously had this set to 100 for many years, and then, for kind of an unknown reason, I guess we disabled it completely and got rid of it, so there was no cap anymore. But I guess that was causing problems in some cases in Teuthology tests, and, I don't know, maybe actually in running clusters. Neha, do you or Josh know: was that actually a customer?
C
No, at least one case that I encountered was one of our RocksDB-related tests in Teuthology that started failing, and it was just that the OSDs were hitting a suicide timeout, and setting this option actually helped in that case, which is what made me think that it does work and it does a good job of, you know, throttling things at the messenger layer. That was one recent example that I remember.
A
That does not surprise me, given how things are architected right now. So, okay, I'm glad that we weren't specifically seeing customer issues with it. Although I wonder if maybe... oh.
C
We did see customer issues; the customer issues were there, but the code wasn't there. That was the problem, so we couldn't really ask them to, you know, even set this and see if that helped. But we've now finally gotten this code available in the releases that people are actually using. Since then I don't think we've hit that, so it's not been a long runway, but at least it's there now for us to exercise if we run into such issues.
A
Yeah, I was going to say that the suicide timeout does seem like something that people have been seeing in the last couple of years, and maybe it's been more predominant since we disabled that. But yeah, well, in any event: I've owed you guys this for like a month or two, I think, and over the weekend I finally just went through, went bonkers, ran a bunch of tests on it, and pasted the results in the chat window.
A
So, for anyone that wants to look: I personally don't find it super interesting. It's kind of what you'd expect to see; at really small message caps it's slow. Not... not...
A
Actually, maybe one surprising thing in this is that it's faster in a number of these tests, even with a really small message cap, than I expected, mostly on sequential workloads, which makes sense, right, because you might have, like, merging of the bios happening or whatever. So that, you know, is maybe less surprising. But in general, what we had set before, 100, was not super unreasonable.
A
We do a little bit better if it's set a little higher, so, like, in the core standup we were talking about maybe 256 just being the new default. But, you know, really, some reasonable cap... yes, this is kind of just straightforward; there's no reason not to, in my mind at least, based on these results.
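For reference, the option under discussion is osd_client_message_cap, and it can be adjusted at runtime; a minimal example using the value floated above (256 is the candidate default being discussed here, not a settled recommendation):

    # Cap the number of in-flight client messages each OSD will accept.
    ceph config set osd osd_client_message_cap 256

    # Confirm what a given OSD is actually using.
    ceph config get osd.0 osd_client_message_cap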
A
Yeah, both RBD and RGW kind of follow similar patterns. Nothing real super exciting.
A
What I will say, though, actually, is that we're seeing this really weird behavior with 128K random reads, where it's oscillating; it's almost like a bimodal distribution, where half the time it's fast and half the time it's slow, and there doesn't seem to be a whole lot of rhyme or reason to it.
A
So that's something that needs to be figured out, and I don't know if that's related to throttling or not, but we've always had really kind of strange issues with that I/O size, around there, like these middle I/O sizes, and it's never been totally clear to me why. But that is what it is; more work needs to be done there.
A
The other thing that came through in these results is that for large sequential and random writes, like four-megabyte ones, we're getting between 13 to 15 gigabytes per second, and on Pacific I was seeing closer to 20 to 25. So we may have a sequential regression in master that I need to go look at and track down, or at least figure out why...
A
...why it's happening. So there's that too. But what really came out of all of this, in my mind, is a couple things to look at and maybe fix, and, you know, some message cap value of, like, you know, between 100 and 500 is probably fine.
D
I agree. I think we have something very coarse-grained in the existing OSD backoff code, which only kicks in when there are cases where we know that we can't service operations for a long time, like an object needs to be recovered, or the PG is peering or otherwise inactive.
D
And then we ask the clients to back off for a while until we tell them it's ready to go again. But yeah, I agree that better flow control would be quite helpful. The message cap is kind of a very simple throttling approach, but something that, you know, doesn't cover all cases, and it doesn't really prioritize things or provide any level of, like, quality of service that you might get with a better or more complex queueing algorithm.
A
Sure, but, like, they could... they could sleep, right, or do something. Like, you could just tell it, "I don't want you to contact me for, like, two seconds or something," and then it could, you know... sure, it wouldn't be hanging around; it would just be like, "Okay, fine, whenever. I'll give you other work, or I'll just sleep here."
D
It would be good to learn more about different kinds of flow control schemes; I'm not actually familiar with many others myself.
A
Yeah, I actually just this morning was looking at, like, TCP, and, you know, the original implementation was just like: okay, you send a frame back and say, "okay, I want you to wait for some period of time," right? But then at some point they did a much more interesting, like, QoS-type scheme where they could have different kinds of... well, I don't actually know that much about it, other than what it looked like it was.
D
It might be more interesting to think of this in different contexts for the future interfaces we have, for stuff like any protocol changes we want to make for Crimson, or, with NVMe-oF, any more direct connections.
D
Yeah, I guess I don't even know the different things that more complex flow control could help with, since I'm not very familiar with the algorithms.
C
Something along these lines: George, remember there was some discussion about doing some flow control at the RGW, just for the RGW workloads? There was some PR from somebody who was trying to implement something of that sort.
G
dmclock is used in the Beast frontend; they were just discussing it in an email today, but apart from that, it's still under discussion.
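For context, the dmclock-based scheduler in the RGW Beast frontend is selected with the rgw_scheduler_type option; something along these lines (illustrative values, and the dmclock scheduler was still experimental at the time):

    # ceph.conf snippet: switch the Beast frontend from the default
    # "throttler" scheduler to the experimental dmclock scheduler.
    [client.rgw]
    rgw_scheduler_type = dmclock
    rgw_max_concurrent_requests = 1024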
F
I think that... my recollection of the Arm talk was just that they increased that and it increased performance in their tests, and it wasn't clear to me that there was really a downside of just increasing it in general. But I think there's also...
F
Yeah, just keep it high enough that it won't affect most people. I think the only deal, my recollection is, is that the message cap and the memory cap are both global, which means that if you have one client that comes in first, it can fill up the queue effectively, or fill up that cap, and then once other clients come in, they'll have a harder time sending messages.
F
So it's a little bit less fair in that case, even though they're, like, dequeueing underneath, so... but that should even itself out over time; it's like a first-mover advantage or something.
C
Yeah, did you see: the Arm group, are they already using it? What value are they using? Do you know?
A
It surprises me a little bit, though, right? Like, if they changed it from, like, a hundred to a thousand... I mean, a thousand, just think about that, right? That's a huge cap, and it's, like, crazy. Once you get past, like, a hundred... even, like, 200 to 300, okay, maybe, but that's...
F
Well, the other part of that talk: there's a whole list of stuff, but most of the good news is that most of the stuff they mentioned has already been fixed. There is some, like, page-size stuff with 4K pages, but it sounds like basically they've addressed the big problems, and basically pumping up the page size on Arm got, like, a 10% improvement or something, which is pretty nice.
F
Even the write-amp thing: I think the recent RocksDB direct I/O write re-caching behavior thing does the right thing. But there's one other thing that we haven't looked at, and that's CPU partitioning, and this was something that I just, like... I don't even think I realized that it was a thing, or at least I hadn't ever thought about it in much detail. They basically set it up so that the threads that are processing, like, the op queue versus the, like, BlueStore dispatch or KV thread or whatever, are on different cores but on the same socket, and they pinned them there, so that the, like, division of labor would work better. They managed to get a decent bump from that.
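To make the experiment concrete: the kind of pinning described here can be reproduced from the shell with taskset, given the OSD's thread IDs. A rough sketch, assuming OSD 0 and purely illustrative core numbers (tp_osd_tp and bstore_kv_sync are the OSD worker and kv-sync thread names):

    # List the OSD's threads with their thread IDs.
    ps -T -p "$(pgrep -f 'ceph-osd .*--id 0')" -o spid,comm

    # Pin the op-queue workers and the kv-sync thread to different
    # cores on the same socket (cores 2 and 3 here, illustrative).
    taskset -cp 2 <tid-of-tp_osd_tp>
    taskset -cp 3 <tid-of-bstore_kv_sync>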
F
It might have been, yeah, but without pinning it might be running them on the same cores, I guess, so, like, it'd do one queue and then do the other queue or whatever. And they basically forced them onto separate cores, which made me wonder if that's something that we could teach the OSD how to do automatically.
F
I don't know how... I mean, we've... so, they also talked about NUMA pinning. Their machine had, like, the network and the PCI devices, or the NVMes, directly attached to different sockets, and so all the automatic NUMA stuff just worked, which was, like, great news, because I don't know if I've ever actually managed to test that on a system that had, like, a balanced NUMA architecture.
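For reference, the automatic behavior being described is governed by the OSD NUMA options, and there is a command to inspect the result; a quick sketch (option and command names as in recent Ceph releases):

    # Let each OSD pin itself to the NUMA node shared by its network
    # and storage devices, when the topology allows it.
    ceph config set osd osd_numa_auto_affinity true

    # Or force a specific node, then check what the OSDs ended up with.
    ceph config set osd osd_numa_node 0
    ceph osd numa-status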
F
If we see that the socket, the NUMA node that they're on, has multiple cores, we divvy up the OSD across those cores.
A
I mean, yeah, you'd hope that in, like, write-heavy scenarios, where the kv_sync thread is basically just, like, pegging the CPU, like taking one core, nothing else would get scheduled on the same core, because it's just, like, consuming the entire thing. But that means we trust that, you know, the Linux scheduler is actually, like, sane, and, you know, I don't know if that's a reasonable assumption, I guess.
F
And maybe the thing to do is, like, replicate the experiment and then try to understand what's going on. Like, what's the behavior when you pin it, what's the behavior when you don't pin it; like, how much, what threads are getting scheduled where. And maybe there are some kernel tunables that control the, like, heuristics in the kernel's scheduler that might...
A
I mean, on Arm... I think, and I might be totally making this up, but I thought Arm tended, compared to x86, to have higher context-switching overhead.
A
And also not restricting threads to any particular NUMA node or anything; it was just, like, random.
A
It'd be nice if they had any data on what was going on in those tests, like whether or not they were seeing a lot of processes moving, and where they were moving between. We could do it in-house.
F
It's no different than the non-containerized case, though; the OSD, sort of, in the NUMA case, it's doing it itself: it's setting its own affinity, or whatever, to confine itself to a node based on the devices it's using.
F
Yeah, I mean, in cephadm, the container, and I think even in Rook, the container has all these system privileges, so it can look in sysfs.
F
I was going to say: probably at the point where we start trying to have the container orchestrator direct it to particular CPUs, or set limits or whatever, then we'll have to make sure that these two things play well, play nice together. But so far, you know, orchestrators haven't done any of that.
D
It's more complicated because they have kind of a different concept of CPU sets; that's, like, what the Kubernetes scheduler controls.
D
So
it's
unclear
if
I
guess
we
might
need
to
play
nice
with
you
with
that
versus
having
the
osd.
Do
it
itself.
F
Okay, but at the point where Rook, like, sets some scheduling properties that tell it to pick a CPU, or use this many CPUs or whatever, then Kubernetes will go and allocate something. The approach we took with the memory restrictions is basically that cephadm just tries to mimic the same container inputs that Kubernetes does, and so from the daemon's perspective it looks the same.
A
I
suppose
one
downside
of
this
right
is
that
if
you're
pinning
the
tpos
ttp
threads
and
then
the
the
also
the
the
qb
sync
thread
to
specific
cores,
it
does
kind
of
take
away
the
ability
to
kind
of
dynamically
adjust
things
right
like
your
rock
cb
compaction.
Threads
come
in
that
all
of
a
sudden
are
using
a
bunch
of
cpu
or
other
stuff
coming
in
using
cpu
scheduler
can't
be
like
magically
rearranging
it.
A
There's been some desire to do Graviton testing on AWS; I wonder if it'd be worthwhile to try doing this there, just to have another Arm setup.