From YouTube: 2020-02-20 :: Ceph Performance Meeting
A: I'll get started here. All right, so we've got like three weeks of PRs here, although it ended up not being too bad — although I may be missing some things in here. Igor has a couple of new PRs that are both really exciting. One is using deferred writes to avoid blob fragmentation when you have a small min_alloc size; I think he's going to talk about that later, so I'll just leave it at that. And then he also has one that's a hybrid allocator based on both AVL and bitmaps.

Likewise, I'll leave him to talk about that, and he's got some interesting performance numbers too that I think he was planning on showing today. Let's see... there's also a PR here for improving SPDK performance. It looks like it's just changing some of the code in NVMEDevice to use best practices; I guess we were doing some things that weren't great. So anyway, there's that — I would be really interested to see benchmarks on it. Closed PRs: we've got a bunch.
Affinity improvements, I guess, for RGW here. Not a whole lot of other stuff — inline BlueStore small-object work, yeah, and I don't remember too much about these other ones, but it looks like they were closed by the stale bot. Okay: updated, more BlueStore stuff, Igor's simplification of the onode pin/unpin logic. That looks like it's probably much better than what I was trying to do, but it is not passing tests right now, so we just need to figure out why; overall it looks really good.

There's an RGW admin one from Casey that incorporates my other PR; it just does a little bit more as well. Adam's big objecter revamp PR needs a rebase. I didn't see any updates after Patrick was seeing asserts in the MDS, so hopefully whatever he changed has resolved that, but after the rebase it needs to be retested.
B: The major difference between these two allocators is that the AVL allocator is pretty good at finding contiguous blocks that fit: it maintains a sorted tree and hence it can search efficiently, while bitmap needs a sort of sequential search to find such blocks. But on the other hand, the AVL allocator might consume a pretty high amount of memory.

We've got multiple complaints about its memory usage in production after a while, while the bitmap allocator has constant RAM usage and is very good in this case. So the idea I decided to try, which I called the hybrid allocator, is based on the existing AVL one, and uses it for fast search for long ranges, basically until we...
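(A minimal sketch of the two search strategies being contrasted here — hypothetical names and structures, not Ceph's actual AvlAllocator/BitmapAllocator code:)

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <vector>

// Tree-style free list: extents indexed by length, so finding a contiguous
// range of at least `want` bytes is a single O(log n) lookup. The cost is
// one node per free extent, so memory grows with fragmentation -- the
// production complaint mentioned above.
struct TreeFreeList {
  std::multimap<uint64_t, uint64_t> by_len;  // length -> offset
  std::optional<uint64_t> find(uint64_t want) {
    auto it = by_len.lower_bound(want);      // smallest extent >= want
    if (it == by_len.end()) return std::nullopt;
    uint64_t off = it->second;
    by_len.erase(it);                        // re-inserting any remainder omitted
    return off;
  }
};

// Bitmap-style free list: one bit per allocation unit, so memory is a fixed
// fraction of device size, but finding `want` contiguous units means
// scanning for a run of free bits -- O(n) in the worst case.
struct BitmapFreeList {
  std::vector<bool> used;                    // one bit per alloc unit
  std::optional<uint64_t> find(uint64_t want_units) {
    uint64_t run = 0;
    for (uint64_t i = 0; i < used.size(); ++i) {
      run = used[i] ? 0 : run + 1;
      if (run == want_units) return i + 1 - run;
    }
    return std::nullopt;
  }
};
```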
So that's it: in this hybrid allocator, when we fall back to the bitmap one, we still use the AVL tree allocator implementation as a sort of cache for searching for contiguous blocks. It's a sort of workaround to bring fast searches and a limited, constrained memory footprint to the allocation scheme.
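(Continuing the sketch above, the fallback logic as described would look roughly like this — the cap value and names are made up, not taken from the PR:)

```cpp
// Hypothetical hybrid dispatch: the tree gives fast contiguous-range search
// while it stays under a memory cap; the bitmap is the constant-memory
// authority that everything falls back to.
struct HybridFreeList {
  TreeFreeList tree;               // bounded cache of long free ranges
  BitmapFreeList bitmap;           // constant-memory fallback
  size_t tree_node_cap = 1 << 16;  // made-up cap; the real PR picks its own

  std::optional<uint64_t> find(uint64_t want_units) {
    if (auto off = tree.find(want_units))  // O(log n) happy path
      return off;
    return bitmap.find(want_units);        // O(n) fallback scan
  }

  void release(uint64_t off, uint64_t len_units) {
    // Keep long freed ranges searchable in the tree while under the cap so
    // future contiguous lookups stay cheap; the bitmap is always updated
    // too (bit clearing omitted here).
    if (tree.by_len.size() < tree_node_cap)
      tree.by_len.emplace(len_units, off);
  }
};
```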
When we do regular writes, we have fast writes, but subsequent reads are slow due to the resulting fragmentation. On the other hand, if we apply deferred writes, write performance drops, but subsequent reads are fast. So for the results that I've got, I tried the hybrid allocator with deferred writes, and this is an attempt to preserve the current performance numbers we have for 64K min_alloc size.
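(For reference, the knob behind this tradeoff in BlueStore is the prefer-deferred threshold: writes at or below it are journaled into the RocksDB WAL first and later written in place, while larger writes allocate fresh space immediately. A sketch of the relevant options — values are illustrative, not recommendations, and defaults vary by release:)

```sh
# Writes at or below the threshold take the deferred (WAL-first) path.
ceph config set osd bluestore_prefer_deferred_size_hdd 32768
ceph config set osd bluestore_prefer_deferred_size_ssd 0
# A large min_alloc_size (applied at OSD mkfs time) is what makes
# small-overwrite blob fragmentation visible in the first place.
ceph config set osd bluestore_min_alloc_size_hdd 65536
```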
B: Maybe I'm not very good at explaining this right now, but, well, again: sometimes we see a pretty interesting difference when restarting allocators, both the bitmap and the AVL one. When I run initially, they tend to return contiguous blocks, but after a restart, once some releases have happened, they start to return more fragmented space.

When you perform some releases to this allocator due to two different writes, it tends to have a list of short extents just created all over the dispersed disk space. And again, when you try to allocate two extents independently — I mean, not as a single write but as two different writes, so two separate calls to the allocator occur — the resulting extents are not contiguous.
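(A toy illustration of that end state, reusing the hypothetical bitmap sketch from earlier: after interleaved allocate/release traffic, the free space is a comb of one-unit holes, so even a two-unit contiguous request fails although half the device is free.)

```cpp
int main() {
  BitmapFreeList fl;
  fl.used.assign(16, false);
  for (uint64_t i = 0; i < fl.used.size(); i += 2)
    fl.used[i] = true;   // every other unit is still allocated
  auto off = fl.find(2); // wants 2 contiguous units
  // off == std::nullopt: the longest free run is 1 unit,
  // even though 8 of 16 units are free in total.
  return off.has_value() ? 1 : 0;
}
```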
A: Yeah, sorry, yep — just looking at more of your numbers here, I mean, some of these are really impressive. The random write — not the 4K, but the bigger ones, 8K, 128K — is sometimes a three to four times performance improvement on the first run, and the second run is still 50%. I'm looking at rows 37 and 38 — or sorry, 36 and 37.
A: I've got it in the etherpad here, but I'll link it in anyway. This is the 10-node challenge from Supercomputing 2019 and, as you can see on it, the list is there. We were represented, but we're kind of down here at the bottom: I think SUSE was able to get a score of around 12 and a half, and that puts us at place eighteen of the top 25 here. So I started working on it using our new officinalis nodes, and so far I've gotten us up somewhere around place 13 or 14.
Pretty variable results: I see us typically getting anywhere from about 20 up to around 34, which is kind of the highest I've gotten it most recently. I did see it once higher than that, I think, and we did beat the 13th-place result, but I have not yet been able to repeat it. So we're close — kind of hovering pretty close to 13th place — but that's where we're at right now.
That's one issue that's really hurting us, because in this test it's about 17 gigabytes per second, and we should be able to do far, far better than that with this hardware. With highly parallel reads with librbd we can actually hit about 60 to 70, so lots and lots of room for improvement on that front.
The other big thing that I'm seeing is that I'm having a really difficult time trying to consistently get balanced inodes across MDSes, and also balanced requests across MDSes — probably due to that, but sometimes I've even seen weird behavior that doesn't seem to match the distribution of inodes.
A
Theoretically,
we
should
have
an
equal
number
of
files
represented
on
every
single
MDS
and
then
doing
you
know,
stats
and
other
things
across
all
of
them
across
the
whole
cluster.
It's
often
times
when
I
do
a
small
set
of
tests.
It
works
the
way
it's
supposed
to,
but
in
certain
cases
I
haven't
really
know
down
exactly
all
of
them.
I'll
see
only
one
MDS,
getting
a
huge
number
of
I
knows
and
then
occasionally
some
other
ones
popping
up
here
and
there
it's
almost
like
it's
reverting
to
the
balancer
behavior,
but
it's
not
good
balancer
behavior.
A
It's
really
unbalanced,
so
yeah
I,
don't
I,
don't
get
it.
I
do
need
to
probably
update
my
kernel
client,
so
I
can
make
sure
that
all
the
pending
happened
properly,
because
right
now,
I
can't
actually
check
them.
Getting
the
X.
Adder
doesn't
work.
So
there's
there's
hats
you,
let's
see.
Oh,
we
missed
the
femoral
pinning
PR.
Thank
you
to
Patrick.
He
found
a
bug
in
it
that
was
causing
even
more
unusual
behavior,
though
that's
that's
at
least
figured
out
Thor,
not
using
that
at
the
moment.
But
potentially
it
was
like
it's
fixed.
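(For reference, both pinning mechanisms are driven through virtual xattrs on directories, roughly as below; the distributed one comes from the ephemeral pinning PR under discussion, so its exact name and semantics here are as proposed at the time:)

```sh
# Static export pin: attach this subtree to MDS rank 2.
setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/mydir

# Read it back -- this is the xattr read that fails on older kernel clients.
getfattr -n ceph.dir.pin /mnt/cephfs/mydir

# Ephemeral distributed pinning from the PR: hash this directory's
# immediate children across the active MDS ranks.
setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/mydir
```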
We can try that again as well. I also did try the FUSE client, and the results are maybe slightly hilarious: everything runs quite a bit slower — you know, maybe 10 times slower — but the one case that was even worse was the mdtest hard delete case, and I need to actually verify exactly what that is doing. I was seeing about 21 ops per second across all eighty — no, I'm sorry, 100 — MDSes that I have set up, and ls was hanging.
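(For context: the io500 mdtest "hard" phases hammer one shared directory with many small files, then stat and delete them in separate invocations — roughly as below, with illustrative flags and counts rather than the exact io500 invocation:)

```sh
# Create phase: 8 ranks, 1000 files each, 3901-byte writes, shared directory.
mpirun -np 8 mdtest -C -F -n 1000 -w 3901 -d /mnt/cephfs/mdt_hard
# Delete phase: the case that collapsed to ~21 ops/sec under ceph-fuse.
mpirun -np 8 mdtest -r -F -n 1000 -d /mnt/cephfs/mdt_hard
```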
You know, the status commands, and also looking through the perf data and correlating that — that's on my list of things to get looked at as well. That's basically it right now. We've got numbers that are non-qualifying, but we do have numbers that are looking much better than what's on the IO500 list right now, and I think we can do much, much better.
We've got the performance in the OSDs needed to do it; we just need to figure out what's going on here. Or at least, yeah, I need to figure out if it's something I haven't set right, or if it's things that we need to change in the code. So that's it — Patrick, anything you wanted to add based on what we've seen so far?
F: Yeah, I mean, the only thing is, you know, we definitely want to grab some of the behaviors we're seeing so we can figure out what's going on in terms of, you know, the MDS behavior. The ephemeral pinning, yeah — that had some weird bugs, and that can invalidate all the results, I expect. I'd expect the normal export pins to work pretty well in terms of distribution — we exercise those pretty heavily — and I'd expect to get linear metadata performance scaling in the Intel cluster, at least.
F: No reason why not. I don't know if you mentioned it, but one interesting thing that Mark's doing is he's got many MDS ranks in his cluster, which is not something I've done before. So it's conceivable he's uncovering some new issues in CephFS by scaling that large — hence, maybe, some of the craziness in what some of the graphs produce. At the very least, we're going to try to get some more people involved on the performance side so that we can try to see.
F: 88 is certainly an interesting number of MDSes. I mean, I don't have any reason to believe it wouldn't work, so if you can make it scale that far, that would be — you know, I'd love to see what kind of results we get. But if it's just falling over, and the interest is actually in getting results that are useful, we might want to go down and then...
A: Patrick, one other thing that I did look at was lock contention in the MDS when I was doing these tests. I was sometimes seeing really high CPU usage — high as in, like, two hundred percent per daemon, so two cores being used, maybe up to three — but when actually looking at a wall clock profile, it seems like we're waiting on locks a lot, and I don't know the MDS code that well. So I don't know what exactly we're waiting on, but the per-MDS throughput never really got very high.
F: In terms of lock contention, there's mostly just one big MDS lock that you might be seeing in your analysis. We have broken out some of that, and then, yes, maybe the journaling, the flush thread: depending on the workload, you might see that thread get pretty hot, and then there are a few other small ones, but it should not be that significant. Most of the work is going to be done in the messenger threads and in the MDLog flush thread, or whatever is using MDLog.
A: Yeah, that was what I was seeing: the messenger threads were all quite busy, and they were also spending a fair amount of time waiting on — you know, probably that big lock that you're mentioning. And then, in the two different wall clock profiles I looked at, each one had different amounts of time spent in a couple of other threads. I can go back and look at them again, but that was what I was typically seeing.
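(A note on reproducing that kind of wall clock profile: a plain gdb "poor man's profiler" loop over the running MDS is enough to see which threads sit in lock waits — nothing Ceph-specific here; sample count and filenames are arbitrary:)

```sh
pid=$(pidof ceph-mds)
for i in $(seq 10); do
  gdb -batch -p "$pid" -ex 'set pagination off' \
      -ex 'thread apply all bt' > "mds-sample-$i.txt" 2>/dev/null
  sleep 1
done
# Threads repeatedly parked in pthread_cond_wait/pthread_mutex_lock across
# samples are the wall-clock waiters (e.g. queued on the big MDS lock).
grep -ch 'pthread_cond_wait\|pthread_mutex_lock' mds-sample-*.txt
```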