From YouTube: 2020-03-12 :: Ceph Performance Meeting
C: All right, let's get this show on the road. Not a whole lot going on; as far as I could tell, no new or closed performance PRs this week. Not horribly surprising, though, given that everyone's really focused on just getting the Octopus stuff taken care of. So, let's see, I saw two updated PRs this week, one from Igor with his hybrid allocator. I think that got a new update; I don't know what's in it. Igor, I saw you just joined. Anything new with the hybrid allocator?
C: Absolutely, okay. Let's see, so then the other one is Sage reviewed this parallel CRUSH calculation for the balancer PR. I confess I didn't actually read his reviews, so I don't know what's in it, but feel free to look if you're interested. Okay, that's it; not much else going on as far as I can tell. Anyone have anything this week that they've been working on, that I missed, that they'd like to talk about?
C: ...a recommendation for fixing a couple of things I had done wrong. That's now working very well, and we're actually seeing higher performance with it than we are with the kernel client, which is interesting, because now it's telling us that there possibly are some things in the kernel CephFS client, and potentially also kernel RBD, that we need to fix. So, with the results that we're getting now, between that and switching from 3x replication to 1x replication (because apparently that's allowed in their test), we were in 9th place and kind of flirting with 8th.
C: I think that's actually pretty good. I have been talking on the IO-500 mailing list with people about what the different scores kind of mean, and they don't really categorize the hardware that the storage is running on very well. Even the 10-node challenge is just ten client nodes, and you can have whatever storage system you want backing it. So we're competing with a variety of different systems, some of which have, you know, maybe 20 or 30 nodes and hundreds of storage devices. And, you know, our test cluster has 10 nodes and 80 storage devices. So we're actually, I think, looking surprisingly good here. Having said that, the way that they do the scoring for this is kind of multiplicative. It's not just an average of all the scores; it's score 1 times score 2 times score 3 times score 4, all raised to the power of one fourth. And it's a little more complicated than that, because they calculate these independently for IOPS and bandwidth, and then they do the same thing across those.
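The multiplicative scoring just described behaves like a geometric mean, which is why a single bad phase hurts so much. A minimal sketch with made-up sample numbers (this is only an illustration of the principle, not the official IO-500 scoring code):

```python
from math import prod

def geometric_mean(scores):
    """Geometric mean: the n-th root of the product of n scores."""
    return prod(scores) ** (1.0 / len(scores))

# Hypothetical per-phase scores. One weak "hard" phase drags the
# multiplicative total down far more than it would drag an average.
balanced = [10.0, 10.0, 10.0, 10.0]
one_weak = [10.0, 10.0, 10.0, 0.1]

print(geometric_mean(balanced))             # 10.0
print(round(geometric_mean(one_weak), 3))   # 3.162
print(sum(one_weak) / len(one_weak))        # arithmetic mean: 7.525
```

The arithmetic mean barely notices the weak phase, while the geometric mean cuts the total by two thirds, matching the observation that low scores on the hard tests dominate the result.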
C: So it's kind of weird, but the end result is that low scores really, really hurt you, and we are getting low scores on the hard tests. For the IOR hard test, it's doing unaligned reads and writes to big files, and I... I need to go back and look at it again, but we may have multiple clients hitting the same file.
C: So that's one thing, and then for the mdtest hard test, you have one big directory and you have lots of clients writing lots of small files, or reading or stat-ing or whatever, in that directory. So in that case pinning doesn't help us, and that goes for both the existing static pinning and the ephemeral pinning, since we only have one big directory to work with. So in that case it's really how well the balancer works that affects that score, and at least in these tests...
C: ...it takes a long time for the balancer to actually get going. It does eventually, and even when it does get going, we do see really good distribution across MDSes eventually, but then there are these weird periods of very, very slow access. It looks like maybe it's trying to move things back to the parent directory's MDS, which is also kind of bad. We were slower when we had the parent directory also trying to participate in handling either pinned directories underneath it, or fragments.
C: So there's, I think, a whole rich area here for figuring out what's going on and improving. The gist of it is that I think we could actually do really, really well if we can take care of those two hard cases: the one with them doing unaligned reads and writes to small numbers of very big files from lots of clients, and then the case of a single directory with many, many files being written or read or stat'd or whatever from multiple clients, but where we can't use pinning.
C: So those are kind of the interesting pieces in this. That's basically it for now; we continue to improve it. I think we're actually doing pretty well, but more is to come. Any comments or questions on that stuff?
E: The reason was to find a proper value for when the bitmap part of the hybrid allocator should kick in, to get the proper balance of performance and memory usage. And, strangely, I did not even finish that, because some other problem distracted me from it. Namely, I noticed that all of our allocators, when we are more than 80% full on disks, degrade very quickly in terms of the CPU performance of actually performing allocations and deallocations. So I have been measuring that. The difference is that if I have a test with a fixed size of object, like 8 megabytes, then all of the allocators perform very well. But if the distribution of sizes is not constant, if there are some different sizes, even just 4 kilobytes, 64 kilobytes, and 8 megabytes as the only three possible sizes, then for very full devices all our allocators get very time-consuming. And my question is: do we even care about disks that are filled more than 80 percent, or should I just drop it, because it's not really relevant to our case? So that's my question for now, and I will fill in the actual tables with full data next time, because I couldn't finish that on time today.
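As an illustration of the effect being described, here is a toy first-fit extent allocator. This is a hypothetical model, not any of the actual BlueStore allocators: with a mix of allocation sizes, the free space fragments as the device fills, so satisfying a large allocation means scanning further and further through the free list.

```python
import random

def simulate(fill_fraction, total_units=10_000, sizes=(1, 16, 2048)):
    """Fill a toy first-fit allocator to the target fraction with
    mixed-size allocations plus random frees, then report how many
    free-list extents must be scanned to find room for one more
    large allocation (a proxy for allocation CPU cost)."""
    random.seed(42)
    free = [(0, total_units)]   # list of (offset, length) free extents
    used = []
    allocated = 0
    target = int(total_units * fill_fraction)
    while allocated < target:
        size = random.choice(sizes)
        for i, (off, length) in enumerate(free):   # first-fit scan
            if length >= size:
                used.append((off, size))
                allocated += size
                rest = (off + size, length - size)
                free[i:i + 1] = [rest] if rest[1] else []
                break
        else:
            break   # nothing big enough: device effectively full
        # occasionally free an extent, creating fragmentation
        if used and random.random() < 0.3:
            off, size = used.pop(random.randrange(len(used)))
            free.append((off, size))
            allocated -= size
    # extents scanned before a max-size allocation could be satisfied
    return next((i + 1 for i, (_, l) in enumerate(free)
                 if l >= max(sizes)), len(free))

for fill in (0.5, 0.8, 0.95):
    print(fill, simulate(fill))
```

With a single fixed allocation size this model stays cheap at any fill level, which matches the observation that the fixed 8-megabyte test behaves well while the mixed-size test degrades.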
E: Well, in that case maybe I will just finish the numbers and re-run them also on different machines, but I don't really expect much significantly different numbers, because I've been hitting cache sizes much earlier than the limits of that 80% fill. But I will do that, and maybe we'll go back to that discussion next week. Yeah.
D: ...performed deferred writes on sizes up to the allocation size, which is 64K. My second step was to increase the prefer_deferred_size parameter setting for big deferred writes, and that's actually what I did. The numbers for these two changes are in column G, which shows some improvement for large reads pretty well. We're still worse than master in the 64K case, but it's much closer now, and I'll try to cover a bit later why it's still worse.
D: At line 249 I actually just performed some validation of different write patterns for the performance, and there's not much of interest here, except that, okay, the hybrid allocator and deferred big writes are generally bad for reads. Well, I'd say that when we introduce default deferred big writes, our read performance becomes worse; there is actually a conflict between read performance and write performance here. So, if we want faster reads, we should reduce fragmentation by not deferring big writes, and hence we'll have worse write performance.
D: Well, if you look at column C, you can see that after just the first write case, which is the 24K random write, you can see some performance drop in reads, and then again one more unaligned set of writes makes read performance even worse (which is row 20, er, or row 12), and so on and so forth. At this point I realized that, given what I am working with, it is better to reset the store on each try and then proceed with a new set of benchmarks, which are presented in the following lines.
D: 96K writes are pretty unaffected by the 4K min_alloc size and the hybrid allocator; actually, write performance is better and read performance is the same. Then 72K writes are again badly affected, and 32K, 64K, and 128K operate the same as the original behavior. That's about it on that. And actually I realized that we missed a couple more things with default big deferred writes. The first one is the handling of cases when a write overlaps two adjacent blobs and the write sizes are below 64K, which is actually the threshold for applying deferred writes at the moment. This fix allows us to achieve performance on par with master, but large write sizes might still suffer from additional fragmentation. That's now the case for 96K writes: they experience some slowdown, which means we got some additional fragmentation. The reason is that the original master code performs writing for such blocks by splitting them into 64K and the rest; the 64K part is written as a big write without any deferral, and the rest part is deferred.
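The master behavior just described, splitting such a write at the 64K allocation boundary, can be sketched like this. The function name and the exact policy here are illustrative, not the actual BlueStore code:

```python
ALLOC_SIZE = 64 * 1024  # the 64K allocation unit assumed in the discussion

def split_write(offset, length):
    """Sketch of the described master behavior: the aligned 64K part of
    a large write is issued directly as a 'big' write, and the remaining
    unaligned tail goes through the deferred (write-ahead) path.
    Returns ((offset, length) or None) for each of the two parts."""
    big_len = (length // ALLOC_SIZE) * ALLOC_SIZE
    big = (offset, big_len) if big_len else None
    rest = (offset + big_len, length - big_len) if length > big_len else None
    return big, rest

big, deferred = split_write(0, 96 * 1024)
print(big)       # (0, 65536): written directly as a big write
print(deferred)  # (65536, 32768): goes through the deferred path
```

A 96K write thus becomes one 64K direct write plus one 32K deferred write, whereas treating the whole 96K as a single deferred big write changes both the write path and the resulting fragmentation.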
D: Well, actually, I can try to reproduce this behavior by introducing some additional code, but I don't see any way to... well, actually, it looks like it brings some additional complexity, which is probably not necessary, since these write sizes are not very common. It might need some additional settings for how to treat this write block, how to split it into the big and small parts.
D: And you can see that for the more or less common write sizes, which is 32K, 64K, and 128K, we don't suffer from, well, from read performance degradation. It's actually these unaligned writes which are causing the drop, and so one would need plenty of them to increase the actual fragmentation and get these worse numbers. So I think that in real life this impact would be mostly on spinners, and we don't have any proof, so... by default we don't have deferred writes for flash drives anymore, and all this performance degradation is mostly caused by fragmentation, which is not, well, not that much relevant for a flash drive. So it's spinners which suffer from this greatly, yeah.
E: I mean, my testing scenario shows that all of our allocators do get very inefficient in terms of just allocation time: not the drive actually performing the operation, not even taking into account what the results would be. The thresholds are somewhat different for all the allocators, but all of them, after 80% full, do tend to take an exponentially growing amount of time to perform an allocation.
B: Okay, okay. Well, this sounds great to me; this is fantastic work. I guess there are three things: there's the hybrid allocator, there's the change in the deferred behavior, and there's the... er, so it's two things, really.
B: It seems to me like the thing to do is to first make sure these are grouped together in, like, a pull request, I guess, and merge them all into master, and then let it bake there for a little bit. And if everything looks good, then in the next point release... or are you suggesting, do you think, straight into Octopus?
D: Mm-hmm. What makes me a bit nervous is actually the min_alloc size downsizing, and in my opinion it's not the best idea to change it during a release life cycle. It looks quite okay if it changes with a new release, but changing it during the release cycle doesn't seem a perfect idea, in my opinion.
C: So, Sage, based on what you said, it sounds like the AVL allocator... okay, maybe the hybrid allocator... getting it in is not bad. We can get that in as long as we don't make it the default. The deferred change we probably need to wait on, and because the deferred change we have to wait on, we probably need to leave the hard drive min_alloc size at 64K for now, until we can make the deferred change the default. Does that sound right?
C: It sounds like, based on what you said, we can get AVL in right away. We won't make it the default, but we can get it into Octopus without really worrying about it too much, as long as it passes tests. The deferred change we don't want to do for Octopus; we want to wait on that, and because we're waiting on that, we probably don't want to make the...
B: Only that the downside there, as you're pointing out, is that the first few of these that are deployed with Octopus are going to have a large min_alloc size, and so they won't really capture these benefits. But I'm just, I'm just, I don't know, a bit nervous, yeah. I think we should put this in; let's put it in master right away, and...
C: Igor, I'll try to get some tests running on this stuff too, so that I can give you some feedback if I see anything interesting going on with it. Yeah, I'm especially interested in these IO-500 tests; they're on NVMe, so there's that. But specifically for AVL, I'm curious if AVL can help out in some of the tests where I'm seeing the...
E: Actually, I do not know which client would perform such a pattern, because if that four-megabyte object is a part of RBD, then writing a part inside of that object means that, actually, it's a file system that supports it. On the other hand, if there is a file inside... because overwriting a part of a file in your typical user file system is a very rare operation; usually we create files, something happens to them, and we close them.
C: Isn't this exactly what would happen in replication, though? Where you might have lots of small writes to something like some block on RBD, and one of your OSDs goes down, and now you're replicating objects all over the place, where you'd be doing a large read of that object after potentially having a client do lots of small writes.
C: If you're moving objects around and stuff: you have a client that does lots of small random I/O to some block in RBD, and now you have an OSD go down, and you're replicating objects across, and you're doing big four-megabyte reads right after you've done those small writes.
C: The question is, okay, for patterns where you have a client doing small random writes across an object, let's say RBD, right, you've got a formatted block, and then later on we have an OSD go down and we need to replicate that data: a four-megabyte read following potentially small writes is not necessarily an unusual use case. Maybe not, but is that right?
D: It's between two cases. The first one, which I'd say is conservative, and the one I have tried to preserve so far, is to, well, first of all, be more or less on par with our original behavior; it actually tries to keep less fragmentation on the object. So with a 64K min_alloc size we had large chunks, and hence we had less fragmentation.
D: Trying to implement this conservative approach results in worse write speed and in better read performance. The different case would be to operate on the assumption that the user tries to read and write more or less the same chunks, and hence, when we perform a write, it's better to keep this block and not split it; then, if one is reading in the same manner, reading these blocks, this unsplit block, will be better. But this results in more fragmentation, and continuous large reads will suffer from this behavior.
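A rough sketch of the tradeoff just described: folding overwrites into large aligned chunks keeps a later sequential read cheap, while keeping every small overwrite as its own blob multiplies the extents that a large read has to stitch together. The numbers and layout here are purely illustrative:

```python
def extents_for_read(write_log, read_start, read_end):
    """Count distinct write extents overlapping a read range; more
    extents is a rough proxy for more fragmentation and slower
    large sequential reads."""
    return sum(1 for (start, end) in write_log
               if start < read_end and end > read_start)

CHUNK = 64 * 1024
FOUR_MIB = 4 * 1024 * 1024

# Conservative strategy: overwrites folded into large aligned chunks.
conservative = [(i * CHUNK, (i + 1) * CHUNK) for i in range(64)]

# Alternative strategy: each 24K unaligned overwrite kept as its own blob.
fragmented = [(i * 24 * 1024, (i + 1) * 24 * 1024) for i in range(170)]

print(extents_for_read(conservative, 0, FOUR_MIB))  # 64 extents
print(extents_for_read(fragmented, 0, FOUR_MIB))    # 170 extents
```

Re-reading exactly the chunk you just wrote favors the unsplit-blob layout, but a full-object read (such as the replication scenario discussed above) touches far more extents under it.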