From YouTube: 2017-MAY-25 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
For full notes and video recording archive visit:
http://pad.ceph.com/p/performance_weekly
A: So, after looking at some pull requests — let's say I'll start with the closed ones — I think two interesting things happened; a lot of them settled here. The first one is that Haomai merged the write-lock contention patch in the async messenger. This is something that came up in the profiles: quite a bit of time was being spent waiting on this particular lock, so we rearranged the code a little bit in the messenger, and now we don't sit there.
A: We have a better idea of where that threshold should be, but it didn't appear to matter that much — I don't know how often we're actually calculating CRCs for zero buffers.
Anyway, let's see — we closed the Zipkin branch, because that library stuff was rebased and merged a week or a couple of weeks ago, and I think the v1 also just merged, so the original PRs closed; the pieces merged. So, yeah, that's the last one.
A: Maybe the most interesting one on the BlueStore side is the kv sync thread. That's gone through several iterations, originally starting with some work at Intel, and I think the latest version has merged. That shows like a 10% change on NVMe — well, less than the 30% Mark was seeing when it was also eliminating the finisher, but there's a deadlock that happens if you do that, so it needs more work before we can sort of go all the way.
A: Igor had a pull request that got pushed. Oh — any comments on the queue_transaction thing? I haven't seen that actually; I'm not sure where that came from — Alibaba, I think; they opened it up here. Anyway, [inaudible] — take a look. There's one for RocksDB that avoids a memory copy; I think I just mostly went through it. I would assume it's going to help marginally, but it hasn't been tested from the performance side. And there's an old one that adds a discard method for SSDs, yeah.
A: That one still needs work, and then there are some new ones. Sean has a batch throttle patch — I haven't tested yet how it actually performs, and I don't think that's going through testing right now. And there is a change to the encoding code that we turned up during the Big Bang testing: it behaves horrendously when the buffer is fragmented and you're trying to encode using the legacy code path and append stuff on top of it, so we need to fix that.
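To illustrate the fragmentation problem being described, here is a toy stand-in for a buffer list (purely hypothetical, not Ceph's `bufferlist`): if a legacy code path needs one contiguous region, a fragmented list of chunks has to be flattened first, which copies every byte already appended — and doing that repeatedly while appending is what behaves horrendously:

```cpp
#include <cassert>
#include <list>
#include <string>

// Toy chunked buffer: appends are cheap, but any code path that demands
// contiguous memory forces an O(total bytes) flatten (copy).
struct ToyBufferList {
  std::list<std::string> chunks;

  void append(const std::string& s) { chunks.push_back(s); }
  bool is_contiguous() const { return chunks.size() <= 1; }

  // Legacy-style access: collapse everything into one chunk, copying
  // all existing data each time it is needed.
  const std::string& flatten() {
    if (!is_contiguous()) {
      std::string all;
      for (const auto& c : chunks) all += c;
      chunks.clear();
      chunks.push_back(std::move(all));
    }
    static const std::string empty;
    return chunks.empty() ? empty : chunks.front();
  }
};
```

If an encode loop triggers a flatten on every append to a fragmented buffer, the cost becomes quadratic in the buffer size, which would match the "horrendous" behavior seen in the Big Bang runs.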
A: Yeah, nothing else here is worth mentioning. I guess the only one here that I think is still in play for luminous is the one that does the EC overwrites — there you go, the second one. It showed some failures in QA under that workload, but I need to fix those, and we should merge that for luminous, because it'll make BlueStore behave much better on EC overwrite pools. And I think that's it for the PRs.
D: I was just going to say, you missed the other one, which was a commit that improved the CRC calculation for zero buffers, which is great. And then I just wanted to let you know my plan is to rebase to incorporate those changes and then continue work on [inaudible] for Intel — [inaudible] faster. OK.
A: Oh yeah — let's go ahead and go to the discussion topics. So the first thing is BlueStore. Mark's been doing more testing with a number of the recent improvements, to the point where — we had set min_alloc_size to 16k on NVMe because it was faster, even though there's more write amp, but it looks like that may no longer be the case. It might be about the same, and so we have a fork in the road.
C: Yeah, that testing is just happening right now [inaudible], but one of the things I have noticed is that unless you restart the OSD, the overall memory usage looks like it's lower than it really is — it even looks like the 16k min_alloc_size case when it's not restarted. So it might be that even in the 16k case we're using more memory, and maybe there is some ceiling where it starts stabilizing, but [inaudible] the RSS before [inaudible].
C: It's from last week — [inaudible] — you can see there's something around line 1697 of the trace that was included there. But one of the reasons why the 16k alloc size is probably faster is because we do [inaudible] there.
A: That probably helps a lot of people, because even so, if you are trying to write and you aren't aligned to a page, then we have to wait for the I/O for the previous page to complete before you can write again. You can't have two I/Os for the same block outstanding, or else they could race in the block layer. So it's also sort of a serialization thing.
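A minimal sketch of that serialization constraint — a toy tracker (entirely hypothetical, not Ceph code) that refuses to issue a second I/O to a block that already has one in flight, so the caller has to wait for completion first:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Toy per-block in-flight tracker: at most one outstanding I/O per block,
// mirroring the "can't have two I/Os for the same block" rule above.
class InflightBlocks {
  std::set<uint64_t> inflight_;
public:
  // Returns true if the I/O may be issued now; false means the caller
  // must wait for the earlier I/O on this block to complete first.
  bool try_issue(uint64_t block) {
    return inflight_.insert(block).second;
  }
  void complete(uint64_t block) { inflight_.erase(block); }
};
```

This is why unaligned small writes hurt: two logically independent writes that straddle the same page collapse into a dependent sequence, one waiting on the other's completion.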
A: So you'd seen that nine percent of the time in copying — you measured that with the 4k malloc size, and that went away?
A: Yeah, it shifted somewhere else. Sounds good. Let's see, other things going on: for BlueStore, we should check filling a disk up and then making sure that you can actually still run it — figure out what the memory envelope is, make sure it works. So.
A: There's a little bit of the Big Bang stuff that's been happening the last two weeks. We have a temporary cluster set up at CERN that has like 200 hosts or something — supposedly 10,000 OSDs, though we never actually got to the full 10,000. Right now we have like 6,500 OSDs in the system, and we're testing all the new code for the monitor and manager, with stats going to the manager, making sure it's stable and fixing issues with this stuff. So that work is ongoing.
A: On the cluster, I just wanted to see what happens — and then a whole rack went down, and so a bazillion PG mappings got remapped to pg_temp, with pg_temp entries, and then we started seeing the memory usage for the decoded OSDMap balloon, because it's a red-black tree and it's just super inefficient: it's a red-black tree of vectors, which then each have another allocation.
A: So it's only a handful of memory allocations, and much more compact and faster. We're testing that, but I haven't really gotten to sit here and watch what happens when it's deployed — I'm still working on that. It's sort of challenging to diagnose, because just having so many PGs means that random things that are totally reasonable to spend CPU time on suddenly take a long time, and then you start hitting weird timeouts. So you have to figure out what part is slow and why, and which part to debug.
A: So we're going to do that, but that's been good. Hopefully we'll be able to have a fully stable cluster at, you know, either sixty-five hundred or ten thousand OSDs, and sort of check that box — but we'll see.
A: That's it for me — any other topics people want to talk about right now? Right now, about that cluster at 6,500 OSDs: there are enough machines for 10,000 OSDs, but they aren't all added to the cluster. I'm not sure if that's because they didn't all get deployed or because we just didn't add them yet — I guess we've been working on too much other stuff in the meantime. So hopefully we'll get to ten thousand sometime in the week.