From YouTube: 2018-Jun-28 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
B: All right, not a whole lot happening in terms of pull requests over the last week. There's a new one by Ma Jianpeng that looks interesting: it changes some of the behavior of the kv sync thread in BlueStore and the way that the finisher thread works. I have not looked closely at this; the only thing I asked Ma Jianpeng for was whether he could run it through the wall-clock profiler, because that might tell us a little bit more about the change in behavior.
C: Well, actually, if the store is not fragmented, then the stupid allocator probably was better, because it properly allocates the contiguous segment first, while the bitmap allocator looks from the beginning and just collects any available extents; the resulting list might be fragmented, and it doesn't care about that.
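To illustrate the contrast being drawn, here is a toy C++ sketch of the two strategies (hypothetical names and a simplified free list, not the actual BlueStore allocator code):

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Toy free list of (offset, length) extents, purely for illustration.
using Extent = std::pair<uint64_t, uint64_t>;

// Contiguous-first strategy: satisfy the whole request from a single
// extent, so the returned allocation is never fragmented.
std::vector<Extent> alloc_contiguous(std::vector<Extent>& free_list,
                                     uint64_t want) {
    for (auto& [off, len] : free_list) {
        if (len >= want) {
            Extent got{off, want};
            off += want;   // shrink the free extent in place
            len -= want;
            return {got};
        }
    }
    return {};  // fails: no single extent is big enough
}

// Bitmap-style strategy: scan from the beginning and collect whatever free
// extents turn up until the request is covered; the result may be many
// small pieces, and the allocator does not care.
std::vector<Extent> alloc_any(std::vector<Extent>& free_list, uint64_t want) {
    std::vector<Extent> got;
    for (auto& [off, len] : free_list) {
        if (want == 0) break;
        uint64_t take = std::min(len, want);
        got.push_back({off, take});
        off += take;
        len -= take;
        want -= take;
    }
    // A real allocator would roll back on failure; omitted in this sketch.
    return want == 0 ? got : std::vector<Extent>{};
}
```

The first strategy fails outright when no single extent is large enough, while the second almost always succeeds but can hand back a fragmented list, which matches the trade-off described above.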
B: There's been a ton of work on optimizing make check performance, and Ceph build performance in general, over the last couple of weeks, and on IRC this morning he was talking a little bit about it. It looks good: especially with the work he's been doing on caching recently, it looks like he can reduce a build from like 30 minutes down to like seven minutes. The gist of it is, well...
B: There are a couple of other semi-recent pull requests here that haven't had a whole lot of movement. This shared persistent read-only RBD cache has been around for a while; I'm not sure what the current state of it is, but apparently it's continuing to get discussion and updates here.
A: Basically, it's ready to go. It has some drawbacks, mostly related to the involvement of an administrator to configure the size of the huge page pool. However, I'm afraid we wouldn't be able to overcome that; it's just a system limitation: if you want to use explicit huge pages, you need to tell the kernel about the, you know...
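For background on that limitation: explicit huge pages on Linux come out of a pool the administrator must reserve ahead of time (e.g. via the vm.nr_hugepages sysctl), and an allocation simply fails when the pool isn't there. A minimal sketch, assuming 2 MiB pages:

```cpp
#include <sys/mman.h>
#include <cstdio>

int main() {
    const size_t len = 2 * 1024 * 1024;  // one 2 MiB huge page
    // MAP_HUGETLB draws from the kernel's pre-reserved huge page pool;
    // it fails with ENOMEM unless the admin sized the pool beforehand
    // (e.g. `sysctl vm.nr_hugepages=...`).
    void* p = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (p == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");  // pool not configured or exhausted
        return 1;
    }
    munmap(p, len);
    return 0;
}
```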
A: We talked yesterday and had a good, strong discussion regarding the op tracker and the testing environment, noting that Peter had seen a huge difference here. I'm using CBT, which by default disables cephx; that's very similar to what the Intel and Samsung guys are doing in their own maximum-IOPS test scenarios.
A: First of all, the cost of a division operation on x86 is variable: it can vary from just nine cycles in the optimistic scenario up to ninety cycles. Ninety cycles, for instance, is about the cost of an L3 cache miss that has to go all the way to system memory. In the exact case of the op tracker divisions, I feel they are worth at least forty or fifty cycles, which is comparable to an L2 cache miss covered by L3.
A: But of course the optimization itself is not the most important part of the pull request. What has been done in the continuation of this pull request is integrating our p2 wrapper for powers of two; it's a facility for conveying, through the language's type system, the information that we are dealing with a power of two.
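A minimal sketch of that idea (hypothetical names, not the actual Ceph wrapper): encoding the power-of-two property in a type lets call sites replace the hardware divide with a shift, and modulo with a mask:

```cpp
#include <cstdint>

// Hypothetical wrapper: records "this value is 1 << shift" in the type,
// so a non-power-of-two divisor cannot be constructed by accident.
struct PowerOf2 {
    unsigned shift;
    constexpr uint64_t value() const { return uint64_t{1} << shift; }
};

// With a plain runtime divisor the compiler must emit a div instruction,
// which is what costs tens of cycles on x86.
inline uint64_t div_plain(uint64_t v, uint64_t d) { return v / d; }

// With the typed divisor the same operation is a single shift, and the
// corresponding modulo becomes a mask.
inline uint64_t div_p2(uint64_t v, PowerOf2 d) { return v >> d.shift; }
inline uint64_t mod_p2(uint64_t v, PowerOf2 d) { return v & (d.value() - 1); }
```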
A: Well, I have one huge branch with all those things: basically everything that is in the op tracker pull request, plus optimizations related to create_request in the op tracker, plus some optimizations for the IO throttler, while making a bond between shards of the op tracker and shards of our main work queue. There are a lot of comments there. There is a...
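If the bonding being described is a one-to-one pairing of op tracker shards with work queue shards, a toy C++ sketch of that pattern (all names hypothetical, not code from the branch) might look like:

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical sketch: give every worker shard its own tracker shard, so
// tracking an op never takes a lock owned by another shard.
constexpr std::size_t kShards = 8;

struct TrackerShard {
    std::mutex lock;
    std::vector<int> ops;  // stand-in for tracked in-flight ops
};

std::array<TrackerShard, kShards> tracker;

void track_op(std::size_t worker_shard, int op) {
    // Worker shard i always uses tracker shard i: no cross-shard contention.
    auto& shard = tracker[worker_shard % kShards];
    std::lock_guard<std::mutex> g(shard.lock);
    shard.ops.push_back(op);
}
```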
B: Let's see, the only other thing I was going to mention is that there has been some discussion recently about memory allocators again, and whether or not it would be worthwhile to reconsider the libc memory allocator. So over the past week we ran some tests taking a look at that again; here, I'll share my screen.
B
B
That
includes
the
the
T
cache
optimizations,
so
the
gist
of
it
is
that
we're
still
yeah,
Lib
C
malloc
is
still
is
still
really
still
really
bad
at
using
lots
of
RSS
memory,
presumably
due
to
fragmentation,
you
know
we
don't
have
direct
evidence
of
fragmentation,
but
you
know
this
is
you
know
probably
indicating
that
there
is
some
TC
malloc
really
I
think
probably
continues
to
have
kind
of
the
the
best
behavior.
Here
you
could
make
an
argument
that
J
malloc
is
maybe
is
probably
pretty
close.
B: The virt size is higher, but the actual RSS memory used is maybe a little bit lower than tcmalloc's, and this is kind of an old version of jemalloc, so a newer version might be better; I don't know. But both are using significantly less memory, both virt and RSS, than libc malloc, and there's no real performance advantage for libc malloc either in this case.
B
So
these
were
our
BD
results
for
K
random,
writes
using
an
nvme
device
with
3d
by
Bluestar
cache
one
OSD,
and
then
also
we
looked
at
our
GW
small
object
creation
and
you
know
the
stories
may
be
actually
a
little
bit
better
for
Lipsy
Malick,
but
it's
still
not
great
and
still
using,
not
quite
double
the
memory
of
the
other
two,
but
but
still
really
high.
So
yeah
I
mean
the
the
gist
of
it.
Is
that
I?
Don't
think?
There's
a
real
compelling
technical
case
right
now
for
using
whoopsie
Malick.
B: The only real advantage is that it's the default, pretty much available on any Linux distribution; no extra setup is necessary for it. But given these results, I think we need the libc folks to figure out ways to help us reduce this. One thing that they recommended was disabling fastbins via LD_PRELOAD-ing a shared object, because that's the only way you can disable fastbins right now.
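For context, a sketch of how such a preload object can work with glibc (my reconstruction, not necessarily the exact object they used): mallopt(M_MXFAST, 0) sets the fastbin size threshold to zero, and running it from a constructor in an LD_PRELOAD-ed library gets it in before the application allocates:

```cpp
// no_fastbins.cpp -- build: g++ -shared -fPIC -o no_fastbins.so no_fastbins.cpp
// use:   LD_PRELOAD=./no_fastbins.so <application>
#include <malloc.h>  // glibc mallopt

// Runs at library load time, before main() and most allocations.
__attribute__((constructor))
static void disable_fastbins() {
    // M_MXFAST is the upper bound for fastbin-handled request sizes;
    // setting it to zero turns fastbins off entirely.
    mallopt(M_MXFAST, 0);
}
```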
B: So we did that, and also increased the tcache count, and that actually increased memory usage a little bit. I think basically what happened is that the performance went down a little bit because we didn't have fastbins anymore, and the memory usage went up because we were now using a higher-than-default tcache count; but yeah, it didn't really help. Oh, I also tried reducing the number of arenas. Kind of the community wisdom out there is that by using only two or even one arena you might see increased thread lock contention but potentially lower fragmentation, and that really did not work: I didn't even get to the random write case before the node ran out of memory or whatever, because decreasing the arena count made the memory usage spike very, very quickly just pre-filling an RBD volume. It was up to like 11 or 12 gigs of RSS before we even got to the test.
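For reference, the arena experiment can be reproduced with glibc either through the MALLOC_ARENA_MAX environment variable or programmatically; a minimal sketch:

```cpp
#include <malloc.h>  // glibc mallopt

int main() {
    // Cap glibc malloc at a single arena, equivalent to running with
    // MALLOC_ARENA_MAX=1; fewer arenas trade higher lock contention for
    // (hoped-for) lower fragmentation.
    mallopt(M_ARENA_MAX, 1);
    // ... the workload would allocate from here ...
    return 0;
}
```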