From YouTube: 2018-07-12 Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
A
There is a new PR from Aaron 85 regarding the erasure coding stripe cache. I don't know much about this one at all, other than that I saw it is modifying the legacy config opts, which I'm not sure [unclear]. Hey, how's it going?
B
All right, yeah, I don't have anything. I've got to run in a minute here.
B
I wonder what merged in the last hour. I mean, you tried out just, like, master, but you can [unclear] against that pull request. Yeah, just one minor [unclear] library, Snappy libraries.
A
Yeah, I don't think there's anything else really interesting going on with new PRs here. So, what to discuss? Maybe the only one is that Ma Jianpeng is looking at the latency of the kv finalize thread, which is good, but I don't know if there's any info to report there yet. Okay, so for my stuff, there are two PRs coming in that kind of work with each other. One is to pull RocksDB's LRU cache into our tree instead of using their default one. There are a couple of reasons for doing this.
A
One is that it doesn't work quite the way we want it to. They have a low-priority pool and a high-priority pool for the LRU, and the high-priority pool optionally stores indexes and filters, which is how we're using it. But it can also store data that has been recently read: by default, data isn't stored there when it's written, but the first time it gets read, it moves into the high-priority pool. We don't actually want that.
A
We just want the high-priority pool reserved for indexes and filters. That way we can look at how full it is and say: this is how much cache we need for indexes and filters to keep them guaranteed in cache. So that's a change we make in our own version of the LRU cache in this PR: we no longer put regular key-value pairs in the high-priority pool, that portion of the cache.
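The pool policy described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Ceph's fork of the RocksDB LRU cache. The class `PriorityLRU` and its methods are hypothetical, sizes are plain integers, and each entry is assumed to be tagged at insert time as either an index/filter block or a data block. Index and filter blocks go into a reserved high-priority pool. Data blocks stay in the low-priority pool, and a read hit only refreshes recency within the entry's own pool instead of promoting it.

```python
from collections import OrderedDict


class PriorityLRU:
    """Hypothetical two-pool LRU cache, for illustration only.

    high pool: index and filter blocks only, capped at high_pri_ratio of capacity.
    low pool:  ordinary data blocks; read hits never promote them to the high pool.
    """

    def __init__(self, capacity: int, high_pri_ratio: float) -> None:
        self.capacity = capacity
        self.high_pri_capacity = int(capacity * high_pri_ratio)
        self.high = OrderedDict()  # key -> size, least recently used first
        self.low = OrderedDict()

    @staticmethod
    def _usage(pool: OrderedDict) -> int:
        return sum(pool.values())  # O(n); fine for a sketch

    def high_pri_usage(self) -> int:
        """Bytes currently held for indexes and filters."""
        return self._usage(self.high)

    def insert(self, key: str, size: int, is_index_or_filter: bool) -> None:
        pool = self.high if is_index_or_filter else self.low
        pool[key] = size
        pool.move_to_end(key)  # newest entry becomes most recently used
        self._evict()

    def lookup(self, key: str) -> bool:
        for pool in (self.high, self.low):
            if key in pool:
                pool.move_to_end(key)  # refresh recency within its own pool only
                return True
        return False

    def _evict(self) -> None:
        # Keep indexes and filters within their reserved share.
        while self.high and self._usage(self.high) > self.high_pri_capacity:
            self.high.popitem(last=False)
        # Then evict least recently used data blocks first until everything fits.
        while self._usage(self.high) + self._usage(self.low) > self.capacity:
            pool = self.low if self.low else self.high
            if not pool:
                break
            pool.popitem(last=False)


cache = PriorityLRU(capacity=100, high_pri_ratio=0.5)
cache.insert("index-1", 30, is_index_or_filter=True)
cache.insert("data-1", 40, is_index_or_filter=False)
cache.lookup("data-1")          # a read hit does not move data-1 into the high pool
print(cache.high_pri_usage())   # 30
```

With this policy, `high_pri_usage()` reports exactly how much cache the indexes and filters occupy, which is the statistic the speaker wants to read back. Data that has merely been read again never inflates it.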
A
Originally, we made a modification to that interface and tried to submit it upstream to RocksDB. It's been about two months and they never actually moved on it. I suspect that's just because they really don't want to make modifications to the cache interface, and that's fine; I understand why. There's probably no one else using RocksDB who is trying to get access to the kind of cache statistics that we are. So we're pulling the whole LRU cache into our tree.
A
That way we no longer have to use the public interface. We can add and remove our own methods for getting whatever we want ourselves, and that's probably better for everyone.
So that's the one PR, which does both of those things. Then there's another PR that looks at the amount of heap memory the OSD is using and the amount of unmapped memory, and from those two values we can compute the amount of mapped memory the OSD is consuming. This comes from tcmalloc.
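The arithmetic here is simple: mapped memory is the heap size minus the bytes already released back to the OS. Below is a minimal sketch. It assumes the two inputs have already been read from the allocator. In gperftools tcmalloc these are, to our understanding, exposed as the numeric properties `generic.heap_size` and `tcmalloc.pageheap_unmapped_bytes`. The function name and sample numbers are hypothetical.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def mapped_bytes(heap_size: int, unmapped_bytes: int) -> int:
    """Heap memory still mapped by the allocator: total heap minus bytes released to the OS."""
    if not 0 <= unmapped_bytes <= heap_size:
        raise ValueError("unmapped bytes must be between 0 and the heap size")
    return heap_size - unmapped_bytes


# Hypothetical sample: a 5 GiB heap of which 768 MiB has been released (unmapped).
print(mapped_bytes(5 * GiB, 768 * MiB) // MiB)  # 4352
```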
A
We don't have the ability to do this with libc malloc or jemalloc yet; maybe in the future we will. But with those tcmalloc values, we can start dynamically adjusting the size of the caches in BlueStore to try to keep the OSD's RSS memory usage close to a target value. It can only be close, because tcmalloc uses MADV_DONTNEED and the kernel is not guaranteed to reclaim any memory that has been unmapped.
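One way to picture the dynamic adjustment is as a simple control step: compare mapped memory with the target, then move the cache budget by the difference, clamped to a floor and a ceiling. This is a hypothetical sketch. The function `next_cache_size`, the full-error step size, and the bounds are all assumptions, not the PR's logic. Because the kernel may not reclaim unmapped pages promptly, RSS would only track the result approximately.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def next_cache_size(target: int, mapped: int, cache_size: int,
                    min_cache: int, max_cache: int) -> int:
    """One hypothetical tuning step toward keeping mapped memory near `target`."""
    error = target - mapped          # > 0: headroom available; < 0: over target
    proposed = cache_size + error    # grow or shrink the cache by the whole error
    return max(min_cache, min(max_cache, proposed))


# Simulate non-cache memory changing while the cache adapts (all numbers hypothetical).
target, cache = 4 * GiB, 3 * GiB
for other in (1 * GiB, 2 * GiB, 1536 * MiB):
    mapped = other + cache
    cache = next_cache_size(target, mapped, cache, min_cache=256 * MiB, max_cache=target)
    print(f"non-cache {other // MiB} MiB -> cache {cache // MiB} MiB")
# prints cache sizes of 3072, 2048, then 2560 MiB
```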
A
Let me zoom this in. Can folks see this? Is it presenting yet? Not yet. Keep going. Better? A little bigger, a little bigger? Okay, maybe a little here; I'll move my thing to the... How's that looking? That's fine. Excellent. Okay, so the second graph here is showing roughly the behavior of what we had maybe a month ago. It's not exactly the same, but it's really close.
A
That's without any kind of autotuning: a 3 GB fixed BlueStore cache, with different ratio values for kv, meta, and data. Previously we were trying to optimize for the RBD case, so we really aggressively favored metadata over the other two, but this is just showing the behavior when you don't do any autotuning and use fixed-size caches. It isn't playing any tricks. It's giving you exactly what you request, a 3 GB fixed cache size and those ratios, and it doesn't do anything else.
A
Previously, it would reallocate memory when it wasn't used, but that didn't work very well and was kind of broken, so it didn't look exactly like this. But in the case where autotuning is disabled, this is now what you get. In that bottom graph, you can see it's doing what is requested.
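For contrast, the fixed, non-autotuned behavior amounts to a static split of one cache budget by ratio. Below is a minimal sketch. It assumes that data receives whatever kv and meta do not take. The ratio values are hypothetical and are not the ones used in the test.

```python
GiB = 1024 ** 3


def fixed_split(total: int, kv_ratio: float, meta_ratio: float) -> dict:
    """Static split of a fixed cache budget; data gets the remainder (assumption)."""
    if kv_ratio < 0 or meta_ratio < 0 or kv_ratio + meta_ratio > 1:
        raise ValueError("ratios must be non-negative and sum to at most 1")
    kv = int(total * kv_ratio)
    meta = int(total * meta_ratio)
    return {"kv": kv, "meta": meta, "data": total - kv - meta}


split = fixed_split(3 * GiB, kv_ratio=0.2, meta_ratio=0.7)
for name, size in split.items():
    print(f"{name}: {size / GiB:.2f} GiB")
```

Each share acts as a hard limit here. An idle data share stays idle even if kv or meta could use it.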
A
With autotuning, it will use memory in smarter ways if it thinks it can. Say one cache doesn't need all of the memory you've specified via the ratio: it will borrow that memory for another cache and let that cache go over its target value. So in this case there's a new parameter, osd_memory_target, where you just specify how much memory you're trying to keep this particular daemon limited to, and with the autotuning and that target...
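The borrowing behavior can be sketched as a two-stage allocation. First, each cache receives its ratio share, capped at what it actually wants. Then any unused budget is redistributed to caches that still want more, in proportion to their ratios. This illustrative algorithm rests on two assumptions: every cache can report a byte "demand", and all ratios are positive. `allocate` is a hypothetical name, and the PR's real logic may differ. The `total` passed in would be whatever budget remains for caches under the `osd_memory_target` setting.

```python
from __future__ import annotations


def allocate(total: int, ratios: dict[str, float], demand: dict[str, int]) -> dict[str, int]:
    """Hypothetical ratio-based allocation that lets caches borrow unused budget."""
    if any(r <= 0 for r in ratios.values()):
        raise ValueError("ratios must be positive")
    # Stage 1: each cache gets its ratio share, but never more than it wants.
    alloc = {name: min(demand[name], int(total * r)) for name, r in ratios.items()}
    leftover = total - sum(alloc.values())
    # Stage 2: hand leftover budget to caches whose demand is still unmet.
    while leftover > 0:
        hungry = {name: r for name, r in ratios.items() if alloc[name] < demand[name]}
        if not hungry:
            break  # everyone is satisfied; the rest stays unallocated
        weight = sum(hungry.values())
        given = 0
        for name, r in hungry.items():
            take = min(int(leftover * r / weight), demand[name] - alloc[name], leftover - given)
            alloc[name] += take
            given += take
        if given == 0:
            # Integer rounding left only a few bytes; give them to one hungry cache.
            name = next(iter(hungry))
            take = min(leftover, demand[name] - alloc[name])
            alloc[name] += take
            given = take
        leftover -= given
    return alloc


# kv needs little, so meta and data borrow its unused share (numbers hypothetical).
print(allocate(100, {"kv": 0.4, "meta": 0.4, "data": 0.2}, {"kv": 10, "meta": 100, "data": 100}))
```

The cap in stage 1 is what frees budget for borrowing. If every cache wants at least its share, the result is essentially the fixed split.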
A
My understanding is that the kernel more aggressively reclaims unmapped pages when there's a lot of memory pressure, and it may choose not to if there's no memory pressure. So potentially, in a scenario with higher memory pressure, it might stick closer to the target, or it might not; it's up to the kernel. You can see here that the cache size is bouncing around a little bit to try to keep up with that. And here, this is KV.
A
This is one of the first times we've really seen the RocksDB KV cache peaking up and going back down. Presumably those peaks are periods where it has created a new level: up to that point it has stayed within the same level, merging and adjusting SST files, and I suspect that at those points it has created a new level and is moving data into it.
A
Just a guess on my part, but it's interesting to see that behavior. When you've got these fixed cache sizes, RocksDB is not allowed to borrow any cache like that; it's just stuck at a fixed cache size. Actually, maybe that's not quite the right way to say it.
A
It's always at a higher cache size in the fixed case; it sits at that peak size continuously, whereas in the autotuned setting it can get back down to just what it really needs, and you can reclaim the rest for metadata or data cache. So this is with RGW. Performance in this test was almost the same.
A
It turns out that I had updated this test machine for some memory allocator testing and pulled in the Spectre and Meltdown changes when I did that, and now this system is not fast enough to show any performance difference between the two. They basically run at roughly the same throughput, so I need to retest this on a faster machine; I think right now I'm basically CPU-limited on this box. In any event, they look about the same, and likewise for the RBD tests.
A
It grows in the prefill stage and then just kind of levels off at whatever values you set. So that's all I've got right now. In my opinion, the more interesting change is coming next: the actual LRU binning, age-based binning for the LRUs, which will hopefully let us see changes in the cache ratios depending on the workload you're running.
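Here is one hypothetical way "age-based binning" could work. Each cache buckets its bytes by time since last access. The share of recently used bytes per cache then suggests how the ratios should shift for the current workload. The function names, bin edges, and the "recent" cutoff are all assumptions for illustration, and the actual upcoming change may be designed quite differently.

```python
from bisect import bisect_right


def age_histogram(entries, bin_edges):
    """Sum entry sizes per age bin.

    `entries` holds (size_bytes, seconds_since_last_access) pairs. Bin 0 holds ages
    below bin_edges[0], bin i holds bin_edges[i-1] <= age < bin_edges[i], and the
    last bin holds everything at or above bin_edges[-1].
    """
    hist = [0] * (len(bin_edges) + 1)
    for size, age in entries:
        hist[bisect_right(bin_edges, age)] += size
    return hist


def recent_share(caches, bin_edges, recent_bins):
    """Fraction of all 'recent' bytes (the youngest `recent_bins` bins) held by each cache."""
    recent = {name: sum(age_histogram(e, bin_edges)[:recent_bins]) for name, e in caches.items()}
    total = sum(recent.values())
    if total == 0:
        return {name: 1 / len(caches) for name in caches}
    return {name: b / total for name, b in recent.items()}


# Hypothetical snapshot: the kv cache is mostly cold, the data cache is hot.
caches = {"kv": [(100, 0.5), (100, 120.0)], "data": [(300, 2.0), (50, 5.0)]}
print(recent_share(caches, bin_edges=[1, 10, 60], recent_bins=2))
```

If the workload shifted from RBD to RGW, recently used bytes would move between caches and these shares would follow. That is the kind of adaptivity the speaker is hoping for.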
A
If you go from an RBD to an RGW and back to an RBD workload, the goal is for the cache to adjust based on what's happening at that time, and maybe adjust back to the previous state. Right now this doesn't do that, or probably doesn't; last time I looked, it didn't. So that's it. Any questions on any of this?
A
I just have a couple quick questions. I love it, by the way; it looks really good. The only question I have is how big a change this is. Is it a really complicated change, or was it relatively straightforward? And if the latter, what is the chance that we could get this into, say, Luminous? Go ahead, don't mock! My goal is to have all of these changes backported for 3.2. We'll see how hard it is, but I don't think anything is too terrible. It's not one change, right?
A
There are going to be about four PRs. The PR where we look at tcmalloc's stats to compute the overall cache size is pretty straightforward. That should be an easy backport, I think, and it will get us at least the autotuning of the overall cache size. Tuning the ratios within that cache size is a little bit more complicated, but I think we can do it.
A
The biggest change is pulling our version of the RocksDB LRU cache into our tree, but the good news is that it doesn't look to me like the public cache interface has changed in the last year or two. So I think our changes won't rely on a new version of RocksDB; we can pull them in without pulling in a new RocksDB. So backporting might just work, so long as we resolve any other weird things.
A
There was a g_conf change just recently, in the last day or two, that we'll have to go through and fix everything for when we backport. Otherwise, it's kind of complicated, but not so much that I think we can't backport it, at least to 3.2. Luminous might be tougher; we'll have to see. Oh, I'm sorry, my bad, I got confused.
A
I thought 3.2 was Luminous, and you're saying it's a different upstream release? Yeah! Well, that's a good question. I should actually find out what that's based on. The good news... I didn't want to put... Okay, yeah, sure, no worries. I guess the gist of it is that I think maybe we can do it.
A
It's kind of a big thing to backport, but I'm hopeful that we'll get it into 3.2 at least. It seems very significant, because some of the problems we have in the field have to do with hyperconvergence, where you're trying to run Ceph and applications in the same box, and this makes it a lot more manageable. That's the goal. I really would like it.
A
We can't fully control our RSS memory usage; you're exactly right, the kernel is involved, so we can't force the kernel to reclaim pages. We can only say that we've unmapped stuff and the kernel is free to reclaim it. We can't guarantee it's not fragmented, but we can at least say there's stuff here the kernel can grab if it wants to, and I think that's the best we can do. Well, the only concern I have...
A
So that's a really good question, because I have not run any tests with this during backfill. But presumably we'll be watching the heap stats, and it should dynamically adjust the cache size down, as far as whatever minimum is set, to try to compensate for any heap memory usage that we haven't accounted for. So theoretically, this might actually level out memory usage during recovery too.
A
Yeah, I mean, I don't mind it getting a little slower, but when you run out of memory, then it gets a lot slower. Exactly, exactly. I would much rather eat the performance hit if it meant that we could, to the best of our ability, keep the memory usage of the OSD within certain bounds. The performance issue we can fix later on. Once we've got the OSD set so that it's behaving in a consistent way, then we can start looking at: well, why?
A
Why do we need so much memory? Are there things we can do to speed it up in memory-limited scenarios, etc.? But if we're just letting the OSD use an unbounded amount of memory, all bets are off. The user has no ability to understand or know what's going on at any given point in time. So my hope is that this will... It's not perfect, right? We can't...
A
We can only adjust the cache sizes. Once you get down to the minimum cache size, there's nothing more we can really do with this PR anyway. So if other things in the OSD are using tons of memory, this will compensate to an extent, but right now the minimum size is set at 128 megabytes. If we get down to that, there's nothing more we can do, and the process will grow.
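The limit described here is easy to quantify with a hypothetical back-of-the-envelope helper. It treats the 128 megabyte floor from the call as 128 MiB and assumes the cache is the only thing the tuner can shrink. Once non-cache memory exceeds the target minus that floor, every additional byte shows up as growth over the target.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def overshoot(target: int, non_cache: int, min_cache: int = 128 * MiB) -> int:
    """Bytes above `target` once the cache cannot shrink below `min_cache` (0 if within target)."""
    cache = max(min_cache, target - non_cache)   # the tuner's best effort
    return max(0, non_cache + cache - target)


for non_cache_mib in (3584, 4032, 4608):
    extra = overshoot(4 * GiB, non_cache_mib * MiB)
    print(f"non-cache {non_cache_mib} MiB -> {extra // MiB} MiB over a 4 GiB target")
# prints overshoots of 0, 64 and 640 MiB
```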
A
All right, that's all I've got. Does anyone have anything else they want to discuss this week, or should we wrap up?