From YouTube: Ceph code walkthrough: BlueStore cache autotuning
If you're ready, you can jump into it.

Yeah, I was watching the recordings rather than attending the live sessions. So, okay. Close enough, yeah. Alright, alright!
So I've got this big, fancy-sounding title here, but what it really comes down to is this: I've got some slides that I want to go through as we talk about the code, but really I'm trying to make the OSDs smart. I want them to do smart things, and not force that work onto the user.
I came across a good analogy recently. I've been reading Ender's Game with my kids, and there's a really good quote there: every single one of our ships contains an intelligent human who's thinking on his own, and every one of us is capable of coming up with a brilliant solution to a problem. The enemy bugger fleet, who are kind of like giant insects, only come up with one brilliant solution at a time; they think fast, but they're not smart all over. And in terms of how Ceph works, in so many other ways we have distributed smarts, right? The whole concept behind CRUSH is that you're distributing the decision-making process. There are rules, very strict rules, about how it happens, and it's all deterministic. But rather than have the user come in and dictate every detail,
my thought is that it's much better if users define goals: this is the goal I'm trying to accomplish with this setting, and I don't want to think about it beyond that. I don't want to have to finagle or finesse very fine-grained behavior; I just want it to do the right thing. And here's how I think about what the right thing is.
Right now, the last time I counted, we've got somewhere between 200 and 300 options in options.cc, and at least a number of those (not all of them) are documented and controllable through the Ceph Manager. I assume there's some GUI that exposes at least a subset of these, if not all of them, for users to set.
The one that came up was memory management in the OSD. I actually observed some users struggling with it, especially with the way the caches in BlueStore are defined and how they affect memory usage. The thought in their head was: okay, I've got these nodes that have this much memory available; how do I set all this stuff so that my OSD stays within that and uses the memory effectively? This was about eight months ago.
So the very first thing (well, this isn't exactly true, but one of the early things I started looking at) was trying to just ask: can we make the OSD fit within a certain memory target? Can we make it fairly well behaved? Optimally, you'd actually pre-allocate all of the memory and stay within it that way, and you'd have a lot of guarantees, or at least you'd be better guaranteed to stay close to that memory range.
But we don't have the luxury of doing that right now; it'd be way too much work. So instead, what I started looking at was this: TCMalloc gives us all this information about heap memory usage, and some other useful statistics about what's mapped and what's not mapped, and we've already got code in place to grab all of that programmatically. So I started thinking.
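A minimal sketch of what grabbing those statistics from TCMalloc looks like through gperftools' MallocExtension; this shows the underlying calls, not the actual Ceph perfglue wrapper:

```cpp
#include <cstddef>
#include <gperftools/malloc_extension.h>

// Query the TCMalloc heap statistics the talk refers to, and release
// free memory back to the OS on the same schedule.
void sample_heap_stats() {
  size_t heap_size = 0, unmapped = 0, allocated = 0;
  MallocExtension::instance()->GetNumericProperty(
      "generic.heap_size", &heap_size);
  MallocExtension::instance()->GetNumericProperty(
      "tcmalloc.pageheap_unmapped_bytes", &unmapped);
  MallocExtension::instance()->GetNumericProperty(
      "generic.current_allocated_bytes", &allocated);
  size_t mapped = heap_size - unmapped;  // memory actually mapped by the process
  MallocExtension::instance()->ReleaseFreeMemory();
  (void)mapped;  // the autotuner compares this against the memory target
}
```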
A
Well,
okay,
you
know,
I
have
control
over
the
size
of
all
these
caches
and
in
fact
you
know
we
already
kind
of
set
a
target
value
for
the
caches
in
general
and
blue
store.
So
maybe
we
can
just
start
adjusting
this
up
and
down
to
have
react
to
how
much
memory
the
OSD
is
using
it's
it's
a
little
more
complicated
than
that,
but
it
sort
of
generally
works.
After thinking about this a lot, I came up with a very simplistic algorithm for deciding when the total amount of memory available for caches should increase and when it should decrease. The idea is that we have an OSD memory target: how much memory we want the OSD to use. We also know how much memory has been mapped by the process, and that gives us the ability to define a ratio of mapped memory to target memory.
The idea is that we slowly grow this memory, and as we get closer to cache max we grow even slower, so we're incrementally getting closer and closer to cache max, but that's the limit; we don't want to exceed it. But what if, for unknown reasons, other things come in and the amount of mapped memory becomes greater than the target memory?
A
Well,
in
that
case,
we
instead
look
at
that
ratio,
which
is
basically
just
kind
of
what
gives
you
have
the
ratio
of
overage,
and
then
we
remove
memory
based
on
that,
but
instead
now
we're
looking
at
the
kind
of
the
the
the
previous
cache
size
compared
to
the
min.
So
we
we
bounce
away
basically
and
we
bounce
away
with
kind
of
a
greater
force
than
we
do
when
we're
approaching
it.
So
as
we
approach
we
get
kind
of
weaker
and
weaker
right,
we
don't
want
to.
We
don't
want
to
end
up.
aggressively hitting that cache max; we just want to move closer to it more and more slowly. If we exceed our target memory, then we want to shoot away from it quickly and, again, start growing slowly as we approach it. I'll actually go into the code in a little bit, but this is basically the algorithm that's implemented there, and it actually took a little while to get to this.
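A hedged sketch of the grow/shrink step just described; the names and exact scaling are illustrative, not the real BlueStore identifiers, but it captures the asymmetry (grow gently as the cache approaches its max, shrink hard once mapped memory exceeds the target):

```cpp
#include <algorithm>
#include <cstdint>

// Illustrative autotuning step: returns the new aggregate cache size.
uint64_t tune_cache_size(uint64_t cur_cache,  // current aggregate cache size
                         uint64_t cache_min, uint64_t cache_max,
                         uint64_t mapped,     // mapped heap memory from TCMalloc
                         uint64_t target)     // osd_memory_target-derived budget
{
  if (mapped <= target) {
    // Under target: grow, but more and more slowly as we near cache_max.
    double headroom = double(cache_max - cur_cache) / double(cache_max);
    uint64_t grow = uint64_t((target - mapped) * headroom);
    return std::min(cache_max, cur_cache + grow);
  }
  // Over target: bounce away with greater force, scaled by the overage
  // ratio and bounded below by cache_min.
  double overage = double(mapped) / double(target) - 1.0;  // fraction over
  uint64_t room = cur_cache > cache_min ? cur_cache - cache_min : 0;
  uint64_t shrink = std::min(room, uint64_t(room * overage));
  return cur_cache - shrink;
}
```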
We just talked about the OSD memory target, of course; there are also osd memory base and osd memory expected fragmentation. All these are really doing is lowering the maximum cache size to some ratio of the OSD memory target. We're basically saying: okay, we're estimating some base memory usage in the OSD once it's at a steady state, and then we're expecting some level of memory fragmentation.
But it's not really something that users need to worry about as we account for more things in this system. Right now it doesn't have any knowledge of the RocksDB write-ahead log or the memtables, the buffers that are there; it doesn't know anything about those, or about, say, the PG log right now. So this 768-megabyte number is just kind of made up to take that kind of stuff into account.
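To make that concrete with a back-of-the-envelope reading of these knobs (the subtraction here is my illustration, not a quote of the exact formula in the code): with osd_memory_target = 4 GiB, osd_memory_base = 768 MiB, and osd_memory_expected_fragmentation = 0.15, the aggregate cache ceiling comes out to roughly 4096 - 768 - 0.15 x 4096 ≈ 2714 MiB, which lines up with the roughly 2.7 GiB starting cache size mentioned later in the talk.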
But at some point we could actually factor those in properly, make it so the system knows how much memory is being used for all of these things, and then this expected fragmentation and memory base number could go away entirely. It would just have an idea of what other stuff is potentially using memory.
Okay, so osd memory cache min. That is the minimum total amount of memory that should be devoted to BlueStore caches. If, for example, we have a memory leak somewhere and the autotuner is attempting to fight it, it basically won't go below this amount of memory available for caches. At that point it'll just say: okay, I've set everything to the lowest values, the lowest aggregate amount of memory I'm going to assign to this stuff; I give up now.
I've done my best; we're done. Or, in a scenario where you've got the memory target set really low, say 256 megabytes or something, you can't do much with that. It'll say: okay, I've set the total aggregate cache space down to 128 megabytes; I can't do anything else; the memory usage is going to exceed our target, and that's just the way it is. The resize interval here is how many seconds we wait between basically doing this
analysis of the cache sizes and how much memory is being used. You don't want to set that too low. If you make it too small, it will really make TCMalloc thrash around; it's just not worth it. I've played around with it a fair amount, but yeah, it's just not really useful.
It's not totally perfect, not 100 percent guaranteed to be within our target range, but pretty close. All right, so what we have here is just changing log levels and things, nothing interesting. The CMakeLists change is also not particularly interesting. All right: here in BlueStore.cc we're adding in the perfglue heap profiler so that we can make our calls and find out what we're using.
The idea here is that we still want to allow users to set this stuff themselves if they want to. That's kind of the idea, right: it does things automatically by default, but if a user wants to override it, then they can.
Okay, this is more of the same, nothing interesting there. Here we go; now we're getting into the meat of it. In the cache-sizing code we're grabbing all these options. Basically, these things are being read in the BlueStore code, but then we've got this mempool thread, where we're actually just asking to grab them from BlueStore. So BlueStore is the thing that registers listeners for all this stuff and sets it all up, and then we just grab the values every time here.
Here we're grabbing the different heap statistics that we're interested in. Beforehand, every time we iterate through this, we're also releasing the free memory. There is a way you can do this through the admin socket, but I don't really think people use it. I would much rather do it here, on the same schedule as everything else, and we're not doing it that often, really, so this helps us keep everything well controlled.
We can't force the kernel to reclaim that memory; it does it on its own, as often as it feels like. On Ubuntu, in low-memory-pressure environments, we saw it not reclaiming memory to the point where the amount of mapped memory differed from RSS by like 20 percent, but on our CentOS machines it was very, very close. And I suspect that on the Ubuntu machine
the kernel would probably be more aggressive about reclaiming the unmapped memory if there were a lot of memory pressure, so I think it's probably okay. But the irritating thing is that when users look at the amount of RSS memory used by the process, they will see that it's maybe higher than what they set the target to be, even though the mapped memory is very, very similar. So it's kind of unfortunate, but it is what it is. I'll show some graphs later on.
Okay, so we're defining a new size, and we first set it to the old size, the auto-tuned cache size, and then we're basically just bounding it by the cache min and cache max. So if some of these settings have changed between iterations, this just sets everything so that we're within our boundaries nicely.
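In code, that bounding step is just a clamp; a minimal sketch with illustrative names:

```cpp
#include <algorithm>
#include <cstdint>

// Keep the autotuned size inside [cache_min, cache_max] in case the
// settings changed between iterations.
uint64_t bound_cache_size(uint64_t autotuned,
                          uint64_t cache_min, uint64_t cache_max) {
  return std::max(cache_min, std::min(autotuned, cache_max));
}
```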
There's some other stuff here for dumping stats out and everything. It's really surprisingly simple; there's not a whole lot going on. This is just setting these things; it's not interesting. So, okay, Josh, how much time do I have? Do I need to be done soon, or do we have more time?
Fantastic, okay, then I will keep talking. So that is the PR that introduced the memory autotuning, but we did more than that. Simultaneously, while this was going on, I was trying to think: okay, it's great that we've now got the overall target for the memory usage of the OSD, and people don't have to think about that much anymore, but we still have this issue where no one has any idea what any of the BlueStore caches should be set to.
It's super confusing, confusing even for me sometimes, and that has to do with how our caches used to work and how the RocksDB block cache works, because it's not intuitive unless you actually read the code. They have this concept of a high-priority and a low-priority cache, just in their LRU cache, not in the clock cache, and what is and isn't in the high-priority cache isn't super well documented, and there are a lot of options that govern it.
So anyway, I was thinking about all of this and trying to think: okay, maybe we can make this automatic, because there's no real good default for these things anyway. If you have an RGW workload, you might want a lot of onodes to stay... or sorry, omap entries to stay in cache, but if you're doing RBD on BlueStore, you might want a bunch of onodes to have really, really high priority in cache. But onodes can also be cached in the RocksDB block cache, which
is hierarchical, and theoretically they might be a little smaller there because we're doing varint encoding, but that encoding means there's more CPU overhead to do the encode and decode. So there are tons and tons of variables, and it's impossible for anyone to really figure this out. So what I wanted to start looking at is: can we make it automatic? We use the same idea, where we're periodically looking at different statistics and then adjusting the amounts.
The idea here is basically that you've got some kind of controlling thread that periodically goes through and asks each cache, for each priority: how much memory would you like at this particular priority? This controller asks those questions at each priority, and then, based on how much memory is available and a fair-share ratio for each cache (based on the initial ratios set in the ceph.conf file), it says: okay, you get memory, you get memory, you get memory, based on these things.
If you've asked for more than your fair-share ratio, it keeps you in mind, but says: here's how much you're going to get right now. And maybe you requested a lot while another cache has said, oh, I don't care, I don't have anything, I don't need any memory; okay, then it just gives that one the small amount it wants, or nothing.
Okay, so let's now go through and start dividing up the remaining memory and see if we've satisfied the requests, and we keep doing that at the given priority level until every cache is satisfied or we've run out of memory, one of the two. Let's say every cache is satisfied; great, okay, now we've still got some memory left, so let's move on and repeat this process at the next priority level. So the highest-priority
things first get memory assigned based on their fair share, but as we keep iterating through, they hopefully get as much memory as they requested at that priority level, for a very high priority level, and then we go down to a medium priority level or a low priority level and just repeat this process. So no low-priority items will get fulfilled unless there's memory available, because we give everything what it wants at higher priorities first. All right.
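A sketch of that fair-share loop at a single priority level; the structure is assumed and simplified, and the real PriorityCache interface differs:

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct CacheShare {
  double ratio;       // fair-share ratio from the configured cache ratios
  uint64_t request;   // bytes wanted at this priority level
  uint64_t assigned;  // bytes granted so far
};

// Divide mem_avail among the caches at one priority level: each round,
// every unsatisfied cache gets up to its fair share of the remaining
// memory; satisfied caches drop out, and we loop until everyone is
// satisfied or the memory is gone.
void balance_priority(std::vector<CacheShare*> unsatisfied, uint64_t& mem_avail) {
  while (!unsatisfied.empty() && mem_avail > 0) {
    uint64_t round_start = mem_avail;
    for (auto it = unsatisfied.begin(); it != unsatisfied.end();) {
      CacheShare* c = *it;
      uint64_t fair = uint64_t(round_start * c->ratio);
      uint64_t want = c->request - c->assigned;
      uint64_t grant = std::min({fair, want, mem_avail});
      c->assigned += grant;
      mem_avail -= grant;
      if (c->assigned >= c->request)
        it = unsatisfied.erase(it);  // satisfied: out of later rounds
      else
        ++it;
    }
    if (mem_avail == round_start)
      break;  // no progress this round; stop to avoid spinning
  }
}
```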
Okay, so our first pass at this was looking at the RocksDB block cache. They already have this concept of high-priority items; specifically, the bloom filters and the indexes for the SST files are really high priority. You really want those to stay in cache, because even if an item is not in cache, having the bloom filter in cache lets you avoid a lot of reads as you iterate through all the levels of RocksDB.
If you don't have bloom filters, potentially you could look for something in every single level; you'd end up doing reads in every single level to find where exactly in RocksDB it is. But if you have those bloom filters in cache, and if you have a reasonably high number of bits devoted to them, you can say with fairly high certainty whether something is there, and you can avoid, I guess, situations where something wasn't there but you had to go look anyway. So you can avoid a lot of the reads, basically.
So those things are really important: indexes and filters. They also include recent reads, and I didn't actually like that; that was one of the things I didn't want there, because I want more control over what goes into the high-priority cache pool they've defined, and I want to adjust that cache pool, to change its size. They don't really do that in RocksDB typically; the capability is there, but they haven't exposed it as part of their public interface for the cache.
So our first attempt at this was to modify the cache interface to expose all of this, and it worked; that's how it first worked. We submitted PRs to the RocksDB guys, and they sat there for like two months. They were super nice, they were encouraging about all of this, but they were kind of like: well,
we really don't want to modify the interface. And that's totally understandable, right? That's a nice static interface that they've had for a long time, and here we are coming in with this weird custom stuff that most likely we will be the only ones to ever use. So the end result of this first pass was that we still had our own modified version of RocksDB with our own modified public interface, and we used their cache, their LRU cache, inside our implementation.
This is actually in the RocksDB store, the key-value store. We implemented the priority cache interface there (kind of a weird place to implement it, but that's what we did initially), and here you can see we are fulfilling the priority cache requests. The BlueStore mempool thread is the thing acting as the overall controller for all of this, and it goes in and requests from the caches: how much memory do you want at a given priority?
And here, this is exactly what we're doing. We're saying: okay, let's look at what the high-priority pool usage is for priority zero, that's the high priority here, and then we're going to chunk it, so we'll basically push it out to some larger boundary. That gives us some headroom for growth, and then we also potentially have some already-assigned memory here.
So we're only going to request more if what we need is greater than what's already assigned. And then we've got this last priority here. Those are the only two priorities we're implementing at first, and the last one is for everything else: the rest of the usage, not just the high-priority pool usage. Everything else basically gets shoved into the last priority.
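The chunked request logic amounts to rounding usage up to a chunk boundary for headroom and only asking for the part that isn't already assigned; an illustrative sketch:

```cpp
#include <cstdint>

// Round usage up to the next chunk boundary to leave headroom for growth.
uint64_t chunk_align(uint64_t usage, uint64_t chunk) {
  return ((usage + chunk - 1) / chunk) * chunk;
}

// Only request more memory at this priority if the chunk-aligned usage
// exceeds what has already been assigned.
uint64_t request_for_pri(uint64_t usage, uint64_t assigned, uint64_t chunk) {
  uint64_t want = chunk_align(usage, chunk);
  return want > assigned ? want - assigned : 0;
}
```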
There are interfaces like this for the other caches as well, for the BlueStore onode cache and the BlueStore buffer cache, but they don't really do anything yet. All they're doing here is sitting at the last priority, just like the RocksDB block cache. The only thing we're really prioritizing with this kind of prototype implementation is making sure that the indexes and filters are always being cached.
The one thing it does do, though, is that whole process I talked about, where we iterate through the different priority levels, ask things what they want, assign memory, and then give things out based on the ratios. If something doesn't request memory, if, say, the buffer cache isn't requesting memory, then it's still the case that the onode cache might get more of it. We had some kind of weird behavior like that already in BlueStore, but it didn't work quite right.
It actually had some bugs in it, and the ratios didn't really look the way they were supposed to in previous versions of Ceph. So we almost had this situation before where we told the user, you need to figure out what these ratios are, but then we almost overrode those: sort of used them, but sort of made our own decisions.
It was kind of like you had worker bees, but the worker bees were insubordinate and didn't do what they were told. So the end result of all of this stuff: if you look on the right here, that's the auto-tuning-disabled behavior (and actually the correct auto-tuning-disabled behavior, not the weird hybrid behavior we had before), where we've specified specific cache ratios for the different caches and we're not using the memory autotuning that we implemented and talked about earlier on. There the RSS memory usage of the process is quite high.
A static three-gigabyte BlueStore cache size has been set, and then all of these different things are using up to their ratios here. Right? You figure out what 40% of three gigabytes is, or 20% of three gigabytes, and they're sticking to that. In our auto-tuned case, though, we've specified that we want a four-gigabyte OSD memory target, so they're sort of comparable, right? You think: well, okay, I set a three-gigabyte BlueStore cache size;
maybe that makes my OSD process around four gigabytes when you count the other overhead and things. But in this case we've just set the target, and it's figuring out what the cache size should be. So instead it starts out around 2.7 gigabytes, based on our osd memory base and fragmentation values, but then it turns out that's too much, we've gone above the target, and so now it's dropping the amount of cache memory down and adjusting it to below 2.7 gigabytes.
It ends up maybe more around 2.5 or 2.4 or something; it's variable. You'll see that the mapped memory, in light blue, is very, very good: very close to around 4 gigabytes. It sometimes spikes over a little bit, but it's staying really close. The RSS memory usage is high; this was our Ubuntu node, where the kernel was not aggressively reclaiming unmapped memory, and so the RSS memory usage of the process ended up being higher.
I suspect that if we had memory pressure, the kernel would be more aggressive about reclaiming that, so those would probably be closer. On CentOS we see them much closer; at least in our test lab, the CentOS kernel settings appear to be far more aggressive about reclaiming unmapped memory, even when there isn't a significant amount of memory pressure. You can also see here that very initially, the amount of memory allocated for buffers grows really quickly.
That's the little green line; it spikes up because when we're doing writes like this, initially you end up with a lot of buffers that you could potentially cache, and that's getting the majority of the memory at first, through the iteration process I described. But as time goes on, we see that the number of onodes that have been written increases dramatically, and so they start to get a higher proportion of the cache, and over time
there's some value in caching that. At the same time, the amount of data in the buffer cache and the amount of data in the onode cache end up decreasing over time, but recover when we have these spikes. These spikes, I think, are maybe when RocksDB is doing compaction and storing the SST files in the block cache, so all of a sudden there's this huge request for more memory, and this is the autotuner,
basically this whole cache priority system, responding to that, saying: oh hey, look, all of a sudden we've got more data coming in, probably both at the high priority level, which is this kind of continual growth, and at the lower priority level, which is the spike here. But because we've said that we want a target of 40% for the meta cache,
sorry, the KV cache, because we've set that one high, it gets a higher percentage of the memory in the first round and the others drop down. But then, potentially, as those SST files end up being evicted from RocksDB's cache, it doesn't need as much; it says, I'm not requesting nearly as much, I don't want it, I don't care, and then the buffer cache and the onode/meta cache end up getting more of it.
That was an RGW case. This is kind of similar here for our RBD case. Here you see that the amount of data in the key-value store, in RocksDB, is much lower. This is kind of a simplistic view of it, because in this case we had enough memory that we were able to fit all of the BlueStore onodes in cache.
You can kind of see here that it stabilizes right around one gigabyte, and then there's plenty of memory for additional metadata and for everything, basically. It turns out that the data cache ends up basically just getting... oh, I'm sorry, I got this backwards: the onode cache is the yellow line that reaches the steady state.
So we start out with this kind of global (not global, but this mem-avail value), and then we go through for each priority; that's what we're doing here, running this balance-cache-priority method. Eventually, after we've gone through all the priorities, if there's any memory left after all of that, then we just divide it up based on the ratios that have been set.
So if nothing wants cache, that just ensures that everything gets some share of the memory. But then, in the balance-cache-priority method, at a given priority level, we've passed in the amount of memory that's available, and now we're looping through and doing that process I described earlier, asking each thing what it wants.
And then, if something wants more than its fair share, we give it its fair share but keep it in mind, and go through and do this for everything, right? So each thing just gets up to its fair share to start out with, and if that satisfies it, then we remove it from the list. Otherwise we keep it around, go back through, and do this process again with whatever memory is available, until all of them have been removed or until the memory runs out. So that's kind of it.
That's how this works right now, and this is more or less what's in master. There's one other commit in master that basically goes through and... well, I had mentioned that the RocksDB guys didn't like us changing the interface for the cache. So what we did instead is we switched back to upstream master, took the RocksDB LRU cache code, and implemented it in Ceph itself. We basically took it, imported it, and now we can use the interface the same way.
One thing that changes there is whether or not recent read hits are promoted into the high-priority cache, and that's not actually useful for us right now. We don't want that, specifically because what's coming in the future, in another PR, is a system of not just two priorities, not just low and high, but an arbitrary number of priorities. By default
it's set to 10, but we can stratify all of this and make decisions about whether something at priority level 1 in the onode cache is more important than something at priority level 2 in another cache. We do this through a system of age bins that get added to all the different caches, which we don't have time to talk about here. That code is not actually in master; it's in another PR that unfortunately failed QA due to a segfault, but it mostly works.
Actually, when it doesn't segfault, I guess, it's really neat, because then you can have the OSD dynamically reacting to the kind of I/O that's happening. Maybe onodes are more important; maybe omap is more important; maybe there's some trade-off between these things. Maybe you still always want the KV cache to absolutely get the indexes and filters cached, so that's always high priority, but then other things can shift around.
It's really neat to see it work. There is one problem with all of this, though: when you get aggressive about doing this, it really makes TCMalloc kind of unhappy. It wants to shove like-sized things into bins so that it can make good allocations with them, and all of a sudden we're saying this set of things is getting more memory, and now we're changing it over here and doing other things. So this has to be kind of a slow process.
We can't do it quickly. I think it's one second for changing the amount of memory that's available in general, and five seconds for this process of renegotiating how much memory should be devoted to each individual cache. So let me just check and see if there's anything else here to go over. Okay, yeah: we just talked about this, where we pulled the RocksDB cache in ourselves.
RocksDB came with their own custom small-vector-like implementation, and we can just standardize on ours now and use it in multiple places if we want, or actually use small vector itself, or whatever. They also did some kind of shady things with memory allocations. If you've ever heard of the struct hack in C: it lets you basically do a malloc for a larger amount of space than the struct takes,
and then at the end of your struct you have an array that you walk off the end of, but you can do it because you've got malloc'd space for it. It's really evil, but there's actually a legal way you can do this in C, though not in C++, and they're doing it the illegal way in C++. Basically, it works with GCC; with other compilers it might work, but it's definitely not valid C++.
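For readers who haven't seen it, this is the pattern being described. In C99 the legal spelling is a flexible array member; the same thing in C++ is only a compiler extension, which is the "illegal way" being referred to. A sketch for illustration, not the RocksDB source:

```cpp
#include <cstddef>
#include <cstdlib>

// The "struct hack": one allocation holds the header plus n trailing
// bytes, addressed through a trailing array.
struct Entry {
  size_t len;
  char data[];  // flexible array member: valid C99, non-standard in C++
};

Entry* make_entry(size_t n) {
  // Over-allocate so data[0..n-1] is backed by real storage.
  Entry* e = static_cast<Entry*>(std::malloc(sizeof(Entry) + n));
  e->len = n;
  return e;
}
```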
So we changed that. It might be a little slower, but it's definitely nicer to work with now, and it was necessary to be able to do the things we're doing where we're modifying the caches in the future for this age binning. So yeah, I think that's basically it. Oh, also, there are some other things coming: automatic chunk sizing for doing the chunked allocations.
That basically means that the more memory you have, the bigger the chunk sizes it makes, so you're not just allocating tiny little chunks if you've got tons of memory available. And then there's also a new PR that's moving the cache balancing code out of BlueStore into a kind of generic priority cache manager class.
I would really like, even if it's not that useful, to make this OSD-wide, for FileStore and BlueStore, and maybe potentially do something like this with the priority cache manager in the mon and other things, so that users get away from this idea of setting, okay, I'm going to have a 512-megabyte cache for RocksDB in the mon. Well, no, let's not do that; they don't know how much they should assign.
Instead, maybe we just say: okay, the mon should use four gigs of memory, or two gigs, or whatever, and we just handle it behind the scenes. Then the user knows: okay, this is the envelope the mon is going to try to fit into; the cache gets adjusted based on whatever else is happening, and we just go from there.
I think making it all universal and very straightforward for the users is kind of the end goal of all this. And if this all works, if this works really well, then maybe we can start thinking about things like doing similar stuff for disks. That's more complicated for sure, and the act of migrating data around is definitely very impactful, but I'd love it if we just handed OSDs, or daemons in general, a chunk of hardware (memory, disks, whatever, maybe cores) to operate on and said: you figure it out.
I mean, all it really does is affect how much you evict, right? When you're going through and doing cache eviction, you've maybe got less stuff that you need to evict, because you've got a little bit of headroom, so you don't evict until you hit that boundary. With the automatic chunk sizing, all it really does is say: okay, if you've got lots of memory, give it more headroom; that's what it comes down to. Or if you've got, like, nothing,
no memory available, then make a really small chunk size so that we can fit as much into our space as possible. So I guess at small memory values you're not going to have quite as much waste as with a static chunk size, and at large values it gives you a lot more headroom, so eviction doesn't start happening as early. Maybe you stay ahead of eviction if you've got room for growth, and you only start really evicting once you've gotten to the point where you're so saturated with everything that you have to. I don't know; it's kind of neat, it's useful, but I don't know that it necessarily makes a huge difference in the long run. Maybe it does, I don't know. Okay, does that make sense?
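A hedged sketch of what that automatic chunk sizing might look like: scale the chunk with available memory and clamp it, so small systems waste little and big systems get headroom. The fraction and bounds here are invented for illustration:

```cpp
#include <algorithm>
#include <cstdint>

// Pick a request-chunk size proportional to available memory, clamped
// to sane bounds. Small memory: tiny chunks, minimal waste. Lots of
// memory: big chunks, more headroom before eviction kicks in.
uint64_t autotune_chunk_size(uint64_t mem_avail) {
  uint64_t chunk = mem_avail / 256;            // illustrative fraction
  chunk = std::max(chunk, uint64_t(1) << 20);  // no smaller than 1 MiB
  chunk = std::min(chunk, uint64_t(1) << 28);  // no larger than 256 MiB
  return chunk;
}
```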
But yeah, one thing I'd like to do, too, once I get the next piece working right and into master: I think it'd be good to do another one of these, because the next piece is the really cool piece of work, where we're actually modifying the caches to do this age binning. That's kind of what makes everything really work. The stuff in master right now more or less works, but it's really simplistic; there are basically just two priority levels, and one of them is only used for the indexes and filters.
This stuff is all, you know, digging into the guts of how BlueStore works. I would look at how CRUSH works and at the overall design behind the monitors and the OSDs, probably. Josh, do you think Sage's paper would still be useful to read? Yeah.