From YouTube: 2018-FEB-22 :: Ceph Performance Weekly
Description
Weekly collaboration call of all community members working on Ceph performance.
http://ceph.com/performance
For full notes and video recording archive visit:
http://pad.ceph.com/p/performance_weekly

A: Right, well, the first one just came in this morning, and it's pretty exciting, actually. Peter went through and figured out why the op tracker was slow and fixed most of it. The performance impact isn't totally gone, but it's much, much smaller; compare the blue blob to the green blob on the other side of the chart. That's pretty awesome. I had some style nits, but otherwise it all sounds fine to me.
A: Oh yeah, that one was close. Let's see... that inlined one, that's merged; some config normalization checks from Peter, those are merged; the observer for the config options that we talked about a couple weeks ago, that's merged. That's great! That should make it really easy to move away from the legacy stuff.
B: So this was motivated somewhat by a bug that came up recently. I don't know which one off the top of my head... maybe it was that one... no, it wasn't, just a trim happening more frequently. There was another one too, where there was something related to trim, and I started looking at the code for the BlueStore cache, and I guess the thing that really struck me is how leaky the encapsulation is.
B: It feels to me, at least, like there's just a lot... it's complicated, I guess. So, you know, I don't know how to fix the bug other than that. Between that and some issues with a partner that recently came up, it was apparent how confusing it can be: how we deal with BlueStore onode memory, memory for RocksDB's block cache, and then also having memory available just for caching data in BlueStore, and how we try to divide up the ratios between those and limit RocksDB's cache at some point. It's not only hard for users, I think, to understand all of that; even for us, we've got this scheme (which I was responsible for implementing) to try to make a sane default for how we divide all this up, and it's clear that it's not handling at least this particular partner's use case very well. So, you know, we could try to make that better. We could try to make it smarter. We could try to divvy things up more dynamically, so that we're balancing hit rates in our cache and RocksDB's cache, or maybe just try to make something that's a little bit better, a little bit of a better default that's still pretty simple, but even that feels crazy to me.
B: So the quest that I'm on now is to see whether, if we just rip out BlueStore's cache entirely, we can identify what the pain points it's solving are, and then fix those instead of having our own cache, and just rely on RocksDB's cache for metadata. I don't know if that's sane or not; it might not be. We might still need our own, but I want to at least get a better idea, when we don't have our own cache, of what we're fixing. Yeah, yeah.
A: So I think, there too, it's important to separate out metadata and data. The problem we're having here is the case where we're not caching any data, just metadata, and if you gave the OSDs lots of memory, we weren't giving RocksDB enough of it. We basically capped RocksDB at, I think, 512 megs or something really small, and gave everything else to BlueStore, and that was just a bad decision, I think, in retrospect.
A: We have to fix that regardless. But setting all that aside, the basic trade-off of eliminating the metadata cache in BlueStore is this: if you eliminate it, you're trading the overhead of maintaining that cache in memory against the cost of decoding the data that comes out of RocksDB every time. We did quite a bit in BlueStore to make that decoding much faster, but it's still slow; objectively speaking, I think it's still too slow, and I don't know that that's something we can fix in there short term. I think it needs a pretty fundamental rethink and redesign of how the data structures are laid out in memory, so that it matches the on-disk representation in RocksDB and there isn't a decoding stage: we can just map it into memory and then immediately use it.
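To make the "no decoding stage" idea concrete, here is a minimal sketch (not Ceph code; the struct and its fields are hypothetical) of the kind of fixed, trivially copyable layout that would let a value fetched from the KV store be used in place instead of being parsed into heap structures:

```cpp
// Hedged sketch: a fixed-layout header that needs no per-field decode.
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>

struct FlatOnodeHeader {   // hypothetical fixed-layout onode header
  uint64_t object_size;    // object size in bytes
  uint32_t extent_count;   // number of extent records that follow
  uint32_t flags;
};
static_assert(std::is_trivially_copyable<FlatOnodeHeader>::value,
              "must be usable without a decode step");

// "Decoding" becomes a bounds check plus a memcpy: no per-field
// parsing, no allocations.
bool read_header(const std::string& value, FlatOnodeHeader* out) {
  if (value.size() < sizeof(FlatOnodeHeader))
    return false;
  std::memcpy(out, value.data(), sizeof(FlatOnodeHeader));
  return true;
}
```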
B: ...whether it's cheaper to eat the encoding and decoding overhead than to use BlueStore's cache? Oh no, we don't really know. When we did this testing last, it looked like giving BlueStore's onode cache all of the memory we could more or less made sense, as long as RocksDB had some minimal amount, probably for indexes at the time; now it's indexes and filters. Yep.
A: So that's... if I remember correctly... yeah, so we have bluestore_cache_kv_ratio, which is 0.99 or whatever. So right now we're giving almost everything directly to RocksDB until we hit the cap, which is 512 megs, and then everything above that goes back to BlueStore. So it's really that cap that seems like a problem. It feels to me like the focus should be: what is the actual cost of maintaining that cache? Because that's the part that I don't really...
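For reference, the scheme being described (a kv ratio plus a kv cap) amounts to something like this sketch; it's illustrative, not the actual Ceph code, with defaults matching the numbers mentioned on the call:

```cpp
// Hedged sketch of the current split: bluestore_cache_kv_ratio = 0.99,
// bluestore_cache_kv_max = 512 MB, per the discussion above.
#include <algorithm>
#include <cstdint>

struct CacheSplit {
  uint64_t kv_bytes;    // RocksDB block cache
  uint64_t meta_bytes;  // BlueStore onode/data cache
};

CacheSplit split_cache(uint64_t cache_size,
                       double kv_ratio = 0.99,
                       uint64_t kv_max = 512ull << 20) {
  // Give kv_ratio of the budget to RocksDB, but never more than kv_max;
  // everything above the cap falls back to BlueStore's own cache.
  uint64_t kv = std::min(static_cast<uint64_t>(cache_size * kv_ratio),
                         kv_max);
  return {kv, cache_size - kv};
}
```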
A: Do we have a good understanding of how much that actually costs? Because there are these confounding issues. One is that we have to have some tracking of in-flight requests, and that's handled in the cache. So it might be that the workaround is just to steer all the memory towards RocksDB, and then BlueStore caches only what it has to. That requires basically no code change, or very, very little, and it might be enough, as long as it keeps all the cache structures small: all the lookup tables are small, it's only in-flight stuff, and it should hopefully be really fast. But again, we need to figure out for sure what the cost is there. The second thing is that the only way to cache data is to also have the metadata that it dangles off of, so if there's any kind of data caching enabled, then that doesn't help, right?
A: We'd have to have RocksDB tracking that in memory. I think that's important for CephFS workloads and for RGW workloads; not so much for RBD, because there's the client-side caching there, and the VM and everything else is caching at the filesystem layer. But for RGW, hot objects should get cached on the OSD, and for CephFS...
B: RGW is one of the cases where I'm worried that not being able to dynamically change the RocksDB cache hurts us, right? Because you've got so many more key-value pairs, potentially, with omap, that we don't know how much we need to be able to keep all the indexes and filters in memory. Yeah.
B: And it even gets more weird, right? Because you... well, I was going to say you load the bloom filters in with the SST files, but that's not true, because we're keeping them in the block cache. So you have a bloom filter for every SST, right? So, depending on the size of your SST file, you may be paging stuff in at different sizes: if you have a big SST file, you've got a big page-in for the bloom filter.
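The RocksDB knobs in play here are real API; with cache_index_and_filter_blocks set, the per-SST bloom filters live in the shared block cache. A hedged sketch (the cache size and bits-per-key are illustrative, not Ceph's production settings):

```cpp
// Hedged sketch of the RocksDB options under discussion.
#include <rocksdb/cache.h>
#include <rocksdb/filter_policy.h>
#include <rocksdb/options.h>
#include <rocksdb/table.h>

rocksdb::Options make_options(size_t block_cache_bytes) {
  rocksdb::BlockBasedTableOptions t;
  // Charge index and filter (bloom) blocks to the block cache, so they
  // compete with data blocks instead of sitting in unbounded heap.
  t.cache_index_and_filter_blocks = true;
  t.block_cache = rocksdb::NewLRUCache(block_cache_bytes);
  // One bloom filter per SST file; its size scales with the file's keys.
  t.filter_policy.reset(rocksdb::NewBloomFilterPolicy(10 /* bits/key */));
  rocksdb::Options o;
  o.table_factory.reset(rocksdb::NewBlockBasedTableFactory(t));
  return o;
}
```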
A: So my gut says that the simplest thing might be good enough, and that's basically just to fix that stupid cap, so that instead, above a certain threshold, you just steer, you know, 2/3, or actually even 1/3, to BlueStore, or vice versa, something like that.
A: My guess is that that's going to be good enough, but if it's not, then we should be able to ask RocksDB how many SST files there are and how big they are, to get an estimate of how big those bloom filters are, and then we can adjust the cache based on that. I'm just not sure it's going to be worth the complexity to do that. My guess is that we can do the simple thing and it'll be good enough.
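Asking RocksDB about its live SST files is real API; a hedged sketch of the estimate being proposed (the bits-per-key figure assumes the bloom policy sketched above, and per-file key counts depend on the RocksDB version):

```cpp
// Hedged sketch: estimate aggregate bloom-filter memory from live SST
// metadata. GetLiveFilesMetaData() is real RocksDB API; num_entries
// (per-file key count) is available in newer RocksDB releases.
#include <rocksdb/db.h>
#include <cstdint>
#include <vector>

uint64_t estimate_filter_bytes(rocksdb::DB* db, int bits_per_key = 10) {
  std::vector<rocksdb::LiveFileMetaData> files;
  db->GetLiveFilesMetaData(&files);
  uint64_t keys = 0;
  for (const auto& f : files)
    keys += f.num_entries;         // keys in this SST file
  return keys * bits_per_key / 8;  // one filter per SST
}
```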
B: I wonder... I wonder, because, I mean, first: if we keep the BlueStore onode cache, we've got that there, where we have to decide how much memory that thing has; we've got the RocksDB cache, where we have to decide that; and, if I recall, we've got some other buffers in various places... let's see, in the OSD code, the...
A: I think, to be clear, users should never have to think about any of this. I want users to have one knob that says: this is how much memory. Yes. So I think it really comes down to how sophisticated a model we think we need, and it's basically complexity versus diminishing returns. We can do something sort of trivial, a trivial set of heuristics like we currently have, and make them just good enough that they're good enough, or we can invest...
B: Could it be even something as simple as this: you have the different consumers, say RocksDB, with some little piece of logic in there that says "here's how badly I think I need memory" on some scale, you know, from zero to 1.0 or something, and then we give it memory based on how badly it thinks it needs memory versus how much the other things think they need memory?
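A minimal sketch of that idea, where each consumer self-reports a 0.0-1.0 "need" score and the budget is divided proportionally (illustrative only; the names are hypothetical, not Ceph code):

```cpp
// Hedged sketch: proportional memory assignment from self-reported need.
#include <cstdint>
#include <vector>

struct CacheUser {
  const char* name;
  double need;       // 0.0 .. 1.0, self-reported by the subsystem
  uint64_t granted;  // filled in by the allocator
};

void divide_memory(std::vector<CacheUser>& users, uint64_t total) {
  double sum = 0;
  for (const auto& u : users) sum += u.need;
  if (sum <= 0) return;
  // Each consumer gets a share proportional to its reported need.
  for (auto& u : users)
    u.granted = static_cast<uint64_t>(total * (u.need / sum));
}
```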
A: So, for example: you know it's all RBD, no omap, it's all block data, whatever, and we fill a device, and we figure out what the right allocation of memory between BlueStore and RocksDB is. Then we take the other extreme, where it's all RGW indexes filling an entire device, and we figure out where that balances, and then we just pick something that sits somewhere between those two extremes, without necessarily... Yeah, that makes sense.
A: Well, I think even if it does... I'm just guessing here, but my guess is that the surface of the thing you're optimizing is such that whether you're steering, like, 80 percent to RocksDB and 20 percent to BlueStore, or 80 percent to BlueStore and 20 percent to RocksDB, doesn't make a huge difference.
A: We can pick a value that's sort of middle of the road, that doesn't run off the rails for either extreme case: it still does well enough for the omap case, and well enough for the no-omap case. Then, you know, we might get another 10 percent if you're twiddling, trying to optimize it perfectly, but I'm not sure it's worth it, I guess.
A: Or something like that, especially since we just need to make sure BlueStore doesn't break, right? BlueStore shipped in Luminous, so we need something that we can backport: we just tweak the policy so that it's going to be okay for everything, and we don't want to introduce this whole other level of self-tuning and monitoring or something.
B: So I agree with you there that something very simple for Luminous, absolutely; we don't want to backport a big, big change. I do wonder, though, if we're still going to have a problem where you have very different requirements for how much cache RocksDB gets depending on what you're doing, like in the RBD case. That's...
A: I mean, we can, yeah, but it's going to be work. I'm just thinking about what the extremes are. The extreme in one direction is going to be, I guess, RGW objects: they're all 4 megs, they're written sequentially, and they're not fragmented at all. The other extreme would be 100% omap, tons of RocksDB keys, yeah.
B: And the bloom filters that we really, really need, I think, are probably for when we're setting or checking attrs; when we're doing writes, we're looking to see if stuff exists already, and we do, like, three of those for every single write or something. So yeah, I mean, that's really what we're talking about; the primary thing is we want, like, a super high... yeah.
A: ...fraction of them in cache, right, because it's going to be... So I think the simplest case would be: we size it so that we can keep all the bloom filters even for a complete device. If you fill an entire device with key-value data, we'll still be able to keep them all in memory. Yeah.
B: Even... I don't think it is, actually. I recall, with a different customer, a different partner customer, a workload where they had a big environment, you know, much bigger than our test environment here, and I recall seeing a lot of work being done creating and working on bloom filters. I suspect it was from PG log work, with all the key inserts that come in with PG log updates; I mean, PG log updates are a huge amount of our total key-insert workload. I suspect a lot of that work...
A: It feels to me like the two action items (excuse me), the action items, are: disable bloom filters for the PG meta items, and then deploy and populate an OSD, or a small cluster or whatever, that's just completely full of, probably, RGW index data or something, or some sort of simulated omap data; just completely fill it up, and then we can look carefully at what RocksDB...
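On the first action item: in RocksDB, bloom filters come from the filter policy on the table factory, so one way to drop them for a subset of keys is a dedicated column family with no filter policy. A hedged sketch (real RocksDB API, but whether Ceph's KeyValueDB layer exposes per-prefix table options like this is a separate question):

```cpp
// Hedged sketch: a column family without bloom filters for PG-meta
// keys, while other data keeps them.
#include <rocksdb/options.h>
#include <rocksdb/table.h>

rocksdb::ColumnFamilyOptions no_bloom_cf() {
  rocksdb::BlockBasedTableOptions t;
  t.filter_policy = nullptr;  // no bloom filter blocks for this CF
  rocksdb::ColumnFamilyOptions cf;
  cf.table_factory.reset(rocksdb::NewBlockBasedTableFactory(t));
  return cf;
}
```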
B: Oh, go ahead then; sorry, I don't want to interrupt, go ahead. I was just going to say, I know RocksDB will give you estimates on how many keys are present. There are some weirdnesses, I think, when you have duplicate keys in multiple levels, where it's not exactly what you have and what you don't. I don't know if there are offline tools where we could just shut everything down and then have it go through and count everything. I'm...
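The online estimate mentioned here is a real RocksDB property, and it is indeed approximate, since keys duplicated across levels skew it. A minimal sketch:

```cpp
// Hedged sketch: RocksDB's approximate live-key count (real API). For
// an offline pass, the sst_dump tool can print per-file properties.
#include <rocksdb/db.h>
#include <cstdint>

uint64_t approx_key_count(rocksdb::DB* db) {
  uint64_t n = 0;
  db->GetIntProperty(rocksdb::DB::Properties::kEstimateNumKeys, &n);
  return n;  // approximate: duplicate keys across levels skew it
}
```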
A: ...not sure that how many keys is actually what we care about, because users don't know how many keys they have. What we know is that we have a 10-terabyte SSD, or whatever, a 30-terabyte SSD from Samsung, or whatever it was; that's what we know. So if we fill that, then the question is: as an operator, I have this SSD, I know I'm going to be using it for RGW bucket indexes; how much memory do I need for my OSD for this size of device?
A: ...whatever that happens to be. Or we can just have a workload that generates the keys we care about, without even knowing how big they are, and just fill the device up. I'm kind of leaning towards filling the device up, because that will capture any other effects that we didn't think about. All right, okay, and...
A: We'd have to look at the logs to figure that out... I'm not sure... sure. So we can look at BlueFS to see how many SST files there are and how big they are, so we can actually see that. And I assume we can look at an SST and parse it out and figure out how big the bloom filter portion of it is; that's probably in the header of the file, where it tells you it starts with the bloom filter or something. Not sure.
B: The one little thing I had here, which I don't have up yet, I guess, is the wall-clock profile. When I was doing some of the earlier work, I did a wall-clock profile on FIO with the librbd plugin, because that was actually the bottleneck at some point: one client wasn't enough, so we had to scale to multiple clients, and looking at it, it looked like a lot of time... well, poll might have been a bottleneck. We were actually spending a lot of time just polling.
A: That's something that could be because the block cache in RocksDB wasn't big enough, but it could also be that RocksDB is still stupid about compaction: when it does a compaction and writes new SSTs, they don't go into the cache. They only go to the device, and it has to go read them back again as you start faulting against them.
A: Yeah, so two things. Yes, there are two reasons why that can happen. One is that the RocksDB cache isn't caching what it should because it's too small, which I think is what's going on here. The other cause, which we can't fix without fixing RocksDB, is that when RocksDB does its compaction, it takes all these SST files that are warm and in the cache and everything, and it writes new ones that are compacted, and it doesn't put them in the cache. It just writes them to the device, so immediately following a compaction you have a whole bunch of cache misses that have to fault everything hot back in again. That's a common problem with RocksDB and LevelDB and a bunch of these things; I've reviewed papers with weird, complicated schemes trying to mitigate that effect in LSM trees. When we talked to the RocksDB folks about it, they said they don't really want to just dump the newly compacted stuff into the cache, because it'll push out all the other stuff that was legitimately warm, and the newly compacted stuff may or may not be warm; they don't really know. Otherwise, every time you do a compaction you'd basically throw out everything in your cache.
A: The reason why I was pushing to get that turned off, and why we finally turned it off, was that I wanted to eliminate it entirely, partly to simplify the code, but mostly so that we wouldn't have a disparity between running on SPDK and running on a kernel block device. I think that disparity doesn't matter that much, and the code's already there; it works fine. So if it helps, we should just make use of it.
A: Yes, yes, but the Linux buffer cache is sort of free memory, because we're not using it anyway; all of our other caches are in the process, and so anything that sits in the buffer cache is memory we wouldn't be using anyway. It's free as far as we're concerned. I guess you might be crowding out, you know, something else in the page cache for a filesystem or something, but I don't think we care about that.
A: I think we should either just get rid of it, so we obey those ratios all the time, or we should change it to a kv_min, where, if we're below kv_min... Look at our settings: they basically say give everything to RocksDB. So if we change it to a kv_min, where, below that, all of the memory goes to KV, and above it, we obey the ratios... because right now...
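A sketch of that kv_min proposal (illustrative; the floor and ratio values are examples of the shape being proposed, not settings that shipped):

```cpp
// Hedged sketch of the proposed kv_min behavior.
#include <algorithm>
#include <cstdint>

struct CacheSplitKvMin {
  uint64_t kv_bytes;
  uint64_t meta_bytes;
};

CacheSplitKvMin split_cache_kv_min(uint64_t cache_size,
                                   uint64_t kv_min = 512ull << 20,
                                   double kv_ratio = 0.5) {
  // Below kv_min, RocksDB gets everything; above it, obey the ratio,
  // but never let RocksDB drop below the floor.
  uint64_t kv = (cache_size <= kv_min)
      ? cache_size
      : std::max(kv_min, static_cast<uint64_t>(cache_size * kv_ratio));
  return {kv, cache_size - kv};
}
```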
B: We don't necessarily want to give half of it to RocksDB and half to BlueStore, though, because that was the behavior we had previously, where we were having a lot of onode misses on fast devices. We had something more along those lines, and through a variety of steps we kept on giving more and more and more to BlueStore, to the onodes there, right, but...
A: Right, but again, this is going back to our original discussion, where we can have a complicated model that tries to give exactly the right amount to RocksDB and then everything else to BlueStore, or we can have something sort of middle-of-the-road and simple, and I'm kind of guessing, just guessing, that half-and-half sort of splits our risk there, right?
B: I don't know if I believe that statement, but I'm trying to see if I can find some of the old testing. Admittedly, that was without the bloom filters in the cache, so it's not strictly relevant, but what we have now was kind of the optimal at that point.
A: Okay, but setting that aside: the current behavior is that up until 512 megs we're giving 99%, so we're effectively giving all of it to RocksDB. I don't think that's any different from changing that kv_max to a kv_min, so that below it we dedicate all of it to RocksDB.
A: We'd basically flip those two, and it would be the same behavior (mm-hmm), but the new way, having kv_min instead of kv_max, would give us the flexibility to decide what happens above that 512 megs. Right now we're just stuck; we don't have the choice. Yeah, yeah. And I don't think...
B: That was what the testing bore out, too: you needed, at that point, at least that much memory to get reasonable performance, and then, yeah, it was exactly what Ben just said: it was a cliff, right? It wasn't exactly 512; it was probably more like 400 or something, but past that point giving more to RocksDB didn't help; it was better just to give it to BlueStore. I suspect that's going to continue to be the case: there is some amount of memory RocksDB needs, you give it that, and after that it's better to give it to BlueStore. That's what the testing seemed to show when we originally did this, and I think it's probably still the case; it's just that RocksDB needs more now that the bloom filters are there.
B: So one question, though: if we are doing buffered reads in BlueStore, and we can solve, or at least identify, the pain points (you know, whether it's encoding, or locking, or something else), is there any good reason to split the memory anymore and have the BlueStore cache at all? Because it still strikes me that it's a lot of complexity and a lot of code. Yeah, I don't know.
A: In order to cache any data at all, we rely on the metadata cache, because the data is associated with the metadata. And we need most of that complexity just to track in-flight writes. We could throw it all away and rewrite something that's probably simpler, but it wouldn't be that much simpler: we'd get rid of the LRU, but all the other data structures stay the same; we'd have the same lookup tables, basically.
A: ...a hot RGW object, right? I think we had something like this several years ago. It was something stupid, like a Minecraft tarball or whatever; just some gaming thing, some big file, and it got linked somewhere, and then a bazillion things were hitting it. Every single request went through RGW to the OSD, and the OSD wasn't caching it (for some reason; I can't remember what it was at the time), and so there was a disk I/O for every single GET; there was just no caching anywhere in that stack. RGW isn't caching. On the client side you could put a caching web layer in front of it, but I don't think that's sufficient, right? With FileStore you use the page cache, but with BlueStore it's direct I/O; you need to cache in there somewhere.
A: There are also all the data copies into the page cache that you avoid with direct I/O, and on reads there's no such thing as an asynchronous read that's buffered, so reads become blocking. All the AIO stuff that we did on the read path, we throw out the door if we use buffered reads on the block device, because... yeah, right, and all those... well.
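For context on the "no asynchronous buffered read" point: Linux-native AIO only behaves asynchronously on O_DIRECT descriptors; on a buffered descriptor, io_submit() effectively performs the read synchronously. A minimal sketch of the direct-I/O read path being described (illustrative; the device path is just an example):

```cpp
// Hedged sketch (not Ceph code). Build with: g++ ... -laio
#include <fcntl.h>
#include <libaio.h>
#include <unistd.h>
#include <cstdlib>

int main() {
  int fd = open("/dev/sdb", O_RDONLY | O_DIRECT);  // direct-I/O fd
  void* buf = nullptr;
  posix_memalign(&buf, 4096, 4096);  // O_DIRECT needs aligned buffers

  io_context_t ctx = 0;
  io_setup(128, &ctx);  // allow up to 128 in-flight I/Os

  struct iocb cb;
  struct iocb* cbs[1] = {&cb};
  io_prep_pread(&cb, fd, buf, 4096, 0);
  io_submit(ctx, 1, cbs);  // returns immediately for O_DIRECT reads

  struct io_event ev;
  io_getevents(ctx, 1, 1, &ev, nullptr);  // reap the completion

  io_destroy(ctx);
  free(buf);
  close(fd);
  return 0;
}
```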
A: The other thing is all of the threading; it just totally changes the structure of BlueStore, because right now we queue an AIO and then we continue, and then there's a different thread that picks up the I/O completions. That's completely different, because the AIO thread doesn't do anything anymore; we're not doing AIO. Yeah.
D: Just a minor note: I'm currently starting some of the investigations we talked about the other day, trying to look for a few points where we can get a short-term win to keep the customer that shall not be named happy, regarding some of the PG log and omap stuff. Cool.
D: Specifically, the deferred writes: there's a whole lot of transaction overhead, and if we could temporarily move some of that through a fast medium and then move it to the slow one when it settles, as well as making the PG log not compete as much with everything else, that should make them not hate us. Are they on FileStore?