From YouTube: Ceph Performance Meeting 2021-05-06
A
Good morning. So the core meeting looks like it's going to go over again; when I left they were maybe about two thirds of the way through the people they had to go through. So I'll wait just a little bit here before we get started, but I suspect people will be joining late.
A
Ah, Doctor, have you... oh excellent, excellent, yeah? I hadn't seen you in the meetings before. Are you a community member, Randy?
B
I work for Red Hat. Do you... okay, excellent. Absolutely, I'm in CEE; I support SBR-Ceph and SBR-OCS, as well as all the cross-collaboration for Ceph integrations. Nice, nice. You know Michael Kidd, or, yeah, Joe Quinn? Same team.
A
Yep, my peers. Excellent, yeah, yeah. Michael, I'm pretty sure, was actually there back in the Inktank days. That's right, so yeah.
A
Well, very good, very good. Maybe while we wait for people: I'm always interested in hearing how things are going on that side. What kind of stuff are you guys running into these days?
B
Well, I spend the majority of my day just trying to stay afloat. I'm one of the only folks on the team that can take on, you know, the L3-type OCS cases. Most of the time it's OCP-type issues, or just integration, or it's always something that's got a lot of time spent on it, and it's mostly, you know, OpenShift, and for that reason I normally get handed those cases. So that keeps me pretty busy. IBM Cloud Paks is an abuser of our SLAs.
A
Do you deal with very many performance issues on that side?
B
Well, yeah, actually, now that I think about it, we've got a lot of folks who are just throwing anything and everything at OCS, and it still has three devices.
B
You know, concepts of tiering, and understanding what will or will not get reconciled by the operator, which objects are, you know, stateful, and which have these specific static settings, and whether it's appropriate, or you would reach a scenario when it's appropriate, to turn them off. You know, scale them down to zero and say: don't do anything, I know what I'm doing, do what you've got to do from, like, a Ceph admin perspective, and then flip them back on and see what happens.
A
Semi-recently, it's been a couple months now, but we kind of heard about people that wanted to run etcd on top of our RBD devices, right? And that's a nasty use case, because it's, like, small, unaligned, synchronous, sequential writes, yeah.
B
It's the old "let's just have a virtualized control plane": put Satellite, Director, CloudForms and everything on RHV, three nodes backed by Gluster, which just so happens to also be running the mons and the whole virtualized control plane for the stack. Gluster hates that tiny I/O, so the minute you start a CDN sync, that would bring Gluster down. I mean, it's the same fight we're always fighting, and I think now, with local storage...
B
Having that native association with the host name where a pod can get scheduled is going to be very helpful for us to say: hey, it's not your root device anymore, let's patch local storage in via an NVMe that's on your master, because it should have had that to begin with anyway. And that was the story that we were kind of telling.
B
When me and Michael Hackett synced up, it was: we didn't do ourselves any favors by saying we support this IPI installation, because all that did was bring the problem back down to Ceph backed by spinners. You know, what did the client expect? So ephemeral, or local, on a compute node with SSDs would still be better.
B
If you trust the HA aspect that etcd brings to the table and you're on top of it, if you want to back it by RBD because you say, hey, it's going to work: it will work. But, you know, ramp up a workload, do a solid update, have it go crazy, you know, OLM kick in, and it's not going to be happy. You know, the image registry takes a fair amount of I/O; there's a bunch of stuff that happens, you know, with just OLM in general.
A
Interesting, interesting. One thing that I've been talking to Intel about lately, and I haven't seen them on the call recently, but they've got something called Open CAS. It used to be this proprietary caching layer that they made, but now they've open-sourced it, so it's kind of competitive with dm-cache, sort of. But they've been doing a lot of benchmarking lately, looking at comparing those, and they really want to have something that can sit on the client side.
B
I'll have to scrub for a BZ, but we had an instance where we had someone trying to do that on the LVM layer, okay, as the front end. The whole "we don't have enough solid state, but the reality is, how much solid state do we need? Let's see if this works." It was a topic that was briefly discussed, and I think it was shot back down, in terms of: we're not going to move in that direction or support that model.
B
We tried to weigh the value of the benefit against the time invested, with the initial results that came out of it. I don't remember what Bugzilla that was, do you remember?
B
It was the LVM cache that they were using, I remember; it was specific to LVM, and it was an LVM cache thing. But I can't remember what the Ceph side was actually going to consume, you know, how it was going to get done; whether it was outside of our purview, native to LVM, or something else. Let me see if I can find it.
B
Throwing random keywords out there... come on. Of course, our Bugzilla is super fast.
A
But then, from Intel's perspective, right, they just want to sell, like, you know, Optane drives. So the terrible part of this is, they're working on the software piece, but ultimately they want to sell lots of Optane drives for client-side caching for RBDs. So that's where they're coming from. But legitimately, their stuff actually looks pretty good; I mean, their caching layer, in the test that they just ran, it looks like they were...
A
They were faster than LVM cache. So I'd be really interested to go back to those developers that are working on this and ask: okay, you know, is there anything that we did wrong here that should have been done differently? Can we make this faster? Because maybe there's some way we can share ideas and work together with Intel to, you know, make this work.
A
All right, looks like we're starting to get people from core. Maybe we'll get this thing off the ground.
A
All right, hey guys. So I think it's going to be a pretty short meeting today; there's not a whole lot on the agenda unless anyone has something. But we've got a couple of new PRs this week, one from Adam to make _do_write_small never do buffered writes. I started writing up a review on this.
A
I think my concern would be that right now we don't actually have the buffer cache enabled by default, and I think we probably need that to get rid of this. And after being burned a little bit when we turned on direct I/O versus buffered I/O at the BlueFS layer, I want us to be very, very sure that we know all of the unexpected consequences of doing something like this. So we definitely want benchmarks and data.
D
Yeah, I agree about the benchmarks, but I think you yourself were saying that it's off by default, so the default behavior shouldn't change, at least.
D
And the not-doing-buffered-I/O part, in the AIO path, I thought it wasn't related to our cache. But maybe I'm misunderstanding that, so...
A
So my understanding is, and maybe I'm mis-thinking this, that Adam's PR wants to not have aio_write do buffered writes that pollute the page cache. Why pollute the page cache? My understanding of his idea is that we already have our own buffer cache in BlueStore where we cache stuff. So why pollute the page cache, when that then makes it so that, like, RocksDB stuff might get forced out of cache, or whatever? Am I thinking about that right, Josh?
D
I think he's also thinking about it as a correctness thing: doing buffered I/O and direct I/O at the same time isn't really recommended with the kernel in general, and AIO you're generally supposed to use only with direct I/O. So it's unclear why we were kind of doing this, why for libaio we had this path as a possibility in the first place.
D
Yeah, yeah, but it's been off by default for so long now... that is, from your comment it sounded to me like it was making no change in practice with default settings. It was only if you turned on this BlueFS buffered write option that it would change behavior.
A
Oh, I see: so if that's enabled in the I/O context, then we do that code path. Okay, right, right, okay, yeah. So in this case, it would only follow that path if you already had the buffer cache enabled, and in that case you would then want to do the direct I/O rather than the buffered I/O. Exactly, okay, yeah, I get it. That seems reasonable; something we should benchmark.
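To make the code path being discussed concrete, here is a minimal C++ sketch of that decision. All the names here (IOContext, write_direct, write_buffered, do_write) are hypothetical stand-ins; the real BlueStore and BlockDevice code is structured differently:

```cpp
// Hypothetical sketch, not the actual Ceph code: when the I/O context asks
// for buffered behavior because BlueStore's own buffer cache already holds
// the data, issue a direct write instead of also filling the page cache.
#include <cstdint>

struct IOContext {
  bool buffered = false;  // set when BlueStore's buffer cache is in use
};

struct BlockDevice {
  // Stubs standing in for the real device write paths.
  int write_direct(uint64_t, const void*, uint64_t)   { return 0; }  // O_DIRECT
  int write_buffered(uint64_t, const void*, uint64_t) { return 0; }  // page cache
};

int do_write(BlockDevice& bdev, const IOContext& ioc,
             uint64_t off, const void* buf, uint64_t len) {
  if (ioc.buffered) {
    // Old behavior (as understood in the discussion): go through the kernel
    // page cache as well.
    //   return bdev.write_buffered(off, buf, len);
    // Proposed behavior: BlueStore already caches this data itself, so a
    // page-cache copy would only evict other things (e.g. RocksDB blocks).
    return bdev.write_direct(off, buf, len);
  }
  return bdev.write_direct(off, buf, len);  // non-buffered path was already direct
}
```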
D
Yeah, yeah, it's always worth benchmarking. Actually, there is one other case where we have the flag as well, which is when we have the FADVISE_WILLNEED flag on the operation, although I'm not sure if we ever actually use that at a higher level.
D
Now, it appears we do, in at least cls_fifo for RGW, and cls_rbd. Okay, yeah, okay. So it's maybe worth changing that; the patch changes that behavior for those operations, then.
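For reference, a small sketch of how an fadvise-style op flag can drive the buffered-versus-direct choice. The flag concept matches Ceph's rados op flags, but the helper and the constant's value here are illustrative, not the real ObjectStore code:

```cpp
// Illustrative helper, not actual Ceph code. WILLNEED hints that the data
// will be read back soon, so caching it on the write path can pay off even
// when buffered writes are globally disabled.
#include <cstdint>

// Placeholder bit for illustration; see Ceph's include/rados.h for the
// authoritative CEPH_OSD_OP_FLAG_FADVISE_* definitions.
static constexpr uint32_t FLAG_FADVISE_WILLNEED = 1u << 4;

bool want_buffered_write(uint32_t op_flags, bool buffered_by_default) {
  if (op_flags & FLAG_FADVISE_WILLNEED)
    return true;                 // caller expects a near-term read-back
  return buffered_by_default;    // otherwise follow the configured default
}
```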
A
Yeah, I have a theory that, back when a lot of this stuff got written, the thought was that we'd have the page cache as part of a kind of hierarchical cache structure. Right? Where we had the BlueStore cache for our own stuff, supposed to be super fast; then we were supposed to have, like, the RocksDB block cache, that would be bigger and encoded, so that theoretically everything was smaller in it and we'd be able to use it as, like, a secondary level.
A
All right, so let's see, next... oh, initial support for SeaStore from Sam. I didn't give it the performance tag because it's probably not appropriate yet, but I wanted to add it in here. That's really exciting.
A
Yeah, yeah, okay. Next, closed PRs: my little description change for bluefs_buffered_io merged, it looks like, so that's good. Josh, I saw you commented on that: it's not in the docs.
D
Yeah, yeah: in the BlueStore config reference there's no documentation about the option, so you have to be looking at the source to know about it.
A
Forget it, I'm checking the PR right now. It is marked "advanced".
A
Sure, okay. So yeah, we could certainly put something in the docs; especially, you're right about swap, disabling swap. That's, I think, going to be really important. Okay.
A
That lock change for BlueFS was reverted... oh sorry, FS from BlueStore was reverted; that's good. And also the other fix that, I think, probably wasn't actually a fix was closed.
A
This mempool cache line optimization detection stuff that Luis has been working on: it looks like smithfarm had written something that I think was related to QA for testing this, and that got merged. And then, oh, I closed my old PR that had the BlueFS buffered I/O documentation change, and changed the onode map to a tree structure instead, since Ronen wants to try playing around with the hash specialization of the onode map when using an ordered map.
A
Let's see. Updated: RGW compression bypass. Casey, you reviewed that further; I didn't really pay too much attention to what he's talking about there, but it looks like it's still actively being looked at. Gabi, it looks like you've been making more fixes to your allocation work.
E
Hey, come again?

A
Oh, it looks like you've been making more fixes for...
A
All right, last updated one here: this Crimson OSD client request parallelism PR. It looks like that got rebased; I'm very curious to see how well that one works in performance testing. Otherwise, that's the last of those, no movement. I actually didn't make it through the whole list, unfortunately; I didn't have time to finish up this morning. But I think most of this other stuff is probably still just stale.
A
Unless... oh, occasionally we see updates, but a lot of times it stays stale. So anyway, did I miss any PRs anyone's working on?
A
All right, so the only discussion topic I have is that I've just got a couple of updates for the omap bench work that I had mentioned last week.
A
Gabi, some of this is for you. So I went back and I looked at... okay, so for people that haven't seen these before, I'll just link the spreadsheet into the chat window here. Gabi had noticed in these tests last week that we saw, like, a huge performance improvement between Luminous and Nautilus in a couple of these tests, specifically the set-keys and the remove tests.
A
It's a really dramatic performance improvement when going from Luminous to Nautilus. These numbers are in seconds, so basically it's between, like, five and even up to, like, fifteen times faster, maybe even higher. And he was worried about it because, you know, that's a big change. So, in the third tab here, I started going back and trying to look at different versions of master in that time period.
A
We're seeing what, to me, looks like not a single commit making things faster, but perhaps multiple commits between Luminous and Nautilus that were having an effect. It's hard, because the version of Luminous I tested had a bunch of backports already done to it, and when I went back in time to these point-in-time snapshots of master... one, it was super hard to compile these on CentOS 8; it was not happy. I had to do a lot of screwing around to get stuff to compile right.
A
I also had to backport the omap bench to work with intermediate versions of the code as it changed, since the API, the ObjectStore API, had changed some. But I was able to get it to work, and I saw that set-keys was faster, but not as fast as Nautilus, in these kind of intermediate versions, which had the specific PRs that Sage did that reduced flushing behavior. And I had noticed, as I was doing these with wall clock profiling...
A
That flushing really seems like it's the key to this story as to why it's so much faster. In Luminous, in, like, set-keys, we're spending a huge amount of time in the kv sync thread just doing, I think, sync_file_range, if I remember right; like sixty percent of the kv sync thread's time was spent just doing that. And in Nautilus we actually don't see sync_file_range at all.
A
If I remember right, I think we're doing fdatasync, and surprisingly that seems to be much faster; not really exactly what I would have expected. But we also, I believe, reduced the amount of flushing and syncing that we're doing specifically for omap. So I believe that this story will become clearer as more stuff comes to light, but that's kind of what we're seeing now.
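For anyone unfamiliar with the two syscalls being contrasted, here is a minimal sketch. Both are standard Linux APIs, but the specific flags and the framing as "Luminous-era" versus "Nautilus-era" are just an illustration of the profile difference described above, not the exact calls in either release:

```cpp
// Minimal illustration of the two flush strategies seen in the profiles.
#define _GNU_SOURCE  // sync_file_range is Linux-specific
#include <fcntl.h>
#include <unistd.h>

// Luminous-era profile: the kv sync thread spent most of its time here,
// pushing dirty pages for a byte range out to the device.
void flush_range(int fd, off_t off, off_t nbytes) {
  ::sync_file_range(fd, off, nbytes,
                    SYNC_FILE_RANGE_WAIT_BEFORE |
                    SYNC_FILE_RANGE_WRITE |
                    SYNC_FILE_RANGE_WAIT_AFTER);
}

// Nautilus-era profile: one call that flushes the file's data, skipping
// metadata that isn't needed to read the data back.
void flush_data(int fd) {
  ::fdatasync(fd);
}
```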
A
Interestingly, though, the remove performance didn't change; it was more or less what it was in Luminous. So something else seems to have improved that in Nautilus; it doesn't seem to be the same factor, at least not all the same effect. So lots of interesting questions still exist, but that's what I see right now. And separately...
A
I was looking at buffered versus direct I/O, and why buffered is so much better for, like, omap get and a couple of other cases (remove actually is one of them). And with wall clock profiling there, I saw that with direct I/O we are indeed going back and doing a bunch of reads on disk, in prefetch, that we don't do when using buffered I/O. The strange thing is that both should be reading from the RocksDB block cache.
A
We
already
knew
that
this
is
just
more
further
confirmation
that
in
the
wild
clock
profiles,
we
do
not
see
that
happening.
We
instead
see
us
instead
of
reading
from
cache.
We
see
us
reading
from
disk.
That's
really
really
clear.
So
still,
don't
know
why,
but
it's
just
kind
of
more
fuels
of
the
fire
that
that,
for
some
reason,
things
that
I
would
have
expected
us
to
read
from
cash
in
those
scenarios
were
not.
A
We switched to direct I/O, like, a year ago because of some internal testing, I believe downstream, that showed we were seeing really bad swap behavior with RGW, with OSDs that had RGW targeting them, where that was essentially just causing everything to, like, fall over. It looked like the actual OSD processes were getting swapped out, possibly, and everything was super slow and problematic. When we switched to direct I/O that didn't happen. But it turns out that we didn't realize, when we switched to direct I/O, that it was causing certain things, like collection listing and omap performance, to really degrade badly.
A
I
didn't
show
up
in
any
of
our
performance
testing
at
the
time,
so
people
started
really
noticing
that
when
they
were
doing
things
like
deleting
pg's,
I
believe
in
a
couple
other
scenarios.
A
Is
this
the
result
of
the
of
the
bloomberg
issue?
A
This
is
a
public
meeting,
so
we
don't.
We
don't
talk
about
it,
but
but
it
was
a
customer
that
that
ultimately
resulted
in
a
lot
of
this
work
done.
C
I have no way to know what you were talking about, Mark, until I rewatch the meeting, which I will not, so...
C
...about buffered I/O, or what? But I'm still not in the context.
A
I have a suspicion that Sage did it back then because, at the time, he was thinking a lot about things in terms of, like, multi-level caches. Like, he wanted the onode cache in BlueStore to be a high-level fast cache, with the RocksDB block cache being a slower but more dense cache underneath it, and then potentially the page cache under that, with that being kind of like the third-level cache that can be shared between different processes and things.
C
I don't know; I mean, we are using it that way. I mean, aio_write with a dependence on whether we cache it or not, only in _do_write_small; in other cases we don't do that at all. So that still leaves some room for thinking that maybe we wanted a behavior where, if we do a small write of only a sector and a half or something, we expect to append to it soon, so we will read it soon.
C
So I thought that maybe trying to go straight and clean up that wiggle room would be better.
C
Yes, I will... I did not test that fully yet, and I intend to test it using a basic small unit for testing, smaller than the allocation unit. So I will be spamming a lot of really small writes, and that's possibly a place where I can see a difference, because otherwise there will be no difference at all.
A
Josh-
and
I
also,
I
think,
both
agreed
with
each
other
that
it
it
would
be
really
nice
in
general
if
we
can
fix
whatever's
going
on
with
roxdb
to
just
get
rid
of
buffered
io,
all
over
just
make
everything
direct
io.
C
I think Josh is gone... yeah, he left. But yes, yeah, I agree; I would prefer to go for direct I/O in all cases if we're doing any buffering ourselves. So for BlueStore it might make some sense to use buffered I/O, really, but maybe not even there so much, if we finally verify properly that we're using the block cache as we should use it.
F
Yeah, it's really annoying, but it is what it is, so...
C
Yes, we do not differentiate turning buffered I/O on and off depending on the type of device you are based on; I did not see any such recommendations, and I didn't hear anyone talking about it that way. We either decide to make it buffered at the page cache level, and then we make it buffered, or we buffer it ourselves, and then we just use the direct interface and do the buffering ourselves.
F
It's not the most... especially when we're talking about multi-level buffering, right? I mean, if it was just one specific buffer that we control, yeah, sure, go for it. But once you have multiple players trying to buffer it, and we're only talking about SSDs, I mean, I would try to do without buffering, just because we have...
A
We're trying to head in that direction, Gabi. The problem we have right now is that, for some really strange reason, it doesn't appear that... if we have, like, an onode miss... I don't know if it's an onode miss exactly; it's more like a...
A
If we have a miss for, like, omap data, when you're, you know, looking at an onode or something, or something else, right, like an RGW object, we don't read it from the block cache like we should, especially during iteration. For some reason, and I don't understand why, but we end up...
A
That's the problem: for some reason, it doesn't look like it's reading from the block cache. Those blocks don't seem to be getting read the way that we would expect them to from the block cache. Instead, it's, like, going into a prefetch step and re-reading from the disk, and it's doing that even as we, like, re-iterate over and over again; it will go refresh that same block from the disk over and over again, leading to this huge read amplification.
A
...the drive? I mean, you could, but the problem we have, right, is that if you're, like, iterating over a collection and you're always re-iterating from the same point, looking for something in the collection, which is apparently what it seemed like we were doing, especially in older versions before some of Igor's fixes, it would just keep re-reading the same block over and over again from disk. It was, like, awful.
A
RocksDB typically uses the page cache, though: usually it's doing buffered reads, and it uses the page cache as, like, a secondary cache for the block cache. And by doing that, even if you're, like, re-reading the same block over and over again, if it ends up in the page cache it doesn't matter, really, sort of. It's not great, but that's how we get around this right now: we have to turn bluefs_buffered_io back on, so that those block reads come from page cache rather than from disk.
A
...where I think Adam has some stuff that he's written that will let us look at the hit rates in the...
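As a generic way to look at those hit rates (this is plain RocksDB, not whatever Adam wrote), the built-in statistics expose block cache hit and miss counters:

```cpp
// Generic RocksDB sketch for observing block cache hit rates.
#include <iostream>
#include <rocksdb/db.h>
#include <rocksdb/statistics.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.statistics = rocksdb::CreateDBStatistics();  // enable counters

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/cache-test", &db);
  if (!s.ok()) return 1;

  // ... run the workload under investigation here ...

  std::cout << "block cache hits: "
            << options.statistics->getTickerCount(rocksdb::BLOCK_CACHE_HIT)
            << ", misses: "
            << options.statistics->getTickerCount(rocksdb::BLOCK_CACHE_MISS)
            << std::endl;
  delete db;
  return 0;
}
```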
F
That probably means that they are using some options. There are some options to bypass the cache, right? I mean, in every caching system there's an option to say: don't trust the cache. There are options, or maybe there's some indication, causing RocksDB to suspect that the cache is not up to date, that there was some change which is not reflected in the cache. If we could find what this thing is that causes RocksDB to mistrust the cache, maybe we can just fix that issue instead of doing double caching.
C
Yes, I concur. That's the thing I would like to see: we just make the block cache work perfectly fine, we disable buffered I/O, and we fully control the memory we are using for the OSD.
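A minimal sketch of that end state at the RocksDB level, assuming direct reads plus an explicitly sized block cache. This is generic RocksDB configuration with an arbitrary example size, not Ceph's actual tuning:

```cpp
// Generic RocksDB sketch: direct I/O for reads, with all read caching done
// in a block cache of a known size, so the process controls its own memory.
#include <rocksdb/cache.h>
#include <rocksdb/db.h>
#include <rocksdb/table.h>

rocksdb::Options make_direct_read_options() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.use_direct_reads = true;  // bypass the kernel page cache on reads

  rocksdb::BlockBasedTableOptions table_opts;
  table_opts.block_cache = rocksdb::NewLRUCache(512 << 20);  // e.g. 512 MiB
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_opts));
  return options;
}
```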
A
We see the effect of it primarily, I think, when we're doing, like, iteration with omap. That seems to be a really common case in that spreadsheet that I linked; if you look at the second tab, that's, like, the 500k/20s one.
A
Although, the iteration was faster... I take that back: iteration, seek-to-first, and lower-bound were actually fast in that case. It was just get that was slow, and delete, or remove, that was slow.
C
No, it's irrelevant, because that option only tells it how to operate its port, and we have our own file system port. So those options do not even matter for us.
F
Okay, so we definitely need to try and understand this behavior, but it should be something documented; I would expect such a thing to be: when to bypass the block cache, when to mistrust the cache, or what kind of logic causes RocksDB to suspect that the cache is invalidated. Do we have an option to invalidate a cache?
C
We don't know. We don't know if RocksDB mistrusts the cache or is just using it incorrectly. From what I can see, when I observed it in detail, I've seen that RocksDB tended to request a lot of entries from the cache that were never put there; it was, like, a different pattern of key that they were trying to retrieve, and of course they weren't there. So maybe there's just an...
F
Isn't the simplest thing to always go first to the cache and, if it's not there, go to the disk? But you are saying that the data exists in cache; we know that it's in the buffer cache, but RocksDB still goes to the disk.
F
So it doesn't ask if the thing is in cache. Is there some reason for them to say "we don't trust the cache", or not to expect it? Maybe it's an optimization: when they don't expect it to be in cache, they don't go there. Or maybe it's something automatic: after that many cache misses, they stop going to the cache, because going to the cache introduces some extra hop that they want to avoid.
A
Gabi, I put a link in the chat window. This is, like, a walkthrough of the code that Adam and I were doing a couple weeks ago, I guess, or longer... holy smokes, back in March, I guess. If you're interested in looking at it, that might be a good place to kind of get the flow of how the code works.
C
Yes, Gabi, I would really love to, I mean; but it's not like I know the answers. I'm also just searching, almost half blind, through the RocksDB code and the effects I see in the logs. So I would gladly debug that with you; maybe we'll...
A
Yeah, I think the next step in my mind is: we need to see just what the block cache is actually doing. Like, are we actually fetching stuff from it, and how often, and how often are we actually doing the reads from disk? And then from there we might be able to actually start instrumenting the cache itself to tell us why it's making the decisions the way it is, yeah.
E
Oh no, we still got Randy; we might scare him off yet.
B
I'm part of the COVID generation now; I'm just trying to find friends that aren't my kids. I hear you.
A
I hear you. All right, well, I'm actually gonna wrap up early today, guys, because I want to go have lunch and eat. So, any last-minute things anyone wants to bring up?
B
Just that BZ I threw in the chat; just something to keep in the back of your mind if you ever hit a bucket with millions of objects. If it has a soft quota tied to it, we saw a recent issue, and it's present in master right now, where it will go and still do a list-objects or list-buckets against the bucket with millions of objects on a simple GET, even when it's not explicitly called, because of the quota. And that was generating a significant amount of I/O...
B
...on the bucket index pool, for a customer we had recently.
C
Okay, I would just like to signal that I will be touching the block device interface and straightening up the dependencies between direct I/O, buffered, and asynchronous. I just want to make it so that asynchronous is never buffered, and the other modes might or might not be buffered. In addition, possibly, I will add an API call for a synced write that is different from write.
C
Sorry, a synced write: one that makes sure the data always goes to disk, so there's no possibility to spam small writes to the device and then, at some other time, make it flush. Yeah, that stems from my problems with performance with BlueFS and buffered I/O. That was it, so yep.
yep.