http://goo.gl/U4b70r
29 October 2014
Ceph Developer Summit: Hammer
Day 2
OSD (Tiering): Reduce Read/Write Latencies on Cache Tier Miss
Zhiqiang Wang
A
Yeah, okay. For this one, we want to reduce the read/write latencies on a cache tier miss. In the current code path, when we do a read and it misses in the cache tier, we promote the object, and after the promotion we replicate this object to all of the cache tier OSDs. The read request is queued and is served only after all of the replications. Here we can make an improvement.
A
That is for the read. For the write, when it misses in the cache tier and we promote it, we first replicate the promoted object to the other cache tier OSDs, and then we update the object with the data from the client and do another replication. Here we can avoid one replication.
A
The idea is that we do not need to do the replication right after the promotion. We just hold the data, then we update the object with the data from the client, and then we do a replication. Then we only need to do one replication, so we save the time of one replication. And for a full-object write, since we are going to overwrite the object anyway, I think there is no need to promote the object data.
A
A
For the read, when we handle the promotion in the process_copy_chunk and _write_copy_chunk functions, we can do something there. In the _write_copy_chunk function, during the promote, we can copy the data we need for the read request into the OSD op's out data.
A
Currently the read request is blocked again, because it cannot get the read/write lock. But in this new solution, since the data is already in the out data of the OSD op, we do not need to block this read request again. So we don't need to get the lock in the do_op function, and the read can go ahead.
B
C
So I think the trick here is that there's going to be a small subset of operations that this will work for. It's only when the read operation can be completely satisfied by the results from one chunk, basically, or partially satisfied from that one chunk. So we can't...
C
D
So the set of reads for which this will work is exactly the same as the set of ops for which async reads are OK. The way async reads work is that we build up a map of the offsets we need. So if we keep that map of offsets, then as we receive chunks we can of course fill them in, right.
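The offset map described above can be sketched roughly as follows. This is a minimal Python illustration, not Ceph code: the `PendingRead` class, its `on_chunk` method, and the half-open range representation are all assumptions made for the example.

```python
class PendingRead:
    """Hypothetical tracker for one read waiting on promotion chunks."""

    def __init__(self, offset, length):
        self.offset = offset
        self.length = length
        self.buf = bytearray(length)
        # Half-open (start, end) byte ranges of the object still needed.
        self.missing = [(offset, offset + length)]

    def on_chunk(self, chunk_off, data):
        """Fill in any needed bytes from a chunk; return True once complete."""
        chunk_end = chunk_off + len(data)
        still_missing = []
        for start, end in self.missing:
            lo, hi = max(start, chunk_off), min(end, chunk_end)
            if lo >= hi:
                # This chunk does not overlap the missing range.
                still_missing.append((start, end))
                continue
            self.buf[lo - self.offset:hi - self.offset] = \
                data[lo - chunk_off:hi - chunk_off]
            if start < lo:
                still_missing.append((start, lo))
            if hi < end:
                still_missing.append((hi, end))
        self.missing = still_missing
        return not self.missing


if __name__ == "__main__":
    obj = bytes(range(16))
    read = PendingRead(offset=4, length=8)
    print(read.on_chunk(0, obj[0:6]))   # first chunk covers only bytes 4..5
    print(read.on_chunk(6, obj[6:16]))  # second chunk completes the read
```

The read is answered as soon as `missing` is empty, which may be before the whole object has been promoted.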
D
That said, it still only works for the first read, the one that kicked it off. It does not work for any subsequent read. If you receive two reads in sequence, the first one will be satisfied this way. The second will be blocked, waiting for the thing to complete. I think you're actually better off proxying the read to the backend.
D
A
B
D
D
Yeah, I mean, maybe. Like I said, I don't know. Well, maybe, because you're going to have a lot of write locality after a read. Actually, if what you're doing is promoting a piece of a file that you're doing random I/O on, then you may be reading a big chunk of the file into the cache on the RBD client, and then you'll do in-place overwrites and those will flush back out. So actually, yes, possibly. It depends on the workload.
C
C
D
But even so, you have to have a read that only reads out of that and subsequent chunks, and where you wouldn't have been better off proxying. Right, so both of those have to be true. It can't require any bytes that you've already pulled through, because then you'll have to wait for the full object anyway, and it can't be so far ahead in the future that you wouldn't have been better off just sending the small read right back to the base pool. So yeah, it seems like a pretty narrow window.
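The two conditions above can be written as a small predicate. This is a hedged sketch: the function name, its parameters, and the choice of "the next chunk to arrive" as the cutoff for "too far ahead" are assumptions for illustration, not the actual OSD logic.

```python
def can_serve_from_promote(read_off, read_len, bytes_received, chunk_size):
    """Return True if a read falls in the narrow window described above.

    bytes_received: how much of the object the promote has pulled so far
                    (assumed to arrive sequentially from offset 0).
    chunk_size:     assumed size of the next promotion chunk.
    """
    read_end = read_off + read_len
    # Condition 1: the read must not need bytes already pulled through,
    # otherwise it has to wait for the full object anyway.
    if read_off < bytes_received:
        return False
    # Condition 2: the read must not be so far ahead that proxying it to
    # the base pool would have been faster. Here "not too far" is modelled
    # as "entirely inside the next chunk", which is an assumption.
    return read_end <= bytes_received + chunk_size


if __name__ == "__main__":
    MB = 1 << 20
    print(can_serve_from_promote(MB + 4096, 4096, MB, MB))
    print(can_serve_from_promote(0, 4096, MB, MB))
    print(can_serve_from_promote(3 * MB, 4096, MB, MB))
```

Any read that fails the predicate would be proxied or queued as usual.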
C
A
So you are worried about the async read. In this approach, we fill in the data for the read request from the promotion chunks. Since we may promote this object in several chunks, we may go into the op and process the read request several times. For the async read, I think we can just append the data to the out data. The read request is held in the...
A
C
D
D
The problem with reads is that what you put in the out data may depend on what you read. Object classes can perform arbitrary reads based on information they get back from a read, so you could read the first byte and use that to decide whether you're going to read the second megabyte or the third megabyte. No one does that, of course. So for EC pools, we have a thing where it detects when it's able to do an async read, and in that case we can do this as you go through the OSD ops.
D
A
D
C
A
C
If the read is small, and if you know that you're already part way through a promotion, and the promotion that is in flight already has the exact data that you need, and the read fully falls within that range, then it might make sense to wait for that next chunk to come and satisfy it from that chunk. But that's going to be... I think the promotion currently is in 512K or one-meg chunks, maybe. Do you remember, Sam, what the current chunk granularity is for promote?
C
D
Okay, so it's one request then. I mean, in the general case, for RBD, it's going to be one copy-get. So the only way you win is if the small read came in right before the copy-get chunk showed up. Otherwise, you're waiting for a four-megabyte read to be performed.
C
D
C
But the first step would be to proxy the reads in the normal case, and then the second step would be: if we get a read and we know it can be satisfied by the copy-get that's in progress, we do this additional operation. And we'd probably want to implement the async read infrastructure first, so that we can reuse the same...
A
OK. In the write code path, also in the function _write_copy_chunk, we can allocate an operation context to store all the data promoted from the base tier, and we do not initiate a repop to replicate it to the replicas. After that, when the write request is unblocked, in the function do_op, since we already allocated this operation context for the write request, we do not need to allocate another one. We use the previous one, and then in the do_osd_ops function...
C
B
C
A couple of things need to happen here. We need to make sure that the promote essentially says that the new object includes the effects of the write, so it has to have a new object_info_t, a new version_t, the mtime; all that stuff needs to be changed. And then we need to combine the written data into the promoted data.
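As a rough illustration of those two steps (overlaying the client write on the promoted data, and giving the result fresh metadata), here is a minimal Python sketch. The `ObjectInfo` fields and the "increment by one" versioning are simplifying assumptions for the example and do not mirror Ceph's real object_info_t.

```python
from dataclasses import dataclass


@dataclass
class ObjectInfo:
    # Hypothetical stand-in for per-object metadata.
    version: int
    user_version: int
    mtime: float
    size: int


def apply_write_to_promoted(data, info, write_off, write_data, now):
    """Overlay a client write on promoted data and return new data + info."""
    merged = bytearray(data)
    end = write_off + len(write_data)
    if end > len(merged):
        # Writing past the end extends the object with zeros.
        merged.extend(b"\0" * (end - len(merged)))
    merged[write_off:end] = write_data
    # The stored object reflects promote plus write, so it gets new
    # metadata rather than a copy of the base tier's (assumed +1 bump).
    new_info = ObjectInfo(
        version=info.version + 1,
        user_version=info.user_version + 1,
        mtime=now,
        size=len(merged),
    )
    return bytes(merged), new_info


if __name__ == "__main__":
    base = ObjectInfo(version=7, user_version=3, mtime=100.0, size=8)
    merged, info = apply_write_to_promoted(b"aaaaaaaa", base, 2, b"XY", 200.0)
    print(merged, info)
```

With this shape only the merged result would need to be replicated, which is the single replication discussed earlier in the session.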
D
The other piece you may be missing is that that repop is also what does the local write.
A
D
C
D
C
C
I think the hard part is that it needs to do all of the same machinery that happens in finish_ctx and prepare_transaction, somewhere in there, where it sets the version, and it sets the mtime, and it sets the user version, all that logic. That's kind of convoluted, because the object that we promote will be the modified version and not the original version.
C
B
D
Actually, are you worried about the extra op context, or are you worried about the fact that it blocks? Because I don't think it does block. finish_promote is not going to hold the read lock the whole time; it's going to drop the read lock as soon as it submits the repop. The write would then immediately execute, as long as it doesn't try to take a read lock. If we're going to take a read lock, we'd be screwed anyway, yeah.
A
B
A
D
No, no, I see the problem. We block until the promote completes, don't we?
A
D
Okay, so that's what we should fix. Conceptually, once we've fired off the repop that completes the promote, we don't need to continue to block. We can release the queues and rely on the write lock we have. No, that's an exclusive lock. Well, it's a shared lock...
A
C
Okay, so then I think it's really just a question of how big the client write is compared to the promote. If the promote is four megs and the client write is 4K, then it doesn't matter that much, right? We're losing the cost of an extra message, which isn't that big, and we're sending 4K that we're about to overwrite, which is like, who cares. So I wonder if it...
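To put a number on that comparison: with a 4 MiB promote and a 4 KiB client write, the bytes that get replicated and then overwritten are 4096 / 4194304 = 1/1024, roughly 0.1% of the promoted data. A tiny sketch of that arithmetic, with a hypothetical helper name:

```python
def redundant_fraction(promote_bytes, write_bytes):
    """Fraction of promoted bytes that the client write overwrites.

    Assumes the write lies entirely within the promoted object.
    """
    return min(write_bytes, promote_bytes) / promote_bytes


if __name__ == "__main__":
    MIB = 1 << 20
    print(redundant_fraction(4 * MIB, 4096))     # small write: 1/1024
    print(redundant_fraction(4 * MIB, 4 * MIB))  # full overwrite: 1.0
```

Only as the write approaches the full object size does skipping the promoted data save a meaningful amount of transfer.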
C
C
C
D
Yeah, also just for the record, with the four-megabyte write followed by the four-kilobyte write, you're not going to do a four-kilobyte write to the actual disk. The journal will see both, but the file system won't. It'll be written to page cache, and then later on it'll be flushed with both operations. So I'm not sure that saves you anything.
C
C
Mm-hmm. So there's still the complexity that when you finally write the object, when you're done with the promote, you have to modify that to make sure the object_info_t reflects the net effect of both the promote and the client op. But that's doable; you just have to be careful. Actually, the real concern is...
D
We don't perform client writes for clients that aren't there. So if the client causes a promote and then isn't around, we have to know to actually apply that writefull anyway, which is a small change. But my biggest concern is: what things actually do a writefull that didn't just previously do a delete?
C
Well, it isn't necessarily a writefull; it's anything that happens to overwrite all of the data, right? So how many things do writefull? RGW does, but it's always writing to fresh objects; it's never overwriting, so it doesn't help. An rbd import probably would. But how often do you import across an existing image? I think actually you never do that.
C
C
A bit, but the write-back cache is going to coalesce this into one four-meg write. So that's the case where the OSD will see one big four-meg thing. It won't be marked as writefull, at least currently. We could probably fix it so it would. In fact, if we did fix it, that would help, because otherwise the OSD has to do a stat to find out; it needs the object info first.
C
C
B
D
D
C
D
C
C
You'd have to look at the op and ask: is this a writefull and setxattrs, and is it completely independent of any previous state of the object? So there are no class ops or cmpxattr or anything. And if it passes that test, if it's a simple enough operation, then the promote would have a flag that says: skip the data, because I'm about to replace it anyway. And the promote completion...
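The test described here could look roughly like the following Python sketch. The op names are plain strings chosen for illustration, and the allow-list containing only `writefull` and `setxattr` is an assumption drawn from this remark, not the real OSD op table.

```python
# Hypothetical allow-list: ops whose result does not depend on the
# object's previous contents, per the discussion above.
STATE_INDEPENDENT_OPS = {"writefull", "setxattr"}


def can_skip_promote_data(ops):
    """Return True if the promote may skip copying object data.

    ops: list of op-name strings making up one client request.
    The request must replace all data (contain a writefull) and must not
    contain anything that reads or compares existing state, such as
    class calls or cmpxattr.
    """
    if "writefull" not in ops:
        return False
    return all(op in STATE_INDEPENDENT_OPS for op in ops)


if __name__ == "__main__":
    print(can_skip_promote_data(["writefull", "setxattr"]))
    print(can_skip_promote_data(["writefull", "call"]))
    print(can_skip_promote_data(["write"]))
```

A request that fails the test would fall back to the normal promote, which copies the data first.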
C
D
D
C
So one thing that might be worth mentioning, and this is a tangent: Adam was working for us over the summer, and he wrote a trace capture tool for RBD. It is a bunch of LTTng tracepoints that will generate a trace file from a real RBD workload. The idea would be that you would run this on an actual cloud with the actual VMs, and you would get workload traces for everything that they're actually doing.
C
And then you could look at that and see (a) how often we do complete writes in the first place, and then (b) you could try to model how old those objects were: are they likely to be in the base tier and not in the cache tier, and would this promote case actually help?
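A first pass over such a trace might look like the sketch below. The record layout `(timestamp, op, object, offset, length)`, the `cold_after` threshold, and the rule that an object never seen before counts as cold are all assumptions for illustration; the real trace tool's output format is not described in this session.

```python
def full_write_stats(trace, object_size, cold_after):
    """Count writes, full-object writes, and full writes to cold objects.

    trace:       iterable of (timestamp, op, object, offset, length)
    object_size: size of one object in bytes
    cold_after:  seconds of idleness after which an object is assumed to
                 have left the cache tier (a modelling assumption)
    """
    last_touch = {}
    writes = full = cold_full = 0
    for ts, op, obj, off, length in trace:
        if op == "write":
            writes += 1
            if off == 0 and length >= object_size:
                full += 1
                prev = last_touch.get(obj)
                # Never-seen objects are treated as cold (assumption).
                if prev is None or ts - prev >= cold_after:
                    cold_full += 1
        last_touch[obj] = ts
    return writes, full, cold_full


if __name__ == "__main__":
    MIB = 1 << 20
    trace = [
        (0, "read", "a", 0, 4096),
        (10, "write", "a", 0, 4 * MIB),
        (1000, "write", "b", 0, 4 * MIB),
        (1001, "write", "b", 0, 4096),
        (5000, "write", "a", 0, 4 * MIB),
    ]
    print(full_write_stats(trace, 4 * MIB, cold_after=600))
```

The ratio of cold full writes to all writes gives a rough upper bound on how often skipping the promoted data could help for that workload.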
B
C
D
A
C
A virtual cloud with the real workload, and then you'd need to capture your traces. So hopefully you have these OpenStack clouds just sitting around where you can do that, but...
D
I mean, for the bulk data copy, it would be fine for me if you ran, like, a Zipf distribution for a bit using fio against RBD, and then did a bulk copy. Like just a big zero operation, or no, a zero operation is terrible, because it'll delete the objects, but piping from /dev/urandom or whatever.
D
If you observe a lot of writefulls there, because the whole stack is willing to actually do that coalescing, then that would probably be reasonable. In other words, it would greatly speed up the bulk copy case, which is something people do and get annoyed about when it's slow, right? So that would be a good place to start anyway.
C
I mean, the second question is: how often does this writefull happen in a case where it's followed by writes that you actually want to have in the cache tier? It could just be that when we get a big write, a complete block, we just say: this is big, the object hasn't been touched in a while, I'm just going to pass it through to the base tier, because I don't want to pollute the cache tier. Yes.
A
C
C
C
B
A
C
Guys, thank you. Yes, good data and good suggestions. I think I'll probably do what I did with those sessions yesterday: I'll try to go through and distill a list of the things that I think we probably want to do. Feel free to disagree and tell me that I missed your suggestion. I'll try to send an email tomorrow that does that.