Description
https://wiki.ceph.com/Planning/CDS/CDS_Giant_and_Hammer_(Jun_2014)
25 June 2014
Ceph Developer Summit G/H
Day 2
RBD: Copy-on-read for clones in kernel RBD client
A
All right, and now we are on to RBD copy-on-read for clones in the kernel RBD client. We're joined by Min, and presumably Ilya will be joining as well. So, Min, would you like to give an overview of your blueprint and we can get started?
B
Okay, good morning, everyone. I will introduce copy-on-read in the kernel RBD client. We discussed the new RBD copy-on-read at the Firefly CDS, and this time I will discuss how to implement it in the kernel RBD client. As we know, network latency will be the bottleneck when reading an object that does not exist in the clone, because it has to read the object from the parent.
B
If the object does not exist, we issue a read from the parent; if it does exist, we just read it directly. Secondly, after we get the object from the parent, we write it.
B
We just write it into the clone with, say, the existing copy-up method, and after it is written into the clone, we have finished the whole process. In addition...
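The copy-on-read flow Min describes can be sketched as follows. This is an illustrative model only, not the actual kernel RBD code; `ClonedImage`, `parent`, and `local` are hypothetical names.

```python
# Hedged sketch of copy-on-read for a cloned image: if an object is
# missing in the clone, read it from the parent, then promote (write)
# it into the clone so later reads are served locally.

class ClonedImage:
    def __init__(self, parent_objects):
        self.parent = parent_objects   # objects inherited from the parent image
        self.local = {}                # objects already written into the clone

    def read(self, obj_name):
        if obj_name in self.local:
            # Object exists in the clone: serve it directly.
            return self.local[obj_name]
        # Object missing: read the whole object from the parent...
        data = self.parent[obj_name]
        # ...then promote it into the clone (the copy-up write), so
        # future reads avoid the extra round trip to the parent.
        self.local[obj_name] = data
        return data

img = ClonedImage({"obj.0": b"parent data"})
assert img.read("obj.0") == b"parent data"   # first read promotes the object
assert "obj.0" in img.local                  # now served from the clone
```

The point of the promotion step is exactly the latency argument above: after the first read, the clone no longer pays the parent round trip for that object.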
C
There might be a little bit of refactoring you'll need to do to make the lower-level calls work, but I think almost all of the changes will probably just be restricted to the rbd image object request and rbd object request functions within rbd.c. And right now the kernel client does not support fancy striping (format version two), so you don't have to worry about the extra striping things that you did in user space, which is nice.
C
One question I had about this was how you thought it should be configured.
D
We'll add that as a mount option with our existing options mechanism.
E
Hey Josh, what's the status of the user-space side copy-on-read patch? I see that...
C
Yeah, I think there are still a couple of problems with it. I need to post more detailed comments on it. This is relating to the fancy striping and cleaning things up a little bit.
C
Just getting the striping stuff correct is a little bit more difficult.
E
Well, is that gonna... I mean, one of the next things up for Ilya is doing the fancy striping on the kernel side, and we're going to face ultimately the same complexity there.
D
Well, I don't want to sort of backstop the... fancy striping is a much bigger thing. So if we implement copy-on-read in time, say, for this release... well, not for this one, for the next kernel release... and it works well with just the chunking mode that we have, then we'll merge it, and fancy striping will have to deal with it. And otherwise, yeah, it's the other way around.
C
Yeah, I think the one other component is keeping track of all the outstanding writes that aren't being blocked by other requests, and making sure that they complete before the device can be closed or unmapped.
E
I mean, is the idea that before we satisfy the read, we actually do the whole promotion and then do the read from our newly written child? Or would we actually do the triggering read from the parent and then immediately follow that with the promotion?
C
For user space we suggested doing it asynchronously, so that the reads wouldn't be slowed down too much, yeah.
C
Yeah, they'd still be slowed down by reading the entire object, which is perhaps slightly slower than reading a small chunk of it, but they wouldn't have to wait for the entire write. That's why the user-space version has some code to wait for all the requests at the end, on image close. So for kernel RBD it could just be... we could use the image-in-use counter, incrementing it for each copy-up, using it as a refcount.
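The refcounting idea just mentioned, where each in-flight copy-up bumps a counter and close/unmap blocks until it drains, can be sketched like this. This is an illustrative Python model, with hypothetical names, not the kernel's actual image-in-use counter.

```python
# Sketch: in-flight asynchronous copy-ups hold a reference on the
# image; closing the image waits until the count drains to zero.
import threading

class ImageRef:
    def __init__(self):
        self.inflight = 0
        self.cond = threading.Condition()

    def start_copyup(self):
        # An async copy-up begins: take a reference.
        with self.cond:
            self.inflight += 1

    def finish_copyup(self):
        # The copy-up write completed: drop the reference.
        with self.cond:
            self.inflight -= 1
            self.cond.notify_all()

    def close(self):
        # Block until every outstanding copy-up has completed.
        with self.cond:
            while self.inflight:
                self.cond.wait()

ref = ImageRef()
ref.start_copyup()
t = threading.Thread(target=ref.finish_copyup)
t.start()
ref.close()          # returns only once the in-flight copy-up finishes
t.join()
assert ref.inflight == 0
```

This mirrors the trade-off discussed above: reads complete without waiting for the promotion write, but the device cannot be unmapped while promotions are still outstanding.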
C
Yeah, I'm sorry, I haven't reviewed it yet. I still have a few comments to make. I think there are still a couple of things to fix with the fancy striping, with the handling at the end of the read, and with...
C
...using the xlist to hold all the completions for the image close. But I'll comment on GitHub.
C
So if two clients access the same image, with one of them doing this copy-up operation, it is actually safe, since it aborts if the object ends up already existing when it's actually run. So if multiple clients have copy-ups in flight, only the first one will be done; the second one will just return success without actually writing anything.
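The race-safety property just described, that copy-up only creates the object if it does not already exist, so concurrent clients cannot corrupt each other, can be sketched as below. This is illustrative only: the real check happens on the OSD side as part of the operation, not in client code like this.

```python
# Sketch of an idempotent, create-if-absent copy-up: the second
# concurrent attempt is a no-op that still reports success.

def copyup(store, obj_name, data):
    """Write data only if obj_name is absent; succeed either way."""
    if obj_name in store:
        return "already-exists"   # later client: abort the write, report success
    store[obj_name] = data
    return "written"

store = {}
assert copyup(store, "obj.1", b"A") == "written"
assert copyup(store, "obj.1", b"B") == "already-exists"
assert store["obj.1"] == b"A"     # first writer wins; data is unchanged
```

Since both clients are promoting the same immutable parent data, "first writer wins" is safe: whichever copy-up lands, the object contents are identical.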
C
It's a manually set up thing; it's not automatically managed yet. I guess in general, if there were a higher-level service managing the locks, it would be able to determine whether the actual user of the image was still using it or not, and if not, break the lock and blacklist those clients so that a new client could use it.
C
Yeah, the kernel client might have slightly different locks required.
C
I don't actually remember off the top of my head; I'd have to look at the code. I think it might just have the pointers to buffers in it, which are then passed back to the bio... it references buffers in the bio. I forget.
C
He's asking about the rbd image object callback, and where the data gets back to the bio.
D
So you want to know how the data essentially goes in or goes out? There is a list of bios that is attached... well, not attached, but there's a list of bios, and it's not the VFS. It goes through the bio interface, because it's a block device, and that returns the data.
D
I mean, all the VFS stuff is done by the kernel automatically; we just have to provide the basic bio interface.
D
It's already provided. If you want to sort of find the central function, it's rbd_request_fn, or something like that.
B
I also want to know about the obj_request flags, like OBJ_REQ_...
C
The device flags? Oh right, yeah, those are used to maintain some information about the block device for the kernel. I guess the general mechanism there is just to determine whether, if it's a mapped snapshot, that snapshot still actually exists or not.
C
The bio request ones are basically used for passing data down, yeah. So this is the part where...
C
We're basically deciding where the data is going to be stored by the OSD client, below the rbd driver. So we pass down either pointers to the bios, if it's a request coming from the block device directly, or, if we're doing a request to the parent image, we'll use a list of pages, just writing into a buffer that we've allocated ourselves.
D
The page lists are basically for something that the block driver itself initiated, and the bio is used when the IO was initiated by the client, so the block device was opened and then read from or written to. Whereas the page lists are used when we initiate IO ourselves, such as for the copy-up.
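The distinction just described, bios for requests arriving from the block layer versus driver-allocated page lists for IO the driver initiates itself (such as a copy-up read from the parent), can be sketched as follows. The names here are illustrative, not the actual rbd.c structures.

```python
# Sketch: pick the data container for an object request based on
# who initiated the IO. Block-layer requests carry the caller's
# bios; driver-initiated requests (e.g. parent reads for copy-up)
# use page lists the driver allocates itself.

def make_obj_request(origin):
    if origin == "block-layer":
        # IO came from the opened block device: reuse the caller's bios.
        return {"type": "BIO", "data": "caller's bio list"}
    elif origin == "driver-initiated":
        # IO the driver started itself: allocate our own pages.
        return {"type": "PAGES", "data": "driver-allocated pages"}
    raise ValueError(f"unknown origin: {origin}")

assert make_obj_request("block-layer")["type"] == "BIO"
assert make_obj_request("driver-initiated")["type"] == "PAGES"
```

This matches the next remark in the discussion: a copy-on-read would read the parent object into driver-allocated pages, then copy the requested portion into the originating bio.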
C
Yeah, so for the copy-up, I mean, we'd probably want to do the pages and then copy a portion of that into the bio for the actual read.
E
I think you dropped off for a couple of minutes, but we were talking about the sequencing of the copy-on-read support in the kernel versus the support for striping in the kernel, and whether, if we can move forward on the copy-on-write... or, sorry, copy-on-read... quickly, it would make sense to get that in sooner.
E
And we can move a lot of these technical questions to the mailing list too, or IRC; we're always there to answer questions.
A
All right, sounds like it. We will, I think, continue on.