From YouTube: CDS Infernalis (Day 1) -- RBD: Async Mirroring
Description
Videos from Ceph Developer Summit: Infernalis (Day 1)
03 March 2015
https://wiki.ceph.com/Planning/CDS/Infernalis_(Mar_2015)
A
B
So the general idea is that, right now, you can replicate RBD images synchronously within a single Ceph cluster. But the kind of options you have for disaster recovery, for making backups or copies in a different Ceph cluster, possibly in a different data center or geographic location, are kind of okay for slow or daily backups, or perhaps on an hourly time scale, but it would be really nice to be able to have something ...
B
... that's point-in-time consistent, at whatever point in time the separate site happens to be at, and that can be kept closer to the current state of things. So that's kind of the high-level goal people have: when one data center goes down, your RBD images are still in a consistent ...
B
... a consistent state in a separate data center. The means we're talking about to accomplish that is an asynchronous replication mechanism, built on top of a full log of all the changes that are happening to an RBD image: all the data, all the writes that have happened to it, and all the metadata changes, like creating snapshots, resizing it, flattening it, that kind of thing.
B
At a high level, I think it would be nice to have the user interface be able to enable this kind of replication on an entire pool, so you could be sure that your Ceph pool of OpenStack volumes is being mirrored to a different data center, as well as looking at making it more flexible for the future, so being able to enable or disable it on a per-image basis too.
B
It could be like an append-only stream of writes, for example, and this could be operating in write-back mode, where you consider the write complete from the cache's perspective, or from the guest's perspective if the cache is disabled, once the write is in the journal; or in write-through mode, where you consider the write complete when it's been written to both the journal and the actual RBD image that's currently in use.
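The two completion rules described above can be written down as a small sketch. The names `Mode` and `write_complete` are hypothetical and only illustrate the rule as stated in the talk; this is not librbd code.

```python
from enum import Enum


class Mode(Enum):
    WRITE_BACK = "write-back"
    WRITE_THROUGH = "write-through"


def write_complete(mode: Mode, in_journal: bool, in_image: bool) -> bool:
    """Return True when a write may be acknowledged to the caller.

    Write-back: the write only has to be in the journal.
    Write-through: the write has to be in the journal and in the image.
    """
    if mode is Mode.WRITE_BACK:
        return in_journal
    return in_journal and in_image


# A write that has reached the journal but not yet the image:
ack_write_back = write_complete(Mode.WRITE_BACK, in_journal=True, in_image=False)
ack_write_through = write_complete(Mode.WRITE_THROUGH, in_journal=True, in_image=False)
print(ack_write_back, ack_write_through)
```

In this sketch the same write is acknowledged in write-back mode but still pending in write-through mode.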
B
It might be a good idea to allow it to be striped, with kind of fancy striping parameters, so that you could get more parallelism for many smaller I/Os to the journal. It might be a good idea to also allow the objects to be highly variable in size, with kind of a minimum size to fill up an object before moving on to the next one, just so that there aren't any entry alignment or boundary constraints.
B
In general, for the journal, you'd want to end up doing write-back at some level. Even if you're in write-through mode, you need to keep track of which writes have actually been flushed to the image, so you can know whether or not you need to replay them. So there needs to be some metadata associated with the journal as well, and I was thinking of a few items.
B
So one would be just the flush position, which is how far back we are from where the RBD image (or the file system) is now, that is, how much still needs to be flushed out from the journal. If we crash and the client restarts somewhere else, it can start from that position and replay the journal from there. And for the actual mirroring, it would also need to have some positions on a per-site basis or per-zone basis.
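A minimal sketch of the journal metadata described above, assuming positions are plain integers. The class name, field names, and site names are hypothetical, and the `trim_position` helper is an added assumption about how such positions could be used; the talk only mentions the flush position and per-site positions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class JournalMetadata:
    """Hypothetical per-image journal metadata (illustrative names only)."""

    # Position up to which journal entries have been flushed to the local image.
    flush_position: int = 0
    # Position each remote site has replayed up to, keyed by site name.
    site_positions: Dict[str, int] = field(default_factory=dict)

    def replay_start(self) -> int:
        """Where a restarted client would resume replaying the journal."""
        return self.flush_position

    def trim_position(self) -> int:
        """Assumption: entries before the slowest consumer are not needed."""
        return min([self.flush_position, *self.site_positions.values()])


meta = JournalMetadata(flush_position=120,
                       site_positions={"site-b": 80, "site-c": 95})
print(meta.replay_start())   # position to replay from after a client crash
print(meta.trim_position())  # oldest position still needed by some consumer
```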
B
C
So basically it will be similar to how the Ceph journal was designed, the one we are using on the OSD side: everything will be written there and applied later. So one concern I have is that, say somebody is writing to an image with mirroring enabled, the data will actually be written to the journal as extra data; and as well, if somebody is using FileStore, there will also be writing twice. So basically it will be replicating the data and the writes, so more writes everywhere.
C
B
Yeah, especially with the current FileStore. One thing that helps with that is the RBD cache in general: since we'll hopefully be able to use this with the cache in write-back mode, that will queue up more writes and write them out to the journal in batches. And if we're using the journal in write-back mode, we can also delay writing out ...
B
D
It's the same trick that XFS does, where it has a really big journal, and everything in the journal is durable, so it lets it sit there for a long time. So when it finally does go update the actual object, all the writes to that object over a longer time get coalesced into one big I/O. So it will amplify, but if you have lots of 4K I/Os, when you eventually go and actually do them on the actual object ...
D
... you'll do them all at once, and when you do them in the journal, they're sequential, so it's hopefully cheaper. So I think the bigger the journal gets, or the bigger the write-back cache effectively is on the RBD client, the better that will work. I mean, if you're doing random 4K I/Os across the entire image and you never touch the same object twice, then obviously that doesn't help, but I think in most workloads that's not true; you usually have quite a bit of locality on the block device.
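The coalescing idea above can be sketched as follows: many small journaled writes to the same object are merged into fewer, larger extents before the object is updated. The 4 MiB object size and the `coalesce` function are assumptions for illustration, and writes are assumed not to cross object boundaries.

```python
from collections import defaultdict

OBJECT_SIZE = 4 * 1024 * 1024  # assumed 4 MiB objects, for illustration only


def coalesce(writes):
    """Group journaled (offset, length) writes by object and merge extents.

    Returns {object_number: [(start, end), ...]} with overlapping or
    adjacent extents merged into one.
    """
    per_object = defaultdict(list)
    for offset, length in writes:
        per_object[offset // OBJECT_SIZE].append((offset, offset + length))

    merged = {}
    for obj, extents in per_object.items():
        extents.sort()
        out = [extents[0]]
        for start, end in extents[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:  # overlaps or touches the previous extent
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[obj] = out
    return merged


# Three adjacent 4K writes in object 0, plus one repeated write in object 1.
writes = [(0, 4096), (4096, 4096), (8192, 4096),
          (OBJECT_SIZE, 4096), (OBJECT_SIZE, 4096)]
result = coalesce(writes)
print(result)
```

Five journal entries collapse into two object updates here; with no locality (every write in a different object) nothing would be merged, which matches the caveat in the talk.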
D
The other thing is that we want to replace FileStore, which we'll talk about tomorrow, I think, and in the NewStore implementation, appends have no double write. They'll just append to an existing file, and if we have to roll back, then we truncate, or even just ignore the end, but there won't be a double write in the append case, so that will help as well.
B
Someone in the BlueJeans chat is asking: AWS allows you to take a snapshot of a volume to object storage for accessing in another zone. Since we already have RGW replication across zones, could RBD similarly take a snapshot to an RGW object? We kind of already have that in the form of RBD differential snapshots, which you can use to do incremental backups when you take snapshots manually and replicate them, but that doesn't really scale very well for doing that across ...
B
So the mirroring will really make it much, much easier to use, in terms of being able to configure it on a per-pool basis, for example, and providing more orchestration around actually doing the replication, and also being able to have a backup that's closer in time to what the current state of things is if the primary site fails.
E
B
Yeah, so that's definitely possible; we'd maybe want to allow that. I mean, my thought is that it would basically be a feature bit to enable the journaling, so you could enable or disable that on a per-image basis. There could also just be, basically, a helper command for enabling it on a whole pool, which would essentially be a for loop over all the images, in addition to setting some state in the pool so that new images get that feature enabled by default.
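The feature-bit idea and the pool-wide helper described above could look roughly like this. Everything here is a hypothetical in-memory model (the `Image` and `Pool` classes, the `FEATURE_JOURNALING` value, and the function names); it is not the librbd API.

```python
from dataclasses import dataclass, field
from typing import Dict

FEATURE_JOURNALING = 1 << 0  # hypothetical bit value, for illustration only


@dataclass
class Image:
    name: str
    features: int = 0


@dataclass
class Pool:
    images: Dict[str, Image] = field(default_factory=dict)
    default_features: int = 0  # state applied to newly created images

    def create_image(self, name: str) -> Image:
        image = Image(name, features=self.default_features)
        self.images[name] = image
        return image


def enable_journaling(image: Image) -> None:
    image.features |= FEATURE_JOURNALING


def disable_journaling(image: Image) -> None:
    image.features &= ~FEATURE_JOURNALING


def enable_journaling_on_pool(pool: Pool) -> None:
    """Helper: a for loop over all images, plus a default for new images."""
    for image in pool.images.values():
        enable_journaling(image)
    pool.default_features |= FEATURE_JOURNALING


pool = Pool()
vol1 = pool.create_image("vol1")
vol2 = pool.create_image("vol2")
enable_journaling_on_pool(pool)
vol3 = pool.create_image("vol3")   # created after the pool default was set
disable_journaling(vol3)           # per-image override right after creation
```

After this runs, `vol1` and `vol2` have the bit set, and `vol3` had it by default until it was explicitly cleared.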
E
B
Yeah, so you could just disable it too, I guess. So the way you'd do that, basically, with this design, is: if it's enabled on a per-pool basis, generically, the image would have that feature bit by default, but as soon as you create it, you could immediately disable it. Or, conversely, you ...
A
C
This may be a dumb question, but since it is asynchronous, why do we need to maintain the entire data in the journal? Can't we have some metadata pointing to the actual objects that contain the data for those images, and then the agent would actually read the object and write it on the other end?
C
So what I'm saying is, since it will be asynchronous, the image will be there on, say, the primary server. So can't there be some kind of metadata, or some pointers, like a marker, so that the other end actually connects there, runs through this metadata, gets the actual object from the metadata, and copies the data over?
B
Did you want to talk about the journal?
F
C
F
D
You could maybe do a metadata entry in the journal that says "I'm about to write this extent" and then go and write the extent. But then you have to deal with the race case where you wrote to the journal but not to the device, so you'd have to have a sequence number in the journal and tag the RBD object with that same sequence number, so you know that you've caught up that far. And that gets a little weird, yeah.
C
D
Or just the case where the writer failed: they said "I'm about to write all this stuff" and they failed before they actually wrote it, and then at what point do you throw that out? I guess when the next locker takes over. So maybe you could make some recovery path, but I think there's a bigger issue with that, which is that, assuming you did that, if you have somebody who's replicating the journal, actually doing the mirroring ...
D
Yeah, or you can tell that. So, say you're watching the journal and it says "I'm about to write this extent", and you go to that object to try to read that extent. You want to make sure it was actually written before you read it and go replicate it. You would, like, poll the object until you see that that journal sequence number has been applied to it, but ...
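A minimal sketch of the metadata-only journal alternative discussed above, assuming each object carries the highest journal sequence number applied to it. The class and function names are hypothetical. The replicator refuses to read an extent until the object's tag has caught up with the journal entry.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class JournalEntry:
    seq: int      # journal sequence number
    offset: int   # start of the extent within the object
    length: int   # length of the extent


@dataclass
class TaggedObject:
    data: bytearray = field(default_factory=lambda: bytearray(16))
    applied_seq: int = 0  # highest journal sequence number applied so far

    def write(self, entry: JournalEntry, payload: bytes) -> None:
        self.data[entry.offset:entry.offset + entry.length] = payload
        self.applied_seq = entry.seq  # tag the object with the sequence number


def read_if_applied(obj: TaggedObject, entry: JournalEntry) -> Optional[bytes]:
    """Return the extent only if the journaled write has reached the object."""
    if obj.applied_seq < entry.seq:
        return None  # caller would poll again later
    return bytes(obj.data[entry.offset:entry.offset + entry.length])


obj = TaggedObject()
entry = JournalEntry(seq=1, offset=4, length=4)
before = read_if_applied(obj, entry)  # journal entry exists, data not written
obj.write(entry, b"abcd")
after = read_if_applied(obj, entry)   # tag has caught up, safe to replicate
print(before, after)
```

The sketch leaves out the failure case raised in the discussion, where the writer dies after journaling the intent but before writing the extent, so the tag never catches up.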
F
D
F
D
Yeah, I mean, I suppose having a full data journal is expensive if you do it naively, without optimizing around it, but it enables a whole class of other really cool stuff. You can do delayed apply, so you could have your replica be exactly an hour behind. You can fast-forward. You can even rewind if you make a reverse journal. I mean, there's all kinds of cute stuff ...
D
... you can do, assuming you can pay the performance penalty. And if we do stuff like this write-back, making a large write-back window, essentially, and coalescing writes to objects, then we can get back a lot of that. Or you could put the journal in a flash pool.
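The delayed-apply idea mentioned in this exchange (a replica that stays a fixed interval behind) can be sketched with explicit timestamps. `DelayedReplayer` and its fields are hypothetical names, and the journal is modeled as an ordered list of `(timestamp, payload)` pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DelayedReplayer:
    delay: float       # seconds the replica should lag behind the primary
    position: int = 0  # index of the next journal entry to apply

    def apply_ready(self, journal: List[Tuple[float, str]], now: float) -> List[str]:
        """Apply, in order, every entry that is at least `delay` seconds old."""
        applied = []
        while self.position < len(journal):
            timestamp, payload = journal[self.position]
            if timestamp > now - self.delay:
                break  # this entry and all later ones are still too recent
            applied.append(payload)
            self.position += 1
        return applied


journal = [(0.0, "w1"), (1800.0, "w2"), (3600.0, "w3")]
replayer = DelayedReplayer(delay=3600.0)  # stay one hour behind
first = replayer.apply_ready(journal, now=3600.0)
second = replayer.apply_ready(journal, now=5400.0)
print(first, second)
```

Fast-forwarding would amount to calling `apply_ready` with a smaller delay; rewinding would need the reverse journal mentioned in the talk, which this sketch does not model.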
C
B
We can, like, that's again one of the goals. Yes, no reason why we couldn't. There are a few more questions in the chat, so one is asking if this ...
A
B
... is consistent per image, and I think that's true for now. Potentially in the future, we might have multiple images share the same journal and be part of some kind of consistency group, but for now, just one journal per image. They also ask about how we handle resynchronization, so in kind of a failover scenario. Basically, what you'd want to do is, when you fail over to the secondary site ...
B
B
D
Yeah, I think there might be two cases. The data journal will tell you which things are effectively divergent, the ones that didn't get replicated. So that's the stuff that you need to roll back, and it might be that the same extents haven't been touched on the target, so you can just copy them, and you can actually roll them back sort of explicitly.
D
Then, if that fails for some reason, like they've been overwritten and you don't want to roll forward and get into an inconsistency, you could actually roll all the way back to a ... I guess we're already inconsistent, so that doesn't matter; we might as well just copy it. But you could also roll back to a snapshot and then replay the journal from there, if you wanted to.
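One way to picture the resynchronization cases above: journal entries past the last replicated position are divergent, and they split into extents the target never touched (which can simply be copied back) and extents the target has since overwritten (which need the heavier path). The function name, the tuple layout, and the exact-match comparison of extents are simplifying assumptions.

```python
from typing import List, Set, Tuple

Extent = Tuple[int, int]  # (offset, length)


def plan_resync(journal: List[Tuple[int, int, int]],
                replicated_seq: int,
                touched_on_target: Set[Extent]):
    """Split divergent extents into 'copy back' and 'conflicting'.

    journal: (seq, offset, length) entries from the failed primary.
    replicated_seq: last sequence number the secondary had replayed.
    touched_on_target: extents written on the secondary after failover.
    """
    divergent = sorted({(offset, length)
                        for seq, offset, length in journal
                        if seq > replicated_seq})
    copy_back = [e for e in divergent if e not in touched_on_target]
    conflicting = [e for e in divergent if e in touched_on_target]
    return copy_back, conflicting


journal = [(1, 0, 4096), (2, 8192, 4096), (3, 0, 4096), (4, 16384, 4096)]
copy_back, conflicting = plan_resync(journal, replicated_seq=2,
                                     touched_on_target={(16384, 4096)})
print(copy_back, conflicting)
```

A real implementation would compare overlapping ranges rather than exact tuples; the conflicting list corresponds to the case where the talk suggests copying the whole thing or rolling back to a snapshot and replaying the journal.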
B
Someone was also asking about the speed of snapshotting and copying to a different zone versus async replication. There are kind of two different costs there. One is that snapshots have the copy-on-write overhead, so they make writes to recently snapshotted objects slower, since they have to do the whole copy on XFS. In terms of the actual data transferred, I think it's pretty similar, although we can do some optimizations with the async replication ...
B
... for example, having the thing doing the replication keep a large buffer as it reads through the journal, so it doesn't have to replay every single write: if the same segment is overwritten more than once, it would only write the last one in the buffer. So there's that, and potentially some other kinds of optimizations we can do there, perhaps, in general.
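The buffering optimization described above is essentially last-write-wins over a window of journal entries. A minimal sketch, assuming entries are `(segment, payload)` pairs in journal order and that `compact` is a hypothetical helper on the replication side:

```python
from typing import Dict, Iterable, Tuple


def compact(entries: Iterable[Tuple[int, bytes]]) -> Dict[int, bytes]:
    """Keep only the last payload written to each segment in the buffer."""
    latest: Dict[int, bytes] = {}
    for segment, payload in entries:
        latest[segment] = payload  # a later write replaces an earlier one
    return latest


buffered = [(0, b"a"), (1, b"b"), (0, b"c"), (0, b"d")]
to_send = compact(buffered)
print(to_send)
```

Four buffered journal entries turn into two writes on the remote side. The trade-off is that the remote image skips intermediate states within the buffered window, so it is only consistent at the window boundaries.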
D
So sorry, random thought, going back a second to that question of whether we'd want to put the journal in a different pool that's faster. It could be that, even if you don't do that, we just hint the journal such that, if there's a cache tier configured for the base RBD pool, it just ends up always living in the cache tier, and the writes to the blocks are hinted such that they tend not to be promoted into the cache tier, or something like that.
B
We can probably allow you to do that same thing.
D
B
D
C
B
The journal would be per image, so you could enable and disable it for a given image, and you could easily parallelize replication that way too.
B
Okay, yeah. Layering consistency groups on top of this shouldn't be that big a deal: it's just having multiple images logically share a journal and being essentially one logical RBD image, assuming they're not used from multiple places at once. If they are used from multiple places at once, then we'd need to think about how to make it work better for them. So that's maybe a question we should look into.
D
B
I think that's probably one of the first steps: putting together a generic journaling module so that it could be used by this and by CephFS, or anything else that wants a journal on top of RADOS, as well. And for the per-image versus per-pool thing, I think it's pretty simple to make it a feature bit that you enable or disable per image, and you can do that dynamically, although perhaps the dynamic part gets a little complicated if you have existing data.
D
Snapshot of a pool and pool-level delta: I think, Lars, there is sort of a separate effort to do that at the RADOS level, where you take the entire pool and you create a RADOS pool snapshot. Although that won't work with RBD images, block device images, because right now pool snapshots don't mix with self-managed snapshots.
D
Yeah, he says consistency groups are quite important. Yeah, I think so. I mean, we have to sort of solve this generic consistent snapshot issue anyway, where you have a bunch of images and you want to snapshot them together. So it might be sort of the same thing, where you coordinate their journals: you pause them, tell them all to snapshot, and then let them resume, or something like that, and we'd get that point-in-time snapshot.