From YouTube: CDS Infernalis (Day 1) -- RGW: Active/Active Arch
Description
Videos from Ceph Developer Summit: Infernalis (Day 1)
03 March 2015
https://wiki.ceph.com/Planning/CDS/Infernalis_(Mar_2015)
A
A
B
Right, so yeah, the first one we're talking about is the active/active.
B
B
We said we want to have regions, where each region would have data in it, and in each region you might have multiple zones, where each zone replicates the other ones and is basically used for disaster recovery. So you have a master zone within each region and secondary zones that will follow it.
B
So you might have two regions, let's say US West and US East. You'd have two zones in each: east-1 and east-2, and then we could have west-1 and west-2. And you'll need to provide a single zone that's going to be the master of all for the sake of metadata, because it's the one that's going to control all the metadata [inaudible], so everything metadata-related is going to go through it.
B
B
So the next one is going to go to it, and East too, etc. So that allowed us to create a single global namespace that is used for both East and West, but then again you can have localized data. But there are several issues with this. First of all, there is some confusion about what a region actually means.
B
Apparently a region is just a container for the zones, so the actual data resides in the zones. So there is a mix-up here about where the actual data center is, for example for East, because if you say the data resides in the East region, you expect it all to be in the East region. But then for disaster recovery you want to put the secondary zone in West, so you end up saying that the secondary zone for East resides in the West.
B
It's kind of confusing, so I think our first decision and suggestion was to rename regions to zone groups, and hopefully that is going to help avoid some of the confusion. The second thing is that within a single zone group we now don't want to have only a single zone that you can write into, because you might have multiple zones that, as we said, within a single zone group can be both in the East and in the West.
B
C
B
Now, how is that going to be achieved? At the moment we have a single master zone that keeps logs [inaudible] so that you can track the changes. We have a sync agent that goes through the logs, chews through these logs, and then sends commands to the gateways and says: OK, this object has changed, go fetch that object.
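A minimal sketch of the flow just described: an agent reads the master zone's log from a marker and turns each entry into a command for the gateway. The names `LogEntry` and `plan_commands` and the tuple-shaped commands are hypothetical, chosen only for illustration; they are not the actual sync agent's API.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    bucket: str
    key: str
    op: str  # "put" or "delete"


def plan_commands(log, marker):
    """Turn log entries after `marker` into commands for a secondary gateway.

    Returns the commands plus the new marker, so the next pass only
    looks at entries it has not processed yet.
    """
    commands = []
    for entry in log[marker:]:
        if entry.op == "put":
            commands.append(("fetch", entry.bucket, entry.key))
        else:
            commands.append(("remove", entry.bucket, entry.key))
    return commands, len(log)


master_log = [
    LogEntry("photos", "a.jpg", "put"),
    LogEntry("photos", "b.jpg", "put"),
    LogEntry("photos", "a.jpg", "delete"),
]
commands, marker = plan_commands(master_log, 0)
```

Calling `plan_commands(master_log, marker)` a second time yields no commands, which is the point of keeping the marker.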
B
B
The sync agent will then need to be able to go through all the zones within that zone group and send the commands to each of the relevant zones. But there's another thing: in the bucket index log we now need to hold some data about what the source zone was for each entry, because you might have three zones. The change originated on zone 1, zone 2 fetched the change, and in turn zone 3 now looks at zone 2's log and sees that it has some change.
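One way to picture why the source zone matters, as a sketch under assumptions: each log entry carries the zone where the change originated plus a per-origin sequence number, and a zone skips entries it originated itself or has already applied from another peer. The `IndexLogEntry` fields and the `entries_to_fetch` helper are hypothetical and not the real bucket index log format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexLogEntry:
    key: str
    source_zone: str  # zone where the change was first written
    seq: int          # per-origin sequence number (an assumption)


def entries_to_fetch(peer_log, local_zone, already_applied):
    """Pick the entries in a peer's log that the local zone still needs."""
    needed = []
    for entry in peer_log:
        if entry.source_zone == local_zone:
            continue  # our own change coming back through a peer
        if (entry.source_zone, entry.seq) in already_applied:
            continue  # already fetched via some other zone
        needed.append(entry)
    return needed


# Zone 2's log holds a change it fetched from zone 1, one that
# originated on zone 3, and one written locally on zone 2.
zone2_log = [
    IndexLogEntry("obj-a", "zone1", 1),
    IndexLogEntry("obj-b", "zone3", 1),
    IndexLogEntry("obj-c", "zone2", 1),
]
needed = entries_to_fetch(zone2_log, "zone3", {("zone1", 1)})
```

Here zone 3 only fetches `obj-c`: it already has zone 1's change, and `obj-b` is its own write.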
B
B
Basically, we don't have an issue where we'd have one object that contains part of the data that was written in zone 1 and part that was written in another zone. What you're going to have is that one of them is going to need to win, and we can probably apply a basic scheme in which we say: okay, let's look at the timestamps of the change, and whoever wrote last is going to be the winner.
B
Now here we can have some issue. What happens if the timestamps are equal but the change is actually different? We need to identify, first of all, that the object is actually different, and then also have another tiebreaker. We can probably decide, okay, zone 1 always wins, or, you know, compare zone names, and one of them is going to win. But that's really an edge case.
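The conflict rule discussed here can be sketched as last writer wins on the timestamp, with a zone-name comparison as the tiebreaker. Using an `etag` to detect that two changes really differ, and letting the lexicographically smaller zone name win, are assumptions made for this illustration; the discussion does not fix either detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    zone: str
    mtime: float  # timestamp of the write
    etag: str     # content fingerprint (an assumption)


def pick_winner(a, b):
    """Return the change that should survive when two zones conflict."""
    if a.etag == b.etag:
        return a  # same content, nothing to resolve
    if a.mtime != b.mtime:
        return a if a.mtime > b.mtime else b  # whoever wrote last wins
    # Equal timestamps but different content: fall back to zone names.
    return a if a.zone < b.zone else b


older = Change("zone1", 100.0, "aaa")
newer = Change("zone2", 101.0, "bbb")
tied = Change("zone2", 100.0, "ccc")
```

Because the tiebreaker depends only on the two changes, every zone that compares the same pair picks the same winner, whichever order it sees them in.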
D
B
B
D
If you do a put, a delete, and a put, say... no, it's okay, I mean, if you look at the put and then you look at the delete, you say: oh, that delete is old, and you can ignore it. But if you look at the delete first and you delete the object, you've forgotten; you no longer have that data about what the version is. And then you look at the put and you're like: oh, that object doesn't exist.
D
[inaudible] Yeah, I mean, on the OSDs, in the PG logs, we have this window of a thousand operations or whatever where we keep track of those request IDs. That's how we make things like that idempotent. But we don't really know. Maybe the gateways also just have to have a window in the log or something, with the last so many requests or something. All right, [inaudible].
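A rough sketch of the "window of recent requests" idea: remember the last N request IDs and refuse to apply a request that is still in the window. The class name `RecentRequests` and its interface are hypothetical; this is neither the PG log implementation nor anything the gateway currently does.

```python
from collections import OrderedDict


class RecentRequests:
    """Bounded window of request IDs used to make replays idempotent."""

    def __init__(self, size=1000):
        self.size = size
        self._seen = OrderedDict()

    def apply_once(self, request_id, apply):
        """Run `apply` unless this request ID is still in the window."""
        if request_id in self._seen:
            return False  # duplicate within the window, skip it
        apply()
        self._seen[request_id] = True
        if len(self._seen) > self.size:
            self._seen.popitem(last=False)  # forget the oldest request
        return True


applied = []
window = RecentRequests(size=2)
first = window.apply_once("req-1", lambda: applied.append("req-1"))
replay = window.apply_once("req-1", lambda: applied.append("req-1"))
```

The trade-off is the one raised in the discussion: once a request falls out of the window, a late replay is no longer recognized as a duplicate.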
D
D
D
B
Doing that, or...
B
So yeah, we have that issue. Now there is another question about how to handle object versioning. The thing with object versioning is that we need to keep some kind of ordering. In the current scheme, for each object that you manage there is an epoch that is monotonically increasing. So let's say you
B
write an object: it gets epoch 2. Then you write it again, epoch 3; write it again, epoch 4. But what happens if you write it on two different zones? You have epoch 2 that they share, but then for epoch 3 each one has a different version. And you need to keep the object versions in order, to be able to list them from the newest to the oldest, which is OK if you're doing it on a single zone.
B
But if you have multiple zones, you actually need to be able to list them not by the epoch but by some kind of timestamp.
B
So the idea here is to replace the current epoch scheme with something that is both a counter and a timestamp, so that it will preserve the ordering and always monotonically increase, but on the other hand we will be able to merge different zones into a single coherent view.
B
And then again, we have the issue of what to do with changes that happened at the same timestamp. Here I think the fix would be, again, to add to the epoch some kind of unique string per zone, so that they're not going to get the same epoch and one is always going to win if they happen at the same time, which is probably a very
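A sketch of what such an epoch could look like, purely as an illustration: a tuple of timestamp, counter, and a per-zone string, compared field by field. The `VersionEpoch` layout and the helper names are assumptions; the discussion only says the value should combine a counter with a timestamp and be unique per zone.

```python
from typing import NamedTuple


class VersionEpoch(NamedTuple):
    timestamp: int  # seconds, taken from the writing zone's clock
    counter: int    # bumps when the clock has not moved forward
    zone: str       # per-zone string so two zones never tie


def next_epoch(prev, zone, now):
    """Produce an epoch strictly greater than `prev` for a new write."""
    if prev is None or now > prev.timestamp:
        return VersionEpoch(now, 0, zone)
    # Clock did not advance (or went backwards): keep ordering via counter.
    return VersionEpoch(prev.timestamp, prev.counter + 1, zone)


def merge_versions(*per_zone):
    """Merge version lists from several zones, newest first."""
    merged = {epoch for epochs in per_zone for epoch in epochs}
    return sorted(merged, reverse=True)


shared = VersionEpoch(100, 0, "east")
east = next_epoch(shared, "east", 105)
west = next_epoch(shared, "west", 105)
listing = merge_versions([shared, east], [shared, west])
```

Both zones wrote at timestamp 105, yet the merged listing still has a definite order because the zone string breaks the tie.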
A
B
D
B
D
B
At the moment... well, it's definitely [inaudible] that you turn it on for specific buckets. Currently what we have is that you do it for the entire zone, for all the buckets. We discussed it, and we might want to have some kind of configuration where we can turn it on only for a smaller set [inaudible] and not for all data.
B
Okay, it makes sense in some configurations. Yeah.
D
B
That's mostly work for the sync agent, not quite a gateway-specific issue.
D
B
B
B
B
Essentially, you're having a complete failover situation. The scenario that we are talking about [inaudible] need that specific failover handling, because you're essentially syncing all the time, so that would probably solve it. I'm not sure whether fixing the current sync agent to do failover easily, or just doing the active/active work, would be the best way to go.
D
B
D
That just means that the logs on site B, on that zone, are basically empty, and if you cut over and you have to fail over, then those logs will start to get populated, right? Now, when the master site comes back up, it will just read those short logs that have only the writes that happened since. Yes, I mean, yeah, it seems like it would, yeah.
C
D
That's sort of... the reality is it's eventually consistent, though, like you can read something stale and rewrite it. What happens with the metadata, though, like the bucket info? You can do sort of the last-writer-wins, simple view of things from the actual object point of view, but on the metadata side, users are updating ACLs, [inaudible]
B
D
This is a case, actually, where you do need a defined resynchronize, failover type thing, because if you are doing that on the master zone, you create buckets that haven't been replicated yet, and then your master fails and you fail over, [inaudible] a different user creates the same bucket, and you...
B
B
B
B
The second question here is: in the above case, both users have got success return codes; who will end up losing their bucket? The answer is that they're not both going to get successful return codes. Everything metadata-wise goes through the same zone, so only one of them is going to win.
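To illustrate the answer, here is a toy model in which every zone forwards bucket creation to one metadata master, so the second creator gets an error instead of a success code. `MetadataMaster`, `Zone`, and `BucketAlreadyExists` are hypothetical names for this sketch and do not correspond to actual gateway classes.

```python
class BucketAlreadyExists(Exception):
    """Raised when another user already owns the bucket name."""


class MetadataMaster:
    """Hypothetical stand-in for the one zone that handles all metadata."""

    def __init__(self):
        self._owners = {}

    def create_bucket(self, name, owner):
        existing = self._owners.get(name)
        if existing is not None and existing != owner:
            raise BucketAlreadyExists(name)
        self._owners[name] = owner


class Zone:
    def __init__(self, name, master):
        self.name = name
        self.master = master

    def create_bucket(self, bucket, owner):
        # Forward to the metadata master instead of deciding locally.
        self.master.create_bucket(bucket, owner)
        return "created"


master = MetadataMaster()
east, west = Zone("east", master), Zone("west", master)
results = [east.create_bucket("logs", "alice")]
try:
    results.append(west.create_bucket("logs", "bob"))
except BucketAlreadyExists:
    results.append("conflict")
```

Because both requests are serialized through the same place, exactly one user ends up owning the name.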
E
B
E
I guess [inaudible]. I would gather there's an issue because, for example, if you run [inaudible], timekeeping is always a problem. And for the monitors, well, for the rest of the cluster, you have some notification if the time drifts too much, but I guess you need something like that done for the gateway, yeah.
B
Yeah, we can notice if the time drift goes too big. So gateways send messages to each other. We can think of a scheme where they notify each other what the current timestamp is and everyone checks, and if there is some kind of big drift, then they send some kind of warning.
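A possible shape for that check, as a sketch: each gateway compares the timestamps its peers report against its own clock and emits a warning when the difference exceeds a threshold. The function name, the 5-second default, and the warning text are all assumptions made for this example.

```python
def drift_warnings(local_now, peer_timestamps, max_drift=5.0):
    """Compare peers' reported timestamps (seconds) with the local clock.

    `peer_timestamps` maps a peer name to the timestamp it reported.
    Returns one warning string per peer whose clock is too far off.
    """
    warnings = []
    for peer, reported in sorted(peer_timestamps.items()):
        drift = abs(reported - local_now)
        if drift > max_drift:
            warnings.append(f"clock drift of {drift:.1f}s against {peer}")
    return warnings


warnings = drift_warnings(1000.0, {"gw-east": 1001.0, "gw-west": 1012.5})
```

In this example only `gw-west` is reported, since `gw-east` is within the threshold. A real scheme would also have to account for message latency, which this sketch ignores.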
B
Okay, the question is, you know, you need to look at what the use case is for that: whether multiple writers write to the same object in the same bucket, whether there's any such scenario. I assume it's an issue and it's a problem, but it's not the most acute issue.
A
E
B
B
Like, you know, [inaudible] about the creation of the buckets, but by default, if you're using the S3 RESTful interface, you cannot drift more than 15 minutes, because they would not be able to sign anything. So 15 minutes, or 15 times 2, 30 minutes, is going to be the largest time drift, which is huge, but, you know, it's not going to be days.
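The arithmetic behind the 30-minute figure can be sketched as follows, assuming a request is accepted only when its timestamp is within 15 minutes of the gateway's clock, as the speaker describes for the S3 interface. The constant and function names are made up for this example.

```python
MAX_REQUEST_SKEW = 15 * 60  # seconds, the 15-minute limit mentioned above


def request_time_ok(request_ts, gateway_ts):
    """Accept a signed request only if its timestamp is close enough."""
    return abs(request_ts - gateway_ts) <= MAX_REQUEST_SKEW


# One client at t=0 talks to two gateways whose clocks sit at opposite
# ends of the allowed window. Both still accept the request.
client_ts = 0
gateway_a_ts = -MAX_REQUEST_SKEW
gateway_b_ts = +MAX_REQUEST_SKEW
both_accept = request_time_ok(client_ts, gateway_a_ts) and request_time_ok(
    client_ts, gateway_b_ts
)
worst_case_drift = gateway_b_ts - gateway_a_ts  # 2 * 15 minutes
```

Any gateway further off than that would reject the same client's requests, which is why the drift between two working gateways is bounded by twice the limit.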
E
B
Okay, any other questions?
B
It goes through a [inaudible] when creating the bucket, on to the master and going back.
B
Usually it's not part of the I/O path, or it's not supposed to be at least, so there's some latency there, but...
B
At a higher level, we could have different settings in which we have different master zones for different tenants or something like that. So we can say: okay, this tenant is a West tenant, and all the buckets it creates are going to go through the West, and here that's an East tenant, but it's...