From YouTube: CNCF SIG Storage 2021-01-13
A: Hello, everyone. We'll just wait a couple more minutes. First call of the new year, so happy new year.
D: Hello. Hi, Erin and Alex, thank you for the kind words. I have severe internet issues today, so I'm going to try to save my bandwidth. I'm on the mobile phone for audio, but I don't know how long the video is going to hold.
A: That's all right. I think I will put the link to the document in the chat window, so people can all access it directly, anyway.
A: All right, it's six minutes past; why don't we get started. So, for context: at the beginning of December we had a discussion with Rafael, and we decided to take this forward and build out some more information on cloud native disaster recovery. Rafael has been exceptionally busy and has put together a great document. I think we were expecting a skeleton that we could fill out, but there is a tremendous amount of content here. So maybe, Rafael, do you want to take us through the document, and we can discuss and figure out what needs to be done, where you need help, etc.?
D: It's somewhere here... okay, okay. So yeah, like Alex said, in the last meeting we decided to go ahead with talking about disaster recovery in a cloud native scenario, and I was tasked to create an outline of the document. But I decided to go ahead and also fill in some of the content, because I already had some of it written down in various articles that I had published before, and some of it was in my mind, so I just wanted to do a brain dump while it was there. And so this is what I have.
D: Obviously it's a draft; it's rough in many ways, both in the structure and in the actual content. So my hope today is to talk about it a little bit, and then that you all will go and... and I think the connection already dropped, so maybe someone else can continue this. Let me try again. Can you still hear me?
D: Okay, good. So yes, my hope is that you will all read it and provide feedback. Do you all have commenter access? I don't know how to share it with everyone in this group, but as you ask permission to access it, I will share it with you and give you commenter abilities.
D: So you will be able to add suggestions or comments, and if you would like to add your feedback that way, I will incorporate all of the feedback that makes immediate sense to me, and then follow up and discuss all of the feedback that is not clear to me. If we can work that way, directly on the document, without having to meet, I think we can quickly converge to something that we all agree on and feel we can share.
D: Let's take a look at the structure. There are three main areas. One is the first chapter.
D: Okay, thank you! Yes, so the first chapter is about availability and consistency, and here I'm trying to give some definitions of these concepts, plus others that will be useful later in the document and are relevant in the context of disaster recovery. So we talk about failure domains, and then we talk about availability, consistency, the CAP theorem, which creates this relation between availability and consistency, and then what we mean by disaster recovery.
D: So for me, and you will read it in the document, the main takeaway is that when we talk about availability, we are really asking the question: given a failure domain, what happens to my workload if one component in that failure domain fails? One or more, but generally it's one. It's an HA question.
D: Instead, when we talk about disaster recovery, we're asking the question: given a failure domain, what happens if all of the components fail at the same time, with a single event? Obviously, in that case, to still be able to service requests you will need to have multiple failure domains. So it's really a different question, but today there is a lot of confusion.
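To make the two questions concrete, here is a minimal Python sketch (an illustration, not from the meeting) of the quorum arithmetic behind them: HA asks whether the workload survives one component failing inside a failure domain; DR asks whether it survives a whole failure domain failing at once.

```python
def quorum(n: int) -> int:
    """Majority quorum for n voting replicas."""
    return n // 2 + 1

def survives_domain_loss(replicas_per_domain: dict) -> bool:
    """DR question: can we still reach quorum after losing any ONE whole domain?"""
    total = sum(replicas_per_domain.values())
    worst = max(replicas_per_domain.values())
    return total - worst >= quorum(total)

# HA-style layout: three replicas, all in one data center.
print(survives_domain_loss({"dc-a": 3}))                        # False
# DR-capable layout: one replica in each of three data centers.
print(survives_domain_loss({"dc-a": 1, "dc-b": 1, "dc-c": 1}))  # True
```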
A: Hey, quick question, therefore: do we want to put a definition of DR in there, in some form? And, you know, yeah.
A: There is a reason, because one of the things... let me explain why I'm trying to discuss this differentiation, I guess, between high availability and, you know, disaster recovery. I think one of the aspects, the way we're kind of describing it in this document, is that we are talking about multiple instances of something, right? Say, you know, a database, or some sort of system, whatever it is that we're thinking about here. And the question that's not 100% clear in my mind is how blurry that line gets between HA and DR when we're considering certain cloud-native technologies. So I'm kind of thinking: if we're talking about something like, you know, Vitess or CockroachDB or something like that, where you kind of have multiple instances and replication, but those multiple instances are serving both the purposes of HA and also the purposes of DR, I guess there might be lots of opportunities where there might be overlaps as well.
D: Yeah, that's what I meant when I said I talk to customers that are sort of overlapping those two concepts in the same way that you are describing: I'm building an architecture that serves both the purpose of local HA and global DR, you know, automatic DR, and so with a single solution I can address those two problems. But for me they're still two separate problems. However, we can add (and I've debated within myself whether we should do it or not) a discussion in this document saying where we see that possible overlap and how that plays out. But right now, the way this document is written, we try to delineate a clear distinction between HA and DR, and for me the simplest way to explain that distinction is that question: the way you ask the question relative to a failure domain.
So
in
that
way
you
have
a
clear
distinction
and
the
only
way,
the
only
really
the
only
way
where
you
start
making
the
they
may
mean
the
same
thing
and
ndr
is,
is
if
you
are
working
with
different
failure
domains.
D: Well, in that case HA and DR can mean the same thing, but you have subtly changed the question, the failure domain level. I hope I've not confused everyone, but that's what really happens in the minds of the customers that I talk to.
B: Rafael, maybe it would make more sense to take that section on disaster recovery and move it up above HA, so we address it first, and then, when we dive into talking about HA, we can talk about the correlation and non-correlation and differences between, you know, the traditional ways that customers view these things compared to the cloud native approach, kind of like we did in the white paper, Alex.
D: Yeah, in the disaster recovery section that is highlighted right now, we just talk about general disaster recovery definitions; it's not the solution for cloud native disaster recovery, which comes only at the end. But yeah, we can rearrange the disaster recovery definitions section and put it just before, or right after, the high availability one, so people can mentally compare them immediately.
A: Yeah, I think that suggestion makes sense too. I guess for me, you know, describing it in terms of working across failure domains is a good way of doing it. The second distinction which I wanted to clarify, because this is kind of important too, is: are we talking about having completely separate instances of a system (whatever that system is, whether it's an application, a database, whatever) that would be made available across different failure domains? And those failure domains could be data centers, they could be racks, they could be server rooms, they could be geographies, whatever those failure domains are. And that makes kind of a lot of sense.
A: But I think what we're seeing in the cloud native world, and this is why I'm mentioning it, is the spread of components across failure domains to effectively, you know, have much more of an overlap between HA and DR. And for me that is also kind of an important differentiation.
A: So if the definition of DR is being able to recover a system from the failure of, or from the outage of, a failure domain, the other differentiator which might need clarification is: are we talking about completely separate instances, or can we also talk about a single instance that has components which are spread across failure domains?
A: Okay, all right, so let me try and explain. Imagine you have a database, and your database has maybe a primary and a replica copy, and those primary and replica copies are spread out across different failure domains, but from a management point of view, from a control plane point of view, they're being administered as a single instance. So in that first case, where you have, you know, a primary and a replica and they're tightly coupled and seen as one instance, a configuration error or an error on one side can easily cascade down to the replica. Whereas if they are two completely separate instances, you have separation of those failure domains.
D: Okay, I think I followed, but now help me understand how you want to handle that, because, you know, it's workload-specific. There are tons of stateful workloads, and they all have quirks in the way you can configure them, and options; well, not quirks, but options. So what you have described is maybe an option of some kinds of workloads. But how do we generalize that?
D: Because, you know, at least I was trying to be very general with this concept. And yes, there are some databases that can do master-slave, or workloads that can do master-slave, and it would be wise for you to put the master and the slave in different failure domains. But I'm not sure how that comes back into this document. What do you mean? Are you saying that this could be a way to do it?
A: Well, so what I was thinking was that, you know, in customers that we're working with, I'm kind of seeing two specific patterns emerge. So imagine you have a storage system, for the sake of the argument. I am seeing two specific patterns.
A: So, for example, you know, a storage system that can do replicas or erasure coding or something like that, but it's actually just spread across multiple failure domains. So now you have kind of a single instance across multiple failure domains, versus multiple instances over multiple failure domains. I was kind of bringing this up because what I'm seeing is that a lot of the cloud native technologies (and I'll mention, for example, Rook and Ceph and Vitess) tend to favor a single instance spread out over multiple failure domains, whereas other technologies (I'll give, say, Postgres as an example) you kind of see implemented as multiple instances over multiple failure domains. And it is subtly different. But you know.
D: No, I agree, I agree. So in this document I am focusing on the single logical instance, a single logical workload entity spread across multiple failure domains. The other option... I'm not sure exactly how to model it, but if we feel we should talk about the other option, I'm certainly open to it. It is, in a way, in the appendix, but I think you have a more comprehensive view of it. For me, in this document, in all the examples, I'm always thinking about a single logical entity, and we just discuss how to make it highly available and resistant to disasters. It's not about keeping multiple entities in sync. Got it? Well, unless you go to the traditional disaster recovery strategies, which... well, all of them are doing that, or trying to do that. But in the cloud native world, at least my argument is: you should pick a software or a product that can do that.
D: That is: you can deploy it as a single logical entity, and it will spread across multiple failure domains. And then the question obviously becomes how far you can go with the failure domains: can you do geographies, or is it just local? Because some of these workloads have latency issues. But yes, that's the thesis of the document right now; that's where the document is going. So we can obviously re-discuss whether that is what we want to say, but that's what the document is saying, Raphael.
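As an aside on what "a single logical entity spread across failure domains" means mechanically: in Kubernetes this is typically expressed with topology spread constraints or anti-affinity. The placement idea itself, sketched here with made-up names, is just round-robin assignment of replicas to domains:

```python
import itertools

def spread_replicas(replica_count: int, domains: list) -> dict:
    """Assign replicas round-robin so no single failure domain holds them all."""
    placement = {d: [] for d in domains}
    for i, d in zip(range(replica_count), itertools.cycle(domains)):
        placement[d].append(f"replica-{i}")
    return placement

# One logical workload, three replicas, three availability zones:
print(spread_replicas(3, ["zone-a", "zone-b", "zone-c"]))
# {'zone-a': ['replica-0'], 'zone-b': ['replica-1'], 'zone-c': ['replica-2']}
```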
E: Raphael had a question about the scope of "disaster". So I think right now it looks like your disasters are benign faults, as opposed to malicious attacks. Is that correct?
A: I think that's a good point, right? Where somebody might use DR capabilities to protect against a security issue, or even to protect, say, against human error, for example, I imagine.
D: It doesn't matter why. The point is, at some point we lose connectivity, and that's really what our software should be able to detect and react to if we want to do cloud native disaster recovery. Meaning: as soon as the application loses connectivity to the other peers of this cluster, of this logical workload, that workload is able to reorganize itself and keep working and keep serving, without human intervention.
D: So I mean, I'm totally fine if we want to list examples of disasters and have security in it, but it doesn't change the rest; it shouldn't change the rest of the conversation. The trigger of the disaster shouldn't change how we manage the disaster. And tell me, you know, if you disagree.
E: I think Alex's one-instance-versus-two-instances point kind of highlights that to some degree. In my mind, a disaster in a single instance is different than a disaster with independence, with more layers of independence.
D: I'll be careful there, because if you set up a multiple-logical-instance kind of scenario and you want really fast recovery from a disaster, it means that you're synchronizing the data continuously: synchronously, maybe, or asynchronously but with very little delay.
F: Makes sense. I think attacks are an interesting concept, but it could be a rabbit hole: if you get into a sophisticated enough attack that can poison disaster recovery, you know, how do you protect disaster recovery itself?
A: I'll provide some feedback to the document. I mean, security is an interesting angle. One of the specific angles I was actually thinking of, between the multiple instances versus a single instance, is also, you know, simpler things, like human error, for example.
A: So, for example, if you are replicating, say, transaction logs across two different instances of a database, then if somebody makes a mistake on the primary and, say, drops a table, the drop doesn't have to get replicated. Whereas, you know, if you're working with a single instance across multiple failure domains, a human error kind of takes out all the failure domains at the same time.
A: So those are kind of some of the things that I was going to suggest we highlight. We can just say: look, there are two ways of doing this, multiple instances with multiple domains or a single instance with multiple domains, and there are some pros and cons between the two; maybe have a short table. I can throw that together, and we can review it for next time.
D: I propose we table this for now. Both Alex and I took notes of that, so the feedback is not going to be lost. Let's not use all the time on this; I want to talk about the other three sections of the document, just briefly. So the first one, as we said, is about defining these concepts, availability, consistency and disaster recovery, and all the other reasoning that we need in the rest of the document.
D: The argument of this section is, in the end: with regard to availability and consistency, all stateful applications are doing the same thing; they have to solve the same problem. Maybe they solve it in different ways, but they are actually solving the same problem. So we can model those stateful applications; we can create a logical model of a stateful application that applies to all of them.
D: Of course, when you actually build a stateful application the model doesn't hold, because you have to optimize, highly optimize. But the logical model is that there is an API layer: that could be the SQL layer, or it could be a messaging layer; if it's a storage type of stateful application, it's the block device protocol or the file system protocol. It's a way to talk to the application. Then there is a coordination layer, and then there is a storage layer. Okay, this paragraph is kind of similar to what the Storage SIG has already published, with just the addition of the concept of the coordination layer; the API layer and the storage layer were already identified in that document. And then I'm adding here the concept of replicas and partitions. It should be self-explanatory, but do read some of the considerations regarding replicas and partitions. Replicas are, obviously, a way to create HA, high availability, for a workload.
D: Partitioning is a way to scale, by partitioning the data sets, and then you can use the two together to create highly available and theoretically unlimited-scaling workloads, which is what, you know, modern products like CockroachDB and YugabyteDB and TiDB advertise that they can do. And if you try them, they can actually really do that, at least relative to the hardware that I have at my disposal.
D: Because you create more replicas, you pay some overhead due to coordination. So anyway, going back to the structure here: you have replicas and partitions, and if you go to the last paragraph, where I say "putting it all together" (if you don't mind scrolling there, Alex), the idea is that you have these instances, replicas, that will be coordinated to stay always in sync, and then you have other partitions of the data, which may each have multiple instances. And sometimes you have a request that requires a cross-partition type of request, in which case you have to coordinate between partitions, okay?
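A minimal sketch of that logical model (illustrative names, not the document's actual code): a stateful workload as a set of partitions, each with its own replica set, plus a routing function from key to partition.

```python
from dataclasses import dataclass

@dataclass
class Partition:
    """One shard of the key space; its replicas must be kept in sync."""
    replicas: list

@dataclass
class StatefulWorkload:
    partitions: list

    def route(self, key: str) -> Partition:
        """Pick the partition that owns this key (simple hash partitioning)."""
        return self.partitions[hash(key) % len(self.partitions)]

# Two partitions, three replicas each; a request touching both partitions
# is the "cross-partition" case that needs coordination between them.
workload = StatefulWorkload([
    Partition(["p0-r0", "p0-r1", "p0-r2"]),
    Partition(["p1-r0", "p1-r1", "p1-r2"]),
])
print(workload.route("user-42").replicas)
```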
D: So the important takeaway here is that we need two kinds of coordination protocols, or consensus algorithms: one to coordinate between replicas and one to coordinate between partitions. And the job is very different, because between replicas it's about doing the same thing (all the replicas have to do the same thing), while between partitions it's about doing a different thing: each partition has to do a different operation to carry out the full transaction.
D: Okay, so that's important to understand. The other thing is that, unfortunately, there is a lot of confusion in the names that each workload uses for the concepts of replicas and partitioning: their own jargon in their own, you know, product.
D: You know, workloads and products. And I show how you can map what they call a partition; for example, Elasticsearch, I think, calls an index what in this document would be called a partition. I map all of these to showcase that really all the workloads can be brought to this model that we are talking about here.
D: It is similar to the section that was in the original, you know, the Storage SIG paper that was published, but it's a little bit expanded. And here I say that for replica coordination there are specific consensus protocols that fit better for that job, and they are Raft and Paxos.
D: Okay, so those are the consensus protocols that are based on leader election, in which all of the instances that participate in a transaction essentially have to do the same thing, based on a log of events. And then there are the consensus protocols between partitions, and that's where two-phase commit and three-phase commit are a better fit.
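For the cross-partition case, here is a minimal two-phase commit sketch (an assumed illustration; real implementations add durable logging and failure handling): the coordinator only commits if every partition votes yes in the prepare phase.

```python
class Participant:
    """One partition taking part in a distributed transaction."""
    def __init__(self, name: str):
        self.name = name

    def prepare(self, op: str) -> bool:
        # A real participant would durably log the operation before voting.
        print(f"{self.name}: prepared {op!r}")
        return True

    def commit(self):
        print(f"{self.name}: committed")

    def abort(self):
        print(f"{self.name}: aborted")

def two_phase_commit(ops: dict) -> bool:
    # Phase 1: collect votes; every participant must say yes.
    votes = [p.prepare(op) for p, op in ops.items()]
    # Phase 2: commit everywhere, or abort everywhere.
    for p in ops:
        p.commit() if all(votes) else p.abort()
    return all(votes)

two_phase_commit({
    Participant("partition-0"): "debit account A",
    Participant("partition-1"): "credit account B",
})
```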
D: And the other thing that I talk about in this section is the fact that you should only trust proven consensus protocols and algorithms. And then, if you scroll up a little bit, there is this concept of a reliable replicated state machine and a reliable replicated data store.
D: This is an excerpt taken from the SRE book, a very, very interesting read, but the gist of it is that this problem can be generalized and has been theoretically solved by a set of papers in academia, where they prove that you can build a machine that will replicate the state (whatever that means for your particular stateful workload) in a reliable way across multiple replicas.
D: Using a leader-election type of consensus protocol. So they give you, you know, a mathematically proven way to do it. And in fact I think this kind of layer will at some point be generalized in software, so that people can more quickly build cloud-native-style workloads, where they just have to define the API and the rest is already, to a certain extent, taken care of. But anyway.
D: This is just to make the point that it's theoretically possible to build these kinds of workloads.
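The replicated-state-machine idea in one sketch (assumed, heavily simplified): if every replica applies the same agreed log, in order, through a deterministic function, all replicas converge to the same state; agreeing on the log order is the job of the leader-election protocols above.

```python
class KVStateMachine:
    """A deterministic state machine: same log in, same state out."""
    def __init__(self):
        self.state = {}

    def apply(self, command):
        key, value = command
        self.state[key] = value

# The log order would be agreed via Raft/Paxos; here it is just a list.
log = [("x", "1"), ("y", "2"), ("x", "3")]

replicas = [KVStateMachine() for _ in range(3)]
for replica in replicas:
    for entry in log:
        replica.apply(entry)

assert all(r.state == replicas[0].state for r in replicas)
print(replicas[0].state)  # {'x': '3', 'y': '2'}
```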
D: At the end, if you can scroll down a little bit, there is a table where I have classified some of the common stateful workloads that we encounter, and they are classified by the consensus protocol that they use to sync the replicas, and then the consensus protocol that they use to sync the partitions, if they have a concept of partitions, because partitioning is not necessary.
D: Right; for example, etcd does not support partitioning the data, so all of the copies in etcd have the entire data space. But other databases do, when you want to scale to the ability to manage a larger data set.
D: Even though it's hard to find this information (they don't advertise it), this, I think, is a way to classify workloads and put them all on the same level, and to rationalize their own internal ways of calling things. It should immediately give you an idea of what the workload can do with respect to the problem at hand, that is, with respect to high availability and disaster recovery. It doesn't tell you anything else, you know, what the API does, and we don't care about that. But we can immediately see what we can expect in terms of behavior.
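The table itself lives in the document, not the recording; as an illustration of the two classification axes, a sketch along these lines (entries reflect the projects' public documentation, not the document's actual table):

```python
# workload: (replica coordination, partition coordination), per public docs.
workloads = {
    "etcd":          ("Raft", None),  # no partitioning: every member holds all data
    "CockroachDB":   ("Raft per range", "two-phase commit for distributed txns"),
    "Kafka":         ("leader/follower with ISR", "transaction coordinator"),
    "Elasticsearch": ("primary-backup per shard", None),  # no cross-shard txns
}
for name, (replica_proto, partition_proto) in workloads.items():
    print(f"{name:14} replicas: {replica_proto:28} partitions: {partition_proto}")
```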
D: So that concludes the consensus protocols section, and then the last section is where we essentially give our proposal of what a cloud native disaster recovery strategy should look like. Okay, I expect we will discuss this a lot, for a long time, but my proposal is that one should pick a workload that can be spread across multiple availability zones as a single logical entity, and then let it do its job.
D: It's going to have to be written to work with the concepts of CAP, and to have the concept of, for sure, copies, and maybe even partitions. And then the idea is that, in this picture, when a data center goes down, you will have some level of global traffic manager that detects that, stops sending traffic to the data center that went down, and sends the traffic to the remaining ones.
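A sketch of that traffic-manager behavior (hypothetical endpoints and probe; real systems use DNS, anycast, or load-balancer health checks): probe every data center and route only to the ones still answering.

```python
def healthy(datacenter: str) -> bool:
    # Placeholder for a real probe (HTTP health check, DNS, BGP withdrawal...).
    return datacenter != "dc-west"  # pretend dc-west just went down

def routable(datacenters: list) -> list:
    """Data centers eligible to receive traffic right now."""
    return [dc for dc in datacenters if healthy(dc)]

# With a quorum-aware workload underneath, losing one of three data centers
# shifts traffic to the survivors with no human intervention.
print(routable(["dc-east", "dc-west", "dc-central"]))  # ['dc-east', 'dc-central']
```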
D: They cannot treat disaster recovery as an HA event. Disaster recovery is a human decision; it's not autonomously managed by the system. It's a human decision that there is a disaster, and then a lot of manual processes take place. They have to do exercises every six months, I mean those that actually do those exercises, and they're very, very painful. Here we are telling them: there is a new way to do these kinds of things. You need three data centers, and you need workloads that can be deployed that way.
D: Also, you know what, it's also very painful for them when the data center that was lost comes back up: restoring everything to normal operation is sometimes as painful as managing the disaster. In this case there is no human intervention when the disaster happens, and there is no human intervention when the data center comes back up.
D: So it's a very, very desirable situation to be in, and that, in my opinion, should be our case: that we propose people do things this new way if they're trying to take a cloud native approach to disaster recovery. And the surprising thing for me here, the surprising discovery, is that disaster recovery is often very much associated with storage.
D: People assume that the solution to disaster recovery will come from storage. In this case, really, part of the capabilities that we need are in the specific workload, okay? It has to be built that way, and it has to be able to be deployed in that way. But then the other capabilities that we need are really capabilities that come from networking, more than storage.
A: Here, I mean, I like where this is going, but I think... just because of the, you know, the typical expense of doing that. And, you know, the RTO and RPO are not necessarily... like, the more cloud native you are, the more you're technically able to achieve zero RTO and RPO, but that doesn't necessarily mean that you need to do that, or that, in fact, it is right for you, because that's also arguably the absolute most expensive solution.
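A worked example of that trade-off, with assumed numbers: RPO is bounded by replication lag, so zero RPO needs synchronous replication, which in turn charges the cross-site round trip on every write.

```python
def worst_case_rpo_s(mode: str, replication_lag_s: float) -> float:
    """Seconds of writes you can lose in a disaster."""
    return 0.0 if mode == "synchronous" else replication_lag_s

def write_latency_ms(base_ms: float, mode: str, cross_site_rtt_ms: float) -> float:
    """Synchronous replication pays the inter-site round trip per write."""
    return base_ms + (cross_site_rtt_ms if mode == "synchronous" else 0.0)

for mode in ("synchronous", "asynchronous"):
    print(mode,
          "RPO:", worst_case_rpo_s(mode, replication_lag_s=5.0), "s;",
          "write latency:", write_latency_ms(2.0, mode, cross_site_rtt_ms=60.0), "ms")
```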
D: The lore says that the more you want to achieve that, the more expensive it is. I'm challenging whether that is still true with these new technologies; I think it's more expensive, but not extremely more expensive. But still, I agree with your argument: we should not say that in cloud native you can only do it this way, and that's it. I'm actually not trying to say it that way.
D: So, Alex, if you don't mind scrolling to the appendix... It may need some rewording, I agree with you, it may need some rewording, but in the appendix I'm discussing other options, which I call the more traditional disaster recovery options. And the point here is: even in container native or cloud native, you can still do the disaster recovery approaches that you're probably doing today in your traditional, pre-cloud data center. Here is what they look like, here are some considerations on how to implement them in cloud native, and there are some specific considerations on Kubernetes. But yeah.
A: So, for example, you know, things like eventual consistency in data systems are perfectly reasonable compromises to make if you want performance, for example. But eventual consistency also means that, you know, zero RPO is impossible. And that's fine, because people can make these compromises, and we kind of discussed those different options and those different attributes in the white paper. So I think we need to.
D: I am with you, I'm with you totally. So I agree, and if the words don't come across that way, we can certainly fix them. Yeah, we can change from "we recommend this" to something like: with these new cloud native technologies this is now enabled, and it's possible, and you would do it this way, but all the other options are also still available.
A: Yeah, thank you so much for all the work that you've put into this. And, you know, just echoing what Rafael just said: please provide feedback; that would be great. Excellent. Well, thanks everyone for joining the call, and I look forward to the next set of updates in the next meeting. Have a good rest of your day, everyone.