From YouTube: OpenShift Commons Briefing: Data Protection and Disaster Recovery Solutions, Venkat Kolli (Red Hat)

Description:
OpenShift Commons Briefing
Data Protection and Disaster Recovery Solutions for OpenShift
Venkat Kolli (Red Hat)
2020-02-20
A: Before we get started, I just wanted to mention that we have now opened registration for the next OpenShift Commons Gathering in Amsterdam. It's co-located on day zero with KubeCon Europe, on March 30th. And the fun news about this: if you are a container storage person, or interested in container storage at all, we are hosting a half-day hands-on workshop at the OpenShift Commons Gathering, called the OpenShift Container Storage for Admins hands-on workshop, and you can sign up for it as part of your registration for the OpenShift Commons Gathering.
A: There are lots of other wonderful things going on as well, on the main stage and elsewhere. So we hope you'll join us in Amsterdam. You can register today; it's open, and it's only 49 euros, which is probably your best deal of the week to get a whole lot of information about what's going on in the Kubernetes and OpenShift ecosystems. So with that, I'm going to hand it over today to Venkat Kolli.
B: Thank you, Diane. Hello, everyone. Let me introduce myself: I'm Venkat Kolli, a product management director for OpenShift storage. OpenShift Container Storage, OCS as it's called, is the native storage solution for OpenShift. Specifically, in this session we're going to be talking about the data protection and disaster recovery solutions that you should consider when you're deploying OpenShift.
B: ...how the functions work, but for this session we're going to give a general portfolio view. Okay, with that said, let's dive right into it. So when you think about backup and DR, these are obviously the solutions that are meant to protect you from failures. (Hello? There seems to be some crosstalk.)
B
So
not
all
failures
are
the
same
right.
So
if
you
generally
look
into
the
failures
that
are
that
impacts
your
applications
and
data
right,
you
can
broadly
categorize
them
into
two
broad
categories,
all
right,
so
one
is
logical
failures
or
software
failures.
These
could
be.
You
know
you
user
errors,
where
somebody
accidentally
deletes
a
file
or
someone
intentionally
deletes
a
file.
It
deletes
some
data
right
and
your
software
bugs
or
some
virus
and
the
other
malicious
software
impact.
B
So
in
these,
these
failures
are
the
ones
where
there's
something
wrong
with
the
application
logic
or
the
application
data
right.
But
your
hardware
is
intact
right.
Your
your
data
center
and
hardware
is
running
okay,
but
you
lose
either
a
piece
of
data
or
an
entire
application
itself
and
got
corrupted.
So,
in
this
case,
the
most
logical
thing
to
do
is
for
you
to
go
back
in
point
in
time
right
before
the
failure
has
occurred
and
take
a
good
copy
of
the
data
and
restore
it
back
to
your
primary
cluster
right.
B
So
because
your
hardware
is
running
fine,
your
cluster
is
not
impacted
from
hardware
standpoint,
so
the
quick
recovery
mechanism
is,
you
know,
going
back
in
point
in
time
and
recovering
from
there.
Yes,
it
does
involve
in
data
loss,
at
least
you
or
you
have
a
good
copy
to
start
off
with
right.
So
this
is
where
the
backup
solutions
typically
dwell
all
right.
So
this
is
how
the
backup
solutions
work,
so
they
basically
are
built,
for.
You
know
for
software
failures
and
maintains
multiple
copies
to
recover
to
recover
from.
B
In
fact,
the
right
word
to
use
is
restore
right,
restoring
the
data-
and
you
have
this
other
class
of
failures
generally,
which
are
less
than
less
common,
but
have
much
more
devastating
effect
right.
So
this
is
where
your
hardware
failed
or
or
could
be
you
to
hold
the
center
has
failed.
The
hardware
fails
either
based
on
some
components.
B
I
mean
it
describes,
or
you
know
some
other
part
of
your
hardware
that
fails
and
takes
away
the
node
of
the
cluster
right
or
you
could
have
an
HVAC
issue
or
a
power
grid
issue
where
your
data
center
is
down
right.
So
in
this
case,
obviously,
you
cannot
wait
for
this
to
be
repaired
and
recovered
right.
So
the
recovery
mechanism
is
that
you
failover
to
a
remote
site
that
you
previously
set
up
and
where
you
have
been
copying
the
data
to
all
right
and
move
your
applications
running
off
from
your
dr
side
right.
B
So
this
is
your
failover
to
either
a
standby
or
a
hot
side
and
we'll
get
into
the
details
about
the
different
dr
sites
that
you
know
that
mechanism
to
have.
So
these
are
the
dr
solutions.
These
are
typically
a
function
of
a
storage
at
the
underlying
storage
and
right.
So
that
basically
protects
you
from
from
me,
or
you
know
physical
and
datacenter
failures.
Now
the
common
tools
and
technologies
that
are
used
for
for
data
protection.
These
are
like
very
common
set
of
tools
that
are
used.
B
So,
let's
start
off
with
say
data
mirroring,
so
data
mirroring
is
essentially
a
synchronous
mirror
copies
that
most
modern
applications
and
the
storage
systems
do
right.
So
whenever
you're
writing
an
application,
is
writing
a
IO
or
any
transaction
being
complete
before
the
transaction
gets
complete?
It
synchronously
makes
sure
there
are
more
than
one
copy
it's
right
to
more
than
one
place
and
they're
always
consistently
and
minaret.
So
you
always
have
I.
You
know
a
full
consistent
copy
before
the
transaction
or
IO
is
complete
right.
B
So
this
way,
even
if
you
you
know,
have
a
node
or
a
single
copy
failure
without
application
without
a
beat,
can
can
continue
running
on
the
other
good
copies
right.
So
there
is
no
data
loss
here
or
any
application
downtime
with
the
data
mirroring.
Obviously
it
comes
with
some
limitations
and
we'll
go
through
this
in
detail
later,
but
that's
essentially
what
the
data
mirroring
is.
So
just
as
an
example,
the
OCS,
which
is
the
native
storage
of
OCP,
does
have
a
native
data.
Mirroring
replicate
Marine
Corps
built-in
into
it.
B
So
by
default
it
always
writes
three
copies
and
consistently
keeps
them
copies.
Synchronously
mirrored
right.
So
that's
the
data
mirroring
and
snapshot.
You
know
most
of
you
probably
already
know
and
heard
about
snapshots
all
right,
so
this
is
basically
where
you
consistently
take
point-in-time
copies
right
and
you
you
know
you
keep
a
set
of
them.
So
when
you
need
to
go
back
to
the
previous
copy,
because
your
latest
copy
either
your
latest
data
is
either
corrupted
or
are
lost
right.
You
can
restore
from
one
of
this
point
in
time
copies,
so
this
step
shows.
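As a concrete sketch of taking a point-in-time copy and restoring from it on OpenShift, here is roughly what a CSI volume snapshot and a restore request look like. This is an illustrative fragment, assuming a CSI-capable storage backend; the names (`mysql-data`, `my-app`, `csi-snapclass`) are hypothetical placeholders, not from the talk.

```yaml
# Take a point-in-time copy of an existing PVC
# (CSI snapshot API, v1beta1 at the time of this talk).
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: mysql-data-snap        # hypothetical snapshot name
  namespace: my-app            # hypothetical application namespace
spec:
  volumeSnapshotClassName: csi-snapclass   # assumed CSI snapshot class
  source:
    persistentVolumeClaimName: mysql-data  # the volume being protected
---
# Restore: a new PVC whose contents come from the snapshot, via dataSource.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data-restored
  namespace: my-app
spec:
  dataSource:
    name: mysql-data-snap
    kind: VolumeSnapshot
    apiGroup: snapshot.storage.k8s.io
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```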
B
Typically,
it
makes
the
core
foundation
of
any
backup
solution,
or
sometimes
even
the
dr
solutions
right
and
building
on
snapshots
are
the
backup
applications,
so
the
backup
applications
efficiently.
You
know
take
this
point
in
time
coffees
now
this
could
dual
them
right
or
you
can
set
up.
You
know
certain.
You
know
some
mechanism
where
you
can
have
certain
policies
where
you
can.
Actually
you
know,
keep
those
copies
at
the
locally
or
a
remote,
so
that
yeah
and
also
restore
from
them
when
when
the
time
is
so,
this
is
essentially
an
application.
B
Typically,
in
you
know,
in
most
traditional
enterprises,
you
have
a
full-blown
backup
applications
that
are
built
with
very
rich
backup
policies
all
built
into
them
right.
So
that's
basically
a
backup,
a
solution
that-
and
that
is
typically
built
on
snapshots.
And
lastly,
the
other
common
mechanism
to
use
for
data
production
is
data
replication
and
again
this
is
a
storage
function
where
you're
now
asynchronously
copying
the
data
to
a
remote
destination
right.
B
So
these
being
the
common
tools.
So
how
do
they
come
together?
So
when
you
think
about
it
approaching
solutions
right?
So
you
basically
are
driven
by
what
is
something
called
SLO
right,
service
level
objective,
and
so
you
take
an
application.
You
got
to
determine
the
business
value
of
that
application
and
how
much
service
are
how
much
critical
is
that
application
to
your
business
right
and
based
on
that?
The
most
common
two
metrics
that
are
used
are
called
RPO
and
RTO,
and
they
constitute
what
a
service
level
objective
is
right.
B: The RPO, the recovery point objective, defines how much data loss you can tolerate. Obviously you'd want none, but protecting an application without any kind of data loss gets more expensive, so you have to rely on more expensive solutions. If the business or the application calls for it, yes, you go to the most critical zero-data-loss solutions, but not all applications might require that level of guarantee, so you have the option of going to more loosely defined solutions. The other metric is the RTO. This defines how quickly you need to bring the application back up.
B
So
if
your
application
in
credit
downtime,
how
quickly
do
you
need
to
bring
that
back
up
and
make
it
make
it
running
and
just
as
before,
right?
Obviously,
you
don't
want
to
have
any
application
downtime,
but
again
you
know
for
that
to
happen
right.
It
requires
expensive
solutions,
so
you
have
to
basically
make
a
judgment
on.
You
know
where
your
will
do.
B
You
lie
on
the
spectrum
of
RPO
and
RTO
right
and
if
you
look
into
solutions
that
we
just
are
talking
about
so,
for
example,
the
mirroring
is
the
one
that
basically
provides
you
the
most
consistent
copy
without
any
data
loss
and
also
helps
you
to
recover
automatically
right.
So
this
is
getting
close
to
our
0
RP.
U
+,
0
RTO
with
that.
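Stated a bit more formally (a standard formulation, not from the slides): if a failure happens at time $t_f$, the last good copy was taken at $t_c$, and service is restored at $t_r$, then the two objectives bound

```latex
\underbrace{t_f - t_c}_{\text{data lost}} \;\le\; \mathrm{RPO},
\qquad
\underbrace{t_r - t_f}_{\text{downtime}} \;\le\; \mathrm{RTO}.
```

Synchronous mirroring drives $t_f - t_c$ toward zero; automated failover drives $t_r - t_f$ toward zero.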
B
Obviously,
in
a
solution
gets
expensive
right
and
especially
if
you
are
trying
to
protect
from
many
other
disasters
that
are
more
Geographic
in
nature,
but
we'll
get
into
that
later
and
getting
on
that
scale
a
little
bit
more.
You
know
loser
you,
application
of
that
is
replication
right,
where
you
are
replicating
the
data
to
a
longer
distance,
so
that
obviously
might
involve
asynchronously.
That
would
involve
you
know
some
data
loss
but
you're
protected
against
more
types
of
disasters
with
this
solution
right
and
both
replication
and
mirroring
are
basically
are
built
for
hardware
failures.
B: ...you choose the right solution for that. So, getting into a little more detail on each one of them, let's start off with data protection with the snapshots and backup. This is what I said before: based on snapshots, you build a backup solution on top. Snapshots are always point-in-time copies, and most modern storage systems provide you incremental copies, so it's not a full copy
B
Every
time
you
take
a
snapshot,
you
have
a
full
base
copy
and
then
all
the
other
copies
later
on
are
basically
incrementals
of
those
copies.
So
you
basically
from
a
story
standpoint
and
also
from
time
it
takes
a
snapshot,
is
a
lot
more,
a
lot
more
quicker
and
the
local
snapshot
is
your
first
defense
of
you
know
prediction
right.
So
these
are
located
on
your
the
same
cluster
and
is
taking
up
the
same
space
as
your
primary
cluster
right
and
you
could
quickly
recover
because
it's
local
on
the
same
node,
also
on
the
same
cluster.
B
So
whenever
you
have
a
failure
or
if
you
lose
a
file
or
have
some
data
loss
right
or
sorry,
if
you
have
any,
you
know
failure
incident,
you
can
quickly
go
back
to
your
prior
in
point
in
time
and
you
know
get
that
restored
from
there
locally.
Now.
Why
wouldn't
in
any
one
use
this
right
as
their
as
their
only
backup
mechanism?
One
is
obviously
because
you're
using
your
primary
storage,
so,
as
you
increase
in
a
number
of
snapshots
one,
it
is
eating
up
your
primary
storage
capacity.
B
For
for
your
backup,
copies
and
also
the
other
thing
to
consider
is
that
as
the
depth
of
the
snapshot
so
and
when
you
say
def,
it's
the
number
of
snapshots
that
you're
retaining
right
as
the
number
of
snapshots
that
you're
retaining
goes
higher,
there
will
have
an
impact
on
the
on
the
application,
because
these
are
point
in
time
incremental
copies.
You
know
it's
primarily
a
snapshots,
it's
created
by
a
pointer
mechanism,
so
you
have
more
processing
that
needs
to
be
done.
B
So
the
other
downside,
obviously,
is
that
if
you
lose
your
primary
cluster-
or
you
know
you
know
any
nodes,
you
know
you
lose
your
price
snapshots
along
with
it,
so
that
calls
for
the
snapshots
to
be
stored
and
in
a
backed
up
to
a
remote
destination.
So
that
is
where
the
backup
solutions
come
in
right,
so
the
backup
solutions
essentially
are
taking
the
snapshots
and
copying
into
a
more
cheaper
or
an
external
storage
in
a
typically
s3
object.
B
Store
is
a
more
common
way
to
are
more
getting
more
common
for
this
backup
targets
right
and
these
scale
well-
and
also
you
know-
is
that
much
lower-cost
so
right,
so
you
can
have
more
backup
copies
retained
on
this,
and
this
object
store
could
be
located
far
out
for
more
disaster
like
scenarios
or
could
be
within
the
same
campus
or
in
a
metro
network.
So
you
can
have
a
quick
recovery
happening
from
there
right
and
by
the
way,
the
same
mechanism
where
you're
backing
up
to
a
remote.
B
You
know
this
nation
can
also
be
used
for
my
migration
migrating
of
the
workloads.
So
in
fact,
OCP
has
a
migration
tool.
You
probably
might
have
come
across
called
camp.
This
is
actually
built
on
the
same
backup
mechanism
that
you
know
that
that
we
are
built
so-
and
you
know
in
one
of
the
follow-on
presentations
we
can
go
in
more
details
and
how
the
how
that
works.
But
this
is
also
a
common
way
to
use
for
migration,
and
the
other
aspect
to
consider
for
the
backups
is
that
so
these
are
managed
by
policies.
B
So
you
can
actually
see
it.
You
know
set
up
a
policy
on
whenever
you're,
creating
an
application
or
volume
for
that
application
right.
You
can
basically
define
your
scheduling
and
retention
policies.
Rights.
Oh
I,
want
this
to
be
backed
up
every
15
minutes
or
every
hour
and
I
want
to
be
retained
for
seven
days
or
for
for
a
month
right.
So
these
rescheduling
intentions
are
basically
specified
in
the
backup
policy,
and
these
backup
policies
could
be
codified
into
storage
classes,
so
you
could
actually
have
a
predefined
storage
class.
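As a sketch of such a scheduling-and-retention policy, here is roughly what it can look like with Velero, the open-source backup tool mentioned later in the talk. The names and cadence are hypothetical; `schedule` is a cron expression and `ttl` is the retention period.

```yaml
# Velero Schedule: back up the my-app namespace every hour, retain for 7 days.
apiVersion: velero.io/v1
kind: Schedule
metadata:
  name: my-app-hourly          # hypothetical name
  namespace: velero
spec:
  schedule: "0 * * * *"        # cron: at the top of every hour
  template:
    includedNamespaces:
      - my-app                 # hypothetical application namespace
    ttl: 168h0m0s              # retain each backup for 7 days
```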
B
So
storage
class
is
a
way
where
you
can.
You
know
they
define
your
dynamic
volume
right,
your
PVC
volume
right,
so
you
can
basically
create
a
storage
class
and
define
these
policies
into
the
storage
class.
So
if
somebody
just
shows
in
a
gold
class
storage
right,
they
come
up
with
a
very
frequent.
You
know
snapshots
and
right
and
have
a
longer
retention,
for
example,
right
so
so
on
and
so
forth.
So
that
is
how
the
you
know
the
backup
policies
are
defined
and
and
used
in
the
OCP.
So
we
just
talked
about
the
storage
here.
B
Storage
backups,
but
you
know,
backup,
is
not
just
about
the
storage
but
about
your
entire
application
and
kubernetes
allows
and
specifically
OpenShift
allows
you
to
actually
have
your
backup
defined
at
an
application
level
right.
So
you'd
want
to
have
your
protection
for
a
consistent
protection
for
your
whole
application,
rather
than
a
volume
by
volume
basis
right.
So
what
is
an
application
level?
Backup
comes
this
stuff,
so
start
off,
we
get
at
the
core.
Obviously
it
is
storage
application
data
right.
B
This
CSI
interface
is
also
being
expanded
to
cover
snapshot
functionality
right,
so
the
snapshot
interface
is
now
you
know
getting
standardized
with
the
CSI
interface
right
and-
and
this
is
currently
in
beta,
but
it's
going
to
go
into
GA
Suen
in
in
few
releases,
so
at
a
core
storage
level,
so
you
have
the
application
volumes
getting
snapshot
at
through
the
CSI
interfaces.
Now
coupling
the
with
this
application
data,
our
application
volumes,
are
the
cluster
resources
right
that
are
associated
to
these
this
application.
So
when
we
say
application,
what
does
the
application
mean
in
kuben?
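For reference, the standardized CSI snapshot interface mentioned here binds snapshots to a particular CSI driver through a snapshot class. A minimal sketch (the driver string is an assumption for a Ceph-RBD-based deployment such as OCS; substitute your own driver):

```yaml
apiVersion: snapshot.storage.k8s.io/v1beta1    # beta API at the time of this talk
kind: VolumeSnapshotClass
metadata:
  name: csi-rbd-snapclass                      # hypothetical name
driver: openshift-storage.rbd.csi.ceph.com     # assumed CSI driver for OCS/Ceph RBD
deletionPolicy: Retain    # keep the storage-side snapshot if the object is deleted
```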
B: You can label the pods and containers for the application of interest with one label, and you could back up based on that label, or you could say, back up this entire namespace: every application component of that namespace needs to be backed up. And when we talk about namespace-level backup, we're not just talking about the application data, the application volumes we talked about before, the PVCs, but also the cluster resources that belong to this namespace. These are the namespaces, or projects:
B
You
know
your
deployment
stateful
sets
right
your
config
map
secrets
and
you
know,
amble
files,
everything
that
is
associated
to
the
namespace.
So
when
you
are
able
to
back
up
write
your
application
data
along
with
the
cluster
metadata
right
now,
you
have
a
full
name,
space
or
application
level,
consistent
backup
right
so
for
you
to
be
able
to
either
migrate
this
application
or
restore
this
entire
application.
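A namespace- or label-scoped backup of this kind can be expressed roughly like this (a Velero sketch; the namespace and label values are hypothetical):

```yaml
# Back up everything in the my-app namespace carrying the app=my-app label:
# cluster resources (Deployments, ConfigMaps, Secrets, ...) plus volume snapshots.
apiVersion: velero.io/v1
kind: Backup
metadata:
  name: my-app-backup          # hypothetical name
  namespace: velero
spec:
  includedNamespaces:
    - my-app                   # hypothetical application namespace
  labelSelector:
    matchLabels:
      app: my-app              # scope by label as well as by namespace
  snapshotVolumes: true        # also snapshot the PVCs backing the application
  ttl: 720h0m0s                # retain for 30 days
```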
B
It
is
much
much
easier
if
you're
able
to
have
one
consistent
copy
that
covers
both
the
configuration
data,
cluster
configuration
data
and
the
application
data
right,
and
that
is
what
kubernetes
allows
and
specifically,
we
are
building
into
the
openshift.
Again,
as
I
said,
you
know
more
details,
more
technical
details,
we'll
cover
in
the
follow-on
sessions,
but
I
just
want
to
kind
of
quickly
quick,
give
on
how
openshift
allows
you
to
basically
provide
application
level
backups
now
building
on
so
now
you
covered
up
to
the
cluster
right,
so
you
have
a
whole
cluster
covered
now.
B
Sometimes
you
know
for
you
to
be
able
to
get
a
crash,
consistent
copy
crash,
consistent
copy,
meaning
that
you
know,
if
there's
a
system
that
is
there's
going
down,
you
have
a
copy
that
is
good
just
before
that
crashed
right,
so,
which
means
that
a
group
of
the
volumes
are
all
consistent
together
right
before
the
time
of
the
crash
right.
So
that
requires
quieting
of
that
application.
B
That
requires
the
application
to
flush
all
its
caches
and
all
its
data
to
the
persistent
storage
right,
so
that
a
snapshot
can
be
taken
so
that
that's
a
quiet
operation
of
their
application.
So
all
of
these
procedures
are
can
be
automated
with
this
application.
Operators
right
and
the
backup
solutions
typically
provide
specific
application
agents
to
do
that
right.
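One common way such agents hook in is via pre/post backup hooks that quiesce and unquiesce the application around the snapshot. For example, Velero supports hook annotations on the pod template; this sketch freezes a filesystem during the backup (the container name and mount path are hypothetical, and it assumes `fsfreeze` is available inside the container):

```yaml
# Pod template annotations (e.g. on a Deployment or StatefulSet):
# Velero runs the pre hook before snapshotting the pod's volumes,
# and the post hook afterwards.
metadata:
  annotations:
    pre.hook.backup.velero.io/container: app   # hypothetical container name
    pre.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--freeze", "/var/lib/app-data"]'
    post.hook.backup.velero.io/command: '["/sbin/fsfreeze", "--unfreeze", "/var/lib/app-data"]'
```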
B
So,
combined
with
this
application
agents
and
with
the
clustered
metadata
and
the
application
data
forms
the
full
application
level,
backup
solution
so
specifically
for
OCP
and
we'll
see,
s-so
OCS
is
built
with
the
incremental
snapshots
at
they're,
built
with
CSI
that
are
going
to
be
introduced
soon
and
Rho
CP
Works
uses
Valero,
which
is
other
in
an
open
source.
You
know
backup
solution
to
be
able
to
provide
a
consistent
cluster,
consistent
backup
at
a
namespace
level.
B
Right
and
working
along
with
our
backup
partners,
will
be
able
to
offer
a
full
fledged
backup
solution
at
an
application
level
you
know
for
for
for
OCP
users
right.
So
that's
a
quick
overview
of
what
the
backup
solution
is
so
quickly.
Moving
on
to
the
dr.
So
this
is
the
basically
dr,
as
I
said,
is
build
a
solution
built
on
your
application
right.
B
It
is
where
you're
a
synchronously
copying
the
data
to
anymore
side
two
and
these
remote
sites
are
scheduled,
are
set
up
at
far
enough
distances
where
you
know
you're
not
affected
by
you're,
protected
against
in
the
geographical
failures,
floods,
fire
or
power
grids,
or
any
of
that
right.
So
that's
what
typically,
you
define
your
blast
radius,
which
is
basically
what
defines
your
protection
failure
you're
protecting
from
what
gets
impacted.
B
You
know
during
that
you
know
in
the
distance,
so
you
schedule
your
dr
site
and
connected
by
van,
for
you
know
that
that
that
is
beyond
the
blast,
radius
right
and
because
you
are
relying
on
an
asynchronous
replication
to
protect
against
these
failures
right.
There
will
be
a
data
loss
involved
right,
so
there
will
be
not
a
solution
where
you
can.
You
know
predict
from
you
know,
with
without
any
complete
data
loss
right
to
protect
against
this
long
distance.
B
You
know,
Vandy,
you
know
in
a
Vandy
are
so
typically
when
we
talk
about
the
asynchronous
volume
replication
again.
This
is
done
at
a
storage
level.
All
the
volumes
belong
to
the
applications
are
consistently
replicated
to
the
dr
side
right
and
the
schedule
that
you
scale
that
you
set
up
for
this
replication.
It
really
depends
upon
what
your
RPO
needs
are
right
again.
How
far
do
you
want
to?
Are
you
willing
to
lose
the
data
now,
if
you
want
to
schedule
at
a
very
frequent
interval
like
even
seconds
it
is
possible.
B
However,
your
network
also
defines
you
know
the
bandwidth.
You
need
to
have
enough
bandwidth
network
bandwidth
to
handle
the
data
right.
So
if
you
are
doing
a
very
frequent
replication,
which
means
every
change
that
is
happening
in
that
very
short
interval
needs
to
be
transferred
over
the
network
to
the
remote
site
and
and
if
your
network
cannot
handle
that
you
get
conditioned
there
and
obviously
you
will
fail
and
to
keep
up
with
you
know
with
the
changes
and
that
impacts
your
applications.
B
So
both
your
RPO
target
and
your
available
network
bandwidth
defines
what
your
replication
interval
is
would
be
right.
But
one
positive
aspect
of
this
is
that,
because
this
replication
is
async,
your
application
is
a
latency.
Doesn't
get
impacted
right
so
because
you
write
your
data
to
your
primary
side
and
your
replication
takes
over
later
right
to
make
the
you
know
to
keep
this
car.
You
know
changes
transferred
over
to
the
remote
side.
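The sizing constraint above can be written down directly. With illustrative numbers (not from the talk): if $\Delta D$ bytes change during each replication interval $T$, the WAN link must sustain at least

```latex
B_{\min} \;\ge\; \frac{\Delta D}{T},
\qquad\text{e.g.}\qquad
\frac{9\ \mathrm{GB} \times 8\ \mathrm{bit/byte}}{300\ \mathrm{s}} = 240\ \mathrm{Mbit/s}
```

for 9 GB of changes on a 5-minute interval. If sustained throughput falls below $B_{\min}$, the replica falls progressively further behind and the effective RPO grows.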
B
So
that's
basically
on
the
asynchronous
volume,
replication
and
I
said
this
is
a
storage
function,
mostly
a
tentative
volume,
but
there
are
also
some
applications
that
handle
you
know
the
changes
and
replication
at
the
application
level,
but
those
are
less
common
now,
so
that
is
making
the
data
available
to
the
remote
site
is
one
aspect
of
the
solution,
so
the
other
key
aspect
of
the
our
solution
is
a
failover
itself
right.
So
it's
the
process
of
where
you're
transferring
your
application
from
your
primary
site.
B
We
just
failed
to
your
remote
site
and
make
the
application
better.
You
know
being
and
bringing
up
the
application
there
and
you
know
going
or
the
recovery
process
there
now.
One
thing
good
about
openshift
and
kubernetes
is
that
this
failover
mechanism,
which
typically
is
mostly
manual
in
the
traditional
world,
can
be
fully
operated.
We
couldn't
fully
automated
with
the
operators
right,
so
the
same
operators
today
that
we
use
to
bring
up
your
applications
on
your
primary
site.
B
We
use
the
same
mechanism
to
have
an
automated
failover
management
done
and
even
fail
back
fail
back
is
something
there.
Once
you
fail
over
to
your
remote
site,
you
repair
your
primary
site
and
then
once
your
primary
site
is
repaired,
you
want
to
get
back
the
original
state
right.
So
that's
a
fail
back
to
your
repaired
side,
so
both
this
failover
and
failback
can
be
automated,
and
since
these
are
automated,
you
know
you
can
do
this.
B
You
can
achieve
a
very
quick
failover
so
which
results
in
a
very
low
RTO
right,
recovery
time
objective,
if
it's
a
manual
set
of
operations
where
you
how
to
follow
ten
steps
for
this
application
to
be
brought
up,
which
is
typical
in
them
in
the
traditional
world
right
that
takes
a
much
longer
time.
But
with
this
this
allows
you
to
have
a
query
quick.
You
know
recovery
of
the
applications.
B
This
not
only
helps
you
during
your
failover
failed
back
operations,
but
also
the
more
frequent
testing
that
you
do
so
once
you
set
up
with
the
our
site.
Most
companies
require
you
to
you
know
to
do
test
for
readiness
of
this.
Dr,
so
you
actually
have
you
know
dr
testing
mechanism.
That
also
can
be
done
with
through
the
opera,
the
automated
operations
with.
So
that
also
helps
you
with
the
quick
thing.
So
one
other
thing
to
note
is
a
room
where
we
keep
using
the
word
automated,
not
automatic
right,
because
it's
always
triggered
manually.
B
No
one
wants
to
have
an
automatic,
which
means
a
you
know.
An
on
manually
intervened
feel
over
because
closes
an
asynchronous
dr
solution,
there's
always
a
data
loss
involved.
So
there's
always
a
judgment
hat
that
has
to
be
made
whether
you
want
to
failover
to
the
remote
side.
Once
you
declare,
you
know
your
primary
site
inoperable
right
or
you
want
to
invest
in
bringing
your
primary
site
back
up,
which
could
be
quicker
and
where
you
cannot
really
don't
need
to
incur
it
at
a
loss
right.
So
there's
always
a
judgment
that
needs
to
be
made.
B
So
more
so,
it
is
always
required
for
a
manual
intervention
to
be
made,
but
once
you
decide
to
failover,
you
need
to
be
able
to
just
click
one
button
and
do
an
entire
failover
very
quickly.
So
that
is
automation
right
and
in
the
case
of
OCP
and
OCS
right.
So
we
do
have.
The
self
is
synchronous,
mirroring
that
does
the
data
mirroring
a
data
replication
solution
at
a
volume
level,
and
we
have
the
operators,
storage,
operators
that
helps
you
to
restore
your
volumes
and
your
applications
backup
right.
B
So
this
is
basically
using
or
OCS
fluke
operators.
So
now
you
have
both
an
automated
failover
management
and
an
asynchronous
mirroring
built
into
the
you
know:
open
shipped
storage
right
and,
as
I
mentioned
before,
we'll
get
into
more
details
in
the
following
sessions,
but
that
is
how
the
solution
is
defined
for
for
asynchronous
replication
now
getting
into
a
more
critical
applications.
B: these also kind of fall into your HA category. HA and DR go hand in hand, and this is especially the case where some call this an HA solution, a high-availability solution, that can also work as a DR solution if you use it effectively. So here, what we're really looking at is that your DR sites are basically defined as availability zones, and an availability zone is defined as something that has a full,
basically, defined failure domain. Typically, when customers define availability zones, they set them up in such a way that your HVAC or your power grids are not crossing over the zones: the different power grids or HVAC systems have redundancy for each of these availability zones. This could be as simple as a rack, a rack-level zone, or it could be a data center building,
B
You
know
a
building
in
campus
as
an
available
to
zone,
and
you
have
your
next
zone
in
the
in
a
different
building
in
the
campus
or
they
could
be
spread
in
a
metro
distance
right.
So
you
could
have
it.
You
know
in
one
across
the
town
from
each
other
right.
So
this
is
a
place
where
you
need
to
have
three
availability
zones,
because
it's
always
an
odd
number
is
required
because
of
the
quorum
issues
right.
B: The other aspect to remember is the latency between these zones; the network running between the zones is very important, because these copies are made synchronously, so the more latency the network incurs, the more it impacts your application. Typically, you should not exceed more than ten milliseconds of latency.
B
So
it
knows
essentially
where
all
these
the
copies
are,
and
even
if
a
single
zone
or
a
copy
that
is
failed,
it
distributes
the
load
across
those
two
other
remaining
zones
right
and
there's
a
solution
that
does
not
incur
any
data
loss
and
you
have
a
full.
You
know
you
know
solution
that
has
an
automatic
recovery
and
no
data
loss
right,
but
obviously
this
can
be
used
for
solutions
that
has
the
protect
against
a
big
blast
radius,
which
means
that
any
base,
Geographic
failures,
events
like
earthquakes
or
fires-
you
know
you
get
impacted
by
all.
B
Three
zones
gets
impacted
by
that
right,
so
it
does
not
protect
you
from
those
those
failure.
Events.
So,
given
these
three
solutions,
your
backup
and
your
a
synchronous,
dr
and
your
data,
mirroring
solutions
right,
so,
as
you
could
see,
each
of
them
covers
different
failure
scenarios
and
provide
you
options.
Different
recovery
options
for
four
against
those
failure
scenarios,
and
now
the
more
common
way
is.
You
know
the
users
tend
to
use
all
three
of
them
or
a
combination
of
them
together
to
come
up
with
their.
B
You
know
their
prediction
mechanism,
for
you
know,
for
you
know
for
your
for
your
application
right.
So
you
start
off
with
hardening
the
primary
site
with
you
know,
with
this
multi
zone
solution
that
I
talked
about
the
stretch
clustering
right.
So
this
is
where
you
can
actually
have
your
primary
site.
Your
primary
data
center,
you
know
defined
with
three
different
zones,
and
you
have
your
cluster.
B
You
know
spread
across
these
three
zones
and
hardened
accordingly
right,
so
any
hardware
nodes,
you
know
you
still
can
run,
keep
running
on
your
primary
side
without
having
to
have
a
failover
and
your
application
runs
without
missing
a
beat
right
now
combine
that
with
the
DR
solution.
Now
you
have
a
solution
that
pretends
to
Geographic
failures
right,
so
you
have
a
nice
increased
data,
mirroring
that
is
done
from
your
primary
cluster
to
a
remote
cluster
right
now
you
have
Geographic
protection
against
Geographic.
You
know
failures
right.
B
Obviously,
this
could
incurred
at
a
loss,
but
again
you
have
failure
against
most
calm
most.
You
know
major
disasters
with
this
right
and
along
with
that,
so
that
that
handles
your
physical
or
datacenter
failures,
and
you
combine
that
with
your
backup
solution
that
handles
the
logical
failures
as
well
right
now,
you
have
previous
point
in
time,
coffees
that
are
stored
in
a
remote
site,
right
and
or
in
your
chip
or
object
storage,
and
that
helps
you
to
go
back
point
in
time
when
your
latest
copies
no
longer
good
right.
B
So
the
key
difference
you
might
have
noticed
here
is
that
with
a
dr
site,
you
always
have
your
most
recent
copy
available
and
replicate
to
the
DR
side
right
so
you're,
trying
to
recover
to
your
most
recent
copy
with
the
backup
solution.
You're
choosing
your
previous
point
in
time
copy
when
the
most
recent
copy
is
corrupted
right.
So
if
you
have
a
corruption
on
your
primary
site,
the
same
corruption
gets
replicated
to
your
site
as
well,
so
it
doesn't
protect
you
from
a
logical
failure.
B: That's a very good question, Rob. So clearly your data protection policies get defined by that, and in fact there are multiple factors that drive your cluster design: whether to have one large cluster or multiple clusters. And if you look into how the community is moving and where most of the thinking is going,
B
Most
people
are
leaning
towards
having
multiple
clusters,
multiple
small
clusters
that
are
defined
for
each
specific
workload.
Right
because
there
are
a
lot
of
you
know:
innovation
happening
in
this
multi
cluster
management
right
there,
the
clusters,
so
you
have
tools
and
technologies
that
are
being
developed
to
handle
the
management.
B
Yeah,
so
backup
of
all
these
tools
actually
has
the
least
impact
on
your
primary
application
performance
right
so
because
the
backups
are
built
on
the
snapshots
technology
and
typically,
what
happens
is
that
when
a
backup
needs
to
happen
right,
it
takes
a
quick
snapshot.
First,
it
needs
to
cause
the
application
to
get
a
consistent
snapshot
right.
B
That's
the
primary
impact
of
the
performance,
but
these
days,
these
most
modern
applications
have
a
quick
wise
mechanism
where
it
flushes
the
cache
onto
the
disk
right
and
the
moment,
early
freezes,
the
transactions
and
and
because
these
are
point
in
time
copies,
especially
the
worst
years
of
you
know.
The
storage
of
OCP
supports
the
point
in
time.
Copy
snapshots
are
instantaneous,
almost
instantaneous
right
once
the
snapshot
is
taken,
your
application
is
free
to
move
on
and
the
backup
process
happens
thereafter
right.
A: If people have more questions, you can always check out the OpenShift products page for container storage; there are links there to all the different pieces of the container storage solutions, and these backup and disaster recovery pieces, and you can always reach out to Venkat Kolli directly and find him on the Internet as well. I'll
A
Just
reiterate
again
that
we
are
there
the
make
sure
we
are
hosting
a
deep
dive
workshop
in
Amsterdam
in
a
couple
of
weeks,
a
couple
of
what
a
little
more
than
a
couple
weeks
on
March
30th
at
KU
Con,
and
you
can
register
for
that
at
on
Commons
at
OpenShift,
org
and
sign
up
for
the
the
container
storage
workshop
and
we'll
happily
dive
in
and
answer
any
of
your
questions
at
that
event
as
well,
so
that
that
is
the
container
storage
for
admins
workshop
on
March
30th.
So
space
is
limited.