From YouTube: How to protect your cloud-native data 101
So for today, what I have in store for you is this: we're going to take a look at the definition of cloud-native data and its characteristics, and at what the different options may be for you to build data within the cloud — because cloud native is not necessarily only about Kubernetes. But then we will switch specifically to Kubernetes, talking about different concepts involving data, and also extend to…
But what it doesn't say is that, you know, we should consider this whole cloud-native database concept as an iceberg. At the top of the iceberg, this is more or less the care-bear world where the narrative is told by the cloud service providers: everything is easy, everything is naturally consumable via APIs, and you basically pay for what you consume.
You might look at, maybe, Google to run Kubernetes, maybe AWS to run your storage and your buckets, etc. So you may want to have a specialized cloud for a specific set of functions you want to provide to your business. And then the challenge becomes that, as you move between different clouds and repeat, you know, database migrations, or operate databases in those different clouds — even though, at the end, it may be a relational database or a NoSQL database…
A
The
fact
is
in
terms
of
the
operation,
those
are
operated
in
different
way
because
you
are
using
different
cloud
providers.
So
it's
not
exactly
the
same
api,
but
protecting
your
data
will
happen
in
a
different
way
and
how
you
combine
this
data
with
your
overall
application
architecture
will
also
be
different,
and
typically
that
means
that
you
need
a
broader
scope
in
terms
of
the
skill
sets
of
your
engineering
teams.
As you probably know, most cloud databases are replicated and highly available within a particular availability zone. As soon as you want to recover into a different availability zone, it does incur some downtime. Potentially you have to restore from snapshots, and that also means you have to schedule and manage the lifecycle of those snapshots by yourself — and chances are that your snapshot may not be…
If you consume this database-as-a-service model from different clouds, you will have to repeat the same sort of automation and extended operations over multiple clouds. And, you know, automating snapshots and testing the restore of all those snapshots in different clouds will also involve different skill sets, because they are using different APIs and different SDKs depending on the provider.
So let's first take a look at the cloud-native features we could expect from a Kubernetes environment, whether it's running on premises or in the public cloud, managed or unmanaged. First, it's all about scalability. Over the last couple of years I've seen the rise of auto scaling for pods, but also auto scaling for nodes. This means that, as your application requires more power, you can also deploy more nodes in Kubernetes, as you would do with an auto scaling group in the public cloud providers — so not only for nodes, but also for the application itself.
So you can scale your application to be able to take, you know, some of the peaks during high-usage periods — such as promotional sales, if it's a commercial application, or during Black Friday, for example — where potentially you need more power for your application: more web servers, maybe more database nodes to facilitate reads, but also more Kubernetes compute nodes as well. Then there is elasticity.
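As a concrete illustration, pod-level auto scaling is usually expressed with a HorizontalPodAutoscaler. This is a minimal sketch, not something from the talk itself — the Deployment name `web` and the thresholds are hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web              # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 20          # headroom for peaks such as Black Friday
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU passes 70%
```

Node-level scaling (for example, the Cluster Autoscaler) then adds nodes when the newly created pods cannot be scheduled on the existing ones.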
So even though things may not succeed the first time, maybe the second time a controller will try to do something, once all the prerequisites are met — you know, if it depends on other controllers — and in the end, eventually, everything will converge. It's a sort of, you know, self-healing.
And, I would say, at no extra cost — so this is also a very important factor. So much for the basics, the foundation of Kubernetes, and the kind of requirements and capabilities it provides. But how about persistent data and storage? Let's say you want to build your own database in Kubernetes and run it in production. Then, first off, of course, it needs to be distributed: you cannot run a single-pod database on a single node.
You want replication to happen as well, because by default — and we're going to see later some of the Kubernetes primitives — the data itself is not replicated by the platform, meaning that there are two main solutions.
A
Encryption
is
also
something
you
have
to
consider,
especially
if
you're
running
database
that
help
that
are
holding
sensitive
data.
End-To-End
encryption
is
really
important
in
kubernetes
and
you
have
to
find
the
right
solution,
which
is
not
necessarily
relying
only
on
the
cloud
provider
for
encryption.
You
may
also
want
to
encrypt
your
data
directly
inside
kubernetes
so
that
no
one
can
get
access
to
your
queries.
Volume
if
someone
were
to
you
know,
read
it
from
from
kubernetes
itself.
Individual teams are responsible for, you know, a set of microservices, and each one of these teams will run their own queuing or messaging system and their own databases. In the, let's say, cloud-native philosophy, you simply cannot rely on developers waiting to consume and provision their databases.
They need to be deployed on demand. You cannot afford, you know, waiting two, three, four days or even multiple weeks to get a database up and running, in an environment where code updates and new releases are typically deployed to production multiple times a day. So self-provisioning is a very important concept when it comes to deploying and managing databases in Kubernetes — and Kubernetes has all the fundamental prerequisites to enable this kind of paradigm.
And again, you have two solutions. Either you could use your cloud service provider's native services, such as Azure Pipelines and others — and of course you will be subject to the same drawbacks, I would say, that we've seen before, in terms of different clouds having different APIs and different ways of implementing those DevOps pipelines. Or, alternatively, you can choose to stay within Kubernetes and use a Kubernetes-native DevOps tool such as Tekton, which gives you the ability to develop your DevOps pipelines without leaving Kubernetes.
The Kubernetes documentation will give you the full list of supported volumes. Some of them are legacy, I would say, because it also includes the deprecated in-tree drivers; but Kubernetes has moved away from in-tree drivers to a more modular approach, where every storage vendor or provider develops its own driver, called a CSI, or Container Storage Interface, driver.
It's a pluggable architecture where only the required CSI driver will be installed by the user, when you need it. So, for example, if you're using Amazon EKS, you can install the EBS CSI driver, and then you will be able to take advantage of the variety of features that come with that CSI driver — all the Kubernetes primitives, and on top of that additional capabilities as well.
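Once the EBS CSI driver is installed, volumes are typically requested through a StorageClass that points at the driver. A minimal sketch — the class name and parameters here are illustrative:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-csi
provisioner: ebs.csi.aws.com          # the EBS CSI driver
parameters:
  type: gp3                           # EBS volume type
  encrypted: "true"
volumeBindingMode: WaitForFirstConsumer   # provision in the consuming pod's AZ
reclaimPolicy: Delete
```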
The main volume providers you're going to be using are displayed here on the screen. The first and most obvious one is the persistent volume claim. A persistent volume claim, or PVC, is a request for a back-end persistent volume that matches specific criteria, such as, you know, the size and the storage class of the volume you want to create.
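A PVC sketch, assuming a storage class named `standard` exists in the cluster:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-data
spec:
  accessModes:
    - ReadWriteOnce            # mountable read-write by a single node
  storageClassName: standard   # assumed storage class
  resources:
    requests:
      storage: 10Gi            # the requested size
```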
Another important consideration is how the pods will access the PVCs. If you have a single pod, you can have a PVC that is locally attached; if the node fails, then, unfortunately, you will also lose the data. Now, if you create a higher-level construct such as a Deployment, then you will have to use a shared file system, because in the definition of your Deployment you will specify a single PVC — meaning that if the PVC is a locally attached file system, then only the first pod will be able to consume it, right?
The other pods that will potentially be residing on the same host won't be able to access it, because it's already been claimed by the first one; and pods that are residing on other nodes — well, they won't have access to the locally attached volume at all, right? So the only solution is to have a shared network file system.
So, in terms of the access definition, it means that if you want to use a PVC within a Deployment and every pod needs to write to that PVC, you will need to use ReadWriteMany access, backed by something like an NFS share.
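The access mode is declared on the PVC itself. A sketch of a ReadWriteMany claim that all the pods of a Deployment could share, assuming an NFS-backed storage class called `nfs-shared`:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data
spec:
  accessModes:
    - ReadWriteMany              # every pod of the Deployment can write
  storageClassName: nfs-shared   # assumed NFS-backed class
  resources:
    requests:
      storage: 5Gi
```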
Other volumes that can be used include emptyDir, which is a scratch directory, typically mounted from the root file system or from RAM on the node.
It starts empty and, of course, the pod may write data to the directory that will be mounted into it; but when the pod is restarted, the data that is located there is also scratched. Then there is hostPath, which identifies a particular path on the Kubernetes node that will be mounted as a volume into the pod. It is typically avoided in production, as it has some security implications, but also because it's only valid for naked pods.
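Both can be declared inline in the pod spec. A sketch — the image and paths are illustrative, and hostPath, as noted, is best avoided in production:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: volume-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scratch
          mountPath: /tmp/scratch   # wiped when the pod is recreated
        - name: host-logs
          mountPath: /host/logs
  volumes:
    - name: scratch
      emptyDir: {}                  # node-local scratch space
    - name: host-logs
      hostPath:
        path: /var/log              # a path on the node itself
```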
Then there are ConfigMaps, which can be mounted to your pod as environment variables, but also as a volume into the pod; your application can then get access to this information just by reading the files that will be present in your mount point. Secrets are sort of similar to ConfigMaps, except that they are encoded in base64 — but not encrypted by default. This is really important.
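A sketch of mounting a ConfigMap and a Secret as volumes — the object names are hypothetical, and remember that the Secret data is only base64-encoded unless encryption at rest is enabled:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: app-config
          mountPath: /etc/app       # each key appears as a file
        - name: app-secret
          mountPath: /etc/secret
          readOnly: true
  volumes:
    - name: app-config
      configMap:
        name: my-config             # assumed ConfigMap
    - name: app-secret
      secret:
        secretName: my-secret       # assumed Secret; base64-encoded, not encrypted
```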
Then we have the Downward API, which can be very useful because it provides contextual information to your application running inside your pod. The Downward API allows you to define, in YAML again, inside your pod spec, references to particular fields you want to inject into your running pod. It can be things like, you know, your pod IP, the amount of your CPU requests, the limits for CPU and memory, etc.
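A minimal Downward API sketch that exposes the pod IP as an environment variable and the CPU limit as a mounted file:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      command: ["sleep", "3600"]
      resources:
        limits:
          cpu: "500m"
      env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP    # injected at runtime
      volumeMounts:
        - name: podinfo
          mountPath: /etc/podinfo
  volumes:
    - name: podinfo
      downwardAPI:
        items:
          - path: cpu_limit
            resourceFieldRef:
              containerName: app
              resource: limits.cpu       # readable at /etc/podinfo/cpu_limit
```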
A
So
give
essentially
a
lot
of
contextual
information
for
your
application,
as
opposed
to
you
know,
hard
code,
those
information.
And
finally,
we
have
also
a
firm
ephemeral
volumes
which
are
a
bit
more
recent
than
the
the
others,
and
they
have
been
created
to
meet
the
requirement
of
specific
use
cases
where
applications
don't
really
care.
If
the
attached
volumes
are
persistent
or
not.
So, for example, it may be a caching application, where the data, you know, can easily be scratched when the pod gets restarted and the application doesn't really care about that; or it also gives you the ability to pre-populate data as input for the application.
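A sketch of a generic ephemeral volume: the volume claim template lives inside the pod spec, so the PVC is created and deleted together with the pod (the storage class name is assumed):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-demo
spec:
  containers:
    - name: cache
      image: busybox:1.36
      command: ["sleep", "3600"]
      volumeMounts:
        - name: cache-vol
          mountPath: /cache
  volumes:
    - name: cache-vol
      ephemeral:
        volumeClaimTemplate:             # PVC shares the pod's lifecycle
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: fast-csi   # assumed CSI-backed class
            resources:
              requests:
                storage: 1Gi
```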
But essentially the main difference is that the lifecycle of the volume is the same as that of the pod, meaning that the pod can get restarted on a node where the volume didn't previously reside — as opposed to a PVC, for example, since a PVC, once it's been claimed, will basically reside forever on a particular node. That means the pod is tied to the specific node where the PVC resides, so it cannot be restarted on another node.
Here the difference is that pods can be restarted on whatever node. Also, in addition, ephemeral volumes can be supported by CSI providers to deliver some additional capabilities, such as snapshotting, cloning, resizing and storage capacity tracking for those ephemeral volumes, because these are fundamentally CSI capabilities.
Okay, so now let's focus a little bit more on the basic CSI capabilities. What does a CSI driver need to deliver to Kubernetes, at the bare minimum? It is a standard defined for storage plugins in 2018, when Kubernetes moved away from in-tree driver development — where, for every modification, the whole Kubernetes system had to be re-released.
Now, when it comes to data protection, the CSI driver delivers multiple functions that are represented as an extension of the Kubernetes APIs. Snapshots are effectively represented as CRDs, or Custom Resource Definitions, and are composed of three main objects: first the VolumeSnapshot, then the VolumeSnapshotContent and, finally, the VolumeSnapshotClass.
So the VolumeSnapshot is comparable to a PVC, in the sense that it is actually a request for a snapshot. And the real snapshot that is taken is similar to a persistent volume, in the sense that it is effectively the physical snapshot — and the corresponding object is the VolumeSnapshotContent.
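A VolumeSnapshot request sketch, assuming a snapshot class named `csi-snapclass` and a source PVC named `db-data`; the matching VolumeSnapshotContent object is created for you by the snapshot machinery:

```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: db-data-snap
spec:
  volumeSnapshotClassName: csi-snapclass   # assumed snapshot class
  source:
    persistentVolumeClaimName: db-data     # the PVC to snapshot
```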
The volume snapshot machinery is composed of a snapshot controller as well as a validation webhook, and is effectively delivered by the CSI driver. So the snapshot controller watches both VolumeSnapshot and VolumeSnapshotContent objects, and it's the component responsible for the creation and the deletion of volume snapshot objects.
On the other side, the CSI snapshotter sidecar is the component that watches VolumeSnapshotContent objects, and that triggers CreateSnapshot as well as DeleteSnapshot operations against a particular CSI endpoint. And finally, the validation webhook is nothing more than an HTTP callback that is there with the goal of tightening the validation of volume snapshot objects.
A
And
finally,
we
also
have
the
volume
snapshot
class,
which
specifies
different
attributes
belonging
to
a
volume
snapshot.
It
is
sort
of
similar
to
a
storage
class.
So, obviously, snapshots are asynchronous. That means they represent the content of the data at a particular point in time; it's not synchronous replication happening continuously over time, and that may be an issue in case the RPO needs to be equal to zero. RPO, or Recovery Point Objective, is the representation of the data that you can afford to lose in case of a failure, right? So if you have an RPO equal to zero, it means that you need something more synchronous than a snapshot.
Basically, you need a continuous representation of your data over time — and this is the type of thing that cannot be represented directly, or is not available directly, in Kubernetes, but that particular CSI drivers can actually provide as a feature, on top of the additional functions that are required by the Kubernetes API. So the CSI driver can itself deliver synchronous replication. This is the case of the CSI driver that is represented here on the screen, but other open-source CSI drivers like OpenEBS can also support replication. It's just to give you an example of how it can be delivered.
Okay, so far we've seen different paradigms: snapshots, asynchronous and synchronous replication for a zero RPO. But fundamentally there is also something else, which is creating backups from your snapshots. Your snapshots as such are living within Kubernetes, so in case of failure, of course, if you want to restore, you need to restore from a storage repository that is still available. So, typically, you want to externalize your snapshots and copy the data into an external storage repository like AWS S3 or Google Cloud Storage.
Then, as another set of custom resources, we have ActionSets, which define actions that can be triggered by the creation of the corresponding, you know, custom resource manifests. So, typically, if you want to do a backup or a restore action, you will do that by creating those manifests — and to help with the lifecycle of those custom resources…
…you can also use a command-line tool called kanctl, which can be used in dry-run mode to generate the different manifests; those manifests can then be applied to the Kubernetes cluster using kubectl. Or you can just use kanctl without the dry-run option, and it will directly create those CRDs in your Kubernetes cluster. So here we have an example for the Elasticsearch application.
Then, once the profile has been created, you're going to create the blueprint — which is available, you know, publicly, is really specific to Elasticsearch, and defines how to perform actions on that particular application. Then we can use kanctl in dry-run mode to generate the manifest for the backup ActionSet and later apply it with kubectl; or, as in the example here, we just use kanctl without the dry-run mode, and that will directly create and push the manifest into your Kubernetes cluster.
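For reference, a backup ActionSet manifest of the kind kanctl generates looks roughly like this. This is a sketch: the blueprint, profile, namespace and object names are hypothetical, so check the Kanister documentation for the exact fields:

```yaml
apiVersion: cr.kanister.io/v1alpha1
kind: ActionSet
metadata:
  generateName: backup-
  namespace: kanister
spec:
  actions:
    - name: backup                        # action defined in the blueprint
      blueprint: elasticsearch-blueprint  # assumed blueprint name
      object:                             # the application to back up
        kind: StatefulSet
        name: elasticsearch
        namespace: elastic
      profile:                            # location profile (e.g. an S3 bucket)
        name: s3-profile
        namespace: kanister
```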
So once the ActionSet has been created and the manifest pushed to Kubernetes, the controller will react to that and effectively trigger a backup, which you can monitor, in terms of its status, using kubectl as well. So, just by monitoring that particular custom resource, you will see it updated once the backup has been completed. And then, in case of disaster, when you want to restore the content from the remote location, then again you can just use…
…kanctl, as displayed here on the screen: specify the namespace and create the ActionSet — this time the action is restore — from the backup name, which is basically the name that was returned by the previous command when triggering the backup ActionSet. And again, it's a CRD: you can monitor the progress of the restore by using kubectl to check the status of that particular CRD, and at some point the initial data will be restored in the right place.
A
So
that
concludes
our
presentation
for
today.
Hopefully,
you
learned
something
and
it's
been
useful.
A
couple
of
key
takeaways
before
moving
on
kubernetes
is
ready
for
stateful
application
with
cloud
native
data.
This
is
a
very
important
point.
It
has
evolved
over
time.
So
now
it's
not
only
about
cattle.
You
can
also
run
pets
in
cobilities,
but
the
key
is
to
make
sure
that
you
can
reach
the
right
level
of
availability,
scale
and
performance,
and
we've
addressed
today.
If you want to learn more about data on Kubernetes and how to run your stateful applications and your stateful workloads in Kubernetes, please join the DoK, or Data on Kubernetes, community — you have the link there. I'm personally running the DoK London meetup, so if you're local to the UK, you can go and subscribe to the meetup page, so that you are always up to date when it comes to the next dates for our meetups. The next one will be in September, so if you're local, don't hesitate to join us.