From YouTube: KubeVirt data protection and forensics forum
Description
Let's get together to discuss plans/ideas to extend KubeVirt's data protection and forensics functionality.
Presenters:
- Michael Henriksen, Red Hat, github: mhenriks
- Ryan Hallisey, NVIDIA, github: rthallisey, twitter: @rthallisey
C
A
C
So yeah, I'm going to talk a little bit about data protection: what's available in KubeVirt now, what we're planning for the future, and what some of the challenges are. But yeah, like Pep said, we're really interested in hearing from the community.
C
In Kubernetes, data protection is a rapidly growing field or topic of interest, and we want to know what you're doing for data protection now and what primitives KubeVirt can provide. To kick that off, we've got Ryan here to talk about his use case and how we're planning to extend some of the data protection primitives to help him do forensics and, of course, a bunch of other things you can do with these primitives. So yeah, let's talk about...
B
C
Let's talk about what you're doing now, and maybe you can help us with some of the challenges. So, okay, what is data protection? The purpose of data protection is to ensure that an application and its associated data can be restored quickly after any corruption or loss. So, basically, backup and restore. And what's available now? There are offline virtual machine snapshots, and I've done a terrible job of promoting this feature, so I will be going through the API and telling you how to use it.
C
We'll talk about online snapshots, which will be coming soon, in both their crash-consistent and application-consistent forms.
C
C
Okay, so if you want to use KubeVirt snapshots, there are a couple of things you've got to do. First, in the KubeVirt realm, you have to enable the Snapshot feature gate. Snapshots are an alpha feature in KubeVirt, so turn on this feature gate; there should be information in the KubeVirt docs telling you how to do it. Right now you can only snapshot virtual machines; VMIs and VMI replica sets are to come. And yeah, these are offline snapshots, so your virtual machine has to be halted. As for storage requirements:
C
The big thing, and one of the main drivers of this feature, is that we want to be Kubernetes native everywhere, so we are leveraging the Kubernetes volume snapshot API. Obviously we run QEMU VMs, and there are all sorts of capabilities baked into that if you're using qcow2: snapshots and whatnot. We're not doing any of that. We are relying on Kubernetes, so you need the volume snapshot beta API deployed in your cluster, that is, the CRDs. There is also a common volume snapshot controller that must be deployed, as well as a webhook. These are things that are available; SIG Storage maintains them, and you can get them on GitHub somewhere. So, because we're using the volume snapshot API for persistent...
C
A
C
...your VM configuration in the snapshots, but we're not snapshotting any of your data. And, of course, you need a CSI provisioner for your storage class that supports snapshots.
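The feature gate step from the prerequisites above can be sketched in Python as a change to the KubeVirt custom resource. This is a minimal illustration, not the project's own tooling: the helper name `enable_snapshot_gate`, the `kubevirt.io/v1` API version, the resource name and namespace, and the pre-existing `LiveMigration` gate are all assumptions chosen for the example.

```python
import copy
import json


def enable_snapshot_gate(kubevirt_cr: dict) -> dict:
    """Return a copy of a KubeVirt CR with the Snapshot feature gate enabled.

    Hypothetical helper for illustration; it only edits a dict and does not
    talk to a cluster.
    """
    cr = copy.deepcopy(kubevirt_cr)
    dev_config = (
        cr.setdefault("spec", {})
        .setdefault("configuration", {})
        .setdefault("developerConfiguration", {})
    )
    gates = dev_config.setdefault("featureGates", [])
    if "Snapshot" not in gates:
        gates.append("Snapshot")
    return cr


# Assumed starting point: a KubeVirt CR that already has one other gate set.
kubevirt_cr = {
    "apiVersion": "kubevirt.io/v1",  # assumption; depends on the installed release
    "kind": "KubeVirt",
    "metadata": {"name": "kubevirt", "namespace": "kubevirt"},
    "spec": {
        "configuration": {
            "developerConfiguration": {"featureGates": ["LiveMigration"]}
        }
    },
}

patched = enable_snapshot_gate(kubevirt_cr)
print(json.dumps(patched, indent=2))
```

The printed JSON could be applied with whatever tooling the cluster already uses; the volume snapshot CRDs, controller, and CSI driver mentioned above still have to be installed separately.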
C
Okay, so those are the table stakes. Next: if you want to snapshot a VM, again make sure it's stopped and just submit a YAML that looks like this. Pretty easy. You can see the source is a virtual machine and it has a name. Okay. All right, so snapshot monitoring. You can see, buried in this YAML somewhere, there are a couple of conditions: Progressing and Ready.
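The slide itself is not part of the transcript, so the following is a minimal sketch of what such a snapshot request and a condition check might look like, written as Python dicts rather than YAML. The `snapshot.kubevirt.io/v1alpha1` API version matches the alpha status described in the talk but should be treated as an assumption, as should the object names (`my-vm`, `snap-my-vm`), the helper names, and the example status.

```python
import json


def vm_snapshot_manifest(snapshot_name: str, vm_name: str, namespace: str = "default") -> dict:
    """Build a VirtualMachineSnapshot manifest whose source is a VirtualMachine."""
    return {
        "apiVersion": "snapshot.kubevirt.io/v1alpha1",  # assumed alpha API version
        "kind": "VirtualMachineSnapshot",
        "metadata": {"name": snapshot_name, "namespace": namespace},
        "spec": {
            "source": {
                "apiGroup": "kubevirt.io",
                "kind": "VirtualMachine",
                "name": vm_name,
            }
        },
    }


def condition_status(obj: dict, cond_type: str):
    """Return True/False for a status condition, or None if it is not present."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status") == "True"
    return None


manifest = vm_snapshot_manifest("snap-my-vm", "my-vm")
print(json.dumps(manifest, indent=2))

# Hypothetical status, shaped like the Progressing/Ready conditions in the talk.
finished = dict(manifest, status={"conditions": [
    {"type": "Progressing", "status": "False"},
    {"type": "Ready", "status": "True"},
]})
print("ready:", condition_status(finished, "Ready"))
```

A caller would poll the object from the API server and stop once `condition_status(obj, "Ready")` returns `True`.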
C
C
One interesting thing I want to point out at the bottom is this virtual machine snapshot content name. Back here we introduced this virtual machine snapshot resource, but there's also this VM snapshot content reference. What is that? There's a virtual machine snapshot content resource, and it is similar to... I don't know if you are familiar with the CSI volume snapshot API, but there's VolumeSnapshot and VolumeSnapshotContent.
C
A virtual machine snapshot is like the intention, the job: I want to snapshot this VM. The virtual machine snapshot content is the container that has the data and references to any data that are applicable to that snapshot. We can see in here we're embedding the virtual machine spec, embedding any persistent volume claim specs, and the names of any virtual machine volume snapshots that have been created.
C
Okay, so snapshots are cool, but they're kind of useless without a mechanism to restore. This is how you would restore a VM. There is a virtual machine restore resource, and you specify the target, which is the VM, and the name of the snapshot. Pretty straightforward.
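As with the snapshot, the restore slide is not in the transcript. A minimal sketch of a restore request, again as a Python dict, assuming the same `snapshot.kubevirt.io/v1alpha1` API version and hypothetical object names:

```python
import json


def vm_restore_manifest(restore_name: str, vm_name: str, snapshot_name: str,
                        namespace: str = "default") -> dict:
    """Build a VirtualMachineRestore manifest: a target VM plus a snapshot name."""
    return {
        "apiVersion": "snapshot.kubevirt.io/v1alpha1",  # assumed alpha API version
        "kind": "VirtualMachineRestore",
        "metadata": {"name": restore_name, "namespace": namespace},
        "spec": {
            "target": {
                "apiGroup": "kubevirt.io",
                "kind": "VirtualMachine",
                "name": vm_name,
            },
            "virtualMachineSnapshotName": snapshot_name,
        },
    }


restore = vm_restore_manifest("restore-my-vm", "my-vm", "snap-my-vm")
print(json.dumps(restore, indent=2))
```

The two inputs are exactly the ones named in the talk: the target VM and the snapshot to restore from.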
C
C
...create PVCs from all the volume snapshots that are referenced there, wait for those all to complete, and then update the VM spec with the spec that is in the snapshot resource. While all that is happening, the Progressing condition will be false, Ready will be false, and then eventually it'll be true once everything has been created.
C
...all together in the new virtual machine spec. Cool. So offline snapshots, that's what we've got now, and we're working on online snapshots, but I just want to go into a bit of the challenges and what we need to address to make that possible. And this is an area where Kubernetes storage experts out there should definitely chime in if you have any tips, tricks, or any quirks or features that may be applicable.
C
One thing to consider is that Kubernetes CSI gives no consistency guarantees. You can create a volume snapshot at any time, and the snapshot will be created, but there's no guarantee that the snapshot is going to be useful at some time in the future. You may have just snapshotted at a time when your file system is in a state where it couldn't be restored.
C
So that's why we have offline snapshots: we know that nothing is being used. We wait until anything referencing a PVC is gone, and then we snapshot when we know that they're not in use. So we really need to solve this consistency problem when we get to online snapshots.
C
C
...make sure that all the disks are consistent amongst themselves. There's work in the community to address that: there is a notion of a CSI volume group, which will be coming along very soon. I encourage you all to check out the data protection working group; they're doing some cool stuff.
C
So that's one of the challenges there: how do we get snapshots that are usable? For crash consistency, we really need to do an fsfreeze at the file system level to make sure that the file system can be restored. What fsfreeze basically does is flush any outstanding activity on the file system and block any future writes. I think things will just block until the snapshot is created, and then, of course, you unfreeze, which will continue anything that was queued up. So for crash consistency, at a minimum you have to fsfreeze and unfreeze. Again, the data protection working group knows about this, and there are a couple of different initiatives: there's a container notifier, and there's also the execution hook, which has been around for a while. But the thing is, fsfreeze and unfreeze typically need privileges. So if we can't get the kubelet to do it for us, to help us out, we're kind of lucky in that we have virt-handler around, which is more privileged.
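The freeze, snapshot, unfreeze ordering described above can be sketched as follows. This is not how KubeVirt implements it; the three callables are hypothetical stand-ins (a real system might call the guest agent or run `fsfreeze` with the needed privileges), and the only point shown is that the unfreeze must run even if the snapshot step fails.

```python
from contextlib import contextmanager


@contextmanager
def frozen(freeze, thaw):
    """Freeze on entry and always thaw on exit, even if the body raises."""
    freeze()
    try:
        yield
    finally:
        thaw()


def crash_consistent_snapshot(freeze, take_snapshot, thaw):
    """Take a snapshot while writes are blocked; callables are placeholders."""
    with frozen(freeze, thaw):
        return take_snapshot()


# Simulated run that records the order of operations instead of touching disks.
events = []


def fake_snapshot():
    events.append("snapshot")
    return "snap-1"  # hypothetical snapshot identifier


result = crash_consistent_snapshot(
    freeze=lambda: events.append("freeze"),
    take_snapshot=fake_snapshot,
    thaw=lambda: events.append("thaw"),
)
print(events, result)
```

Keeping the frozen window as short as possible matters because, as noted above, writers simply block until the unfreeze.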
C
The next step after crash consistency is application consistency, and for that we need a way to quiesce and unquiesce applications. Because we're virtual machines, there's an extra level of work we have to do there: we really need to integrate with the QEMU guest agent to have code running in the VM do the proper flush tables or whatever to get the application into that state.
C
C
...which I'm sure you all kind of expected. Cool, right? And something that we're going to be working on in parallel is Velero integration. Velero is, you know, the number one tool for data protection.
C
C
It has a lot of support for cloud provider stuff, which I think is less important for us, but I'm interested to hear if you guys are running on the cloud and what you guys are doing for backups there.
C
Migration and disaster recovery are very similar use cases. But Velero has APIs to back up to object storage, and restic integration. I don't know if you guys have used restic before, but it's pretty cool. So even on a bare metal cluster you could bring up a MinIO S3 server and, using the restic integration, back up your disks in a relatively efficient way. By default, Velero is pretty good at backing up your entire namespace or your entire cluster, but it's not so great if we just want to back up, say, one VM. That's why we want to create a KubeVirt backup/restore plugin for Velero that will basically allow it to traverse the KubeVirt object graph. Velero doesn't know, when it sees a virtual machine spec, that it also has to back up any PVCs that are associated with it, or any other resources. So that's the planned integration there. Definitely, yeah, I'd love to hear what you guys are doing for backups now.
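The "object graph" idea can be illustrated with a small Python function that walks a VirtualMachine spec and lists the claims a backup would also need to include. This is only a sketch of the traversal, not the planned plugin (Velero plugins are written in Go). The function name and the example VM are hypothetical, and treating a `dataVolume` entry as a PVC of the same name is an assumption about how those volumes are backed.

```python
def referenced_claims(vm: dict) -> list:
    """List PVC names referenced by a VirtualMachine's volumes."""
    volumes = (
        vm.get("spec", {})
        .get("template", {})
        .get("spec", {})
        .get("volumes", [])
    )
    claims = []
    for vol in volumes:
        if "persistentVolumeClaim" in vol:
            claims.append(vol["persistentVolumeClaim"]["claimName"])
        elif "dataVolume" in vol:
            # Assumption: the DataVolume is backed by a PVC with the same name.
            claims.append(vol["dataVolume"]["name"])
    return claims


# Hypothetical VM with one PVC disk, one DataVolume disk, and a cloud-init disk.
vm = {
    "kind": "VirtualMachine",
    "metadata": {"name": "my-vm"},
    "spec": {"template": {"spec": {"volumes": [
        {"name": "root", "persistentVolumeClaim": {"claimName": "my-vm-root"}},
        {"name": "data", "dataVolume": {"name": "my-vm-data"}},
        {"name": "init", "cloudInitNoCloud": {"userData": "#cloud-config"}},
    ]}}},
}

print(referenced_claims(vm))
```

A backup of a single VM would then include the VM object plus each claim returned here, which is the association Velero cannot infer by itself.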
C
How does this sound to you guys? Have you used Velero? Do you think this will work? I think it'll work, but any feedback would be great. So these are the primitives that we're planning to provide now, and I'll hand it over to Ryan to talk about his use case and how we're going to extend KubeVirt to work for him.
A
Okay, so, forensics. Forensics is a use case of snapshots, so this expands on what Mike talked about with live virtual machine snapshot and restore; in a lot of ways that's what this is about, being exact.
A
Let's say that I like old video games, and I have one of my favorite video games of all time depicted here. For older games nowadays, those servers don't exist anymore. So let's say I want to host a cluster on my own with this video game so I can play with my friends.
A
Let's say this runs Kubernetes and it runs KubeVirt. I have virtual machines, I have GPUs, and I allow a bunch of people to connect as guests and run the game in those virtual machines so that we can all play together. They'll have a GPU attached to their virtual machine and they'll be playing that game.
A
In this scenario, you can imagine someone coming to our cluster, and let's say perhaps they're just standing there and not really doing anything. No problem, that's cool. But what happens if I get a notification on my phone from monitoring saying that it's been breached? Well, that's not really a good thing.
A
I can imagine: oh wow, something weird is happening. There's a person playing the game, but it's a little bit unusual, they're not really doing anything, and someone's breaching my network. So what am I left to do? In this scenario I really want to look into what's happening with this virtual machine. I want to look into ways that I can examine it further to see what's going on, so that I don't encounter this in the future.
A
So what would I look to do in this kind of use case? I have KubeVirt here, I have my virtual machine, and I have globe. I want to make sure that my virtual machine can't cause any damage to any of the healthy workloads in my cluster, so once I find it I'm going to look at suspending it so I can do any sort of analysis later. Next.
A
A
A
I said suspend. Okay, well, there are two types of suspend I'm referring to: pause and save. They're slightly different. Pause means the hypervisor is going to suspend the virtual machine and store its state in RAM until it's resumed. Save means the domain is going to be suspended and its state stored in persistent storage. So one is ephemeral, the other a little bit more persistent, and right now pause is something that's already supported in KubeVirt.
A
A
A
A
So with that, at a high level, suspending in this case means I'm going to be able to suspend a misbehaving virtual machine and resume it later for analysis. So it's perfect; that solves my problem. I can keep playing my game with my friends and not have to worry about any sort of problems, and I can deal with it at a later date.
A
So what else do I get with suspend? I can suspend pass-through and mediated devices, so the physical GPUs or the mediated ones; I don't have to worry about that. But what does this not quite get me? I can't resume multiple virtual machines. What I mean by that is, if a team were administering this cluster and one of my co-workers wanted to also look at this virtual machine and forensically analyze it, they can't do it if I'm looking at it. They can't have their own copy without doing additional work. They could clone it if they wanted to, but they would have to do additional work. The same goes for resuming into a different namespace: you would have to migrate the virtual machine to a different namespace so you can, let's say, lock that namespace down on the network to do some analysis. And then, finally, suspending a virtual machine for longer than a week: like I mentioned, we're holding onto the resources, and GPUs are pretty valuable in my cluster.
A
A
Okay, so I'm going to talk at a very high level about the future work. As Mike mentioned, that's doing live snapshots. Specifically, what I'm interested in describing here is doing an online snapshot with save. So what would this look like at a high level?
A
Well, when you create a snapshot of a virtual machine, virt-launcher will pause the virtual machine and we'll do a hot unplug of non-USB devices. That's important because this is essentially like an offline virtual machine migration: it runs the same code path, so the non-USB devices will need to be unplugged. Then we actually execute a save to a location on the PVC, and the result is that RAM, VRAM, and disk all end up in the PVC. Next slide,
A
please. And so the inverse will happen for restore. Now we're going to restore this, but we're going to clone the PVC. When we go to restore, we're going to look at recreating the VM with the cloned PVC, reattach the devices, and actually do the restoration on the VM: reload the RAM and do that restore. And, like I mentioned, one of the things we can do with this is that now we can actually pick the namespace. Because it's a namespaced operation, we can restrict everything to a namespace, restrict the network in that namespace, and do our analysis in a safer place.
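One way to "restrict the network in that namespace" is a default-deny Kubernetes NetworkPolicy. The sketch below builds such a policy as a Python dict; the namespace name `forensics` and the policy name are hypothetical, and the policy only has an effect if the cluster's network plugin enforces NetworkPolicy. The talk does not say which mechanism would actually be used.

```python
import json


def deny_all_policy(namespace: str, name: str = "deny-all") -> dict:
    """Build a NetworkPolicy that blocks all ingress and egress in a namespace.

    An empty podSelector selects every pod in the namespace; listing both
    policy types with no allow rules denies traffic in both directions.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
        },
    }


policy = deny_all_policy("forensics")  # hypothetical quarantine namespace
print(json.dumps(policy, indent=2))
```

Applying the policy before the VM is restored into the namespace means the restored copy never runs with open network access.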
A
Please. Okay, so what does this get us? Now we're talking about save. When we save, we power off the virtual machine and free up resources in the cluster. So at a high level, we're able to suspend and we're able to restore.
A
We can only suspend mediated devices when we're talking about save, not pass-through. I mentioned VRAM and mediated devices earlier; that's the one exception to this. And what else will we get out of this? Well, we can resume multiple virtual machines. We get this as part of the operation because of the cloning.
A
We can resume into a different namespace, because this is equivalent to offline virtual machine migration. And it makes it a little bit more appealing to keep our virtual machines suspended for longer than a week, because my valuable resources, those devices, those GPUs, aren't going to be consumed. Next slide, please. So, to look back and conclude the story on my cluster: the person that's misbehaving, I've paused their virtual machine or I've saved it, and they're no longer able to play. This is what they're left with: they can no longer play games with us in the cluster. Okay.
A
So that concludes my example, and we can start with any questions.
B
A
So, with save, we actually talked about this in one of the community meetings. The first approach we were looking at is using the already attached PVC, so that RAM would end up in the same PVC, with the assumption that, since the user attached it, they'll have enough space to store RAM in there. There are other options where we could look at possibly adding new PVCs. There's a little design document.
A
C
Yeah, I think one idea that I've floated around is to add a volume type to VMs that is like a memory volume type, which can be used as a sort of scratch space, if you will. If you want to do a snapshot or save that requires memory, we can dump it there and then take a volume snapshot of that to restore it later.
B
Okay, are vGPUs considered mediated or pass-through devices for suspend and resume?
A
I'm talking about them as mediated devices. Okay.
B
C
Definitely the synchronization. These are all things that are kind of... I think one of the challenges is going to be going with the community versus forging our own way. I think the community is always going to be a little behind what we want to do. So with some of these synchronization problems that they're tackling, can we wait for the proper API and implementation to be deployed, or do we have to come up with something ourselves?
C
I think that's going to be an interesting balance, and then there are just the fundamental challenges of synchronization. As I mentioned before, especially the issue where your application uses multiple disks and they have to be synchronized: there's nothing you can do for that right now, until we have volume groups.
B
C
CSI supporting that helps us out there.
B
Okay, Jose wants to know: is anybody collecting use cases for data protection in KubeVirt and, if so, where?
C
It's a good question; that's one of the things I want. As I mentioned earlier, I think I did a terrible job of promoting the snapshot feature, and I hope people start using it more and there's more awareness out there. I don't...
B
Okay, Rob wants to know: should the snapshotting feature be storage-provider agnostic? They're running RKE plus Longhorn and have had some issues with snapshotting.
C
So yeah, that would mean not using CSI snapshots, I assume. I think that's where the Velero and restic integration I briefly talked about may come in: restic to do incremental backups of your PVCs as they change. Our plan is to stick with CSI snapshots for now, but I think with restic integration and Velero, you could have a solution.