Meeting of Kubernetes Storage Special-Interest-Group (SIG) Volume Snapshot Workgroup - 17 September 2018
Find out more about the Storage SIG here: https://github.com/kubernetes/community/tree/master/sig-storage
Moderator: Jing Xu (Google)
A: So right now Xing, Saad, and I are working on a blog post for the snapshot feature. This feature will be available in version 1.12, and we continue to add all kinds of documentation, in Kubernetes and in CSI. So I'll put up the list later, I think. Last time we discussed some documentation for users, more focused on the user side, and some documentation that will be more for developers.
A: So we'll address all of those, and in today's meeting I want to focus more on a list of next steps, the features we want to have for snapshots after we have all those basic building blocks. I have a list, and everyone is welcome to give us feedback and add new things. The first one, which I think we discussed a little bit in the last meeting, is related to using finalizers for protection. There are a few scenarios where we might need some kind of protection.
A: The first is, for example: you want to delete a volume, but it is currently being snapshotted, or you want to delete a volume when it already has snapshots taken from it, because some storage plugins will not allow you to delete a volume if it has snapshots associated with it. Also, some volume plugins might have incremental snapshots, so you cannot delete a snapshot if there are other snapshots sourced from it. Last time we kind of agreed on this scenario.
B: The alternative that I had in mind was: you could require that the backend report success even if it didn't actually delete it, and then it's incumbent on the backend to figure out how to delete it when it can be deleted. So it's no longer Kubernetes's problem to worry about; as far as Kubernetes is concerned, it really is gone.
B: If we had conformance tests, those kinds of implementations could never pass them, because the conformance test is going to set up a known configuration, try a known sequence of steps, and expect everything to go as expected. If a backend can't delete a snapshot or volume, for whatever reason, it would fail the conformance test; even if retrying would eventually succeed, that's still not good enough.
B: So, speaking for NetApp: we at NetApp have had to deal with these kinds of cases. In fact, we have exactly the situation where we cannot delete an actual volume that has a snapshot, if the snapshot needs to stay around, because our implementation has the snapshot being a member of the volume. But it's not unrealistic for us to lie and say, yeah, we deleted that volume, and for us to basically keep track somewhere that when the last snapshot is gone, we need to delete the volume; we'll leave ourselves a breadcrumb.
B: Yeah, I think it sounds like the decision hasn't been made, and I'm not saying it has to go this particular way; I'm just pointing out that there's a choice to be made here. We can either surface the ugliness up to the end user, that sometimes snapshots can't be deleted or sometimes volumes can't be deleted, or we can say no, we want to provide a consistent experience and force the developers to deal with the ugliness on their side. Both options have issues. Okay.
C: So, Ben, right now we don't really have any restrictions, right? So basically it's really left to the backend; I think it's hard for Kubernetes to do.
A: Snapshots might already exist, and we only test for volumes currently, so we don't really have any restriction or anything; I'd say it just depends on the backend, whatever response it gives. For volumes, it's also possible that, outside of Kubernetes, snapshots were already created for one. And the current behavior for volumes, at least, is that it may return some error from the backend.
D: I have two comments. I agree with Ben that we shouldn't expose our ugliness to users; users shouldn't have to be aware of how this backend or that backend works. I think the better alternative to what Ben suggested, which was lying about successful deletions and things like that, would be for those backends that have a coupling between snapshots and their corresponding volumes.
D: They can decouple their lifecycles. For example, the NetApp backend that Ben mentioned can promote a snapshot to a volume, and that way the snapshot has a lifecycle independent of the volume it corresponds to; later on, when you want to create a volume from the snapshot, you're effectively cloning that volume. So yes, you would lose some storage efficiency, because now you're using more space and so on.
E: Do we have to enforce this? Let's imagine there is a backend that won't allow you to delete a volume unless all the snapshots are deleted. If that's the case and you go and delete the PVC, that process should basically fail, saying: I am unable to delete this volume because the snapshots need to be deleted first. The status has an error; the user comes along and reads it, and then they're responsible for doing the cleanup.
D: It's just like whenever we create a pod: the pod won't start unless all the dependencies for the pod, that is, the volumes, the PVCs, have corresponding PVs, and only then does the pod start. This is a similar type of dependency: you wouldn't delete a volume until you delete all the corresponding snapshots. So it's just a dependency users have to be aware of, so that they can address it. But...
B: It should be possible for this sequence of actions to succeed on your plugin; then you design the conformance tests to run those actions, and it's up to the person running them to ensure that the prerequisites are met, and then you do it and it passes, and you say: great, you conform. But we're setting up a situation here where it's just not possible to write that test.
D: So yeah, my point was that backends that want to support the way Kubernetes treats snapshots and volumes can emulate that by promoting a snapshot to a full volume and managing the snapshot lifecycle the way they manage a volume; that way, they're decoupled. So it's up to the plugin implementers to figure out the best way they can make this happen in Kubernetes.
E: That's an option that a storage vendor gets to make, right? They can do extra work to make a nicer user experience, or they can let the error bubble up to the user and let the user handle it. I think I agree with Ben's original point, which is that by trying to work around this at the Kubernetes layer, we introduce inconsistencies in the behavior of the Kubernetes API, and that's pretty bad. So I would prefer to leave this to the storage vendor to decide; they can choose.
A: And there are some trade-offs, right? If we hide the errors, if we say a deletion succeeded at a moment when it actually hasn't happened yet, it will cause some issues, definitely if the system is never able to remediate it and the user just has no knowledge about that. So I think the plugin can choose to do some smart things: try something, eventually delete the snapshots, and then return success. Yeah.
E: Or it just surfaces the error and says: sorry, in order to delete this volume, you first need to delete the snapshots. Which means the users are responsible for the cleanup, and it's not the end of the world, because you're not putting the user in a position that they cannot recover from; they would just need to go in and delete the snapshots first. But, you know, that's a decision you leave up to the storage vendor.
A: So I think for this we kind of agree, at least for now, right? We depend on the plugins to give us whatever response when you try to delete something; the delete operation returns whatever CSI returns, so it depends on the CSI response. Another scenario is more like when you, for example, try to delete a snapshot, but the snapshot is currently in use, meaning someone is trying to create a volume from the snapshot. I know some backends will not allow you to delete the snapshot; they will return an error, something like "resource in use", but I'm not sure this is consistent across all the backends. So it's possible, in this case, to protect at the Kubernetes layer with a finalizer, because it is more like a usage that we can monitor, and then we prevent it.
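As an illustration, here is a minimal Go sketch of the finalizer bookkeeping such protection could use. The finalizer name is hypothetical; a real controller would read and write it on the VolumeSnapshot's metadata.finalizers through the API server, which keeps a deleted object around until its finalizer list is empty.

```go
package main

import "fmt"

// Hypothetical finalizer name; a real controller may well pick another.
const snapshotProtectionFinalizer = "snapshot.storage.kubernetes.io/in-use-protection"

// addFinalizer returns the finalizer list with the protection finalizer
// added, if it is not already present. While any finalizer is present,
// the API server keeps a deleted object around instead of removing it.
func addFinalizer(finalizers []string) []string {
	for _, f := range finalizers {
		if f == snapshotProtectionFinalizer {
			return finalizers
		}
	}
	return append(finalizers, snapshotProtectionFinalizer)
}

// removeFinalizer drops the protection finalizer once nothing is using
// the snapshot anymore, letting the deletion actually proceed.
func removeFinalizer(finalizers []string) []string {
	var out []string
	for _, f := range finalizers {
		if f != snapshotProtectionFinalizer {
			out = append(out, f)
		}
	}
	return out
}

func main() {
	finalizers := addFinalizer(nil)
	fmt.Println(finalizers)                  // snapshot in use: deletion is held back
	fmt.Println(removeFinalizer(finalizers)) // empty list: deletion proceeds
}
```

This is the same mechanism PVC protection already uses for volumes that are in use by pods.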
A: ...and never delete something that is in use. So in order to do this, we need to make sure we can see the other corner cases and prevent that from happening. So yes, if we add this nice feature, it prevents something bad from happening, but we might also cause something undesired. So, if we all agree, that's a nice feature. Sorry.
B: It occurs to me that this is exactly the same situation as what we were discussing in the first part of the meeting, which is: you could just let the plugin fail, and then it's up to the plugin to figure out whether it can do something smart or not, and not track any of that state in Kubernetes.
B: An attached volume is something Kubernetes has to know about, because it manages the pod and has to manage the attachment; but the relationship between a volume and a snapshot is not something Kubernetes manages, so you could make it out of scope and say: hey, do the right thing, otherwise users will suffer. Which, it sounds like, is what we're saying about the first case, the ability to delete a snapshot, or a volume that has snapshots: we're just pushing back on the plugin and saying, do the right thing.
E: The difference that I see between the first case and this case is that the workaround on the Kubernetes side in the first case would result in weird and inconsistent behavior, whereas the proposal here actually just gives us added protection without any weird behavior. So over here we can add protection, it'll make for a nicer experience, and we don't really have any negative side effects that I can see.
D: So the main difference between this problem and the first problem is that this one would benefit all plugins, whereas the first one would only benefit the plugins that had the limitation of coupling between snapshots and volumes. So if it makes sense, yeah, if it makes sense for all plugins, it makes sense to have it in Kubernetes.
A: It should apply to all plugins, yes. Yeah, I agree, it's not like a must-have, right? It's a nice feature to have, and we can see when we should do it. It definitely has some cost in the system. So, to kick it off: we can't make it mandatory, and we can work on it whenever we think it's a good time, yeah. So, can we move on to the next one? The second one on the list is the snapshot creation retry policy.
A: So right now, you know, snapshots are kind of unique, because we say we only create a snapshot once, and in case of failure we just report the error and the controller won't retry, because snapshots are kind of time-sensitive. If users say, okay, create it now, they might not want a retry at a much later time; they don't want the controller to keep retrying. Yeah.
B: I like the existing behavior; it makes sense to me. I actually hit this during my testing, because I created a PVC and then created a snapshot before the PV was even created, and it failed, for obvious reasons, and I was like: oh, what happened? Then I realized what happened and thought: oh, that makes perfect sense, you can't take a snapshot before the data exists. I mean, you wouldn't want it to retry after the data exists, because that would be weird.
G: I'm not sure it's a reasonable feature, because, you know, when you take a snapshot, you want it now, and there are other, higher-level snapshot policies and scheduling that will help you do snapshots at a specific time. So I'm not sure; it could mean a snapshot is taken at a time that is not the right time, not what the user meant.
E: I think it's an important point that snapshots are very much time-bound: when you expect to take a snapshot, you expect to take it within a certain time frame, which is why not having a retry makes perfect sense, like Ben said. But if there is some room for retry, we should enable that scenario. So what I was imagining was, instead of saying "oh, please retry for one or two minutes", what you say is: please retry up to a given timestamp.
E: So, you know, we quiesce a workload, for example, and we know that we have 30 seconds to take that snapshot, and so what we can do is pass that information to say: hey, for the next 29 seconds, please try to take the snapshot. So if it attempts a snapshot within five seconds and fails, it knows that before that timestamp it can continue to retry, but after that timestamp it has to give up. I think that would be potentially useful, yeah.
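As a rough illustration, here is a minimal Go sketch of that retry-up-to-a-timestamp idea. The takeSnapshot function is a hypothetical stand-in for the CSI CreateSnapshot call, the backoff policy is arbitrary, and the clock-skew and unspecified-deadline questions raised next are deliberately ignored.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// takeSnapshot stands in for the real CSI CreateSnapshot request; here
// it always fails so that the retry loop is exercised. (Hypothetical.)
func takeSnapshot() error { return errors.New("backend busy") }

// createWithDeadline retries snapshot creation until the supplied
// deadline passes, then gives up, as proposed in the discussion.
func createWithDeadline(deadline time.Time, backoff time.Duration) error {
	for {
		err := takeSnapshot()
		if err == nil {
			return nil
		}
		// Give up if the next attempt would land past the deadline.
		if time.Now().Add(backoff).After(deadline) {
			return fmt.Errorf("giving up at deadline: %v", err)
		}
		time.Sleep(backoff)
	}
}

func main() {
	// E.g. the workload is quiesced for a short window, so retry only
	// inside that window and give up afterwards.
	err := createWithDeadline(time.Now().Add(3*time.Second), time.Second)
	fmt.Println(err)
}
```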
B: Hey, if you're talking about something being slow or something being unreachable, the retry needs to bubble all the way up to whatever invoked the snapshot, typically, because usually there's something else going on: you're talking to the application that's managing the data, it's in a quiesced state while you're taking the snapshot, and if you can't get the snapshot taken by some timeout, you're going to unquiesce the application, because it needs to keep doing its work.
B: I would support the idea of a deadline, but you're going to have to solve the problems I mentioned: how do you specify it, what time zone is it in, how do you deal with clock skew, time drift, and all the other problems you can have? Plus, we'd have to define, if the deadline is not specified, does that mean retry forever, or does that mean retry zero times?
E: I think this idea is worth exploring. I would put it down as a design project for beta. We can think through what the design looks like, whether it makes sense with skew and time zones and all that kind of fun stuff, and then also, at the same time, see if there's actually a concrete need for this: are we seeing failures that, had they been retried, would have resulted in a successful creation of a snapshot?
A: Today we only support on-demand snapshots, and some users might want to have automatic snapshotting, like periodic snapshots, but I think this could also be done by a higher-level controller; even today, you could create a script and just create them periodically.
B: So you can do that, but I'll raise the issue that I raised a while ago, because this is an important issue to me. The main problem I have with doing those kinds of things through a higher-level controller is that Kubernetes will eventually know about a ridiculously large number of snapshots in that scenario, where, if you say: I'm going to take snapshots every hour, and I'm going to retain them for, you know, a week, and then I'm going to have another retention policy past a week...
B: You can end up with hundreds of snapshots for every one of your volumes, and most of them you're never going to care about, but Kubernetes will have to track all of them, and you'll end up accumulating thousands or tens of thousands of objects in the Kubernetes database that nobody ever wants to see, except for that one percent case where you're like: oh, I need to do a restore. And that's a pretty big tax to put on the system.
A: So one thing that might help, you know, is if we give the functionality so that a user can easily list all the snapshots taken for a volume, for a PVC. Then a user can query: okay, for this volume, what snapshots does it have? And also list, for all the snapshots, their volume. If we have this kind of functionality, then a higher-level controller can, I think, easily have some policies or ways to, say, delete the old ones and only keep a certain number, and then...
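For example, here is a minimal Go sketch of the kind of retention policy such a higher-level controller could apply once it can list snapshots by source PVC; the snapshotInfo type is a hypothetical, simplified view of a VolumeSnapshot object, not a real API type.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// snapshotInfo is a hypothetical, simplified view of a VolumeSnapshot
// object: its name, the PVC it was taken from, and when it was created.
type snapshotInfo struct {
	Name      string
	SourcePVC string
	Created   time.Time
}

// expired returns the snapshots of one PVC that a higher-level
// controller could delete so that only the `keep` newest remain.
func expired(snaps []snapshotInfo, pvc string, keep int) []snapshotInfo {
	var mine []snapshotInfo
	for _, s := range snaps {
		if s.SourcePVC == pvc {
			mine = append(mine, s)
		}
	}
	// Sort newest first; everything past `keep` is eligible for deletion.
	sort.Slice(mine, func(i, j int) bool { return mine[i].Created.After(mine[j].Created) })
	if len(mine) <= keep {
		return nil
	}
	return mine[keep:]
}

func main() {
	now := time.Now()
	snaps := []snapshotInfo{
		{"hourly-1", "my-pvc", now.Add(-3 * time.Hour)},
		{"hourly-2", "my-pvc", now.Add(-2 * time.Hour)},
		{"hourly-3", "my-pvc", now.Add(-1 * time.Hour)},
	}
	fmt.Println(expired(snaps, "my-pvc", 2)) // hourly-1 falls outside the policy
}
```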
B: I like the idea of having snapshots that exist outside of Kubernetes's knowledge up until the point when they're needed, at which point Kubernetes becomes aware of them and you can start using them; that unburdens Kubernetes from having to track, you know, a potentially huge number of objects. Yeah, you're right, some people will do it through Kubernetes: they'll create these snapshot objects and you'll end up with thousands of them, and if that's what they want...
B: That's okay, but it doesn't seem like a good design to me. So have an alternative, which is: something outside the system is automatically taking snapshots, automatically aging them out, and retaining them on some policy, and only when you decide, oh, I need to do a restore now, there's got to be a way to take that snapshot that exists outside the system and pull it in and say: okay, now I want to use this snapshot and go create a volume from it.
B: No, no, no, we already have the ListSnapshots RPC in CSI, and that is defined to return all the snapshots, whether they were created by Kubernetes or not. So there is a mechanism to get the information out of the plugin; it's just: what do you do once you have that list? If the object isn't already there in Kubernetes, do we have a way to take it and put it in?
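For context, ListSnapshots lets the caller filter by source volume. Here is a simplified Go sketch of that filtering; the types and the canned data are illustrative stand-ins, not the real generated gRPC bindings (those live in the container-storage-interface/spec repository, and a real call would also page through results).

```go
package main

import "fmt"

// Snapshot models (not reproduces) the snapshot entry returned by the
// CSI ListSnapshots RPC discussed above.
type Snapshot struct {
	SnapshotID     string
	SourceVolumeID string
}

// listSnapshots stands in for a gRPC call to the plugin's controller
// service, filtered by source volume as the CSI spec allows.
func listSnapshots(sourceVolumeID string) []Snapshot {
	// Hypothetical canned backend state: includes snapshots that were
	// never created through Kubernetes.
	all := []Snapshot{
		{"snap-1", "vol-a"},
		{"snap-2", "vol-b"},
		{"snap-3", "vol-a"},
	}
	var out []Snapshot
	for _, s := range all {
		if s.SourceVolumeID == sourceVolumeID {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// Everything the backend knows about vol-a, whether or not
	// Kubernetes created it.
	fmt.Println(listSnapshots("vol-a"))
}
```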
D: That's the easy part. The harder part is when the lifecycle of that snapshot is managed by the backend, and then you have to reconcile the Kubernetes state with the backend state; these are the pieces where the problem comes in. Importing it is easy, you can just import it from the backend; but then, once you create a corresponding snapshot object inside Kubernetes, let's say the storage backend garbage-collects all the snapshots that are more than a day old...
D: Then you have to somehow reconcile the Kubernetes state with the backend state, and this is where the complications come in. But I agree with Ben that this is an important use case that we should figure out. It's also somewhat related to the previous discussion we had regarding the deadlines, because the way Kubernetes works is, you know, the declarative model: eventually it's going to take some action, and there are no time guarantees for the eventuality of your action.
D: So if somebody wants a snapshot now, there is no guarantee that it would happen within, you know, five seconds. I'm not talking about the backend part of it; I'm just talking about all the activities that happen on the Kubernetes side: the controller may, you know, pick up some event with some delay.
A: I mean, about the list here: if you say, list all snapshots taken from this volume, from this PVC, right, what Kubernetes can do is just go through all the snapshot objects, check which ones relate to this volume, to this PVC, and then return the list. Another way is to go to the backend, list all the snapshots taken from this volume, and it can tell you the whole list. These two lists could be different.
B: As far as the Kubernetes API is concerned, yeah, you should only care about the ones Kubernetes knows about, because anything else is kind of ridiculous; but there has to be a way to get Kubernetes to know about these other snapshots. I think that's what Ardalan is getting at: some sort of an import, and then a way of...
D: Yeah, the way many of these storage systems that support scheduled snapshots work is that, for example, you specify policies to keep so many daily snapshots, so many hourly snapshots, so many weekly snapshots, so the storage system takes care of the snapshot lifecycle for you, right? So say we define a policy that keeps around, let's say, ten daily snapshots; then, as soon as you get to the eleventh day, your first snapshot gets deleted, and now, if Kubernetes has an object...
A: I think right now we do have that in the controller. It might not have this all the time, but the controller will periodically actually check the backend state. We have two objects, the snapshot and the snapshot content, and the snapshot content has the snapshot ID, and it can check the existence of that snapshot ID through the plugin, the backend. Yeah. You don't want to...
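To make that two-object pairing concrete, here is a heavily simplified Go sketch of the relationship being described; the real VolumeSnapshot and VolumeSnapshotContent types live in the Kubernetes snapshot controller code and carry many more fields, so the names here are abbreviated and illustrative.

```go
package main

import "fmt"

// VolumeSnapshot is the user-facing object, bound to a content object.
type VolumeSnapshot struct {
	Name                string
	SourcePVC           string // the PVC the snapshot was taken from
	SnapshotContentName string // binding to a VolumeSnapshotContent
}

// VolumeSnapshotContent carries the backend snapshot handle that the
// controller can periodically re-check against the plugin.
type VolumeSnapshotContent struct {
	Name           string
	SnapshotHandle string // backend snapshot ID
	SnapshotRef    string // back-pointer to the VolumeSnapshot
}

func main() {
	content := VolumeSnapshotContent{"content-1", "backend-snap-42", "snap-1"}
	snap := VolumeSnapshot{"snap-1", "my-pvc", content.Name}
	// The controller can verify that SnapshotHandle still exists on the
	// backend, which is the periodic check described above.
	fmt.Println(snap, content)
}
```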