Description
Meeting of Kubernetes Storage Special-Interest-Group (SIG) Volume Snapshot Workgroup - 08 October 2018
Find out more about the Storage SIG here: https://github.com/kubernetes/community/tree/master/sig-storage
Moderator: Jing Xu (Google)
A
So for the first two, the deletion protection and the retention policy, we should be able to implement those in this quarter, and we have started looking at them. As for resource quota, I think that is also quite an important feature, but currently we don't have support for resource quota for CRDs, and I have asked our team who is working on that feature.
A
And for topology-aware snapshots, we have started working on that and already have a PR, so we can talk about it in a little more detail shortly in this meeting. At the last meeting we already talked about these ideas: snapshot topology, and also revert and in-place restore, which we are still thinking about so far.
A
This quarter we would like to gather some feedback from users who can start using the snapshot alpha feature, so that we can learn more about how we should implement those functionalities and whether they are needed at all. It's not a very urgent feature that we need to implement now, so we want to have some feedback about those first. And for the last items, snapshots for groups of volumes, those are probably very useful features.
A
But again, for this quarter we want to get some feedback and see how we can provide those functions. For consistency groups, meaning a group of snapshots taken together, we will initiate some discussion on that, and then we can write something up. And for volume cloning, I think we will coordinate with the others who are working on clone.
B
Okay, so we talked about this at the last CSI meeting.
B
The reason we have these two separate ones is that we are thinking, for a volume plugin, it's possible that it only has one capability. For example, the volume doesn't have any topology constraints but the snapshot does, or the other way around: the volume has topology constraints, but the snapshot has no topology at all. So in this case we want to separate the capabilities, so that, for example, if we only have one...
C
Sorry, I don't know what's wrong with this headset. My basic question is: since snapshots can't be attached to nodes, why does accessibility matter? All you can do is create volumes from them.
B
Yeah, because even in the CSI spec, if you look at what we already discussed for create snapshot, there are two phases, right? The first phase is that the snapshot is cut, and the second phase is uploading. But not every storage system does the uploading. Right now it's mainly the cloud providers, like Google Cloud and AWS, where the snapshot is uploaded as part of the same create snapshot process.
B
That's why we have this topology; we do have something for it in the design, right? That's why we were discussing the statuses and how to tell that it is done. We talked about that for a long time, precisely because we have those two phases.
B
Okay, so on to the next one. This is just a naming change: we modified the accessibility constraints that were previously defined for volumes only, so now we added "volume" in front of it. Similarly, for the message in the create volume response, we just added "volume" in front of the accessibility constraints.
B
This is the topology requirement, basically just making changes to the definition so that it considers snapshots as well. Everywhere it said "volume", we also added "snapshot". I think it might make more sense if I just go to the create snapshot request, because that one...
B
And then we can look at it as a topology requirement. In the create snapshot response we also added topology. Basically, after the snapshot is provisioned, this field tells us from which region or zone we can access this snapshot. And then let's go look at...
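The request/response shape being described can be sketched, very roughly, in Python. The real CSI definitions are protobuf messages; the type and field names below are approximations of the spec, not the spec itself.

```python
# Rough model of the topology fields discussed above. Names approximate the
# CSI messages (Topology, accessibility requirements, accessible topology);
# they are illustrations, not the actual spec definitions.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Topology:
    # e.g. (("region", "us-central1"), ("zone", "us-central1-a"))
    segments: tuple  # tuple of (key, value) pairs so it stays hashable

@dataclass
class CreateSnapshotRequest:
    source_volume_id: str
    # Where the caller needs the snapshot to be accessible from.
    requisite: list = field(default_factory=list)

@dataclass
class Snapshot:
    snapshot_id: str
    # Reported by the plugin after provisioning: the regions/zones the
    # snapshot can actually be accessed from.
    accessible_topology: list = field(default_factory=list)

def accessible_from(snapshot: Snapshot, node_topology: Topology) -> bool:
    """True if the node's topology matches any segment set the snapshot reports."""
    node = dict(node_topology.segments)
    for topo in snapshot.accessible_topology:
        if all(node.get(k) == v for k, v in topo.segments):
            return True
    return False
```

A scheduler-like consumer would compare the returned `accessible_topology` against a node's labels before deciding where a volume restored from the snapshot can land.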
C
The thing that's on the tip of my mind is that I might want my snapshot in more than one place, because I'm using it as my disaster protection scheme, or I might be worried about an entire site being lost, so I need to make sure my snapshot is in at least two places so that I don't lose my data. That's not an accessibility concern, that's more of a redundancy concern, but it sounds like you could achieve it with this mechanism. Right?
D
So if the snapshot or the volume is capable of replication, you'll have an opaque parameter that says "turn replication on", and then these topology parameters are used in conjunction with that to say: I want it to be replicated to these two racks, regions, zones, whatever. Does that make sense?
D
I believe most systems are not going to allow accessibility without replication. So if you, for example, specify multiple zones and you don't turn on replication, then your volume is going to be accessible from a single zone, and the same thing with a snapshot. The snapshot or volume object that you get back tells you which zones or topology segments it's accessible from, so when you get the response you're going to see: oh, I can actually only access this from one. It's not until you provide that opaque parameter to say, "hey, I'm..."
C
But I'm worried about the case where you have a back-end that says: well, it's accessible from all the regions, because I have my networking set up such that I can move the data where it needs to be when you want it to be there, but I'm only going to keep one copy, because it's wasteful to have two or three or four copies. You can blur the line, because if all "accessible" means is that you can use it in that place...
Mm-hmm.
D
Yeah, I think that risk exists. The number of storage systems that I've seen do that is very low, and the fact that this is already part of the API... I think consistency is better than trying to optimize around that, and I'm...
B
So we actually have some comments here talking about that, about those two together, right? It actually talks about an opaque parameter in the create volume request: if that says this volume is replicated and is accessible from two zones, then based on that, the volume or snapshot should really be accessible from two zones. So here is one place where replication is already being mentioned.
D
To be clear, the specification does not require replication, and that is on purpose. We want to leave it open to let the storage system define whether they want to do accessibility via replication, or through some sort of network setup, or through some other setup. The way that the user discovers what is accessible or not is through the object that's returned, which gets marked, and the way that they request replication is through an opaque parameter.
D
Fine, and there's a question from Andrew in the chat. His question is whether the specification requires replication, and if it doesn't, how do you request it. The specification does not require replication, and you request it through an opaque parameter. All the create operations in CSI have key/value opaque parameters which are defined by the volume plug-in.
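As a rough illustration of how such a plugin-defined opaque parameter could interact with requested topology: the parameter names below (`replication`) are hypothetical, since CSI leaves these keys entirely to the storage plugin and the orchestrator just passes them through.

```python
# Sketch of a plugin-defined opaque parameter driving replication.
# The "replication" key is hypothetical; CSI does not define it.

def plan_accessible_zones(requested_zones, parameters):
    """Return the zones the created volume/snapshot really ends up accessible from.

    Mimics the behavior described in the meeting: if replication is not
    switched on via the opaque parameter, the backend keeps one copy, so
    only a single zone ends up accessible, regardless of what was requested.
    """
    if parameters.get("replication") == "true":
        return list(requested_zones)
    # No replication: the backend picks a single zone to place the one copy.
    return list(requested_zones)[:1]
```

The caller then discovers the actual accessibility from the returned object, as described above, rather than assuming the request was honored in full.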
D
The second part of his comment is: "it sounds like accessibility and replication should be completely orthogonal issues." Yes, agreed. So out of scope, then? Yes, replication should be out of scope, because essentially the CO doesn't really need to be aware of the fact that this thing is replicated under the covers. Accessibility, on the other hand, does need to be part of the spec, because the CO needs to take actions based on the fact that it is actually accessible from multiple locations, for example in order to do scheduling.
D
I think an argument can be made that multi-accessibility versus replication could result in different types of scheduling requirements, and if that turns out to be the case, we could potentially add some sort of replication parameter or indicator to the volume and snapshot response objects. But so far I haven't seen a concrete use case for that, and I would prefer to start with a minimal API and add things to it rather than go the other way.
D
One more comment from Andrew in the chat: "the comments imply that behavior; I think we should be explicit that it isn't guaranteed." I agree with that. If there is an implication, we should be clear that it's not required. Feel free to comment specifically in the PR if there's anything that is unclear, and we can take it from there.
A
Yes, okay. So besides the list we discussed so far, I think one more thing we want to add is snapshot preparation. I don't think it's a must-have feature, but I want to discuss it a little bit to get some feedback. I think we discussed something related to this before. Right now in our API we don't have any support for preparing the application before taking snapshots. That means we need to let users know it is not an application-consistent snapshot.
A
If you snapshot directly, you might not get data consistency. So the user must prepare the application manually before taking a snapshot: they can quiesce their application, or even freeze the filesystem, or, to be the safest, unmount the filesystem in some cases, and then they can ensure the snapshot taken through the API is consistent. It's also possible for us to provide some support.
A
For example, we could have the controller take care of, let's say, the filesystem freeze before taking snapshots, and also add some hook for application freeze. Right now the application is running in containers in a pod, and the kubelet is responsible for the lifecycle of the application and the pod. So in order to provide such a hook, the kubelet could watch them.
A
That hook would need to be specified in the pod spec, and the kubelet could be responsible for executing some commands to prepare the application before taking snapshots. It's definitely not an easy task to provide such support. So I want to hear, as a general idea, whether we should think about providing this, or whether it's not even necessary. The benefit I can see so far is...
D
The challenges are all the things that you mentioned. One more thing to keep in mind as we design this: the feedback that we have from SIG Architecture is to ensure that the way we do this is generic enough that it can be reused for potentially other types of features and other lifecycle hooks.
C
One of the big challenges with these kinds of filesystem-freezing things is controlling the timing: how long is it frozen for, how does it time out, how do you make sure that it's frozen for the shortest amount of time? We had, I think it was in the regular CSI meeting, a general discussion about deadlines and snapshots and how long you have to take them.
C
It gets even worse here, because now your application is actually halted until the snapshot gets taken, so you've got to have a way to put a time bound on it. If, for whatever reason, the snapshot can't get taken, you have to remember to unfreeze and fail the whole operation. We've got gRPC going on, and it really seems relatively poorly suited to doing things in a time-bounded way.
A
So in the API for volume snapshots, right now we do consider this timing. We have a ready flag for this; no, not the ready flag, actually the creation time. When the snapshot is cut, we record the cut time, and that indicates that the application can be resumed, so a controller should be able to detect when the application can be resumed. And if your create snapshot fails, then we have an error, and the error state will specify: okay, you failed to create the snapshot.
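The two-phase status just described, with the cut time recorded first, readiness only after upload, and an error state on failure, can be sketched as a toy model. The field names are approximations of the snapshot status being discussed, not its actual schema.

```python
import time

class SnapshotStatus:
    """Toy model of the two-phase snapshot status discussed above.

    creation_time is set once the snapshot is cut (from that point the
    application can be unfrozen and resumed), while ready flips only after
    the possibly much slower upload completes. Names are approximate.
    """
    def __init__(self):
        self.creation_time = None   # set when the snapshot is cut
        self.ready = False          # set when upload/post-processing is done
        self.error = None           # set if the create fails

    def mark_cut(self, when=None):
        self.creation_time = when if when is not None else time.time()

    def mark_ready(self):
        self.ready = True

    def mark_failed(self, message):
        self.error = message

    def can_resume_application(self):
        # The app may resume as soon as the snapshot is cut, or as soon as
        # the create has definitively failed; it need not wait for upload.
        return self.creation_time is not None or self.error is not None
```

A controller watching these fields would unfreeze the application on `can_resume_application()` rather than waiting for `ready`.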
C
But as the user of the application, I might say: I'm willing to hold my application for up to 30 seconds to take a consistent snapshot. If, for whatever reason, it's going to take longer than 30 seconds, I'm going to give up and let my application start running again, so the users on the other end don't get upset. So you need a mechanism to say: well, the snapshot is happening, it's just going too slowly, so I need to abort and get my application up and running, because I'm not willing to wait for it.
D
Part of whatever this lifecycle hook is going to be is that we're going to have to define what that period of time is: whether it's fixed or static, or something that's negotiated. And if it's negotiated, is it negotiated between the application and the storage system and Kubernetes? Who are the players involved? How do we do that?
D
So I think all of that needs to be thought through. But the natural evolution of this feature is that if we want to get to a point where we're able to take application-consistent snapshots, we need to have some sort of hook into the application; it's not going to work without it. I completely agree with the challenges that you laid out, though, and we'll have to think through those.
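The time-bounded freeze/cut/unfreeze flow the group is debating might look roughly like the sketch below. Everything in it is hypothetical: the hook callables, who owns them, and the timeout value are exactly the open design questions above.

```python
# Sketch of a time-bounded freeze -> cut snapshot -> unfreeze flow.
# The freeze/cut/unfreeze callables and the 30s default are assumptions,
# not part of any existing Kubernetes or CSI API.
import threading

def snapshot_with_freeze(freeze, cut_snapshot, unfreeze, timeout_s=30.0):
    """Freeze, try to cut a snapshot within timeout_s, and always unfreeze.

    Returns (ok, result). If the cut does not finish in time we give up and
    unfreeze so the application resumes; the snapshot attempt may still be
    running in the background and would need cleanup by the caller.
    """
    freeze()
    try:
        result = {}
        done = threading.Event()

        def worker():
            try:
                result["snapshot"] = cut_snapshot()
            except Exception as exc:  # surface failures to the caller
                result["error"] = exc
            done.set()

        threading.Thread(target=worker, daemon=True).start()
        if not done.wait(timeout_s):
            return False, "deadline exceeded; aborting so the app can resume"
        if "error" in result:
            return False, str(result["error"])
        return True, result["snapshot"]
    finally:
        unfreeze()  # never leave the filesystem/application frozen
```

The `finally` block is the crux of the concern raised above: whatever else happens, the application must be unfrozen within a bounded time.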
A
Multiple pods can use a volume, right? We would have to potentially prepare multiple pods at the same time and then freeze them and unfreeze them, those kinds of consistency issues. But right now we already have two kinds of timing to think about. One is for create snapshot: we also mentioned that for create snapshot we could have a period of time you are willing to wait while trying to take a snapshot, right? We don't have that feature yet, and then...
D
The underlying concern that Ben is raising is around whether we have a time bound, and I think this has come up on this call in previous meetings as well: whether we time-bound how long a snapshot can take. Previously this was discussed in the context of retries. What happens if the snapshot-taking process is taking a long time and is still in progress, but is beyond the deadline?
D
What do we do then? It's worth thinking through how we define that in the CSI spec. Maybe we can, because these calls are supposed to be synchronous, treat the termination of the connection from the caller side as an indication to the storage system. And this is just me throwing out ideas: yes, it could be an indication that "hey, I give up; I requested this snapshot, but you took too long, so do whatever you need to do to clean it up, I don't need it anymore."
D
Maybe we should make that explicit in the spec, that if a timeout happens... but the problem is the spec says these calls are supposed to be idempotent, and I believe create snapshot also says that. So then, ultimately, the caller would be responsible for doing the cleanup regardless. I think it's worth taking a note and thinking through how we can clarify the spec and what can be added to handle this. I think all of this will need to go in before GA.
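The idempotency point can be illustrated with a toy backend. Because create and delete don't change the outcome when repeated, a caller that timed out can safely retry or clean up afterwards. This is an illustration of the property, not the CSI API itself.

```python
# Toy backend showing why idempotent create/delete makes caller-driven
# cleanup after a timeout safe. Not the CSI API; names are illustrative.
class ToyBackend:
    def __init__(self):
        self.snapshots = {}

    def create_snapshot(self, name, source_volume):
        """Idempotent: repeating the call with the same name returns the
        existing snapshot instead of creating a duplicate."""
        if name not in self.snapshots:
            self.snapshots[name] = {"name": name, "source": source_volume}
        return self.snapshots[name]

    def delete_snapshot(self, name):
        """Idempotent: deleting a snapshot that never completed, or was
        already deleted, still succeeds, which is what cleanup after a
        timed-out create relies on."""
        self.snapshots.pop(name, None)
```

A caller whose create call timed out can thus either retry the create (getting the same snapshot if it did complete) or issue a delete (which succeeds either way).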
A
Okay, sure, so we'll think through all these things, especially the process in the CSI spec and whether we need something special, and also explain it in more detail. That was a good discussion. That's what I have so far, so I think next time we will discuss it in a bit more detail. It's probably not required for this quarter, but it's definitely worth starting the discussion as early as possible, since it may require big changes in the snapshot API and maybe also in the CSI spec. Yeah.