Description
Kubernetes Data Protection WG - Bi-Weekly Meeting - 30 November 2022
Meeting Notes/Agenda: -
Find out more about the DP WG here:
https://github.com/kubernetes/community/tree/master/wg-data-protection
Moderator: Xing Yang (VMware)
A
Hello, everyone. Today is November 30th, 2022. This is the Kubernetes Data Protection WG meeting. I think today's main topic is: Yvonne is going to give an update on CBT, and I know there are a few others, Prasad and Dave Carr, who also have some thoughts about this, so we can start talking about this. Yvonne, do you have anything to share? Okay. So let me... actually, I'll stop sharing and I'll make you host.
B
Yeah, thanks, Sam. I think, yeah... Prasad, you know, you put some time and effort into doing a prototype of the bitmap approach, right. Do you want to share the findings with the group? Any thoughts on that?
A
Do you have anything, like any link, to share? It'd be good, you know, if I had anything to show on the screen, rather than people just staring at this blank screen.
C
All right, I hope I'm sharing the right screen. Yeah, sure, yeah. So the thing we wanted to experiment with is to visualize what the bitmap would look like for the changed block response we get from EBS, and how we can serialize and send the response, and, at the client side, how we can deserialize and, you know, get the changed blocks' data back again, right. So we basically used mock data. So this is how we get the EBS response, the EBS changed block response, right.
C
It is based on real data, yeah. I had taken two EBS snapshots and, you know, used the EBS CLI.
C
The tokens are redacted here; they are not actual tokens. So this is how we get the response, and, yeah, we build the bitmap from it. So, again, this is just... I would not say it is the best way of doing this; we are just experimenting, so I believe this is okay.
C
So, from the response, we try to get the number of blocks, and since we can represent each block with one bit, the total bytes required would be the total number of blocks divided by eight; I mean, one byte can represent eight blocks, right, one bit per block. So, likewise, we can build the bit vector. We will iterate through all the changed blocks.
C
For each changed index, we will set the bit to one; otherwise, by default, it will be zero. And, yeah, we then serialize it, and the actual response will be sent as bytes. To deserialize, basically, from the bytes we got in the response, we try to build the bit vector again, so that we can then iterate over the bit vector and get a sense of which blocks have changed, right.
C
So, based on the index, we can then find the data: once we, you know, restore the volume, we have the metadata about which blocks have changed, and we can fetch the data from the volume. And to visualize it, yeah, so for the mock data, this is how the bit vector would look.
C
I don't think there is an option to unwrap this level. Yeah, so the response would be in bytes, so it will consume only the total number of blocks divided by eight bytes, and then, obviously, we can send this bytes response, deserialize it, and rebuild the bit vector to get the changed block information again. And, yeah, so Carl pointed out that this approach may not work, because we can encode only the indexes. This approach may not be extendable if we want to go into the data, if we want to adopt data paths in the future.
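A minimal sketch, in Go, of the serialize/deserialize round trip described above. The block indexes and sizes are hypothetical; this illustrates the bitmap encoding, not the actual experiment code:

```go
package main

import "fmt"

// toBitmap sets one bit per changed block index; one byte covers eight blocks.
func toBitmap(changed []int, totalBlocks int) []byte {
	buf := make([]byte, (totalBlocks+7)/8)
	for _, idx := range changed {
		buf[idx/8] |= 1 << (idx % 8)
	}
	return buf
}

// fromBitmap recovers the changed block indexes from the serialized bytes.
func fromBitmap(buf []byte, totalBlocks int) []int {
	var changed []int
	for idx := 0; idx < totalBlocks; idx++ {
		if buf[idx/8]&(1<<(idx%8)) != 0 {
			changed = append(changed, idx)
		}
	}
	return changed
}

func main() {
	// Hypothetical response: blocks 1, 3, and 9 changed out of 10 total,
	// so the payload is only ceil(10/8) = 2 bytes.
	payload := toBitmap([]int{1, 3, 9}, 10)
	fmt.Printf("10 blocks -> %d byte(s): %08b\n", len(payload), payload)
	fmt.Println("decoded:", fromBitmap(payload, 10)) // [1 3 9]
}
```

As noted above, this encodes only block indexes, which is why it may not extend to carrying actual block data later.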
B
Thanks, Prasad. Yeah, thanks for putting in the time to investigate this. Yeah, I agree with your assessment there. I think it does require the backup software to make some assumptions about...
B
...the bits, meaning, you know, I guess, at the end of the day, all the backup software can see is a string of bits or bytes, or whatever basic type we use there, but then it has to make some assumptions, because, in your test, right, the block indexes were one, two, three, four, five, six, seven, eight, nine, ten; but, as I remember, EBS has its own logical offsets. It might mean who knows what; only the EBS backend understands it, right.
B
So the backup software would be... we would be building the bitmap inside the aggregated API server, or controller, or whatever we choose, and then the backup software would be deconstructing it, reconstructing it, you know, trying to make meaning out of it.
B
On the other end, there are some assumptions there which, you know... as much as I think this is simple and great, I'm just not comfortable with the assumptions that have to be made on both ends. Yep, cool, okay. Anything else? Okay, yeah. So, in parallel, Jan and I have been exploring, you know, a possible alternative and, yet again, I...
B
...think I shared this thought with you, you know, last week, and it might be doable. The idea... I'm gonna share my screen, I guess. The idea is: instead of forcing the payloads down the wire, into the network and through the Kubernetes control plane, and crossing our fingers and hoping for the best, why can't we just write the data into a persistent volume and let the backup software decide when they want to mount it?
B
Show me the changed blocks between, again, this pair of snapshots; and the secret reference is really just, you know, a CSI requirement. So one interesting... the first interesting change that we want to propose is: instead of your maximum number of blocks, we ask the user to input the maximum size, in bytes, to be returned; the maximum size, in bytes, of the payload to be returned. Because I think we discussed this on Slack as well, right.
B
The number of blocks is really determined by the block size which, as we all know, may vary from provider to provider. You know, when a user says, hey, the maximum number of blocks that I'm willing to handle is 10,000...
B
in
a
way
like
it's
kind
of
meaningless
right,
because,
like
it,
ten
thousand
like
can
mean
something
else
from
another
provider's
perspective
when
their
box
size
is
512
kilobytes,
for
example,
in
EBS
another
provider
with
four
kilobyte
block
size,
you
know,
like
the
number
of
blocks
would
increase
like
significantly
so
instead
of
yeah
that
why
don't
we
just
enforce,
like
the
payload
total
payload
size
is
expressing
bytes.
You
know
anything
more
than
that,
like
we
were
just
not
want
to
receive
it
or
not
want
to
return
it
to
the
backup
software.
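A quick worked example of why a raw block count is ambiguous. The 10,000-block figure comes from the discussion above; the block sizes and arithmetic are illustrative:

```go
package main

import "fmt"

func main() {
	const blocks = 10_000
	// The same block count covers very different amounts of volume data
	// depending on the provider's block size (512 KiB vs. 4 KiB here).
	for _, blockSize := range []int64{512 * 1024, 4 * 1024} {
		covered := blocks * blockSize
		fmt.Printf("%d blocks x %d KiB = %d MiB covered\n",
			blocks, blockSize/1024, covered/(1024*1024))
	}
}
```

At 512 KiB per block, 10,000 blocks cover roughly 5,000 MiB; at 4 KiB, only about 39 MiB, which is why a byte limit is the more portable contract.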
B
So, timeout: pretty intuitive, I think. The main, the first magical ingredient here, I guess, is a property that allows the user to say: hey, don't send me the data over the network, because I don't want you to take down my Kubernetes control plane; instead, write it to this PVC. Provision this PVC, use a volume populator to inject, or write, all the data into the underlying volume, and then, when I'm ready, I'll spin up a pod to read it from there.
B
So, as we can tell, right, these properties are very familiar. I think we can go into more discussion about what kinds of configurable PVC properties we want to expose to the user, but some of the more common ones include the name and the namespace. So the user tells the CBT CSI driver, or the CSI sidecar: I want you to write the CBT metadata into this PVC, in this namespace, with this access mode, you know, ReadWriteOnce, stuff like that.
B
How the backup software chooses to access it, meaning, once this PVC is ready, only one pod can access this PVC. The reclaim policy: whether they want to delete it or retain it; you know, pretty standard API and configuration there. Resource requests for storage, you know, how big the PVC should be, and the storage class to use. And then, in return, so you can imagine, right...
B
...we can even do this without an aggregated API server; we just need a good old CRD controller. So users create this VolumeSnapshotDelta resource; underneath, it invokes the API, and then the CSI sidecar does its thing and, in return, updates the status with all the information of the relevant operation: you know, the number of blocks that we found, the block size that the provider covers.
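To make the shape of this proposal concrete, here is a minimal sketch of what such a resource could look like as Go API types. Every name and field below is hypothetical, inferred from the discussion, not an agreed API:

```go
// Hypothetical API types sketching the proposed resource; not an agreed API.
package v1alpha1

import metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"

// VolumeSnapshotDeltaSpec asks for the changed blocks between a pair of
// snapshots and tells the sidecar where to write the resulting CBT metadata.
type VolumeSnapshotDeltaSpec struct {
	BaseVolumeSnapshotName   string `json:"baseVolumeSnapshotName"`
	TargetVolumeSnapshotName string `json:"targetVolumeSnapshotName"`

	// SecretName/SecretNamespace carry the CSI credential requirement.
	SecretName      string `json:"secretName,omitempty"`
	SecretNamespace string `json:"secretNamespace,omitempty"`

	// MaxSizeBytes caps the payload in bytes, rather than using a
	// provider-dependent maximum number of blocks.
	MaxSizeBytes   int64 `json:"maxSizeBytes,omitempty"`
	TimeoutSeconds int64 `json:"timeoutSeconds,omitempty"`

	// Where the volume populator should write the metadata.
	PVCName          string   `json:"pvcName"`
	PVCNamespace     string   `json:"pvcNamespace"`
	AccessModes      []string `json:"accessModes,omitempty"` // e.g. ReadWriteOnce
	StorageClassName string   `json:"storageClassName,omitempty"`
	RequestedStorage string   `json:"requestedStorage,omitempty"` // e.g. "1Gi"
}

// VolumeSnapshotDeltaStatus is updated by the CRD controller / CSI sidecar.
type VolumeSnapshotDeltaStatus struct {
	BlockSizeBytes int64  `json:"blockSizeBytes,omitempty"`
	NumBlocks      int64  `json:"numBlocks,omitempty"`
	Error          string `json:"error,omitempty"`
}

type VolumeSnapshotDelta struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   VolumeSnapshotDeltaSpec   `json:"spec"`
	Status VolumeSnapshotDeltaStatus `json:"status,omitempty"`
}
```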
B
Yeah, so we can go into... you know, it's one of those questions around: do we want to let the user decide the volume populator, or can our CSI sidecar decide the volume populator? The second slide deals with that. But let me first get people's...
B
You
know
feelings
about,
at
least
from
the
API
consumption
level.
Does
that
like
makes
sense,
you
know
if.
A
If this doesn't make sense... well, I just wonder, because creating a volume could be slow. So that's the only thing that, you know, jumps out in my mind right now. I don't know if it will add some performance penalty. Yeah.
B
There's a bit of a trade-off there, right, I think. Someone did the math on Slack or something: if it is, like, one terabyte of volume, I think the CBT metadata might be in the magnitude of hundreds...
A
It's the creation of the volume itself, right; it could take time to create a volume. Basically, every time we do this, we want to create a new volume, right. I'm just saying this will add some performance penalty.
B
Yeah,
you
know
like
it's
definitely
a
good
factor
to
to
consider,
but
there's
some
trade-offs
there
to
be
to
be
made
at
least
like
because,
like
the
main
like
I
guess,
like
you
know,
pushback
was
like
you
know.
We
don't
want
to.
We
don't
want
to
like
put
the
kubernetes
API
server
at
risk.
You
know,
as
we
flow
all
this
data
back
to
the
backup
software,
which
would
be
the
case
if
we
utilize
an
aggregated,
ABS
server
or
just
shove.
All
the
entries
into
yeah.
A
We probably just need to do some tests and see, like, how much... yeah.
B
Yeah, I think, at least from my perspective, this makes sense because, for one, you know, we as a community are familiar with PVCs; we know how provisioning works. We know how...
A
Yeah, but I think... I guess the reason for having CBT is to, you know, make it more efficient, right. But then, by using this approach, we're also adding some performance penalty here. I'm not saying this is the... I just want to say this is something that we need to consider, right. We need to look at the pros and cons of each approach. Yep, so.
E
So I think we're starting to design for the worst case, and that's kind of skewing everything at the moment. So one thing to remember is: CBT is an optimization. If you don't have CBT, everything works just fine; it's just slower, right.
E
Yeah, so, you know, even in cases where... so, for example, when you're using CBT, the worst case would be, say, every other block has changed. I'm not sure that CBT actually buys you anything in that situation, because reading every other block versus just reading every block... yeah, in theory it's half the I/Os, I guess; with SSDs it's not so bad, but you're not really saving that much. You know, there's a point where you just go: yeah, just read the whole volume.
E
It's not that much worse, because you may be reading in pretty big chunks. So I think we want to be careful about that. We can certainly put some limits and say, you know, if more than X percent of the disk has been changed, just read the whole thing; so that's one option. Mm-hmm. I think we should think about that and discuss it a bit, as to where the cutoff point is where CBT is no longer buying us anything, right.
B
I think that's a good point and, again, right, the main pushback that we got is still the data flowing through the network, through the Kubernetes control plane.
B
So,
like
you
know
it's
a
matter
of
like
the
so
it
sounds
like
yeah.
You
know
like
there
will
be
some.
So
if
we
go
down
the
path
of
like
trying
to
optimize
things
and
there
would
be
some,
then
we
have
to
be
explicit
about
like
the
supported
case
and
the
non-supported
case.
I
think
right.
E
No, because you simply come back and you say, for example, everything changed, right. That's one option: you return an extent that says everything changed, and then, you know, the backup software just deals with it like that.
B
So, yeah... so that, again, is the worst-case scenario, where everything changed, no? So how would we, regardless, still send the data back to the user?
E
That
that's
that's
not
what
I'm
trying
to
solve
there.
What
I'm
trying
to
say,
though,
is
that
we
don't
we
we
want
to
be.
We
want
to
be
solving
in
such
a
way
that
we
get
a
boost
most
of
the
time
if
we
are
doing
things
that
give
us
a
boost,
only
10
of
the
time,
but
we've
designed
for
like
the
worst
kit,
but
because,
because
we've
designed
for
the
worst
case,
we're
not
really
winning
yeah.
B
In a good-case scenario, I feel like this proposed solution will still work well.
D
It is all overhead, right; I mean, we have to take it into consideration as a factor of the overall data transfer for the backup. So it's a bit murky at this point how we evaluate it. But I agree with Dave, you know, that we should observe and make comments about the worst case. But, again, you know, it's targeted towards the average case. Maybe in the API spec there could be some thresholds about...
B
Yeah, I think, for the first step, it's just really text; you know, we're expecting just a text payload, so just write it to, like, a filesystem PVC.
B
Yeah, well, and then it'll be up to the backup software to decide how they want to consume it, right: do they want to consume a filesystem PVC or a block PVC?
B
From the CSI driver perspective, I think... would you agree that it should be a user-configurable property at the API level?
E
Those are pretty different code paths, right, because, you know, getting the file system is one thing, but then you're going to have completely different code to write to a file, up to a point, than when you are writing to the raw device and reading from it. Also, don't forget about the possibility of having Windows worker nodes, yeah. So that was something that's...
D
While we're at it: so this is then forcing the backup application to launch another pod to attach this PVC, right, so that it can be read. Yes, you're forcing a behavioral change in the backup application: whereas before you were just using APIs, now they have to spin up another pod, which itself takes time, besides, you know, the dynamic allocation, all these things. So, yeah, there's a lot of overhead, and maybe we could just classify what the overhead is, and then we can understand it better. I'm...
B
So those are good points, right. So, two things here, right: whether it is file system or block. Hopefully, because, with the volume populator API, there's nothing stopping us from having different kinds of volume populators to handle different types of population for different volume types; that's one thing. And, secondly, regarding the pod: you know, again, excellent point, right...
B
But
you
know
if,
if
we
like,
just
like
soon,
I
mean
like,
if
we
like
take
consider,
we
try
again
right,
try
not
to
like
step
into
that.
They
have
half
too
much.
But
once
like,
eventually
like
this
backup,
software
will
need
to
spin
up
a
part
to
do
to
to
to
to
get
real
meaning,
meaningful
operation.
Out
of
this
right,
it
would
need
to
spin
up
a
part
to
do
the
data
path
things,
and
this
way
it's
going
to
apply
like
all
the
CBT
metadata.
B
So, yeah, I feel like what we want to propose for the first step is: you have the sidecar container, which is both a controller as well as a volume populator. So the controller will respond to the VolumeSnapshotDelta request, and then, as the CBT payload flows back in, the volume populator within the sidecar...
B
...you know, a separate process right now... and then it would just go ahead and create a persistent volume claim, and then it would define the data source reference that points to a CR, or CRD, that will have all the information, obviously.
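A minimal sketch of the kind of PVC the sidecar might create, with a data source reference pointing at the hypothetical VolumeSnapshotDelta resource from the earlier sketch. This assumes the k8s.io/api of roughly this era (~v1.25), where PersistentVolumeClaimSpec.DataSourceRef is a *TypedLocalObjectReference; the names and API group are illustrative:

```go
package cbt

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// cbtMetadataPVC builds the claim the volume populator will fill with CBT
// metadata. Name, namespace, storage class, and API group are hypothetical.
func cbtMetadataPVC() *corev1.PersistentVolumeClaim {
	apiGroup := "cbt.example.k8s.io"
	storageClass := "standard"
	return &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "cbt-metadata", Namespace: "backup"},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes:      []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			StorageClassName: &storageClass,
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{
					corev1.ResourceStorage: resource.MustParse("1Gi"),
				},
			},
			// Tells the populator which snapshot delta this PVC is for.
			DataSourceRef: &corev1.TypedLocalObjectReference{
				APIGroup: &apiGroup,
				Kind:     "VolumeSnapshotDelta",
				Name:     "my-delta",
			},
		},
	}
}
```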
B
So, of course, this CR doesn't have to be... it's not something that the backup software needs to be aware of, so it would be, you know, created along with the PVC, as would the changed-block output, however we want to name it; it would be something that the sidecar...
B
Yeah, yeah, I mean, you know, the good thing about the volume populator is... I mean, it's still the same underneath; at least for alpha, it's still the same code, around, like, you know, we need to write it into some sort of ephemeral storage.
B
Yeah, I'm sorry... so, yeah, I think, by putting it behind an API, there is an opportunity to implement different kinds of volume populators in the future. So nothing...
B
...to your concerns around different code paths, around whether this is a file system or a block PVC that gets provisioned. But at least, so, I... yeah.
B
The concern is around optimization, like whether this would slow it down significantly, I think. But at least, I think, there are a couple of things here that I personally like. It's the idea of, again, right, going back to that user experience: users, we, are familiar with how PVCs work; you know, we are familiar with the security around PVCs.
B
We
are
not
at
risk
of,
like
you,
know,
shuffling
Downs,
like
tons
of
traffic
through
the
control
plane
and
also
like
with
a
volume
populator
there's
like
flexibility
around
how
we
want
to
extend
like
things.
So,
if
we
put
things
into
ephemeral
storage
that
can
serve
as
a
cash,
you
know
when
user
resubmit,
like
a
volume,
populated
snapshot
again
with
the
same
base
and
Target.
We
know
oh
yeah,
we've
seen
this
before.
You
know
like
we're.
E
I feel like we've talked about adding a data path, yeah.
B
I guess so, yeah. I think, on the data paths... so, definitely, you know, we've talked about this multiple times, right; there are different implementations of data paths. So far, you know, I feel like the common, generic one is the one that is detailed in the data protection white paper, where we have some sort of data mover pod that does some mounting of the PVC, of the restored PVC.
E
So, if you go to, say, vSphere, then that means that it actually copies every block out of the snapshot into a new volume before you can even mount it, before you can even get access to the data. So, at that point, you've already blown away, you know, all of your advantages from CBT and everything else; you've doubled the amount of I/Os.
B
Mm-hmm. So, yeah, I think this approach doesn't stop the out-of-band data path that you folks want to do, right?
E
Well,
it
doesn't
add
anything
for
it,
though,
because
we
can't
use
so
if
we
use
this,
then
you
know
we're
basically
not
gaining
much
so
I
think
what
we'd
like
to
look
towards
is
eventually
having
a
network
data
path
that
is
common
that
sits
on
top
of
a
bunch
of
different
network
data
paths,
but
brings
them
together
into
a
common
into
a
common
protocol.
B
So, for the sake of discussion: instead of provisioning, you know, the PVC as a disk in the cloud, or a volume in the cloud, would it help if we folks just provisioned an ephemeral one, so it would just be local on the node, right?
F
Yeah, I mean, ultimately, that would be the ideal solution, right: we have a data path included along with this metadata. So, one question... sorry, I missed that part: what was the concern around how much traffic is okay for using the aggregated API server?
B
If a user has a super-buffed cluster control plane, then they may have some higher threshold; but, you know, if they're in some sort of confined, resource-restricted environment... Because, if you can imagine, even using EBS direct APIs as an example, I think it limits you to, like, maybe a couple of thousand blocks, and then the backup software will be like: okay, now give me more, give me more, give me more, because...
F
So, what we were discussing, and that was brought up earlier: say more than 10 or 20 percent of it has changed; it will say, okay, back up the whole volume. Would that address the concern of...
B
...overwhelming there? Yeah, so then that would be up to the backup software to decide, right; so then the implementation at the CSI sidecar becomes...
B
That
the
CSI
site
card
will
I,
don't
think
CSI
can
enforce.
That.
Can
it
because,
like
the
backup
software
optimally,
has
to
decide
okay,
how
much
my
threshold
is
well.
E
How much is... well, no, because it's the CSI driver returning the list of changed blocks, and so we could set a threshold in the CSI driver that says: we're not going to return more than X amount of data. If it's more than that, we simply return a single extent that says everything changed.
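A minimal sketch, in Go, of the driver-side fallback being described; the types and threshold are hypothetical, not an agreed API:

```go
package cbt

// Extent marks a changed byte range of the volume.
type Extent struct {
	Offset int64 // starting byte offset
	Length int64 // length in bytes
}

// capChangedExtents returns the extents as-is while the changed fraction is
// under the threshold; beyond it, CBT no longer buys much, so it collapses
// the result into one "everything changed" extent and lets the backup
// software fall back to a full read of the volume.
func capChangedExtents(extents []Extent, volumeSize int64, threshold float64) []Extent {
	var changed int64
	for _, e := range extents {
		changed += e.Length
	}
	if float64(changed) > threshold*float64(volumeSize) {
		return []Extent{{Offset: 0, Length: volumeSize}}
	}
	return extents
}
```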
E
Is it a security issue, though? I mean, this is a system performance issue, and if someone breaks it... I mean, people can break things all the time, and there are plenty of ways to break the system, and the...
A
We need to... so, if we have some way to say: maybe our sidecar, our CSI controller, can have this limit, so at least if they use our controllers, they cannot go beyond this limit, or something. If we can agree upon some limit, maybe we can talk to them again and see, yeah. So what is this upper limit, I guess? I'm not sure what limit is acceptable.
B
I think there are two things there, right. First of all, the most CSI can say, at least in my opinion, is: okay, I'm going to only return you X amount of data. It can't take the extra step, like what Shukrano was saying, and say, hey, you know, I'll do a full backup; you're better off doing it that way.
E
No, no... all we're returning is a list of extents, so all we have to do is just say an extent of everything, or an extent of everything from here on. Even, you know, if we know the size of the volume, we could even inject the record: we could say, yeah, you've returned us a thousand records; number 1,001 is going to be everything else, right. It adds one more record and then discards the rest that's coming back from the CSI driver.
E
We can't presuppose the intelligence of the backup software to...
D
Well, that's a VMware issue, though, right: VMware returns extents, but our API here is giving changed blocks. So that means we'd have to map the extents.
D
Changed block X plus so many bytes, yeah.
B
So, okay, so, yeah... so, I think... what was your question earlier?
F
What I was asking was: this discussion started because of the concern with the amount of traffic flowing through the control plane; but if we put some upper cap on that traffic, would that be an acceptable solution? Yes...
B
So, going back to... yeah, so, yeah, I guess this aligns with what was being asked earlier, I think. Yeah, I kind of floated this comment around in Slack as well. I think maybe it makes sense to say, hey, for alpha, you know, we, I don't know, support maybe up to 10 terabytes of volumes, or something like that.
B
Some
sort
of
arbitrary
threshold
like
for
Alpha
just
to
see
like
how
user
use
it
to
get
a
sense
of
like
you
know
when
the
problem
is
the
road.
What
it
really
looks
like
so
and
then
I
guess,
to
kind
of
to
your
point
like
say:
if
from
a
design
architecture
perspective,
say
if
we
go
in
and
say
hey,
you
know
what
like
we
did.
B
Our
math
like
we're
gonna
only
support
like
maybe
up
to
10
terabytes
of
volumes,
which
equates
to
a
couple
megabytes
of
metadata
or
worst
case
in
a
one
gigabyte
of
metadata,
the
other
Factor
there
and
fortunately,
unfortunately,
depending
on
how
we
look
at
it
is
a
a
lot
of
it
depends
on
block
size
right.
So
if
you
sell
like
one
gig
of
metadata,
if
the
block
size
vary
from
provider
to
provider,
then
you're
going
to
have
different
like
pagination
Behavior
different
request
response
pairs.
B
Then, yeah, further to the first part, right, there's the second part: the return path back to the user goes through the Kubernetes API server, and then you'd be like, okay, you know, so the block size, right; I don't know off the top of my head what the math is going to be, but one gig of metadata, it can be five...
B
...thousand blocks, it can be ten thousand blocks, depending on the block size, and all of these have implications for how many requests and responses are going to flow through the Kubernetes control plane, which, again, is the main concern. I don't think the architecture cares about how we implement it, at the end of the day, as long as we can take it out of the control plane's path, because we also don't want our component to be the one that's responsible for taking down people's servers, right.
E
There are... no, in the CSI driver, even so: if I want to convert, say, from 512K blocks to one-megabyte blocks, it's pretty easy; I can aggregate them together. Or, vice versa, if I want to say, oh, I'm going to return them in 4K blocks and I get a 512K block back, you just return, you know, the corresponding 4K blocks, 128 of them.
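A minimal sketch of that block-size conversion, with a hypothetical helper; splitting one changed 512 KiB block into 4 KiB blocks yields 128 child indexes, and aggregation would simply go the other way:

```go
package cbt

// changedIndexesAt re-expresses one changed block, recorded at the provider's
// native block size, as the equivalent indexes at the size the caller asked
// for. Splitting (e.g. 512 KiB -> 4 KiB) yields nativeSize/targetSize child
// indexes; assumes nativeSize is a multiple of targetSize.
func changedIndexesAt(idx, nativeSize, targetSize int64) []int64 {
	n := nativeSize / targetSize
	out := make([]int64, 0, n)
	first := idx * n
	for i := int64(0); i < n; i++ {
		out = append(out, first+i)
	}
	return out
}
```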
B
I think we just want it to be, again, generic enough to cover the 80 or 90 percent, not a specific provider aspect here. I feel like, also, again, right, just for the sake of discussion: it feels like we're taking on a lot more complex logic than "I just want to provision a volume and write it in there", you know. And listen, right now, that's my way of...
D
If I'm new to the scene and I get this, right: I want to just inherit efficient backup without investing a lot of time and logic into trying to understand the low-level infrastructure; just say "Kubernetes, feed me", right. That's the goal. So I don't know whether it makes very much sense without looking at a holistic solution.
D
I know we've parceled it out into the metadata of CBT, and then talked about the data blocks, but I'm just tossing this out there: does it make sense, in all situations, to treat this as a split-brain mechanism, or do we also have to look at the data path, to get the changed blocks, the actual blocks, as part of the solution?
E
I think it's a fair point, Carl. I think the reason we probably went down this, you know, bit-by-bit path is just that it is incremental, and getting the data path for the actual data... I mean, that is a large amount of stuff to move, and we'd wind up in a bunch of issues there. So that's probably, you know, why it hasn't really been on the table; but maybe it does need to be.
A
But we are... okay, we plan on adding this into the CSI spec, which is expected to be something for the control path, not for the data path. So I think we need to address this first; okay, we can't add too many new things in there.
A
So you're okay with this volume populator approach?
D
I don't totally understand it, but I'm okay. I do have to consider the data block movement, because, ultimately, if I'm actually...
B
Yeah, I mean, I agree with that, right: at the end of the day, yeah, we don't want to implement anything on the data path; but without thinking about how the user consumes this, if there's no feasible way, or at least no optimized way...
A
In the grand scheme of things, yeah, we don't have to, you know, describe how, you know, people will be using this; you know, after retrieving this, how they do the data part, right. But I'm just saying, we will not be providing a common API.
B
It could be that everything flows through the network, including the metadata plus the data blocks, so everything just gets put into a PVC, the metadata and the data blocks, eventually, from a backup software perspective, yeah. So, in the grand scheme of things, what is the total overhead there, I think? The other thing is also: if you want to talk about performance optimization, the network bandwidth isn't, right... it isn't without its overhead; maybe it's just less obvious until...
D
Yeah, but there are many, many networks, right. I mean, the device channel: if the vendor were to implement this with some spec, the vendor could use internal, dedicated device networks to move data, as opposed to whatever we have for the inter-process communication on the Kubernetes side, right.
A
Yeah, we only have one minute left, okay. So, how do we go from here? So now we have this new approach: does it make sense to move forward with the POC and see how that works? Or should we also come up with this upper limit and try to see if we can go back and talk to the API reviewers about this aggregated API?
B
I think, well, at least personally, I would like to give this a shot, to get some feedback from people like Jeff and David. I still think, one way or the other, something has to give; we can't...
A
Okay, so you want to do a POC with this. Maybe you can do a POC, and then, yeah, we can look at that next.