From YouTube: Kubernetes SIG Storage - Bi-Weekly Meeting 2023-07-27
Description
Kubernetes Storage Special-Interest-Group (SIG) Bi-Weekly Meeting - 27 July 2023
Meeting Notes/Agenda: -
Find out more about the Storage SIG here: https://github.com/kubernetes/community/tree/master/sig-storage
Moderator: Xing Yang (VMware)
A
Hello everyone, today is July 27th, 2023. This is the Kubernetes Storage SIG meeting. So today we are going to go over our 1.28 planning spreadsheet, and then I think there is one item we really want to go over. We have just passed the test freeze deadline on Tuesday, so the next deadline is August 8th; that's the docs complete and reviewed deadline.
A
Okay, I think you just need to update it; there is already a doc in place that was added earlier. Okay, so it can be done offline or not.
A
Yeah, thanks. Okay, yeah, please just share that in the chat. Thank you. Our next one is additional metrics; this is still work in progress, I don't have a new update on that.
A
And changed block tracking: we had a meeting yesterday in the data protection working group. We went over the changes; basically the team went to SIG Auth and got some recommendations from them. Shane?
G
Are you talking to me? Yeah, I am struggling to get my Bluetooth headphones to work. It's just...
G
So tell me which one it was again? Sorry, I was... oh.
G
Yesterday, yeah. I mean, we did another review of it yesterday in the data protection working group meeting, and the big update was that the developers working on it had talked to SIG Auth, and SIG Auth had proposed an entirely different authentication scheme which is way better and way simpler. I think we were all pretty happy with it, and we spent the meeting discussing a new way to do access control based on the new authentication mechanism, and I think we basically reached agreement on that, so it's good progress. I don't know where the CSI spec change for that stands; I think we should work on that soon. And I don't know where the KEP stands, but the overall design is looking ready to go, in my opinion.
I
The reason I was asking was because there has been some recent activity where folks have asked for that functionality to be enabled in beta, and I just wanted to make sure whether this was that feature or not.
I
Yeah, yes, great. So this would be available in beta starting...
A
Monday? No, because it's blocked on the ReferenceGrant; this feature is moving into the core code. Are we moving this one? Michelle, yeah.
D
So currently the feature depends on the Gateway API's ReferenceGrant, and before we move this feature to beta, we want to move this ReferenceGrant object into core Kubernetes, and that proposal is still under discussion. So until we can move ReferenceGrant into core Kubernetes and move that to beta, this feature will be blocked from moving to beta.
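For context, the dependency Michelle describes looks roughly like the following sketch, built with the k8s.io/api Go types. This is an illustrative example, not from the meeting: all object names and namespaces are made up, and the dataSourceRef namespace is only honored when the alpha CrossNamespaceVolumeDataSource feature gate is enabled and a Gateway API ReferenceGrant in the source namespace permits the reference.

```go
// Hedged sketch: a PVC in "dev" restoring from a VolumeSnapshot in "prod".
// For this to bind, namespace "prod" must contain a ReferenceGrant allowing
// PersistentVolumeClaims in "dev" to reference its VolumeSnapshots.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	apiGroup := "snapshot.storage.k8s.io"
	srcNS := "prod" // illustrative source namespace
	pvc := &corev1.PersistentVolumeClaim{
		ObjectMeta: metav1.ObjectMeta{Name: "restored-pvc", Namespace: "dev"},
		Spec: corev1.PersistentVolumeClaimSpec{
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			// Field type is ResourceRequirements in k8s.io/api as of v0.28.
			Resources: corev1.ResourceRequirements{
				Requests: corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("10Gi")},
			},
			// The Namespace field is what requires the CrossNamespaceVolumeDataSource
			// feature gate plus a ReferenceGrant in srcNS.
			DataSourceRef: &corev1.TypedObjectReference{
				APIGroup:  &apiGroup,
				Kind:      "VolumeSnapshot",
				Name:      "nightly-snap",
				Namespace: &srcNS,
			},
		},
	}
	fmt.Printf("would create PVC %s/%s\n", pvc.Namespace, pvc.Name)
}
```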
I
Got it. And can the move of ReferenceGrant to beta and the move of the cross-namespace volume data source to beta happen in the same release, or would that have to be...
D
I think that's the best case scenario. Okay, I do think... maybe Takafumi, you can clarify, but I think we do need help pushing the ReferenceGrant work to both alpha and beta.
C
Okay, I think... yeah, that's the one! Please.
I
Got it, okay. I know that there's some interest internally to have this available sooner, so I'll see what we can do from the AWS side, and I'll circle back with you folks.
A
And the next one is enabling privileged containers on Windows to replace CSI proxy. So, do you have an update on this?
I
Yes, so we will likely pick this up in the near future. I know that there is some discussion that we need to conclude with Mauricio and others, so we are working on identifying who can work on this and what the next steps should be from our side, but we should be able to start contributing some work to this in the near future.
A
So this is done? Do we have any documentation or things like that to follow up on, or are we just complete?
A
So the code, the code is already merged. The blog, I think there is a placeholder for that; I don't think the deadline has come yet.
A
Okay, yeah, we still have some time for the blog. Yeah, next one: quality of service for volumes. Sunny, Sunny or Matt?
E
So I think there was one... there were three PRs, three issues, if I remember, and I believe that one of them, at least when we had checked last, nobody had been assigned to or had volunteered to pick up. So I don't know what the latest is, but if that's the case, then Colin from our team can help with that particular PR.
A
Thank you. Thank you, okay. So it looks like we are making good progress here. Let's see, next one: robust volume manager reconstruction. Jan?
A
Okay, yeah, if you can, because I think the docs ready-to-merge deadline is August 8th, right? So we still have a few days. Actually, if we can... yeah, if you can submit it, I think it will still be fine.
A
Thanks. And the next one is the volume expansion for StatefulSet, so... he's not here today. Yeah, do you know if there's any update on this one? This is...
A
It's at design anyway, right? Yeah, this is targeting a design meeting discussion. Okay, all right, thanks.
And the next one is non-graceful node shutdown. So for this one the code is merged, and then there's the e2e test; but we decided not to merge that because it depends on GCP. There is an integration test that is being reviewed, so we'll try to get that merged once the code freeze is lifted.
A
So that's all we have here. Let's see, we have a few things. Justin, do you want to go over the first item?
J
How should I share my screen? Oh...
J
So it's workload recovery. I'm just going to go into the background on the problem that we're facing, the controller that's the solution to the problem, and a demo of the behavior of the local static provisioner before and after the controller implementation.
J
So the problem was that, as I'm sure a lot of you know, pods using local PVs are always scheduled to the same node as the local PV they're using. But when nodes fail while having local PVs attached to them, the pods using those local PVs become stuck, since the scheduler is trying to run them on the node the local PV is referencing, but the node is deleted, so it can't do that. And the issue here is that the PV and the PVC aren't cleaned up when a node becomes unavailable, and so all those resources become stuck.
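For background on why those pods get stuck: a local PV carries a required node affinity that pins every consumer pod to one node. A minimal sketch with the k8s.io/api Go types (disk path, capacity, and names are illustrative, not from the demo):

```go
// Sketch of a local PV: the NodeAffinity block is what forces the scheduler
// to place consumer pods on exactly this node, hence the stuck-Pending pods
// once the node object is gone.
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func localPV(node string) *corev1.PersistentVolume {
	return &corev1.PersistentVolume{
		ObjectMeta: metav1.ObjectMeta{Name: "local-pv-example"},
		Spec: corev1.PersistentVolumeSpec{
			Capacity:    corev1.ResourceList{corev1.ResourceStorage: resource.MustParse("100Gi")},
			AccessModes: []corev1.PersistentVolumeAccessMode{corev1.ReadWriteOnce},
			PersistentVolumeSource: corev1.PersistentVolumeSource{
				Local: &corev1.LocalVolumeSource{Path: "/mnt/disks/ssd0"},
			},
			NodeAffinity: &corev1.VolumeNodeAffinity{
				Required: &corev1.NodeSelector{
					NodeSelectorTerms: []corev1.NodeSelectorTerm{{
						MatchExpressions: []corev1.NodeSelectorRequirement{{
							Key:      "kubernetes.io/hostname",
							Operator: corev1.NodeSelectorOpIn,
							Values:   []string{node},
						}},
					}},
				},
			},
		},
	}
}
```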
J
So we had some user concern; there were some GitHub issues out for this, for recovering workloads from node deletion. And so the solution was a cleanup controller that's already been merged in the local static provisioner repo. The high-level flow is that when a node is deleted, we look to see if there was a local PV attached to the node, and if there was, then we start a timer for, say, a minute. The timer is user configurable, and if at the end of the timer the node is still gone, then we first delete the PVC that was associated with the local PV on that deleted node. The second step is that the PV now becomes released, since the PVC is deleted, and there's a second process always running.
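A minimal sketch of that flow, assuming a client-go clientset; this is a simplification of the real controller in the local static provisioner repo, and the helper and parameter names here are made up:

```go
// Sketch of the cleanup flow Justin describes: on node deletion, wait a
// configurable delay, re-check the node, then delete the PVC bound to each
// local PV pinned to that node. A second, always-running loop (not shown)
// later deletes the Released PVs.
package sketch

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func onNodeDeleted(ctx context.Context, cs kubernetes.Interface, node string, delay time.Duration) error {
	time.Sleep(delay) // user-configurable timer, e.g. one minute

	// If the node came back within the delay (the edge case discussed
	// below), its data may still be intact, so do nothing.
	if _, err := cs.CoreV1().Nodes().Get(ctx, node, metav1.GetOptions{}); err == nil {
		return nil
	} else if !apierrors.IsNotFound(err) {
		return err
	}

	pvs, err := cs.CoreV1().PersistentVolumes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, pv := range pvs.Items {
		if pv.Spec.Local == nil || pv.Spec.ClaimRef == nil || !pinnedToNode(&pv, node) {
			continue
		}
		// Step 1: delete the PVC so the StatefulSet controller can recreate
		// it; the PV then becomes Released and is cleaned up separately.
		err := cs.CoreV1().PersistentVolumeClaims(pv.Spec.ClaimRef.Namespace).
			Delete(ctx, pv.Spec.ClaimRef.Name, metav1.DeleteOptions{})
		if err != nil && !apierrors.IsNotFound(err) {
			return err
		}
	}
	return nil
}

// pinnedToNode checks the PV's required node affinity against the node's
// hostname label; simplified for the sketch.
func pinnedToNode(pv *corev1.PersistentVolume, node string) bool {
	if pv.Spec.NodeAffinity == nil || pv.Spec.NodeAffinity.Required == nil {
		return false
	}
	for _, term := range pv.Spec.NodeAffinity.Required.NodeSelectorTerms {
		for _, req := range term.MatchExpressions {
			if req.Key != "kubernetes.io/hostname" {
				continue
			}
			for _, v := range req.Values {
				if v == node {
					return true
				}
			}
		}
	}
	return false
}
```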
J
So the first and second bullet points here are that we assume that local data is lost when a node is deleted. We wait for some amount of time because there might be some edge cases in which a node triggers a deletion event but comes back immediately, and data is not lost; but most likely, when a node is lost for some time, the data is gone.
J
Yes. So yeah, the first bullet point here is that users already know that a trade-off for using ephemeral volumes is that data loss is possible, and so we assume that users already value workload uptime over keeping data, since their workload should already be resilient to data loss.
D
In order for someone to use this controller, there are a couple of different assumptions made about the workload and how the workload can handle data loss. But generally, this controller is designed to handle local storage in cloud environments, where it is inherently not durable storage. So if someone is using local PVs in a cloud environment, their workload already has to be resilient to the data just disappearing.
G
Okay,
but
but
normally
when
the
workload
would
restart,
it
would
restart
on
the
same
node
and
it
would
still
have
its
data
like
if
it
was
just
a
node
reboot.
This
would
be
a
special
case
where
the
note
workloaded
restart
and
this
data
would
be
vanished
and
I'm.
Just
wondering
is
there
anyone
for
whom
that
would
be
a
problem
and
if
the.
J
So here we have a cluster with three nodes, and we have the local static provisioner running, and we have a StatefulSet ss with one pod, ss-0, that's running. We can see that ss-0 is running on the node ending in fs48, and it's using the PVC ss-pvc, and that PVC is bound to the local PV ending in e4. So, to trigger the issue...
J
If we reset here, we have the same conditions as before: a cluster with three nodes, the local static provisioner running, and StatefulSet ss with pod ss-0. This time the pod is running on the node ending in 9rsp.
J
First, we can look at the logs of the cleanup controller. When the node was deleted, we started the timer, like I mentioned before, for resource deletion. The timer here was set to one minute, and after one minute passed the node was still gone, so we deleted the PVC that pointed to this deleted node. After that, the PV became released, and so the controller was able to delete the local PV that had node affinity to the deleted node as well, as you can see from this output up here.
J
Okay, when we watched the pod, we can see that the pod got deleted when the node was deleted and then went into this pending state, as before. But unlike before, it was able to get back up and running once the PVC was deleted. As we can see here, the PVC was deleted by the controller, and a new one was brought up and bound to a new local PV, ending in bd here, I think. And as we can see, the pod is running again; it's now scheduled to a different node, ending in wyty.
J
And so some things to know about customization: the controller only deletes PVs and PVCs belonging to specific storage classes. This is a command-line argument, so you can opt certain workloads in to this cleanup and opt others out. You can set the delay between node deletion and PVC deletion, in case you want to be conservative about not being too quick to delete resources when a node deletion event is triggered.
J
You
can
set
it
to
like
five
minutes
if
you
want
to,
and
you
can
also
set
the
delay
for
PV
deletion,
which
is
the
same
idea
and
some
code
pointers.
The
controller
was
already
merged
and
the
documentation
for
it
was
emerged
as
well
and
there's
an
example,
deployment
and
role-based
access
control
for
that
as
well.
So
you
can
use
it
now
if
you
would
like,
and
if
there
are
any
questions,
I'll
take
those.
If
not
thank
you
for
everybody's
time.
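To make the opt-in behavior concrete, the configuration Justin describes amounts to three knobs, sketched below with hypothetical flag names; the authoritative names and defaults are in the local static provisioner's documentation:

```go
// Hedged sketch of the cleanup controller's configuration surface; the flag
// names here are illustrative, not the controller's actual flags.
package sketch

import (
	"flag"
	"strings"
	"time"
)

type cleanupConfig struct {
	storageClasses []string      // only PVs/PVCs of these classes are cleaned up (opt-in per workload)
	pvcDelay       time.Duration // wait between node deletion and PVC deletion
	pvDelay        time.Duration // wait before deleting the Released PV
}

func loadConfig() *cleanupConfig {
	cfg := &cleanupConfig{}
	classes := flag.String("storage-classes", "local-ssd",
		"comma-separated storage classes eligible for cleanup")
	flag.DurationVar(&cfg.pvcDelay, "pvc-deletion-delay", time.Minute,
		"grace period after a node deletion event, e.g. 5m to be conservative")
	flag.DurationVar(&cfg.pvDelay, "pv-deletion-delay", time.Minute,
		"same idea, applied to the released PV")
	flag.Parse()
	cfg.storageClasses = strings.Split(*classes, ",")
	return cfg
}
```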
A
So in terms of documentation, you will be updating the release notes for this controller. That's...
D
It's going to be released in the local static provisioner, yeah, and so all the documentation is there, and the changelog will be updated. Okay.
A
Okay, sorry, I need to go back to this. Let's see, what's the next... I think the next one, yeah. So we have another one. Baptiste, you want to...
K
So yeah, I would like to implement some basic gRPC tracing on the CSI components, obviously hidden behind a feature flag, because we don't want to have it enabled everywhere. But yeah, I would like to know what the folks here think about it, and if I can proceed with some PRs.
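What Baptiste is proposing is roughly the pattern below, sketched with the official OpenTelemetry gRPC instrumentation module (otelgrpc); the flag name and dial helper are made up for illustration, not taken from any sidecar:

```go
// Hedged sketch: feature-gated OpenTelemetry tracing on a CSI gRPC client,
// the kind of change proposed for the sidecars and csi-lib-utils.
package sketch

import (
	"flag"

	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

var enableTracing = flag.Bool("enable-tracing", false,
	"emit OpenTelemetry traces for CSI gRPC calls (illustrative flag name)")

func dialCSI(endpoint string) (*grpc.ClientConn, error) {
	// CSI endpoints are local unix sockets, hence insecure credentials.
	opts := []grpc.DialOption{grpc.WithTransportCredentials(insecure.NewCredentials())}
	if *enableTracing {
		// One span per RPC: method name, duration, status code — the
		// "very basic instrumentation" discussed here.
		opts = append(opts, grpc.WithStatsHandler(otelgrpc.NewClientHandler()))
	}
	return grpc.Dial(endpoint, opts...)
}
```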
A
Somehow I saw this one: did we look at this one before? I somehow remember there is a similar one that we just reviewed.
K
Maybe, yes: in the CSI drivers, like for GCP and AWS.
A
Right, so this is... okay, not exactly the same. I see... I remember this one! Okay, it seems... yes, it seems to make sense to me. Michelle, Jan, any comments on this?
D
Yeah, seems good to me, I guess. The only question is, I'm not familiar with this library. Is this sort of like a... it's...
K
It's like the official OpenTelemetry library for gRPC instrumentation.
B
Because in the gRPC messages we send some secrets and credentials; I hope they are not going to leak. The...
K
Is it only on the... because I've already done something similar in a deployment on my side, on the AWS CSI driver, and there are no secrets whatsoever leaking through it. So, like, the...
K
Basically, I don't have an example right now, but if there is any error, you have the gRPC message with the error, and it's basically very basic instrumentation: how long the call took, and some metadata around which gRPC service was called, and things like that.
K
I'll find an example of the fields we have on our side and just put it in the issue, so you can have a look at it. Thanks.
I
One quick thing. So, first of all, thanks for sending the pull request for the AWS EBS CSI driver changes, Baptiste. We did see that request; we are actually evaluating right now what our position should be regarding such changes. I personally don't have any problem with the changes that you have, but I need to make sure that we are consistent in terms of a policy around what monitoring framework we are using within the driver code, and also that there are no concerns around adding OpenTelemetry within the EBS CSI driver. So we'll probably have a response for you officially on your pull request at some point in the future, but I just wanted to mention that here because we were talking about it right now.
K
So once the change is made on the csi-lib-utils side, I would have to go through the sidecars and update the code to just use the new function, and this part will be hidden behind the feature flag. On the CSI drivers' code itself it needs to be different, because csi-lib-utils is only responsible for the client part of the gRPC calls, and the CSI drivers use their own implementations for the server part.
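The driver-side half of that split would look roughly like this (again a hedged sketch under the same assumptions, not any driver's actual code), since each driver constructs its own gRPC server:

```go
// Hedged sketch: the server-side tracing hook a CSI driver would add itself,
// independent of the client-side support in csi-lib-utils.
package sketch

import (
	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"google.golang.org/grpc"
)

func newCSIServer(tracing bool) *grpc.Server {
	var opts []grpc.ServerOption
	if tracing { // same feature-flag idea as on the client side
		opts = append(opts, grpc.StatsHandler(otelgrpc.NewServerHandler()))
	}
	// The driver registers its Identity/Controller/Node services on this
	// server as usual.
	return grpc.NewServer(opts...)
}
```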
I
So is there a particular motivation for these changes on your side, Baptiste? Like, are you actually trying to do some performance profiling or something like that, for which you need this, or...?
K
So on my side, we are pushing in my company to have some more in-depth observability of the components we are running, to be able to know what's going on, and if there is a failure or something, to be able to debug what's going on more efficiently. And so, yeah, I'm going to do it either way; if I can't get it upstream, I'm going to have to fork it, so I'd rather upstream it, I think.
I
Got it. And I think basically what you're saying is the goal is to do it on both the upstream components and the CSI driver components, to get a complete picture. Is that...
K
Right, yeah. Well, one component is enough; it's better to have both the client and server observability, to have a more detailed view of what's going on, but only one is enough to get some basic observability, at least.
I
Yes, that's right! That's right! So I have made a note; we will follow up on this and see if we can do something from our side to help this along further. I'll need to speak to some folks internally to understand if there is somebody who can help out with this effort, so I'll provide an update in the next meeting, hopefully. Okay.
A
Thank you. Also, there's an issue opened for some AWS e2e tests. I don't have it here, but I will ping you on Slack for that, so if you can help...
I
One other thing that I wanted to update on: in the last SIG Storage meeting, I had promised that I would get an update on a couple of EFS CSI driver requests which had been sitting for a long time. I forget who it was who had asked, but I can provide an update here at this point.
So there were two pull requests. One was... I don't have the link handy, but let me see if I can put it up. One was PR 732, I believe, yeah, and the other one was PR 850. So for both of these, I followed up with the EFS CSI driver team, and they confirmed that they are both being actively worked on. It's just that... I think just after the last meeting, actually, there was an update for 732, so I think that is reflected in the issue. And for 850, there is some performance analysis that needs to happen, so they are in the process of trying to prioritize that work, and they will probably get to it in the near future. So both of these are things that should get worked on and addressed in the relatively near future.
I
Yeah, no worries, and if you can send me the specific issue that was opened, I can follow up on that, sure.