From YouTube: Kubernetes SIG Node 20220308
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A: All right, well, welcome everyone to the March 8th, 2022 edition of SIG Node. We'll kick off our meetings as we usually do here. Sergey, do you want to give an update on where we are with overall velocity for the past week?
B: Yeah, this week was very slow. We're still fighting some bugs; I think one of them is on the agenda today, and bugs create many PRs, so we're growing on PRs. Some of them are works in progress, so I'm not that concerned about it; we're merging on a regular basis. Maybe some of ours were closed. I looked through all of them, and nothing is getting lost as rotten or anything like that, so nothing is starving. So very good work.
B: Everybody, thank you for keeping up the pace. As I said, there are many work-in-progress PRs and PRs that need review, so if you have time, look at the ones that need review.
A: All right, very cool. I guess that can lead into our next topic, which is where we are on the release and what we think is relatively on track or not. I saw you had a list of items there; I don't know if you want to give any highlights on each, or if folks on the call are concerned that their item's not there.
B: Next one is the credential provider. I saw the PR that is there, but I think the KEP is calling for more work, especially for integration testing, and I don't think this is being worked on by anybody.
B: Last release we cut it out of the release because of a lack of activity. I wonder whether we need to do the same this release, because similarly nobody is working on it. [Name unclear] was working on it, but then she stopped, and Andrew... I'm not sure how different it is right now.
A
Yeah
I
had
worked
with
andrew
most
on
this
in
the
past
so
and
I'll
ping
him
and
see
if
he
has
cycles
or
not
but
yeah.
I
guess
it's
the
continuing
barrier
to
fixing
the
attitude
cloud
provider,
migration.
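For context, the kubelet credential provider being discussed is configured through a CredentialProviderConfig file handed to the kubelet via its --image-credential-provider-config and --image-credential-provider-bin-dir flags. A minimal sketch follows; the provider name, image pattern, and cache duration are illustrative assumptions, not values from the meeting.

```yaml
apiVersion: kubelet.config.k8s.io/v1alpha1
kind: CredentialProviderConfig
providers:
  # Hypothetical exec plugin the kubelet invokes for images matching the pattern.
  - name: ecr-credential-provider
    matchImages:
      - "*.dkr.ecr.*.amazonaws.com"
    defaultCacheDuration: "12h"
    apiVersion: credentialprovider.kubelet.k8s.io/v1alpha1
```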
B: Dockershim removal: done. We still have a few dockershim [presubmit?] PRs, but Mathias is working on those, things like CI, Windows, and storage, to remove those.
C: Yeah, I can represent this one. I'm not sure if Peter's on the call; we chatted about this one, and it's kind of in progress. I think the issue is we're thinking of making it beta, but to make it beta we kind of need to turn on the feature gate by default, and I don't think we're ready to do that fully, because some of the pieces of the KEP, like the alternative metrics in the CRI, are not fully implemented yet, like a new Prometheus endpoint.
C: So we might track this one as alpha for now and see how it goes, because we aren't really fully ready to turn on the feature gate. We had some offline discussion about that one; we'll probably update the KEP to reflect the latest status there.
B: Okay, I will fix the comment later. Thank you for the update. Swap: I think I talked to Elana before she went on vacation, and to other people who were interested in doing that work, so it seems that swap is not happening this release because of a lack of contributions.
B: Let's move on; I will comment on the KEP to cut it from the release. Priority-class-based graceful node shutdown: is David here?
C: Yes. So this one, I believe Mrunal and I are tracking. I believe the KEP has been merged, and I believe I saw the implementation and I reviewed it. I think it just needs some approvals, so yeah.
D: It's on my list; I'll get to it this week.
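For reference, the priority-class-based graceful node shutdown feature being tracked here is configured on the kubelet. A minimal sketch, assuming the shutdownGracePeriodByPodPriority fields from the KEP; the priority thresholds and grace periods below are made-up example values.

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Assumes the GracefulNodeShutdownBasedOnPodPriority feature gate is enabled.
shutdownGracePeriodByPodPriority:
  - priority: 100000               # e.g. pods in a high-priority class
    shutdownGracePeriodSeconds: 20
  - priority: 0                    # everything else
    shutdownGracePeriodSeconds: 60
```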
B: So we're keeping it in the release. And the last one, gRPC probes: I sent the PR for that, and yeah, it's just waiting for API review. It's past some deadline, but I would suggest we keep it anyway; it's really straightforward.
B: And those parts can stay work in progress; whatever is left is a small PR, and it's doable. I think we can keep it in.
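For reference, the gRPC probe being discussed lets the kubelet call a container's standard gRPC health-checking service directly, instead of going through an exec or HTTP shim. A minimal sketch of a pod using it; the image, port, and timings are illustrative assumptions, not details from the meeting.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: grpc-probe-demo
spec:
  containers:
    - name: etcd
      image: k8s.gcr.io/etcd:3.5.1-0
      command:
        - /usr/local/bin/etcd
        - --data-dir=/var/lib/etcd
        - --listen-client-urls=http://0.0.0.0:2379
        - --advertise-client-urls=http://127.0.0.1:2379
      ports:
        - containerPort: 2379
      livenessProbe:
        grpc:
          port: 2379          # probes the gRPC health service on this port
        initialDelaySeconds: 10
        periodSeconds: 10
```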
B: In-place vertical scaling: I think Vinay commented here. He's not on the call, but he said that he has some other priorities; he'll start working on Derek's comments after this coming Friday, so there is still time until the 29th, so I think we can keep this one.
B: Container checkpointing: I saw his PRs, yes.
B: Perfect. And image pull secrets?
E: Yeah, sure. It's a complicated subject, so it probably needs an offline call to go into the details, but suffice it to say we split the feature up into two phases, and the second phase was going to include persisting the information. Unfortunately, to persist the information we need some changes, probably in the container runtime and in some other policies within Kubernetes for image pulling, because we can't just persist a list of, yeah, the images that you have successfully...
E: ...you know, pulled, and/or the references of those images, alongside a hash for the secrets, because that would make the secrets even more vulnerable than they are today. So we need a way to, you know, checkpoint it in a secure way, probably encrypted or something, because that code is probably going to be exposed. So we need a new policy; we don't usually store...
E: ...you know, checkpoint secure information, right? So just checkpointing it needs a KEP, basically, and that's why we're splitting it into two phases. The current phase only works if you, or at least, you know, with respect to the expectations of the current KEP and this PR, it only works if you require garbage collection to have occurred on the images.
E: If you haven't garbage collected those images, then we won't know, because we're not persisting whether or not the image was pulled with a secret, or was loaded into the container runtime directly, or in the past had been pulled with a secret, because we're not persisting the past information, right. So Jordan, you know, correctly pointed out that, you know, the bar here is doing persistence, and I agree, we need to do persistence. We just have to decide...
E: ...what we want: is phase one still viable? Are we okay with using an alpha feature gate and telling customers that, if they want to test this gate, we suggest that they, you know, in fact do garbage collection of the images, and otherwise not trust any images that are currently cached?
A: Mike, could you help me out for a second on that one? I remember reviewing the KEP and the design, and I'm glad that Jordan caught that topic. If we just proceed with phase one, what is net new in the kubelet with this capability?
E: Right. The net new would be: with the images, you know, garbage collected, you can turn this feature gate on and start testing the performance, and, you know, making sure that everything works right, with actually doing, you know, in-memory persistence for the life of that kubelet, right, be it a day, be it a week, where you could actually know that you can use pull-if-not-present and not pull-always.
E: You know, with a controller to force pull-always, which allows us to go on to the next performance problem. What we're trying to do for fast startup is get down to sub-second pod initialization, and if you have to pull always, you're not gonna find all the issues that we need to fix for fast starts of the pods, right. So we were really just trying to make progress.
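For context, the pull policies being contrasted here are set per container in the pod spec. A minimal sketch of the difference; the image name and secret name are made-up placeholders, not anything referenced in the meeting.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pull-policy-demo
spec:
  imagePullSecrets:
    - name: regcred                          # hypothetical registry credential Secret
  containers:
    - name: app
      image: registry.example.com/team/app:1.0
      # IfNotPresent reuses an image already cached on the node, which is exactly
      # why "who pulled it, and with which secret" matters; Always goes back to
      # the registry (and re-authenticates) every time the container starts.
      imagePullPolicy: IfNotPresent
```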
E: ...knowing that this isn't the end game, the end game being some, you know, long-running image cache persistence hosted by the container runtimes, or however we end up, you know, solving this problem. We've got a bunch of problems to fix, and we're just trying to make it so we can test fast startup, right.
E: The expectation is that there's some magical image cache policy being applied to the images based on the pull policy that's been used, and that is not the case, right. So there are some security issues, and thus everybody's just either using never-pull or pull-always, and there are a lot of security issues even with never-pull, because you don't know where that image came from, right.
E: So I agree, Mrunal, the end game might actually be to define the image cache policies in your pod spec, bring that down into the, you know, extended runtime cache manager, and actually have the runtimes implement the image policy instead. What would happen then is, in the kubelet, when it does ensure secret pulled images, it would just pass the pod spec with the policy for caching down to the container runtime, along with the secrets...
E: ...you know, the current creds that we have. And then what we would do is reply back from the container runtime that, yeah, kubelet, we have pulled the image with this particular cred, or a hash of that cred if you like, and then we've handled it, and now you can run your pod, because we have all the images pulled, right. But we're not doing it that way today; it would require a re-architecture, yeah.
A: I guess so. Thanks, Mike, for describing that; that's what I was trying to get at, I feel like, with the current approach, absent fixing those things. At least that's what I'm hearing when I hear this. So I'm wondering: do we want to just revisit this and go back and explore the runtime route? That seems interesting, but I'm wondering about the risk: if we go ahead with the option as it is now, does that minimize our ability to evolve that feature gate going forward? Will we have to unwind something if we ultimately think it's going to go into the runtime? Or do you feel an urgency?
A: Help me understand that, I'm sorry, and I don't want to take too much time, there's...
A: I understand that. So you're saying you're still keeping the list of images that required authentication on pull as a checkpoint file on the node, so you at least will know that a cred was used to pull that image. I guess what I'm wondering is: do you really know that? Because you don't know if the image...
E: ...in a cloud environment, right: you create a VM, and when you create it, it is wiped. Usually when you're wiping the kubelet, you're wiping the node; you're starting it up all over again. In that case it has value. Okay, oh yeah, I shouldn't have... I didn't mean to say it's only...
A: ...a performance gain, yeah.
E: That's the value in that case, in the wiping, and it's also the value, you know, for when we're using this as an evaluation set of code. But Mrunal's right, you know; maybe the right answer here is just to start over with a new set of policies and implement the policy in a different place. Instead of the image GC manager in the kubelet, we could handle that down in the container runtimes, albeit this...
E: That would be, you know, supported if we then also added, on top of who pulled it, the additional information of whether it was pulled with a secret or not, right.
A: Yeah, the other part of me is trying to think: what if the kubelet did it? Is there a useful future audit event we could emit? Yeah, I guess in my opinion I could see there being some benefit as it is now, if I reboot a host, and people might find that practical and get some experience with it. But I guess ultimately I'd defer to you and Mrunal on whether you think it's worth keeping it around or not.
E: Yes. Is there more value in the end game? Yes. I guess the other question would be, Derek, and for this group: does this break us away from other possible solutions in the future? None come to my mind, but Liggitt's point is that there may be some expectation that this problem is fixed.
A: Yeah, basically, even in alpha we'd have to enumerate all of the security gotchas in that documentation and...
A: If not, or if you have one later, please reach out on Slack, I guess. Going to the rest of the items: I guess we talked about Vinay's in-place update already, and then, Bobby, do you want to talk about the out-of-CPU issue you've been exploring?
C: Yeah, sure. So basically this is just an update from last week. The context is that there is a kind of regression in 1.22, from part of the pod lifecycle refactoring, that basically introduced an issue where, during termination, the kubelet sends an update to the API server saying that the pod is terminated, but it's actually not fully terminated yet. That causes new pods, when they're scheduled, to sometimes be rejected by the kubelet locally because they don't have enough resources.
C: So we talked about this last week. There was a PR up by Clayton, who is attempting to fix the issue, so I'm kind of working with Clayton on that right now. It's turning out to be a kind of complex fix, because we want to make a minimal fix, since I think we'll need to cherry-pick it back all the way to 1.22, but it requires some kind of deeper changes in the pod lifecycle stuff, and so there's kind of a discussion...
C: ...about whether we have enough testing around it and so forth. I kind of looked at the latest code, and I think it makes sense to me: basically, the latest change reports the terminal phase to the API server only when the pod is fully, actually terminal and all the containers are not running anymore. So I think the fix looks good; the only question is whether the testing is sufficient and whether we have enough coverage and so forth. Like, I did another test yesterday and I actually found a regression.
C: I think in the PR I posted it in my GitHub comments, just from doing some manual tests. So that's kind of the bigger concern, whether we have enough testing. So if anyone wants to take a closer look and has some ideas on other test cases and things we should look at, I think that would be valuable.
C: So that's kind of the update for that issue. Any questions on that one? I have a slightly related topic I want to go to, but maybe there are some questions on that.
B: First, yeah, I'm curious: did we close on the discussion of reverting the fix that caused the regression? Because I agree with your assessment; if a new fix will cause a new regression, it may be an even worse situation, and we'll be chasing bugs all the time.
C: Yeah, yeah, so that's another option. I guess the issue is that these changes were all introduced in kind of a really big PR, the pod lifecycle refactor PR, back in 1.22; I think it's almost a year old at this point, so I'm not sure how feasible it is to fully revert it. I guess we haven't really explored that option, but maybe it's something worth considering. But I think the issue there was...
C: Yeah, I think the problem is that that PR did fix a lot of issues, but it did introduce some other ones, so it's kind of a fix-one-thing, break-another-thing type of thing. Yeah, so anyway, that's in progress. And then another issue that I found actually just yesterday, I haven't opened up a GitHub issue yet, but it's kind of related, actually, to the pod lifecycle refactor stuff.
C: One of the changes as a result of the pod lifecycle refactoring is about when pods are terminated, for example by the kubelet during eviction or graceful shutdown or any other case where the kubelet basically manually kills a pod. Before, pre-1.22, before the pod lifecycle refactor stuff, the status of the pod and all of the conditions associated with it, like the ready condition and the container statuses et cetera, were not actually reported by the kubelet. And after 1.22, after the pod lifecycle refactor...
C: ...that actually changed, and now the kubelet does report container statuses and ready conditions and so forth after pods are terminated. The problem is that it kind of reports them in two phases: one is that it puts the pod into a terminal phase, so it changes the phase to, like, Succeeded or Failed, and then in a second update it actually updates the conditions. And so there's an issue we found, kind of internally, where basically the kubelet might not send that second update.
C: For example, if the node is shut down, or the node is terminated for whatever reason, it can basically leave pods in a very weird state where the pod is in a terminal phase, like the Failed phase, but the ready condition is still reporting true. And that kind of creates a problem, because Kubernetes services and endpoints and stuff like that use the ready condition to detect whether the pod is ready or not, and so this can result in traffic being sent to pods that are not actually, like, ready.
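To illustrate the state being described, a pod stuck like this would report a status roughly like the following (a hypothetical excerpt, not captured from a real cluster), where the phase is already terminal but the stale Ready condition is what service and endpoint controllers keep acting on:

```yaml
# Hypothetical `kubectl get pod -o yaml` excerpt after the node went away
# before the kubelet's second status update was sent.
status:
  phase: Failed          # first update: terminal phase already recorded
  conditions:
    - type: Ready
      status: "True"     # stale: the second update that would flip this never
                         # arrived, so endpoints may still route traffic here
```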
C: So that's kind of an issue, and I'll open up a GitHub issue to talk about this more; I just kind of thought about this yesterday.
C: It doesn't really matter; I think the case I was looking at was graceful shutdown, but it's any case where the kubelet initiates an eviction, basically, because prior to this, no conditions, no details were reported, basically, if the pod was in Failed. So because the update happens in two steps, basically, to the API server...
C: ...if, for whatever reason, the second update is not sent, or maybe it is sent, but basically the first update that is sent is just updating the phase, and a lot of other controllers and external things, like the service controller, endpoint controller, etc., don't actually look at the phase; they look at the ready condition. So one thought I have here on how to fix this, potentially...
C
Is
you
know
if
we
already
know
the
pod
is
going
to
be
in
terminal
phase
right
like
failed
or
succeeded,
we
should
probably
not
report
a
ready
condition
of
true,
because
ready
condition
is
kind
of
used
by
you
know,
services
to
detect
if
they
can
send
traffic
and
if
we're
changing
the
terminal
phase.
That
means
we're
already
shutting
down
this
pod.
So
that's
one
idea
but
probably
needs
a
little
bit
more
discussion.
I
think
so.
A: All right, any other topics that people want to raise today? If not, we can give back a half hour.
A: Thanks to everyone who participated, and we'll talk to you next week. Bye, everyone, bye, thanks.