From YouTube: Kubernetes SIG Node 20210324
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A: So we have a pretty empty agenda, and we have test freeze today. The suggestion is to go through all the GitHub issues and try to see which ones we need to, like, scream about and have to merge, and whether we can expedite the PRs that are under review as well.
D: Magic, one second.
C: Okay, so here we go, this is the list. We've got seven things, and I know that we're waiting on this one today, which is... I think there are some concerns about whether or not we could run these tests in conformance, so...
C: This one has also been sitting with Derek. I've been talking with Jordan about this one because he's also reviewed it. Basically, this fix is something we would like to backport to 1.18, but it looks like it adds a new race condition, which I had brought up and apparently others have concerns about. The difficulty with this one is that if we don't merge it now, we can't backport it to 1.18, because development won't reopen for 1.22 until after the last cherry-pick deadline and the last 1.18 patch release.
C: So there's a concern that if we don't merge this now, we can't backport it to 1.18; but there's also a concern that it will introduce more bugs. So we're kind of at a bit of a standstill on this one. And, I mean, I've talked to the release team, but they're not going to hold the release for this, as far as I can tell. So...
C: But yeah, there are folks that are kind of, like, screaming about backporting this to 1.18. The problem is the thing this tries to address... I mean, it fixes an actual, serious correctness bug that affects clusters when nodes go offline and come back online: they basically mess up all of their pod statuses, and pods get stuck in, I think, NodeAffinity, and it's not great.
C: I mean, basically, there are two options, right? We can merge this now, and then we'll have time to let it bake and do the backports, but that's relatively risky this late in the release; or we can wait until 1.22, and then it won't get backported to 1.18. I think those are basically the options.
C: Yeah, yeah. I mean, people can take the patch and build it themselves if they so wish, but you know, that can be very difficult. So yeah, I mean, not that there's, like, a poll here, but you know, some people are like, well, we should merge it now because we're worried, but it's like...
C: We are also worried about making it worse, because there's a new potential race condition we might be introducing, and it's just so hard to detect this sort of thing without CI soaking. I mean, the initial patch that sort of spawned this issue looked very innocuous at the time and has introduced all sorts of weird knock-on effects. So...
C: Yeah, I don't know. Do we want to, like, put a statement on here, on behalf of, I guess, SIG Node triage, that we reviewed this today? Sergey, do you want to do that?
C: I can give you a very quick background, which is basically... so, a patch went in. The issue that users would experience is they'd turn off a node and turn it back on, and all the pods on the node would get stuck in NodeAffinity, and they'd have to do all sorts of weird things to get those pods to actually, properly die, terminate, and get rescheduled.
C
That
kind
of
thing-
and
this
was
this-
was
happening
basically
because
nodes
would
come
online
and
they
would
mark
themselves
as
a
ready
before
they
even
had
a
chance
to
sync
with
the
api
servers,
so
they'd
be
working
with
like
outdated.
Like
view
of
like
what
the
api
state
currently
was
for
those
pods
and
as
a
result
like
you
know,
there
would
be
sort
of
a
race
and
a
mismatch,
and
so
the
the
fix
was
don't
mark
the
note
as
ready.
C
Until
you
know,
we
have
the
we've
synced
with
the
api
servers
and
we
can
get
that
state
and
make
sure
it's
in
sync.
So
we're
not
doing
anything
wonky.
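[Editor's note: a minimal sketch of the gating idea described above, for readers following along. This is illustrative only, not the actual kubelet patch; the channel plumbing, the waitForPodConfigSync helper, and the timeout are hypothetical stand-ins.]

```go
// Illustrative sketch: delay reporting Ready until the initial pod state
// has been synced from the API server. Not real kubelet code.
package main

import (
	"fmt"
	"time"
)

// waitForPodConfigSync blocks until the initial pod list from the API
// server has been observed (signaled by closing the channel), or times out.
func waitForPodConfigSync(synced <-chan struct{}, timeout time.Duration) error {
	select {
	case <-synced:
		return nil
	case <-time.After(timeout):
		return fmt.Errorf("timed out waiting for initial API server sync")
	}
}

func main() {
	synced := make(chan struct{})

	// Simulate the pod config source completing its initial list/watch.
	go func() {
		time.Sleep(500 * time.Millisecond)
		close(synced)
	}()

	// The correctness fix discussed above: only mark the node Ready after
	// the sync, so status updates never act on a stale view of the pods.
	if err := waitForPodConfigSync(synced, 30*time.Second); err != nil {
		fmt.Println("keeping node NotReady:", err)
		return
	}
	fmt.Println("initial sync complete; node can be marked Ready")
}
```

[This framing also makes the knock-on effect mentioned next easy to see: Ready now waits on a round trip to the API server.]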
C: So it's a correctness fix, but there have been a few knock-on effects from it. One is that nodes sometimes take longer to start if they're not running in standalone mode, because they take time to sync with the API server; so in some single-node instance cases, or with kubeadm or that kind of thing, it can take longer to bootstrap now.
C: Yeah, it's, you know, on the order of... it might take a minute; I think 40 seconds was what most people were seeing. It's not like it's going to take 30 minutes or something like that. So there was that. There have also been a couple of follow-up issues claiming that the node affinity issue was not fixed by this patch, but that has not been my experience in production yet, so basically there's more investigation that needs to happen.
C: I've also seen some weird consequences that seem to have started once we merged this in OpenShift, because when we do this, we also do a CRI-O wipe on reboot. So we wipe all of the previous container statuses, and that has caused some weirdness, with the node not really knowing... basically, it causes some accounting issues for pod lifecycle. But because that doesn't happen with containerd, or typically in kube CI, we haven't really seen that anywhere else.
C: So that's what's going on with this. And yeah, there's a concern that moving this into a goroutine, when this is not thread-safe, is going to cause concurrency issues if we merge it. So it might make this faster, and it might address the regression, but it might also add new, bigger problems. And because the kubelet has so many things that are liable to get into race conditions, it's very uncertain what merging this, especially this late in the cycle, will actually do. So...
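[Editor's note: to make the concern concrete, here is a small sketch, not the PR itself; the cache type and method names are invented. It shows why pushing a non-thread-safe update into a goroutine introduces a data race, and the locking the concurrent version would need.]

```go
// Sketch of the concurrency worry discussed above. statusCache stands in
// for shared kubelet state; it is not real kubelet code.
package main

import (
	"fmt"
	"sync"
)

type statusCache struct {
	mu       sync.Mutex
	statuses map[string]string
}

// setUnsafe mimics the risky pattern: once the slow work moves into a
// goroutine, this unsynchronized map write races with other accesses
// (`go run -race` flags it immediately).
func (c *statusCache) setUnsafe(pod, status string) {
	c.statuses[pod] = status
}

// setSafe shows the extra synchronization the concurrent version needs.
func (c *statusCache) setSafe(pod, status string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.statuses[pod] = status
}

func main() {
	c := &statusCache{statuses: make(map[string]string)}
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int) { // the "make it faster" goroutine
			defer wg.Done()
			// Swap in c.setUnsafe here to reproduce the race.
			c.setSafe(fmt.Sprintf("pod-%d", n), "Running")
		}(i)
	}
	wg.Wait()
	fmt.Println(len(c.statuses), "statuses recorded")
}
```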
F: It seems to me that, from a project perspective, the benefits are not really clear, and the unknown unknowns are what worry me most as an engineer. There is a concrete risk, not just people brainstorming, of introducing more data races, which is scary because it's hard to detect in CI. So my take, with a disclaimer: unfortunately, it seems that the safest call is to not merge it yet, but I'm fully willing to defer the final call to Derek and to others. From an engineering perspective, it seems that the unknown unknowns outweigh the known benefits. So, this is...
C: I think I agree with you there, but unfortunately, it will not make those folks happy who are like, "but we desperately need this patched in 1.18." So...
F: But is this a release engineering issue? I mean, the PR per se has those problems, but can we... I'm not deep enough into the Kubernetes release engineering procedures: can we make an exception somehow, or is it now or never? Because those are two different things, in my opinion.
C: Yeah, I think part of the problem, too... I mean, this PR has been open for quite a while, and part of the problem is that nobody wanted to make a call on it earlier in the cycle. And now, a month later, we still haven't merged it, but now it's kind of like, oh, I don't know. And there have been a lot of iterations, so it's not like this was just sitting here or something like that.
C: I think there were changes pushed as late as yesterday, so it's not like it's been a static patch sitting here without being looked at for, you know, a month. So maybe, given that... yeah, I mean, I would lean toward... but I don't wanna, you know. Sergey, do you have thoughts on this? This is kind of, like, our one big flaming, you know, critical, urgent thing we probably need to decide today as SIG Node.
A: Yeah, I think the problem with the Kubernetes state overall is that there are different types of users: some users are just happy to take whatever, and they need features and more, like, innovation; and some people just need stability. So we need to decide which camp we are leaning towards. Talking from a stability perspective, and, like, being conservative, you don't take fixes that way.
C: Yeah, I think you raised a point earlier which I thought was very good, which was that the 1.18 cherry-picking thing isn't really our problem as SIG Node; that's kind of the release team's problem. And I think, as SIG Node, it's reasonable for us to say there are known unknowns that make us not super comfortable merging this now.
C: Oh, great. And yeah, I think other than that, there are only, like, two things here labeled as release-blocker, which are these two. And then we have... who marked this lifecycle/frozen? Don't do that.
C: I think these are all on the test enhancements board. So maybe, Sergey, do you want me to hand it over to you? Since these all have milestones attached, we should be able to... I think, if we go to the board, you can filter by milestone, like milestone v1.21. Does that work? Yeah? So we can just, like, go over them there.
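[Editor's note: for reference, GitHub's issue search supports a milestone qualifier, so a filter along these lines would list just the items with the release milestone; the exact milestone name here is assumed.]

```
is:open milestone:"v1.21"
```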
A: Good. So, looking at the release-blocking jobs: all of them except... let me see, master was also flaky yesterday; so, like, all of the older tests are flaky, and it's different tests.
A: So this is not even, like, a test thing; it's not the tests, it's something in the infrastructure. It's fishy. Oops, I didn't mean to quote it.
C: The thing is, I think, like, anything over 95% is probably pretty good.
A: Okay, so yeah, something is wrong with the infrastructure; this one is also flaking, along with all the others, like, this infrastructure thing. And then, what else was I looking at? Yeah.
A: Oh, this one also has, like, a lot of flakes, yeah. So I mean, I don't think... since the blocking and critical ones are not, like... Yesterday there were a couple of flakes that I don't see today, so it's likely just infrastructure timeout issues. And since we don't have anything, like, just jumping out at us, it should be fine. I mean, we should be ready for release.
C: Do we want to fix the... does it have sig/storage on there already?
A: And Jing Xu is in, from SIG Storage.
A: Yeah, I think we can just archive it out of there. The only thing is about the pause image, like, if you want to, like, help with that... but, got it. So...
C: So if it got merged, we should probably just close it. Oh yeah, they linked the issue, but they didn't put "Fixes", so it never got auto-closed.
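[Editor's note: for context, GitHub only auto-closes a linked issue on merge when the PR description uses a closing keyword such as "Fixes", "Closes", or "Resolves" before the issue reference; a bare mention links the two but closes nothing. For example, with a hypothetical issue number:]

```
Fixes #12345
```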
D: I'm sorry, I wasn't able to look at it last week. I might take a look at it this week.
C: Oh, is it the... yeah, 10.50, or whatever.
C: Yeah, so if that's the case, then I think we can close it.
C: Yes, that's what we were trying to do with this release, and we have a docs PR that's up as well. The only thing is we need to, like, you know, update the tests for conformance, and Derek said he wanted to go through them all to make sure we weren't accidentally including anything with unsafe syscalls, because those should not be in conformance. Okay.
A: It's not that we aren't allowed to; it's just that this is quite a recent practice, to, like...
C: But this is new, yeah. I understand. Okay, makes sense. Yeah, I'm just hoping, like... because we're currently holding the thing on the website on this PR, and I don't think that makes any sense, because I don't know... if the plan is that we revert the feature-flag change, like, that's not gonna happen; it was already defaulted. So I don't know, I'll check in with people. Hopefully this will just get merged and it'll be fine, with potential changes that are needed.
A: And I cleaned up, like, the triage issues, so, like, there are a few in review-in-progress, and I didn't see anything critical that we want to take in this release. So I wanted to go over all of them with you again, but it may not be worth the time.
E: I saw some commit, probably from you, if I'm not mistaken... my memory is corrupted.