From YouTube: Kubernetes SIG Node 20210804
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A: It's August 4th, 2021 — the CI subgroup meeting of SIG Node. Welcome, everybody. Okay, so today we're discussing a couple of topics. First: the 1.5 branch continuous canaries are missing. containerd 1.5 was released quite recently — I mean, quite some time ago — and we're already at 1.5.2, I believe, and we still don't have tests for that. Mike — I talked to Mike, and you wanted to take it. Mike, do you want to take it?
B: Yeah, yeah, I'll take it. A question: should I make new jobs for this, or should I just modify the 1.4 ones?

A: 1.4 will come out of support in February, but until then we definitely need to keep its CI running.
A: Okay. Does anybody want to comment on that? Any other thoughts?
C: Oh yeah — so if you go to the kubelet serial dashboard on Testgrid... I mean, we've got all sorts of stuff in the sig-node dashboard, but basically you go to the sig-node kubelet one.
C: I think we actually care a lot about staring at our presubmits in Testgrid, in a way that most other SIGs don't, because we're often trying to deal with flaky tests and patterns in test breakage, and Testgrid gives us the ability to do this in ways that are hard while staring at Prow. I think part of the problem is that a lot of SIGs are like, "well, but the tests are green."
C: But I mean, I think all of this is very willy-nilly: we can do whatever we want in Testgrid. It's just a matter of convention.
A: And it's mostly a matter of how we look at that. So, for instance, another thing we can get out of this is previous releases. Oh wait, so you can have...
C: Sorry, I'm just looking at the comment in the chat, and I see Arno is here — hi, Arno. You know, we haven't standardized the... so the comment in chat was specifically: do you want to prefix them with "pull"? All of the jobs are prefixed with "pull", but for whatever reason in the dashboard they all start with "pr".
C: Thank you. I mean, to be clear, all of this is potentially up for refactoring. The reason I brought this up is that our Testgrid tests are kind of a mess, so this is one thing where I was just like: well, this is something very obvious that I can fix. I can take all of the presubmits and put them in another place, and that'll at least clean things up a little bit in terms of organization.
C: The other thing, too, which is sort of related to the next item: I think for the most part all the presubmits are in a special presubmit config file, but possibly not all of them. So if they are not all in there, I would like to move them all into the same config file so that they're all in one place.
A: Yeah, it's better to sort by function — I think at least periodics get different behavior. What I started to say is: do we want to take other releases out of this dashboard as well? So basically, this would be a view of the kubelet on main.
C: Maybe, yeah — the jobs that have all the skews, we might want to put those into a specific place that's just kubelet version skew testing.
A: So this is escalations...
C: Yeah, when I was walking somebody through the state of our jobs under the sig-node job configs, I was like, what is going on here? This is kind of all over the place, and I think it just sort of happened to get that way organically — I don't think this was by design. So, now that we're hopefully cutting 1.22 today, we have the opportunity to go and break things between releases when no one will notice. That's my idea.
C: Sure, it's gonna happen, but you know, we can reduce some of the chaos temporarily.
A: Yeah, so I'm just surprised to see sig-storage here. That's the only kind of question for me.
A: Okay, Mike.
B: Sure. As an effort to remove the NodeFeature tags, which are already duplicated, I submitted a PR a couple of weeks ago that basically removes those tags. I noticed there are a lot of tests that still have these tags, and I want to remove them. I just had a question: do we need to backfill this into previous release branches?
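For context, these NodeFeature tags live in the ginkgo test descriptions themselves. A minimal sketch of what removing a duplicated tag looks like — the package name, the feature name "ExampleFeature", and the spec text are hypothetical, not taken from the actual PR:

```go
package e2enode

import "github.com/onsi/ginkgo/v2"

var _ = ginkgo.Describe("[sig-node] Example suite", func() {
	// Before: the same feature appears twice in the description, e.g.
	//   "should work [Feature:ExampleFeature] [NodeFeature:ExampleFeature]"
	// After: a single tag is kept, so existing skip/focus regexes still match.
	ginkgo.It("should work [NodeFeature:ExampleFeature]", func() {
		// test body elided
	})
})
```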
C: So I think that's a good question. Sergey, for 1.22, I feel like the only blocking stuff that we have set up per branch is the node 1.21, 1.22, etc. tests. Those don't need to change, because... yeah.
A: Changing the organization of tests in previous releases is a mess, because we may start skipping tests that we didn't intend to skip, and it will be really hard to discover. So I would suggest we just keep it as is; generally, cherry-picks are for release-blocking bugs, critical bugs, or test fixes that will definitely improve a specific test that is flaky or something.
A: Okay, so this PR needs review.
C: Yes — so this one, I don't think it's on our board; I think this is on the main SIG Node board, but Derek flagged it to me today, because I think this one got LGTM'd, and if you look at the actual changes, there's a very small code change, which I'm not actually sure is correct, and then a lot of test changes. And so, Derek...
C: This is what I mean by test changes — there are modifications in the git history; it's not that a test was changed.
A: Yeah, I added it to our board. It seems to already have assignees — quite a few of them. But yes, did you already look at that?
F: Yes, the code change — I think it's correct, because at the beginning, if you look at the first comment, I asked him what it was about. I didn't agree at the beginning, but I think it's correct, and then I mostly put the LGTM because of the code change. I have the feeling that this one is the right thing to do.
F: I think it's in the comments that are not shown, if you go a bit up.
J: Okay, hi, I'm Chyoto, I'm from Google. So this item was about a race condition issue where customers see a failed pod with a message saying node affinity failed. I think the original issue was... what's this? Let me just paste it in the doc.
J: Oh, sorry — anyway, this race condition can happen when the node restarts or the kubelet restarts. A fix was supposed to take effect, but our customer reports that it doesn't. So yeah, just bringing this up here.
C: So there isn't currently an affected version attached to this issue. I know that the fix went in, I think, in 1.21; I think it was backported.
A: So I think it's best to bring it to the SIG Node meeting, not the CI subgroup — the issue is actually a product issue, not a CI issue, right?
J: Right, it was just that Thomas brought this up today, and we wanted to talk about it.
A: Okay, but I mean, if you want to add an integration test — and I'm not sure whether integration tests that induce some race condition by adding timeouts or whatever can be added; we don't do that in Kubernetes — maybe it's a unit test that can try to catch this, right?
D: I was suggesting this. The way I see it, this case should be reproducible in an integration test that has a real API server, where we just create the pods before starting the kubelet. Then, of course, it might not always fail, because it's a race condition, but if we are able to retry the test several times, it might reproduce it.
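A minimal sketch of what such a retried reproducer could look like, assuming a hypothetical helper reproduceOnce that creates the pods against a real API server before starting the kubelet and reports whether any pod was spuriously failed; the helper, the attempt count, and the failure message are all illustrative assumptions:

```go
package integration

import "testing"

// reproduceOnce is a hypothetical helper: it creates the test pods against a
// real API server, starts the kubelet, waits for it to sync, and returns true
// if any pre-existing pod was spuriously marked Failed. Implementation elided.
func reproduceOnce(t *testing.T) bool {
	t.Helper()
	// ... create pods, start kubelet, inspect pod statuses ...
	return false
}

func TestKubeletStartDoesNotFailExistingPods(t *testing.T) {
	// The bug is a race, so a single run may pass even on broken code.
	// Repeat the scenario several times and fail if any attempt reproduces
	// the spurious failure.
	const attempts = 20 // assumed value; tune so the test stays reasonably fast
	for i := 0; i < attempts; i++ {
		if reproduceOnce(t) {
			t.Fatalf("attempt %d: pre-existing pod spuriously failed after kubelet start", i)
		}
	}
}
```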
C: Hey, Aldo, and welcome. That seems reasonable to me. I mean, we'd probably want to mark it as a slow test if it takes a lot of tries to reproduce, but I'm happy to take the test as a reproducer that we know fails, and bring in a fix that makes it green. It's just a matter of making sure we can actually run a test that fails.
D: Yeah, that's fair. I'm not familiar with the code, so that's why I'm not offering to write the test myself. Is anybody else going to take this suggestion?
A: Yeah — so, Chyoto, if you want to add a test and, running it with many retries, it will definitely fail, then we can take it. I mean, we should take something — we already have an issue saying we don't have a lot of soak testing; I mean, we don't have any soak testing in Kubernetes. We don't test the node — like the kubelet — for prolonged degradations and leakages.
A: I remember troubleshooting one issue a long time ago: when you start five containers sequentially, very fast, it starts failing at some point. I think when the bug was fixed, nobody was interested in actually adding the test, but it was the same discussion, basically. So I mean, it's really hard to catch these bugs, and adding this kind of test would be the easiest approach.
C: Hello — so I bumped this because I think I ran into it while I was triaging some stuff and thought, oh, we haven't talked about this in a while. So in the before times — or not even the before times, but last year — Morgan brought up that we currently skip all of our flaky tests in node jobs; we don't run them anywhere, and that's probably a problem, because we can't see if they're succeeding at all anymore. We're getting no signal, so they're effectively not getting run.
C: So there was this PR submitted in, like, September of last year to basically add — I think — a flaky-specific job. To make sure we're collecting signal on all the flaky tests, we need to run them somewhere, so they'd run as a periodic. This PR rotted out and was closed, and then somebody reopened it, but it needs to be rebased; currently it can't merge as is. So I was wondering if we should...
C: I think it would be good to revive this PR and get the flaky tests running somewhere. Someone can just go pick up this PR and run with it. I just wanted to bring it to our attention as we continue to invest in our test coverage and clean things up organization-wise, because right now we have a bunch of tests that we just don't run. Now, to be fair...
C: I think I recently looked at this, because I added two tests with a flaky label, and while it is true that we don't run the flaky tests anywhere, I think I only found two of them that matched sig-node, and they were the ones that I added — we didn't actually have any tests marked flaky. So I think we're good in that respect.
C: So we're not actually missing a lot of signal, it turns out, but we should probably have a job for this, just so we don't forget to run the flaky tests as we mark them flaky.
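For context, a node e2e test is marked flaky by putting a [Flaky] tag in its ginkgo description; the regular node jobs skip on that tag, which is why a dedicated job that focuses on it is needed to get any signal at all. A minimal, hypothetical sketch (the suite and spec names are made up):

```go
package e2enode

import "github.com/onsi/ginkgo/v2"

// Regular node jobs skip specs matching the regex \[Flaky\], so a spec tagged
// this way stops running anywhere unless a dedicated job focuses on that same
// regex instead.
var _ = ginkgo.Describe("[sig-node] Example flaky suite", func() {
	ginkgo.It("should eventually stabilize [Flaky]", func() {
		// test body elided
	})
})
```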
A: Yeah, I think we don't have many flaky tests because Morgan de-flaked quite a few when we started the CI subgroup, so yeah. Definitely — anybody? Any takers?
C: What's your GitHub handle? Then I can CC you on the PR.
C: So I guess we're out of agenda items, but I didn't ask about our perennial favorite thing. Francesco, how is fixing the serial tests going?
K: Hey — I have a couple of PRs which, last time I checked, were supposed to fix the flakes. Sorry — did we eventually merge the PR which was marking those tests as flaky? Okay, great. So yeah, I will update both, de-flake them and make them real failing tests, but I'm confident that both PRs should fix it, and I'm volunteering — of course, I will make sure they actually fix it.
K: You actually commented, Elana, on one of them, which is about adding a sleep — a longer sleep — which I fully admit is not at all the best solution, but I'm really open to suggestions. Unfortunately, it's quite technical to understand why a sleep is beneficial and what a better solution could look like. So, long story short, I will update both tests, but I expect only to remove the flaky label, and let's discuss them — let's add more reviewers, since I'm not that familiar...
C: I don't know if we have an issue for this anywhere; I'm trying to remember if this is one that I filed or not — I think you did, I think you did. Oh, no, no, there's another issue. So the other issue is... Sergey, if you pull up the Testgrid tab for the serial tests for a second, I will show you something fun and exciting.
C: I would like that to stop happening. I don't know — I can't remember if I filed an issue for this, but basically we can't get signal on the tests if the tests aren't running at all, and that's currently an issue. I think I filed some issue, but I don't know if it was that one; let me hunt in kubernetes/kubernetes.
C: So this is potentially another thing that somebody could try to dig into. One of the things that was making this very hard to debug was that there were no logs, so I couldn't figure out why the kubelet wasn't restarting, and I think there are a few things going on.
C: Potentially there's some stuff — I think in the memory manager tests — where the kubelet just doesn't restart properly when we expect it to, like there's a race, and I think Artyom had a PR up for that, but I haven't actually dug in in detail, so I don't know what's happening. All I know is that this is happening very consistently.
K: Since we were talking about this — I remembered that Dims somehow commented about a fix to restart the kubelet, and maybe you, Elana, commented: yes, but this needs to take into account the many units that get restarted there. I'm sure it was making sense, but I'm pretty sure it's something we've written about. My question is whether we already have an issue to take care of this thing.
C: Then we wouldn't have to restart the kubelet twenty bazillion times during a serial run, so that would certainly reduce the likelihood of seeing this sort of thing happening, and I don't know how we want to track that work for the upcoming release. I know Sergey's been leading a lot of the dynamic kubelet configuration stuff, and we have at least one issue with metrics, but that's potentially something that could also help with this: we just don't restart the kubelet all the time — problem solved.
A: I think the idea — my idea — was that for many tests, dynamic kubelet config was used to enable features, and this shouldn't be happening. We just need to enable them when we start, like in the test definition, and it will make everything much cleaner. But in cases where we need to change configuration, we will actually need to start restarting the kubelet, and it will be restarted a little bit more often.
I: What I saw with the memory manager problem is that the main problem is not the restart of the kubelet. The main problem is that our test framework is restarting it automatically: we have some goroutines that always run in the background, monitoring the kubelet status via a health check probe, and if they see that it does not respond, it is restarted automatically. So you try to restart it in the test, and at the same time it's being restarted by the goroutine...
I: ...and you get some race conditions that prevent it from working normally again. I tried to reduce this a lot in the memory manager tests, and in the latest PR I want to just get rid of the automatic restarts altogether, but I need to rebase my PR, because there were probably some changes, and I see that the memory manager lane started to fail with my PR. So I will need to check it.
C: I haven't looked into it. I know for a fact I have seen in the logs something that looks like the kubelet restarting and/or crashing and/or something else. I feel like — I can't remember who it might have been, maybe Antonio — someone was like, why is the kubelet restarting? And I'm like, I don't know; I didn't write these tests, so someone's gonna have to look into it. So this could be a fun and exciting project for you to learn more about the state of kubelet serial.
C: I don't volunteer, because I just spent the last two weeks trying to get these to the relatively-kind-of-green state that they're in, as opposed to the very-much-not-green state they were in before.
I: And also, another interesting thing is that we have a lot of goroutines in the traceback — it's like, what, 14,000 lines of goroutine tracebacks to skip over? It's like a wall!
C: Oh yeah, no, that's not this. This particular issue is: there's a problem in the test, and then the test calls klog.Fatalf, and if you call Fatalf, it will exit and print a stack trace, which, it turns out, is like 14,000 lines long.
A: Sorry, I'm totally missing this. I thought I saw another thing, not in serial, which brings a lot of panics. Okay, sorry.
C: Yeah, this is a different thing. I'll put a note — we really shouldn't be calling Fatalf in the tests.
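As a quick illustration of that note — a minimal sketch contrasting the discouraged pattern with a common alternative; klog.Fatalf and framework.ExpectNoError are real helpers, but doSetup and the surrounding functions are hypothetical:

```go
package e2enode

import (
	"k8s.io/klog/v2"
	"k8s.io/kubernetes/test/e2e/framework"
)

// doSetup is a hypothetical stand-in for whatever the test is doing.
func doSetup() error { return nil }

// Discouraged: klog.Fatalf exits the whole process and dumps the stack traces
// of all running goroutines, which is where the ~14,000-line wall comes from.
func setupTheBadWay() {
	if err := doSetup(); err != nil {
		klog.Fatalf("setup failed: %v", err)
	}
}

// Preferred: report the failure through the e2e framework so ginkgo records a
// normal test failure instead of killing the process.
func setupTheBetterWay() {
	err := doSetup()
	framework.ExpectNoError(err, "setup failed")
}
```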
A: Okay, looking at the dashboard.
C: I feel like... oh, can you look at the test? Because — oh, I guess it's SELinux labeling. We've got some Red Hatters on the call who love SELinux, right?
C: We're looking at an issue titled "test coverage of volume relabeling is lacking", and it's labeled sig/node and sig/storage. I think it's a — as I said, I think it's a sig-storage thing, except the specific test is an SELinux labeling test. So yeah, if you want to take a look — it was previously marked flaky; it doesn't...
C: I just did a run for flaky tests and I didn't see any, so it's possible that it's been moved out of flaky since then, but it'll be worth taking a look, I think.
A: Just stuff, I think. So maybe it's — yeah, everything is failing, so likely something is broken.
A: So now we're just looking for reviewers — somebody who will take an initial look and do a review. It's an opportunity for you to learn a new code base and get into what's going on and what kind of issues you could be working on. This is just to review this PR.
A: I really like that we have so many agenda items. I hope that we'll keep being enthusiastic and fixing bugs. I still didn't give an update on node conformance; I will do it by next time — I think I will have time just this week. Thank you. Bye bye, everybody.