From YouTube: Kubernetes SIG Node 20211014
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
Hello, everybody, happy KubeCon week. It is Thursday, October 14th, and we are the CI subgroup for SIG Node. We have a few agenda items for today. I don't think that we currently have Francesco or Imran with us, so I'm gonna skip those for now; I pinged them in Slack just in case they might join us. Mike: troubleshooting containerd 1.4 canaries. What have we got?
B
Sure. First of all, for a little more context, this derives from a job I was creating for containerd 1.5 canaries, after a suggestion to move them to another directory.
B
I noticed that the containerd 1.4 canaries are failing, and if you go to the template and you click any of those, you'll notice that the failures are because the cluster is not started. Well, the cluster is not starting, but also, when looking at the code of the node logic, it looks like it expects the node to already be present. It doesn't create it.
B
Oh, the first one always works, because it's just the build, but the other two keep failing. I see.
B
No, this is the one for me to move my original work to another directory, and this is the old directory.
B
It's an FYI. I'm thinking we have to create a cluster before the actual test starts. I'm not really sure about this, but that's what I'm thinking from reading the logs.
A
Cool, makes sense. I don't know any history about this job. Maybe we should...
A
CC Dims and Danny.
A
Because I am not working on containerd things primarily. Okay, that's that. Next item: memory pressure testing with swap enabled. How can we run tests on machines with swap enabled? That's a great question. We do have some configs for this that I put together in test-infra.
A
I just don't remember what the job name is, so it's in here, I guess. Okay, well, this is good enough. There's this node args image config for swap. Basically, if you take a look at this, I think it's the e2e node jobs with the swap config.
A
There's a couple of configs here, and basically, as long as you pass this config for the nodes, it will use this metadata to make a swap partition. So there's one for Ubuntu and there's one for Fedora CoreOS, and then you can see that, for example, this one just runs a command to make a swap.
A
And then, similarly, this one just has an Ignition file that turns on swap.
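For illustration, here is a minimal sketch (not an existing test-infra job or test; the helper name is made up) of how a setup step could confirm that the image config actually produced active swap on the node, by reading SwapTotal from /proc/meminfo:

```go
// swapcheck: a hypothetical pre-test check that the node booted with swap enabled.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// swapTotalKiB returns the SwapTotal value from /proc/meminfo, in KiB.
func swapTotalKiB() (int64, error) {
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		return 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.HasPrefix(line, "SwapTotal:") {
			continue
		}
		fields := strings.Fields(line) // e.g. ["SwapTotal:", "2097148", "kB"]
		if len(fields) < 2 {
			return 0, fmt.Errorf("unexpected SwapTotal line: %q", line)
		}
		return strconv.ParseInt(fields[1], 10, 64)
	}
	return 0, fmt.Errorf("SwapTotal not found in /proc/meminfo")
}

func main() {
	kib, err := swapTotalKiB()
	if err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
		os.Exit(1)
	}
	if kib == 0 {
		fmt.Fprintln(os.Stderr, "node has no swap configured")
		os.Exit(1)
	}
	fmt.Printf("swap enabled: %d KiB\n", kib)
}
```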
A
Oh, no, I think we were just tagging them with... probably, so. Currently we don't have any tests that are specific to swap. Rather, what we've been doing is we just run all of the tests in a swappy environment, or at least all of the standard e2es, so the selectors for the job that I showed you were just all of the standard e2es.
A
We don't have any specific tests for swap yet. The idea is, while we run these tests with swap, we do want to start having, for example, some nodes that exercise different swap configs, but that's all going to be infrastructure configuration and not test-level configuration.
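On the kubelet side, the per-node knobs those configs would exercise are, to the best of my recollection, failSwapOn, the NodeSwap feature gate (alpha at the time), and memorySwap.swapBehavior. A sketch, with the field names assumed from the v1beta1 kubelet configuration types:

```go
// kubeletswapcfg: a sketch of the kubelet settings a swap-enabled test node
// might carry. Field names are assumed from the v1beta1 KubeletConfiguration
// types; NodeSwap was an alpha feature gate when this meeting happened.
package main

import (
	"fmt"

	kubeletv1beta1 "k8s.io/kubelet/config/v1beta1"
	"sigs.k8s.io/yaml"
)

func main() {
	failSwapOn := false

	cfg := kubeletv1beta1.KubeletConfiguration{
		// Don't refuse to start just because the node has swap turned on.
		FailSwapOn: &failSwapOn,
		// Turn on the (alpha) swap support.
		FeatureGates: map[string]bool{"NodeSwap": true},
		// Pick which swap behavior this node exercises.
		MemorySwap: kubeletv1beta1.MemorySwapConfiguration{
			SwapBehavior: "UnlimitedSwap", // or "LimitedSwap"
		},
	}

	out, err := yaml.Marshal(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```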
A
We set these values on the nodes when they boot up, and then we want to run the tests. So right now, at least, for the memory tests with swap there's nothing specific today; there are no specific test selectors for swap tests. That may change as we start adding new categories of tests where we both need the swappy environment and have special tests that we want to run on swap, but currently there are no swap-specific tests, and the reason is in part because the swap changes were very, very limited in terms of what changed in the code.
A
Basically, here are some new configuration values, and the actual code change itself was: here's a new value in the CRI, which, I think, the CRIs currently still don't actually support doing anything with. They drop it on the floor, both CRI-O and containerd.
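For context, the CRI-level change being described is, as I understand it, a new swap field on the Linux container resources message. A rough Go sketch of the value the kubelet would populate and the runtimes were, at that point, ignoring (the field name and package path are my understanding of the cri-api types, not something confirmed in the meeting):

```go
// criswapfield: a rough sketch of the CRI value discussed above. At the time
// of this meeting, containerd and CRI-O simply dropped it on the floor.
package main

import (
	"fmt"

	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	res := &runtimeapi.LinuxContainerResources{
		MemoryLimitInBytes:     512 * 1024 * 1024,  // container memory limit
		MemorySwapLimitInBytes: 1024 * 1024 * 1024, // memory+swap limit derived from the swap behavior
	}
	fmt.Printf("memory=%d memory+swap=%d\n", res.MemoryLimitInBytes, res.MemorySwapLimitInBytes)
}
```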
So that is the state of it. It doesn't look like we got Francesco or Imran, but I guess let's go up the agenda.
A
Then let's do some quick bug triage.
A
And let's maybe also take a quick look at the test board.
A
Someone has assigned themselves to fix this. Let's check Gubernator; hopefully it won't crash. That was okay.
A
Yeah, I've used a Unicomp; it's really not the same. I rescued this one from eBay back when I was in university, so a while ago. Oh, it was filthy when I got it, and I cleaned it, and yeah, I've taken good care of it. It still works great, and yes, it's very loud. My apologies.
A
Okay, so we've triaged that one; I guess that's a to-do. I don't know what else there is to take a look at on this board. I guess: is there anything that has LGTM on it? A couple of things.
A
Okay, and I'm not gonna... I guess, if this fixes a flake, why does it not have area/test on it? Maybe we can add area/test to it.
A
Okay, and then let's take a look at bugs.
A
No new bugs. This one has been triaged already. Oh, and it's urgent. How many do we have? Oh, sorry, one second, I need to be right back.
A
Let me stop my video for a second; y'all can continue without me if you want. I don't know how long this is going to take. Let me stop sharing my screen for a moment.
A
Yes, wrong board: this is the PR triage board. There's this other one.
A
Some background on this one: it's in SIG Instrumentation; this is an instrumentation KEP. The Event API got redesigned a very long time ago.
A
Not that one; that's apparently the first one, because the search term is 383. I wanted to give you a little bit of background on this one while we were staring at it. So this is a very old KEP.
A
Basically, the Event API was rewritten, and then, in theory... so this one went beta in 1.8, and then they tried to GA it in 1.19, but I argued that you can't call it GA unless you've deprecated the old thing, and there's still a bunch of things that are not using the new Event API. Every time I've asked someone to step up, take ownership of this feature, and please finish migrating the rest of the things, it's been kind of a hot potato. So, yeah.
A
It looks like somebody commented yesterday and removed the stale label from this, but as far as I'm concerned it's in beta; it's not stable, because initially it was started by Marek and then Chelsea Chen, and she has basically not worked on this at all since it was handed over. So I think we might need to find a new owner, but yeah. This is kind of a backlog thing. I'm not sure; technically it's owned by SIG Instrumentation, but there are stakeholder SIGs, and I'm not sure.
A
I feel like this... it's a very old one. I guess it's got the old design proposal and all that, so it would have to be migrated to the new template and yada yada. I think that, honestly, people should just do that; they don't need a KEP to do the migrations, but we can't really call it GA until it's fully migrated. So anyway, that's just some background on what is going on there with the Event API.
A
Let's look at some bugs, because PRs will take a long, long time. How did this PR get in here? Maybe it was scheduled in; it's rotten. So let's go through. I think there were no new bugs to add to the board, so...
A
We've got a bunch of these. Oh, and there's another one that's already triaged, so, great.
A
Yeah, I don't think that makes any sense. I think they don't understand the readiness behavior. If there is a network partition where the entire control plane gets severed from the data plane, there's nothing wrong with the workloads; the workloads are still probably running and ready. The issue is that, from an endpoint perspective, from a control plane perspective, they're not ready, because the node can't check in. That is the safe failure mode.
A
So when something like that happens, I would expect that services which require communication with the nodes would go haywire. The issue is that services are part of the control plane, so you can have service outages if your control plane goes down, and I think that's the issue here. The workloads are fine. People seem to...
A
I get the impression, from getting a lot of these bugs and reading through them recently, that people really believe that "not ready" means something other than what it actually means, and maybe that's a function of bad terminology, which we probably can't change now. But readiness does not mean... it doesn't mean that the thing isn't actually running. It means that, from Kubernetes' perspective, everything is accounted for and running correctly. So...
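To make that concrete, here is a small client-go sketch that prints what "not ready" actually reflects: the node's Ready condition as recorded by the control plane. During a full partition that condition goes to Unknown, because status updates stop arriving, which says nothing about whether the pods on the node are still running:

```go
// nodeready: print each node's Ready condition as the control plane sees it.
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the default kubeconfig (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}

	for _, node := range nodes.Items {
		for _, cond := range node.Status.Conditions {
			if cond.Type == corev1.NodeReady {
				// Status is True/False/Unknown; during a control-plane/node
				// partition it flips to Unknown because heartbeats stop,
				// not because the workloads stopped.
				fmt.Printf("%s: Ready=%s reason=%s\n", node.Name, cond.Status, cond.Reason)
			}
		}
	}
}
```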
A
Yeah, I mean, basically the issue is... so, possibly... so, Federico, I guess this must have been marked API machinery, yeah.
A
It doesn't make sense why it was API machinery. I agree this is node. I think the issue here is that, honestly, it's more of a networking thing. It's certainly not API machinery.
A
You don't want... if your control plane is fully severed from your worker nodes, no matter what you do, that service mesh is not going to work. In theory, you would want to set up multiple failure domains in order to not have a full network outage like that, but if you've got a full network outage, what are you going to do?
A
Okay, so that's that. "Enabling feature gate blah causes unit test failures."
A
This seems to be a Node Problem Detector thing. Mike, is there someone that I should assign this to?
A
We have 10 minutes to spare. Should we go through any of these to see if they've been updated?
A
This one just needs information. This is a feature request, not a bug. Pod stuck terminating: let me ask for more information. Oh, somebody has a repro.
A
No, he said he could only reproduce it...
A
We could probably see some of those next time, but I think we've done plenty of bugs. Thanks, folks, we're doing good work. I hope you have a great rest of your day, and if you're attending KubeCon, I hope you have a fun KubeCon.