From YouTube: Kubernetes SIG Node 20210901
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
So yeah, let's start with this first item. Mike, Mike's here.
B
Sure,
yes,
I'm
here,
okay,
so
there
was
an
issue
with
no
problem
detector
for
a
particular
job
that
pushes
images
it
was
failing
because
there
was
a
cloud
in
it.
Yaml
file
missing
the
repository.
B
I created it, but it's still failing, for a particular reason. If you go to the issue, then there's... I don't think I linked it there, but yeah. Basically the error is "gcc not found". I dug into it a little bit more, and it appears that the CGO_ENABLED flag is true.
C
I thought most stuff we built on Debian, for eventual, like, distroless stuff.
B
Yeah, right now I'm using the... we have, like, syntactic sugar on top of the Google Cloud images, which is this particular image from Kubernetes.
B
Oh, you know, it used to work, yeah, yeah. I was also wondering because I noticed that there are some images already, I mean, there are some versions which are pushed.
C
Yeah, so that is a fairly annoying PR that deflakes a bunch of the storage eviction tests. It's been open for a while; it just needs some reviews.
C
This mostly had the issue of not actually solving any problems, unfortunately.
A
Yeah, for the GPU tests, do we need to restore it? I believe Francesco wanted to understand, like: first, do you want to test the device plugin with GPUs, right, and second, do we need GPU tests at all?
C
I mean, so we had the existing broken GPU tests, which I now have at least running and nearly have working, but they were broken both in terms of, like, the way we configured the host was broken, and, like, the actual logic in the test was broken.
F
Yeah, so this is fallout from Clayton's refactor of the pod lifecycle, and static pods are not being recreated correctly.
F
It changes the hash that gets generated, and the side effect is that static pods that get created with controllers will get restarted. And so I don't know if any controllers actually do that, but that would be the fallout from this one. But it does fix the issue and seems to be working pretty well within our CI so far as well.
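As a rough illustration of the mechanism being described here (a hedged sketch, not the kubelet's actual code): a static pod's effective UID is derived from a hash over its manifest, so if a fix changes what feeds into that hash, the UID changes and the pod looks like a new pod that has to be recreated. The function and inputs below are made up for illustration.

```go
package main

import (
	"crypto/md5"
	"fmt"
)

// staticPodUID sketches the general idea, not the kubelet's implementation:
// a static pod's UID is derived from a hash over its manifest plus the node
// name, so any change to the hashed inputs produces a new UID, and a new UID
// makes the pod look like a different pod that must be recreated (the
// restart side effect mentioned above).
func staticPodUID(manifest []byte, nodeName string) string {
	sum := md5.Sum(append(manifest, nodeName...))
	return fmt.Sprintf("%x", sum)
}

func main() {
	before := staticPodUID([]byte("static-pod-spec-v1"), "node-a")
	after := staticPodUID([]byte("static-pod-spec-v2"), "node-a")
	fmt.Println(before != after) // true: changed inputs, changed UID
}
```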
G
I had a good question about this one, actually. So, like, what I still don't understand is: before the refactor, how did it work, I guess? Because did the refactor change what was taken into account for the UID, or, kind of, how does the refactor affect that, I guess?
F
So the refactor, I think, manifested the problem further. I think this was a problem prior to the refactor as well.
F
My local bench testing shows it fixes those issues completely, but it's running through OpenShift CI right now, and upstream CI.
C
Might be worth adding an e2e for that? Yes.
F
A PR? Yeah, yeah, I can. We're trying to get a release done for this week, so yeah, hopefully I can put it into a separate PR.
A
I think it's a follow-up from the regression, right?
G
Thank you again, yeah. This is, yeah. Yes, Ryan, just to be clear on this one, we wanted... I think I left a comment. One question I had about this one: there was, like, an existing scheduling test that already had a job, so I'm just wondering if maybe we should modify that instead of creating new tests, just because for this we will need to make sure the scheduling folks are on board, create new jobs, etc.
G
Okay, yeah, yeah, because, I mean, it looked like they already had a job or test that did something very, very similar. The only difference is that the pods didn't have any resource requests, so that's why, like, it didn't reproduce the issue. So I think if we just copy and paste their existing tests and only add resource requests, I think it would be the same.
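A minimal sketch of what that change could look like, assuming the existing scheduling test's pod is reused and only resource requests are added; the name, image, and values below are illustrative, not the actual e2e code.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newRequestingPod builds the same kind of pod the existing scheduling test
// uses, with the one difference discussed above: explicit resource requests,
// which is what should make the issue reproduce. Name and image are
// placeholders.
func newRequestingPod(name string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "pause",
				Image: "registry.k8s.io/pause:3.9", // illustrative image
				Resources: corev1.ResourceRequirements{
					// The only addition relative to the existing test.
					Requests: corev1.ResourceList{
						corev1.ResourceCPU:    resource.MustParse("100m"),
						corev1.ResourceMemory: resource.MustParse("64Mi"),
					},
				},
			}},
		},
	}
}

func main() {
	pod := newRequestingPod("resource-request-pod")
	fmt.Println(pod.Spec.Containers[0].Resources.Requests.Cpu().String())
}
```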
A
Great, okay. Issues to do.
D
Like, we maybe can get rid of this one, because I use the, like, overall GCE alpha features job for, like, pull requests rather than a node-specific one, like there's...
D
The corresponding job runs, I think, like, all of the e2e stuff with all the feature gates on. So that's why I just want to leave that job.
D
I feel like that test has been failing for a while on various different things. When did it start failing?
E
It's dragoncell. Thank you.
A
Oh, you're here? Perfect, thank you for taking a look.
A
Here as well, yeah.
A
Cool, okay, and yeah, I think it's assigned, so that's set.
A
Okay, I think we're done with the unassigned triage. We can get more in depth once we've caught up on everything.
A
Yeah, we did it last time and we cleared quite a few.
A
But this is the bug for the API server.
D
I mean, David's comment about the... or, I think it was from David, that the API server can't tell the difference between a random anonymous user and the kubelet talking to something anonymously, is true. So I think it's a feature request.
A
Again, removing it from the board.
A
Okay, remember we have, like, two or three bugs about kubelet restart: people are updating certificates for something, they restart the kubelet, and at startup everything is not ready.
D
I don't think that it assumes that containers are...
H
But it prevents the service from forwarding.
D
Oh, the other thing, too, is they could potentially... if they add a delay to the probe, then won't that also solve this problem? It won't mark it as not ready until, I think, it...
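For reference, a hedged sketch of the probe-delay workaround being suggested, built with the core/v1 Probe type; the delay, threshold, and /healthz endpoint are illustrative values, not recommendations.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

// readinessWithDelay shows the shape of the suggestion: give the readiness
// probe an initial delay and a failure threshold so a short kubelet restart
// window does not immediately flip the pod to not-ready.
func readinessWithDelay() corev1.Probe {
	var p corev1.Probe
	p.InitialDelaySeconds = 30 // do not probe immediately after (re)start
	p.PeriodSeconds = 10
	p.FailureThreshold = 3 // tolerate a few failures before marking not-ready
	p.HTTPGet = &corev1.HTTPGetAction{
		Path: "/healthz",
		Port: intstr.FromInt(8080),
	}
	return p
}

func main() {
	p := readinessWithDelay()
	fmt.Printf("initialDelaySeconds=%d failureThreshold=%d\n",
		p.InitialDelaySeconds, p.FailureThreshold)
}
```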
A
Yeah, I think this behavior is expected when you restart for an unknown reason. When you restart to update a certificate, maybe... I mean, maybe we need to allow a certain graceful...
D
Well, whether or not, like... I mean, I don't think the timing is the matter. I don't think you can make any assumptions in terms of how long your kubelet's been offline or something like that. You could checkpoint. So, for example, when the kubelet shuts down, you could dump the state of everything known to disk and then pull it back in and be like: oh well, this thing was ready previously, so I'm going to assume it's still ready until proven otherwise. But, like, I think that's really fraught.
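A toy sketch of that checkpointing idea, purely illustrative and not anything the kubelet actually implements: persist which pods were ready at shutdown, reload at startup, and treat them as ready until a probe says otherwise. All names and the file path are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// readinessCheckpoint is a hypothetical structure illustrating the
// "checkpoint readiness on shutdown" idea from the discussion above.
type readinessCheckpoint struct {
	ReadyPods map[string]bool `json:"readyPods"`
}

// save writes the checkpoint to disk, e.g. on kubelet shutdown.
func save(path string, cp readinessCheckpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	return os.WriteFile(path, data, 0o600)
}

// load reads the checkpoint back, e.g. on kubelet startup.
func load(path string) (readinessCheckpoint, error) {
	var cp readinessCheckpoint
	data, err := os.ReadFile(path)
	if err != nil {
		return cp, err
	}
	return cp, json.Unmarshal(data, &cp)
}

func main() {
	path := "/tmp/readiness-checkpoint.json"
	_ = save(path, readinessCheckpoint{ReadyPods: map[string]bool{"default/web-0": true}})
	cp, _ := load(path)
	// On startup, previously-ready pods would be assumed ready until a probe
	// proves otherwise, which is the part called "really fraught" above.
	fmt.Println(cp.ReadyPods["default/web-0"])
}
```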
D
I don't think it'll actually help. I think, fundamentally, the issue is people want things to magically work in ways that, like, the architecture will limit them from working. If we make this always work by default, then we cause broken behavior in other cases, and then we get a bug about that. It's not possible to make everything magically work the way that people want it to, I think.
A
Yeah, but it's... I think we need to triage it into...
D
If it can't be reproduced, then we shouldn't mark it triage/accepted. We should mark it, I think, triage/not-reproducible.
D
There's a triage label for that, is there? Yeah, needs...
D
It has a documentation label on it, so it might be something... I might have a comment on here. So it sounds like something I would do.
D
I think this is a misunderstanding of the pod lifecycle, and so, like, we should document this so people stop asking about it and getting confused.
D
For this one, if we put it in the needs-information column, Sergey, can we make sure that we also put a triage/needs-information label on there? Because that's what I use to manage them.
D
I think that there's, like... we have a few issues on sysctls floating around right now. I think it would be good to have a sort of more unified strategy for how we want to deal with them. This is clearly an issue affecting a bunch of people trying to use Kubernetes, but we don't really have anybody who owns that right now, and I don't think we should try to, you know, address these things sort of piecemeal, or with bits of documentation or small fixes here and there.
D
I think we just need to determine, like... I know that there was, for example, a request to, like, change the list of whitelisted sysctls, or allow-listed sysctls. I don't know if that's been taken up by anybody, but that was discussed previously. So...
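For context, a hedged sketch of how a pod requests a sysctl today: anything outside the default safe set has to be allowed per node via the kubelet's --allowed-unsafe-sysctls flag, which is the kind of allowlist change referred to above. The sysctl and image names below are just examples.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// sysctlPod builds a pod that asks for a sysctl through the pod security
// context. net.core.somaxconn is not in the default safe set, so the node's
// kubelet must explicitly allow it (--allowed-unsafe-sysctls) for this pod
// to be admitted.
func sysctlPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "sysctl-example"},
		Spec: corev1.PodSpec{
			SecurityContext: &corev1.PodSecurityContext{
				Sysctls: []corev1.Sysctl{
					{Name: "net.core.somaxconn", Value: "1024"},
				},
			},
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "registry.k8s.io/pause:3.9", // illustrative image
			}},
		},
	}
}

func main() {
	fmt.Println(sysctlPod().Spec.SecurityContext.Sysctls[0].Name)
}
```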