From YouTube: Kubernetes SIG Node 20220323
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A: Hello. It's, what is it, the 23rd of March 2022. Welcome, everybody, to the weekly SIG Node CI subgroup.
A: We have quite a few items on the agenda, and then we'll go to regular triage. First... I don't know, but he can't join, so: yesterday there was a pull request that added the new registry into the list. Unfortunately, this pull request broke many CI jobs, and when I was filling in that table it wasn't clear whether a job crashed because of that or because of something else. So maybe we need to refill it a couple of days later. The reason for the breakage is this version two here, but there is still a version-one-style configuration here and here. So yeah, that's why it broke, but there is a fix PR already merged, I think.
A: Yeah, so the fix PR has merged. Hopefully many jobs will get back to green. But the main idea here is to let everybody know that we will be switching to registry.k8s.io. Right now it's just an outside redirector to a GCR registry; in the future it will be smarter. It will likely be different for different clouds, so we'll have some other smartness built into it, and in general it's great that it's not a GCR name any longer. So yeah, we're moving towards a better future.
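[Editor's note: the switch A describes, from the old GCR host to the registry.k8s.io redirector, amounts to rewriting image references. A minimal Go sketch; the two hosts are from the discussion, but the helper itself is illustrative, not the actual migration code.]

    package main

    import (
        "fmt"
        "strings"
    )

    // rewriteImage moves an image reference from the legacy GCR host to
    // the new registry.k8s.io redirector. Purely illustrative; the real
    // migration happened in the CI job configs.
    func rewriteImage(image string) string {
        const oldHost = "k8s.gcr.io"
        const newHost = "registry.k8s.io"
        if strings.HasPrefix(image, oldHost+"/") {
            return newHost + strings.TrimPrefix(image, oldHost)
        }
        return image
    }

    func main() {
        fmt.Println(rewriteImage("k8s.gcr.io/pause:3.6"))      // registry.k8s.io/pause:3.6
        fmt.Println(rewriteImage("docker.io/library/busybox")) // unchanged
    }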
B: Sure, so this was a relatively old failure. I need a review on this change. Basically, after trying to run it, I was able to reproduce the flakiness, and on this failure the Ubuntu swap job was failing due to some conditions not being met.
B: So I tried running this on the Ubuntu one. There were some failures with the timings. I just increased the timeout so that it works now, and the time differences were very, very small, not a couple of minutes, just a couple of seconds, so this additional minute should be enough.
B: No, it took enough time. I mean, my local machine was failing as well. I increased it by just a couple of seconds and it worked. We can leave one minute more. Okay.
A: Finally! I hope it will fix it and we'll forget about this for some time.
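[Editor's note: the fix B describes, adding a minute of headroom to a flaky wait, typically looks like the following. A minimal sketch assuming the test polls with apimachinery's wait.Poll; the condition and durations here are hypothetical.]

    package main

    import (
        "fmt"
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
    )

    func main() {
        // Hypothetical condition: the real test checks that swap-related
        // node conditions are met; here we just become true after ~3s.
        start := time.Now()
        conditionMet := func() (bool, error) {
            return time.Since(start) > 3*time.Second, nil
        }

        // The fix described above: the old timeout was tight enough that
        // a few seconds of slowness flaked the job, so add a minute.
        err := wait.Poll(2*time.Second, 2*time.Minute /* was 1 minute */, conditionMet)
        fmt.Println("err:", err) // nil once the condition is met in time
    }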
A: Okay then, let's go through... yeah, I don't even know whether we need to go through that, because many tests started failing because of the config change. Oh yeah, actually this one is interesting. I'm not sure where this is coming from: on the sig-node-critical tab there is now this kubetest2 one that is now failing.
A: I think it's also failing because of... yeah, let's just wait for this image config to propagate and then we'll revisit this testgrid again. Maybe next time it will be better, because things that never failed before started failing, like this one; presubmits started failing, and there are new failures on this tab for sure. Anyway.
A: Okay, okay, this is [unclear].
A: Yeah, they want the feature gate turned off, and actually all the other tests wouldn't mind this feature gate being turned off, because nobody uses it. No tests are using the credential provider, except maybe one or two.
E: Yeah, I'm not... I'm not super aware of this one.
D: I took a look at it and I LGTM'd it, but...
A: Okay, this is conformance promotion. I...
A: Yeah, this is a new issue I filed. I think David added this skipping logic here, and now... I didn't put a link to testgrid.
A: Yeah, now the RuntimeClass test is constantly skipped. This skipping logic was introduced so as not to run the test when the test handler wasn't pre-installed on the machine, and I think it checks for the provider being GCE. I think this logic is faulty and we need to do something about it.
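[Editor's note: a sketch of the provider-gated skip A describes; the helper and its names are assumptions, not the actual test code. The point is that gating on the provider being "gce" skips the test everywhere else, even where the handler is installed.]

    package main

    import "fmt"

    // skipf stands in for the e2e framework's skip helper (hypothetical).
    func skipf(format string, args ...interface{}) {
        fmt.Printf("SKIP: "+format+"\n", args...)
    }

    // maybeSkipRuntimeClassTest sketches the provider-gated skip being
    // discussed; the real helper and field names in test/e2e may differ.
    func maybeSkipRuntimeClassTest(provider string) {
        // The faulty assumption: the test handler is only pre-installed
        // on GCE images, so skip everywhere else. On any other provider
        // the test is then constantly skipped, even when the handler is
        // actually present on the machine.
        if provider != "gce" {
            skipf("test handler not guaranteed on provider %q", provider)
        }
    }

    func main() {
        maybeSkipRuntimeClassTest("aws") // always skips, handler or not
    }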
A: Okay, it's...

C: Well, it's... just ping this one.
C: Yeah, I haven't checked it since I created this, but I assume that it's probably still flaking in testgrid.
C: I think Peter was looking at this one. Maybe... Peter, you're here, right? Do you have an update?
C: Yeah, what's happening with this one is that... basically, it relies on there being a certain number of pods on the node in the serial end-to-end test, in order to schedule the right number of pods for the test to work. And the problem is, it seems to keep changing how many pods, like just system pods, are hanging around on the node requesting CPU. So previously we were having an issue where sometimes pods would be requesting 50 millicpu.
C: So, like, fine, we changed it, and now we're seeing that sometimes there's like a hundred millicpu of pods, or 150, hanging around on the node, and it's like: where are these system pods coming from? I don't know; they didn't used to be there, but now they're there. Is this an issue with system pods not tearing down properly, or some other accounting issue on the node? I don't know, but that seems to be why this one occasionally flakes: it'll never schedule the pod, because there's not enough space for it.
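[Editor's note: to make the accounting concrete, the test has to subtract the resident system pods' CPU requests from the node's allocatable before sizing its own pods, and that resident amount keeps moving (50m, 100m, 150m). A minimal Go sketch of the arithmetic, using plain millicpu integers rather than the real resource.Quantity types.]

    package main

    import "fmt"

    // spareMilliCPU returns how much CPU (in millicores) a test can still
    // request on a node after the resident system pods' requests are
    // subtracted from the node's allocatable.
    func spareMilliCPU(allocatable int64, systemPodRequests []int64) int64 {
        var used int64
        for _, r := range systemPodRequests {
            used += r
        }
        return allocatable - used
    }

    func main() {
        // If the test plans around 50m of system pods but 150m are
        // actually resident, its last pod no longer fits and it flakes.
        fmt.Println(spareMilliCPU(2000, []int64{50}))      // planned: 1950
        fmt.Println(spareMilliCPU(2000, []int64{100, 50})) // actual: 1850
    }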
C: Okay, oh, if it's a conformance test... sorry, this is triggering something in my brain. Conformance tests aren't supposed to rely on events.
A: Yeah, I understand that, but I think it's [unclear]. Okay.
A: Yeah, I suggest we move to bug triage unless there is more. Oh, by the way, I wanted to update on this performance flake. This COS performance thingy flaked yesterday, and I started looking into the timestamps and what happened; I didn't find anything, and now it's green again. If it's the same again, you can look at performance really quick and double-check on that.
G: [partly unclear] The crash... a PR was put in that fixed it; not a new bug, but in core, in the kubelet.
C: I think, like, maybe somebody suggested a blog post or something like that to do this. I don't understand why we're getting so many bugs saying, like, "by the way, I set my kubelet back, like, two days in the past, and then it stopped working." Like, that's not a thing you should do.
A: Yeah, I think Matias told the scenario last week about a Raspberry Pi that loses connection and just stays stuck in time.
C: Yeah, like, I think that... I mean, maybe we should make an update to our docs saying that this isn't supported, but, like... no, just, yeah, don't do that. But the thing that I'm, like, mildly concerned about is that we're suddenly getting this onslaught of bugs being like:
C: "I changed the time of my kubelet and it didn't work, and this is a bug." And, like, I don't know, if there was a blog post or something suggesting that you do this, then we should maybe say "don't do this" too, or something. But this is, I think, the third or fourth bug I've seen saying "I changed the kubelet wall-clock time on the node by, like, a large amount, like a day, and then it didn't work properly." And I was like: that's not supported. Why?
F: Yeah, because doesn't it have a thing that, like, checks some heartbeat time? If it's further in the past than some threshold, something's wrong: "I'm not ready." Oh, that thing updated, because it was the time that changed; now we're good.
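[Editor's note: a wall-clock staleness check of the kind F describes behaves exactly this way: it trips when the clock jumps and self-heals once a fresh heartbeat lands under the new clock. A minimal Go sketch, not the kubelet's actual code.]

    package main

    import (
        "fmt"
        "time"
    )

    // stale reports whether a heartbeat timestamp is too far in the past
    // relative to the current wall clock, the style of check described.
    func stale(lastHeartbeat, now time.Time, threshold time.Duration) bool {
        return now.Sub(lastHeartbeat) > threshold
    }

    func main() {
        heartbeat := time.Now()
        threshold := 40 * time.Second

        // Normal case: the heartbeat is fresh.
        fmt.Println(stale(heartbeat, time.Now(), threshold)) // false

        // The wall clock jumps a day ahead: the node looks not ready
        // even though nothing actually failed.
        jumped := time.Now().Add(24 * time.Hour)
        fmt.Println(stale(heartbeat, jumped, threshold)) // true

        // Once a heartbeat is recorded under the new clock, the check
        // recovers: "that thing updated ... now we're good".
        fmt.Println(stale(jumped, jumped.Add(time.Second), threshold)) // false
    }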
A: Yeah, so I mean, it's a surprise for me that we even recovered in this case. Another case, that somebody else posted and that we discussed last week, was time changing backwards, and in that case some ordering of pods somehow gets broken, and because of that... we completely rely on this ordering to do some, like...
A: I don't remember what exactly it was in that bug, but this ordering basically led to an unrecoverable error for the kubelet, and they claim that it even happened with a one-minute shift back, which can be a legit change. But yeah, we probably need to get back to the bug and understand what's going on.
C: Given the version here, there was this inotify dependency that we upgraded in a bunch of versions, but I don't know if we backported it back to 1.20; I think we didn't, or if we did, it was recent. Like... Ryan, do you remember what's going on with this?
C: Yeah, I think 1.20 being out of support is the issue. I'm pretty sure this is that inotify memory leak, and I think it's just not going to get fixed in 1.20.
C: I mean, cAdvisor will use a lot of memory and CPU if you let it, but that's typically on, like... oh, there were...
F: Ten thousand blocked cAdvisor goroutines, oh...
C: Oh boy. But that is different than, potentially, this thing. I linked the issue in the chat, Sergey. This was not... I think it was backported to a bunch of supported branches, but I don't think it... yeah, it got closed because 1.20 is out of support.
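[Editor's note: for the goroutine pile-up F mentions, a Go process built with net/http/pprof exposes a goroutine profile that makes such leaks visible. A self-contained sketch; the port is illustrative, and on a real kubelet the debug endpoints sit behind its authenticated port.]

    package main

    import (
        "fmt"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/* handlers
        "runtime"
        "time"
    )

    func main() {
        // Expose pprof on an illustrative local port.
        go func() { _ = http.ListenAndServe("localhost:6060", nil) }()

        // Simulate goroutines blocked forever, like the ten thousand
        // blocked cAdvisor goroutines mentioned above.
        block := make(chan struct{})
        for i := 0; i < 100; i++ {
            go func() { <-block }()
        }

        time.Sleep(100 * time.Millisecond)
        fmt.Println("goroutines:", runtime.NumGoroutine())
        fmt.Println("inspect: go tool pprof http://localhost:6060/debug/pprof/goroutine")
    }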
C: And last week it was kind of a pain in the butt for, like, PR review. I'm like: submit comment, push button, and then it's like "there was a problem with your request", or "you can't comment at this time", or, like, any number of... Oh, yesterday it was super annoying, because I kept commenting and the comments worked fine, but then, like, the hooks for CI weren't working. So I was just like: what is going on?
A: Yeah, it wasn't related to this exact Prow timeout.
C: Also sad when the latest update is just the stale bot, but that seems to be what's going on with these.
C: Yeah, like the other one that we saw: they were shifting, I think, the time into the future, and they were finding that the node went NotReady. And in this case they're shifting it into the past, and then, like, they're finding that they can't create a container, but, like, I think it recovers eventually. You can't do that.
C: Like, I've seen this sort of thing happen in, like, service environments, when I was running Kube as a service in various clouds, and there are sometimes weird timekeeping problems, and they do cause weird failure modes on clusters. But, like, you've got to keep time; that's kind of a base assumption.
A: Okay, it still needs information, and if there is any repro, even one minute back and one minute forward, then maybe we can try to look at it. Okay, I think we're done with bug triage.