From YouTube: Kubernetes SIG Node CI 20221019
Description
SIG Node CI weekly meeting. Agenda and notes: https://docs.google.com/document/d/1fb-ugvgdSVIkkuJ388_nhp2pBTy_4HEVg5848Xy7n5U/edit#heading=h.2v8vzknys4nk

A: Hello everybody, it's October 19, 2022, and this is the SIG Node CI subgroup meeting. Welcome, everybody — let's get started. We have a few items on today's agenda, so let's get going. The first item I wanted to discuss is what we prepared with Swati. Swati is not on the call, so I will represent her.
A: Yeah, we prepared this report and presented it yesterday. We didn't get any feedback during the SIG Node meeting — everybody was quiet, just listening — but I wonder if anybody on this call has feedback on what we made here.
A: Yeah, that was one of the messages I wanted to deliver. One message I found missing from this report is whether everything is okay or not. Knowing all the details of these failures, I would say the status is always green — maybe yellowish green — but looking at this, and looking at TestGrid, you are right: you see how many tests are failing and how many different problems we have.
A: Yeah, so I think one thing we can do is highlight the status — maybe saying that all the critical tests are passing and that many of the failures are known infrastructure problems. That may be one piece of feedback, but yeah.
C: I'd like to give one additional opinion, because this is going to discourage people from trying to fix these things. I was looking at things that used to fail in a particular way; they fail for a totally different reason now, and it looks like there's a fairly broad swath of infrastructure problems masking the original problems that were there some weeks back. It's hard to figure them out now.
A: Yeah — when it's green, it's better, right? You immediately know when something stops being green. For instance, the swap test was red for a very long time because of an infrastructure problem, and now I wouldn't commit to it working on Fedora. We didn't change anything in the logic, and it worked on containerd — and on Ubuntu, I believe, we're testing it — so it's supposed to be working functionality.
D: I think this is really good. I saw some of the tests for COS that we want to deprecate, I believe. I think if we went through the failing tests and figured out which ones we really care about, we could highlight those, and then we can dive into them.
A: Yeah, and I asked Mike to add this item for today's discussion. This PR highlights some of the things I wanted to discuss, and it's exactly what you're talking about: some tests we don't really care about, and they are red. Okay — let's discuss it when we get to that. Okay, okay.
A: I also wanted to call attention to this one. I was very generous here by saying many issues are one month old; in fact, many issues are multiple months old. I just figured one month is old enough to make people worry.
A: Some issues are from last year — quite a few, actually — so many issues are at least six months old. I also found that a few issues went rotten, which means that during the summer, when many people were on break, issues rotted out and we didn't freshen them. The underlying problems still exist, but we no longer track them as GitHub issues. That is...
A: Perma-failers, right. This eviction test, and this one, have been a sore eye for a very long time. I think somebody at Google is looking at this right now — Pangea is not on the call, but yeah, he's looking at that. But again, it's not top priority.
A: So, issue tracking. I think these are quite easily collected statistics, and if we start doing these reports once a month, we can get some sense of what we're doing — maybe combined with how many tasks were closed. Yeah, just...
A: That may be those PRs — so for tasks, it's just one here and 11 here. So...
A: Okay, another interesting statistic: I think we're doing reasonably well on bugs from a triage perspective, and this is what we wanted from this group. We really want bugs to be fixed by the main SIG Node — by everybody. From the triage side we're doing amazing: we caught up on all the bugs, and I think we react — people provide more information and move issues into the right homes. So we know all the issues we have; we're just growing our debt by accumulating these bugs and not fixing them.
A: Yeah, that would be an interesting piece of information. Lots of work goes into that, but it's hard to demonstrate.
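[A minimal sketch of the kind of monthly statistics discussed here, assuming the counts come from GitHub's issue search API. The label set (sig/node, kind/failing-test, lifecycle/rotten) and the one-month cutoff date are illustrative assumptions, not the report's actual queries.]

    package main

    import (
    	"encoding/json"
    	"fmt"
    	"net/http"
    	"net/url"
    )

    // openCount returns the number of issues matching a GitHub search query.
    func openCount(query string) (int, error) {
    	resp, err := http.Get("https://api.github.com/search/issues?q=" + url.QueryEscape(query))
    	if err != nil {
    		return 0, err
    	}
    	defer resp.Body.Close()
    	var out struct {
    		TotalCount int `json:"total_count"`
    	}
    	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
    		return 0, err
    	}
    	return out.TotalCount, nil
    }

    func main() {
    	base := "repo:kubernetes/kubernetes is:issue is:open label:sig/node label:kind/failing-test"
    	for _, q := range []string{
    		base,                             // all open failing-test issues
    		base + " label:lifecycle/rotten", // went rotten (untouched too long)
    		base + " created:<2022-09-19",    // older than one month at meeting time
    	} {
    		n, err := openCount(q)
    		if err != nil {
    			fmt.Println("query failed:", err)
    			continue
    		}
    		fmt.Printf("%4d  %s\n", n, q)
    	}
    }
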
A: Okay, this is it, yeah. If you have any other ideas for what we can cover in this report, that would be great. For the next report we may start doing some deltas — not only the current status but the delta as well; that may help. My hope when I started this sheet was that we would have a clear picture of what went from green to red and back to green, but it doesn't provide enough information — it's not presentable.
A: We cannot present this and say, "this is the status of the tests" — it doesn't work, unfortunately. So maybe this can be improved as well. If you have any other ideas and feedback, or want to be involved in the next report, let's discuss that.
A: We can try now — try today. Okay, Brian, you said that you want to discuss your PR. I noticed it didn't have a SIG label — that's why I hadn't seen it. So, it's a very simple command here, and after that it immediately got onto my radar.
C: Yeah, it might be a bit verbose — I'm here to pitch it, and I'll try to summarize for you. A previous PR went out; it wasn't good enough. This one I want to explain, because I want to make sure you know that due diligence was done to try to get the arm64 tests working for TensorFlow. I documented, quite a bit further down, the efforts I made to make that work —
C: — to actually build TensorFlow properly anyway. I won't go on too much about that; I put some data in the ticket. The proper solution is to not try to do that anymore. I even tried to bump it up to TensorFlow 2, and the amount of work it would take to get a similar or equivalent test going with TensorFlow 2 was way too large to justify — including pulling in Bazel and building stuff, because those images don't exist for arm64 either.
A: Doing that, I see — oh, it doesn't, yeah. Okay, the images...
A: It doesn't count — my approval doesn't count here — but do something.
D: Mike, let's —

A: — go to this one.
F: So, for anyone not familiar, this PR is to delete some jobs that are on the COS tab. These jobs have been red for a while, and nobody looks at them. I asked internally on the COS team whether they are using them, and they mentioned that they don't look at these either. So...
A: This tab is called "cos" — I mean, the name of the operating system — and when I looked at this PR, it didn't look COS-specific; it doesn't look operating-system-specific. For instance, this test is a soak test, meaning the VMs are never deleted after the test executes. Unfortunately it's all red right now; it used to be green before my parental leave.
A: Strange — it was green, green, red, green, green, green, red. The red runs were when the VM ran out of memory or out of hard drive, and then everything crashed; testing it again creates a new machine, so yeah. That is a very interesting test to keep, and I think that was my first suggestion. I just quickly glanced through, and I think what Mike did was move this tab somewhere, but then, when I did another review, I came back to Ryan's point.
A: Was this test important? I noticed that this one is flaky, and the flaky tab is, I think, the only tab that specifically queries for flaky tests. The idea with the flaky tab was always that if tests start being flaky, and we don't want to make our whole TestGrid red, we mark the specific test as flaky and move it into a separate tab. Then you look at that tab, clean it up, and un-flake the tests.
A: So that was the process created before. I looked at the other tabs and didn't find any other flaky tabs, so I think it's the only flaky tab that we have. Thank you.
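[A minimal sketch of the quarantine flow described above, assuming the usual Kubernetes e2e convention that "flaky" is just a tag in the ginkgo test name: the regular jobs skip it via --ginkgo.skip='\[Flaky\]', while the flaky tab's job selects it via --ginkgo.focus='\[Flaky\]'. The test name below is illustrative, not the real suite.]

    package e2enode

    import "github.com/onsi/ginkgo/v2"

    // Tagging a test [Flaky] moves it out of the main tabs (whose jobs
    // skip the tag) and into the flaky tab (whose job focuses on it).
    var _ = ginkgo.Describe("[sig-node] Eviction [Flaky]", func() {
    	ginkgo.It("should evict pods under memory pressure", func() {
    		// body elided; un-flaking means fixing the test and deleting
    		// the [Flaky] tag so it returns to the main tabs
    	})
    })
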
A: Yeah, because of Ingress — I think that one is very specific to Ingress tests: it runs Ingress tests, and I wonder if we run them in other places. If we do, then fine, let's remove it. But I think it has a very specific configuration for the Ingress tests and runs them in that specific configuration, so you may lose a signal. I mean, it's a red signal right now, but we might lose it nevertheless.
A: For this one, maybe we need to involve SIG Network — we have a networking tab, with the SIG Network things, somewhere in a different place — and then they can decide whether they need it. I don't think we want to just blindly move it. And reboot is another thing: I just don't know if we run a reboot test anywhere else. Reboot is such a disruptive thing that I would assume —
A: — we disabled it everywhere. But it's interesting: I don't remember any other test that would intentionally reboot the machine. So maybe that's the only one, and if it is, then we need to decide whether we want to keep it around.
A: Ryan, to your point about the importance of a test: I was thinking about things like reboot, or something very specific like Serial on Fedora, or memory swap on Fedora — would you say they're critical enough that we need to care, or do you think they're kind of a secondary tab?
D: Maybe — I guess my point was that if we were going to deprecate the COS tests, then we should remove them from the, you know, build tree, remove them from the document, and just prioritize which ones we want to look at first.
F: I don't have the full context on this, but when I saw the issue, I think it was specified that they had been failing at least for as long as TestGrid keeps track. That's why I thought that maybe they had been failing since the beginning of time and don't really offer any value — there's no point in keeping red tabs if they have always been red. But if at some point they were passing, then that changes everything. But yeah.
A: And I'm asking you because you brought up this point about important tests and less important tests, and I get this all the time from inside Google as well: "let's just consider the important tests." I'm trying to come up with a dimension for that, and I cannot. Is an alpha feature not important? I mean, they're about to become beta, right? They are important.
A: Yeah, and some functionality we may not use at Google, but it's still important for open source, so we still need to invest in it. I just cannot come up with a dimension — I don't think we run any unneeded tests, and that bothers me. Even the performance tests that, Brian, you're helping fix — I think we need them, because we don't run any stress besides this performance suite on the kubelet, and without it we wouldn't catch any real degradation in how Kubernetes works.
A: Yeah, but I welcome any idea of how we can categorize tests so that this question disappears — either make people realize that all tests are important, or actually categorize the tests.
A: Okay, sorry for my rant. I've been trying to get this sorted out for some time, and I just cannot come up with any plan — any categorization that is easily explainable.
A: Yeah, I mean, we run some tests a little too often — some every four hours that could run once a day — but besides that, I think all the test runs are quite needed, and the fact that a signal is red is not an indication that we don't care. It's an indication that —
A: — something is broken, and we don't have enough people to look at it. Yeah, right. And again, I think the knowledge that everything is green lives very much in the heads of a few people. People who attend this meeting know that everything is pretty much stable: for the failures, we know this one is infrastructure, this one was failing a lot but we know it's working. It's the kind of tribal knowledge we carry from the early days, but I hope we —
A: — can eliminate this human factor from the equation.
A: Okay, let's move on to triage, if you don't mind. We have —
A: Okay, the one we just looked at.
A: Maybe it's done now — the image validation on COS is waiting.
A: Yeah, there's a very interesting tab there — I really enjoy what they're doing with the jobs these days. I would definitely look myself, but if you're interested, take them as well.
A: Okay, so I think this person, Shingo, came to this meeting last week and can give us an overview of some functionality that is not working as expected.
A: Yes, okay, so the new image here — there's a base image, or the BusyBox ones. Okay.
E: I think I already — yeah, I think I already moved it through triage, accepted it, and reviewed it.
A: Okay, and we just looked at — perfect, nothing else. Okay. I think last time on these issues we looked at the tasks assigned to everybody, and everybody had tasks, so I don't think we need to do that this time, unless people want to discuss some items. Any takers? Okay, let's go into bugs really quick, and then — okay, let's do what you suggested. Seven, plus this — I think six bugs triaged.
A: It looks like an existing limitation, but I cannot say for sure without digging into the details.
E: It says 1.21. Do you think — you know, maybe we can ask them if it's still happening.
B: I think typically, with the CPU manager, you're supposed to remove the checkpoint file for the CPU manager before you restart the kubelet, I believe. Maybe that's what you're supposed to do here as well. But I think it makes sense — let's just ask them first to check whether this is the behavior they get on the latest version, and then we can look into it.
A: Oh, I see — it's not about the image. It's about the pause container, dockershim.
A: Yeah, I know that similar behavior exists on containerd — we have some known issues with the sandbox being stuck sometimes. I doubt anyone will ever touch it for 1.23.
A: It is — I know it sometimes happens. If somebody wants to look at dockershim, please be my guest. It may be a little hard from an archaeology perspective to go there, and the code base is not on master anymore.
A: Okay, this is another time issue. How far in the past?
A: I remember about that one: the node was still in good shape; it just couldn't schedule any more pods, because it thought it already had a newer pod around, or something like that. So —
A: Time skew happens, and typically, if you have a time server around, it will update pretty fast. I think most of the bugs reported about time skew were from edge devices — some telecoms are running a node on a low-power device, and those may have very, very high time skew.
A: So we don't support those devices very well, and I think this would be the smallest problem: if a node becomes unknown, you can at least just restart the node and be done with it. So it should be fine, but yeah, I think we have more serious issues with timing. How much skew are we talking about here? If it can be represented in a few seconds, we may need to look into that.
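[A minimal sketch of a coarse skew check along the lines discussed — compare the local clock with a trusted server's Date header. Second-level resolution is enough to flag the multi-second skew seen on low-power edge nodes; the URL is an arbitrary assumption.]

    package main

    import (
    	"fmt"
    	"net/http"
    	"time"
    )

    func main() {
    	// Ask any trusted HTTPS endpoint for its Date header and compare
    	// it with the local clock; resolution is about one second.
    	resp, err := http.Head("https://www.google.com")
    	if err != nil {
    		panic(err)
    	}
    	resp.Body.Close()
    	remote, err := http.ParseTime(resp.Header.Get("Date"))
    	if err != nil {
    		panic(err)
    	}
    	fmt.Printf("approximate clock skew: %v\n", time.Since(remote).Round(time.Second))
    }
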
A: Yeah, I think this is being looked at.
A: Yeah, I think with the promotion of CRI stats — do we do it this release? It's the next release. But if we're about to promote it to the next stage, it will be interesting to fix it.
A: Okay, thank you. Yeah, I think we're done with the bug triage, and we have 11 —