From YouTube: Kubernetes SIG Node CI 20230802
Description
SIG Node CI weekly meeting. Agenda and notes: https://docs.google.com/document/d/1fb-ugvgdSVIkkuJ388_nhp2pBTy_4HEVg5848Xy7n5U/edit#heading=h.2v8vzknys4nk
GMT20230802-170531_Recording_2304x1440.mp4
A
Well, hello. This is the SIG Node CI weekly meeting. Today is August 2nd, 2023. Welcome, everybody. Today this section of the meeting will be led by Dixita. Dixita, do you want to take over?
B
I'm sorry, this is my first time, so it's a little glitchy for me. Okay, so this is August 2, 2023, and I'm hosting this week's SIG Node CI meeting. I'm going to try to present my screen first.
B
Okay, so I'll be doing the test triaging today, and Todd will be doing the bug triaging. We have the agenda here. We are looking for more people to engage in the meeting by leading either bug triage or test failures, and if you are interested, please reach out to Sergey. We can start with the test failures. The GCE device plugin presubmit job is still broken. Does anyone have any insights into this?
C
Yeah, hello. So the context for this item, which I added, is that I was validating, or trying to validate, a backport of mine, and I literally don't know if this test is still maintained or supposed to be working. I found it using, you know, "/test ?", so it could be just stale; I don't know. So if anyone has any insight, or can just say "yes, this is supposed to work, so something is up", please reach out to me and we can move forward.
C
The failure I see seems to happen while the test is setting up, even before the actual test starts. I also filed a very simple PR against the same branch, which just changes a comment, and ran this test, and again hit the same issue, which points to an obsolete test or a broken test or whatever. But I really don't know how things are supposed to go. So that's it, thanks.
B
Okay, all right. Moving on to the next one: the multi-NUMA test is still failing. This particular PR has some of the details, and I think Swati has created this PR to modify the machine type to narrow down the failures. Swati, do you want to talk about it?
D
Yep. So the issue per se captures that periodic jobs are failing, but I was trying to figure out whether it is just periodic jobs or all the jobs that use this image config file. So I created a dummy PR to test that out, and I identified that presubmit jobs that refer to this config file are also failing. So the next step was to change the config file, which I've done as part of this PR, so that should narrow it down.
D
That should tell us whether it's to do with the image config file itself, and then we can figure out how to enable multi-NUMA systems or machines in test-infra. So I think the next step is for someone to take a look at this particular PR and approve it, and then we can see if CI responds to this in a positive way. Then we'll know that this is the issue, and following that we can make further changes to the image config file to enable multi-NUMA systems in CI.
B
Okay, so the next one is the ARM failures that we saw last week. Those seem to be fixed now; the PR with the fix is already merged, so we are good there. Then, moving on to the containerd jobs that were failing last week: they are still failing, I think. Yeah. There were two test failures, as mentioned on this PR. For one test the fix is already there; I think they're waiting for 1.29 to open, and then this will be merged. But the summary API tests...
A
We cannot merge it before 1.29. I mean, we can, but I don't think it's critical enough to justify a merge right now. So, to remind everybody, the problem was specifically in containerd, and I'm not sure, but I think CRI-O has the same behavior.
A
If the OOM happens too early in the life cycle of a container, then it may die with a different message than the OOM status. The test is looking specifically for the OOM status, and if the OOM happens too early, then it cannot find it. That's why we just added a sleep before allocating too much memory, and that fixes the test flakiness.
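The fix described here, sleeping before the large allocation so that the container is fully started when the OOM kill lands, can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the actual test code: the container name, the busybox image, the 5-second delay, and the sizes are all illustrative choices.

```python
def oom_target_command(delay_seconds=5, block_mib=20):
    """Shell command that waits first, then forces one large allocation.

    dd allocates a buffer of `bs` bytes, so a block size above the
    container memory limit is expected to trigger the OOM killer.
    """
    script = "sleep {} && dd if=/dev/zero of=/dev/null bs={}M".format(
        delay_seconds, block_mib
    )
    return ["sh", "-c", script]


def oom_target_container(limit_mib=15):
    """Container spec fragment with a low memory limit (names are hypothetical)."""
    block_mib = limit_mib + 5  # ask for more memory than the limit allows
    return {
        "name": "oomkill-target",  # hypothetical container name
        "image": "busybox",        # assumed image that provides sh and dd
        "command": oom_target_command(5, block_mib),
        "resources": {"limits": {"memory": "{}Mi".format(limit_mib)}},
    }


if __name__ == "__main__":
    print(oom_target_container())
```

The only point of the sketch is the ordering: the sleep comes before the allocation, so the runtime has time to finish starting the container before the kill happens.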
B
B
A
No, I don't think so. Someone would need to assign it to themselves if they are going to look into that.
B
Okay, if not, maybe we can take a look at it later then.
B
I think that is all with respect to the test failures from last week. How do we want to proceed? Do we want to go through the rest of the agenda first? Sergey, do you want to go through the rest of the agenda first, or do we want to start looking at the test board?
A
I usually go to the agenda first to get it out of the way. Okay.
E
E
It works fine on x86-64, but on ARM it segfaults, and the stack trace is a little useless. It's a stripped binary; there's nothing there that I can see. Upgrading to a newer version of BusyBox fixes it. So this is a change to upgrade to a newer version of BusyBox, and that should resolve this arm64 flake, the arm64 test failure, anyway. Then the second part of this question was: there are some other test images which are really old, one of which depends on CentOS 7, which is now, like, end of life, and there will not be a newer CentOS or new updates to CentOS. So I'm curious, like, what's the...
E
How do test images get updated? Do we just leave them alone until there's a problem and then change them? And what do we do about the CentOS 7 one? There's, like, AlmaLinux, which I did build this with, and it worked fine. It still uses the same RPM names, and it is up to date, but I'm just curious what the consensus is for that.
A
We're trying to be on supported versions. For test images specifically, we try to replicate user scenarios as much as possible, and users are typically on supported versions of images. But beyond that, it doesn't matter much which image they run. I mean, the hosting platform matters more, because typically, especially in managed environments, Kubernetes will be running on more modern OS images, so we need to update those more often. But test images...
A
Either something is broken, or we just keep them above end of life.
B
C
Honestly, this one, I don't know how we want... Yes, this is from me. So basically, this one is going to be just deleted when I figure out what's going on, so I'm not sure how we want to mark it on the board. Basically no one should look at it, unless it's to understand CI. So it's totally not worth review, unless, you know, for targeted...
C
C
C
D
B
Okay, this is for the node conformance test for container PID namespace sharing: "containers in pods using isolated PID namespaces should all receive PID 1". Okay. So what is this? Is this...
F
B
B
I think this is the same one, the one that Swati created. This also has been assigned, so I'll move it to in progress.
B
B
C
This is a product backport, so it includes tests, but the main focus is not the tests.
B
A
Yeah, I would just archive it, because otherwise it will keep popping up. Thank you.
F
B
C
D
A
B
A
So apparently, when we create an image, we need to make, like, /tmp executable, because we put test binaries there, and we don't do it unless there is an image config file.
A
Yeah, I think they add... well, I haven't looked very deeply, but I think they add an image config file to use for this image for the specific test. Okay, I think the question was about, like, whether we want...
G
G
But if you look at the files changed, there's a new image config being made.
A
Which cloud-init do you refer to? Because I think this is how we add it to the image, through cloud-init.
B
G
F
I don't think there's any default on this. It will just not run anything, but there is a way to specify, like, a script if you want to.
A
And the question was... I glanced over it yesterday. So the question was whether we want the canary to use something more advanced, like some dev build of the same image, or whether we still always just want to use whatever... I can't see it; the font is too small. Yeah, yeah.
F
Right now we're using COS 97, which is a relatively old image.
A
G
Just to add to this: kubetest2 has its own set of node e2e tests that are running on there. So if there are changes in that repo, you will know; the existing node tests have to pass before they get merged.
G
Again, some of the SIG Testing canaries are used to test kubetest2 while it's being developed.
A
E
A
Is it time to clean up, like, remove the non-kubetest2 tests?
G
G
B
Okay, so I'll keep it in needs reviewer so that Sergey can take a look at it. The next one is "fix flaky test on DRA test, prepare resources should time out". Oh, this is a bug fix; it fixes some flake in the manager test.
B
B
The next one is for a presubmit job, e2e GCE Windows.
C
C
C
Yeah, because I think the basic question is: okay, is this lane still supported? Is it supposed to work, or did I just hit something that is stale and should actually be removed? I think that's the triage we need. I just don't know if this lane is supposed to work.
B
B
Okay, sounds good. So I think we are done with the test triage here. We can move to bug triage. I can stop presenting, if you want to take over.
E
Is mine displaying? Yeah? Okay, let me zoom in. All right, so I went ahead and opened up all the bugs here in the triage list.
E
So let's start at the top. This one, I was just looking at it. They submit a job that requests the GPU. It...
C
C
E
It's been asked if they can... oh, there it is. Yeah, so they're saying, "I can't reproduce it on 1.27", and...
E
E
C
I do remember a bug a while ago, which I probably did not look at deeply enough, which was exactly about completed pods: if you restart the kubelet, it retries admission. Yeah, and that could be related. This time it could be a bug. Unfortunately, I don't remember. Look, what we can do is: could you CC me on that? I think I can have a look and try to reconcile things. Thanks.
A
Do you want to ask for kubelet logs and whatever else may be helpful, like the NVIDIA driver version, for instance?
E
A
Immutable they are, but they can still be mutated by being deleted, right? So "immutable" is a bit misleading here.
E
A
Yeah, I wonder if it's a feature request saying that, since it's immutable, the kubelet assumes it will be okay forever.
E
A
H
E
Yeah, I think that's all they're asking: like, can we just clear up the documentation? Yeah, the docs should say that immutable objects need to be deleted and recreated with a different name in order for the new pods to see the new version.
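The "recreate under a different name" pattern mentioned here can be sketched as follows. This is a minimal Python sketch that assumes the object under discussion is a ConfigMap (Secrets support the same `immutable` field); the base name and the hash-suffix naming scheme are illustrative assumptions, not something taken from the issue or from any particular tool.

```python
import hashlib
import json


def immutable_configmap(base_name, data):
    """Build a ConfigMap manifest whose name changes whenever its data changes."""
    # Serialize deterministically so the same data always gives the same hash.
    payload = json.dumps(data, sort_keys=True).encode("utf-8")
    suffix = hashlib.sha256(payload).hexdigest()[:10]
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": "{}-{}".format(base_name, suffix)},
        "immutable": True,  # data can no longer be updated once this is set
        "data": data,
    }


if __name__ == "__main__":
    old = immutable_configmap("app-config", {"level": "info"})
    new = immutable_configmap("app-config", {"level": "debug"})
    # New pods would reference the new name; the old object can be deleted
    # once nothing uses it any more.
    print(old["metadata"]["name"], new["metadata"]["name"])
```

Because the name is derived from the content, a changed configuration always produces a new object, and pods pick up the new version only when their spec is updated to reference the new name.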
A
Yeah, you can mark it kind/documentation.
E
A
sftim. I'm sorry, there is a person, sftim: @sftim.
H
C
Right. In this case, the author of the previous issue acknowledged that their PR was a duplicate of the PR fixing this issue, so I think we can close the... yeah.
H
E
Mark that issue as closed. Pod created without, you know, the economy removed.
C
E
C
E
F
E
C
A
We can try to mention Sasha to see what he's thinking about it.
E
C
No, no, only keid is... exactly, yes.
A
No, no, I meant Sascha Grunert.
E
Sorry, I don't know that GitHub handle either.
A
A
Can you CC me as well? Just, yeah, I'm really curious.
A
Yeah, and on the issue, can you move it to needs information? At the top, it's a different section. Yeah, thank you.
E
A
A
Right. Typically, I click on the dashboard name, and then it brings me to the dashboard, and then I remove it from there. Benefit of all response; that's why I have archive with a button.
E
Thank you. All right, pretty much... I think we talked about this one the other day.
H
C
E
E
A
E
A
H
I wonder if this has something to do with... both CRI-O and containerd, when it's a non-privileged pod, create a cgroup namespace on cgroup v2. But I wonder if maybe we are not doing that, and just, you know, mount /sys/fs/cgroup within the container for privileged containers, because I know we do that for privileged: we mount /sys/fs/cgroup. So maybe...
E
E
E
H
E
A
You think about side effects, but songs addict.
A
Yeah, it's not a very tested scenario. Like, we have so many edge cases.
H
E
E
Yeah, if you have disk pressure and you restart, it just goes away momentarily and comes back. Yeah, it seems like a bug. You could get a pod scheduled that shouldn't be, for a short period of time.
E
E
Oh, they create an emptyDir, memory-backed, tmpfs basically, and then fill it, which consumes memory. Then their pod gets OOM-killed and the tmpfs continues to exist. But now the node is dying, I think, is my interpretation of this, because the tmpfs is still consuming memory.
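A pod of the kind described here, one that fills a memory-backed emptyDir, might look roughly like the manifest built below. This is a minimal Python sketch for anyone trying to reproduce the report, not the reporter's actual pod: the pod name, the busybox image, the mount path, and the sizes are all assumptions.

```python
def tmpfs_filler_pod(memory_limit_mib=64, fill_mib=128):
    """Pod manifest that writes more data into a tmpfs emptyDir than its limit."""
    # Files written to a medium=Memory emptyDir live in RAM, so writing past
    # the container memory limit is expected to end in an OOM kill.
    fill_cmd = "dd if=/dev/zero of=/cache/fill bs=1M count={}".format(fill_mib)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "tmpfs-filler"},  # hypothetical name
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "filler",
                    "image": "busybox",  # assumed image that provides sh and dd
                    "command": ["sh", "-c", fill_cmd],
                    "resources": {
                        "limits": {"memory": "{}Mi".format(memory_limit_mib)}
                    },
                    "volumeMounts": [{"name": "cache", "mountPath": "/cache"}],
                }
            ],
            # medium: Memory asks the kubelet to back the volume with tmpfs.
            "volumes": [{"name": "cache", "emptyDir": {"medium": "Memory"}}],
        },
    }


if __name__ == "__main__":
    import json

    print(json.dumps(tmpfs_filler_pod(), indent=2))
```

As noted later in the meeting, the limits would need to be tuned to the node used for the reproduction.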
E
E
A
A
E
E
Yeah, I'm going to try to reproduce this one. It sounds interesting, yeah.
H
A
Yeah, if you can reproduce it and get some kubelet logs to debug, or maybe you ask for kubelet logs from this person.
A
A
Cool. Do we want to mark it needs information, or is it clear enough?
E
It seems... I mean, he supplies a pod that does it, so I think, okay, makes sense. Yeah, should be enough.
E
I have to tweak the limits based on my node, but yeah, should be good. So I'm gonna try to reproduce this one this afternoon.
H
E
H
E
All right, so that's all the triage issues. We've got, I think, three minutes left, so not much time.
A
Yeah, I think so. If anybody wants to sign up for the next one, please sign up in the agenda document. See you next time. Thank you, Dixita and Todd, I really appreciate you. Okay.