From YouTube: Kubernetes SIG Node CI 20230920
Description
SIG Node CI weekly meeting. Agenda and notes: https://docs.google.com/document/d/1fb-ugvgdSVIkkuJ388_nhp2pBTy_4HEVg5848Xy7n5U/edit#heading=h.2v8vzknys4nk
GMT20230920-170330_Recording_2560x1440.mp4
D
Yeah, hi everyone. As part of the KEP I was working on, the split image filesystem one, I noticed that the kubelet already supports a dedicated image filesystem, but we don't have any test coverage of that. I'd like to add eviction tests for the split disk case, but I think I probably want to add them for the dedicated image filesystem as well, and I was just curious what the group thinks about that.
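For context on what "dedicated image filesystem" means here: the CRI exposes an ImageFsInfo RPC that reports which filesystem backs the image store, and a test can use it to tell a split or dedicated setup apart from the default single-disk one. A minimal sketch, assuming containerd's default socket path (CRI-O would use unix:///var/run/crio/crio.sock):

```go
// Minimal sketch: query the CRI ImageFsInfo RPC to see which filesystem
// backs the image store. On a node with a dedicated image filesystem, the
// reported mountpoint differs from the root (nodefs) mount.
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial CRI socket: %v", err)
	}
	defer conn.Close()

	client := runtimeapi.NewImageServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	resp, err := client.ImageFsInfo(ctx, &runtimeapi.ImageFsInfoRequest{})
	if err != nil {
		log.Fatalf("ImageFsInfo: %v", err)
	}
	for _, fs := range resp.ImageFilesystems {
		// Getters are nil-safe for the optional fields.
		fmt.Printf("image fs mountpoint=%s usedBytes=%d\n",
			fs.GetFsId().GetMountpoint(), fs.GetUsedBytes().GetValue())
	}
}
```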
D
I don't know whether that would be allowed or not. I don't know.
A
So, definitely allowed, and it definitely needs to be added. We have a long history of two eviction tests failing, one on CRI-O and one on containerd, and it's different eviction tests. I think the latest I heard is that on containerd the eviction test fails because it fails to fill up the disk fast enough, because the disk became too big on the infra nodes. So if you can look into that as well, it'll be really appreciated.
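For background, disk-pressure eviction tests work by writing data until disk usage crosses the eviction threshold; when the boot disk grows, the same writer takes proportionally longer and can blow the test timeout. A rough sketch of that fill step, with the path and target size made up for illustration (the real node e2e tests do this inside a pod):

```go
// Illustrative sketch of the "fill the disk" step an eviction test performs:
// append large chunks to a scratch file until the target number of bytes
// is written or the write fails (typically ENOSPC once the disk is full).
package main

import (
	"log"
	"os"
)

func fillDisk(path string, totalBytes int64) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	chunk := make([]byte, 64<<20) // 64 MiB per write; bigger chunks fill faster
	var written int64
	for written < totalBytes {
		n, err := f.Write(chunk)
		written += int64(n)
		if err != nil {
			return err // usually ENOSPC: the disk is full
		}
	}
	return f.Sync()
}

func main() {
	// Try to write 50 GiB. On a small disk this hits ENOSPC quickly and
	// triggers disk-pressure eviction; on a large infra node it may simply
	// take too long for the test timeout, which matches the failure above.
	if err := fillDisk("/var/lib/scratch/fill.bin", 50<<30); err != nil {
		log.Printf("stopped filling: %v", err)
	}
}
```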
D
I mean, I think I can get them to fail locally. I'm not sure if it's the same reason, but I will continue looking into the eviction test cases, especially on the CRI-O side, to see if I can find why they're failing. But yeah, and then the other part is also the dedicated image filesystem.
A
So you need a machine with an extra disk specifically for the image filesystem, right?
D
I'll go ahead and ask that. And I guess the question I have, maybe you all can clear something up for me: do I add it for AWS or GCP?
A
Both work, I think. You will have more experience with GCP, but the...
A
I think, I don't know how many tests you want to run on this specific environment with the extra disk. It will probably be just one test, right? Like one area that you want to test.
A
It may be easier to start with a separate job specifically for this extra disk. It will only run one test, and then we will decide how fast it is. Eviction tests are generally slow, and I'm not sure what the extra cost of running with this disk will be.
D
Fair enough, yeah. I'll think of some test cases for that, and then I'll try to keep getting back to investigating the eviction ones. Yeah.
D
Oh, and I finally got the pre-submits running, so I can actually test the CRI-O jobs on PRs. That'll be easier for the eviction tests.
D
No, it's just that they were all periodic jobs. Only a few of the CRI-O periodic jobs could actually be triggered from a PR, so changes would have to be merged and then go through the cron job to test them, which is not great for testing.
A
I thought that all the eviction tests are mostly equivalent logic. What I'm wondering is: is there anything specific to the container runtime, or is it just easier for you because you have a local environment set up?
D
Yeah, I have CRI-O set up locally; that's the one I'm testing with. The main difference is stats, which is why one of them is failing on containerd and not failing on CRI-O. I think David Porter pointed out there's some issue with PIDs around stats; that's the last issue I saw around that. And that's failing for the CRI stats provider, but I don't think it fails for anything that's using the cAdvisor stats provider, which is CRI-O.
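For reference, the stats-provider split mentioned here is controlled by the PodAndContainerStatsFromCRI feature gate: when it's enabled the kubelet asks the runtime for pod and container stats, and when it's disabled it uses cAdvisor. A minimal sketch of toggling it via the kubelet configuration type; the surrounding program is illustrative only:

```go
// Rough sketch of the configuration knob behind the stats difference
// discussed above. The feature-gate name is real; everything else here
// is just scaffolding to show where it lives.
package main

import (
	"fmt"

	kubeletv1beta1 "k8s.io/kubelet/config/v1beta1"
)

func main() {
	cfg := kubeletv1beta1.KubeletConfiguration{
		FeatureGates: map[string]bool{
			// When true, stats summaries come from the CRI runtime
			// rather than cAdvisor, which is where runtime-specific
			// discrepancies (for example around PID stats) can show up.
			"PodAndContainerStatsFromCRI": true,
		},
	}
	fmt.Println(cfg.FeatureGates)
}
```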
D
So there are slight differences between those. And yeah, the other thing on the agenda is actually, this is something that Ryan fixed yesterday, but I...
D
We were noticing that with the Prow jobs there was some issue. I don't fully understand all the stuff in Prow, but there are two required pre-submit jobs from SIG Node, I think. Maybe one was an e2e GCE job, which I don't know who owns, and the other was, I think, a SIG Node end-to-end test. They are using the deprecated bootstrap.py job, and those started failing. I think we probably should consider trying to migrate those jobs. I'm happy to take that on for the required ones, because I've been poking around a lot in the test-infra scripts lately, so I'll just, yeah.
A
Moving to decorated Prow jobs is definitely preferred. You can ping me if needed for quick approval, yeah.
A
And for the previous item: I remember that when we configure images to run on, I think we can specify the machine type and some other characteristics, but we cannot specify the boot disk size. If you will be looking at adding an extra disk, maybe you can also look into configuring the boot disk size, because I believe that was an issue for the eviction tests: the default boot disk size increased and we failed to fill it up fast enough.
A
Anyway, yeah, thank you for looking at that. In general we don't have too many edge cases tested in the kubelet, so more eviction tests and other stress environments will be better for reliability.
B
Okay, then I guess we can move to the Testgrid triage.
B
I'll check it later, then. Confirming the release jobs: they are looking good.
B
I see the one from anthropology... do you think it was, yeah.
B
Yeah, then let's wait for the PR, and hopefully this is... So your PR was addressing the test, right, not the job?
B
This one looks fine, I think. I think all the tests that we have failing already have an issue tracking them, so there's no need to create any new ones. With that in mind, I think we can finish.
B
If you want anything else off my screen, you...
A
The test board, I think it's part of...
B
This hasn't been moving for a while. Okay, yeah, I think we reported this one last week, but nobody has had time to look into it.
B
Erasing important helpers, etc. areas.
B
Yeah, we have a couple of known flakes, but this doesn't look like it should be in this test here. I'm going to move it.
E
Termination grace period zero, force deleted: when a pod with spec terminationGracePeriodSeconds of 0 is deleted without force, it is force deleted from the API server without the kubelet killing its containers and unmounting its volumes first.
E
Can you elaborate, like...
B
I wasn't... what I was talking about, yeah, I think this is intended behavior. Basically, when you delete a pod, you have a termination grace period, yeah, and it specifies how long it waits between sending a SIGTERM and then a SIGKILL to delete the pod. Once it sends the SIGKILL, the pod is deleted completely. So if...
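To make the two deletion paths concrete, here is a minimal client-go sketch; the namespace and pod name are placeholders. A normal delete honors the pod's grace period (SIGTERM, wait, then SIGKILL), while GracePeriodSeconds of 0 removes the API object immediately:

```go
// Minimal client-go sketch of a force delete: with GracePeriodSeconds: 0
// the API object goes away without waiting for the kubelet to kill the
// containers and unmount the volumes first, which is the behavior the
// issue above describes.
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	zero := int64(0)
	// Force delete "example-pod" in "default" (placeholder names).
	err = cs.CoreV1().Pods("default").Delete(context.TODO(), "example-pod",
		metav1.DeleteOptions{GracePeriodSeconds: &zero})
	if err != nil {
		log.Fatal(err)
	}
}
```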
E
Right, right. So yeah, because the grace period is set to zero here, it should be force deleted immediately, correct? Right.
E
This can be dangerous for StatefulSets. They should guarantee that only a single replica of a pod runs, but KCM will start a new replica while the old replica is still running.
E
Okay, I couldn't follow what they are trying to say about StatefulSets.
A
If a pod is terminated with grace period zero and the container is still running on the node, like it still has the volume attached, then when you create a new pod it may fail to attach the same volume, or, I mean, it may do something unexpected, because two replicas will be running at the same time.
A
Remove it from the project altogether: if you click on... yeah, or go back to the project board and then remove it.
A
Yeah, just click the three dots and remove it from the project. Thank you.
E
Garbage collection for container images is unpredictable and inconsistent. The garbage collector for the container image lifecycle does not seem to adhere to the documentation and the provided parameters. Per the docs, the configured high threshold percent value triggers garbage collection, which deletes images in order based on the last time they were used, starting with the oldest first; the kubelet deletes images until disk usage reaches the low threshold percent value. From the test, this does not appear to be the case.
E
Often the garbage collection will delete way more images than required, often dropping below the low threshold percentage and deleting three or four at a time. Either create dummy images or identify some appropriate image, deploy a pod, and then delete it; the image will become unused. After a while, disk usage will reach the high threshold percent value and GC will be triggered. The GC should then delete the earliest unused image. This does not happen: priority is sometimes given to other images, and more than one is deleted in a random order.
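For comparison, here is an illustrative sketch (not the kubelet's actual code) of the documented policy the reporter expects: once usage crosses the high threshold, delete least-recently-used images until usage falls below the low threshold:

```go
// Illustrative model of the documented image GC policy being discussed.
// The reporter claims the observed order and count don't match this.
package main

import (
	"fmt"
	"sort"
	"time"
)

type image struct {
	name     string
	sizeByte int64
	lastUsed time.Time
}

// imagesToFree returns the images the documented policy would remove.
func imagesToFree(images []image, used, capacity, highPct, lowPct int64) []image {
	if used*100 < capacity*highPct {
		return nil // below the high threshold: GC is not triggered
	}
	target := capacity * lowPct / 100
	// Oldest "last used" first, per the docs.
	sort.Slice(images, func(i, j int) bool {
		return images[i].lastUsed.Before(images[j].lastUsed)
	})
	var freed []image
	for _, img := range images {
		if used <= target {
			break
		}
		freed = append(freed, img)
		used -= img.sizeByte
	}
	return freed
}

func main() {
	now := time.Now()
	imgs := []image{
		{"old", 2 << 30, now.Add(-48 * time.Hour)},
		{"mid", 1 << 30, now.Add(-24 * time.Hour)},
		{"new", 3 << 30, now.Add(-1 * time.Hour)},
	}
	// 10 GiB disk, 9 GiB used, high=85%, low=80%: only "old" should go.
	for _, img := range imagesToFree(imgs, 9<<30, 10<<30, 85, 80) {
		fmt.Println("delete:", img.name)
	}
}
```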
So
they
are
saying
the
configured
high
threshold
percentage
value
triggers
garbage
collection,
okay,
yeah,
that's
what
we
are
expecting,
which
relates
images
in
order
of
the
last
time
they
were
used
but
I
think
that's
not
happening.
E
So I think this is, like it says here, related to whatever CRI plugin they are using. But then the author is saying that even though this is not a bug in Kubernetes, it can be done in a better way, without dumping this responsibility on the CRI plugin.
D
I mean, it does look like we do something related to that now. Sorry, I mean, I think we are sorting by last used in the list, but at least that's the list that we are...
E
I think if we can try to reproduce this, that would be the right step to take, I think. So again...
E
And then sort of see if it's required to change the documentation.
A
So didn't they comment after that, right?
A
Yeah, just ask the creator of the bug, with what was suggested. Is the...
A
Yeah, but yeah, it will apply a label.
D
I did look into this one, actually. I asked them to reproduce on later versions, and they said they were able to. I have a repro case there for the local-up-cluster, but I don't know if this is really a bug. I think I am unclear.
E
Kubelet doesn't respect resolv.conf when resolv.conf is empty or full of comments. When passing the resolv-conf flag, it did not respect the option and still copied /etc/resolv.conf to the pod. What makes it special is that the resolv.conf provided by myself is empty. After skimming the code, I found the logic is problematic: because the resolv.conf content is empty, an empty array and no error are returned, as if the DNS policy were Default.
E
The pod's DNS config turns out empty, because containerd will get an empty DNS config, so it will copy resolv.conf from the host to the container. The expectation is that the kubelet should refuse to apply the wrong, empty resolv.conf when creating a pod, and complain with errors. So I think they want, like, they are suggesting we have some validation for this parameter.
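A sketch of the kind of validation being suggested, as a hypothetical helper (not the kubelet's actual parser): treat a resolv.conf with no usable entries as an error rather than as an empty but valid DNS config:

```go
// Hypothetical validation helper: when the file the kubelet was pointed
// at yields no usable DNS configuration, surface an error instead of
// silently returning an empty config (which the runtime then "fixes" by
// copying the host's /etc/resolv.conf).
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func parseResolvConf(path string) ([]string, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	var nameservers []string
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || strings.HasPrefix(line, ";") {
			continue // skip blanks and comments
		}
		fields := strings.Fields(line)
		if len(fields) >= 2 && fields[0] == "nameserver" {
			nameservers = append(nameservers, fields[1])
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	// The suggested behavior: an empty or comment-only file is an error,
	// not an empty-but-valid DNS config.
	if len(nameservers) == 0 {
		return nil, fmt.Errorf("%s contains no nameserver entries", path)
	}
	return nameservers, nil
}

func main() {
	ns, err := parseResolvConf("/etc/resolv.conf")
	fmt.Println(ns, err)
}
```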
A
They say that if you pass an empty resolv.conf, the kubelet will disregard it and use its default one, and this causes some problems. I'm not sure what kind of problems it causes for them, but, like Antonio asked, why would you even consider passing an empty resolv.conf file? And the scenario here is that somebody passes a resolv.conf file that is empty originally, but then it will be written by...
A
So they want to bootstrap it empty, but then have it updated later.
A
It's a race condition between somebody passing an empty file and filling it up later, so it's kind of a race in how configuration is applied. Instead of writing the config file first and then starting the kubelet, somebody starts the kubelet and then wants to populate this file, because it will be added later.
A
I don't think from SIG Node there will be any action. I would suggest removing it from SIG Node and saying, yeah, it's for SIG Network to decide how to migrate customers to this.
E
The memory manager: unexpected admission error. A dual-socket server with threads, CPU manager static policy, topology manager policy best-effort, 10 gigs of RAM. If I try to allocate two guaranteed pods, the first one is admitted and the second one fails with an unexpected admission error, even if it would fit using memory of both NUMA nodes.
E
They have some memory reserved for NUMA 0 and launch two identical pods with memory limits really close to the max, so one pod fits on NUMA 1, but the second one doesn't fit on NUMA 0.
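To make the reproducer shape concrete, a sketch of the two identical Guaranteed-QoS pods (requests equal to limits); the names, image, and sizes are illustrative, with the memory limit chosen near a single NUMA node's capacity:

```go
// Sketch of the reproducer shape: two identical Guaranteed-QoS pods whose
// memory request/limit is close to one NUMA node's allocatable memory.
// With the memory manager's Static policy, each pod must be pinned to a
// single NUMA node, so the second pod can fail admission.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func guaranteedPod(name string) *corev1.Pod {
	res := corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("2"),
		corev1.ResourceMemory: resource.MustParse("9Gi"), // near one NUMA node
	}
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "registry.k8s.io/pause:3.9",
				// requests == limits => Guaranteed QoS, so the memory
				// manager must find a NUMA node with 9Gi free.
				Resources: corev1.ResourceRequirements{Requests: res, Limits: res},
			}},
		},
	}
}

func main() {
	for _, p := range []*corev1.Pod{guaranteedPod("mm-pod-1"), guaranteedPod("mm-pod-2")} {
		mem := p.Spec.Containers[0].Resources.Limits[corev1.ResourceMemory]
		fmt.Println(p.Name, mem.String())
	}
}
```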
F
Hey, I'm monitoring this one, and this other person is a teammate of mine, so you can even assign it to me and we're going to complete the triage. We're discussing whether the behavior is, I mean, consistent with the KEP and the expected behavior or not, and we're still trying to figure out if it's an actual bug or not.
E
Okay, could you please summarize what the issue is?
F
But the behavior is not that, and they were expecting the pod to be pending, but this is not actually possible. So the only possible behavior is that the pod goes up and consumes memory from both NUMA zones. But this is actually our memory-manager-specific behavior, so we need to deep dive into whether the behavior is legal or not. Okay.
A
I think Tyler is trying to say it is a feature, so yeah.
E
Next one: node status error handling in the kubelet. The node's Ready condition is true, but node status addresses is empty. This indicates an issue populating the addresses field.
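A quick diagnostic sketch of the reported inconsistency using client-go: list nodes whose Ready condition is True but whose status.addresses is empty (the kubeconfig path is the usual default):

```go
// Diagnostic loop for the reported inconsistency: a Ready node should
// normally have at least one entry in status.addresses.
package main

import (
	"context"
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	cs, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}
	nodes, err := cs.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, n := range nodes.Items {
		ready := false
		for _, c := range n.Status.Conditions {
			if c.Type == corev1.NodeReady && c.Status == corev1.ConditionTrue {
				ready = true
			}
		}
		if ready && len(n.Status.Addresses) == 0 {
			fmt.Println("ready node with empty addresses:", n.Name)
		}
	}
}
```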
A
Yeah, I think it's a little bit involved with node addresses, but mostly it's a SIG Network thing.
A
Because of the external provider and how we handle pod IPs: let's keep the note here, but let's not triage it. Let's wait for SIG Network.
A
Something similar, for the same reason as before, so.
A
Unless somebody on this call wants to take a look? No? Okay, yeah. There is a person who is looking into in-place pod updates; I will try to find them and attach the issue. Okay.
D
I looked at this one a little bit. I think what they're trying to say is, they probably are suggesting it's not a bug, it's a feature. I think they want CRI to tell you whether or not the images are compressed, and they already posted an issue on the cri-tools repo, and I guess somebody suggested posting it here. Because at least that's what I think is going on in this issue.
E
Because there's one, I think, that's uncompressed.
E
Sure. Do you want to add the details here to the documentation as well?
E
Yeah, but I mean the behavior of the current API is to show the compressed size of each image. Do we want to document that behavior?
A
Yeah, we need to document what we currently have, and then decide whether we want to change it.
A
Yeah, I think, unless somebody pushes back.