From YouTube: Kubernetes SIG Node 20220427
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A: Hello, it's April 27, 2022, and this is the SIG Node CI subgroup meeting. Welcome, everybody.
B: And it would be nice for us to get people looking at them, so I already went and opened PRs for the Fedora swap and Ubuntu swap jobs.
A: They're hard to troubleshoot, but there are different failures. I think CRI-O has this eviction problem, and containerd has some too, so there are two different evictions. I don't remember; my memory doesn't serve me well right now. And the flaky one, I'm not sure; I never looked at it. Peter, are you on the call? Do you think you can start looking into that?
D: Hello, I'm here. Yeah, I can try to take a look. This week and next week are kind of swamped, trying to cut stuff for CRI-O 1.24, but definitely after that, and hopefully before that I'll also poke my team and see if we can get any assistance as well.
D: And thanks a bunch to Danielle for getting started and taking a look at some of the CRI-O jobs. I really appreciate it.
A: Okay, so I think, yeah, this test, I looked at it; it's just a broken image. I have a PR for that, but we are in a code freeze. This other one I haven't looked at; I don't remember.
A: Yeah, so I looked at the soak test, and I have a PR for that. The NPD verification doesn't work when the test has been running for more than 24 hours, because it looks for events that were supposedly kicked off at the beginning of the test, and if the beginning of the test was more than 24 hours ago, it will not find them. So I have work in progress for that. But then the soak test also has a different problem: over time it accumulates so much of something that the disk gets overloaded.
A: So we start getting these problems, and it's basically because it is a soak test: it soaks the cluster for a very long time. Maybe there's some configuration we can adjust. I really don't want to turn it off, because we don't have any other soak tests, but we need to get it back to life. The NPD part is figured out, though; I have a work-in-progress PR for that. Oh, sweet, yeah.
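The 24-hour window problem described above can be sketched as a quick shell check. This is illustrative only, not the actual NPD verification code; the 30-hour test age is a made-up example:

```shell
# Illustrative sketch of why the NPD verification fails on long soak runs:
# the check only looks back 24 hours, so events emitted at test start fall
# outside the window once the test has run longer than that.
now=$(date -u +%s)
test_start=$((now - 30 * 3600))     # suppose the soak test started 30h ago
lookback=$((24 * 3600))             # the verification's 24h event window
if [ $((now - test_start)) -gt "$lookback" ]; then
  echo "startup events are outside the lookback window"
fi
```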
A: Yeah, I remember, what was the name, Queen? The person who joined last week said that he will take a look at this flaky test, but it will take him time to do that. So let's see if there will be progress.
A: Yeah, eviction. I think we need to find an owner for eviction. We're looking internally in our team and in GKE Node, but if anybody else has cycles now, let's try to take a look. I think this is the longest-failing test, and it's an unknown issue, an unknown problem.
B: I mean, the problem with the eviction tests is mostly known; what's unknown is how to fix it.
B: The eviction tests have the problem of interacting with each other and with anything else happening on the host. So if we actually want to make them reliable, we need to do something that's going to mostly involve potentially rewriting a lot of them. For the disk ones, I've been thinking about trying to switch to using a RAM disk or a disk image or something, rather than the host disk.
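One way to realize the disk-image idea is a file-backed loopback filesystem the test can fill without touching the host disk. A rough sketch, with illustrative paths and sizes that are not from the test suite (the formatting and mount steps, commented out, need root):

```shell
# Create a small file-backed image the eviction test could fill instead of
# the host disk; formatting and loop-mounting it would follow, as root:
dd if=/dev/zero of=/tmp/evict-scratch.img bs=1M count=512 status=none
#   mkfs.ext4 -q /tmp/evict-scratch.img
#   sudo mount -o loop /tmp/evict-scratch.img /mnt/evict-scratch
echo "image size: $(stat -c %s /tmp/evict-scratch.img) bytes"
```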
A: Yeah, and I remember now: David Porter mentioned that some disk problems may be caused by a long, slow filling up of a disk, and David is on the call, so maybe you can comment.
E: Yeah, I mean, I had a little bit of time. I unfortunately got sidetracked by something else, but I did a little bit of debugging of the eviction tests, and what I found, at least, was that the failing one, the local storage eviction one, is failing because it's trying to fill up the whole disk, and the disk is pretty big; I think it's like 30 gigs or something like that.
E: 40 gigs, actually, but it was filling up very, very slowly, like 10 megabytes a second or something like that, so it was maybe just timing out before it even got a chance to fill the disk. That was my investigation; I added my comments on the bug. So maybe we just need to speed it up. I mean, maybe there are other problems there too, no question, but at least one problem was that it seemed to be filling up the disk very, very slowly.
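A quick back-of-the-envelope check of those numbers, assuming the roughly 40 GB disk and 10 MB/s fill rate mentioned above:

```shell
# At ~10 MB/s, filling a ~40 GB (40960 MB) disk takes over an hour,
# which plausibly exceeds the test timeout.
disk_mb=$((40 * 1024))
rate_mb_s=10
echo "$((disk_mb / rate_mb_s / 60)) minutes to fill the disk"   # → 68 minutes
```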
A
Okay
yeah,
if
you'll
comment
on
the
bucket
will
be
helpful,
and
maybe
you
can
allocate
smaller
disk
machines
for
that.
A: Perfect, thank you, Daniel and David. Yeah, I mean, even if you find a quick fix for that and it doesn't fail for a while, it's a good first step. We really need to get into a green state, because our next step, as Danielle pointed out, is to improve coverage, right?
A: So it's really hard to start improving coverage if we cannot get to green for so long.
A: Danielle, are you there? Do you want to talk about this thread that you started on reliability and maintainability? Are there any action items you want to start immediately?
B: But yeah, we've had a lot of fairly scary escaped bugs in the last few releases that it would be nice to not keep repeating, because right now it's really hard to land any code in the kubelet without breaking something else. And the CI subgroup has been a great start to making that less likely to happen.
B
If
tests
fail,
you
can
now
generally
assume
it's
because
the
test
failed
for
like
legitimate
reasons,
but
now
we
need
to
sort
of
like
move
forward
and
increase
the
coverage
and
also
in
some
cases
you
know,
write
code
to
actually
be
testable
and
stuff.
F: Hey, Francesco here. I have a couple of open-ended questions. I'm not sure this is the right place and time, but let me just mention them, and we can probably elaborate offline; I would be happy to. First of all: yes, absolutely, I want to help. In the last couple of months I unfortunately did not have enough time, but things should be better now, so yeah, sign me up. The thing is: isn't adding tests, and adding the necessary refactoring to make the code testable, ultimately the same reviewer-bandwidth issue as new features?
B: Yeah, and that's part of why I brought this up in the general SIG meeting yesterday. I want us to treat this with the same importance that we would treat, you know, KEPs, and actually, as we're considering which KEPs we're going to accept for 1.25, to build in some bandwidth for reviewing and improving maintenance PRs, because they are really hard to land today. A PR adding tests to the container manager sat open since February without review until yesterday, and I'm hoping we can get this actually treated as a priority, especially now that Derek is back.
F: Okay, thanks. I will just say that I would actually go as far as requiring some bandwidth for that; I mean statically allocating some bandwidth for these PRs and making that an actual goal. This is probably what you said already, but I'm really reinforcing it for the next couple of cycles, because I agree we are in a bad state and we should improve.
B: And also to try to motivate people, you know, encourage people to write tests before merging code. The amount of changes to behavior that we land that don't break or change any tests is, quite frankly, terrifying.
A: Yeah, I agree on all counts. I think you did a good job getting us to where we are now. In the past we always stumbled into this trap: you want to increase coverage and do something, but everything is red, so how do you even operate when so many things need to be improved? And even today, Francesco, I don't know about your bandwidth, and you said you have problems with bandwidth; I hope you still have time to debug this device plugin test.
A: It was just disabled, and then we said it's hard, and I don't want to pin it on you, but without this test coverage we have very little visibility into the quality of that code.
A: Yeah, and I don't want to say it's on you specifically; I just want to demonstrate this problem, where bandwidth really is an issue. I think we're getting into better shape bandwidth-wise, so we have more people with time to investigate things and keep things healthy.
A: Danielle, did you think of any ways to enumerate the areas where we need better test coverage?
B: If this is a discussion where people have things they are interested in and care about, I'm hoping that actually motivates something to change. But there's a lot of stuff where, even if we have happy-path testing, we don't test what happens when, say, there are gRPC issues talking to a CRI. I've had customer production bugs escalated to me where it's basically: some weirdness happened, and we couldn't reproduce it.
B: So I can't say what weirdness happened talking to a CRI at, like, the wrong time in the kubelet's loop, breaking a bunch of container state. I could never figure out exactly what happened; I couldn't reproduce it. Somewhere in the pod loop the CRI returned garbage and the kubelet just broke. So a lot of failure-case testing, where we currently have only happy-path testing, and also potentially different types of failure testing, would be nice, given that we don't really have any today; the more important part is adding coverage where we have none.
B: There are also a lot of cases today where it's hard to tell what is broken versus what is expected behavior, because we don't have tests defining what the expected behavior is, and there's no specification for the kubelet aside from, like, the conformance tests, and they are nowhere near complete enough for that.
A: Yeah, I can definitely count a few examples of those as well. For example, graceful termination: we are still kind of deciding whether the readiness probe is supposed to run during graceful termination or not. We see bugs filed for both situations, for both cases; it's quite annoying, and we need to decide what we want this to be in Kubernetes. Yeah, that's another example.
A: Anything else on this topic? I think the next steps will be, Danielle, as you said: we promise to start the mail thread and get more activity around that, so we will probably discuss it a lot during this meeting and our main SIG meeting. And Francesco, thank you for bringing up the reviewer bandwidth problem; we definitely need to increase that and improve things like that.
F: Just one quick note, because the resource management area was mentioned, which I agree, by the way, could use some more testing. We may want to make sure we involve Kevin Klues from NVIDIA, who is very much an expert in this area, and I think he would be happy to help and assist us. So just make sure he's in, because Derek was mentioning the area and I wanted to mention it, so make sure he's involved.
A: And especially with the couple of KEPs we keep discussing around device allocation and the device plugin model.
A: Okay, thank you. I think we can quickly go through the dashboard.
A: I think we listed all the test failures in Jim's message, but did you notice anything new, anything that was missing?
B: Yeah, sorry, I am not having a good day. Yeah, I will; feel free to cc me on anything that needs review.
A: Yeah, looking at this, it feels like there are so many things that don't relate to this group, so I hope the triage will be super fast.
A: Oh, it's a product change during code freeze, so I think we can mark it as done, because once the branch is open, it will be merged automatically.
A: So this one, we don't need anymore.
A: Okay, [crosstalk] okay, I see the cAdvisor end-to-end fails.
A: Oh, that's weird: did you do anything? No? All right, so he fixed it. He's a new chad manager.
A: So this one is waiting on author; yeah, it's me experimenting.
A: And this one needs a review, because I think we're removing some code there, like skipper code that skips execution.
A: And we're done with this review, so I think we're done with the test part of the meeting. We will go to bug triage right now; if you are not here for bug triage, feel free to drop off. For bugs we only have 10, actually 9, because it also counts the pinned one.
A: What logs would help here? Kubelet logs wouldn't have this information, right?
E
Yeah,
I
don't
think
so,
maybe
like
something
like
a,
maybe
if
they
ran
see
advisor
manually
and
then
compared
it
to
what
like
the
df.
You
know
I
just
on
the
linux.
If
you
do
like
df-h
and
show
all
the
file
systems
and
stuff
and
disks
mounted
try
to
see
why
supervisor
does
not
detect
the
disc
that
linux
is
exactly.
E
See
what's
wrong,
I
mean
yeah.
I
think
it
should
if
they
bump
hublot,
they
think
they
should
see
it
because
it'll
maps
the
same
verbosity
so.
E
So
maybe
maybe
a
couple
things
we
can
suggest
one
is
to
bump
up
the
the
kubelet
verbosu,
maybe
like
b4
or
something
like
that
b6
and
then
add
their
logs.
Second
suggestion
might
be
to
run
c
advisor
manually
and
compare
the
output,
as
reported
from
c
advisor
to
like
df-h
or
something
like
that,
which
lists
all
of
the
file
systems.
E
Maybe
they
have
some
weird,
I
know
see
if
there's
some
weird
stuff,
where,
like
zfs
and
other
weird
file
systems
that
are
like
standard,
ext4
and
stuff.
So
maybe
something
like
that.
E
Look
at
the
f
h,
for
example,
seem
like
that.
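The triage steps suggested above might look roughly like this as commands (the 'df -h' flag is standard coreutils; '--v' is the kubelet's klog verbosity flag; the exact kubelet invocation is illustrative):

```shell
# 1. List the filesystems and mounts the kernel sees, to compare against
#    what cAdvisor reports for the same node:
df -h

# 2. Bump kubelet log verbosity and re-check its logs; shown only as the
#    flag one would add to the kubelet invocation:
#      kubelet --v=6 ...
```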
A: Yeah, a very badly formatted log message; I'm trying to understand where the failure is.
A: Can you remind me, rphillips?
A: Yeah, I think this... yeah, you already replied to this.
A: Are you looking for somebody to assign it to, or just to keep it in the backlog?
A: Thank you; west coast, less code is better. Okay, then I think we've done all the bugs. Any other topics for today? Does anybody have anything to discuss?