From YouTube: Kubernetes SIG Node 20210728
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A: Hello, it's July 28, 2021, the SIG Node CI subgroup meeting. Hello, everybody. We have a small agenda, and the very top agenda item is our test stability for the release. We have had some test failures recently, and Mike, I just added another agenda item. Mike, do you want to start, so we get it out of the way?
B: Sorry for this. I finished merging this PR. In particular, I was wondering whether it's okay to backport it to 1.22. I understand that the deadline was yesterday, but I'm not sure if that's still possible, or if it's not really critical for this version.
C: I'm not muted, so: backports are anything that's release-critical or release-blocking. Let me see what the PR is; I haven't looked at it.
C: It is important that we eventually backport this, because we don't want things that are potentially failing to fail 1.22 conformance.
A: Okay, let me share my screen and open the link.
C: Yes, we can see your screen. I just pulled in the PR that I opened for the dynamic kubelet config metrics to 1.22, so that's new.
A: We may not need it. My only thought is that it may have happened because master was already bumped to 1.23, and that's why it started failing in master; it may not fail on the 1.22 branch.
A: Okay, well, I mean it's already approved and it's a small change, so likely we can just take it and see what's going on. But it would be interesting to confirm what the behavior is supposed to be for hidden metrics, especially since we cannot easily unhide them and test that end to end.
C: Yeah, well, so I flagged this to Han earlier today, because I think this is honestly the first time that somebody has tried to use this for a feature deprecation, as opposed to just removing metrics and replacing them with something else. I think the deprecation process was designed to work for stable metrics; I haven't seen somebody try to deprecate an alpha metric quite like this before.
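(For reference: this is roughly how deprecation is expressed in the component-base metrics framework that kubelet metrics use. A minimal sketch with an illustrative metric name, not the exact metric from the PR under discussion; a metric deprecated in one release is hidden by default in the next, and can be re-exposed with the --show-hidden-metrics-for-version flag.)

```go
// Minimal sketch (illustrative metric name, not the actual one discussed) of
// how a deprecated alpha metric is declared with k8s.io/component-base/metrics.
package kubeletmetrics

import (
	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

var exampleDynamicConfigMetric = metrics.NewGauge(
	&metrics.GaugeOpts{
		Subsystem:      "kubelet",
		Name:           "example_dynamic_config_state",
		Help:           "Illustrative metric tied to a feature being deprecated.",
		StabilityLevel: metrics.ALPHA,
		// Deprecated as of 1.22: the framework logs a deprecation notice in
		// 1.22 and hides the metric by default once the component is 1.23+,
		// unless --show-hidden-metrics-for-version=1.22 is set.
		DeprecatedVersion: "1.22.0",
	},
)

func init() {
	legacyregistry.MustRegister(exampleDynamicConfigMetric)
}
```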
C: Basically, I am not entirely sure. We are working in SIG Instrumentation on what conventions for this sort of thing should look like, because we haven't really gone through this much before. So it's possible that the release version has already been bumped to 1.23 in master and that's why some of these tests are failing, but that doesn't explain why we've been unable to debug all of the other test failures that are causing problems.
C: For those on the call who are not sure which test failures I'm talking about: there are two issues here marked release-blocker, and one of them is a SIG Storage test, so I've not been focusing on that one. The other one is the kubelet serial failures, and we need to get that fixed. We're not entirely sure why those are failing, but we have a number of tests that started failing, particularly these dynamic kubelet config reliant tests, that started failing during test freeze.
C: So we need to figure out what's going on with that before we can release, because the signal is not great right now, and because the job is broken overall, it makes it even harder: we can't just say, okay, it's green, great, let's go. So possibly what we should do is mark the two tests that are known failures as flaky; then they'll get skipped, and then we can possibly get the job green. But I have not previously suggested that.
C: Well, what you can do is: if I add a flaky label to them now, or something like that, then later you can take that off. Yeah.
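(For context on what adding a flaky label means mechanically: in the Kubernetes e2e suites the tag is just part of the Ginkgo test description, and CI jobs filter on it, for example with a --ginkgo.skip=\[Flaky\] or a SKIP="\[Flaky\]" setting for node e2e. A minimal sketch with an invented test name, not one of the actual failing tests:)

```go
// Minimal sketch: tagging a node e2e test as [Flaky] so the serial job skips
// it until the failure is understood. The suite and test names are invented.
package e2enode

import "github.com/onsi/ginkgo"

var _ = ginkgo.Describe("[sig-node] Example kubelet behavior [Serial]", func() {
	// Appending "[Flaky]" to the description is the whole mechanism; jobs that
	// run with --ginkgo.skip=\[Flaky\] will no longer execute this test, and
	// removing the tag later re-enables it, as discussed above.
	ginkgo.It("should reconfigure the kubelet without disruption [Flaky]", func() {
		// test body elided
	})
})
```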
C: Okay, yeah, so that sounds good. I can go ahead and do that, and maybe that will help. I know Dims has not been super confident because those jobs are failing, and so when, for example, we did the runc bump, we couldn't get any green signal saying yes, this is good to go, let's ship it. So I don't know if we'll be able to get kubelet serial to green before the end of the week, but I can certainly try.
C: Sergey, do you know if there's any way to get more than two weeks of history on Testgrid, or is that the limit? Because a month would be good at this point. The problem is that test freeze is so long that I can't compare back to July 7th or July 8th anymore.
A: I mean, we have this PR from Matthias to increase the timeouts, right? So maybe we can just...
C: Yeah, but at this point we're not fixing flakes anymore; we're just trying to fix things that are going to break the release. Test flakes are annoying, but you know, they're not release-breaking.
C: And Peter, do you know what's going on with the two, or no, not two, just one: the one CRI-O test failure there?
D: I do actually have a suspicion, and I think it's a change of a default in CRI-O, but I want to check. I think it is also partly a problem with the way the test is written, so I'm trying to see; there might be a quick fix that I'll propose, but it shouldn't have to be a release blocker. Okay.
A: I saw the comment that it may be related to the Fedora version, something like that.
D: Actually, the suspicion is this: in CRI-O we've been trying to drop the pause container by default, and I think the stats summary test is expecting the pause container to exist, or cAdvisor is, so it's failing to get some stats, and I think that's muddling the test. So I need to dig in to see who's at fault there.
D: Yeah, I mean, I also have a PR up to stop that, if people want to look at that now.
C: Our list of 1.22 stuff should now be a decent bit shorter; only four things left.
A: Yeah, I wonder, for this bump of the metric deprecation to 1.23: we don't run serial on 1.22...
C: What do you mean? We have no way to validate that it's working, I mean.
A: We don't run serial tests on 1.22 periodically, right? For 1.22 it's just the features jobs.
C: We do generally want to get things in earlier versions to be more stable now, right? So we've spent a bunch of time trying to unbreak the serial tests. Now is probably not the time to say, by the way, we want to add all this stuff to 1.22, but certainly going forward we can look at adding some of this stuff and adding periodics as soon as we cut 1.23 or something. I think we're making progress on stability here.
C: It's just a matter of getting it done. I am worried, though. Fundamentally, right now the problem is that we had probably a dozen things all land the day of code freeze, and as a result we had a bunch of regressions all starting at code freeze, and it's impossible to tell which was what, so it's challenging to go through. For example, looking at the kubelet serial failures, the new failures, I thought...
C: Sure, there are a few things showing up that I think were legitimate. For example, the thing that we're really trying to track down right now is that the reason a bunch of the tests are failing is we have this one test that fills up the disk, and it was failing, which meant it was not cleaning up after itself. So there were all these files left on the disk, the disk was out of space, and then all the rest of the tests would fail.
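(A minimal sketch of the kind of defensive cleanup that prevents this failure mode, one disk-filling test poisoning everything after it: do the cleanup in an AfterEach so it runs even when the test body fails. The directory name and structure are invented for illustration, not taken from the actual test:)

```go
// Illustrative only: cleanup placed in AfterEach runs whether the test passed
// or failed, so leftover files can't starve the rest of a serial run.
package e2enode

import (
	"os"

	"github.com/onsi/ginkgo"
	"github.com/onsi/gomega"
)

var _ = ginkgo.Describe("[sig-node] Disk-pressure example [Serial]", func() {
	scratchDir := "/tmp/e2e-disk-fill-scratch" // hypothetical scratch location

	ginkgo.AfterEach(func() {
		// Reclaim the space even if the It below failed part-way through.
		gomega.Expect(os.RemoveAll(scratchDir)).To(gomega.Succeed())
	})

	ginkgo.It("should evict pods under disk pressure", func() {
		gomega.Expect(os.MkdirAll(scratchDir, 0o755)).To(gomega.Succeed())
		// ... fill scratchDir and assert the eviction behavior here; if the
		// assertion fails, AfterEach above still removes the files.
	})
})
```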
C: That failure indicated a real problem with static pod storage cleanup. But yeah, with all this other stuff potentially blocking, it's like: oh, why is this failing, and why is that failing? Everything is failing.
A: Yeah, but look at this list: if you can confirm that this thing was caused by the version bump in master. If the version bump in master happened on 7/20, then it's probably going to coincide with the start of the failures, right? So if it happened on that date, then we can tell whether this is the cause or not, and whether we need this fix in 1.22.
A: And then, what will be left? We don't know right now, because we cannot run all the other tests, so we need another fix to mark the test as flaky, right? I mean, mark the tests as flaky, and then we can potentially get a green run.
A: Yeah, my problem is that we're trying to make the serial tests release-blocking, but we never run them periodically for 1.22, plus we never did it before. So far we have only discovered test failures and test inconsistencies. Yeah.
C: So, I mean, for additional context: I think the worry is not merely that, you know... We do want to treat these things seriously, and because we had such a big refactor land in 1.22, I think we want to be really sure that this is not going to cause problems. And I know, at least in the initial testing that we've been doing at Red Hat...
C: We've been having issues with the performance of clusters going from 1.21 to 1.22, particularly with I/O performance. These are very initial results, but concerning enough for me to say that I really want to get these tests green, and I want to try to get an idea of what's going on, so that we don't end up releasing with big blockers like that, or big performance regressions.
A: Yeah, I don't know. I understand the potential critical bugs we could discover, but at this stage I don't know how much we should mark as release-blocking; we never did before.
F: We've also found that there have been multiple regressions that were critical, and so until we've at least investigated all of them, it's kind of worrying for an actual release.
C: Yeah, Danielle, there was the new failure that I found, and you did the deep dive on that one, and that one was again a pretty serious bug that we caught. So that one, I think, was also marked critical, urgent, fix right now.
C: My concern is that if we have a bunch of tests that previously were not failing and are now failing, we need to get rid of whatever is causing them to fail, and make sure: are they failing for legitimate reasons or not? Because we don't have anything else to really go on here. So, yeah.
F: The rest of the eviction failures, by the way, seem to mostly be actual flakiness, as opposed to being broken, in that I can make them all pass individually, but running them all together is kind of a mess. So I'm trying to make those not be trash.
C: Yeah, well, so I think...
C: The eviction stuff is not in the milestone; I think we're kind of going through it. I don't know, maybe my screen is frozen, but if you refresh the 1.22 milestone list, it's much shorter now, because I bumped a bunch of stuff.
C: Oh yeah, yeah, so we've just got the four items. We know that there seems to be some sort of actual subpath regression happening with SIG Storage, and we need to help them if that ends up being on us. And then, similarly, for the serial failures we need to get that test semi-green as soon as possible, so that we can say it's not our problem, nothing's broken, it's all cool. But right now I just don't have that confidence.
G: Maybe it's worth mentioning that we also have the CPU manager, which is broken, and the problem is that it's not only the CPU manager that is broken; all the managers are broken, because of the way placement is managed. But again, we assumed that the buggy behavior was the expected one, and now we should fix all the managers, the CPU, device, and memory managers, in line with Clayton's refactoring. I'm currently working on it.
G: It will not crash the kubelet, like we already discussed in the thread, but again, it can be pretty serious.
C: Yeah, there are definitely some flakes from those sorts of issues. I think those wouldn't be release-blocking; not everything out of the box is going to be using the CPU manager. So they're definitely urgent and we should fix them, but they're probably not going to block the release. Whereas if the kubelet is failing to clean up files from static pods, for example, that would block the release.
C: We're just so deep into test freeze, and it's really long, four weeks. So, yeah.
C: I saw that issue. I punted it from the milestone, or rather, I think I didn't pull it into the milestone because it was filed late. It's definitely a priority, but we have too many P0s, so maybe that's a P1.
C: I don't know; we may want to talk to the release team.
C: Yeah, I was just going to say, yeah, I have no problem with that, because everything that we've been doing at triage is probably going to be lower priority than this. So, yeah.