From YouTube: Kubernetes SIG Node CI 20221012
Description
SIG Node CI weekly meeting. Agenda and notes: https://docs.google.com/document/d/1fb-ugvgdSVIkkuJ388_nhp2pBTy_4HEVg5848Xy7n5U/edit#heading=h.2v8vzknys4nk
GMT20221012-171300_Recording_1628x1120
A: Hello everybody, it's the SIG Node CI weekly meeting. It's starting late today because I was late, and we have very low attendance because of that; probably people dropped. So if you're watching the recording, we will be doing bug triage and test triage and looking at the state of our tests.
A: So let's dive right in. Yeah, I started tracking test stability in this spreadsheet, and what I was hoping to see is this kind of picture: green, green, and sometimes it will be non-green, and it will be a single problem.
A: But what happens typically is we have this kind of thing; for example, with CRI-O, when the problem is so big that it's just some tests, like eviction and performance, failing over and over again. But sometimes it's a different set of tests, like you see what happened here, and then it switched again. So maybe I will need to change the format a little bit. And I want to learn from others; I know that the Reliability Working Group is doing something on tracking job statuses.
A
Maybe
I
will
try
to
borrow
some
tuning
from
some,
but
in
any
case,
I
looked
at
tests
and
it
all
looks
exactly
the
same
as
classic
so
no
news
here,
but
I
actually
need
to
start
to
look
into
that
so,
okay,
this
is
it
and
from
triage
perspective,.
A: No, there doesn't seem to be anything new here. Okay, I looked yesterday at night, and there was nothing new for our group. Everything is here, and I also cleaned this up; it's only three right now. It needs some approval from Brunel, maybe. I think there are two from Francesca, and one is a functionality change, so it's a little bit different.
A: Okay, we have issues to go through as well. So if you want, I can just...

A: Hey Peter, you have two.
A
Okay
last
reminder
was
a
few
months
back.
A
On
June,
let's
see
if
it's
still
failing.
B: I have this tab open and it's in my backlog, but it's kind of at the bottom of it.
A: A bit updated, okay. And the next one, about the volume, is also assigned to you. So what was the original issue?
B: Yeah, I can keep it. Yeah, I'll keep it and try to get back to that one.
A: Yeah, and I'm looking through these not to poke at anything; I just want to see if there is something assigned that we don't plan to work on. Let's unassign everything we do not plan to work on.
B: Yeah, I think it'd be good if I work on this; I just have to find the time. Okay.
A
D
E
B
I,
don't
remember
exactly
the
state
of
the
test
now,
but
it
does
kind
of
unify
how
we
set
up
all
of
the
tests
and
I
suspect
it'll
help
the
swap.
A: Yeah, if you or somebody on the call needs some more tasks, let me know; we have plenty here.
A: Okay, going back to the agenda, nothing new. Brian, I see you joined the call. Do you want to talk a little bit about this performance test that you've been trying to get working?
D: Yeah, I'll just give you a quick update. The most important update is that I finally found a system where I can properly test the multi-arch building of images, and I tested the PR. I put up a PR this morning to fix it, and then I got this new system and tested the PR. It's still not good enough, so I'm going to keep slugging away at it, but at least I have a good system now where I can iterate on this thing.
D: I learned a lot about that, yeah. Okay, there's an amusing thing for you: you know the previous PR I put out got reviewed, and none of us seemed to know the procedure, because I failed to update the version number of the intended image. Apparently I was supposed to bump it myself in a file, so that'll be in the... that's in the PR that's currently blocked, and then the system...
A: Thank you. Okay, if nothing else for tests, I will... oh, by the way, I'll just quickly mention the test tags. I think I gave up on this last time; last time it wasn't accepted, but now it's fully accepted, so it'll be implemented. And there is a PR out... we switched to Ginkgo v2, and Ginkgo v2 supports tags that are not a text hack but actual tags that are associated with the test.
A
So
there
is
a
PR
out
by
somebody
to
switch
to
the
stacks,
and
this
PR
introduces
all
this
new
labels.
So
once
this
is
intends
we
can
start
switching
test
to
actual
like
test
tags
rather
than
like
text
description
with
like
thingy
like
music,
such.
A
Okay,
and
with
that,
we
will
go
to
bug
triage.
A
A
I
think
this
one
is
bitter
and
Ryan,
since
you
want
to
call
do
you
want
to
discuss
this
one,
it
was
reopened.
A: I already have it open, so I can comment on it. Okay, so I think I just wanted to reiterate that it feels like there is some problem with cgroup v2: some folder is being removed that shouldn't have been removed yet, and somebody complained that they have too many of these messages. So the...
A: The fix was to just change the verbosity level to four, preventing this message from being output. My worry with that is that, since we don't know the root cause, and we know that this message was written so many times, over and over again, it feels to me that if a customer has a node with pods being created and deleted very often, then they'll accumulate so many...
A
This
cleanup
workers
that
it
will
be
it
will
lead
to
some
boom
problem
and
like
equivalent
boom
killed,
and
my
problem
is
that
is
we
hit
the
error
message,
so
we
don't
even
know
why?
It's
not.
Why
like
what's
happening,
so
there
is
no
indication
that
it's
about
to
get
bad,
but
it
will
go
bad
after
some
time.
E: Well, it'll stop once the volume gets cleaned up, and so because...
E: ...that's a different problem versus the original report, because the original report came in from Daniel, and I don't think he mentioned that there was a memory issue.
E: Because I know how the worker works: it's streaming, it's writing to the log file, and so the patch that bumps the log line up to V6 won't print that message anymore. So we won't get those extraneous logs. The logging itself shouldn't be leaking memory.
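As a rough illustration of the verbosity mechanism being discussed (the exact kubelet call site and message text are not reproduced here; the log line below is a stand-in), a klog call gated at a higher verbosity level stays in the code but is only emitted when the process runs with a high enough `-v`:

```go
package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	// klog registers the -v flag; default verbosity is 0.
	klog.InitFlags(nil)
	flag.Parse()
	defer klog.Flush()

	// Always printed.
	klog.InfoS("pod worker syncing", "pod", "default/example")

	// Only printed when the process is started with -v=6 or higher.
	// Bumping a noisy message to V(6) hides it from default logs
	// without removing the code path that emits it.
	klog.V(6).InfoS("noisy cleanup message (stand-in for the actual kubelet log line)", "pod", "default/example")
}
```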
A: But we'll accumulate the workers that used to do this logging, right?
A: ...but we will accumulate workers that used to do this logging, so they will keep executing this piece of code, and I just don't know if... if you can confirm that this will stop at some point. If there is no stop to executing this line of code, and there is no way to recover from that, then the kubelet will accumulate something internally in its state.
E: Here the logging, the log message, is on a non-error path, and so I didn't see anything in this bug report saying that the kubelet is not behaving correctly.
A: Yes, exactly, and I'm trying to understand: if there is no exit from that, then I think... yes, I think I'm conflating this with a report where somebody else was complaining about a similar problem on cgroup v2, but they were saying that their node is running pods, like creating and removing pods all the time, so they get there really quickly.
B: And is this happening for the pods or, like, the container processes, or is it happening to the kubelet?
B: Right, so we'd have to look at the code path. I would more expect the kubelet spinning on this to overuse CPU and not memory; I wouldn't expect any memory to be accumulating for it unless, like, the runtime has a stack that's growing indefinitely and that is taking up a bunch of memory. But I could see it, like, this...
D: It sounds to me like Sergey is saying that for each one of these churning pods there's going to be a new worker that's going to start up, and it's going to continually try to delete something it can't delete, and we just won't see it anymore. That would be a source of CPU churn, but also memory for every worker.
A: ...which is also not very good. I mean, just wasting it on something that will never clean up, with no indication that it's happening, is also bad.
B
Yeah
I
do
agree
that
it's
it's
bad
I,
just
you
know
it's
not
the
the
connection
is
necessarily
clear,
though
it
is
I
think
it
would
be
fairly
easy
to
reproduce.
If,
if
we
like,
you
know,
run
the
cubelet
create
100
Bots
delete
the
path
in
each
of
these
then
delete
the
pods
and
then
do
that.
You
know
a
couple
of
times.
You
should
see
a
linear
memory
growth.
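A minimal sketch of that kind of pod-churn reproduction using client-go, under assumptions: a cluster reachable via the default kubeconfig, invented pod names, namespace, and counts, the "delete the path" step omitted, and kubelet memory sampled separately (e.g. from node metrics) between rounds:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load ~/.kube/config (assumes the cluster under test is the current context).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	ctx := context.Background()

	const rounds, podsPerRound = 3, 100
	for r := 0; r < rounds; r++ {
		// Create a batch of short-lived pods...
		for i := 0; i < podsPerRound; i++ {
			pod := &corev1.Pod{
				ObjectMeta: metav1.ObjectMeta{Name: fmt.Sprintf("churn-%d-%d", r, i)},
				Spec: corev1.PodSpec{
					RestartPolicy: corev1.RestartPolicyNever,
					Containers: []corev1.Container{{
						Name:  "pause",
						Image: "registry.k8s.io/pause:3.9",
					}},
				},
			}
			if _, err := client.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{}); err != nil {
				panic(err)
			}
		}
		// ...then delete them again so the kubelet's pod workers churn.
		for i := 0; i < podsPerRound; i++ {
			name := fmt.Sprintf("churn-%d-%d", r, i)
			if err := client.CoreV1().Pods("default").Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
				panic(err)
			}
		}
		// Between rounds, sample the kubelet's RSS; roughly linear growth
		// across rounds would support the leak theory described above.
	}
}
```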
A
Yeah
that
may
be
a
good
course
of
action
and
with
regards
to
just
CPU
churning,
since
we
don't
have
any
like
do,
we
have
any
way
out
of
this
state
since
it's
already
been
deleted
like
is
there
any
way
to
recover?
If
not,
then
maybe
we
can.
We
can
just
detect
that
it's
not
recoverable
or
something.
B
I'm
pretty
sure
we
dropped
a
lot.
Did
we
drop
the
log
level,
because
this
is
an
expected
state
where
the
Pod
worker
is
racing
with
the
volume
manager
and
the
volume
manager
tore
down
the
volume
because
or
the
Pod
is
in
the
process
of
being
tore
down.
But
the
Pod
worker
is
like
still
looping
on
it.
Correct.
A: Okay, I think we can ask the topic starter to see if it's the same ID here over and over again, because if it is the same ID, then it's never going away, right?
A: Yeah, I'm just afraid that we don't have any idea internally... like, I think we have a lot of people reporting the problem, but I haven't seen any of them reproduce it locally. I can ask David Porter as well.
A: And I'm not sure about the priority, as it sounds super critical, but we can change it later if needed.
A: So, allocated devices with this resource... probably this isn't new.
A: It sounds like a bug, actually; we need to keep them consistent, but I think it's very low priority.
D: Sounds like it clears itself up after a moment anyway.
A: And I don't want to mark it as "help wanted", because I don't think it will be a trivial change.
A: Yeah, what I'm trying to understand first is: is it a bug, or is it a documentation problem?
C
Yes,
yes,
that's
very
controversial,
so
current
implementation
of
popular
CRI
implementation-
just
add
supplemental
groups
defined
in
pod
to
the
group
information
defined
in
the
container
image.
C
So
if
cluster
administrator
secure
the
supplemental
groups
field
in
PSP
or
other
policy
engines
that
can
be
bypassed
very
easily
and
in
using
host
path
volumes
in
the
Pod,
so,
as
you
know,
host
path
volume
is
predicted
by
Legacy
uid
GID
information.
C
So
bypassing
group
information
leads
security
breaches.
So
we
recognize
that
this
is
very
security
issues.
However,
security
committee
answered
works
as
intended,
and
also
we
can
read
that
definition
of
supplemental
groups-
field
yeah.
The
description
can
be
read:
kubernetes,
just
as
supplemental
groups
gids
to
the
Divine.
The
group
information
defined
in
the
container
image.
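To make the field under discussion concrete, here is a hedged sketch (pod name, image, path, and GIDs are invented) of a pod that sets `securityContext.supplementalGroups` alongside a hostPath volume; the point is that the runtime merges these GIDs with whatever group membership the container image already defines, which is the behavior being debated:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	hostPathType := corev1.HostPathDirectory

	pod := corev1.Pod{
		TypeMeta:   metav1.TypeMeta{APIVersion: "v1", Kind: "Pod"},
		ObjectMeta: metav1.ObjectMeta{Name: "supplemental-groups-demo"},
		Spec: corev1.PodSpec{
			SecurityContext: &corev1.PodSecurityContext{
				// GIDs listed here are added to the groups already granted by
				// the container image (the image user's /etc/group membership).
				SupplementalGroups: []int64{60000},
			},
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "registry.k8s.io/e2e-test-images/busybox:1.29",
				VolumeMounts: []corev1.VolumeMount{{
					Name:      "host-data",
					MountPath: "/data",
				}},
			}},
			Volumes: []corev1.Volume{{
				Name: "host-data",
				VolumeSource: corev1.VolumeSource{
					// hostPath contents are protected only by the files' existing
					// UID/GID ownership, which is why the merged group set matters.
					HostPath: &corev1.HostPathVolumeSource{
						Path: "/var/lib/demo",
						Type: &hostPathType,
					},
				},
			}},
		},
	}

	out, err := yaml.Marshal(pod)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```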
C: If we recognize this behavior as a bug, we should fix it in many CRI implementations. However, if we recognize that this behavior works as intended, I propose to...
A: I mean, we can generally do this one first, because we need to explain what's happening today. I don't think that, even if you implement a new field, it will be backported into previous versions.
A
So
I
would
say
it's
like
we'll
start
with,
probably
with
cap
process,
if
you're
familiar
with
cab
processes
is
a
process
to
produce
new
features
in
kubernetes,
and
it
will
take
some
time
so
I
think
if
you
want
to
take
this
one
or
like
just
improve
documentation
and
explain,
what's
happening,
it
will
be
great
and
then,
if
we
need
to
do
that,
then
it
will
be.
It
wouldn't
be
a
bug
fix.
A
It
will
be
a
new
feature,
especially
when
60
security
already
said
that
it's
expected
Behavior
yeah
argue
with
them
so
yeah.
If
you
want
to
take
it
on
you
to
improve
documentation
and
write
a
blog
post,
it
will
be
super
great
I
think
it
will
help.
C
Yeah,
okay,
so
let
me
open
the
pro
request
to
improve
the
API
definition
first
and
then
and
then
I
will
plan
to
write
cap
or
something
foreign.
A: Okay, we're out of bugs and we're out of time. If there are any last-minute notes, please let me know; otherwise, bye, everybody.