From YouTube: Kubernetes SIG Node CI 20230111
Description
SIG Node CI weekly meeting. Agenda and notes: https://docs.google.com/document/d/1fb-ugvgdSVIkkuJ388_nhp2pBTy_4HEVg5848Xy7n5U/edit#heading=h.2v8vzknys4nk
GMT20230111-180428_Recording_1848x1120.mp4
A
Hello, hello, it's January 11th, 2023, and this is the SIG Node CI meeting. Hello, everybody. We have one agenda item today that is explicit, and it's Francesco's; I just want to take it on.
B
Hello, hey, so, yeah. Let me give you folks some context here. Some end-to-end tests we have, notably the ones related to the resource managers or the Pod Resources API (which are, again, related to the resource managers: CPU manager, device manager, and so on), require some form of device plugin.
B
We have in particular one or two tests that want to check some combinations of parameters which are uncommon; the case in point is a device plugin which does not report NUMA topology information, because, as it is today, the device plugin API does not make it mandatory to report NUMA topology information, so that is still allowed.
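For context: the device plugin API makes NUMA topology reporting optional, so a device advertised by a plugin may or may not carry topology information. A minimal sketch of that shape (the types below are simplified local stand-ins mirroring the v1beta1 API in `k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1`, not the real generated types):

```go
package main

import "fmt"

// Simplified stand-ins for the kubelet device plugin v1beta1 types.
// Field names mirror the real API, but these are sketch types only.
type NUMANode struct {
	ID int64
}

type TopologyInfo struct {
	Nodes []*NUMANode
}

type Device struct {
	ID       string
	Health   string
	Topology *TopologyInfo // optional: a plugin may leave this nil
}

// hasNUMATopology reports whether a device advertised NUMA locality.
// The API does not require it, so consumers must tolerate a nil
// (or empty) Topology.
func hasNUMATopology(d Device) bool {
	return d.Topology != nil && len(d.Topology.Nodes) > 0
}

func main() {
	withNUMA := Device{ID: "dev-0", Health: "Healthy",
		Topology: &TopologyInfo{Nodes: []*NUMANode{{ID: 0}}}}
	withoutNUMA := Device{ID: "dev-1", Health: "Healthy"} // no topology: still valid

	fmt.Println(hasNUMATopology(withNUMA))    // true
	fmt.Println(hasNUMATopology(withoutNUMA)) // false
}
```

A plugin that leaves `Topology` nil is exactly the uncommon but legal configuration the tests discussed here need to exercise.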
B
So we have this need because, you know, we want to check actual features with support for any form of device plugin, and in particular we would really benefit from having a device plugin which does not report NUMA topology information, which is pretty uncommon: all the device plugins I reviewed do want to report NUMA topology information, so few fit this use case. We did some research back in time among the few device plugins available.
B
We had the KubeVirt device plugin, which, long story short, is the device plugin from the KubeVirt project that exposes /dev/kvm, a device file which is needed by virtual machines. That part is not that important; the point is that that plugin, back in time, was supported by a third party, was an existing device plugin used in production, and didn't report NUMA topology information, because /dev/kvm is a pseudo-device exposed by the Linux kernel.
B
We noticed, from another related case, that, okay, now it's broken, so we are without this device plugin, and that covers the first two points. Now the problem becomes, and this is actually an open question for this forum (I have a suggestion, but it's for awareness and as an open question for the forum): okay, what do we do now? Because, you know, just removing the tests seems the least desirable solution, so we would really benefit from a replacement device plugin.
The option I'm going to present, and then I'll shut up, is to extend the sample device plugin we already use, you know, to have a configuration option to provide or not provide NUMA locality. Why is this not obviously a good solution? Because end-to-end tests would really, really like to use a configuration as close as possible to production, and the sample device plugin is used only in tests; it's not a real device plugin, it's not something used by people who consume Kubernetes.
A
I know we had a similar conundrum with credential providers, which we are now moving out of tree. I mean, that's a simpler problem, because I think the surface, the API space, is much smaller.
A
So, since we're running most of the tests on GCP, we thought maybe we need to take the GCP credential provider, but it wasn't well accepted: people didn't want to take a dependency on a specific vendor, and the tests would not be portable after that. I mean, they're somehow portable today; by taking a dependency on a specific vendor's credential provider you'd make them less portable. So now we have a fake credential provider in-tree, in our test folder.
A
That kind of does this logic, and again this has similar concerns: it's not actually testing what people may use in production. So I guess our end-to-end test, in this sense, is mostly an integration test.
B
It's not terrible, but it's still sub-optimal, because, you know, it's something fake. But yeah, really, I'm out of options, so if anyone has an idea or a suggestion or wants to chat about it, just ping me on Slack. Otherwise...
B
You know, they have, like, /dev/zero, or... it's the device file. The specific thing there is that that device file is used by QEMU, or the hypervisor, to use the acceleration provided by the processor, mediated by the Linux kernel. But conceptually, okay. I shared the wrong link.
A
I don't have... my biggest question was: how portable was it? It sounds like it was quite portable, so...
A
Remember
we
didn't
have
any,
we
don't
have
any
real
devices
on
the
machine,
so
it's
hard
to
find
real
use
device
plugin
for
machines
that
doesn't
have
any
devices.
B
Using a real device and a real device plugin is completely fine. The only concern I have, okay, is that I need to learn the setup, but that's minor, very minor. The only real concern I have is that those machines are more expensive. So, you know, okay, we can maybe run it, but it's friction, and friction leads to lanes running less often (arguably more often than today, but still). So ideally we would have something that we can run on each PR, and the rest instead, let's say, daily or weekly, like integration, again.
E
Yeah, I was gonna ask: so there are different tiers of testing. We want this to run on PRs, and potentially we want to do integration testing too, right?
E
Could we actually mock, like, a GPU, have it report whatever it needs to report, and have the kubelet interact with that?
B
Yeah
yeah,
we
we,
we
can
quote
the
just
unquote
mock,
simpler
device
than
GPU
or
GPU,
and
and
because
we
would
just
need
to
provide
what
the
device
plugin
API
needs.
My
issue
here
is
that
is
conceptually
a
mock
or
fake.
It
depends
on
us,
but
it's
not
a
real
thingy.
So
you
know
it's
it's
something
some
signal
it's
better
than
nothing,
but
it's
not
really
end
to
end
well.
E
I was gonna suggest maybe, so, if we did that on every PR, it might be a good signal that nothing is broken, and then what if we had, like, a weekly GPU job that actually spun up a GPU and we did an integration test with that?
F
Right, we could have three jobs: one runs on a PR and only uses a fake device, and we have two periodic ones, one with the GPU and one with the fake device as well.
A
Do you want to talk about the NUMA nodes change? I remember... yeah.
D
Yeah, thanks for reviewing that; I managed to update it. There was, I think, a minor comment about the comments, so I addressed that, and it's ready for people to take a look.
A
So we will try to run the topology manager tests on bigger machines that have multiple NUMA nodes. I believe this is only for, your focus is, the topology manager suite, and it will run on bigger machines. I guess we can also discuss whether we need to increase the timeout, or how often we run this test, but right now it's...
D
...the default, as it was, because I just wanted us to have some signal and be able to compare it to what we already have.
A
Yeah, I just had a hold on it because of those minor comments. Okay, yeah, I think it's a big deal: we will be testing more and better.
A
Yeah, by the way, I saw that basically it was fixed, okay, so now I don't think it fails any longer.
A
Yeah, and it's still showing red, but most of the tests are green now, and the failing runs are not failing at the very beginning; they're failing on specific tests, so it's in much better shape now.
A
I'll check the issues in progress to start cleaning up the board, so bear with me.
A
I'll put it on the agenda. So, the document explains how to use this framework and how to write good tests.
A
I think this is one of those, like, how-to-write guides, specifically for pending pods.
A
Yeah, I think, yeah, the test... this change sounds familiar, so I would...
A
Yeah, so, as Antonio mentioned here: we shouldn't rely on events in tests, and we had a dedicated effort to remove all the dependencies on events from conformance tests. So all the conformance tests were cleaned up, and they don't have this dependency any longer.
A
It feels like some tests still have this dependency, and removing it is quite trivial in most cases: the test explicitly awaits an event, asserting that the event will be sent, when you have some other condition available, like something happened (a pod was deleted, or a pod reached some specific state). So changing it is typically easy. In this case, if anybody wants to take it, please do; for now I just put it into To Do.
A
Okay, and I think this is something related to our team, so I think it was ours.
A
I'm not sure if it exists anymore.
A
I remember somebody was trying to fix it by reading the OOM score from the proper process. We had a problem where we just assumed that the process would be the first one, or something like that, so we took the OOM score of some other, incorrect pod.
A
This is interesting. I don't think it's test-related that much; I believe it's only a very small, static pod test.
A
Yeah, so there is this interesting behavior that exists today when you delete a pod twice: you call pod deletion with a very large grace period, and then you want to kill it immediately, so you delete the pod again. The second grace period wouldn't override the first grace period, so the kubelet will wait for the whole duration of the first grace period supplied. This is a change of behavior, which is not ideal, after 1.22 or 1.23, something like that, so this PR attempts to fix it.
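The override being discussed reduces to one comparison: a follow-up deletion with a shorter grace period should shorten the pending wait, never extend it. A sketch of that rule (a hypothetical helper illustrating the desired behavior, not the actual kubelet change):

```go
package main

import "fmt"

// effectiveGracePeriod returns the grace period (in seconds) that
// should be honored when a pod is deleted again while a previous
// deletion is still pending. Desired behavior: a second request with
// a *shorter* grace period overrides the first, so deleting again
// with grace-period=1 after a long-grace delete kills the pod quickly
// instead of waiting out the original period.
func effectiveGracePeriod(current, requested int64) int64 {
	if requested < current {
		return requested
	}
	return current
}

func main() {
	// First delete asked for 300s; the user then asks for 1s.
	fmt.Println(effectiveGracePeriod(300, 1)) // 1
	// A later, longer request does not extend the pending deletion.
	fmt.Println(effectiveGracePeriod(30, 300)) // 30
}
```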
A
If anybody on this call wants to take a look at this, from the perspective of the end-to-end test at least, to make sure that the end-to-end test is written properly, it would be great; if you can review the whole thing, it'll be even better. Any takers? It was brought up at the SIG Node main meeting as well, for the code change.
A
Oh yeah, it's actually related to this. So, oh, that's good, meaning that actually, like, the tests they added are not working, which, I mean, means that the code is not working, which gives you some food for thought.
A
"Improve kubelet recovery after restart when dealing with devices"? Oh, Francesco, this is yours, right? Somehow it got into tasks in progress.
B
I think I'm gonna close this one, because, yeah, it's work in progress, and actually Swati has a better PR for a similar issue, if I remember correctly. So don't worry; I will handle that and probably close it.
A
Good. We also have issues in To Do, but we also only have 20 minutes left, so I suggest we clean up To Do next time, and now we'll switch to bug triage.
A
Are there any comments on this board before I go to bugs? No? Okay. And by the way, I will also check the performance dashboard. It still looks fine, no spikes or anything; if you're interested, just click this link and switch between CPU, memory, and runtime.
E
We haven't come to a consensus yet, but we're sort of thinking that we will be deprecating this feature, since no one has time to work on it over here at Red Hat. Maybe Google or somebody else can pick it up if they really want it, but...
A
What is this? Okay, I moved it to triage for now, because it's actually an issue, but maybe we can close it as a duplicate of the feature.
A
There's a dependency between... maybe I'm mixing up things, but I remember some of the issues were because of circular dependencies between CRI and cAdvisor, and we weren't able to collect certain metrics.
A
Okay, there are so many problems with cleaning up volumes recently; there is some activity going on there.
A
It feels similar, but the error is different, so I don't think it's this problem.
A
Let me... thank you. So, you said last time that you would take a look at the documentation; maybe you can do this?
C
Oh yeah, sorry, I forgot about it. I will take a look.
A
"Even behavior for regular and static pods upon deletion."
G
Yeah, so I just discovered this yesterday; Ryan and I were debugging something. If you try to delete a pod, especially if it's a static pod, it doesn't transition into the Terminating state immediately, but for a regular pod we do that immediately. So that's the bug.
G
The effect is, let's say you are monitoring a pod for its status and you remove it. If it's a static pod and you remove it, you would expect it to go into Terminating, like a regular pod does, but this one doesn't. Eventually it does go into the Terminating state and gets removed, but that's, like, a split second before it gets removed, instead of it immediately transitioning into Terminating.
G
So I'm not sure whether this is a SIG Node issue or a SIG CLI one. I found, I think Ryan found, some documentation which said it's just the way kubectl represents the status, but I haven't dug deeper, so I just want to keep it open at this point, whether it's a SIG Node issue or a SIG CLI issue.
G
I'm not sure how... so I'm removing the file, but is it the CLI that queries and interprets it that way? Who is responsible for printing the actual status? We need to see whether the logs also say whether the pod is running or not, or whether it's just the CLI that's not printing correctly. I don't know at this point.
A
Thanks, thank you. Containers...
A
Whatever... the error is very similar to what we see in a different bug.
A
Okay, we are out of time, and unfortunately we didn't have enough time to go through all the issues, but nevertheless we reached almost all of them. Thank you, everybody. Bye-bye.