From YouTube: Kubernetes SIG Node 20230711
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
GMT20230711-170511_Recording_1542x1020.mp4
A
Hello, hello. It's the weekly SIG Node meeting; today is July 11th, 2023. Welcome, everybody. Today we have a few agenda items. We will start with Ace.
B
Yeah, so I opened this issue a week or two ago. Basically, node memory usage is reported very differently between cgroup v1 and v2. I mostly put it on the agenda today to see... I think I at least have some alignment on what I think the solution is, but I was hoping to get consensus here and try to move that forward.
B
I mean, it's a runc fix. It needs to be picked up in cAdvisor, potentially, and then picked up in Kubernetes, but I would like to get that train moving if people agree.
B
So the very short version of this is: in cgroup v2 there's no root cgroup file like memory.usage, and we used total minus free to approximate it, which doesn't really match the calculation that was done in v1, and it can be off by several hundred megabytes, almost up to a gigabyte in some cases. So the fix in runc that I suggested basically just matches the calculation in v1. And I might need to... sorry, not this one. Actually, it's at the very end.
B
If you look at the PR that I linked... it basically just makes it anon plus file.
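[Editor's sketch] The two calculations contrasted above can be illustrated as follows. This is a minimal sketch, not the runc change itself: it assumes "total minus free" refers to MemTotal and MemFree in /proc/meminfo, and that "anon plus file" refers to the anon and file counters in the root cgroup's memory.stat. The function names and the sample numbers are made up for illustration.

```python
def parse_counters(text):
    """Parse 'key value [unit]' lines (meminfo or memory.stat style) into ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2:
            counters[parts[0]] = int(parts[1])
    return counters


def usage_total_minus_free(meminfo_text):
    """Approximation described as the cgroup v2 behaviour: MemTotal - MemFree.

    /proc/meminfo reports kB, so the result is converted to bytes.
    """
    m = parse_counters(meminfo_text)
    return (m["MemTotal"] - m["MemFree"]) * 1024


def usage_anon_plus_file(memory_stat_text):
    """Calculation described as the proposed fix: anon + file, already in bytes."""
    s = parse_counters(memory_stat_text)
    return s["anon"] + s["file"]


# Illustrative values only, not measurements from a real node.
SAMPLE_MEMINFO = """MemTotal:       8000000 kB
MemFree:        5000000 kB
"""
SAMPLE_MEMORY_STAT = """anon 1500000000
file 1000000000
kernel_stack 4000000
"""

old = usage_total_minus_free(SAMPLE_MEMINFO)
new = usage_anon_plus_file(SAMPLE_MEMORY_STAT)
print(f"total-free: {old} bytes, anon+file: {new} bytes, gap: {old - new} bytes")
```

With these made-up inputs the two methods differ by several hundred megabytes, which is the order of magnitude mentioned in the discussion. The underlying memory state is the same in both cases; only the reported number changes.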
B
And yeah, I think the other question is then: how do we actually test this? In the testing that I've done, I basically take a node, and you can look at it on cgroup v1 or v2. You just reboot it and change it, and you see what gets reported, and you can check /proc/meminfo, and the actual usage is exactly the same. It's purely a reporting calculation issue, so yeah.
C
B
Correct, yeah. I mean, if there are no objections, I think we can move forward. But definitely, if anyone is a runc maintainer and wants to review this, please do take a look. I'll just ping the folks there and see, yeah.
A
Thank you for bringing it in. So if it's runc, then cAdvisor, and then Kubernetes, please ping people more aggressively. Thank you for bringing it to the meeting, yeah.
B
A
A
Okay, Kevin's item was struck out; probably it's already merged. Next one: Karthik.
D
So, in short, what we are trying to achieve: currently, if you want to resize the compute of a node, we need to manually restart the kubelet. With this approach, we want to dynamically propagate the changed values to the cluster level. That is the main intention behind this KEP. So we want to know more about it.
A
Typically, the process is: we need to define the scope through iterations of understanding the feedback and what minimal product we can get, with an understanding of how it will go and what will happen going forward. I...
C
A
D
Yeah, there is a concern regarding the stability of the kubelet, and there was one raised. So we want to understand from which perspective we should tackle it, so that we can address that issue.
A
Yeah, I think the bigger question was semantic: do we even want to support this kind of API, and what will it mean if we do? One of the suggestions was to concentrate on making kube-reserved dynamic, for instance.
C
D
A
I think you have an issue and you have a PR for the KEP, right? So I think it was raised in one of those. I remember seeing it.
A
Okay, but I hear this request quite often, and there are two types of requests. The first request is: let's make the node more dynamic and have it report its status proactively. That's probably what you suggest, right?
C
A
...monitors the usage to update the status correspondingly. Another approach people are asking about is API-based: can we make a node big, but then have an API that can adjust the node to say, like, only use that part of the node? That's another approach people are thinking about. Both are valid, for different scenarios. We just need to understand how much of it we want to address soon, and obviously kubelet stability will be a big issue here as well.
A
...the in-place update of pod resources, and that KEP was merged with an understanding of a few race conditions that we introduced. We hope to address them closer to beta, but it may require a lot of refactoring, so maybe after that refactoring it will be easier.
A
If you want to keep iterating on that, it would be great. The main goal is to understand the scope and agree on the scope with all parties. So if you can go through the PR and issue again and collect all the feedback that was given, it may help. Or maybe we can have a separate meeting for that, but I think it should be closer to the beginning of the next release. Right now, everybody will be busy with 1.28, because...
D
A
Thank you. I don't know...
E
Yeah, that's me, actually. Hello, everyone. I'm new to the meeting, and I was taking an interest in this particular issue, where I understand we need owner references to be exposed from the Pod, either via environment variables or the downward API, probably the downward API. So I just wanted to get a better understanding of it, and also how to go ahead with it.
E
So this is one of the use cases that was under discussion, where metrics wanted to get the owner references of the pods, like the ReplicaSets up to the Deployments and all that. I'm still not clear on what the exact use case is that will require this, so that's one of the things I wanted clarity on.
A
Yeah, if it's about metrics, it's quite an interesting use case, and we met it when we were working on probes. Every pod has its name as an identifier, and this name is constructed from... in the case of some application, it may be the pod name, but then there will also be a ReplicaSet ID; the ReplicaSet will generate an ID, and another ID as well, so you have multiple random IDs as a suffix.
A
So on the one hand, you want to have metrics specifically for this pod, for this instance of the pod, but for some metrics you want some aggregate across multiple pods. And the easiest way to aggregate across multiple pods is to... I mean, today we strip the suffix, but it may be better if you have a direct owner reference; maybe a cleaner solution.
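[Editor's sketch] The trade-off described above, stripping generated suffixes from pod names versus grouping by an explicit owner, might look like this. It is a rough illustration under stated assumptions: the regular expression is a heuristic for names shaped like name-hash-suffix, and the "owner" field in each sample is hypothetical, standing in for whatever a future owner-reference mechanism would expose. No such label is confirmed by the discussion.

```python
import re
from collections import defaultdict

# Heuristic assumption: pods created via a Deployment are typically named
# <workload>-<replicaset hash>-<5-character suffix>.
_GENERATED_SUFFIX = re.compile(r"-[a-z0-9]{5,10}-[a-z0-9]{5}$")


def workload_from_name(pod_name):
    """Guess the workload name by stripping the generated suffixes."""
    return _GENERATED_SUFFIX.sub("", pod_name)


def aggregate(samples, key):
    """Sum metric values per group, where key(sample) picks the group."""
    totals = defaultdict(float)
    for sample in samples:
        totals[key(sample)] += sample["value"]
    return dict(totals)


# Illustrative samples; "owner" is a hypothetical, already-resolved owner label.
samples = [
    {"pod": "web-7d9f8b6c5d-x2k4q", "owner": "Deployment/web", "value": 2.0},
    {"pod": "web-7d9f8b6c5d-p9m3z", "owner": "Deployment/web", "value": 3.0},
    {"pod": "cache-5c4d7f9b8-abcde", "owner": "Deployment/cache", "value": 1.0},
]

by_name = aggregate(samples, lambda s: workload_from_name(s["pod"]))
by_owner = aggregate(samples, lambda s: s["owner"])
print(by_name)
print(by_owner)
```

Both groupings give the same totals here, but the name-based one depends on a naming pattern that can misfire for pods whose own names happen to end in similar-looking segments, which is one reason an explicit owner reference could be the cleaner option.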
D
A
It's about that, yeah. If you want to start the KEP, you can start writing it. I think it will get more attention closer to 1.29. At least right now, we are closing up 1.28, and code freeze is soon.
E
Great, yeah, okay. I'll start drafting the KEP, and maybe in the next meeting we can put it up for review in the same issue thread, something like that.
A
Yeah, you can do that. As I said, you may not get enough eyes right now, because everybody will be working on wrapping up the 1.28 release. We will have code freeze very soon.
E
A
Okay, the next item is mine. I wanted to let everybody know that the sidecar PR got merged. Yeah, I see people reacting with emojis. Sidecar is long-standing functionality that people have been using for a while, but it wasn't used everywhere.
A
A
Please watch out for any failing tests that may be a result of this KEP being merged. We also have more follow-up PRs: since it's a very big piece of functionality, we split it into multiple PRs. The main logic is already merged, and the only thing left is cleanup here and there, and maybe a little bit of enablement of more things like readiness probes and such. But yeah, please be on the lookout for any regressions we could have caused.
F
Yeah, I wanted to mention that we have a pull request open for the Windows implementation of the CRI-only stats, and I was hoping to get some reviewers from SIG Node to look at that, so that we can hopefully get it merged by next week. For this, there's the implementation and some e2e tests. The e2e tests are currently failing because they uncovered some issues where, on Windows, pods in transient states were causing some of the stats not to be reported correctly. There are some linked PRs in containerd.
F
...that I know folks have been reviewing, and hopefully those get merged soon too, and then we'll make sure that the e2e tests pass once the test run consumes those updates. But I think that we could probably work on getting this merged in Kubernetes, since it seems like the functionality is now working.
G
Yeah, I'm here. It's targeting beta. I'm suspicious as to whether it'll make it there, but it's currently targeted to beta, not to GA.
B
G
A
Okay, yeah, because if it's on by default... For some reason I thought it targets GA, and that was a little concerning. But if it's beta, it's great; we progress. And I see him now... oh no, who commented?
F
F
A
A
H
Hey guys, hi, good morning. I am new to this project and meeting, so I'm going to ask a maybe silly question. I have this PR, and it's about fixing an issue where, with the static policy, pods in the shared pool are still allocated to reserved CPUs. And then there's a test, the pull request CPU manager test; it's just failing.
E
H
I tried to look at the test in the YAML, but it just said exit 1, with no specific error message. So, is Francisco there?
H
Yeah, I think you're very familiar with this. How should I debug and see what's really failing, or whether it's just a false alarm?
I
So basically, yes, you would like to have the CPU manager test run against a PR. About the specific test: unfortunately, I'm pretty busy this week, for the reasons already explained, me and others. So I will take a look as soon as I can, but if someone else wants to help, feel free, because I have limited bandwidth. But yes, we need to...
I
It's unrelated to your PR, I strongly believe, so we need to make sure the lane is working correctly.
H
Yeah, but let's say there's an error like this in a test: how should I see the actual error message?
I
A
Yes, a few suggestions. First, try to run it on an empty PR; it may be helpful. If it doesn't work on an empty PR either, then the lane is clearly broken.
A
You can also ping SIG Testing, because it may be something with infrastructure. Once these two are exhausted, please ping again on Slack. We also have other CPU and topology tests failing right now on periodics. We had them in CI on pull requests and they were reasonably stable before; I don't know what happened with this particular one. On periodics, [name unclear] added them a few weeks back, but they are not working right now either.
A
So we're looking for people who have bandwidth to look at these tests and try to get them back into a working state. Those tests are very important, because before [name unclear] added these pull request jobs and other jobs, we didn't have tests running on multi-NUMA environments. And I think this one is run on multi-NUMA as well, which...
H
A
I
H
A
But since it's multi-NUMA, it uses a bigger machine type, actually a very big machine type, so that may be one of the issues, and it may be test-infra related. That's why I'm suggesting to start with SIG Testing; maybe they can help.
A
Yeah, in Kubernetes we have different SIGs. In SIG Node, we're looking at the kubelet and such, and SIG Testing is concentrated on how tests are run and what kind of test frameworks we have. If there is some infrastructure issue, there is also an infrastructure SIG that may be able to help with that. One of those may be better placed to look at these particular issues than SIG Node.
H
A
That said, we have this one, and we have more related tests that need to be fixed, and we're looking for people with bandwidth to get them back into a green state.
H
J
Oh yeah, this one is about the eviction CI failure for the PID-related thing. Here we can see...
J
There are many failing tests, and one of them is PID pressure related, which fails with CRI-O, but it runs well with containerd: it will cause PID pressure easily in a containerd cluster, in my local test as well. And I'm not sure whether you have subscribed to the SIG Node test failures email group; you will get a failure email every day, I think.
J
Yeah, I wanted to raise it here because there are so many people here. I'm not sure if anyone knows the background or context of this e2e test. According to recent test runs, it always fails.
J
I want to know some context. About the PR: I fixed it by not waiting for the PID pressure on CRI-O, but I'm not sure if that is right. Another way to fix this, and I'm not sure if it is right either: I tried to lower the PID pressure to make the kubelet faster; I'm not sure if that works. And it also flakes in the containerd environment for the eviction order, because the kubelet cannot know the PID process
J
stats of each pod. After I enabled the feature gate to get pod stats from the CRI, it works in my cluster.
A
J
By the way, I think the PID process stats... if they are from cAdvisor, there is a PID-related metric which has some performance issue. I think I have mentioned that before in an old Kubernetes issue.
A
Yeah, I think I looked at the PR, and I'm not comfortable fixing it the way it suggests, by just changing the condition. I really want to understand what's going on. So yeah, if anybody has context, please share.
H
A
A
Okay, yeah, we're at the end of the agenda. Anything else for today?
A
Okay, if nothing else: if you can spend the rest of the meeting time on reviews, please do so. There are many PRs waiting for some eyes and some attention. Thank you very much. Bye-bye.