From YouTube: Kubernetes SIG Node 20230829
Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
Recording: GMT20230829-170534_Recording_640x360.mp4
A
Hello, hello! It's August 29, 2023, and this is the SIG Node weekly meeting; welcome, everybody. We don't have big attendance. I think it's the last week of summer, before many kids go back to school, so that may be the reason, and Google Next is also going on, so many Googlers are out. Even so, we still have a very large agenda, so let's get right into it. First up is Canon.
B
Yeah, so about a month ago I brought to this agenda the takeover of the Pod-ready-to-start-containers KEP for beta, and I have a PR to update the KEP for beta. I was hoping to just get a reviewer/approver on that, if possible. I don't think we are going to talk about 1.29 in this meeting yet, but I was hoping to get this on track for 1.29; I just don't know when that's happening quite yet.
A
Yeah, we will start planning maybe next week, when there will be more people on the meeting, or discuss it in the meantime. So yeah, it's time to bring it up, and if you think it's ready for 1.29, that's great. I also see you posted a couple of scenarios where it is being used. Yeah.
B
You had a suggestion to update some use cases that I found useful, so I added those, and then I updated the KEP with some of the details since 1.25, just to make it clearer what progress that KEP has made.
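For readers following along: the KEP discussed here adds a kubelet-owned pod condition, renamed from PodHasNetwork to PodReadyToStartContainers in 1.28 and gated by the PodReadyToStartContainersCondition feature gate at the time of this meeting. A minimal client-go sketch for inspecting it; the pod name, namespace, and kubeconfig path are assumptions for illustration:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Assumes a kubeconfig at the default location (~/.kube/config).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// "my-pod" and "default" are placeholders.
	pod, err := clientset.CoreV1().Pods("default").Get(context.TODO(), "my-pod", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	for _, cond := range pod.Status.Conditions {
		// Set by the kubelet once the sandbox is created and networking is
		// configured, i.e. the pod is ready to start containers.
		if string(cond.Type) == "PodReadyToStartContainers" {
			fmt.Printf("%s=%s (reason: %q)\n", cond.Type, cond.Status, cond.Reason)
		}
	}
}
```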
C
Hi, actually Karthik has not joined the call, and I will be asking a few questions on behalf of Karthik. We actually wanted to schedule a meeting to discuss the dynamic node resize KEP further. Would it be appropriate to have this meeting scheduled sometime early next week, provided that SIG Node has no agenda or plans which might hamper that schedule? I mean specifically for this KEP.
C
Oh, we would just like to schedule a separate meeting to discuss this KEP further. That is the expectation we have: so that we could get an idea together and continue to work on this KEP to get it rolling.
A
Sure. What Peter did with the GC working group, like image GC: he sent out, what's it called, the scheduling thingy... Peter, remind me... a Doodle, okay, yeah. So you can set up a Doodle, suggest some times, and whoever is interested can join. I think the early feedback is: we already have the in-place pod vertical autoscaling feature in place, in alpha. The thing about that feature is that it enables dynamism in how pods are allocated, and this KEP is another piece of dynamism. The feedback a couple of releases back was that once in-place vertical autoscaling is implemented, it will open the door for more dynamism, because we will already have a way to adjust things on the node side. The problem is that in-place vertical autoscaling still has a lot of debt: we need to refactor some of the logic for how allocation happens, and this debt still hasn't been paid. So we still have this feature in alpha, and we still have outstanding refactoring to do, and that will complicate the implementation of dynamic node resize, because without landing the refactoring for one KEP, introducing another KEP will be quite complicated and may not be reasonable. You'll get a lot of pushback on the code front as well; just for you to be aware of.
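For context on the alpha mechanism being referenced: with the InPlacePodVerticalScaling feature gate enabled (alpha since 1.27), a running pod's container resources can be changed by patching the pod spec. A hedged client-go sketch; the pod name, namespace, container name, and CPU value are placeholders:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// With the alpha gate on, mutating spec.containers[].resources on a live
	// pod requests an in-place resize instead of being rejected as immutable.
	patch := []byte(`{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"750m"}}}]}}`)
	pod, err := clientset.CoreV1().Pods("default").Patch(context.TODO(), "my-pod",
		types.StrategicMergePatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Println("resize requested for", pod.Name)
}
```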
C
Yeah, I totally agree with that point of view, so we are okay with that. We will just keep the discussion in progress, and probably once there is stability and some conclusion on the in-place pod autoscaler, then we would proceed to work on dynamic node resize, if that's okay.
A
Oh yeah, absolutely, and thank you for pushing it forward; I think the feature is required by many. And I think it's worth another shout-out, since we're talking about in-place vertical scaling: if anybody is interested in moving it forward, please step up. I think Vinay is not going to move it forward anytime soon, from what I heard.
A
Okay, thank you for bringing it up. Next is a big topic: an API for pod readiness information.
D
Yes, hi. So I would like to propose, let's say, a new API for the kubelet. We've been working to answer some questions around this, and we would like to propose an API that serves information about the local pods' readiness. We would like this API to be exposed by the kubelet, because the kubelet owns a lot of the conditions regarding pod state.
D
The kubelet is watching the status of the pods it is running, and we think this API would be a benefit: it would decouple pod readiness from the control plane. Also, adding new watches to the kube API is a huge scalability issue, so if some workload only needs to watch the pods running on its own node, it is a real benefit to be able to read them directly from the kubelet.
D
There are already some APIs in the kubelet; for example, there is the pod resources endpoint, which serves information about the local pods owned by the kubelet. That's why I think we should expose a new one alongside it. We also know there are some issues with the kubelet, for example restarts, so we need to take care of that, and first of all we think this API should return the actual state known by the kubelet.
D
When implementing this API, we need to make sure that by the time it is ready to serve data, the kubelet knows the state of all the endpoints, because we know from the pod resources endpoint that there has been an issue where the kubelet serves information while it does not yet know which state the pods are in; and by default, if the kubelet doesn't know the information about a pod, it marks it as not ready. So we know we need to cover this issue, as I said at the beginning.
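A minimal sketch of that guard, under stated assumptions: the server type, its name, and the cachePrimed flag are hypothetical paraphrases of the proposal, not anything that exists in Kubernetes today. The idea is to answer "retry later" rather than report unknown pods as not-ready:

```go
package main

import (
	"fmt"
	"sync/atomic"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// readinessServer is a hypothetical server for the proposed API. cachePrimed
// flips to true once the kubelet has observed the state of every pod it runs.
type readinessServer struct {
	cachePrimed atomic.Bool
}

// guard would run at the top of each RPC handler: instead of defaulting
// unknown pods to not-ready (the pod resources pitfall described above),
// tell the client to retry.
func (s *readinessServer) guard() error {
	if !s.cachePrimed.Load() {
		return status.Error(codes.Unavailable, "kubelet has not finished syncing pod state")
	}
	return nil
}

func main() {
	s := &readinessServer{}
	fmt.Println(s.guard()) // rpc error: code = Unavailable ...
	s.cachePrimed.Store(true)
	fmt.Println(s.guard()) // <nil>
}
```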
D
We want this API to be, let's say, independent of control plane availability, because it might happen that the control plane is unreachable or down. We know the control plane is the brain of Kubernetes, but for the brief moment when the control plane is down, the user workloads may still be working fine, and there may be some workloads that are interested only in the current state.
D
So we think that if the kubelet is running and healthy, even if the kube API is down, this API should return the current state stored in the kubelet caches, even if that state might not have been reported to the kube API yet. And this API should be rate limited: we don't want to put a very big load on the kubelet. I tested it briefly for resource consumption.
D
If we run gRPC requests at, let's say, one request per second, it shows no resource consumption increase, so we can add more load and it is still okay. And we would like to use gRPC because we can version it, it can be strongly typed, and we can leverage the streaming.
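As one way to picture the rate limiting mentioned here: a small gRPC server-side interceptor built on golang.org/x/time/rate. This is a sketch of the general technique, not the proposal's actual implementation, and the limits are illustrative:

```go
package main

import (
	"context"

	"golang.org/x/time/rate"
	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// rateLimitInterceptor rejects unary calls once the shared limiter is
// exhausted, so a chatty client cannot put a big load on the kubelet.
func rateLimitInterceptor(l *rate.Limiter) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (interface{}, error) {
		if !l.Allow() {
			return nil, status.Error(codes.ResourceExhausted, "readiness API rate limit exceeded")
		}
		return handler(ctx, req)
	}
}

func main() {
	// Roughly one request per second with small bursts, mirroring the load
	// level tested above; real limits would need tuning.
	limiter := rate.NewLimiter(rate.Limit(1), 5)
	_ = grpc.NewServer(grpc.UnaryInterceptor(rateLimitInterceptor(limiter)))
}
```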
D
I put together a proposal of how it would look, but essentially we are interested in returning the conditions that are owned by the kubelet. From these conditions the workloads may know which phase a pod is in, and whether it is ready to serve traffic or not.
E
Can you give more details in the proposal about the kind of workloads that need to know this information?
D
So I think there might be different workloads, for example, which want to understand if the pods are actually ready to serve traffic. One example might be that Cilium would like to understand whether the pods are ready to receive traffic, or whether they are in a terminating state or, for example, starting. I think there might also be other customers who want to, for example, run some workflow that understands which phase the pods are in, for example for better custom monitoring.
D
So I think this API might have, let's say, various user stories, but overall I think that per customer there will be just one or two of these pods, because they will be more like system pods. It's not like a customer will run these pods and expose this data outside; this is something that should be kept on the node.
F
Yeah, maybe I can provide a little bit more context. We're trying to bridge a gap in reliability, performance, etc. Right now, if you're a workload on a node and you want to understand whether another workload on the same node is healthy, you need the API server to be up; you have to go all the way up to the control plane and then back, and that's not really helpful when the kubelet is sitting right beside you, is actually doing the health checking, and has the most recent information.
F
What we're particularly concerned about is a scenario where the control plane is unavailable and we want to understand the health of a pod on the same node, but we can't get it because the control plane is not there, even though all the information is sitting right there beside us. That's really what we're trying to resolve. You know, the example Catalina mentioned of, like, Cilium, or a data plane just wanting to understand.
F
For example, the health of an endpoint on the same node could be very helpful and provide just a bit more reliability, and remove that kind of control plane dependency where it doesn't need to exist.
E
Yeah, I think adding those use cases will be helpful to motivate why we are adding something like this. The second thing is, I believe there was a discussion with the architecture SIG, sig-arch, on this proposal about whether to add gRPC or to add a more first-class API. So maybe add an Alternatives Considered section with pros and cons.
D
Yes, we proposed this to sig-arch and they asked about that. So we looked at the HTTP API, but the read-only port is, let's say, considered not safe and is being shut down for some workloads, and to access the authenticated port the application goes straight to the control plane. The credential can be cached, by default for, I think, two or five minutes; but like I said, we wanted to introduce this locality.
D
We wanted the workload to be on the node and access it there. So we thought about using unix sockets: to access the unix socket, the workload needs to be on the node. We thought this would be beneficial because of the control plane dependency issue, and I believe it will be easier to version this API with gRPC than with HTTP.
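The on-node unix socket plus gRPC pattern described here already exists in the kubelet's Pod Resources API, which makes for a concrete sketch of what a client of the proposed readiness API might look like. The socket path below is the conventional default and may differ per cluster; note this dials the existing resources endpoint, not the proposed one:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	podresourcesapi "k8s.io/kubelet/pkg/apis/podresources/v1"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// The unix socket keeps the call local to the node: no API server hop,
	// and no dependency on control plane availability.
	conn, err := grpc.DialContext(ctx,
		"unix:///var/lib/kubelet/pod-resources/kubelet.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	client := podresourcesapi.NewPodResourcesListerClient(conn)
	resp, err := client.List(ctx, &podresourcesapi.ListPodResourcesRequest{})
	if err != nil {
		panic(err)
	}
	for _, pod := range resp.GetPodResources() {
		fmt.Printf("%s/%s\n", pod.GetNamespace(), pod.GetName())
	}
}
```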
F
Maybe just a high-level... that's very good feedback, thank you. Just a high-level, kind of procedural question: at what point should we move forward with actually translating this into a PR on k/enhancements, like actually start spinning up a KEP itself for this?
A
Is there any more questions? Mike, you turned on the video for a second and then turned it off. Did you want to ask something, or was it just a mistake?
F
It's intended to be read-only, and honestly only a very small subset. This is coming from a SIG Network perspective, and all that we're really looking for is information on readiness, though I can understand how, once we open this little door, it could expand beyond that. But for our use case, that's really all we want, I mean.
G
Right, we have other paths to do, you know, network readiness checks; maybe we should talk about that. I don't know if you've been working with the SIG Network team that's doing some additional design changes to support multiple networks and updates, that sort of stuff.
F
Fair, there is the parallel work on multi-network. Yeah, I think this is orthogonal, because again, at this point we just care about: is this ready or not? We're trying to remove API server dependencies where they don't need to exist; that's it.
F
You know, like if we get a support ticket or something like that: the API server was down, and therefore my endpoints are never getting traffic, or they are getting traffic but they're unhealthy. We're just trying to avoid those kinds of disconnects.
F
I mean, when you have enough things, enough scale, there are any number of things that could go wrong, so this is just about providing a little bit more reliability, I guess. And I know the kubelet also has things that could go wrong on it, but we're trying to keep dependencies as local as possible.
H
I wanted to add a quick point: there are other components, I think, that are relying on the HTTP pods endpoint today, and there are a lot of problems with that endpoint; namely, it's all one big JSON blob. We've had cases of metrics agents and other things reading it and actually switching over to read from the API server instead, just because of the performance inefficiencies of reading the HTTP API. So maybe I would also like us to consider making it extensible; I understand the narrow need in this case, but maybe there will be other use cases that we'll want to follow up with.
A
Yeah, I think my big worry about that is: today we have the Pod Resources API, which is also gRPC and also covers all the pods, and we don't want to add readiness information to pod resources, for multiple reasons. One of them is to separate throttling of one endpoint from another: if somebody is asking about devices too often, we don't want readiness to be affected.
A
If we start using it for more scenarios, then keeping one generalized endpoint may not be a good answer for endpoint throttling issues. So yeah, I kind of like the SIG Architecture feedback: we need to spell out why exactly we're doing gRPC versus a REST endpoint that could potentially have per-client throttling, per service account or something like that. We need to spell it out and understand how we'll generalize it in the future.
A
Okay, any more comments or questions? I think it's a big deal: I think this would be only the second kubelet API with guaranteed backward compatibility. Not even metrics are guaranteeing anything.
A
Okay, next is Xing Yang.
I
Yeah, so generally the problem is that the registration-completed flag is right now an internal field, only accessible within the kubelet, but we are trying to fix an issue with CSINode where we need to refer to the registration-completed status and then decide what the CSINode initialization should be doing. So the first question is: is there any existing accessible status in the Node object to indicate that the node's registration has completed? If yes, that's the best, and we can just consume it; and if not, is there any suggestion on how we should pass this status around?
H
Yeah, so I think we chatted about this. I think there's a couple of options. One: since the volume manager is part of the kubelet, the kubelet in its node status has that registration-completed flag, right? So I think that's one option.
H
Yeah, I think today we don't expose that boolean very well, but maybe that's one option. The second option would be somehow to pass the full node status, I think; maybe you could check the Ready condition, or proxy it.
H
So maybe we need to... I think the problem is that that information is not really accessible to other kubelet components; it's kind of internal to the kubelet in some sense, or even internal within the kubelet, hidden from other packages. So maybe we just need to do a little bit better job of exposing that information to other kubelet components, like having some type of wrapper.
A
Yeah, sorry for being slow; I think I understand the problem now. So you want the registration-completed flag inside the kubelet code to be exposed to other components. David, if you look at this code: how do we know to start registering static pods with the API server? Do we check this flag somehow?
H
I'm not sure exactly where that's synchronized. I know there are sources for the pod information: all the different sources of static pods, API pods, etc. They have a sources-ready field that's set when that source is ready and able to provide new information. I'm not sure how exactly that's linked up to registration-completed; maybe there are two different mechanisms there. I need to take a closer look.
G
I put the link to where the code sets it to true: after registering, it's basically just "yeah, we're done, we registered with the API server as the node."
A
Thank you for bringing it up. Next one is Vipin.
J
Hey guys, can you hear me? This is a mic test... okay, yeah, thanks. So I raised a PR some time ago; it's to fix the CPU manager: when you define some reserved CPUs, they should be excluded from the shared pool, so containers shouldn't be allocated on those CPUs. But due to the current test lane, a CPU manager test is failing, so the previous reviewer is not confident, and the PR is still kind of pending.
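A small sketch of the invariant the PR is after, using the k8s.io/utils/cpuset helpers: with the static CPU manager policy, CPUs listed in the kubelet's reservedSystemCPUs option should be subtracted from the shared pool that ordinary containers run on. The CPU ranges here are illustrative:

```go
package main

import (
	"fmt"

	"k8s.io/utils/cpuset"
)

func main() {
	// All online CPUs on a hypothetical 8-CPU node.
	allCPUs, err := cpuset.Parse("0-7")
	if err != nil {
		panic(err)
	}
	// CPUs set aside via the kubelet's reservedSystemCPUs option.
	reserved, err := cpuset.Parse("0,1")
	if err != nil {
		panic(err)
	}

	// The shared pool must never include reserved CPUs, which is the
	// behavior the fix enforces.
	sharedPool := allCPUs.Difference(reserved)
	fmt.Println("shared pool:", sharedPool.String()) // shared pool: 2-7
}
```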
A
The things that are problems, right: the first problem is that the test is failing, and I think Swati is trying to fix this test, with Francesco helping her; and the second problem is that Francesco is not sure this fix is actually changing things in the proper way. I didn't get into the details.
A
That's why I pinged Kevin again, because Kevin had already approved it, I think.
J
So, according to my understanding, I think he just means the change is too deep, meaning he's afraid it might not cover all the cases. But I think it's kind of vague, so I don't know.
J
If the statement could be more concrete, maybe I could try to take a look; but as far as I can see in the code, I think that might be the appropriate place. I don't know; if someone could be assigned and sync with me, that would help me to proceed.
J
Yeah, okay. How about this: if someone can help me with that, either to see if there's any other place where I should change the code instead, or how to do some tests to verify that it covers most cases, or maybe just to fix that test lane... I just need someone's help, because I'm kind of new to this thing.
J
Yeah, so I posted a couple of messages in the SIG Node Slack, but no, no response.
J
Yeah, thanks, yeah.
A
Okay, thank you for now. Yeah, this failing test lane is quite disturbing; I mean, it's new. What happened is, in 1.28 we introduced this test lane because before that we didn't have any tests in multi-NUMA environments upstream. I mean, we didn't have any tests, so we just relied on Red Hat to test it and give a green light for every PR, like "yeah, this PR is fine, go merge it," and we just had to change that.
A
In 1.28 we had this test lane working for some time, but then it broke, and we're still trying to get it back to green.
A
With that, we've reached the end of the agenda. Is there anything else today?