From YouTube: Kubernetes SIG Node 20230808
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
GMT20230808-170555_Recording_1920x1020.mp4
A
Okay, hello, hello. It's the weekly SIG Node meeting, welcome everybody. Today's date is August 8, 2023. Today we only have one topic on the agenda, and Linguan will be presenting. Take it over.
B
Okay, so, oh sorry, I think I'm not able to share the document. Maybe I can just...
C
I'll quit my Zoom; when I open it, there are some permission issues.
C
Yeah
cool,
thank
you,
so
yeah
I
think
this
is
the
continue
the
topic
of
the
meeting
of
July
18th,
so
we
actually
brought
both
down
the
all
the
problem
into
four
of
them.
So
here
yeah.
This
is
the
detailed
document.
So
maybe
I
will
briefly
talk
about
each
of
them
and
then
you
guys
can
comment
more.
If
you
have
more
questions
so
the
first
one
is
the
new
static
policy
for
CPU
manager.
C
So we know the CPU manager has some static policy, and we actually add one more static policy option, called spread physical CPUs preferred.
C
So the problem is that in our deployments we have a database team, and we actually found a performance gain if we spread the DB container across different physical cores, instead of the hyperthreads of the same physical core. You can look at the picture. Can you scroll down? Yeah, so here, if you compare these two different CPU sorting images, you can see the difference.
C
So in the first one, the CPU ordering is sorted according to virtual cores, so the hyperthreads of the same physical core are adjacent, and in the second one we sort the CPUs across different physical cores.
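For illustration, here is a minimal Go sketch (not the kubelet's actual code) of the two orderings being compared, assuming a hypothetical machine with 4 physical cores and 2-way hyperthreading where logical CPUs p and p+4 are siblings:

```go
package main

import "fmt"

// Hypothetical 4-core, 2-way SMT machine: physical core p owns
// logical CPUs p and p+4 (a common Linux enumeration).
const physCores, smt = 4, 2

// siblingFirst mimics the first ordering: both hyperthreads of a
// physical core are taken before moving to the next core.
func siblingFirst() []int {
	var order []int
	for p := 0; p < physCores; p++ {
		order = append(order, p, p+physCores)
	}
	return order
}

// physicalFirst mimics the proposed option: one hyperthread from each
// physical core is taken before any core's second thread is used.
func physicalFirst() []int {
	var order []int
	for t := 0; t < smt; t++ {
		for p := 0; p < physCores; p++ {
			order = append(order, p+t*physCores)
		}
	}
	return order
}

func main() {
	fmt.Println("sibling-first: ", siblingFirst())  // [0 4 1 5 2 6 3 7]
	fmt.Println("physical-first:", physicalFirst()) // [0 1 2 3 4 5 6 7]
}
```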
C
So yeah, and also we did some experiments; you can see the appendix for the detailed experiments on why this CPU static policy option is better.
D
You mean about which... yeah, I think I agree with Francesco, with what he already put in and was saying in the meeting now: this looks like just an option to the existing static policy, so the selection of which CPU core is just part of the algorithm that the CPU manager static policy has. I don't think we need a special policy for it.
C
Oh yeah, right, I think it should be one static policy option. So it's not a new static policy. Maybe I should fix that, yeah.
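As a sketch of what that would look like for an operator: static-policy behavior is tuned through cpuManagerPolicyOptions in the kubelet configuration, the same mechanism used by the existing full-pcpus-only and distribute-cpus-across-numa options. The option name below is hypothetical; the real name would be settled in the KEP/PR review.

```go
package main

import (
	"fmt"
	"log"

	kubeletconfig "k8s.io/kubelet/config/v1beta1"
	"sigs.k8s.io/yaml"
)

func main() {
	cfg := kubeletconfig.KubeletConfiguration{
		CPUManagerPolicy: "static",
		CPUManagerPolicyOptions: map[string]string{
			// Hypothetical option name for the proposal discussed here;
			// existing options like "full-pcpus-only" are set the same way.
			"spread-physical-cpus-preferred": "true",
		},
	}
	out, err := yaml.Marshal(cfg)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(string(out)) // YAML suitable for the kubelet config file
}
```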
E
Yeah, I can briefly talk about it. At the time we invented this policy, distribute-cpus-across-numa didn't exist in the community yet, and after we evaluated the new policy, we noticed there's a slight difference: distribute-cpus-across-numa is mainly focused on spreading evenly across the NUMA nodes, while our policy treats NUMA as a kind of secondary preference, a criterion for whether to cross or not. It first tries to spread across the physical CPUs in the same NUMA node, and only if there are no available physical CPUs does it try to cross NUMA. So the detailed strategy is a little bit different compared to distribute-cpus-across-numa; I think it's the 125. That's the difference.
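A rough Go sketch of one plausible reading of that ordering (a toy model, not the kubelet's allocator): untouched physical cores are preferred first, and within that preference the local NUMA node is preferred, so NUMA acts only as the secondary criterion.

```go
package main

import "fmt"

// cpu describes one logical CPU on a hypothetical 2-NUMA-node,
// 2-way SMT topology (not the real kubelet data model).
type cpu struct {
	id, core, numa int
	free           bool
}

// pick chooses n CPUs, preferring untouched physical cores, and among
// those preferring the requested NUMA node; hyperthread siblings are
// used only after fresh cores run out -- "NUMA as a secondary criterion".
func pick(cpus []cpu, n, preferredNUMA int) []int {
	usedCore := map[int]bool{}
	var picked []int
	for _, wantFreshCore := range []bool{true, false} {
		for _, wantLocalNUMA := range []bool{true, false} {
			for i, c := range cpus {
				if len(picked) == n {
					return picked
				}
				if !c.free ||
					(wantLocalNUMA != (c.numa == preferredNUMA)) ||
					(wantFreshCore == usedCore[c.core]) {
					continue
				}
				cpus[i].free = false
				usedCore[c.core] = true
				picked = append(picked, c.id)
			}
		}
	}
	return picked
}

func main() {
	var cpus []cpu
	for id := 0; id < 8; id++ {
		cpus = append(cpus, cpu{id: id, core: id % 4, numa: (id % 4) / 2, free: true})
	}
	// Cores 0,1 on NUMA 0 are taken first, then a fresh core across NUMA.
	fmt.Println(pick(cpus, 3, 0)) // [0 1 2]
}
```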
D
Well, here we actually have two different problems. Selecting spreading across physical cores versus multiple threads on the same core is part of the static policy, so you can look at an existing option, full-pcpus-only ("prefer full CPUs"), as an example. What you're looking for is exactly the opposite of what was done some time ago for it, but treating NUMA as a secondary thing.
D
Well, it's a bigger problem, because it's not only in the CPU manager; it's the whole logic of the topology manager.
E
Yeah, I think on the first thing, it is true that this is a totally different direction compared to the full-pcpus-only solution, because we do see noisy neighbor problems in our use case. That's the reason we'd like to distribute the threads across different physical cores. We do see performance gains for our use case, but I know this is not a common policy that could be used widely in other scenarios.
E
So that's why we make it optional. The second thing is the difference between our proposal and distribute-cpus-across-numa. As I said, there's a difference: what we'd like to control is more on the physical core side. We don't care whether it's on the same NUMA node or a different one. If we want more information about our proposed option, we can also try distribute-cpus-across-numa and do some performance comparisons later. From the allocation strategy I do see the difference, but on the performance side we don't have that data yet.
F
Thank you for the proposal. I would just like to add that, in general, adding this feature as a static policy option is feasible; I think it's something we can look into. But my comments (I'm Francesco) were more about: I would like to see if we can implement this option, this allocation strategy, as a composition, building on the existing building blocks we have, and then maybe fill the gaps or tune.
A
Yeah, I have another uber-question: how did you discover it? Did you find it in testing? You probably have a lot of knowledge, but I'm curious how our customers will discover similar things, like how people will recognize that one policy is advantageous compared to other policies.
C
Yeah, I can share that. So previously we only turned on the static policy for the testing, and we found that... if you can look at the appendix, maybe.
C
Right, so previously we found that when we turned on the default CPU static policy, it would assign the DB container to, for example, the same core with different hyperthreads. At that time, when the DB container had one thread, we only assigned it to one CPU.
C
So the performance is like that, and then we also needed two threads, and those two threads would be shared on the same physical core. So the performance doesn't improve linearly. We did more testing, and we found there could be a noisy neighbor issue. During that time, one of our team members...
C
They
consult
to
some
I,
think
the
maybe
Hardware
or
expertise
like,
and
then
they
found
like
it's
actually,
for
example,
if
the
DB
container
is
maybe
there
are
two
threads
if
they
bound
to
the
same
physical
core
with
two
different
hyper
threads.
So,
in
that
case,
there
will
be,
could
maybe
something
like
a
cash
contention
like
there
is
a
L1
cache.
It
can
be
it.
It
actually
shared
by
two
different
hyper
threads.
C
So
in
that
case,
like
the
performance
can
yeah
like
it's
worse
than
if
we
assign
these
two
difference
views
to
different
physical
cores,
so
we
do
more
testing
we
found
out.
C
Okay,
actually
that's
the
trend,
that's
the
issue,
so
we
we
actually
it's
actually
a
different
team,
they
conductive
experiments
and
they
actually
require
like
a
submit
some
requirements
to
our
kubernetes
team
and
seeing
maybe
this
CPU
policy
may
be
better
like
we
need
to
spread
all
the
DB
container
into
different
physical
cores,
because
at
that
time,
like
one
machine
typically
is
like
most
of
CPUs
are
idle,
so
we
only
have
maybe
a
1db
container
running
in
that
machine.
C
So
in
that
case
we
need
to
spread
the
DB
container
into
different
special
course.
So
we
have
better
performance
so
yeah,
that's
the
I
think
the
initial
maybe
request
for
this
feature
for
this
new
CPU
policy.
C
So
I'm
not
sure
about
the
details.
I
I
only
know
like
because
we
work
on
different
teams
and
they
actually
so
from
their
statement.
It's
actually
for
the,
for
example,
there
is
one
physical
machine,
so
most
of
the
physical
course
actually
are
Idle
No.
E
Yeah, I can give some comments. We definitely have some assumptions for this policy. From our online statistics, what we observe is that we don't see all the instances always busy: sometimes some instances are busy and some instances are fairly idle.
E
So
if
we,
if
a
DB
instance,
it
is
busy
and
the
average
thread
allocate
to
the
same
like
physical
card
that
have
the
issue,
but
it
does
have
a
chance
like
in
our
case,
I
would
say
a
lot
of
chances
like
this
instance
are
busy.
Okay,
you
use
the
like
the
cash
more
than
other
instances
that
also
share
the
CPUs.
So
that's
our
assumptions
because
we
are
doing
the
like
the
serverless
DB
instances.
We
don't
see
all
the
instances
are
busy
yeah.
A
I wonder if the same results can be achieved with the same static policy, but with twice the request.
D
Okay, so here are all the experiment results. As I see it, we are all assuming that you have a very active thread of the application, which consumes almost a whole physical core, and you have a separate hyperthread which might be less utilized by your workload, so it will not interfere much with the database.
C
Right
sure
yeah,
so
the
second
one
is
the
kind
of
for
maybe
related
to
the
first
one.
So
when
we
have
this,
you
know
the
new
static
policy,
so
we
also
need
to
or
meditate
or
to
work
with
the
In-Place
vpa,
and
we
found
that
actually
in
community.
This
is
this
is
not
supported,
but
we
actually
kind
of
need
those
features
to
work
together.
C
So
we
just
for
for
this
one,
like
we
proposed
some
solution
to
fix
this
issue
to
make
in
place
of
APA
and
CPU
Manager
work
together
correctly.
C
Yeah,
so
in
the
solution
part
I
think
we
have
several
fixes
for
for,
for
these
two
features
to
work
to
work
together.
A
An easier suggestion, just to make sure: you don't want to restart your workload, right? You want to update in place without a restart, right?
C
So actually, yeah, in our environment we have already turned on these two features, and we also conducted several experiments and some stress tests. So far there isn't any other issue we've identified, so I think...
A
Yeah, I just wanted to highlight the fact that when we designed the in-place VPA, we explicitly said that whether the workload will be restarted or not is not something you can enforce.
A
You
can
just
ask
to
restart
explicitly,
but
you
cannot
ask
not
to
restart
and
I,
see
more
and
more
scenarios
and
use
cases
when
there
is
assumption
that
vpa
in
place
vpa
is
assuming
no
restarts,
I
think
if
we
have
more
and
more
requests
like
that,
you
probably
need
to
deal
with
that
and
make
sure
that
our
API
is
allowing
this
assumption.
So
we
we
should
be
able
to
say
like
this
in
place
vpa
like
specific
API,
maybe
explicit
option
we
can
pass
or
something
like
that.
A
That
will
guarantee
no
restart,
because
I
mean,
as
I
said
right
now
is
designed.
Pp
does
not
guarantee
no
restarts.
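For reference, this is the per-resource knob in the current core/v1 API: resizePolicy on a container, with exactly the two values mentioned, where NotRequired is a preference rather than a no-restart guarantee. A minimal sketch:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// The two per-resource choices that exist today: restart the
	// container on resize, or prefer (not guarantee) a no-restart resize.
	c := corev1.Container{
		Name: "db",
		ResizePolicy: []corev1.ContainerResizePolicy{
			{ResourceName: corev1.ResourceCPU, RestartPolicy: corev1.NotRequired},
			{ResourceName: corev1.ResourceMemory, RestartPolicy: corev1.RestartContainer},
		},
	}
	fmt.Printf("%+v\n", c.ResizePolicy)
}
```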
D
The bigger problem is actually implicit QoS: a change might recalculate the pod into a different class. Right now, if I remember correctly, it's validated inside the VPA, so it doesn't allow you to do that, but the risk still exists.
E
So your concern on the QoS is, I think, that you assume a user could change any value of the resources?
D
My concern is that CPU manager policies work only with the Guaranteed QoS class, right, and the behavior for the Guaranteed QoS class has historically been maintained such that if we give some resources to a container, we are not changing them, or let's say they should not disappear. In terms of CPUs it means that we are allocating exclusive CPUs for the Guaranteed QoS class.
D
If
a
CPU
manager
is
active
and
some
of
our
applications,
let's
say
like
the
Telco
dpdk
based
applications,
they
rely
on
web
Behavior
so
way
like
team
themselves,
data
processing
threads
to
those
exclusively
allocated
course,
and
if
inside
policy
during
the
scaling,
you
start
to
add
or
remove
something
from
CPU
set,
you
you
will,
you
will
break
those
applications.
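To make the failure mode concrete, here is a minimal Go sketch of the pinning pattern such applications use (the CPU number and the DPDK context are illustrative). Once a thread is pinned like this, shrinking the container's cpuset out from under it breaks the application's assumption:

```go
package main

import (
	"log"
	"runtime"

	"golang.org/x/sys/unix"
)

// A DPDK-style app pins a worker thread to one exclusively allocated
// core. If the kubelet later shrinks the container's cpuset, this
// affinity silently points at a CPU the container may no longer own.
func main() {
	runtime.LockOSThread() // keep this goroutine on one OS thread

	var set unix.CPUSet
	set.Zero()
	set.Set(5) // core 5: assumed to be exclusively allocated to us

	// pid 0 means "current thread".
	if err := unix.SchedSetaffinity(0, &set); err != nil {
		log.Fatal(err)
	}
	log.Println("worker pinned to CPU 5")
	// ... busy-poll the NIC queue here ...
}
```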
E
Right
right,
I
think
there's
two
yeah
like
two
things.
The
first
thing
is
the
we
want
to
make
sure
the
Qs
class
won't
be
changed
and
I
think
that's
for
sure,
because
we
so
here
we
we
talk
about
like
CPU
manager
and
the
In-Place
BPA,
so
the
assumptions
that
people
always
choose
the
like
the
the
integer
values
and
they
won't
like
go
back
and
forth
between
like
some
different
Qs
classes.
That's
a
I.
Think
in
our
case
that's
an
assumption.
E
The
second
problem
is
I
think
how
application
adapts
to
the
new
changes
of
the
like
CPU
changes
or
memory
changes.
So
yeah
I
agree.
That's
a
like
a
problem
actually
to
application,
because
even
sometimes
we
do
allocate
more
resources
or
like
tier
two
applications.
The
applications
cannot
detect
the
results,
change
and
adapt
to
is
like
Behavior,
I.
Think
that's
more
on
on
the
application
side.
If
the
application
can
dynamically
detect
the
changes,
then
the
feature
Works.
Otherwise
it
doesn't
make
like
sense.
E
As
you
said,
some
like
dbdk
or
some
other
applications,
yeah
I
I,
would
say:
do
we
think
it's
better
to
like
enable
the
like
capabilities
from
the
resource
layer?
First
and
then
I
think
it's
the
applications,
the
responsibility
to
adapt
to
like
these
kind
of
new
changes,
at
least
that
we
can
unlock
some
of
the
applications
to
leverage
this
feature
and
that's
something
we
we
are
thinking
about.
A
In
general,
we
try
to
prevent
people
shooting
themselves
on
the
foot.
So
if
you,
if
we
can
make
it
clear
and
like
more
explicit,
what
should
what
will
happen?
We
will
do
that.
So
we
don't
fully
rely
on
application
to
behave
properly.
D
Like, imagine we enable this functionality in the CPU manager, and imagine we are preventing QoS change. The policy should then be able to track, when we're scaling the pod, or actually the container, up or down, what the original request was, so we never go lower than the original allocation. And the same goes for CPU cores: if we allocated some CPUs to a container at its start...
A
I
think,
in
the
end
of
the
day,
one
way
or
another
In-Place
update
needs
to
support
apology
manager
so
like
it
will
I
mean
some
solution
needs
to
happen
and
I'm
afraid
that
solution
may
be
I
mean
solutions
that
will
not
rely
on
application
to
behave
correctly
will
be
to
restart
everything
and,
in
some
cases,
even
whole
Port.
If
policy
was
better
Port
allocation,
so
you
need
to
re-admit
the
entire
report
and
restart
all
the
containers.
A
Which
is
far
from
ideal,
so
this
is
like
one
side
of
a
spectrum
is
to
restart
everything
like
the
entire
report
needs
to
be
rescheduled
in
the
same
node
and
another
side
of
spectrum
is
just
do
whatever
requested
and
hope
that
application
reacts
correctly.
So
I
think
we
need
to
find
a
balance
in
the
middle
and
I
haven't
read
the
solution
proposal
in
details,
but
I
think
what
Sasha
brings
as
a
concerns
is
a
very
well
concerns
and
we
need
to
decide
where
we
want
to
be
on
the
spectrum
between
like.
G
Did
we
ever
discuss
having
some
kind
of
a
hook
or
signal
to
the
application,
so
it
can
detect
and
like
the
cigarette
liquid
configuration,
something
like
that
could
work
or
then
or
a
custom
hook
or
script
that
the
application
can
add,
which
then
gets
called,
and
it
knows
something.
Change
I
need
to
reread
the
configuration
and
update
my
resources.
A
When we discussed the API for VPA, the API options were: restart the container explicitly, and prefer not to restart. There are two options right now, and one of the options we discussed is exactly that, sending some signal to the container. I mean, we discussed it briefly, but we never implemented anything like that. Do you think it would help in this situation?
D
It
depends
on
what
kind
of
applications
we're
talking
about
and
Depends?
Is
the
application
have
a
possibility
to
say
no
to
a
decision,
so
example
with
the
pdk
application?
You
are
saying
a
US
engine
signal
saying:
I
want
to
remove
CPU
core
number.
Five
and
applications
would
say
like
no,
no
I'm
already
like
using
it
very
actively
foreign.
D
Thank
you.
So
we
we
had
experience
with
in
NRI
we
we
have
a
down
downward
API
as
a
file
which
say
which
provides
a
list
of
CPU
cores
which
is
available
for
application
and
like
using
I-95
Watcher
on
on
this
file,
you
can
detect
what
something
is
changing,
but
it's
like
post
factum,
so
you
cannot
prevent
policy
to
change
your
allocations.
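A minimal sketch of that consumer side, assuming a hypothetical file path (the real path depends on how the NRI plugin or downward API exposes the list); it uses the common github.com/fsnotify/fsnotify inotify wrapper:

```go
package main

import (
	"log"
	"os"

	"github.com/fsnotify/fsnotify"
)

func main() {
	// Hypothetical path; the actual file name depends on how the
	// downward API / NRI plugin exposes the CPU list.
	const cpuListFile = "/etc/podinfo/cpuset"

	w, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()
	if err := w.Add(cpuListFile); err != nil {
		log.Fatal(err)
	}

	for ev := range w.Events {
		if ev.Op&fsnotify.Write != 0 {
			// Post factum, as noted above: the CPUs have already moved;
			// the application can only re-read and re-pin its threads.
			data, _ := os.ReadFile(cpuListFile)
			log.Printf("cpu set changed: %s", data)
		}
	}
}
```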
A
I
think
if
I
will
summarize
this
item,
we
definitely
need
to
work
on
that
and
some
work
will
be
needed
if
you
want
to
move
it
forward.
Please
do
I
think
this
may
be
a
good
starting,
a
starting
point,
but
make
sure
you
read
the
comments
and
try
to
address
as
much
attention
as
possible.
C
To
the
next
one
yeah,
this
is
the
third
one.
Is
the
In-Place
vpa
performance
Improvement,
so
we
found
that
I
think
for
right
now
for
equivalent
any
1.27.
The
problem
is
still
exists.
So
if
we
change
the
resources,
it
will
take
about
one
minute
to
finish
like
from
proposed
to
in
programs
and
no,
for
example,
we
need
to
scale
up
or
down
to
the
specific
containers,
so
we
actually
identify
the
issue
it's
like.
Actually,
there
are
for
for
this
performance
effects.
C
We
have
several
fixes
about
this
one,
so
it's
actually
I
think
this
is
the
first
one
and
can.
C
Oh
sorry,
I
I
forgot
to
paste
the
original
one
so
actually
in
the
Google
it
single
pod,
I
think
the
thing
called
is
like
a
one
Loop
to
reconcile
the
skill
resource
changes
right
so
at
the
I
think
at
the
end.
Actually
we
need
to
get
the
results
from
the
container
CRI
to
know.
C
Actually,
the
resources
has
been
like
actually,
for
example,
allocated
or
scale
down
up,
it's
actually
already
complete,
but
at
that
time
actually
in
right
now
in
Singapore
there
is
no
detection
of
that
like
we,
we
haven't,
got
any
part
status
from
CRI.
So
that's
why
there
is
like
a
what
you
need
to
one
minute
to
complete
the
scale
up
and
down.
C
So
we
actually
fixed
that,
like
we
directly
get
these
Port
status
from
CRI
and
then
we
know
actually
in
this
one
single
powder,
the
VPN
has
complete,
and
maybe
it
will
take
less
than
one
one
second.
So
after
we
fix
that,
we
found
another
issues
actually
in
here
with
like
a
fixed
one,
fixed
two,
so
it's
all
related
to
the
part
of
the
status
from
API
server
and
also
the
CRI
being
inconsistent
in
the
code.
So
we
fix
those
issues.
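A rough sketch of the kind of CRI query involved: asking the runtime directly for a container's currently applied resources, rather than waiting for the next sync iteration to observe them. The socket path and container ID are placeholders, and this is a standalone illustration of the idea, not the actual kubelet patch:

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Socket path and container ID are illustrative placeholders.
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := runtimeapi.NewRuntimeServiceClient(conn)
	resp, err := client.ContainerStatus(context.Background(),
		&runtimeapi.ContainerStatusRequest{ContainerId: "<container-id>"})
	if err != nil {
		log.Fatal(err)
	}
	// ContainerStatus.Resources carries the resources the runtime has
	// actually applied, which is what the resize reconcile needs to
	// observe before it can mark the resize complete.
	if r := resp.GetStatus().GetResources().GetLinux(); r != nil {
		log.Printf("cpuset=%q cpuQuota=%d memLimit=%d",
			r.CpusetCpus, r.CpuQuota, r.MemoryLimitInBytes)
	}
}
```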
A
Yeah
I,
don't
know
about
specifics
here,
I
think
we
will
have
David,
maybe
able
to
comment
right
away,
but
I
I,
don't
bet
on
that
yeah.
Please
send
a
hot
fix
like
fixes
PR,
and
make
sure
that
there
is
a
description
idea.
The
test
for
that
I
have
a
big,
bigger
piece
of
feedback
from
Clayton
that
most
of
the
quotations
written
may
need
to
be
Rewritten,
because
there
are
many
principles
for
races.
A
So
there
are
suggestions
how
to
refactor
things
if
it's
very
targeted
updates
well
features
still
in
Alpha
and
it
doesn't
affect
any
any
other
logic,
definitely
like.
Let's,
let's
make
it
better,
so
people
can
test
how
vpa
Works
in
in
ideal
scenario
rather
than
like
less
ideal.
Thank
you.
Yeah.
C
Okay, sure, yeah. So we can go to the last one. This one is also related to in-place VPA. We found that sometimes it will be stuck in progress, because sometimes we do have custom resources under this resources-allocated keyword. I think maybe you already know that the pod has this in its spec and also in its status, right? For in-place VPA we only focus on CPU and memory scale up and down.
C
So
if
there
is
another,
for
example,
another
keyword
here,
for
example,
maybe
IP
or
GPU
or
network
blah
blah
some
other
devices
in
this
resource
allocated,
then
it
will
for
this
one.
The
vpu
will
be
stuck
in
progress
because
you
can
see
the
code
here
so
it
it
will
compare
actually
the
the
resource,
I
think
from
the
spec
and
also
the
status.
It
will
never
like
this.
This
resource
will
never
be
equal
because
in
place,
vpa
will
only
have
CPU
and
the
memory,
but
for
the
spec
it's
actually
another
customer
resources.
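A small Go sketch of the shape of the fix (illustrative, not the actual patch): compare only the resources that in-place resize manages, so an extended resource in the spec cannot keep the comparison unequal forever.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// resizeDone compares only the resources in-place resize actually
// manages; extended resources such as a GPU are deliberately ignored,
// which avoids the "never equal, stuck in progress" comparison
// described above.
func resizeDone(desired, allocated corev1.ResourceList) bool {
	for _, name := range []corev1.ResourceName{corev1.ResourceCPU, corev1.ResourceMemory} {
		d, a := desired[name], allocated[name]
		if d.Cmp(a) != 0 {
			return false
		}
	}
	return true
}

func main() {
	desired := corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("2"),
		corev1.ResourceMemory: resource.MustParse("4Gi"),
		"nvidia.com/gpu":      resource.MustParse("1"), // extended resource
	}
	allocated := corev1.ResourceList{
		corev1.ResourceCPU:    resource.MustParse("2"),
		corev1.ResourceMemory: resource.MustParse("4Gi"),
	}
	// Naive full-map equality would report "in progress" forever here.
	fmt.Println(resizeDone(desired, allocated)) // true
}
```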
D
It actually triggers the same question about restart versus no restart. For everything we are not restarting, we need to have a comparison with just its native resources; for everything else, if we have changes, we need to force restarting the containers to trigger the reallocation of those resources, like device plugins or anything else, or extended resources.
A
Yeah, I'm so excited that you're trying it out and giving feedback. This is a big feature; everybody wants to start using it, but I'm a little cautious about it. There are so many issues: whenever somebody tried it, they found something to report. There are over 20 bugs reported in the Kubernetes issues list just for this feature.
E
Do you have an actual issue list? Do we have a tag for this feature, to easily find those 20 problems? Then we can probably check and see whether we can match our fixes with those issues, so we can reference those issues when we file the PRs.
A
Would
be
to
pause
this
list
on
enhancement
issue,
so
we
have
some
summary
of
what
will
be
happening
and
it
will
tackle
most
of
the
problem
in
129.
It's
a
it
will
be
a
great
Improvement.
A
Okay, let's go back to the agenda. Is there anything else?