From YouTube: Kubernetes SIG Node 20221108
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
A
This meeting is being recorded. Good morning, everyone. Today is November 8th, 2022, and this is our regular weekly SIG Node meeting. Welcome back, everyone. Today is also the 1.26 code freeze date, right? So I know everyone's busy with the code freeze and all those kinds of things. We also have a lot of proposals; some of them have already been processed, but they're still kept on the agenda as information updates to the community here. Right — so, David and Eric, do you want to start?
B
Sure, I can start. This item actually is not super critical for code freeze, so I was thinking maybe I can actually move it.
A
And also, some items are blocked by the release. Actually, I believe people are driving the process — this morning I also started to process all those kinds of things. So I think this is maybe okay for us too. Okay.
B
In our case, okay, cool — I just wanted to make sure that people had time with it and could bring anything up. So this is something that I've been working on with Eric. The backstory here: I just wanted to get folks' thoughts about this and see if this is a problem that other people face, whether it's something a KEP is needed for, or maybe this is kind of a smaller thing.
B
The backstory is that we have a script that periodically checks the health of the CRI on our nodes. Sometimes we have the case where the container runtime goes down while the kubelet can still be up, or it can go one way or another. So we have a script that periodically contacts the container runtime — the CRI — directly via crictl, and just does a "crictl pods" list; that's the health check that we have.
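
A minimal sketch (in Go, standing in for what is likely a shell script) of the style of check being described: shell out to crictl on a timer and treat a failed "crictl pods" as an unhealthy runtime. The one-minute interval comes from the discussion below; the timeout and logging are assumptions.

    package main

    import (
    	"context"
    	"log"
    	"os/exec"
    	"time"
    )

    // criHealthy returns true if "crictl pods" succeeds before the deadline.
    func criHealthy() bool {
    	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    	defer cancel()
    	// "crictl pods" lists pod sandboxes; it fails if the runtime is down.
    	return exec.CommandContext(ctx, "crictl", "pods").Run() == nil
    }

    func main() {
    	for range time.Tick(time.Minute) {
    		if !criHealthy() {
    			log.Println("container runtime health check failed")
    		}
    	}
    }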
B
So it's kind of like a health check that we have, and this requires us to run crictl every minute, right? The problem with this, we realized, is that crictl is actually a pretty heavy binary. We were investigating some cases in production where we had disk throttling and low disk IOPS, and, surprisingly enough, the loading of the crictl binary —
B
Just
because
it's
the
kind
of
a
big
chunky,
50
meg,
go
binary
actually
resulted
in
some
like
additional
disk
IHOP
usage,
which
is
kind
of
funny
that
the
health
check
itself
is
causing
kind
of
more
issues.
So
we
were
discussing
some
ways
to
resolve
this
and
we
were
thinking
you
know,
since
the
kublet
is
already
talking
to
the
container
runtime.
Maybe
it
makes
sense
from
the
Kubla
to
kind
of
have
a
health
help
the
endpoint
that
directly
checks
if
the
container
run
times
up.
B
So
this
way,
since
the
kublet's
already
running,
we
don't
need
to
load
another
binary
to
talk
to
container
runtime.
That's
kind
of
The
Proposal
here.
So
the
proposal
is
to
add
a
new
health
check
to
Kublai
that
directly
just
checks
if
the
container
runtime's
up
so
that
way,
we
don't
need
to
load
in
another
binary.
To
do
that.
So
I
was
wondering
if
other
folks
think
this
is
useful-
or
this
is
another
page
probably
face
or
how
you
health
check
the
CRI
in
general
yeah
and
to
get
some
parts.
A
Yeah, I also don't have any concerns about that, but I have to admit that until I read the code, I found the description of the implementation actually confusing — about the feature you're talking about.
A
Actually, I totally don't have any concern, and also — I'm not sure, but yeah, I agree with this — I wondered why we need another flag there. But I can see that maybe they want to use a flag to control it, something like version-level control here, because the older versions don't have that healthz endpoint.
B
Yeah, exactly. And the reason we were discussing a flag for that is because some folks like the existing health check, which didn't check that the container runtime was up — it just checks a few bits. So maybe some folks, you know, for their health checks, don't care about checking the container runtime; someone raised that as a concern. So that's kind of the issue: we need some way to differentiate whether folks care about that — or maybe the other question is whether it should just be built into the kubelet health check.
E
Oh, David, one question: how is this different from the generic PLEG? Because the generic PLEG already calls the CRI to at least list all pods, right? Frequently.
B
The difference is basically that this is just checking — I need to check exactly what call it's making — but it's just checking if the container runtime is up, and then it's marking the healthz endpoint directly. So the difference is that the kubelet actually exposes a healthz endpoint, right, that you can curl and get a response from, so that can be used by a script or something else.
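
A minimal sketch of consuming that endpoint from a script-like Go program. Port 10248 is the kubelet's default healthz port; whether the response would also reflect container runtime health is exactly what is being proposed here, so treat that part as an assumption.

    package main

    import (
    	"fmt"
    	"net/http"
    	"time"
    )

    // kubeletHealthy curls the kubelet's local healthz endpoint and
    // reports whether it answered 200 OK.
    func kubeletHealthy() (bool, error) {
    	client := &http.Client{Timeout: 5 * time.Second}
    	resp, err := client.Get("http://127.0.0.1:10248/healthz")
    	if err != nil {
    		return false, err
    	}
    	defer resp.Body.Close()
    	return resp.StatusCode == http.StatusOK, nil
    }

    func main() {
    	ok, err := kubeletHealthy()
    	fmt.Println("kubelet healthy:", ok, err)
    }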
B
Well, I mean, the main goal here is just to check that the container runtime is up. So the idea is, you know, if the CRI status call is responding, it's up.
G
Yeah, one question about this: I remember it's NPD that performs the health check, and we started to compare them. On the NPD side there is also code written to perform the health check, and it's just that that code today invokes crictl. But actually, if we just changed that part to CRI client code, then we wouldn't need to load the binary either, right? Why is that not considered?
G
Oh, okay — so, yeah, I think we use that in some other products. Basically, we use NPD — there's a health check plugin there, and it loads that health check plugin to check the kubelet's and containerd's healthiness and restart them. It's basically the same behavior as the script, but it was built before, for Kubernetes. And we use NPD and an NPD plugin there, and today we are also invoking crictl periodically from that health check code.
G
We could also change that to a CRI client instead, if we think loading another binary is an overhead.
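
A sketch of what "a CRI client instead" could look like: dial the CRI socket and call the Status RPC from k8s.io/cri-api rather than exec'ing crictl. The containerd socket path here is an assumption; the RuntimeReady and NetworkReady conditions it prints are the same ones mentioned later in the discussion.

    package main

    import (
    	"context"
    	"fmt"
    	"time"

    	"google.golang.org/grpc"
    	"google.golang.org/grpc/credentials/insecure"
    	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
    )

    func main() {
    	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
    	defer cancel()

    	// Dial the CRI socket directly (containerd's default path assumed).
    	conn, err := grpc.DialContext(ctx, "unix:///run/containerd/containerd.sock",
    		grpc.WithTransportCredentials(insecure.NewCredentials()))
    	if err != nil {
    		panic(err)
    	}
    	defer conn.Close()

    	client := runtimeapi.NewRuntimeServiceClient(conn)
    	resp, err := client.Status(ctx, &runtimeapi.StatusRequest{})
    	if err != nil {
    		panic(err) // runtime unreachable: treat as unhealthy
    	}
    	// Conditions include RuntimeReady and NetworkReady.
    	for _, cond := range resp.GetStatus().GetConditions() {
    		fmt.Printf("%s=%v\n", cond.GetType(), cond.GetStatus())
    	}
    }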
G
It is — it's a big binary, yeah.
A
Yeah, yeah — and this is why, when I first saw this agenda, I pinged you. Because back then we built the plugin, and later NPD took over, so we wouldn't need to have those scripts — but it sounds like we never finished that work. That's the first thing, and that's why, when there was no KEP, I said: can you take a look? But on the other hand, quickly after pinging you offline and asking you to take a look, I quickly reminded myself —
A
Actually, maybe we do — I think this feature still has meaning, because not everyone deploys node-problem-detector, right? Even if people don't deploy node-problem-detector and they only have the container runtime plus Kubernetes, they still have a way to check. So that's why I also told myself: oh, maybe I need to think about it from the other direction.
A
I'm just sharing the context here, because some of us do have NPD, with the plugin already built, but it looks like we are shifting the standard config from the previous script-based check of kubelet and containerd healthiness over to NPD. On the other hand, containerd itself — or CRI-O itself — having that healthz endpoint may help others with this job, because they may not deploy NPD; otherwise they'd have the trouble of deploying NPD. So, just that.
G
Yeah, yeah — but another question I have related to this is: another option is that we can just let the container runtime expose its health endpoint itself, right? So that we can get rid of the dependency on the kubelet in between. I just wonder why those options are not considered instead.
B
That's a good point. I guess the question is: is it valuable to have a client connect remotely to the CRI and check whether the client was able to make that connection — because that's kind of what crictl is doing — versus checking whether the container runtime itself is up? I don't know if that difference matters, right? Like, is there a situation where the container runtime can be up but can't be reached by clients? I don't know; that's sort of the question.
F
I mean, the same thing can be said for some of our CNI services — I know there's a CNI check. And I noticed in here that it looks like you're going to check for runtime ready, image service ready, and network ready. So we should probably dig into this a little bit deeper, I think, Dave.
B
Okay, yeah — maybe this makes sense on the runtime directly; that would also make sense, I don't know. Currently, all the health checks are built around connecting to the CRI and then checking whether that connection is successful. But if the socket's there and the container runtime says it's ready, maybe that's good enough.
A
For this one, basically, it only solves one problem, right? If I publish this endpoint, it's not really saying containerd or CRI-O is healthy; it's basically saying it is alive — it's running, the process is up — and it's in a state where it can publish that endpoint. That's all. So I'm not sure: are we going to, eventually, once we have this healthz —
B
Okay, okay, I think — let me go back. I'm working with someone else on this, so let me gather some feedback. I think it's useful to maybe consider something on the container runtime side too. Let me bring this back into some internal discussion and then come back later. Thanks, all, for the discussion.
I
This topic is about pulling images at the same time. We already have options, like the registry QPS and burst, to set a limit, but I recently got an issue, and the issue says this doesn't work as expected — and I have a PR to fix it. But after really thinking about this feature, the feature seems not to make sense
I
in most scenarios, and I'd like to ask for some feedback from SIG Node. Also, I think we should have a new flag, like a node-level limit or something else, because the current logic is just the QPS and burst, and that only limits the rate at which image pulls start — it's not a node-level limit on parallel pulls, as Raven said. I think — I'm not sure if —
E
Yeah, so, Paco — if I remember correctly, in the issue you mentioned that somehow the users were under the impression that the limit is actually on the number of in-flight parallel image pulls. But that is not the case, right? The burst actually only limits the number of pulls that the kubelet sends to the runtime.
E
It doesn't care how many pulls are in flight already. So that's why we want to change it: instead of doing the QPS burst, we want to do a node-level limit on the number of in-flight pulls.
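
A small sketch of the distinction being drawn here: a QPS/burst limiter (like golang.org/x/time/rate) only gates how fast pulls start, while a semaphore caps how many are in flight at once. The numbers and names are illustrative, not the actual kubelet flags or implementation.

    package main

    import (
    	"context"
    	"fmt"
    	"sync"
    	"time"

    	"golang.org/x/time/rate"
    )

    func main() {
    	ctx := context.Background()

    	// QPS/burst style: at most one pull may *start* per second,
    	// regardless of how many long-running pulls are still going.
    	qps := rate.NewLimiter(rate.Limit(1), 1)

    	// Node-level style: at most 3 pulls *in flight* at any moment.
    	inFlight := make(chan struct{}, 3)

    	var wg sync.WaitGroup
    	for i := 0; i < 10; i++ {
    		wg.Add(1)
    		go func(n int) {
    			defer wg.Done()
    			_ = qps.Wait(ctx)      // gates start rate only
    			inFlight <- struct{}{} // gates concurrency
    			defer func() { <-inFlight }()
    			fmt.Println("pulling image", n)
    			time.Sleep(5 * time.Second) // stand-in for a multi-minute pull
    		}(i)
    	}
    	wg.Wait()
    }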
G
Oh — I heard an echo. I just wanted to add one point here, because we once did some internal calculation with this, trying to use this QPS to actually limit the concurrency. It's basically impossible, because an image pull takes that long — it's a long, long-running request, not just some one-time thing.
G
Just limiting the QPS won't work, because if an image pull takes two minutes or five minutes, just limiting how many pulls you can start every second doesn't help. We even did this kind of very complex calculation to try to see whether we could use this to achieve some kind of concurrency limit, and it was just impossible. So in the end we didn't use this at all. So, yeah.
A
Yeah, that's — that's complicated. We always wanted to tune the pull workers, right? But I also understand that kind of pushes it onto the user or developer. Oh yeah, okay — in-place vertical pod update can help with this one, but we thought people desperately wanted better things, and we could get some user exposure. I just always worry about it. This is my old experience from before we had that in-place vertical dynamic update.
A
Basically, it just causes more trouble for the customer — which in my case is the developer, right? The admin is also my user, and the admins are maybe happy, because basically I shifted that problem to every application owner. So when they deploy their services, they don't know how to configure their workload; it makes that even more challenging. So more people stopped using Guaranteed and instead used Burstable, which in turn made the admin's work much harder.
A
You can see the cascading problem, circling around like I say. We shifted the problem — because handling it from the admin mechanism is easier — and charged it to each application owner, because you charge it to the cgroup. Then their jobs would maybe get killed, all those kinds of things, throttled, all those kinds of things, so they had to configure their workload better. That was before we had this dynamic in-place update of pod resources.
A
So in the end they all changed to using something like Burstable, and once everybody is using Burstable, then you have the preemption and all that kind of logic. And, you know, the admins actually also have a problem, because you also have the error budget, right, for the services you host, so you then need to do that resource capacity planning better. I just want to share this with people here, since many of us here are developers, or maybe admins. So that's the potential problem, yeah.
C
Yeah, this sounds like another use case for using eBPF to detect that, okay, an image pull is starting, and immediately give whatever capacity you can to the pod, as long as they are within the allocated budget — the budgeting for it — and that will make things better. It's much more responsive. These are the deterministic events for which we can really use that technology.
A
But
we
need
here's
the
one
problem,
it's
right,
so
kubernetes
we
did
a
terrible
job
on
the
network
eye
opener
is
management.
A
A terrible job on network I/O — I mean, I'd actually say we did nothing, basically. And disk I/O we also did really poorly, but hopefully we can help do something at the disk I/O level. For network I/O, though, so far I don't see a good solution for Kubernetes. Basically, it's up to the Kubernetes vendors to think about how they are going to integrate with their backbone network offering and do this integration. So, just that.
C
I actually did a talk at the Open Source Summit in Austin, Texas, about bringing QoS to Kubernetes pods. I don't know — Tim also mentioned he was interested in it, but we don't have a clear set of requirements. Essentially, this was more from the point of view that different pods have different network needs. If you have a pod that's doing image processing, like file backup, versus a pod that's processing, you know, payment traffic, they both compete for the same network bandwidth.
I
Thanks for your feedback. I have done something related: I have opened a PR in containerd to add metrics for image-pull-related things — for example, how many image pulls are in progress, in one metric, and the duration of the image pulls, in another metric. If we have those metrics, I think users can at least know how to set the QPS or burst, using such a calculation.
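
A sketch of the two metrics described, using client_golang; the metric names here are hypothetical placeholders, not the names used in the actual containerd PR.

    package main

    import (
    	"net/http"
    	"time"

    	"github.com/prometheus/client_golang/prometheus"
    	"github.com/prometheus/client_golang/prometheus/promhttp"
    )

    var (
    	inProgress = prometheus.NewGauge(prometheus.GaugeOpts{
    		Name: "image_pulls_in_progress", // hypothetical name
    		Help: "Number of image pulls currently in flight.",
    	})
    	pullDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
    		Name:    "image_pull_duration_seconds", // hypothetical name
    		Help:    "Time taken to pull an image.",
    		Buckets: prometheus.ExponentialBuckets(1, 2, 10),
    	})
    )

    // pullImage wraps a real pull with the two instruments.
    func pullImage(img string) {
    	inProgress.Inc()
    	defer inProgress.Dec()
    	start := time.Now()
    	defer func() { pullDuration.Observe(time.Since(start).Seconds()) }()
    	_ = img // ... the actual pull would happen here ...
    }

    func main() {
    	prometheus.MustRegister(inProgress, pullDuration)
    	http.Handle("/metrics", promhttp.Handler())
    	_ = http.ListenAndServe(":9090", nil)
    }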
F
I think it's also important to note here that we had a lot of discussions at the contributor summit — you know, the summit discussions — around the concept of adding to the pod specification some image service information: kind of declarative information that would help explain to the container runtime the types of caching that need to happen, how fast this particular container image needs to be pulled, whether it
F
Should
be
kept
on,
you
know
on
store
in
the
node,
as
well
as
whether
or
not
you're
you
should
use
lazy.
You
know
image
pooling
services
for
that.
You
know,
for
this
particular
container.
F
Different
runtime
handlers
are
going
to
want
to
pull
the
images
into
their
own
VM,
for
example,
in
combination
containers,
so
I
think
it
it
sort
of
points
out
that
we
need.
We
need
some
more
declarative
information
in
the
plot
specifications
to
hand
down
to
who's
handling
these.
These
image
pools
in
this
additional
networks
and
and
pretty
soon
multi-networks
on
each
of
the
networking
devices
that
we
have
available
to
us.
So
we
we
need.
We
need
to
somehow
manage
this
stuff.
F
Three
and
like
vinay
was
saying
we
need
some
quality
of
service
information
in
those
pod
specs,
so
we
can
go
from
an
imperative
right,
API
to
more
declarative
around
this
space,
and
then
we
can
extend
plugins
that
some
of
the
intelligence
we're
working
on
you
know
to
make
some
decisions
on
how
to
manage
those
images.
You
know
in
a
in
a
common
way
across
the
entire
node.
C
Right
I
think
multinet
dripping
brings
in
a
very
interesting
that
effort
has
started
out.
I
was
in
one
of
those
contributor
submit
talks,
they're
talking
I,
think
the
use
case
of
using
SRI
makes
the
high-speed
Nicks
for
getting
bulk
data
that
might
and
that
separates
your
Port
traffic.
It
could.
Overall,
there
are,
there
are
a
few
different
design
patterns
that
we
could
look
at
here.
Qos
included,
hopefully
with
this
whole
involvement
to
you
know
better
Network
management
for
kubernetes.
A
I
have
to
check
the
time.
I
know
there.
People
have
so
so
Paco
I.
So
thanks
for
the
after
this
one,
so
I
basically
I
think
the
can.
We
just
looks
like
the
everyone
agree
about
the
limited,
the
number
of
the
parallel
poor,
at
least
at
this
moment
before
we
accept
and
I
I,
think
everyone
agree
about
the
net
know
the
level
of
the
limit
right
concurrent,
the
pool
at
this
moment.
A
So
can
you
summarize
what
we
discussed
here
and,
of
course,
your
current
appear
together
with
the
river
and
then
we
we
can
working
on
new?
What?
What
do
you
propose
here
and
continue
with
that?
New
okay.
A
Thanks — and thanks to those who will also volunteer to help. And the next one — I know we had you earlier.
C
I think that pretty much covers it. The issue that came up was after rebasing: a bunch of changes came in in the past day or so, and there were some code changes I needed to update with the rebase. The big change is essentially adding the new CI job. I still haven't had the chance to look into why that's failing.
C
David
Porter
had
helped
me
with
with
it,
and
we
found
that
some
of
the
config
parameters
weren't
right
and
then
that
it
went
past
that
now
the
API
server
is
not
coming
up.
So
I
need
to
look
into
it.
Hopefully,
I'll
find
time
tonight
or
tomorrow
to
look
into
that.
Yeah.
A
Next
one
I
think
the
sorry
about
the
Ian.
Sorry
I
made
honest.
Your
name
is
wrong,
but
can
you
talk
about
the
CPU
site.
H
Sure, yeah — the name is pronounced "Ian", yeah. I was talking with Tim and some other people about moving a CPU set library, which is used primarily within the CPU manager, into kubernetes utilities, and it's evolved into quite a little project. But I just wanted to post the PR in case anybody had some interest in, or opinion on, the deletion of the Int64 variants, or, you know, deleting the NoSort variants. I was originally planning to delete the NoSort variants of the API.
H
But
then
Tim
suggested
I
take
a
look
at
the
set
API,
which
has
a
list
and
unsorted
list
methods
and
to
mimic
that
for
this
Library.
So
that's
kind
of
the
direction
I
went
with,
but
just
wanted
to
kind
of
from
the
Spy
people
and
see
if
anybody
had
any
ideas
or
anything.
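
A toy illustration of the naming pattern being mimicked: the apimachinery-style sets expose List() (sorted) alongside UnsortedList() (no sorting cost), instead of keeping a separate "NoSort" variant. This stand-in IntSet is illustrative only, not the cpuset library's actual API.

    package main

    import (
    	"fmt"
    	"sort"
    )

    type IntSet map[int]struct{}

    // List returns the elements in sorted order.
    func (s IntSet) List() []int {
    	out := s.UnsortedList()
    	sort.Ints(out)
    	return out
    }

    // UnsortedList returns the elements in arbitrary map order,
    // skipping the sort when callers don't need it.
    func (s IntSet) UnsortedList() []int {
    	out := make([]int, 0, len(s))
    	for v := range s {
    		out = append(out, v)
    	}
    	return out
    }

    func main() {
    	s := IntSet{3: {}, 1: {}, 2: {}}
    	fmt.Println(s.List()) // [1 2 3]
    }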
D
I think the move itself made sense to me. I can review the PR and see if I have any thoughts, yeah.
A
Okay,
thanks,
that's
all
because
I
think
the
rest
staff
will
already
covered
and
any
other
topic
of
people
want
to
discuss.