From YouTube: Kubernetes SIG Node 20220621
Description
SIG Node weekly meeting. Agenda and notes: https://docs.google.com/document/d/1Ne57gvidMEWXR70OxxnRkYquAoMpt56o75oZtg-OeBg/edit#heading=h.adoto8roitwq
GMT20220621-170431_Recording_640x360
A
Good morning, everyone. Today is June 21st, and thanks for joining. This week we have like nine topics, so I know everyone's busy. So let's go over our agenda.
A
I saw the KEP and I did review that one. Kevin, do you want to talk about the socket alignment CPU Manager option?
A
Okay, so I just reviewed it before this meeting, and I will continue and finish the rest of the review, but I think the review that's already there is thorough, and I agree with it. So it looks like we are going to approve it after this meeting and move forward. This is an alpha feature, an alpha option for existing features, so yeah, sure, it makes sense. Do you want to talk about the next one?
B
So I put together a list of a few of them. I think the first, second, and fourth ones are like minor updates to the KEPs that need approval, like the milestone or the testing section. Derek is out this week, so we're going to ask you, if you can, to take a quick look and add an approve on those.
A
I think before this meeting I had already approved two of them. For the first one, the pod has network condition one, there is still a review going on, so please ping me once it's ready, since the due date for that is Friday.
C
Dynamic resource allocation is not in that list. That's the PR for KEP 3063.
C
So Tim really encouraged everyone, me as the author and Aldo as the core reviewer, to consider merging it as provisional and then taking it from there. We are fairly close to having something that we all agree on for the API and the scheduler, but we need further review, also from someone from SIG Node, to move it to implementable.
B
So I know Derek was reviewing it. I'm not sure if he got a chance to comment, but he's out this week. I can make a pass, but I'm not sure if I'll be able to approve it entirely from SIG Node's perspective.
C
Yeah, it's mostly about the new pod API and the scheduler, and then at the very end the kubelet consumes some of the new information. We intentionally simplified it compared to what Derek reviewed in 1.24. So now the kubelet, for example, doesn't need any additional permissions; it doesn't need to modify any additional objects as part of this proposal anymore, which should make it simpler for SIG Node to review and approve, because it's really simple in that regard.
A
We have a bunch of people, and Kevin also can approve, but the problem is that the goal is not just approval, right? We also want to make sure that it meets the requirements.
A
Thanks for volunteering to take another review. Unfortunately, Derek has the most context here, because we assigned this one to him to review, so I don't have it all. But Tim also talked to me. I understand Tim approved, but he also raised his concerns to me, and he hopes our SIG can hold the bar here. So that's why I want to make sure, as you said, that we hold the bar and make sure dynamic resource allocation is the feature we want.
A
I mean, the SIG has wanted to move this forward for the last couple of years, right, and every time it has been pushed back, but this time it looks more promising, so we do want to. As I mentioned, Tim had a lot of concerns before his approval, so he talked to me and I gave him my reasoning. I want to move this forward, but that doesn't mean we are going to sacrifice everything, right: the reliability and the maintainability, just to move it forward.
A
I
just
want
to
make
sure
we.
We
understand
that
there's
something
because
we
signal
the
community.
We
have
weeping
suffer
or
maybe
in
a
criticize
the
real
happiness
here
and
the
people
want
a
certain
feature
and
desperately,
but
the
but
the
in
the
reality.
We
have
also
because
those
once
a
while
we
because
the
velocity
we
treat
off
the
relapenator,
so
that's
also
better
us
a
lot
yeah
and
yeah
Patrick.
We
understand
this.
A
We
want
to
move
forward,
but
unfortunately,
Derek
have
the
full
entire
context
from
the
beginning,
and
it's
not
here
yeah.
We
also
have
like
the
19
different
couple
in
this
unique
cycle.
Lastly,
and
I
just
want
to
say
that,
and
and
and
there's
the
review
or
pandemic's
problem,
and
also
the
approve
of
enemies,
problem
I.
C
I checked with my colleagues. For example, Sasha agrees that class-based resources, which I think you discussed last week and which sounds similar based on the title, is actually doing something else, and that it is less important than dynamic resource allocation. So if you need to choose, from our side you are fine to pick dynamic resource allocation and postpone the other one, if you don't feel comfortable doing both.
B
Yeah, I think what we're saying is that we will try, but Derek has the most context, and we have a lot on our plate, and we want to make sure that we get it right. So we'll try, but it may slip.
G
Yeah, I did a quick update for the pod status conditions KEP, which was I guess the third entry. The quick update on that is that Derek did a full review on Thursday, and he also spoke to me and suggested a specific name that he thought better aligned with what the KEP is trying to achieve, which is PodHasNetwork, instead of surfacing sandbox-related concepts.
G
So the KEP has been updated to basically integrate all that feedback, but I think Derek didn't get a chance to look before he left. Is there any way we can get another review and try to kind of get it approved for this cycle?
H
Oh yeah, so I got a chance to go back over the recording from the kickoff we did for the reliability work, and I did a quick summary of basically where we ended up, which is to say we generally agreed that we have stuff to do. Obviously, part of that was trying to figure out what we mean when we say things are unreliable.
H
So as part of that, we also found a few areas where we want to improve things. Part of that is improving the contract.
H
Testing around CRI: I've started opening issues for that, so both testing CRI implementations, but also testing how the kubelet will respond to failures in the CRI or in gRPC. And then also, we want to use tests to document what exists today, to avoid shipping regressions without knowing about it and to actually be aware when we are shipping behavioral changes, because today it's fairly easy to make a change to the kubelet that will not break any tests but will change behavior, and that's quite scary, especially when it has spiraling issues with scheduler interactions and other stuff.
H
And then there's an extra area, which is clarifying where we're actually just lacking features for things that can cause failures. Some of that is like when disk accounting is really slow, or we run out of memory and then things spiral out of control. We should be better at documenting that proactively, rather than just having, you know, a bunch of unspecified things that can go wrong.
B
See, the I/O one is bad. Usually it results in timeouts between the kubelet and the container runtime when they start getting starved, so I think a good way to handle that one is maybe having some metrics, like in the node exporter or something, that can catch it fast enough.
B
If
there
are
like
folks
across
across
companies,
I
want
to
collaborate
on
something
that
we
can
do
that
it
will
help
because
I
know
that's
been
tricky
and
we
did
a
bunch
of
workarounds
in
cryo
to
better
handle
that,
but
basically
like
it
depends
on
how
you
can
configure
your
node
right.
Okay,
if
you
configure
it
with
cryo.
These
are
the
kind
of
Errors
you
expect
if
you've
configure
it,
but
containerdy
you'll
see
different
errors
and
so
on.
So
maybe
a
better
alert
thing
may
be
useful.
There.
D
On the GKE side, we've had a lot of issues also with just throttling and disk-related stuff, and yeah, it would be great to work more on those issues. I think there's also work, for example in NPD (node-problem-detector), that we can do to detect it and maybe surface it as a node condition, so users are clear that, you know, maybe the node should not get new pods scheduled on it.
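A minimal sketch, assuming the Kubernetes Go API types, of the kind of node condition such a detector could report; the "DiskIOPressure" type and the reason string are illustrative placeholders, not existing upstream or node-problem-detector conditions:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical condition an NPD-style detector could set on the Node object.
	// Reacting to it (for example keeping new pods off the node) would still need
	// a separate mechanism such as a taint; the condition only surfaces the state.
	cond := corev1.NodeCondition{
		Type:               corev1.NodeConditionType("DiskIOPressure"), // placeholder name
		Status:             corev1.ConditionTrue,
		Reason:             "SlowDiskAccounting",
		Message:            "disk I/O is heavily throttled; container operations may time out",
		LastTransitionTime: metav1.Now(),
	}
	fmt.Printf("%+v\n", cond)
}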
D
Etc. Yeah, I think one of the most challenging parts right now, in the SIG, like in the kubelet and so on, is that a lot of the contracts are not super well defined.
D
If we look at 1.22, we had the pod lifecycle refactor, for example, and it changed certain things around pod status updates, basically: when the pod status update is delivered, or, for example, whether there is an IP on the pod status when a pod is terminated, stuff like that. So that stuff is all a little bit ambiguous today, both what it is and what it should be, right?
D
Should there be an IP on a terminated pod or not, right? We're kind of relying on the existing behavior to capture that, but unfortunately there are a lot of controllers and other things in the ecosystem that have started to rely on these things, and changing them can cause downstream effects. So I think one of the big areas we need to work on is defining those contracts today, actually documenting what is important and what is not important, and making sure we have tests for them.
A
For what we expect on the CRI side, you want to have it clearly defined. Even where we did define it, it looks like we might not have sufficient test coverage, right, to check the pod status, the conditions, and all those kinds of things we are missing.
A
For example, you just mentioned pod termination. I saw tons of issues where the pod is terminating, but it could be that it has already died, or maybe the container is still running, and there's no clear way to explain that. And beyond what you proposed for the layer between the kubelet and the CRI, we also have another layer, which is basically between the API server, or the control plane, and the kubelet, on top of the pod API.
D
Exactly, yeah. I mean, just to take a concrete example: when a pod is terminating, in the pod status after the pod is terminated, do we expect an IP or not? That actually was a behavior change in 1.22, for example, and I don't think we ever really thought too much about whether it should have an IP or shouldn't have an IP.
D
I don't think the answer is too important, but the fact that we're consistent matters, because controllers and other things have started taking advantage of that. So when we broke that in 1.22, it caused downstream breakage, for example in the endpoints controller and stuff like that. So definitely I think there's work to be done at that layer, the pod API between the API server and the kubelet.
H
That part is not particularly well thought out yet. I'm not sure whether we want to just bundle it into the CI testing subgroup or if we want something more aligned with the main SIG stuff, so I haven't spent a lot of time thinking about that part of this; I'm just trying to document what we want.
A
Yeah, I think the next step is that we need to have an actionable plan, right. For that action plan, I think the original thought is that the CI testing project drives this one, because from the previous stage to the next stage we need more test coverage, right. We have kind of made some progress on deflaking the tests, and so now we want to continue.
A
We found that there is missing test coverage. Maybe some of it was there at some point, for example some stress tests that we talked about, but we don't run those tests anymore. So we should try to add those back and update them, because some of them do not reflect today's situation. We will try to re-add them, but we need a plan for how to do that, and it needs to be actionable so people can share the work, right. So we need a central place, a central doc.
A
Yes, please, yeah.
I
All right. So here, in SIG Apps and the Batch Working Group, we are suggesting to add some rules for terminating or continuing to retry Jobs, and the natural API for this seems to be exit codes, but that's not enough in all cases. For example, you could have a lot of failures due to infrastructure errors or system errors.
I
For example, if the node completely goes down, in that case there's no kubelet, so there wouldn't be an exit code in the pod status, and things like that. So, and this is why SIG Node is involved, part of the proposal is to standardize all of the infrastructure and system errors in Kubernetes into a single pod condition that every controller can write. For example, if the kubelet does an eviction...
I
...it would also write a condition saying this is a system-initiated termination, with status True. If kube-scheduler does a preemption, it would write the same condition. If the pod garbage collector detects that the node is gone and has to delete the pods that are orphaned, same thing.
I
They would all have this condition. That's basically the proposal. On the kubelet side, we have identified that the kubelet sometimes writes a status reason, in the cases where there is an OOM kill or where there is an eviction. So, in addition to adding a reason, we are looking to add this standardized pod condition.
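A rough sketch of what that standardized condition could look like, using the Kubernetes Go API types; the condition type and reason strings below are placeholders, since the final names were still being settled in the KEP at this point:

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Placeholder names: one standardized condition type that the kubelet (eviction),
	// kube-scheduler (preemption), and the pod garbage collector would all write,
	// so a Job controller can tell system-initiated failures from application failures.
	cond := corev1.PodCondition{
		Type:               corev1.PodConditionType("TerminationBySystem"), // placeholder
		Status:             corev1.ConditionTrue,
		Reason:             "EvictionByKubelet", // placeholder; e.g. preemption, node gone
		Message:            "pod was terminated by the system, not by its own workload",
		LastTransitionTime: metav1.Now(),
	}
	fmt.Printf("%+v\n", cond)
}

A Job controller could then filter on this condition to decide whether a failed pod should count against its retry policy.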
I
So that's the proposal. The KEP is pretty much ready, and there are a few details to flesh out, but I wanted to bring this up to your attention to see if you think this approach seems correct or if you have any other suggestions.
D
One question I have, and maybe it's already answered there, but I wanted to just ask: how do you handle cases where pods are killed outside of, you know, the standard Kubernetes API, like an OOM kill, or maybe someone just SSHes to the node and deletes it directly from the CRI or something like that, right, stops the pod sandbox through the CRI? The idea is that the kubelet is going to reconcile that state, and it probably won't have an exit code at that point, right, because we won't have watched it.
I
So, for the case of an OOM kill, I think it's already handled by the kubelet, where the kernel kills the pod, so we just want to hook into that logic. I'm not sure what happens if a user SSHes in and removes the container manually, yeah.
I
Actually, in the case of a complete VM failure, there is the pod garbage collector in the kube-controller-manager, so we would add the logic for that scenario there. But I don't know about the CRI, what happens if you remove the container from the CRI.
I
Right, so it sounds like we should be able to tap into that logic and add the condition as well, right?
A
My understanding is that this is the problem we are actually trying to avoid. I mean, like what was said earlier, all of the controllers, and the kubelet is also one of the controllers here, right. So basically the higher-level controller could, based on that failure, determine whether it's going to retry or not retry, or maybe move the work to a different node. We could have more intelligence there.
I
The other side of the equation is: how do we make it standard in Kubernetes that for certain failures, you know, the pod didn't finish successfully, but it also didn't fail because of software bugs; it just failed because there was pressure, the node was gone, there was no more space, or kube-scheduler preempted the pod. All of those errors we want to standardize into a single pod condition that we can filter against.
A
This is a really useful feature. I'm just a little bit concerned because, I know, you pinged me last week. The problem is, as you heard earlier, we have so many KEPs already going on here, and a few people are already out of office, so my only concern is reviewer bandwidth and also approver bandwidth. It is an important feature, I can say that obviously, and the other concern is that this is also an API change and a standardization.
A
Yes, the only problem is that I have to look at the KEP. My concern is how Kubernetes already handles this: if you want to standardize, you either go with what Kubernetes already does, or, if you want the kubelet and the node to change to a new way, then we need to communicate that to existing users, because they may be relying on the previous handling for their jobs.

I
I don't want to remove whatever is there, right; I just want to add a new pod condition for this, for all of this.

A
Okay.
A
Last one, Adrian: do we want to talk about the container checkpointing one? Yeah.
F
Yes, yes. So I got a couple of reviews from different people, thank you very much everyone, and the question now is whether it needs any more reviews. I think Mrunal mentioned that he will look over it next week, so I just wanted to bring up the status of the feature here.
B
I
can
make
a
last
pass
Adrian,
and
then
we
can
get
it
up
to
how
much
yeah
okay
I.
D
Well, Adrian, just a quick question, maybe, on the runtime support: what's the status there? I'm pretty sure you've been working on the CRI-O side yourself, right, but on the containerd side, is there any support yet, or is that still pending? Yeah.
F
So I opened a pull request, basically to wire the new CRI call through to containerd, and so the pull request exists. The pull request has no unit tests yet, so I need to add those to the containerd pull request. The bigger problem on the containerd side is that I need to think about a format for storing the checkpoints and how to use it in containerd, because currently containerd has no support which fits what we thought about here, but it should be easy to come up with something. So basically, at this point, containerd is able to create a checkpoint; we just need to put it in the right file, and then it should be done. It looks like most of the infrastructure in containerd already exists, so it's not that big a chunk of work.
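For reference, a rough sketch of what calling the proposed CheckpointContainer CRI RPC from Go might look like; the socket path, container ID, and archive location are assumptions, and the request shape here (container ID, archive location, optional timeout) is my reading of the proposal rather than a confirmed final API:

package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// Assumption: the runtime listens on the usual containerd socket and already
	// implements the new CheckpointContainer RPC (which is what the PR wires up).
	conn, err := grpc.Dial("unix:///run/containerd/containerd.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := runtimeapi.NewRuntimeServiceClient(conn)
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()

	_, err = client.CheckpointContainer(ctx, &runtimeapi.CheckpointContainerRequest{
		ContainerId: "abc123",                          // hypothetical container ID
		Location:    "/var/lib/kubelet/checkpoint.tar", // hypothetical archive path
		Timeout:     0,                                 // 0 means use the runtime default
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Println("checkpoint archive written")
}

At the time of this discussion the containerd side of this call was still an open pull request, so a sketch like this would only work against a runtime that has merged the new RPC.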
D
Okay, cool. Yeah, we're happy to help, feel free to ping us on the containerd side; happy to test it out. Cool, thank you.
A
Thanks, Mike and David. And with that, that's everything on our agenda. Any other topics people want to bring up?