From YouTube: Kubernetes SIG Node 20200225
Description
Meeting Agenda: https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
And I think Giuseppe can give an overview, but we are already running quite a lot of tests, something like 140 node conformance tests, and next the target is running all of the node conformance suite; we can check next week how that goes, sure. So this is just a status update. You want to ask: what's the blocker? Oh yeah, yeah. Basically, we want to get the KEP [approved] to make sure that we can get phase 1.
A
So we discussed this some time last year and I wanted to bring it back up. To give a brief overview: today, when a pod has a termination grace period, it's only ever passed when we are calling StopContainer. The problem we have today is that when we are rebooting a node, there's no guarantee which process will get killed first; we are relying on systemd to kill our pods. Systemd has a setting where you can specify the termination grace period, the stop timeout on the scopes, and if we set it there, then systemd will actually do a graceful termination: it will send a SIGTERM, wait for that time, and then send a SIGKILL. But we can only set that if we pass it down as part of RunPodSandbox, so we can set it for all the containers in that pod.
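
For context on the systemd setting being described: the stop timeout on a unit, including the transient scope a pod runs in, is TimeoutStopSec, exposed over D-Bus as TimeoutStopUSec. Below is a minimal Go sketch of setting it through the go-systemd bindings; the scope name is invented, and this illustrates the mechanism rather than the actual kubelet change under discussion.

```go
package main

import (
	"fmt"

	systemddbus "github.com/coreos/go-systemd/v22/dbus"
	godbus "github.com/godbus/dbus/v5"
)

// setScopeStopTimeout asks systemd to wait `seconds` between SIGTERM and
// SIGKILL when it stops the given scope (for example on node shutdown).
func setScopeStopTimeout(scope string, seconds uint64) error {
	conn, err := systemddbus.New() // connect to the systemd manager on the system bus
	if err != nil {
		return err
	}
	defer conn.Close()
	// TimeoutStopUSec is the D-Bus name of TimeoutStopSec, in microseconds.
	return conn.SetUnitProperties(scope, true, systemddbus.Property{
		Name:  "TimeoutStopUSec",
		Value: godbus.MakeVariant(seconds * 1000000),
	})
}

func main() {
	// "kubepods-pod123.scope" is a hypothetical scope name for illustration.
	if err := setScopeStopTimeout("kubepods-pod123.scope", 30); err != nil {
		fmt.Println(err)
	}
}
```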
B
Yeah, I think I ran into that problem for the case where that is set to [nil], and what we can do about it, but I don't recall any other problem. I don't think... I think that's doable, right. So I saw your PR; I just saw it come up.
C
The reason I brought this back is that we sort of keep talking about it at this meeting, then end on a "well, let's take this offline" kind of note, and then sort of get nothing back. So I think we're still in this place where I'm not sure we've even agreed that it is a bug. You know, when I brought it up, there was general agreement that it's a bug, and there were two different proposals on how to fix it, and I was hoping for some kind of steer or blessing about one proposal or the other. But we've kind of gone down this route and come full circle, so I want to get to at least some resolution of: is this a bug? Are we going to fix it? And if so, at least, what is the path forward for getting a fix?
B
We want to change the hook to become asynchronous, and that actually changes the semantics, because many people run those hooks, and that could be a potential [breaking] change to the container state. If you think about containers today, you have to wait anyway; you have to wait for the hook to come back. This...
B
One moment. It is because the container status, actually the initial container status, is part of the pod status. When we designed the lifecycle management we had one principle: only once every single container within the pod has run at least once do we define that the pod is in the Running state. A container might die later, or other kinds of things can happen, but that was the idea. So it's part of the pod lifecycle management.
C
The problem is that while the hooks are running we're not getting the pod IP on the pod status. So one proposal was: as soon as you set up the sandbox and get the pod IP, send a pod status update that contains the pod IP, with the container statuses unchanged, since the containers haven't been started yet. The other proposal was to allow the hooks to run asynchronously.
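
To make the first proposal concrete, the interim update would look something like the status below: the IP is filled in from the sandbox, but every container is still Waiting. This is a hand-written illustration of the shape, not code from either PR; all values are invented.

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// interimStatus illustrates an early status update: the sandbox is up and the
// IP is known, but no container has started because hooks are still running.
func interimStatus() corev1.PodStatus {
	return corev1.PodStatus{
		Phase: corev1.PodPending,
		PodIP: "10.244.1.17", // learned from RunPodSandbox
		ContainerStatuses: []corev1.ContainerStatus{{
			Name: "app",
			State: corev1.ContainerState{
				Waiting: &corev1.ContainerStateWaiting{Reason: "ContainerCreating"},
			},
		}},
	}
}
```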
B
So let me try to explain why people think this situation is not a bug. If you think about it, even before [the containers run] we could update the pod status; but other people think the pod itself is the smallest unit of management created in Kubernetes, and this was one of the design principles initially. I'm just stating what I remember from when this was discussed before. Okay, I'm not saying which side I'm on; I just tried to summarize what I recall.
B
So people were worried about sending the pod status before the first full sync of the pod: sending a partial status in the middle could end up with just the pod IP, and then we might decide to kill and restart the pod. So they tried to avoid such a situation. They also worried about anyone using that mid-sync pod status update that only carries the pod IP but not the full picture of the pod status. So this is one of the initial, foundational design principles in Kubernetes, especially in pod lifecycle management: they want every single container to run at least once, and based on that they determine the pod status. So for your proposal, no matter whether it is the async hook
B
or [the early update], where, in the middle, after the sandbox is created, I have the pod IP and send the pod IP, but the container statuses are all missing, just Pending or whatever: they think that's not the original pod lifecycle management. So this is basically, as far as I can remember, the argument there. Okay.
B
Then the real question is, since we just keep coming back to this: can we have a good way? Because it's just status, right; from my perspective it is just status, and maybe it could later change, and I think there's a way to say how we're going to redo those kinds of things. But then the concern for the engineers here is the complexity. It's not just, like, okay, send
B
another status; it is the complexity of changing the whole sync loop inside the kubelet. So the last unresolved question, especially for the async hook, is to look at the potential: can we send the pod IP earlier, before we have the complete pod status? I think there is also some imposed complexity there that I didn't really [dig into].
C
We're just looking for an excuse not to accept a PR, rather than saying, you know, we have this budget of QPS, and making a concrete statement about it: if the QPS stays under X, or if the QPS doesn't increase by more than Y percent. If we can make those kinds of statements, then I think it's worthwhile to maybe go down into this sort of "well, let's measure it" kind of modality. But I don't want us to just sort of say "well..."
B
People have different problem thresholds. The problem is, I think, that in the community many people don't see this problem; so far it has only affected Calico, and that's why. So it is not, like, a common problem everyone is trying to solve, but they do share a common concern: many people think a single change like this could potentially cause an actual problem for the community, a step [back] here for reliability.
B
So that's what I tried to propose. Initially I actually agreed that maybe we should fix this one, but I got huge pushback from the community. So if I were trying to state it here: it's not "here's the data, I don't think this causes the problem"; I could prove whether this actually helps or not in the Calico case, and also that it's not really, it's not that disruptive. So this...
G
It seems like we've been talking on a more nebulous level rather than talking about this actual code and what it does. I mean, yes, it looks like we're going to, you know, add a status update, but I mean David is right in that, you know, it looks like, let's see: the GCE 100-node performance job has been run on this, the Kubemark one, and the GCE big one has been run on this, and both of them passed. I mean, in terms of passing, I'm not sure what that means.
B
I think if we know how much increase, which they've asked about on the proposal, then we can make the next step. At least, my only concern is the concept; I think we need to measure it first. But others do have other concerns, like people thinking about the added complexity, and I personally don't buy that. Well, so we could address those problems in that review PR. So that's kind of a separate issue.
C
So the one that Seth linked to is the one that sends the extra pod IP. So if that is the way that we, as a SIG community, would prefer to solve this problem, then that's the PR to review. There was another one that he opened and then closed, and I'm not sure why he closed it.
C
[I don't have a] strong preference for one solution over the other, and if there's a problem with this particular PR, right, the one that, you know, Ted sent, and we have a problem with that one, then, you know, me and my organization can consider: okay, we'll do our own PR that does this better, if there are issues with that. But before I go down that route, I would like to have a decision made about what is even the right way to approach this solution. I didn't
C
No, in this case Calico is not a CNI plugin. Calico is handling network policy, but it's not a CNI plugin, so we're watching the API server to find out about pods and what their IPs are, so we can enforce network policy. And if the hook wants to talk on the network and just waits until the network is available, it'll wait forever, and it will block Calico from finding out the IP that it needs to set up networking. So there's this deadlock.
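
A self-contained illustration of the deadlock being described, with an invented pod: the postStart hook polls the network, the kubelet won't publish the status carrying the IP until the hook returns, and the policy controller can't open the network until it sees the IP. This uses the core/v1 types of that era, where the hook type was still named Handler.

```go
package sketch

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// deadlockPod is a hypothetical pod whose postStart hook blocks on the network.
func deadlockPod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "hook-deadlock-demo"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "app",
				Image: "busybox",
				Lifecycle: &corev1.Lifecycle{
					PostStart: &corev1.Handler{
						Exec: &corev1.ExecAction{
							// Loops forever if network policy never admits the pod,
							// which in turn blocks the status update carrying the IP.
							Command: []string{"sh", "-c",
								"until wget -q -O /dev/null http://example.com; do sleep 1; done"},
						},
					},
				},
			}},
		},
	}
}
```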
D
So it's a simple review-approval request for two PRs. One is part of a series of [PRs] and is actually covered by the [release] exception, so we still have a chance to get it into 1.18, and the other one is a bug fix. Both of them have been in review for quite a long time. The first one was actually accepted and has already passed API review; only the last review step is actually needed for it to be merged, and Jordan actually promised it would be merged
D
if SIG Node reviewed it one last time. It was reviewed a couple of times by Derek, and I updated it a couple of times, but from my point of view it's in good shape, and if anybody from SIG Node can [give it a look and] accept it, that would be great, because we are kind of approaching the final deadline for 1.18 and I'm starting to be a bit concerned about the future of this PR. And the second one is actually, I
F
So I'm not the author of the proposal, of the KEP, but I just want to raise the attention of the SIG maintainers, if they can take a look at it, because again, the KEP has already passed a lot of iterations. Someone already did some review on it, but it would be nice to hear from the maintainers whether it's acceptable, whether it's near the finish line, what stage it's at.
B
Okay, the next one. I think the next one is, I think, the PR that is next. The tests have always kind of passed, so I believe [unclear], and somehow we're not [unclear]. We don't have enough of the [approvers] even with [unclear], but from the SIG perspective we both approve those tests. Done.
K
Yeah, that's me, hey. So, yeah, I'm working with Victor and with Artyom, and yes, I discovered, reviewing the prow jobs, that these tests are not run; these tests are not running. And I'm having some trouble actually testing in my environment, so I'm just asking: what do you suggest? Just to remove the whole thing? I think not. Or whom should I ping to get help with having the tests run in prow?
M
So I have a PR that's ready for the API changes, but I am running behind on the implementation as well. I got pulled off because we had a couple of people from the company stuck in China and then I had to go and fill in for some of the work, and I'm not really confident that we'll make the March 5th code freeze with high quality. But the API review, I think, we can start looking into, and then at least have it ready. I'm not sending a PR to kubernetes
M
yet, because I want the implementation to be in pretty good shape before I send the PRs together. But I'm going to ping Tim and Jordan about this pull request, number one, which is in my repo at this point. If David, Derek, you guys want to take a look at this and have any comments, please let me know, but I believe the last changes I committed just a little while ago should address the comments that they had.
M
Tim's feedback was mostly about using things like dropDisabledFields and a couple of fixes in the code, and he had some questions about the comments to clarify, and I addressed those. Jordan had a couple of fairly important changes: if we have a kubectl or a client-go that is of a lower version, we need to support that, and when I used SetDefaults, the defaults in defaults.go that set the default values were getting in the way. So what I did was remove
M
the changes from defaults.go, and then I'm using the admission controller, the plugin we have added, to set the defaults. And I was able to figure out how to create an older, down-level kubectl and then test with that, and it seems to do the right thing so far. So I've created the code with the latest changes and then added a lot of unit tests as well, so I have a fair amount of confidence in this change.
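
For readers unfamiliar with the convention being referenced, dropDisabledFields is the pattern Kubernetes API strategies use to clear a feature-gated field on incoming objects when the gate is off, unless the existing object already uses it. The sketch below shows only the shape; the types, field, and feature check are all invented stand-ins for the real API packages.

```go
package sketch

// Container and PodSpec are invented local stand-ins for the real API types.
type Container struct {
	NewField *string // hypothetical feature-gated field
}

type PodSpec struct {
	Containers []Container
}

// featureEnabled stands in for utilfeature.DefaultFeatureGate.Enabled(...).
func featureEnabled() bool { return false }

// dropDisabledFields clears the gated field when the feature is off, unless
// the old object already used it, so updates from older clients cannot
// silently wipe data and disabled features cannot be newly set.
func dropDisabledFields(spec, oldSpec *PodSpec) {
	if featureEnabled() || newFieldInUse(oldSpec) {
		return
	}
	for i := range spec.Containers {
		spec.Containers[i].NewField = nil
	}
}

func newFieldInUse(spec *PodSpec) bool {
	if spec == nil {
		return false
	}
	for i := range spec.Containers {
		if spec.Containers[i].NewField != nil {
			return true
		}
	}
	return false
}
```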
J
Yes, yeah, yeah, thank you. This one, you know, what we're doing is: with Topology Manager we're going, or targeting to go, from alpha to beta in 1.18, and so we have this PR, and the only thing this PR does is change the feature gate from default false to true. It's a very simple change, and as a result, what happens is the scaling test fails; the scaling test fails.
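
For context, the "very simple change" described is the usual one-line feature-gate promotion in the gate registry; schematically (the surrounding registry entries are elided, and the exact file layout is an assumption):

```go
package features

import "k8s.io/component-base/featuregate"

const TopologyManager featuregate.Feature = "TopologyManager"

// Promoting the gate to beta flips its default, which is why a one-line PR
// is enough to change behavior on every node (and perturb the scaling test).
var defaultKubernetesFeatureGates = map[featuregate.Feature]featuregate.FeatureSpec{
	// Before (alpha): TopologyManager: {Default: false, PreRelease: featuregate.Alpha},
	TopologyManager: {Default: true, PreRelease: featuregate.Beta},
}
```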
J
What would be great is if it would be possible to look at this job while it's running and get on the job and interact with it, or if somebody can look at this thing and just say, hey, why is it failing? I have not been able to get to the root cause. I suspect it may be just a slight increase in memory usage, but really, by turning the feature on, the only thing we're doing is storing the container UUID, so it can only be a teeny tiny amount of memory we're adding, I think.
J
I have it sort of set up, but I'm not quite able to reproduce it yet, and the code freeze is next Thursday, so I was hoping I could have it figured out by now, but I don't. So if anyone can help out or provide some pointers, or look at this thing and debug... and David, you should kick it on there and debug; I saw you shaking your head. No, it's alright.
E
So I think my intern actually presented something like this more than a year ago, but I'm hoping, in sort of the 1.19-ish timeframe, to get my tracing work in. The basic idea is that we can use OpenTelemetry to collect distributed traces of what Kubernetes controllers are doing. I've been to a number of SIGs, but now I'm going to talk to SIG Node and sort of present what I've got and what changes are required.
E
Hopefully
it
should
be
kind
of
hot,
so
my
goal
from
this
is
just
if
you
have
feedback
feel
free
to
ping
it
to
me,
your
email
me
or
or
leave
it
for
me
somehow
and
so
I
can
collect
all
that
make
sure
it's
all
addressed,
but
first
I'll
do
a
little
bit
of
a
demo.
So
the
basic
concept
is
that
everyone
can
see
this
right
yeah.
E
So
if
I
create
something
and
pass
in
this
trace
parameter,
then
we
have
so.
This
will
start
with
a
config
map
and
I'm
using
Zipkin
here,
I
have
the
collector
configured
Pacific
in,
and
so
you
can
see
that
we
can
trace
requests
that
are
going
to
the
API
server
down
through
to
the
LCD
transaction
and
so
for
a
quick
primer.
E
You,
the
open,
telemetry
libraries,
have
a
an
HTTP
server
wrapper
so
that
I
get
these
traces
or
these
fans
here
for
free
based
on
the
HTTP
request,
sent
to
the
API
server,
and
then
they
also
have
a
ERP
C
dial
option
that
you
can
use
for
the
client
for
@cv,
and
that
gives
you
this
cool,
sed
transaction
tree.
So
really,
without
almost
any
code
changes
we
can
get
traces
from
API
server
and
that's
all
cool,
but
wouldn't
it
be
better
if
we
could
know
what
say
the
node
was
doing
when
creating
a
pod.
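
The two instrumentation points mentioned are an HTTP middleware for inbound server spans and a gRPC client interceptor for outbound calls. A minimal sketch using today's OpenTelemetry-Go contrib packages follows; the demo predates their stable names, so treat the exact imports as assumptions rather than what the speaker used.

```go
package main

import (
	"net/http"

	"go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"
	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"google.golang.org/grpc"
)

// serveWithSpans wraps a handler so every inbound request gets a span for free.
func serveWithSpans(mux *http.ServeMux) error {
	return http.ListenAndServe(":8080", otelhttp.NewHandler(mux, "apiserver"))
}

// dialWithSpans adds the gRPC dial option so outbound client calls
// (for example to etcd or the CRI) get child spans.
func dialWithSpans(target string) (*grpc.ClientConn, error) {
	return grpc.Dial(target,
		grpc.WithInsecure(),
		grpc.WithUnaryInterceptor(otelgrpc.UnaryClientInterceptor()),
	)
}

func main() {
	if _, err := dialWithSpans("127.0.0.1:2379"); err != nil {
		panic(err)
	}
	panic(serveWithSpans(http.NewServeMux()))
}
```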
E
So
if
I
instead
create
a
really
simple
pod,
let's
see,
I
can
now
get
this
view.
That
includes
not
just
the
API
server
request
to
create
a
pod,
but
then
also
things
like
the
schedulers
work
that
it
does
schedule
the
pod
and
then
all
the
way
down
to
the
cube.
Let's
work
to
sink
the
pod
and
even
the
container
runtime,
even
traces
from
the
container
runtimes
G
RPC
calls
for
cyclic
humans.
E
gRPC calls to the container runtime. And, as you all know, we actually have not just one gRPC client in the kubelet; we actually have many. So we can do this with device plugins; we can do this with any of the other kubelet plugins we have. This is a tool, I think, that will help SIG Node greatly in being able to diagnose sort of "where's my pod stuck" sorts of problems, since we can pretty easily get interesting telemetry.
E
Tracing is obviously more useful when you have lots of different components, because the whole promise of distributed tracing is that I can have two different binaries that both export telemetry that can then be joined back together, to give you something more useful than, say, logs or metrics.
E
Cool
and
the
last
demo
I
have
is
deployment,
and
so
the
the
way
that
the
the
previous
two
demos
work
is
that
adding
this
trace
argument
put
an
annotation
on
the
pod
object
or
the
config
macked
config
map
object
and
whenever
a
component
acts
on
that
object.
For
example,
when
the
cubelet
goes
to
sink
the
pod,
it
reads
the
annotation
off
the
object
and
uses
the
trace
context
stored
in
that
annotation.
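
The propagation mechanism described, a trace context stored in an object annotation, might look like the sketch below. The annotation key is invented, and the real implementation in the proposal may differ.

```go
package sketch

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
	"go.opentelemetry.io/otel/trace"
)

// spanFromPodAnnotations rebuilds the trace context a client stored on the
// object and starts a child span for the component's work (e.g. syncPod).
func spanFromPodAnnotations(ctx context.Context, annotations map[string]string) (context.Context, trace.Span) {
	carrier := propagation.MapCarrier{
		"traceparent": annotations["example.com/traceparent"], // hypothetical key
	}
	ctx = propagation.TraceContext{}.Extract(ctx, carrier)
	return otel.Tracer("kubelet").Start(ctx, "syncPod")
}
```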
E
So I just created the deployment, and if I pop over to Zipkin I can now see this trace here, and it's sort of like the pod trace, except now there are five of them, I think, because that's the number of pods in this deployment. But you can see that even with the very large number of spans that we would get from a system like Kubernetes,
E
It
actually
still
provides
a
very
useful
way
of
viewing
what's
happening
over
time
and,
like
I
said
you
can
see
the
the
initial
creation
of
the
deployment
you
can
see
the
creation
of
the
replica
set
for
the
deployment,
and
then
you
can
even
see
the
replica
set
controller
creating
the
pod.
So
all
of
those
layers
are
there.
E
That's
how
you
get
this
this
tree,
that
we
have
cool
so
there's
one
more
thing:
I
want
to
show,
which
is
that
open
telemetry
is
pretty
cool
because
you
can
send
to
not
just
Zipkin,
but
you
can
also
send
to
Jaeger,
and
you
can
also
send
to
stack
driver
as
well.
So
if
everything
worked,
I
should
have
all
the
same.
E
In
stock
driver
as
well
so
I've
set
this
up
to
send
to
both
stock
driver
and
to
zip
Caen,
and
so
like
this
isn't
a
vendor
specific
thing
because
we're
using
open
telemetry,
we
can
send
our
traces
wherever
the
heck
we
want,
including
yeah,
including
a
bunch
of
them,
and
so
I
think
this
is
pretty
fun
and
as
someone
who's
spent
plenty
of
time,
debugging
node
issues.
I
think
this
would
be
very
useful.
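
Fanning out to several backends is just a matter of registering more span processors on one tracer provider. Below is a sketch with Zipkin plus a stdout exporter standing in for any second backend such as Stackdriver or Jaeger; the imports reflect current OpenTelemetry-Go packages, an assumption relative to the 2020 demo.

```go
package sketch

import (
	"go.opentelemetry.io/otel/exporters/stdout/stdouttrace"
	"go.opentelemetry.io/otel/exporters/zipkin"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

// newProvider builds one TracerProvider that ships every span to two backends.
func newProvider() (*sdktrace.TracerProvider, error) {
	zip, err := zipkin.New("http://localhost:9411/api/v2/spans")
	if err != nil {
		return nil, err
	}
	std, err := stdouttrace.New()
	if err != nil {
		return nil, err
	}
	return sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(zip), // each registered processor sees every span
		sdktrace.WithBatcher(std),
	), nil
}
```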
E
So
the
only
thing
let's
see
I've
got
2%
battery,
so
the
only
thing
left
the
three
changes
that
are
actually
made
to
the
cubelet
here.
That
I
think
deserve
a
little
bit
more
scrutiny
from
this
group.
Are
we
have
to
use
a
G
RPC
dial
option
when
we're
making
connections
to
all
of
our
clients,
so
device
plug-in
container
on
time?
Any
of
the
other
plugins
right?
E
The
cubelet
has
to
be
configured
on
startup
where
to
send
the
traces,
and
so
if
we
do
use
the
open
tracing
agent,
then
you
would
send
to
a
local
payment
set
and
the
qubit
has
some
startup
stuff
that
it
does
and
then
the
only
other
sort
of
big
change
that
some
people
find
annoying
is
that
this
is.
This
does
mean
we
are
going
to
start
doing
context
propagation.