From YouTube: Kubernetes SIG Node 20210525
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
Good morning, everyone. Today is May 25th and this is our weekly SIG Node community meeting — welcome, everyone. We have a full agenda today, so let's start our meeting as usual. Sergey, do you want to... Sergey, are you around? Do you want to update us about the PR status and also the triage status?
B
I think Sergey might not be on the call, so I am happy to jump in on his behalf. I don't have the numbers from the script for the meeting notes, but anecdotally I can say that we've been merging some PRs. I've certainly been—
B
Oh, I see — sorry, my wi-fi is a little spotty. I was just saying that PRs seem to be moving along well; the triage column is mostly empty. We definitely still need more reviewers and more help with triage, but velocity seems pretty good right now, and the more folks who get involved, the more we can burn down the backlog, because we still have lots of PRs that are waiting on reviews, even if things are mostly chugging along.
A
Thanks, Elana. So maybe we move to our agenda, and the first item actually is yours — do you want to talk about it?
B
So I put this one on the agenda and put together a doc, because we have been chasing down a lot of race conditions in the kubelet and issues with pod spin-up and deletion, and I've been chatting with Ryan and Clayton — and Clayton is here. Hey Clayton, do you want to jump in and talk about some of the stuff from the doc that we put together?
C
Yeah — Ryan and Elana and I have been chasing some of these for a while. We had a bug two years ago where you could have a pod that always failed: you could have a job that returned exit code one all the time, and then very rarely it could actually return zero. That was due to a race condition in kubelet status reporting. We chased some stuff and eventually had some feedback.
C
We knew there was another race somewhere else, so we kind of put in a best-effort check that handled it. The test case that we put in to simulate that actually creates and deletes — it creates a pod and immediately tears it down, and the container is supposed to return zero, and so the test was checking for zero.
C
We had a couple of places where that would hang or fail for a long time, so it was like, oh, you know, there's something weird going on, but it was really rare. As I was digging into that last week, I realized the actual issue is: what would happen, just because of the way teardown happens, is that the various bits of the kubelet that ask "hey, is this pod terminated, so I can start cleaning stuff up?" were actually vulnerable to a race condition where the first container hadn't been started.
C
So there was no strong synchronization in the kubelet that allowed you to know that the pod has transitioned. There are basically three phases of a pod's lifecycle: it doesn't have any containers yet; once the kubelet sees it, it can start having containers; and then, when it's stopped or evicted or, you know, shutting down, you need to know that no more containers can be created.
C
From that point on, most of the other kubelet cleanup loops were doing a very, very incorrect check for that: they were just looking at whatever copy of the container status they had, which could be wildly out of date. And it turned out that status check wasn't checking init containers or ephemeral containers, so it was also just wrong.
C
So after some quick discussion with Elana and Ryan — my proposed approach, and what's in the doc that Elana wrote up (it was really great that she did that; I was adding some notes) — is this: the lifecycle of a pod is pretty predictable and we need some strong guarantees. We already have the pod worker. I think the pod worker today exits too early, and so my proposal, and what I was hacking around with and testing, was effectively that everything, from the moment we start bringing up a pod to the moment we've fully torn the pod down and generated the final status — because there are no more running containers on teardown — would all happen in a single pod worker loop. We actually found other problems; I was linked to other problems that people have found. So there's an issue with graceful termination, which is: you're supposed to be able to shorten graceful termination, but the way that was being handled didn't necessarily guarantee the shorter duration. Like, so you can...
C
You create something, you say wait 30 seconds; you're allowed, at the API server — and this is part of the original design — to say, oh no, only wait 20 seconds, and if 20 is shorter than 30, the kubelet is supposed to shorten it. That logic didn't actually fully work. There was a PR for it. I actually think that to correctly do that you need to be able to know what the last value was in a deterministic fashion in the kubelet; the other person had already started changing that in the sync worker.
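A minimal sketch of the shortening rule described here, assuming the kubelet tracks the last grace period it acted on; the function and variable names are illustrative, not the actual kubelet code:

```go
package main

import "fmt"

// effectiveGracePeriod returns the grace period the kubelet should honor,
// given the value it last acted on and the latest value seen on the API object.
// A later deletion request may only shorten the period, never extend it.
func effectiveGracePeriod(lastActedOnSeconds, latestSeconds int64) int64 {
	if latestSeconds < lastActedOnSeconds {
		return latestSeconds // e.g. 20s requested while a 30s deletion is in flight
	}
	return lastActedOnSeconds
}

func main() {
	fmt.Println(effectiveGracePeriod(30, 20)) // 20: the shorter request wins
	fmt.Println(effectiveGracePeriod(30, 45)) // 30: cannot be extended once started
}
```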
C
So again, I think it's right for us to combine this and have the sync worker be the source of pod lifecycle: owning the immutable transition into starting to tear down, and the transition from when there are no more running containers to when the pod is cleaned up. And actually I found a couple more race conditions as I was going, so I'm pretty confident that pod logs are actually being torn down too early today, because the loop that tears down containers also tears down pods, but that's not actually tied to pod shutdown.
C
So I actually was able to trigger a race condition in a test that I had seen before, where a pod reaches success and then we try to go see the logs for it, and just because the sync loop was kind of slow, most of the time you'd have like 10 or 15 or 20 seconds before it got torn down. I was actually able to make it happen instantly, because in my version the sync worker was tearing down the containers — and the fix is pretty obvious, which is that we want to preserve...
C
We want to guarantee that container logs stay until the pod is deleted, or an eviction or a garbage collection happens, and by unifying some of this logic that became a lot clearer. So there are a bunch of little bugs all over the place that I think this helps with. It's not a lot of change: basically, all the places in the kubelet that are saying things like "hey, given the current pod I have, go ask the status manager for the container state, go check running containers"...
C
...basically just become calls to the pod worker, which has a locked, synchronized, immutable, one-way transition map that says, you know, the sync worker has started up, so there could be running containers — sync pod. I also split sync pod into three: there's setting up a pod, there's stopping the containers, and then there's cleaning up the resources — and cleaning up the resources matches what you have to do in the status manager before the final delete is sent. So those are all basically unified.
C
As the pod worker goes through those phases, it checks to see that we're not going through an illegal state transition, it sets the map, and then everybody else in the kubelet just makes the same calls they did before, but they're consistent and they're one-way, so they can never regress.
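A rough sketch of the one-way, lock-protected lifecycle map being described, just to make the idea concrete; the types and method names are illustrative, not the actual pod worker implementation:

```go
package podstate

import (
	"fmt"
	"sync"
)

// phase values may only advance, never regress.
type phase int

const (
	syncing     phase = iota // containers may be started
	terminating              // containers are being stopped; no new ones may start
	terminated               // no running containers remain; resources may be reclaimed
)

type Tracker struct {
	mu     sync.Mutex
	phases map[string]phase // keyed by pod UID
}

func NewTracker() *Tracker { return &Tracker{phases: map[string]phase{}} }

// Advance moves a pod forward in its lifecycle and rejects illegal (backward) transitions.
func (t *Tracker) Advance(uid string, next phase) error {
	t.mu.Lock()
	defer t.mu.Unlock()
	if cur, ok := t.phases[uid]; ok && next < cur {
		return fmt.Errorf("illegal transition for pod %s: %d -> %d", uid, cur, next)
	}
	t.phases[uid] = next
	return nil
}

// CouldHaveRunningContainers is the kind of question cleanup loops would ask
// instead of inspecting possibly stale container statuses.
func (t *Tracker) CouldHaveRunningContainers(uid string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	p, ok := t.phases[uid]
	return ok && p < terminated
}
```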
C
I'm hoping that I can get the — I think I have almost all the bugs. I'm still chasing down some cases where I didn't understand the kubelet well enough; mirror pods are still broken, as they always are, and I need to go fix something in there. But if folks can read the doc and give feedback on it — did it sound sane? I saw Dawn nodding, but Dawn is easily convinced by my subtle lies sometimes.
A
Honestly, this basically matches the original design. I believe the sync pod and the pod worker were the first time we introduced that kind of synchronization into Kubernetes, and the original design was driven by the pod events in the old code. Maybe some optimizations messed up those events, so from what we saw it looks like it is just the optimizations.
A
Over the last two years — and even recently, I think half a year ago — we had huge pressure from people who wanted to optimize the status reporting even more, but we did not really enforce that synchronization and the pod-related events, so that causes some of the race conditions. So what you see here makes sense. I also see [inaudible] is here today.
D
Yeah, actually I just recently had a discussion with Jing from SIG Storage, and basically we also encountered a race condition similar to what you described: we'll create a container — we'll create a pod, and it's not running yet, it's pulling the image — and now you try to delete it immediately, and somehow the volume will go into a weird state. We'll say that the actual volume is not mounted, but the kubelet thinks it is mounted and creates a local directory.
D
So the problem is it's using a local directory, but it actually should use a volume. Anyway, it goes into inconsistent data, and we discussed internally and also discussed several options, and one of the options is to make sure that the pod worker handles the lifecycle of the pod, and the other logic, like the cleanup...
D
We have a big cleanup loop, so I think for that one the race condition is caused because both the cleanup and the pod worker are trying to do things — and we thought the pod worker had stopped working, but it was still working. So if we have a clear indication about whether the pod worker is still managing the pod, then cleanup shouldn't kick in; to avoid the race condition, only when we know that the pod worker routine is gone can we clean up.
C
It is kind of reassuring, because most of the race conditions that were originally caught were with short-running pods — it could happen later in some cases, but that would be unlikely — and cleanup in general is working, as we've improved cleanup over the last couple of years.
C
The nice thing is that we've kind of improved and concentrated where cleanup happens, and as I went through I was trying to verify all those places, and I think we have gotten a lot cleaner, so it's easy to say: here's the set of things that we have to clean up. Having that list actually means we could just do that synchronously, which also reduces tail latency on shutdown, but there are some trade-offs there that I think we should discuss.
C
One of them — Dawn, you brought up the point about status. So the thing I was actually seeing before with status was some of these second-order effects with cleanup. You know, with status, I think it was taking us 10 seconds on average to sync some of the important events.
C
What I'm seeing today in the code is that it can take somewhere between 10 and 20 seconds under moderate load, or 30 to 40 as more pods show up, to actually finish the termination of pods. Part of that is actually just the sync loop intervals, which is a defense — you know, it degrades gracefully to a more predictable performance mechanism — but also status is easier to reason about now, because we can actually look, with the pod worker having the full lifecycle.
C
What I was finding was that I could actually make an argument that we would know the time between different phases much more accurately, because it's one spot, which would also help us put performance measurements in place. That would let us assess the tail latency of pod shutdown and measure it accurately, as well as things like: when we do go optimize status reporting, we'll have a much better understanding of when the status should be done, and we can compare against it.
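The instrumentation being suggested could look roughly like the sketch below — a histogram of pod termination duration recorded by the pod worker. This uses the plain Prometheus client and a hypothetical metric name for clarity; the kubelet's real metrics go through its own wrappers:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var podTerminationDuration = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "pod_termination_duration_seconds", // hypothetical name
	Help:    "Time from the start of pod termination until the final status is generated.",
	Buckets: prometheus.ExponentialBuckets(0.5, 2, 10), // 0.5s .. ~256s
})

func init() {
	prometheus.MustRegister(podTerminationDuration)
}

// ObserveTermination would be called by the pod worker once no running
// containers remain and the final status has been generated.
func ObserveTermination(terminationStart time.Time) {
	podTerminationDuration.Observe(time.Since(terminationStart).Seconds())
}
```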
C
The places that were a little weird were where some of our terminology is a little inconsistent.
C
So in eviction we would say "terminated pods", but those are actually terminating pods; and then "pod deleted" — "is pod deleted" was being used in a couple of places, and it wasn't really that the pod was deleted. It was either that, kubelet-side, eviction has been requested, in which case the kubelet wants to get rid of those resources, or the user has told us to delete it, in which case we can get rid of those resources — and that was kind of being used inconsistently. So just going through and reviewing, it's easier to put better names on those once they're centralized, like "should pod resources be reclaimed", which makes it easier for someone adding a new loop, or looking at it, to reason about. You know, it took me a couple of days to bring everything into context, and I still think there's some subtle stuff, but it has helped to look at it.
E
I was looking at your first PR, and I know I've been telling you for the last couple of days that I'd look at this, but, you know, live code review is fun. You said static pods weren't working right with—
E
I was trying to think through the changes we did last year to handle, I guess, proper shutdown of a static pod when you change the file on disk, and I remember having to spelunk my way down into the pod killer stuff, and I see that you've gotten rid of the pod killer — or I can't find it.
E
What happens when a kubelet starts and sees pods that should no longer exist? Like, all the cleanup loops for—
C
All the cleanup loops are basically the same: the cleanup loop would look at all the containers or volumes, and it would say, here's the set that shouldn't exist. As far as I can tell, the cleanup loops are actually correctly cleaning up all the pods that are non-existent, because I didn't change the cleanup loops except that the cleanup loops have a bunch of gates in them that say: is the pod in this state, or does it exist or not exist?
C
I made those correct, because they were not correct — the places where we were checking "are containers running" were racy and were vulnerable to a race on startup, or on restart of a kubelet, or on restart of a node. So that is a place where a race would exist; in practice most people wouldn't hit it, but you could actually start tearing down the containers, or the state you needed from the previous reboot or from a previous start of the kubelet, due to that race condition. That's been mostly fixed, so I didn't really have to change that logic — it's just that the check is now correct in terms of "should this pod exist" and "what phase of its lifecycle is this pod in". Yeah.
C
I'll go double-check that. As far as — and this may be one of the tests that was broken — the core tests all work, which was another example: every place where I knew it was broken, the e2e tests did not catch it. That is a place where we probably need to add some better e2e tests — the node e2e tests caught a few of them, but, for instance, some of the race conditions, like the fact that init containers were not considered at all, mean we need to have an e2e test
C
that's a little bit like the existing create-then-delete test for init containers. There was one where pod logs aren't being preserved — that was a race condition that I only caught because the test flaked rather than failed.
C
So I didn't change any names there. So, like eviction — everything: if you want to terminate a pod from the kubelet in a scenario, you basically send the sync-pod kill event to update pod, so eviction goes through the kill-pod wrapper now, and then you pass it on.
C
Obviously you'll do status mutations that, say, make it permanent or not. Soft admission failures in the sync loop are the only thing that doesn't go through that mechanism, but it puts the container into the same space, which is a soft pod termination. So if pod admission fails, you're still in the sync loop, you're still building up, but you get to the point where you realize you should not be running, and you shut it down.
C
So it's like one of the first checks. All the others — eviction, graceful shutdown — basically just call kill, and they either mutate the status or they don't. The issue I saw from a terminology perspective was that graceful node shutdown today is behaving like eviction.
C
When you shut down the node gracefully, I don't think it should, because there's no expectation that that pod can't restart the next time the node comes back up. That was a minor thing I caught. I tried to unify all of the code that is terminating pods, so there's only one method right now that anybody can call to tell a pod to shut down that's not inside the sync pod worker — kill pod is only inside the sync worker, except for the cleanup loop.
C
I think that was maybe the one — and maybe cleanup even should be more closely aligned with the pod worker, to your point, Derek, but we could tag it with a to-do or something: cleanup of pods that don't exist is similar enough to pods shutting down that they should be aligned, so that you know when you can use kill pod or not. But I didn't change the definition of eviction, I didn't change graceful node shutdown, and I didn't change—
A
So we do have eviction and also out-of-resource tests, but I don't know about the e2e — we never...
A
I know, and I just want to say that there are many to-dos we didn't finish. The reason is that we need support from the rest of the community, the broader Kubernetes community, because I think it comes back to the community being unclear about a lot of what conformance is, right?
A
So that's why — this is why I introduced something called node-level conformance. But back to this: the PLEG, the pod lifecycle event management, actually used to have integration tests for those kinds of things, and unfortunately those integration tests also have no maintenance. So, to keep the community focused, we later only focused on node e2e; later, once Kubernetes had the cluster-level conformance tests, we focused on conformance. So you can see, over time,
A
we focused on smaller and smaller core functionality, trying to make sure those have e2e tests, so we gave up a lot of other tests. At the same time, for the related stuff we also had integration tests, and for the PLEG we also had integration tests, so maybe we could revisit those kinds of things. There are even, you know, performance tests for the container runtime, and also node-level performance tests at the kubelet level.
C
The nice thing is, at least from the behavior a user expects, most of our e2e — the gaps I was seeing were places where, in the e2e, have we ever really defined how long you can get logs from a terminating pod? No. Should we? Probably, because if it's racy right now, people are probably expecting it to work. So even that is a great example of a conformance-style test that's pretty easy to write with a little bit of cleverness.
C
I did really regret not having reliable and easy-to-run integration tests for some of the interaction between the cleanup loops; some of that's tough — node e2es could cover that.
C
But as Elana and I were talking about yesterday, there were some definite places where this was harder for me, even kind of knowing what the intent was and having a lot of the backstory, because I couldn't check the mid-level invariants without just firing up a local cluster and running it — which felt like we could do better for testing specific scenarios that are kind of holistic. But, knowing how complex all the flags for the kubelet are, there are also some limitations there. So I'd be happy—
E
So I think Lantao or myself want to help review on this — and I know Clayton will help you with the... yeah.
C
But I'm not ruling out the possibility that some of the other weirdness we've seen on static pod restarts might — I might have broken a workaround — or this might actually help us fix it more cleanly. Static pods are subtly different in their termination logic because they will come back. That'd actually be a great opportunity to go and make sure that I understand it as I'm putting some of this other stuff in place, so I can help add more tests.
E
And then, Dawn, you had mentioned the perf tests — Ryan and I were having some discussion with some folks at Red Hat that wanted to take a better look at pod startup latency, and one of the areas we were trying to see is if we could have them bring that test dashboard back to life. I don't know what the timing on that will be, but sure, definitely.
A
That's what I also have as a to-do, based on the original design. I can share — I can schedule a meeting to give the current status and discuss how we can move forward. There's also SIG Instrumentation, which owns the dashboard part, rather than SIG Node. So we also want to do more on the instrumentation side and those kinds of things, and then from SIG Node, basically from us, we want to enable that instrumentation so you can easily monitor the latency.
C
The time when we delete the etcd object is unmeasurable today, but it could be measurable, and there were some options, so I'd be happy to tackle some of that and then tee up some of the discussion as well. Certainly with this change, at least for an unloaded kubelet, shutdown latency becomes much more predictable — it's not a regression, and the tail is pulled in a bit. It's not fundamentally changing things; under load, the status loop is now the biggest piece — the pod status worker, or pod status manager, and its final updates are the longest component of shutdown from an observable perspective. The other fixes I had been playing around with for improving that would probably be even better now, because we're not dependent on the cleanup loops in this approach.
C
So I'm kind of incentivized to get the right instrumentation in, so that we can say, hey, here's what we're getting now and here's how that looks. And then, when the status manager improvements come — if somebody gets time to come back and reassess them — we might actually see a significant reduction in the time to shut down, which matters for anything short-running.
B
Clayton, do we want to touch on the pod-level terminationGracePeriodSeconds-being-set-to-zero bug? Because that was kind of the thing that—
C
Yeah. Jordan had been recommending it to people in e2es — because Jordan's not the best person sometimes — but in e2es we were force-deleting pods, and so I thought we were force-deleting pods in e2e tests where people were actually invoking a deletion, but it turns out that we actually allow terminationGracePeriodSeconds on the spec to be zero, which is not the intent at all, and I don't know how it's gone five years without me noticing that. So that's what the PR originally was.
C
I was fixing it to make it impossible to set zero from a pod spec, because that completely takes the kubelet out of the chain; it has a whole bunch of resource implications for the kubelet. The original intent of graceful deletion — or force deletion — was not for it to be a standard part of the workflow.
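A minimal sketch of the "make zero behave like one" clamping discussed next, assuming it is applied wherever the kubelet computes the grace period it will honor; the function name and placement are illustrative, not the actual PR:

```go
package main

import "fmt"

// clampGracePeriod maps disallowed spec values (zero or negative, which would
// take the kubelet out of the termination path entirely) to a one-second minimum.
func clampGracePeriod(specSeconds int64) int64 {
	if specSeconds < 1 {
		return 1
	}
	return specSeconds
}

func main() {
	fmt.Println(clampGracePeriod(0))  // 1
	fmt.Println(clampGracePeriod(-5)) // 1: negative values are also invalid
	fmt.Println(clampGracePeriod(30)) // 30
}
```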
C
As part of that, I was proposing making zero behave like one, as sketched above. Elana pointed to another bug where negative values also work. Jordan and I were just having a brief chat yesterday, where he was suggesting—
C
So I'd like to see that PR and the approach I had combined. Jordan was making some suggestions on the PR because he had questions about it, but it would effectively be an API change where we would say you can no longer request forced deletion of a deployment or replica set or stateful set. So I'm a little—
A
Clayton, let me give some background. Basically, I think there was a big debate between us and the community. For many features, I took that position because we also don't want the kubelet in the middle for a lot of the force-deletion behavior scenarios. So at that time I debated it a lot, but I also don't want it in the middle. I just want to say there's also—
A
If the kubelet really needs to accommodate a real need to delete, all those kinds of things come up. What I initially wanted was for the kubelet to at least observe it — an observability event — and a pod event would be generated when it is deleted. But then there are a lot of people who don't want to wait even that single-pod duration and want to delete anyway. I forgot all the counter-arguments, and so—
C
So then the question would be: is it that the performance is slow, and by improving the kubelet we can get it down? Because I do think that if we don't wait for containers to be removed — which is another thing in the PR that is worth some discussion — we're talking about volume cleanup, which is the slowest part of it. And volume cleanup, even with the check I was doing, was still seconds.
C
QoS cgroup removal is very fast, and the pod sync loop, if it had the right info, could basically be near-instant with the exception of throttling. So I do think it's actually pretty possible that, if we could improve volume cleanup, we could get down to two seconds — like, if you had a pod with a termination grace period of one second, you could probably finish in two seconds. Would that be short enough, or does someone still need it even faster?
A
This is the evil of deletion blocking on a single pod — that's wrong; the original design is not this way. The original design is to just generate a cleanup action. That came from me: I said, okay, once we watch a pod event — no matter whether it is container creation or container deletion, all those kinds of things — once you have those, you generate an event, and the pod worker will generate a cleanup event associated with it and hand it over to the cleanup thread.
C
That's probably what my change looks like — which is, if you set grace period one, the first event comes in and it makes it to the pod worker. Right now we're not interrupting the pod worker, but in order to do that kind of shortening we should be able to cancel the sync pod worker by passing a context to it.
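To make the context-based cancellation concrete, here is a small, self-contained sketch: an in-flight sync waits out the grace period unless its context is cancelled because a shorter grace period arrived. The function names are hypothetical:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// syncTerminatingPod waits out the grace period unless the context is
// cancelled, in which case the caller re-runs the sync with the shorter value.
func syncTerminatingPod(ctx context.Context, gracePeriod time.Duration) error {
	select {
	case <-time.After(gracePeriod): // normal path: grace period elapsed
		return nil
	case <-ctx.Done(): // a shorter grace period was requested mid-sync
		return ctx.Err()
	}
}

func main() {
	ctx, cancel := context.WithCancel(context.Background())
	go func() {
		time.Sleep(200 * time.Millisecond)
		cancel() // simulate the API server requesting a shorter grace period
	}()
	fmt.Println(syncTerminatingPod(ctx, 30*time.Second)) // prints "context canceled"
}
```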
C
So that would be — that was listed in there — the event: you see the deletion at certain events, and the transition into terminating would be allowed to cancel, shortening the grace period, with the ability to cancel the current sync, which would then just tee up the next one. We'd call kill pod, it would be done, and if at that point we have enough info, we fire the final status. That's taking a while today under load, because the status manager is not optimized for per-pod latency, it's optimized for throughput, but I think my change gets 90% of that, which would basically be—
A
While I'm cleaning up, there's also the shift and hand-over to different things. It's just because, on disk, you also need a single view of all the disk management, including the container image downloads and the container logs, all those kinds of things on the disk. So this is what I initially designed:
A
the pod worker handles the single pod's events and watches all those events generated from the API server and related stuff, and then there's a separate — maybe a group of — disk workers; that's a separate disk worker, but there's one thing: a single place that monitors the entire node disk, including new image downloads, tracking the image downloads and the logging, and also what's on the disk. So that kind of makes—
C
It was actually good for me to bring it all back into context, because I could see many of those elements. Maybe we can do a follow-up at some point soon and go through the PR in detail and ask whether there are other elements we can improve, because I do think that in order to fix some of the race conditions we found, we have to make some of these trade-offs anyway — because the loops are cleaning up stuff too early.
A
So I think: please review the PR and also read the doc shared by Elana, and thanks a lot for converging everything, and thanks for offering the reviews on this one — I know you are all super busy. So, for time-checking, let's move to the next topic.
F
Let me — so, in issue #101851 we want to expose container start time in the kubelet's metrics/resource endpoint, and that gives us two benefits. First, it allows detecting container restarts, since the container start time changes; second, it allows speeding up metrics for fresh containers. So I think it works.
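Roughly, exposing container start time as a metric could look like the sketch below, written with the plain Prometheus client for clarity; the metric and label names here are assumptions for illustration, and the real metrics/resource endpoint is generated differently inside the kubelet:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var containerStartTime = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "container_start_time_seconds", // hypothetical name
	Help: "Start time of the container since the unix epoch, in seconds.",
}, []string{"namespace", "pod", "container"})

func init() { prometheus.MustRegister(containerStartTime) }

// RecordStart is called with the started-at time from the container status;
// a changed value for the same labels indicates a container restart.
func RecordStart(namespace, pod, container string, startedAt time.Time) {
	containerStartTime.WithLabelValues(namespace, pod, container).
		Set(float64(startedAt.Unix()))
}
```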
B
Thanks so much for joining today to bring this up, and I can provide a little bit more background as well. On the SIG Instrumentation side, Marek was looking into using the metrics/resource endpoint as a replacement, I think, for the summary API, and there were some issues with the CPU usage metrics.
A
The first point is about the node — node startup time — similar to how we define the pod startup time: the duration is from when we first have the pod created and seen in the API server to when all the initial application containers have started once. That's the pod start latency, at least as we initially defined it. For the node, that's roughly what I defined, just as brainstorming here.
B
My concern with having the node start time be when the node marks itself as node-ready is that it can run stuff before then, which would be confusing, right? Like, static pods could potentially start before then; things that, I guess, tolerate node-not-ready could potentially start before then. So—
A
I think, though, a lot of people forget — within SIG Node we know there are use cases for static pods; we even supported some static-pod-related features last year — but again, static pods are discouraged.
A
We know there's the initial cluster bring-up that depends on static pods. No matter whether it's OpenShift or GKE, there's a static pod dependency, but I believe in SIG Node, from day one, we said we discourage people from using static pods all the time; it's only for cluster initialization. We know there's that dependency until we address the bootstrapping for the health of the cluster. So that's mostly for those cases, but the rest of the stack shouldn't rely on static pods.
A
I think it's not just the CLI — actually, node ready: in the past — and I know we relaxed a lot of things — in the past, node ready meant at least that the container runtime had come up. So you would see that the container runtime is ready, and then you would see the CRI, or that a pod sandbox can be allocated, yeah. So that's node ready — that's the first time the node is ready. So—
E
The container start time metric that's in the proposal makes total sense to me. Just clarifying the "time of node" — I mean, it's a net new metric; I just didn't know what that was actually measuring, and it wasn't clear what that metric is used for right now to meet the two use cases above. Like, if node start time is when the computer was powered up,
E
let's say, then I guess you're measuring the delay from power-up to running a container, which seems very useful, versus the start of the kubelet reporting ready and then running a container, because you would probably have already been running containers.
A
I also want to say that even today, when the node first comes up, we'll generate an event, and then when the node is ready we generate an okay-node-is-ready event. This is what I talked about with the node status object. So basically we used to measure: has this node successfully joined the cluster, and is it ready to take any work from the API server. So we believe the node itself can expose other metrics to say, oh, how long did you—
A
Basically you have other metrics to measure things after your kernel startup time, because you can just parse your kernel log, and there are also — you can measure, if you look at the node perf tests, they actually measure all those kinds of things based on log parsing. But when we talk about Kubernetes by default, open source, not everybody's production is the same, and we should actually be as generic as possible. So node startup—
A
For me it is more like: okay, this node is ready to join this Kubernetes cluster. Right now — Derek, you were here when I talked about whether we can make Kubernetes self-contained, I mean not a product — and this is where the static pod discussion and all the other discussions come up: Kubernetes itself actually could have an API, like the kubelet publishing... we turned down that proposal.
A
I mean, I proposed that and the community turned it down; it was proposed a long time back. So basically the kubelet is always part of the cluster, and you have to say this kubelet is ready and serving its API. Originally we actually even wanted the kubelet to publish its API — so it could publish the pod spec, the pod API, right. That's what I originally thought; then we could have the start... but that was the older time, when we could start Kubernetes through Kubernetes and evolve it with the whole Kubernetes cluster.
D
I just want to say that in this context I feel like the metrics are mainly used for monitoring — monitoring metrics — so it's for calculating the CPU usage. In that case, I mean, if we add node startup time, it should be used to serve the purpose of CPU usage calculation, right? Then, in that case, the node-ready meaning may not make sense.
B
I want to make sure that we get clarity on this before we spend too much time digging deep into it. So I put an action item here: I'll summarize the discussion we had today on the bug, and I'll take this to SIG Instrumentation to clarify why Marek wanted the node boot time, because I think there are a bunch of different ways
B
we could implement that, and in the issue he says it's optional. So it sounds like we're mostly aligned: the container start time metric makes sense; on the node time there are some questions. If we either come to the conclusion that it's not needed, then that solves the problem, or if we can get more context on why it's needed, then maybe we can move forward. Does that sound good?
A
Sounds good, really good — thanks. And also, coming back, thanks for taking this one. One thing: I don't know whether the node problem detector — actually, if I remember, somewhere the node boot time is already exported as a metric; if some product wants to use it, I don't know, I think it could go into node problem detector. This is the compromise we had in the past. Let's move to the next one — Lee, hi.
G
Hi everyone. So, as part of the ephemeral containers PRR, I wanted to add a couple of new metrics to the kubelet to answer the following two questions: how many ephemeral containers are running on this node, and how many has this node started — how many were ever run on this node? So I added a couple of gauges and a couple of counters, and the counters are fine, but I think there's some overlap with existing metrics for the gauges.
G
Those existing metrics measure the kubelet's internal representation of containers — so running pods measures sandboxes, not pods, technically, and running containers measures just the list of CRI containers, not API-object containers — so that doesn't have the information I need.
G
I could plumb the information down and add them as a label or an annotation, which would be fine — and we actually even had the containerd authors ask for this behavior, so that would be beneficial — but before I knew that, what I did instead was add a new metric to the pod manager, which has access to all of these API objects — the pods and all of their different containers — and surface those as a gauge counting the number of pods and containers running in this kubelet.
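A hedged sketch of the two kinds of metrics under discussion — a counter of ephemeral containers ever started on this node, and a gauge of containers the kubelet currently tracks, broken out by container type. It uses the plain Prometheus client; the metric and label names are illustrative, not the ones in the PR:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

var (
	startedEphemeralContainers = prometheus.NewCounter(prometheus.CounterOpts{
		Name: "started_ephemeral_containers_total", // hypothetical name
		Help: "Number of ephemeral containers this kubelet has ever started.",
	})

	managedContainers = prometheus.NewGaugeVec(prometheus.GaugeOpts{
		Name: "managed_containers", // hypothetical name
		Help: "Containers currently tracked by the kubelet's pod manager, by type.",
	}, []string{"container_type"}) // e.g. "init", "regular", "ephemeral"
)

func init() {
	prometheus.MustRegister(startedEphemeralContainers, managedContainers)
}

// OnEphemeralContainerStarted is called after a successful start attempt.
func OnEphemeralContainerStarted() { startedEphemeralContainers.Inc() }

// SetManagedContainers refreshes the gauge from the pod manager's current view.
func SetManagedContainers(countsByType map[string]int) {
	for typ, n := range countsByType {
		managedContainers.WithLabelValues(typ).Set(float64(n))
	}
}
```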
G
So it's clear that we don't want both of these metrics — we could have them, they're measuring slightly different things — but the feedback I got was that we don't want multiple metrics; it confuses people. So my question for SIG Node is: which do we want to measure, what exactly do we want to measure, and maybe even what do we want to call it?
B
So my concern was not having multiple metrics — it was just that the naming made absolutely no sense to me and I didn't know what they were talking about. So I just linked, in chat, the PR which was adding the clarification that the running pods metric actually measures the number of sandboxes, not pods as API objects, or something like that.
B
So that's the sort of thing I think would be super helpful. And in the PR you had introduced this terminology "managed pods", and I was like, what is a managed pod — are not all pods managed? So I wouldn't have any issue with having multiple metrics; my concern is just that I want to know, as a cluster operator, what they mean.
G
Yeah, sorry to misrepresent your position — I've actually heard from multiple people that they prefer not to have multiple overlapping things, but, totally, yeah, I didn't mean to misrepresent what you said in the PR. Oh, and by the way, "managed pods" comes from the pod manager; I am not at all attached to this name.
A
I do think there's a need to introduce metrics for ephemeral containers running on the node. But I understand what Elana said, because the running pods and running containers metrics are the kubelet's internal representation — again, it's the kubelet's internal representation.
A
The reason — just like what Clayton talked about earlier, the synchronization issue, the race condition issue — even without those race conditions, Kubernetes by design accepts that the kubelet's view and the API server's view can differ. This is why we introduced it: because we cannot have the kubelet in the middle blocking a lot of API behavior. So there may be some time difference from the API server's view, and the desired state gets reconciled and eventually synced with the API server.
A
This is why, when we introduced those running pods and running containers concepts, it was so that we can monitor from the API server and also from the node side and see what the diff is, and eventually it should be consolidated, right — it should be the same. So, at the same time, I want the ephemeral container metric, because running containers could be the total on the node; an ephemeral container is a temporary container, and we should have a tracker for it.
A
I read the thread, and I don't disagree with that, and I know we should have a document to represent the history and legacy of what we have there. I believe we do have it, but maybe it's scattered across different documentation, and over the years maybe people just don't know, don't have the background anymore. But I do think that ephemeral containers should have metrics.
A
When I want a new feature — such as ephemeral containers, which took a really long time — the first thing is that it is guarded by metrics. So now we can see whether it is behaving well; then we can monitor it. We need to build a monitoring dashboard to say, oh, for a given cluster or a given node, how many ephemeral containers are running, and are those ephemeral containers just staying there; then we'd have a path to clean those things up so people don't abuse the feature. Anyway, I'm just sharing my top-of-mind thoughts here.
E
Yeah, so I guess thanks a lot for linking to the PR where the existing terminology was easily confused. Like, oh, I think the one with the running pods / sandbox clarification is a net win; I wish we could fix the keys, probably. For this PR, I have no—
E
This all seems super useful, and the "managed" terminology doesn't upset me, because the kubelet's job is to manage pods, so I think that's a fine prefix.
E
I like being able to distinguish by container type in the labels there, so I like that. While you were focused on just ephemeral containers, I like having this for the other container types, so I have no objection to what you're proposing here, Lee. I'd have to dig back into your code to make sure it's counting the right thing — the running containers metric we have, I think, historically would have been tripped up by the pause container, or not, when running on Docker versus the CRI.
E
So having this cleaned up — having something that represents the end-user view more than the internal subsystem view — makes total sense to me.
G
Okay, thanks, Derek. And I could also pursue a cleanup of the running ones, if that's of interest — these are all still alpha. The "running" nomenclature kind of gets to me, because it includes containers that aren't running.
E
I guess the one question I have is the motivation to know if the kubelet had launched an ephemeral container — I guess it's to know if it was a pristine node, I assume, or a pristine workload environment?
G
No — this is to help cluster admins with upgrades and downgrades, or feature enablement and turn-off: just to detect whether the feature has been used anywhere. We are pursuing a pod-API-level, sort of tainted, flag to inform whether or not a pod has ever had an ephemeral container.
A
Part of monitoring-driven development — that's what I call it, I don't know what other people call it, but I call it monitoring-driven development. So that's why I hope we just make the scope smaller: only the ephemeral container information, and not over-complicate the existing API. I think that's also Elana's concern — like, oh, we changed running pods to become managed pods and all those kinds of things, but we don't have documentation, because running pods—
A
The running containers metric has actually been legacy, used for a long time, and people may already be monitoring those things. If you want to change that, people will just say: oh no, no, don't change it, I already monitor that. So let's just add this kind of thing. If we want to do the bigger surgery and consolidate everything, I think we need to call that out — even, like, treat it as a node API change more seriously — and then let's call that out separately.
A
Basically, I would just suggest narrowing the focus down to only the ephemeral container part — is that okay for this one? Because I'm basically driving this as monitoring-driven development: for every feature I did in the past, my first question to engineers was how to monitor it, how the cluster admin would. And Lee actually is an SRE at Google, so he basically has this kind of momentum, like, a model of how to do those things.
G
Okay, and just to make sure I understand: so I am solving the problem that I need for ephemeral containers, but also surfacing the other types, not just ephemeral containers — and you're okay with that too, the container type label, as opposed to only a metric that's only ephemeral containers?
G
Yeah — so I think the metrics as-is here, modulo some better documentation. I'll go back and try to explain what they all are better — or have we agreed that this is okay to proceed?
D
Actually, I also have some questions about this, because to me this is not the kubelet's — it's not the actual state. I feel like it's important for the kubelet to report metrics for the actual state, because those are things you cannot get from anywhere else, but this one, to me, is just some categorized counting of some API objects; in that case you can just easily do it in the API server.
B
kube-state-metrics will scrape this from the API server; the API server is the authoritative source, not the kubelet, so we shouldn't be trying to get that from the kubelet.
D
Yeah, in theory even a controller can just read all the pods, count them, and also check which node each is scheduled to, and get exactly the same number, because it's just desired state and everything is in the API server. I feel like this — and especially, as Dawn mentioned, whenever we add metrics it's hard to deprecate them, because people may start depending on them.
G
I was just gonna say I think it's useful to know how many containers and pods the kubelet thinks it's managing, right, because the kubelet is not the API server. You could poll the intent, but the intent is different from the actual state. And also — I'm sorry, I don't actually know the answer to this — there are other types, like the static pods and mirror pods: are all of these reflected in the API as well? Well—
A
The way I slightly disagree here is: I also want every single thing the kubelet exposes and exports to be actual state. I believe this ephemeral container metric is also in that category, and the API server — like what you just said, that's the desired state. That's exactly how the kubelet today exports the running pods and the running containers: that's the actual state, right? So what—
E
This is in the pod manager, and that should be reflecting actual state — I mean, I'll have to run it to check. But I find this useful in the case of knowing: did a kubelet actually get the watch notification for what the API server said should be there, or not? I don't think the API server is always the one true source, and so as long as this is actual state, it's fine.
H
This is Lantao — I took a quick look at the code: he's making the call to start, you know, the ephemeral container, and then he's checking the result, so the counter is incremented after a successful result. It may not currently be running, so it's not actual state, but it is, you know, the fact that they were successful in running an ephemeral container of these various types.
A
I think this is back to the earlier state-terminology misalignment. I think it's most representative of some level of the actual state — now I understand where that came from. So it's not that the container is actually already running, but it is what the kubelet understands to be the ephemeral containers on this node that are supposed to run.
G
Okay, sorry — we're actually talking about two different metrics: the one that is updated there is the counter of "has an ephemeral container been run", and there's another metric, a gauge, of how many ephemeral containers are running right now.
A
Can we carry this one on? I think we really ran out of time. Can we carry on the rest of this? At least, from a high level, SIG Node will agree that the kubelet can expose some metrics representing the actual state, and that the ephemeral container count metrics actually are useful for the cluster admin — but how to represent them, whether we can narrow it down or not, and how we go, we can carry on on the PR. Is that okay?
A
Thank you, Lee. And [inaudible], sorry we didn't track our time well, so you missed today — next week your topic will be the first one; we will make sure your topic is the first one to discuss.