From YouTube: Kubernetes SIG Node 20200310
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
B
Hi. So we found a race condition in the kubelet that could cause a pod worker to stop functioning. We sent a fix for it, but I want to bring up a bigger topic that arose from it. We think the bigger problem here is that when a goroutine in the kubelet crashes, say in this case a pod worker goroutine crashed, the kubelet itself doesn't crash.
B
It recovers and keeps going, which makes this kind of issue hard to debug, because in this case the pod worker for a specific pod stopped doing any updates to the containers for that pod. But everything still looked fine, until you found out that, for example, the container actually crashed and the kubelet never restarted that container, and that may happen days or weeks after the panic happened.
B
So I think it's better for the kubelet to crash when a sub-goroutine crashes, to make debugging easier. I looked a little bit into the history of Kubernetes to see why we have this recovery behavior, and I put some links down below in the background. Actually, in 2016 we changed the behavior for all components in Kubernetes from recovering from the panic to actually panicking, and if you click into the 2016 link, I think we had some agreement about the kubelet as well.
C
D
How was the issue reported? I think this was discussed before; there was a PR, and the bug had been reported before as well, and I believe people looked into that one. They couldn't reproduce the problem, so there was a big discussion about whether to crash the kubelet or not crash the kubelet, similar to the discussion we just had. My understanding is that the current problem is this.
D
It appears when the pod worker goroutine crashes and the kubelet keeps working as normal, and fixing it would require the kubelet to self-recover: the kubelet would need to start a new goroutine and reconcile the state. That's the major problem. I don't recall exactly why, but because of its nature, if the kubelet crashes, we just start from scratch and then reconcile the whole state.
D
D
D
This is particularly problematic. Once you crash, the kubelet will recover and everything moves forward, but our concern is whether we really want to introduce that kind of fail-fast model into the kubelet itself, because we used to crash the kubelet too. If I remember correctly, we used to have something like the docker container ID where we crashed the kubelet a couple of times, and with the panic we ended up with a series of kubelet crash loops, and then it could not really come back, or even if it did come back...
D
B
The thing is, I can also check. My original proposal is that we change the behavior of this HandleCrash function. I think we can do a search in the kubelet code base to see where it uses this function and recovers from a crash. In this case, we found that it's used before the pod worker goroutine, and I think we should crash in that case. But we could do some research and gather in which code paths we have this kind of issue and categorize them, like in this case.
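For context, a minimal, self-contained sketch (not from the meeting) of the recovery pattern being discussed, assuming the real k8s.io/apimachinery util/runtime package; the worker names and the explicit ReallyCrash setting below are only there to make the example deterministic, while the proposal is about auditing which real call sites end up with this swallow-the-panic behavior:

```go
package main

import (
	"fmt"
	"time"

	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
)

// startWorker launches a goroutine guarded by HandleCrash, similar to how
// several kubelet loops are started. HandleCrash recovers the panic, runs the
// registered panic handlers, and then either swallows the panic or re-raises
// it depending on utilruntime.ReallyCrash.
func startWorker(work func()) {
	go func() {
		defer utilruntime.HandleCrash()
		work()
	}()
}

func main() {
	// Force the "recover and keep going" behavior under discussion so the
	// example is deterministic.
	utilruntime.ReallyCrash = false

	startWorker(func() { panic("simulated pod worker bug") })

	time.Sleep(time.Second)
	fmt.Println("process is still alive even though the worker panicked")
}
```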
B
D
B
So it's also possible, but that's basically changing the place where we have this recovery function, where we detect that the goroutine has crashed. Well, maybe there's another tracking mechanism to check, say a heartbeat for a goroutine, and create another one when the old one is gone.
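As a rough illustration of that idea, and not a proposal for the actual kubelet code, here is a small supervisor sketch that restarts a worker goroutine whenever it exits or panics (all names are made up):

```go
package main

import (
	"fmt"
	"time"
)

// supervise keeps a worker goroutine alive: whenever the worker returns or
// panics, it is restarted after a short backoff. This is the "detect the old
// goroutine is gone and create another one" idea from the discussion.
func supervise(name string, worker func()) {
	go func() {
		for {
			done := make(chan struct{})
			go func() {
				defer close(done)
				defer func() {
					if r := recover(); r != nil {
						fmt.Printf("%s panicked: %v\n", name, r)
					}
				}()
				worker()
			}()
			<-done // worker exited, normally or via panic
			time.Sleep(time.Second)
			fmt.Printf("restarting %s\n", name)
		}
	}()
}

func main() {
	supervise("podWorker", func() {
		panic("simulated crash")
	})
	time.Sleep(3500 * time.Millisecond)
}
```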
B
A
D
Okay, there was one case: when docker is stuck and not responding for a container, we used to crash, but then later we changed that because it caused a lot of other side effects. So instead we record a signal and say we want the kubelet to terminate and restart, right, so we ask the kubelet to crash itself, but we don't artificially crash the kubelet immediately, because we still want the kubelet to report, we want the kubelet to keep reporting what is happening.
D
And then there's the node: if we want to say whether this node is healthy, the kubelet is the key for the node, so we want the kubelet to respond so we know the node's healthiness. Then there are a lot of metrics that are exported through the kubelet, so that's kind of how we made the decision. Of course, all those things made total sense at a given time. Now Kubernetes is totally different, it has evolved a lot, so we could change it, but my main concern is this.
D
In this case it's definitely helpful, but then we start to reintroduce that crash-the-kubelet mode, and there are many other places that could crash the kubelet. It could be that one module introduces a new mode, then another introduces a new mode, and then we end up in a crash loop. That's my concern. A lot of people say crashing may also be okay, because the kubelet can recover on restart, but my concern is certain features inside the kubelet.
D
We clearly don't want the kubelet itself to checkpoint, but the kubelet is also extensible, it has a plugin model, so there are components plugged in and invoked by the kubelet that actually do some checkpointing. One of my concerns is that, even if the kubelet crashes itself, we should at least be able to detect whether those checkpoints got corrupted. The checkpoint handling right now is more in a plugin mode, the same as the kubelet's.
D
A
A
E
D
A crash is really easy for people to spot. When we looked into these kinds of problems before, we did notice that a goroutine crashed and there was a state mismatch, and a lot of the time we end up suggesting that the user or customer delete that particular pod or deployment instance. But that's the problem: the pod actually appears to be running successfully.
D
The pod is in the Running state, but the container could actually be in a bad state that isn't detected, because the pod worker just crashed, and that is much harder to detect, right? So this is really, yeah, thanks for finding this bug. I just want to say that first, because we know this class of problem.
A
F
We should definitely make sure that there's a metric for the number of times the kubelet is starting, surfaced up through the components. Anybody who has a Prometheus system should have an alert for that today; I'm pretty sure that doesn't exist, because they've never seen it fire. This would be an obvious one: if the kubelet is starting multiple times at a rate higher than, say, one every ten minutes, or one every hour, something is wrong.
D
Yeah, so how many times... I forget which metric it is, but basically that is what is used to detect this kind of problem. Anyway, I just want to throw out one of my major concerns here: each component which is plugged into the kubelet, in the past, did checkpointing, and those checkpoints, due to a kubelet crash, ended up corrupted, and then that corruption caused the kubelet to not really be able to start again.
A
I forget... basically everyone in the world is probably launching the kubelet on a Linux host under some systemd unit that's going to have some restart-on-failure policy. I can't recall right now if there's a simple way to figure out from systemd how many times it has done that restart; I want to go check that. But assuming anybody in production could wire that into their monitoring system, I think that's probably easier than taking on a new problem.
A
B
G
D
We can quickly discuss it. I think what you really want is this problem exported to the cluster, which would say: oh, this kubelet is in a crash loop. And I'm not sure about systemd.
D
I know systemd has the restart count, but today I don't think... well, they do have the interval between each restart, but they don't have the: how am I going to measure whether this is in a loop?
D
Whether this is a crash loop, like the total interval and how frequently the kubelet restarted in it, and then, I think, well, you need attention here, so let's alert on that signal. I think that's what admins want, and I know the node problem detector: there we detect a problem based on a given total interval and how frequently you restart the kubelet or docker, and then say whether this node is okay or not okay.
A
H
H
So, basically, what we're talking about is a way to make it possible to share limits across containers in a single pod. A little bit about my motivation for proposing this: I'm working on a product called Business Application Studio at SAP, which provides a development environment, an IDE, as a pod, so each user basically gets his own pod that simulates a full development environment in our product.
H
So if one user wants to compile using one tool and another user compiles using a different tool, they would probably benefit from different limits and different memory requirements for each of the containers providing those tools. And what this is about is enhancing or extending the way that Kubernetes sets up the cgroups for the pod, so that instead of setting the resources on the container level, it would be possible to set the resource limits on the pod level. There is already a cgroup for each pod.
H
Basically, if we look at what exists today, every quality of service class, Guaranteed, Burstable, and BestEffort, has its own cgroup, and under that there is another cgroup for the pod itself, and under that one for each container, including the pause container. There is an additional cgroup there, and the limit is actually set at that level. For Guaranteed, it already sets the limits on the pod level today as well, but it doesn't actually matter, because there is no way you could escape the limit that is set on the container level.
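For reference, a small sketch of the hierarchy being described; the exact cgroup names come from the kubelet's cgroup manager and depend on the cgroup driver, so the literal paths below are illustrative only:

```go
package main

import "fmt"

// podCgroupPaths prints an illustrative cgroupfs layout for a burstable pod:
// a per-QoS-class cgroup, a per-pod cgroup under it, and one cgroup per
// container (including the pause/sandbox container) under the pod.
func podCgroupPaths(podUID string, containerIDs []string) []string {
	pod := fmt.Sprintf("/kubepods/burstable/pod%s", podUID)
	paths := []string{"/kubepods", "/kubepods/burstable", pod}
	for _, id := range containerIDs {
		paths = append(paths, pod+"/"+id)
	}
	return paths
}

func main() {
	for _, p := range podCgroupPaths("1234-abcd", []string{"pause", "ide", "compiler"}) {
		fmt.Println(p)
	}
}
```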
H
A
H
So basically, my proposal is to allow the user, if they so decide, to specify to the kubelet that for specific pods there should be a limit set on the pod level, even if it was not set on each and every one of the containers in that specific pod, and this is opt-in behavior. So if the user wants to keep the current behavior, that's okay, no change.
D
Yeah, I understand where you're coming from, but think about why, as you commented, it would be easier for the user to set that limit or request at the pod level instead of at the container level. The way we see it, each container actually represents a binary or an application, so it's much easier to size it through the container cgroup.
D
For me, it's much harder to size it at the pod level, because the pod-level resource usage is not always controlled by me as a developer. I want to roll out a certain application, and I want to benefit from all the functionality and services Kubernetes gives me, and when I do the deployment I could have some other containers, ones not under my control, sharing the same pod. So from that perspective it's much harder to size it at the pod level. I just want to... yes.
H
I completely agree; I will get to this. I have a slide, I think slide 294, to discuss the two alternatives for implementing it. Let me just get to that. Okay, the advantages of allowing these pod-level limits: well, for me it makes sense because it means I don't have to micromanage the container limits in my specific use case.
H
And I don't need to give unlimited resources to specific containers, because the current behavior is that if I don't specify a limit for all of the containers, then basically those containers that don't have a limit are unlimited, and that also causes some sort of noisy-neighbor problem. So pods in the Burstable quality of service level that have containers which are not constrained are actually able to consume all of the resources that belong to the host, and this would make it possible to prevent that. Yes.
A
H
That is true, but still there would be contention between the pods that are running on the node. Even if the kubelet itself or the system itself would be able to continue working because they have their own slice, the Burstable pod would still be able to consume more resources than even a Guaranteed pod living on that same single node.
A
H
H
I
For the pod cgroup, well, it sets up the cgroup that the containers run in, and it'll add the overhead into it, because instead of just running those individual containers, you could be running pieces of CRI-O or containerd, perhaps, or you could be running a virtual machine and using that to isolate instead of just namespaces. So essentially the kubelet would go in and size the pod cgroup according to the sum of the requests plus that overhead, and then it'll update the CPU shares as well, and...
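A simplified sketch of what is being described here, not the kubelet's actual implementation: the pod-level cgroup size is derived by summing the per-container values and adding the RuntimeClass pod overhead. The type and field names below are made up for illustration:

```go
package main

import "fmt"

// Resources is a simplified stand-in for a container's (or the pod overhead's)
// CPU request in millicores and memory limit in bytes.
type Resources struct {
	CPUMilli    int64
	MemoryBytes int64
}

// podCgroupSize mirrors, in rough form, the behavior described above: the
// pod-level cgroup is sized to the sum of the per-container values plus the
// RuntimeClass pod overhead (sandbox/VM/runtime processes that live in the
// pod cgroup but outside any container cgroup).
func podCgroupSize(containers []Resources, overhead Resources) Resources {
	total := overhead
	for _, c := range containers {
		total.CPUMilli += c.CPUMilli
		total.MemoryBytes += c.MemoryBytes
	}
	return total
}

func main() {
	containers := []Resources{
		{CPUMilli: 500, MemoryBytes: 256 << 20}, // e.g. an IDE container
		{CPUMilli: 250, MemoryBytes: 128 << 20}, // e.g. a compiler container
	}
	overhead := Resources{CPUMilli: 100, MemoryBytes: 64 << 20} // e.g. VM runtime overhead
	fmt.Printf("%+v\n", podCgroupSize(containers, overhead))
}
```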
H
D
We don't need to figure out the interaction between that feature and this one yet, but I think we treated the overhead as the tax you have to pay once you install that runtime and use that feature. It's kind of a tax that all the pods have to pay. So I think, in kubelet terms... yeah.
I
D
I also think the pod overhead is actually what makes this pod-level limit possible, because we talked about pod-level limits many times before, and one proposal actually stemmed from the pod overhead work. That's when we started to talk about virtual machines and about different container isolation technologies, all those kinds of things; when we think about overhead, it's harder to...
D
I
Yeah, I was going to say that I'd heard a lot of other people as well describing a desire for pod-level limits. I think the difference is that they don't want to care about anything with respect to container resources. So whereas I see this KEP as kind of talking about Burstable specifically (for Guaranteed it doesn't really matter, it's already all set up), I think the other end of the spectrum is where they don't want to specify anything, and they want to have it set up correctly.
I
H
So maybe I'll get to the next two slides, which are the two different implementation options, and then maybe we can discuss exactly what the best way to implement this would be. Obviously the first implementation option would be to put this on the pod level, something like this: a resources section on the pod itself, and then you would limit it however you want. But actually, I agree with what was said here before.
H
It's not that convenient, because that would mean that I would need to calculate all of the limits myself, and I would need to somehow figure out what the correct limit for the pod is. In cases where you have something like Istio that is adding additional containers to my pods, I'm not actually in control of that and don't necessarily know how to deal with it. Also, if you upgrade Istio and suddenly the container limit that it injects is different, then everything breaks. So I'm not really a fan of this option.
H
I was thinking of something a little different. What I was thinking was to set a boolean on the pod level. I'm open to suggestions for the name, I'm not married to this name, it's just kind of hard for me to come up with a more descriptive one. Basically, what this would do is set up the cgroups in the following way.
A
H
It would not impact this, because we would sum that limit and add it into the cgroup on the level of the pod. So if it says that it needs another, I don't know, 128 megabytes, then that is in addition to all of the limits that are defined on the rest of the containers that are actually in my pod, okay.
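To make the option being described concrete, here is a hypothetical sketch; none of these field names exist in the Kubernetes API, they only illustrate the "opt-in pod-level amount on top of the container limits" idea:

```go
package main

import "fmt"

// hypotheticalPodSpec is NOT a real Kubernetes API type; it only illustrates
// the option under discussion: an opt-in pod-level amount that is added on
// top of the per-container limits when sizing the pod cgroup.
type hypotheticalPodSpec struct {
	SharePodResources bool    // the proposed opt-in boolean (name is made up)
	ExtraPodMemoryMiB int64   // extra shared memory for containers without limits
	ContainerLimitMiB []int64 // per-container memory limits; 0 means "no limit set"
}

// podMemoryLimitMiB sizes the pod-level memory cgroup: the sum of the explicit
// container limits plus the extra shared amount. Containers without their own
// limit are then only bounded by this pod-level value.
func podMemoryLimitMiB(p hypotheticalPodSpec) (int64, bool) {
	if !p.SharePodResources {
		return 0, false // opt-out: keep today's behavior, no pod-level limit added
	}
	total := p.ExtraPodMemoryMiB
	for _, l := range p.ContainerLimitMiB {
		total += l
	}
	return total, true
}

func main() {
	spec := hypotheticalPodSpec{
		SharePodResources: true,
		ExtraPodMemoryMiB: 128,
		ContainerLimitMiB: []int64{256, 0, 64}, // the middle container has no limit
	}
	limit, ok := podMemoryLimitMiB(spec)
	fmt.Println(limit, ok) // 448 true
}
```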
A
A
H
So I think, if we abstract it on the runtime level, then it becomes a lot harder to use, because that would mean that if I want to set a budget or a limit, and a different budget or limit for different pods, I would need to define a RuntimeClass to go along with each different budget or limit, and that would create an explosion of RuntimeClasses. Also, you might need to start giving permissions to developers, which is not necessarily a good idea.
A
I
So today we just do a summation of the containers, and that's what we set it all to. Well, actually we don't; on the runtime side we just take and mimic what is done by the kubelet already for that initial pod cgroup value. If it was explicit, you know, if the pod spec said that this is what the pod-level resources are, that'd be great.
J
J
H
J
H
A
The other question I have is: what should the OOM killer do? How do I decide which container to kill when one is consuming too many resources with this model? Do I do anything differently? Today we set the oom_score_adj relative to usage versus request, and I'm not sure you would want those same semantics when sharing resources across your containers. Had you thought about that? I did not...
H
A
When we previously discussed this, David, I feel like I've always been interested in overcommitting the pod itself, and so, yes, being able to set requests and limits on the pod was useful if I wanted to overcommit it. But if that option isn't one you want to deeply pursue, then maybe just focus on option two.
I
If we're going to go ahead and do this, it would be nice to be able to do something like best effort, which I think option two isn't necessarily going to be able to cover. And, naively, not having thought about this nearly as much as you, I would think option one would require a decent amount of validation.
I
Checking, you know, what the pod level is versus what the different container requests are, making sure that one is bigger than the other. But for me that would maybe be the most useful, so that the end user can just go ahead and say: all I care about is setting the pod, and the workloads can figure it out, I don't really care. You know, if you imagine handing out a pod to somebody who's doing different container builds or testing or things like that.
I
H
There has to be at least one container that sets limits, because otherwise you're in the BestEffort quality of service class. Oh, I thought that was what... oh, so you mean, sorry, just so that I understand: that instead of being in the Burstable quality of service class, we move this towards BestEffort and then have the resources set on the pod level, or...
A
C
H
It would have basically no effect in that case; it would be a definition that you would need to validate, you know, when the pod is being checked to see that everything is okay. In the end, it would mean that the developer would need to verify in his head that the sum at the pod level also matches the specific values in each of the containers. It's additional development overhead, and then...
A
C
D
I do have a concern. There are a couple of concerns I have, but I just want to name one, because I haven't thought the others through clearly yet. So, one concern I have is this: for the use cases you described, I think it's totally fine with option two, but there are other use cases where the user really knows what the containers are and what each container's resources should be, so they want to carefully think about what the peak usage is.
D
D
Well, I don't know, and also if they don't set a specific limit, in my thesis you could end up with, say, a sidecar container in that application using it all: because you marked this as a shared pool, the limit is potentially tied to the pod, and a helper container or whatever container could be using all of the usage, all of the limit in that pod. So to me there are potentially even worse priority issues here.
D
So I carefully define my container, like the web servers and all those kinds of things, and I measure and say: okay, the average usage is, say, one gigabyte, that's what I set as the request, and then the peak usage is the limit, and I know my workload well as the one who designed it. So now somebody provides something and injects it into my pod for the deployment, but they couldn't do the same, because they have no idea: they provide infrastructure or services for everybody, so they cannot predict what the real usage is.
D
So they are unlimited and could be using most of the resources, and then that's back to what Mike Brown actually mentioned in those cases: a lot of the time it's not easy to release the memory once it's been burned through. Certain types of memory cannot even be reclaimed, so it always gets charged up to the pod, even to the pod-level cgroup, so even after a container dies, it's not like the resources are freed; you see there's certain slab memory.
D
It's hard to release it, and it goes up to the pod level, and how do you destroy it, it goes up to the root level. So a lot of the time, if you measure carefully, you can see that slab usage keeps increasing, and you end up having to force the node to reboot to release that kind of memory. In a case like that, it could really hurt you even when you designed your application well. I just want to surface this one; this is one of the concerns.
H
H
A
That was going to be my follow-on question, which was: do you want scratch space to be shared across containers, like ephemeral storage, in addition to CPU and memory? If you were using memory with an emptyDir backed by tmpfs, then you probably would have hit the situation that Dawn talked about. And if you're using ephemeral storage, either scratch space in the container or a local volume or something, it wasn't clear if you want the scratch space to be shared, because I could see an argument that that's also hard to size on a per-container basis.
A
So how is it, is that also a cgroup, or something different? It's not enforced at a cgroup level, but there is a loop in the kubelet that enforces ephemeral storage. So, in the interest of time, I know there were other topics and we've spent a lot on this, maybe we could give a quick chance to let others get their items raised, whether it's quick reviews or... I see someone is also typing a huge number of notes here.
A
D
I want to say thank you for bringing this up, and it's good timing to revisit this and iterate on those things, but I think this is a big topic and a complicated problem actually, and we cannot draw conclusions now, so we need to carry on the discussion. Just wanted to make sure you know. Thank you, thanks.
C
Yep, just to finish up: for beta, the last one, with the recent merge we did break the Windows platform, and what I'm wondering is: is there a CI job or something that we should have run to prevent this in the future? I'm sitting there wondering, how did we miss this and let this in, and...
A
A
A
C
This is where the pod admit handler was nil. Windows is returning nil for the handler, and at some point it dereferences it and blows up, so the kubelet fails at startup. They've got a pull request over there and there have been some comments. So the other ask is: okay, we broke them, let's maybe help get it in.
G
L
F
So I tried to capture all of it in the agenda. I've just been spending time on it; as I was looking at something in the kubelet, it caught my interest around the status manager, since we've gone in and touched the status manager. I think there's a lot of room on the table for improving the end-to-end latency from the time we detect that a status update is necessary in the kubelet to the time it's written to the API server, especially on busy nodes.
F
So one of the things is, we kind of have a really simple model: here's the stream of updates that need to happen, and then periodically resync everything. That's a good mechanism. The problem is that after a certain point, because it's a single-threaded process, when you get to more than about ten pods on a node, there's a good chance that you're just in the reconcile behavior continuously, which isn't bad, but it maximizes the p99 latency, which is pretty bad. So I was looking at a couple of simple changes, getting familiar with the code.
F
The big thing was that there's definitely some room for reorganizing the code just to make it more readable. While doing that, Jordan and I caught a couple of bugs that are, as we've evolved, some obvious things in it that we can fix. I talked to David Ashpole and he gave me his brief on what he'd done before; there were probably like four or five things we could look at.
F
I did some simple ones, which is trying to avoid the live GET, which is actually very expensive in big clusters, because reads are blocked on range locks today in some cases. So a big cluster with lots of nodes doing live reads of pods all the time is very bad, and there's not actually any safety when we do a live read, because in between when you do the live read and when you generate a patch and send it back to the API server, multiple other writers could have written.
F
So I think that right there is probably the biggest win. Another one I was looking at was prioritizing which updates go out: the updates that really matter are when a pod becomes ready or not ready, and when a pod transitions to Succeeded or Failed and is ready to be cleaned up. Right now I'm just prototyping and I will pull this together and write it up, but I think there's a ton of low-hanging fruit.
F
That should help us cut the p99 of status updates and pod end-to-end times pretty significantly. It certainly may not show up in the overall numbers, but at least in an e2e run, just as a micro-benchmark, I was able to cut the total time spent waiting to send a status update to like a fourth or a fifth of what it was before, just with some basic improvements, so I think there's some more room there too.
F
So we can get tighter tolerances on: something changes, and how quickly can we get the API server to reflect that? Did you mean to say 800 seconds? So if you sum, in an e2e run, how long we spend waiting from the time we detect that a status update is necessary to the time where we send it...
F
It's 800 seconds over the whole run. With some basic improvements, which I think are still sound from a correctness perspective (and obviously we have to be really careful, because we've had lots of issues with stale reads; the thing is, the live read doesn't prevent stale writes either, the way we're doing it, so we still have some issues there), I was able to get down to about 200 seconds of waiting, and that was just by avoiding the live read and using a recency cache.
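As a rough illustration of the recency-cache idea, and not the actual prototype being described, a small sketch of remembering the most recently observed copy of each pod so that computing a status patch does not require a live GET; the types and names are made up:

```go
package main

import (
	"fmt"
	"sync"
)

// observedPod is a trimmed stand-in for the data a status writer needs as a
// patch base: the last version of the pod this process has seen.
type observedPod struct {
	ResourceVersion string
	StatusPhase     string
}

// recencyCache remembers the most recently observed copy of each pod, for
// example fed from informer events and from our own successful writes. A miss
// (or a write conflict) is the signal to fall back to a fresh read.
type recencyCache struct {
	mu   sync.RWMutex
	pods map[string]observedPod // keyed by pod UID
}

func newRecencyCache() *recencyCache {
	return &recencyCache{pods: map[string]observedPod{}}
}

func (c *recencyCache) record(uid string, p observedPod) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.pods[uid] = p
}

// patchBase returns the cached copy if present; ok=false means the caller
// should do a live GET (or requeue) before sending the status patch.
func (c *recencyCache) patchBase(uid string) (observedPod, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	p, ok := c.pods[uid]
	return p, ok
}

func main() {
	cache := newRecencyCache()
	cache.record("uid-1", observedPod{ResourceVersion: "42", StatusPhase: "Running"})
	if base, ok := cache.patchBase("uid-1"); ok {
		fmt.Printf("patch against resourceVersion %s instead of a live GET\n", base.ResourceVersion)
	}
}
```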
F
F
It doesn't even matter, though, because the thing is, the current logic somewhat depends on a live read, and that doesn't guarantee you anything, because in between the time the read completes, someone else can go write it. So doing an inconsistent operation after a consistent operation doesn't protect us; we just have to be sure we're consistent with ourselves. The other part of this is...
F
As part of this, we need to add a lot more e2e tests about multiple writers explicitly. On the previous work I've done for pod termination, just having an e2e test that exercises pod termination found five or six issues. I think we need to have a multi-writer e2e test (the kubelet plus another writer of pod status) that specifically stresses this, and I'm sure we'll catch stuff today that we still need to fix.
F
There was definitely... so I flagged something for Mrunal to look at with CRI, which is that the CRI doesn't return typed errors. So in a lot of error cases we do the worst possible thing, like when you try to start a pod that's already been deleted, or start a container that's already been deleted.
F
The error message says this container doesn't exist, and in a teardown case we could probably cut one or two seconds off a lot of teardown loops because of that, until, you know, the PLEG catches up. But because we don't have a typed error, we just do the naive thing, which is correct, which is retry later. There's probably a big win on pod teardown: if we can get typed errors back from CRI for not-found, that alone would probably save a lot per pod.
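Since the CRI is gRPC-based, the natural way to get typed errors is gRPC status codes; a minimal sketch of that idea (the function here is a made-up stand-in, not an actual CRI method, and this is not the kubelet's code):

```go
package main

import (
	"fmt"

	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// removeContainer stands in for a CRI runtime call. Today the error that
// comes back for a missing container is effectively an untyped string; the
// suggestion in the discussion is for runtimes to return a typed error
// instead, e.g. a gRPC status with codes.NotFound.
func removeContainer(id string, exists bool) error {
	if !exists {
		return status.Errorf(codes.NotFound, "container %q does not exist", id)
	}
	return nil
}

func main() {
	err := removeContainer("abc123", false)

	// With a typed error the caller can treat "already gone" as success in
	// teardown loops instead of retrying later on a generic failure.
	if status.Code(err) == codes.NotFound {
		fmt.Println("container already gone, nothing to tear down")
		return
	}
	if err != nil {
		fmt.Println("transient error, retry later:", err)
	}
}
```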
F
F
And the eventing is actually pretty good; it's just the interactions, like the pod worker interacts poorly with the status loop and vice versa, I think. One of the things we can do is look for places where the information flow is just bad. The nice thing was that most of this was pretty obvious.
F
Just looking at the kubelet tells me that nobody's really sat down and put the kubelet through its paces on some of these, like: what happens if I delete and recreate, or create and delete, a pod immediately, over and over? A bunch of issues jump out. Getting more e2e tests in that verify those kinds of flows will put us in a much better spot in general. Yeah, we test it in CRI, but it makes sense at the e2e level as well. Yeah, and I...