From YouTube: Kubernetes SIG Node 20210914
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
All right, well, welcome everyone to the September 14th SIG Node meeting. The meeting is recorded and will be uploaded later to our YouTube channel for folks who can't attend today.
B
In terms of what's going on: there are a lot of PRs coming in and the queue is growing, mostly because we have many PRs hanging around while we figure out exactly what to do with the static pod regression and how to proceed. We have a cherry-pick, we have end-to-end tests, we have a couple of attempts to fix it. Some of that growth is expected because it's work in progress, but we also have more PRs coming in, and the merge rate is not great right now.
B
So I think, as people come back from vacation, maybe we'll pick up speed and start merging faster. I checked the closed PRs; there is nothing that rotted away that requires attention, so this is great. We are not degrading on that front, and we are actually paying attention to everything that's coming in. All the closed PRs were properly closed PRs, so yeah. If you think you missed something, just check out what people are working on. Thank you.
A
Yeah, thanks. Sorry, the one thing I was just trying to make sure of: I thought we got through every enhancement that was expected for the 1.23 release, but if folks feel like there was a gap, please let us know. I know at least that's where a number of us were focusing. All right, so let's transition to the business at hand. Elana, you have the next item here on the static pod regression.
C
Yeah, I just wanted to put this on the agenda to make sure that there was some visibility and folks were aware of the scope and impact and all that jazz. This is marked as a critical urgent bug; it was introduced in 1.22 as part of the pod lifecycle refactor, and we've been meeting a bunch.
C
Myself, Ryan, and Jordan Liggitt have been working on trying to get this fixed. Essentially, if you take a static pod, remove its manifest, and then re-add that manifest unchanged, that pod will go into an error state, which is a regression. So if you have static pods and you're not touching the manifest, no problem, you'll be fine; but if you do that, it will cause the pod to error.
C
We don't want that, so we're trying to get it fixed, and it is proving to be relatively complex, and we don't really have a test case to prevent something like this from happening. So I've been working on an end-to-end test for this. I have something that's definitely catching it on head and on 1.22, but it passes on 1.21 and earlier. However, we're still talking about whether we also want to add more test cases for more coverage.
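To make the scenario concrete, here is a rough, standalone sketch of the reproduction that test needs to cover; it is not the actual e2e test in kubernetes/kubernetes, and the manifest path, node name, namespace, and image are assumptions about the test environment.

```go
// Standalone sketch of the regression scenario (not the actual e2e test):
// write a static pod manifest, wait for the mirror pod to run, remove the
// manifest, re-add it unchanged, and check that the pod comes back Running
// instead of landing in an error state.
// Assumptions: kubelet watches /etc/kubernetes/manifests, the node is named
// "node-1", and the pause image is already pulled.
package main

import (
	"context"
	"fmt"
	"os"
	"path/filepath"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

const manifest = `apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: k8s.gcr.io/pause:3.5
`

func waitForRunning(cs kubernetes.Interface, name string) error {
	return wait.PollImmediate(time.Second, 2*time.Minute, func() (bool, error) {
		p, err := cs.CoreV1().Pods("default").Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			return false, nil // the mirror pod may not exist yet
		}
		return p.Status.Phase == corev1.PodRunning, nil
	})
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	path := filepath.Join("/etc/kubernetes/manifests", "static-web.yaml")
	mirror := "static-web-node-1" // mirror pod name is <name>-<nodeName>

	os.WriteFile(path, []byte(manifest), 0o644)
	fmt.Println("first start:", waitForRunning(cs, mirror))

	os.Remove(path)              // remove the manifest...
	time.Sleep(30 * time.Second) // ...give the kubelet time to tear the pod down...
	os.WriteFile(path, []byte(manifest), 0o644) // ...then re-add it unchanged

	// On 1.22/head the recreated pod ends up in an error state; on 1.21 and
	// earlier it returns to Running.
	fmt.Println("after re-add:", waitForRunning(cs, mirror))
}
```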
C
Is this exactly the way that we want to shave this yak, or do we want to approach this test case in a slightly different way, and so on and so forth? I also don't know if we have a working fix yet. The latest that I heard from Ryan was that with Clayton's patch the pod probe workers might not restart. Is that right, Ryan? Do you want to jump in?
E
Yeah, maybe. So there is an end-to-end test on the PR that Clayton wrote, and I think it's working for the most part. We're running it through OpenShift CI and testing it there as well, and we're getting some flakiness on probes, but it may not be related to the fix. So I'm diagnosing some of that now.
A
So I just had a question. I thought a lot of this came down to UID assignment. When I was looking at this, I think we were allowing static pods to assert their own UID.
C
Yes. And the other thing that's a little bit weird is how we calculate the UID of a static pod. It basically only includes the pod name, the contents of the file, and I think maybe the node name. So if you don't change that manifest, if everything stays the same, or if you have a static UID set in that pod manifest, then from the kubelet's perspective it has the same UID, and that's causing a lot of the weirdness.
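For illustration, here is a simplified sketch of that UID derivation: hash the manifest contents together with the source file and node name, so an unchanged file always yields the same UID. This is not the exact kubelet code, just the shape of the idea.

```go
// Simplified sketch of how a static pod gets a deterministic UID: hash the
// pod definition together with its source (file path and node name), so an
// unchanged manifest always yields the same UID. Illustration only, not the
// actual kubelet implementation.
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

func staticPodUID(manifest []byte, nodeName, filePath string) string {
	h := md5.New()
	h.Write(manifest) // the pod spec as read from disk
	fmt.Fprintf(h, "host:%s", nodeName)
	fmt.Fprintf(h, "file:%s", filePath)
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	manifest := []byte("apiVersion: v1\nkind: Pod\nmetadata:\n  name: etcd\n")
	// Deleting and re-adding the identical file produces the identical UID,
	// which is why the kubelet treats it as "the same pod".
	fmt.Println(staticPodUID(manifest, "node-1", "/etc/kubernetes/manifests/etcd.yaml"))
	fmt.Println(staticPodUID(manifest, "node-1", "/etc/kubernetes/manifests/etcd.yaml"))
}
```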
C
But I guess that behavior has been in kube for so long, probably six years unchanged, that people will expect it to still work. So I was like, well, isn't that a bug? Shouldn't it just get a new UID if we delete and recreate the thing? And the answer is maybe not, since people might rely on this behavior and that would break them. Yeah.
C
Yeah, I know that some people are definitely using this; I think OpenShift was using it at some point. But in terms of whether we want to continue to support this behavior, I think the consensus, at least from Clayton and Jordan, is that we should: there's a pretty high risk of breaking people if we go and change this. So, okay.
C
I asked the same question, because we also saw some weirdness in terms of pod lifecycle on static pods, where, for example, if you restarted a node, then from the node's perspective it would go and possibly incorrectly update the mirror pod. That pod would transition from Running to Pending, which is not exactly what we wanted, because the UID was the same. But that may in fact just be a bug with how the mirror pods get created for static pods and not actually an issue with the UIDs.
A
Okay. Was there more that we wanted to bring up, Elana, or is that it?
C
There was just one more thing we wanted to check: that we have all the right reviewers on Clayton's patch. Sergey, do you know if Lantao could maybe take a look at this?
D
Yeah, sorry, I just replied down in the chat. I can take a look.
C
Great, yeah. You might want to take a look at the bug as well, and the discussion on the bug, and that should hopefully catch you up. I included links to, I think, everything relevant in the agenda, so you should be able to find that. Not a problem.
A
Okay, excellent. Well, thanks, Elana. The next item here was looking for reviewers on a new KEP.
G
I'm speaking for the KEP author; he put the KEP up a couple of months back, and the basic idea was that we want to introduce a way to reject admitting pods based on the node's properties.
G
So for us, the use case is that Fargate, for example, does not allow privileged containers. Fargate as a compute platform, or technology, does not allow privileged containers, and by the time the pod reaches Fargate it's too late. So we would want this both from a security point of view and from a usability point of view.
G
You would like to add handlers that look at pod properties, for example labels or the OS field, and then reject, either with a hard reject or a soft reject, depending on whatever it is.
G
We would want to try and target it for 1.24, but in general we want to see if this seems like a good idea, and we think there are other use cases that the community can also benefit from. Recently, for example, Windows introduced the OS field, but you could also have things outside the pod that you can exec into for different operating systems and determine different properties of different OSes. All the node has to do is install that particular plugin, regardless of whether it's exec or gRPC or whatever it is. I don't know if I summarized it well enough.
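As a hedged illustration of the kind of node-level admission check being described (rejecting privileged containers on a platform like Fargate), here is a small sketch. The type and function names are hypothetical, and the KEP itself proposes a pluggable exec/gRPC mechanism rather than anything hard-coded in the kubelet.

```go
// Illustrative sketch of a node-level admission check that rejects a pod
// when the platform cannot run privileged containers. The shapes here are
// hypothetical; they only mirror the general form of a kubelet admission
// decision.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// AdmitResult is the outcome of an admission check.
type AdmitResult struct {
	Admit   bool
	Reason  string
	Message string
}

// rejectPrivileged refuses pods that request privileged containers.
func rejectPrivileged(pod *corev1.Pod) AdmitResult {
	for _, c := range pod.Spec.Containers {
		if c.SecurityContext != nil &&
			c.SecurityContext.Privileged != nil && *c.SecurityContext.Privileged {
			return AdmitResult{
				Admit:   false,
				Reason:  "UnsupportedPrivilegedContainer",
				Message: fmt.Sprintf("container %q requests privileged mode, which this node does not allow", c.Name),
			}
		}
	}
	return AdmitResult{Admit: true}
}

func main() {
	priv := true
	pod := &corev1.Pod{Spec: corev1.PodSpec{Containers: []corev1.Container{{
		Name:            "app",
		SecurityContext: &corev1.SecurityContext{Privileged: &priv},
	}}}}
	fmt.Printf("%+v\n", rejectPrivileged(pod))
}
```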
A
I'll take a review, and we can make sure it's on the 1.24 queue, I guess. Unfortunately, I had missed this for 1.23, so I'll follow up on that, and we can iterate on the KEP between now and then. Is that fine?
G
Yeah, that's perfect. We were thinking the same; 1.23 was too soon for us anyway, so 1.24 would be perfect.
A
Okay. The thing I was trying to recall here was this:
A
I couldn't remember if the KEP author talked through the desired way to deliver this plugin: maybe as a static pod itself, or outside of a Kubernetes management model, maybe just as a daemon on the operating system, externalized. Was that discussed?
G
I'll take that question, because it's fundamentally a design question, and the initial idea, if I remember correctly, was that it would reside on the node itself.
G
Good question to ask. I know for a fact that Fargate does not support DaemonSets today, but still, I'll get back to you either here or on the KEP itself.
A
Okay, yeah, all right, well, excellent. I look forward to walking through that, and if anybody else wants to review that KEP or add their use cases, please do. All right, anything else we want to bring up on that while folks are on the call?
A
All right, Adrian, I think you're up next.
H
Yeah, I just wanted to mention that the checkpoint/restore KEP had one reviewer who was more or less in favor of it. I pinged Mrunal now, and he made a review today.
I
Okay. This is the stage one corresponding to your WIP, where you have your preferred concept right now, and if you can clean that one up, then I think it will be in better shape to merge. I took a brief look at your work in progress, your PoC PR; it looks good, yeah. It's as we had discussed.
H
Yeah, okay. I will talk to you offline about additional changes, and then we can probably proceed. And then I think Dawn was assigned as an approver, and we can talk to her.
C
Okay, yeah, let me jump in, because you mentioned that somebody had said it mostly looks good to them. I don't think that person is a Kubernetes org member, so it's good that they looked at it, but we need somebody who's an org member and has sign-off powers.
H
Okay. I kind of expected that the person who did the review is the one we added as a reviewer in the SIG Node 1.23 planning document, but I'm not sure.
A
So I think, Adrian, Dawn's obviously not here today to speak to her time, but I'm sure together you can make the increments that we needed. I don't think there was any major disagreement on the desire to get checkpoint functional, and we had use cases that we had discussed, for example a security analysis, where we all agreed checkpointing was useful. So let's just work together, if you can, on getting the updates that we all discussed, and then get Dawn to review as well.
A
Thank you, Adrian. All right. Then the last topic today looks like a presentation that Mrunal and Marcus were putting together. Do you two want to go ahead?
I
Marcus has been looking into splitting up pod startup latency. He's done some analysis, and he just wants to present what he has done so far and is looking for feedback. Marcus, do you want to take it away?
J
Right, yeah, I'm here. Good evening, everybody. Can I grab the screen share, or does anybody else want to?
J
Nice, okay, cool. Is the font size and so on good as well?
J
Basically, if you look at serverless systems, again as a quick reminder, they try to scale pods up and down as quickly as possible, even down to zero, which means that if you want to scale up from zero to something, the pod startup latency is going to be part of the latency the user sees. And the narrative in the different serverless platforms I've worked for thus far has always been that Kubernetes is too slow, it just doesn't work, and I got kind of sick of that narrative.
J
We are probing non-ready IPs from our routers to basically bypass the readiness checks of pods, and then there are a few ideas that are not implemented yet and that I kind of don't want to implement, because they are a lot of incidental complexity just to work around Kubernetes as a whole; in this case node-local scheduling and pod pre-warming. So the question I ask is: is that even necessary?
J
Can we improve Kubernetes itself to drop those hacks and not need them, and make the whole community better? I think yes, that is very much possible. I looked at it a little bit; there's a KEP for sub-second probe granularity somewhere. I couldn't find a link, but I know that somebody has written something up. We'll not care about that here, though.
J
For the sake of this presentation, I wanted to look at actual performance improvements to the existing behavior, for example in the kubelet, the CNI, the scheduler, and so on. I kind of randomly picked the kubelet because I was interested in how it works, so I just started there.
J
All of the pictures that you're going to see right now are taken on kind, so take them with a grain of salt. I'm mostly using the numbers to get ballpark-ish figures, and for now I am mostly reasoning about improvements that could be made by going through the code in theory, rather than actually measuring things; you'll see in a second why that somewhat makes sense. For now I've written two little tools, just for me to get an understanding of the kubelet, because the code base is somewhat complex.
J
One is pod-speed, which spawns pods with a defined schema, repeatedly, so it can generate some numbers from that. And then kube-tracer is what generates the pictures that you're going to see right now; it's basically just using the recently introduced structured logging.
J
It uses structured logging to fetch only the logs that are relevant for a certain pod, so I can get a look at what the pod startup latency looks like. One of the things that I noticed almost immediately, which you can see in this case, is that even just launching a pod attaches the default volume, the service account token, from the kube API.
J
That service account token volume has a 300 millisecond floor, and if you look through the code, there's an arbitrary 300 millisecond retry on a loop, which then spawns another loop which has an arbitrary 100 millisecond retry, and so on; go down that path and it basically adds up to this 300 millisecond floor. But the actual work takes only something like 10 or 30 milliseconds, if even that. So, yeah.
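A standalone toy that shows where those fixed retry intervals become a latency floor, using the apimachinery wait helpers; it is not the kubelet's volume code, just the pattern being discussed.

```go
// Illustration of how a fixed poll interval adds a latency floor even when
// the underlying work finishes in a few milliseconds: wait.Poll waits the
// interval before its first check, while wait.PollImmediate checks right
// away and can use a much shorter interval. Toy example, not kubelet code.
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
)

func main() {
	done := time.Now().Add(10 * time.Millisecond) // the "work" is done after 10ms
	check := func() (bool, error) { return time.Now().After(done), nil }

	start := time.Now()
	wait.Poll(300*time.Millisecond, time.Minute, check) // waits 300ms before the first check
	fmt.Println("Poll:         ", time.Since(start))    // ~300ms

	done = time.Now().Add(10 * time.Millisecond)
	start = time.Now()
	wait.PollImmediate(10*time.Millisecond, time.Minute, check) // checks immediately, short interval
	fmt.Println("PollImmediate:", time.Since(start))            // ~10-20ms
}
```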
A
Let me pause here for one second, because this slide is very good if my memory is right. Seth, I think you're on the call; earlier we had gone through and explored this, and I think at one point we saw literal sleeps in some of the volume management code.
A
I thought we had pulled all those out. I guess, is my memory bad on that?
K
No, I pulled out at least one where we were doing a poll that initially waited at least 100 milliseconds.
J
Yeah, and I think that's correct in this case too, because the initial check wouldn't be true. The problem in this case, and I haven't actually cooked up a solution for it yet, is how the volume manager works: it schedules jobs on pre-created runners, and the runners are basically just goroutines per pod with internal reconcile loops, which fire only once every 100 milliseconds, and then you eventually have to go through, I think, at least two steps.
J
One is "verify controller attached volume" and one is "mount volume", and they are individual jobs scheduled to that worker. I think the solution might be to introduce a signal: once that worker job is done, signal back so the reconciler immediately launches the second worker job, and once that one finishes, signal back to the top-level loop that something happened and it should double-check. Something like that.
J
I think, and I've also written it here, that there just doesn't seem to be a signal feeding back from these workers that the worker has actually finished. It just waits for it to be done through these retries, and there might be potential to tighten that up.
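A minimal sketch of the "signal back instead of waiting for the next tick" idea, assuming a single reconcile loop and one worker; it is an illustration of the pattern, not the kubelet volume manager itself.

```go
// Sketch: the reconcile loop normally wakes every 100ms, but a worker can
// poke it over a channel as soon as its job completes, so the next step
// starts immediately. Standalone illustration only.
package main

import (
	"fmt"
	"time"
)

func reconciler(poke <-chan struct{}, stop <-chan struct{}) {
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			fmt.Println("reconcile (periodic) ", time.Now().Format("15:04:05.000"))
		case <-poke:
			fmt.Println("reconcile (signalled)", time.Now().Format("15:04:05.000"))
		case <-stop:
			return
		}
	}
}

func main() {
	poke := make(chan struct{}, 1)
	stop := make(chan struct{})
	go reconciler(poke, stop)

	// A worker finishes its job after 15ms and signals the reconciler,
	// instead of letting it wait out the rest of the 100ms interval.
	time.Sleep(15 * time.Millisecond)
	select {
	case poke <- struct{}{}:
	default: // a poke is already pending; no need to queue another
	}

	time.Sleep(250 * time.Millisecond)
	close(stop)
}
```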
A
I think it'd be good to also reach out to SIG Storage, because they're actually properly the owner of the volume manager; it's just that our code structure makes it intertwined.
A
Reach out to some of those folks as well to make them aware of this. What I was curious about was the use case of a representative pod. Obviously you're getting the service account token, so there's always at least one volume, but is the use case you were exploring largely pods with just secrets and config maps, or are you also focused on pods that need access to persistent storage generally?
J
That's a very good question. To be honest, I'm kind of working my way up from the simplest pod possible, and if I can find obvious problems, and this is kind of an obvious problem, I try to get those fixed first and then add more to the pods. The tool I've written, pod-speed, has basic pods that literally just have one container with one image, and that's it; and by the way, I'm assuming pre-pulled images too.
J
I'm kind of leaving image pulls out of the equation for now. It can also spawn Knative-style pods, which have two containers, one being a sidecar, with a readiness probe being set up, et cetera. So I'm working my way up from the simplest pod possible towards the Knative spec, and then maybe even further; as you mentioned, config maps and so on will be in the picture at some point as well.
J
As I said, it's not an exhaustive list of the things that we might do. The second item here is that, as you all may know, the kubelet is largely driven by the PLEG, and the PLEG has a one second timer that kicks it every second. I think syncPod is only allowed to proceed once the PLEG has updated its state, once it's updated the cache, and that causes some arbitrarily long latencies as well, as can be seen in this case here.
J
I don't actually have a very good... basically, it comes down to the same quote-unquote solution as the first one, which is to add signals to kick the loops earlier if we know that it's valuable to kick them. For instance, I've cooked up a very simple thing.
J
A very simple quote-unquote solution where, right before syncPod exits, I poke the PLEG to make it relist immediately, and that would drop this entire 700 milliseconds down to zero, at least for the very simple pod cases.
J
So that's one thing, and that's kind of the whole theme here: to make things a bit more event-driven rather than strictly timer-based, obviously only where appropriate.
A
Yeah, so the one thing on this one: I would love to get to an event-driven PLEG. What I'm wondering, though, is whether that will just shift the problem to the runtime. Is there anyone here from the runtimes?
I
So, on the runtime side, we could be faster, because we can use inotify to detect stops, and when we do a start, we can check right after the start. I think we can be quicker than one second.
D
I think containerd and Docker already support this kind of event today. Actually, when we worked on the PLEG at the beginning, we already looked into the Docker events and it actually worked. It's just that there was some actual work involved, and at that time our main focus was to reduce the kubelet and container runtime resource usage, and the relist was enough for that, which is why we didn't actually finish the complete event-driven PLEG.
D
But now, if the latency becomes a concern, I think it's possible to do that, and I think at least containerd supports it. We probably just need to define a gRPC-streaming-based event API in CRI and then get the events from the runtime.
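For a sense of what such a streaming event API could look like from the kubelet side, here is a purely illustrative Go sketch. The names and shapes are assumptions, not a proposal; the real API would have to be defined in the CRI protobuf and agreed on with containerd and CRI-O.

```go
// Hypothetical sketch of a streaming container-event surface in the CRI.
// The runtime pushes events as state changes happen, instead of the kubelet
// discovering them on the next 1s relist.
package crievents

import "context"

// ContainerEventType marks what happened to a container.
type ContainerEventType int

const (
	ContainerCreated ContainerEventType = iota
	ContainerStarted
	ContainerStopped
	ContainerDeleted
)

// ContainerEvent describes a single runtime state change.
type ContainerEvent struct {
	PodSandboxID string
	ContainerID  string
	Type         ContainerEventType
	NanoTime     int64 // runtime timestamp of the event
}

// RuntimeEventService is the hypothetical streaming side of the CRI.
type RuntimeEventService interface {
	// GetContainerEvents blocks and delivers events on the channel until the
	// context is cancelled; the kubelet's PLEG would consume this instead of
	// (or in addition to) periodic relisting.
	GetContainerEvents(ctx context.Context, events chan<- ContainerEvent) error
}
```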
A
Yeah, agreed. I know we're reaching a conclusion here, Marcus, but the other area where I know at least Mrunal and I have discussed that an event-driven PLEG would be useful is to bring down kubelet resource utilization.
A
And so there are definitely a lot of benefits to it, because the PLEG itself is a generator of a lot of garbage. I guess if we don't worry about the dockershim, the containerd and CRI-O maintainers should be able to do this, if folks from both communities are signed up to help drive it; it shouldn't take that long. As Lantao mentioned, containerd already has the events, and CRI-O is up for adding whatever we need to do to support this.
F
Yeah, or you could do a subscription model where you get the initial state and then updates to your subscription.
J
Yeah, and as I said, the very short-term solution I've cooked up right now is kind of kicking the PLEG when we know it should have something new, as in this case.
J
Maybe I can show the code here, if that works; let me make that a bit bigger. I literally just added a poke channel to the PLEG, so it can be poked from the outside, and then at the end of syncPod I'm checking if one of the actions was a container start; if it was, we kind of know that we changed the runtime state in some way and hence poke the PLEG to relist immediately. I'm not sure if we can also do a more focused list.
J
Something like "just give me that container" or "just give me that pod". But for the sake of showcasing it, this is brute-forcing a whole relist, and that kind of worked; it improved things. Which brings me to one of the more important bits: how do we actually surface that things improved, right?
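A standalone sketch of that poke-channel experiment, assuming a relist loop on a one second ticker; it illustrates the pattern, not the actual PLEG code in pkg/kubelet/pleg.

```go
// Sketch: the relist loop normally runs on a 1s ticker, but syncPod can poke
// it right after it knows it changed runtime state (e.g. started a
// container), so the new container status is picked up immediately instead
// of up to 1s later. Illustration only.
package main

import (
	"fmt"
	"time"
)

type pleg struct {
	poke chan struct{}
}

func (p *pleg) run(stop <-chan struct{}) {
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			p.relist("periodic")
		case <-p.poke:
			p.relist("poked")
		case <-stop:
			return
		}
	}
}

func (p *pleg) relist(reason string) { fmt.Println("relist:", reason) }

// Poke requests an immediate relist; it never blocks the caller.
func (p *pleg) Poke() {
	select {
	case p.poke <- struct{}{}:
	default: // a relist request is already pending
	}
}

func main() {
	p := &pleg{poke: make(chan struct{}, 1)}
	stop := make(chan struct{})
	go p.run(stop)

	// ...syncPod just started a container, so poke the PLEG instead of
	// waiting for the next 1s tick.
	startedContainer := true
	if startedContainer {
		p.Poke()
	}
	time.Sleep(1500 * time.Millisecond)
	close(stop)
}
```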
J
I don't have that here. I have to rely on your guidance to tell me where best to do that, if there are already dashboards or performance tests that would reveal it.
A
Yes. I think, Marcus, when we had met before, if I'm not mistaken, I talked about how we had the node performance dashboard that maybe we could revitalize, right? That wasn't necessarily testing the representative pod that you're presenting here, I think.
A
You know, Mrunal and I talk a lot lately; he and I were having a conversation before this where it's like, I think we want to be able to get more prescriptive guarantees.
A
Ways that we can measure success. So what I'm curious about is this: the answer "I don't want to wait any time at all" isn't really the ideal answer, I guess, but is there a pod startup budget that you feel is right?
J
Right. So one definitive goal is to get sub-second, which, you know, you have to kind of specify on which platform, with which networking provider, et cetera; that throws a whole curveball into the equation. But I'm trying to get sub-second at, say, p95, something like that. So one second is the mark for me right now.
J
Good point. I don't have a pod-speed output here, but generally I'm measuring via the kube API, because that's what load balancers and so on would see, right, to configure networking and to actually use the pod.
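A minimal client-go sketch of measuring startup latency "as the API sees it", in the spirit of what pod-speed does (this is not that tool); the namespace, image, and KUBECONFIG usage are assumptions.

```go
// Measure time from pod creation until the API reports the pod Running.
// Assumptions: default namespace, pre-pulled pause image, KUBECONFIG set.
package main

import (
	"context"
	"fmt"
	"os"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", os.Getenv("KUBECONFIG"))
	if err != nil {
		panic(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "startup-latency-"},
		Spec: corev1.PodSpec{Containers: []corev1.Container{{
			Name: "pause", Image: "k8s.gcr.io/pause:3.5",
		}}},
	}

	start := time.Now()
	created, err := cs.CoreV1().Pods("default").Create(context.TODO(), pod, metav1.CreateOptions{})
	if err != nil {
		panic(err)
	}

	w, err := cs.CoreV1().Pods("default").Watch(context.TODO(), metav1.ListOptions{
		FieldSelector: "metadata.name=" + created.Name,
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	// Wait for the first event where the pod reports Running.
	for ev := range w.ResultChan() {
		if p, ok := ev.Object.(*corev1.Pod); ok && p.Status.Phase == corev1.PodRunning {
			fmt.Println("time to Running:", time.Since(start))
			return
		}
	}
}
```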
J
There's
one
optimization
that
we
can
go
for
if
we
run
into
a
wall
which
is
I'm
looking,
I'm
also
looking
at
time
to
ip,
where
I
think
it's
valuable
to
optimize
the
time
it
takes
to
get
the
ipd
into
the
ap,
the
pod
ip
into
the
api
server,
because
that
kicks
off
services
and
endpoints,
which
can
also
kick
off
load
balancer
programming.
And
we
already
have
that
thing
where
the
load
balancers
take
non-ready
ips
and
do
use
the
well-known
probe
to
find
out
if
the
thing
is
already
ready.
K
So when I was doing this analysis before, I would always base it off of the syncPod ADD event: from there to when the pod status is updated to Running, how long does that take? That kind of isolates it to the kubelet and to what we would have control over in SIG Node.
J
Right, yeah, I'm open to suggestions like that; that's fine with me, if it's measurable. The main problem is being measurable in a generic case, which is why I started with kube API visibility, because that means I can run the tool against any Kubernetes cluster and see what the latencies look like in general. But as you say, it doesn't rule out things like the API being slow, noisy neighbors, and so on.
J
I'm looking at best-case scenarios only for right now, because I have found enough potential for improvement in the best-case scenarios before looking at things like what happens if you start these pods in parallel, whether there is a lot of lock contention, and things like that.
A
Yeah, that makes sense. So what I'm wondering, for folks that might be on the call and reflecting on this a little bit, and at least I'll put my Red Hat hat on right now: there are particular resource budgets and maybe density goals that I'd love us to be able to get the community to achieve, which would be, at a particular pod density, how quickly can pods start and stop, and what amount of overhead do the kubelet and runtime add?
A
I think that's in the best interest of everybody, and so, if folks have usage of Kubernetes in their organizations where you have particular budgets that you're trying to meet, it'd be great if we could get them shared and talked through and just start measuring our success relative to those metrics. Some of the stuff I think we've talked about earlier around density, which I think we're starting to focus on with housekeeping intervals in cAdvisor and how we get metrics on pods, all just seems to go together.
A
Well, maybe that's my last point, which is: is there an average life expectancy for these pods that you're also trying to optimize for, or is it just the zero-to-start? Once a pod is up, is there not a churn rate that you're trying to build towards?
J
Yeah, at least not right now. I have noticed that if you create the pods too quickly, at least right now, the startup times somewhat deteriorate, and I was kind of attributing that to the housekeeping and so on.
J
I haven't dug deeply enough into that code yet, though, to know for sure. But as I said, right now it's the simplest pod possible and starting it: getting it ready, getting it to serve, and that's it for now.
F
Yeah, no, I think... doesn't the Knative model also include keeping a pod up for an extended period of time, running containers for the services requested?
J
And yeah, maybe one sentence towards that: I do know that the kubelet code is somewhat delicate, sometimes a bit hard to change, so I am looking at the least impactful approach possible, also with regards to what you mentioned before.
J
I think it's not an option to just make the PLEG quicker, like making the interval shorter or things like that; I think that's kind of off the table. I'm looking at actually saving time and work, versus making more work to achieve a better latency. Yeah.
F
Derek, Dawn, in case you didn't see it, Lantao posted a link to an old event-driven stream idea for Docker from 2015.
J
Right, I'll go look at all of these. If anybody has a pointer to the dashboard that you mentioned earlier, that'd be cool; I guess we could add to it.
J
I don't know, maybe add the simple pod to it, if it's not already a very simple pod, and see how we can get things measured, so that I'm not shooting into the dark and you don't have to trust the numbers that I'm posting. Other than that, this was mostly supposed to be an introduction, so you all know who I am, where I'm coming from, and that I'm going to push on this, or planning to push on this, if you all think it's useful. So, yeah.
A
All right, well, thanks so much, Marcus, and if you all do get together, it would be great to just make sure we all can participate. I look forward to hearing what might come out of some of those discussions, but I think it'd be awesome if we can start looking towards an event-driven PLEG in the upcoming 1.24 release.
A
I think that got us through everything on the agenda. Were there any other topics that folks wanted to raise? Otherwise, I'm happy to give back 10 minutes.
C
I just wanted to give folks a quick shout-out: we had the KEP deadline last week, and I was really impressed. I think we got everything merged and ready to go 24 hours before the deadline, except for the one thing I think we ended up dropping. So that was great.
A
All right, hopefully we're just as good on the execution side of our plans. Yeah.
C
I'm going to send out a follow-up email. We had discussed having a sort of soft deadline to ensure that we have a bunch of PRs up and are able to track stuff through the release, so I'll send out an email talking about how we got these beta KEPs in and expect them to merge approximately around this date, and likewise for all the alpha KEPs.
A
That's a good positive note to end the meeting on. Thanks, everyone, for joining today, and we'll talk to you all next week. Bye.