From YouTube: Kubernetes SIG API Machinery 20200729
Description
Today's Agenda:
- [wojtekt] Proposal to fix api server starting up with empty change history in watch cache: https://github.com/kubernetes/enhancements/pull/1878
- [Bhagwat] Discussion about deep healthz check on API server.
- [Bhagwat] Discussion about graceful shutdown of API server.
- [fedebongio] metacontroller update here.
- [mvladev] ResourceQuota admission controller and aggregated apiservers
A: One, two, three, recording. Hello everybody, welcome to the Kubernetes SIG API Machinery bi-weekly meeting. Today is July 29th, 2020, and we have a number of items on the agenda that we are hoping to go through today, so without any further delay, let's get to it. I'm presenting the agenda, and I think the first one is something that was brought up by Wojciech and Daniel. So let's get to it.
B: Yeah, so I can talk about it. I wanted to talk a little bit about the proposal that I have put together. It doesn't seem to be super controversial based on Daniel's and David's comments; they have already at least initially reviewed it. But I wanted to give you a chance to actually object
B: if you see any problems with it. So the problem that I'm trying to solve is basically the way the watch cache is initialized. Whenever you are using the watch cache, and that is really needed in large clusters, at least for some resources, in particular for some high-cardinality and high-churn resources: leases probably aren't crucial, but pods, nodes, endpoints, if you are extensively using them, and stuff like that.
B: So the way the watch cache is currently initialized is that it doesn't start with any history; it is just initialized based on the current state of etcd. It's basically initialized from "now", for some definition of now, using a quorum list against etcd, and that is causing two main problems.
B: The first problem is that, in case the API server is rebooted, or if there are multiple API servers and we are doing a rolling upgrade: whenever there was some watch that was served by the old API server, then after the reboot, or, during the rolling upgrade, as a result of rebooting the API server it was connected to, it is reconnecting to a newly initialized API server.
B: If there weren't any changes in between, the watch will actually fail with "too old resource version" and a relist will be required. So in particular, when there are no changes happening to any objects of a given resource type (because the watch cache is per resource type),
B: after the upgrade of the last API server, pretty much every single watch will require a re-initialization with a list, which is causing significant performance issues in large clusters.
B: The second problem is basically the fact that we are initializing from a quorum read from etcd, and even though we are listing just objects of that type, the resource version the list returns is actually global for that particular etcd backend. Which means that something may change in between upgrades of the different API servers while no object of that particular type is changing.
B: And after initialization, the cache is then updated only by watch, so it can only have resource versions for which there was an object change: a creation, update, or deletion of some object of that resource.
B: So what I'm proposing is to use the progress notify feature from etcd. The progress notify feature allows you to configure a watch to send you periodic "progress notify" events (that's how they are called in etcd) that tell you that your watch is actually synced to a particular resource version, and the resource version that this progress notify event contains is also global. So that obviously fixes the second problem.
B: It's pretty obvious why it fixes what it is fixing. The thing is that we need to change etcd a little bit, because currently the interval is hard-coded to 10 minutes and we need much less, but there is a fairly simple proof-of-concept PR put up some time ago by Jingyi, so it shouldn't be very controversial.
B: And we need some relatively local and small changes in the watch cache and in the reflector to actually handle the progress notifies correctly. But I have a PoC, it's linked from the KEP, it's pretty small, and it actually shows that it works.
B: We were also testing it (or I was also testing it) on reboots, to see the improvement on very large clusters, and it actually is a tremendous improvement, so it really helps. And then, as I mentioned, we need to adjust the period; the exact frequency is probably still to be decided. I was thinking about something between one second and 10 seconds. The requirement for the solution to fix the problem is basically that we need to receive a progress notify event after the previously rebooted API server was initialized, but before the next one is actually going to be initialized.
B
So
so,
depending
like
how
this
how
this
time
interval
looks
like
it,
it
may
we
we
might
have
like
place
to
like
tune
this
exactly,
but
but
probably
like.
It
won't
be
lower
than
like
couple
seconds
anyway.
So
so
anything
like
that
seems
fine
and
the
final
thing
we
need
to
do,
which
is
a
little
bit
or
toggle.
B: to this (it would be useful no matter whether we proceed with that proposal or not) is to change the API server to also send watch bookmarks on shutdown. So right before shutting down, it will send watch bookmarks to all watches that will be broken because of the API server shutting down. So that is roughly it; there's a much longer, and hopefully much cleaner, explanation in the KEP if you want to follow up, but I think that's mostly it!
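The failure mode and the fix described above can be sketched as a toy model (illustrative Python only; the class, method, and variable names below are invented for this sketch and are not the actual watch-cache code):

```python
class TooOldResourceVersion(Exception):
    pass

class WatchCache:
    """Toy per-resource-type watch cache: on startup it does a quorum LIST
    against etcd, so it learns the current global resource version (RV) but
    has no event history from before that point."""

    def __init__(self, list_rv):
        self.oldest_rv = list_rv    # cannot resume watches from before this
        self.current_rv = list_rv

    def observe_progress_notify(self, global_rv):
        # a progress notify / bookmark advances the RV even with no events
        self.current_rv = max(self.current_rv, global_rv)

    def watch(self, from_rv):
        if from_rv < self.oldest_rv:
            raise TooOldResourceVersion(from_rv)
        return "watching"

# A client was watching pods at RV 100. No pod changed, but other resource
# types advanced the global etcd RV to 150. The API server is rebooted and
# its cache re-initializes from a fresh LIST at the global RV 150:
cache = WatchCache(list_rv=150)
try:
    cache.watch(100)
    resumed_without_bookmarks = True
except TooOldResourceVersion:
    resumed_without_bookmarks = False   # forced relist

# With progress notifies / shutdown bookmarks, the old server would have
# advanced the client to the global RV 150 before it reconnected:
resumed_with_bookmarks = cache.watch(150) == "watching"

# And later progress notifies keep advancing the idle cache's RV
# even when no objects of this type change:
cache.observe_progress_notify(210)
print(resumed_without_bookmarks, resumed_with_bookmarks, cache.current_rv)
```

In the first case the client falls into exactly the relist path the proposal is trying to avoid; in the second, the bookmark has moved its resume point up to the new cache's initialization RV, so the watch resumes cheaply.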
C: So I read the idea; the idea made a lot of sense to me. I did have a question about the behavior when we see these progress things. So if I get a progress indication and it says resource version 150, and I'm getting these notifications once every 30 seconds: one of my kube API servers gets 150, but the other one's not going to get another update for another 10 seconds.
B: Yes, that is correct.
C: I just want to make sure we don't end up in a case where, like, one out of three times this is going to happen to you, because one out of three times you will hit a kube API server that's more current than the other two.
D: I think it's relatively uncommon for the list and watch requests to go to different API servers. So if they do, then yeah, you've got a problem, but I think that's relatively uncommon.
F: Why would it be uncommon?

D: Because Go caches connections: for both HTTP and HTTP/2, the Go client does connection caching. That makes it very likely that the request goes to the same API server.
C: Okay. I'm willing to accept that, once in a while, a disruption is just going to cause weird things to happen, and this is fine. In that case, Wojciech, if you wouldn't mind including the explanation of cached HTTP connections. (Sure.)
G: I was going to actually ask one more question, though. I don't know of any reason why we wouldn't more aggressively use watch progress notifications to ensure that clients in general know the resource version relatively frequently, in as many cases as we can. So, the point about doing it right before restart: we should just do that. The problem is that we have the semantics around it.
D: So there's kind of an alternative approach that Wojciech and I discussed fairly extensively in the comments here, which is: instead of making sure that clients are exactly up to date, we could instead make the API server preload some history, so that it's okay if clients aren't completely up to date. And it turns out the work to do that is about the same.
G: I was gonna say, every time I have had a client that is behind in some way... So there are a couple of problems, too, that we haven't really sorted out. Like, what happens when someone has to restore from an older etcd backup and you go back in time: every single watch cache in the cluster is broken in a horrible way, because we've completely changed the fundamental assumption of the Kubernetes system, which is that we're moving forward in an immutable thing, and so you get horrible, horrible things happening.
G: For that, sure, we might as well reboot everything in the cluster for fun. But just going on:
G: keeping the checkpoint moving through the cluster is what I would call strategically better for the whole system: making sure that clients are relatively bounded in terms of resource version, maybe even within the compaction window, which is harder to do. But I don't know of a reason why we wouldn't want to leverage the bookmark mechanism.
D: Yeah, I'm not arguing with that. I just think the original proposal was like 250 milliseconds, and that just seems a little brittle to me; if the client has to get a notification every 250 milliseconds, that doesn't seem good.
D: So I think it's reasonable to approach it in both ways: make a reasonable effort to keep clients in the same universe of being up to date, and at the same time preload some history, so that if a client is a few seconds older, that's fine.
G: I feel like doing what we can to put that in is part of a broader problem: it would be better if clients could reasonably understand what their drift from the kube-apiserver is without having to do complicated logic. Which means we're doing a better job of checkpointing within some reason, more than three seconds and less than the compaction interval divided by two, or whatever. For instance, I would expect to see some of that in logs.
D: So, pro tip: you can force every client to clear its cache and reconnect by doing clever things with etcd.
B: For the record, it's slightly related: in 1.19 we were also scale-testing the existing bookmarks, the Kubernetes bookmarks, because initially there were some performance issues with them in large clusters. It seems like at least some of them, or most of them, or maybe all of them, were solved. And on top of what we had originally (we were just sending them right before the watch was timing out), starting with 1.19 we will be sending them periodically.
B: Sorry, periodically, every one minute. So if we start propagating progress notifies to the watch cache, we will get this, what you are asking for, Clayton, pretty much for free.
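The periodic-bookmark behavior just described can be sketched as a small simulation (illustrative Python; `merge_bookmarks` and all its parameters are invented for this sketch and are not the client-go or apiserver API):

```python
def merge_bookmarks(events, global_rv_at, horizon, period=60):
    """Simulate a watch stream for one resource type: deliver real events as
    they occur, and every `period` seconds emit a BOOKMARK carrying the
    newest known global RV, so an idle watcher's resume point keeps moving.

    events:       list of (time, rv) changes to objects of this type
    global_rv_at: function time -> newest global RV the server knows
    horizon:      how many seconds to simulate
    """
    out, i, last_rv, t = [], 0, 0, 0
    while t + period <= horizon:
        t += period
        # deliver real events that happened before this tick
        while i < len(events) and events[i][0] <= t:
            last_rv = events[i][1]
            out.append(("EVENT", last_rv))
            i += 1
        # without bookmarks an idle watcher would be stuck at last_rv
        out.append(("BOOKMARK", max(last_rv, global_rv_at(t))))
    return out

# No events at all for this type, but the global RV keeps advancing
# (other resource types are changing); the watcher's RV still moves:
stream = merge_bookmarks([], lambda t: 100 + t, horizon=120)
print(stream)  # [('BOOKMARK', 160), ('BOOKMARK', 220)]
```

The point of the sketch is the last line: even with zero events of the watched type, the watcher's resume resource version advances once per period, which is what makes reconnecting to a freshly initialized API server cheap.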
G: I'm not proposing we do anything with this here, but just in general: resource version is at the heart of our system, and how our caches are kept up to date is a fundamental, cross-cutting concern.
G: I'd like this purely because, what you're talking about, Wojtek, it moves us from a model where we're kind of wondering what clients are doing, to one where we are putting some bounds on what you can reason about in the system in a very concrete way. Because it is a fundamental part of when kube goes badly, badly wrong: it goes badly wrong in those kinds of dimensions. You're out of date, you don't understand, we miss events, etc.
A: Sounds good. Wojciech, did you get what you needed? (Yes! Thank you.) Okay, perfect, thank you so much. Let's go to the next one. Bhagwat, are you here?
H: Yeah, I'm here, hey. So this might be just a discussion, and maybe learning for others as well. As an operator of Kubernetes, I would like to understand more in depth what happens when a system is bootstrapping, in a situation when, let's say, I just created a cluster, versus a cluster that is being rolling-updated or is under a very heavy load.
H: Let's say we have like 5000 clusters, and I would like to basically segregate and understand. Let's say we are only talking about no webhooks, only critical components, the API server as a whole not having any dependency on webhooks for the healthz; and then what happens when custom components are involved in that health check, where the API server depends on the other components to be,
H: you know, active and responsive. And when we say shutting down, what does a graceful shutdown mean for this system? And the performance and benchmarking of that: how do we say that, hey, for this version of Kubernetes, let's say 1.19 or 1.18...?
H: Is there a way we can quantify: hey, this is basically our bootstrap time and it should be up by then? Or what are the caveats around that; what does an operator need to look at and understand thoroughly, that it depends on these things, this is how it happens? Or can it not be quantified?
H: An in-depth understanding of these systems around that, if someone can describe more on this.
H: Yeah, that's okay. We can mostly focus today not on extensions but on the core of the API server, maybe.
C: I guess, then, I would be trying to figure out what difficulty you're having. We've improved these messages in the log this past release; actually, I think Walter tagged the PR, so that in the log, instead of getting just "reason withheld", you actually get the reason, so you can look at it. That makes it slightly easier for a cluster admin to know why. But a decision about when to give up: that seems like a very deployment-specific choice.
I: Those aren't prioritized; there are also interesting things going on with things like priority and fairness, which I believe may help this. But yeah, I mean, certainly if you have ten thousand nodes and huge numbers of operators, it's pretty easy to load the API server so that it takes a lot longer than normal to start.
H: I'm saying, even when I'm just bootstrapping, not forwarding my requests yet to that instance: I'm doing, let's say, a rolling update with just five thousand or ten thousand nodes, as in the example you gave. I am waiting for my API server to, you know, bootstrap completely and say, okay, I am now healthy and ready to accept traffic; and then I'm saying, okay, if I have a load balancer or anything in front of that, then I will say, okay, now I am ready to accept requests.
H: So there are two aspects of that as well: if I have an external load balancer, that's one part of it; and the second part is when the API server service endpoint gets updated in etcd, which I don't have control over, right? When does the API server service endpoint get updated? Sorry, I haven't basically verified that. Does it happen after some critical components are up, and what are those critical components? Or does it just look at the ping to etcd and then say:
H: okay, we are okay to basically say that we are healthy? And then every kubelet, or not just the kubelet but basically all the components which are relying on the service endpoint, will start talking to the API server while it's not completely ready, because most of the critical components are not.
C: It is ready-gated, right? It doesn't add itself to the endpoints until it's ready, and when it shuts down gracefully, it removes itself before it goes not-ready. But we aren't going to be able to give you a "you should give up after X period of time", because that period of time varies significantly based on where you've deployed it: how big your cluster is, how small or large the machine is, what your network latency looks like, how heavily loaded etcd is, whether there is sufficient
C: I/O for both the kube-apiserver and for etcd, whether you have significant admission webhooks that you've added, which are synchronous calls and can intercept some of these. We aren't going to be able to give you a "this is the time you should wait"; it's dependent upon the deployment that you have.
H: Yeah, so mostly I wasn't talking about it in terms of time, as we discussed. So we have some post-start hooks, critical components, which are going on. So, as you described, we update the endpoint, right? So let's say you're talking about heavy load on etcd.
H: I'm even saying the etcd is healthy, right? I'm just doing a rolling update of my API server, and at that point in time I'm not touching my etcd; I'm independently rolling-updating, let's say, the Kubernetes version of my system from 1.17 to 1.18, or 1.18 to 1.19, but it is a bigger cluster. So what I wanted to understand, from a performance point of view, even at 5000 or 3000 nodes, which it basically should support: I just wanted to understand,
H: do we have something which, as an operator, I need to worry about and take care of when I'm running heavily loaded, performance-sensitive systems? Where I could say that, hey, these are the critical components, or, let's say, a watch or a call will get affected
H: if I'm not basically looking into those critical components; at least, these are the critical components which need to be okay? Or are we saying that when the readyz is okay and we updated the endpoint, it's completely okay to go ahead and basically register that instance, or to start serving the traffic?
C: So that is deployment-dependent, right? These checks on the kube-apiserver are things the kube-apiserver knows about itself, for every kube-apiserver. But whether or not you're going to have full usage of your cluster is going to be dependent on what has been configured in that cluster. So if you have some resource for Prow, and you have some admission webhook on there, this isn't going to check to make sure that admission webhook is functioning properly, right? That's not what this readiness check is for.
I: ...will be affected, right. So, as an example of that: yeah, it can be. If that webhook isn't working, that webhook may cause an update or create that one of these hooks depends on to fail, and so it may either take a long time or you may never get to a healthy state, depending on how your system is configured.
H: Yeah, yeah, basically I understand the case when we have an admission webhook or any other API extension or dependency. That's the second step, of when the healthz, the deep healthz, comes into the picture. Even in the first take, definitely etcd and the API server have to be there; that's the first part, what makes the whole API server work, where you have connectivity with your database.
H: So I mostly wanted to focus on and get some details about (from my understanding, and David and you guys can correct me if I'm wrong): let's remove the webhooks and any extensions, so I'm bringing up an API server without any other dependency.
H: Can I not benchmark my readiness like that: hey, here is what an API server should look like? It doesn't depend on any caching or anything, right? When I'm bootstrapping my API server, it doesn't know anything at that point in time; it just basically bootstraps empty, and then the requests come and then the caching happens, isn't it?
C: There are going to be cases where you have writes that you're going to want to have happen, for things like the bootstrap RBAC roles. I guess what I would say is that when the server reports readyz, that is usually enough to decide that this server is going to respond correctly enough to be worth sending traffic to, and there are many cases where it won't be responding
C: exactly how you'd like. But I think that anything more than what we provide is really deployment-specific, and even the timing is going to be dependent on the kind of hardware that you've got it deployed on and the kind of load that you end up under, which makes it distinct per deployment. And so you'll see different distributions choosing different values, and maybe even different exclusions on their healthz check, for instance.
H: Yeah. So, basically, David, when you were saying: in terms of how we benchmark that, like any scalability testing, we said this is the CPU core count and this is the memory it has been tested with; how we say that 30 pods per node, and we have tested this as a benchmark. So I was just more on what resources we... Yes, it definitely differs; a distributed system varies in various ways.
H: I was just looking for whether we can benchmark this: okay, here is the system and the resources which have been used, here is our benchmarking, and then this should be okay. People can use a different way to do that. I was just, you know: this is my core component, this is how we restricted it, this is our benchmark.
C: Yeah, we have not chosen to try to measure something like that. To my knowledge, we don't have standard hardware configurations that we would even be able to write it down for. I know that a lot of distributions have opinions about what their minimum requirements are.
D: If you're trying to validate your setup and see how small of a thingy you can run it on, I would recommend checking out the end-to-end tests.
D: If some of the healthz hooks are indicating a failure or taking a long time to complete, probably something is not going to pass some end-to-end test; that would be my guess. But in general, I think if one of these is failing and you don't understand why, and you don't understand whether it's important or not, you should probably go figure out what that thingy is doing before deciding either way whether it's important or not.
D: Not the most helpful answer, I know, but these things all do different stuff.
H: Yeah, yeah. So I was very curious, and got interested in how I can make sure, because I can't go through everything. Tomorrow, let's say I'm working on 1.18 and some other controller or post-start hook gets introduced, right; I would never be able to quantify each one of them to make sure that, okay, these are the ones.
D: Readyz is mostly about the API server itself. It doesn't tell you very much about the rest of the control plane.
H: Yeah. So, let's say I have a request. If I only introduced the healthz for basically the API server, and then I basically said, okay, I'm ready to accept, but actually the other stuff isn't ready (not just the dependent ones, but any control-plane-specific thing), then will all requests fail, or what happens to those requests, is also a question. So let's say CA registration is not ready, right, "reason withheld", and I said, okay, I'm only looking at a ping.
C: I think that trying to go through and distinguish what each of these individually does probably isn't a good exercise for this call. If you want to try to build a way into the code to make these self-documenting in some way, that could be interesting; I can see that being useful on a detailed reporting page, to say "the impact of this not being up to date is X". CA registration in particular: no, the request isn't going to fail on that.
H: Yeah, no, no, I do not want to basically go into it; that was just one example I shouted out. Yeah, we can move on to the next one. I think I got the answers I was looking for, or maybe some action items to go do something. Yeah.
H: Yeah, I want to basically quickly touch on this one as well. So, with respect to the endpoint, right:
when we shut down an API server, that removes the endpoint from the service's endpoints, right? I've seen cases where it basically takes a really long time, or where, in a fast rolling update, the instance which was shutting down wasn't removed from there.
H: So when we say a graceful shutdown, does that mean that only once that API server removes itself from the endpoint do we say, okay, the shutdown happened gracefully, isn't it? Or should we depend on some other API server finding out, okay, this node is not there? They don't know about that.
C: So there are a couple of different aspects to graceful shutdown. At the very highest level: you get an indication that you should shut down, and then, what happens next, I'm trying to remember. I believe at this point you end up in a stage where you continue accepting new connections, but you report that readyz is false and you remove yourself from the endpoint list; you can actually keep accepting new connections for a certain period of time, so that you can handle a lag of X many seconds.
G: You got that absolutely 100% correct. You have to wait: from the time you start shutting down, you signal you're done as early as possible, and then you must wait as long as the longest load balancer could possibly take to remove you from rotation when it observes either your ready check going down or the endpoint removal (if it's a service load balancer), or, you know, propagation delays. The cloud load balancers have propagation delays and check intervals, so you have to wait longer than the longest possible interval.
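The rule just stated ("wait longer than the longest possible load-balancer interval") is simple arithmetic; here is a hedged sketch (the helper name, parameters, and example numbers are all illustrative, not actual OpenShift or cloud-provider values):

```python
def min_shutdown_delay(check_interval_s, unhealthy_threshold, propagation_s, margin_s=5):
    """Lower bound on how long a terminating API server should keep serving
    after it starts reporting not-ready: the load balancer needs
    `unhealthy_threshold` failed checks spaced `check_interval_s` apart
    before it pulls the backend, then `propagation_s` for the removal to
    take effect everywhere, plus a small safety margin."""
    return check_interval_s * unhealthy_threshold + propagation_s + margin_s

# e.g. a load balancer probing every 10s, marking a backend unhealthy after
# 3 failed checks, with about 20s of propagation delay:
print(min_shutdown_delay(10, 3, 20))  # 55
```

The point is only that the delay is driven by the slowest observer of your not-ready signal, which is why the values discussed below are tuned per environment rather than fixed upstream.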
H: Yeah. So basically, as David explained, there are three or four steps. You said one is when we get a signal that I have to shut down: readyz starts saying that I'm unhealthy, while I'm still accepting requests, and then...
H: Okay, okay, I understand. So when we say not ready, and, yeah, when we are still accepting requests, I'm understanding we will have some threshold of time here, till which we will accept and then start rejecting.
J: Yeah, I believe it's 30 seconds by default. I could be mistaken, but that's what I remember, and there may be...
G: A really concrete example: in OpenShift we standardized this across all the cloud providers and on-premises environments as being somewhere between roughly 60 and 120 seconds; I think, David, right now we're at like 120, or 60 to 120. We do tune it based on which environment it is, if we're the ones who set up the load balancers and know, through experimentation and testing, how long the load balancers are configured for, that maximum window. So on a cloud provider: did we create the load balancer, did we set the health check interval?
A: Sorry to interrupt; I think we should unbox this, and, you know, we've been... yeah. Maybe next time you can do some more preparation in advance, and the more concrete the questions, the more concrete the answers are going to be. (Sounds good, yeah. Thank you, thank you.)
A: We still had two more items here, and we can follow up offline, also in the Slack channel, if you have more questions. Mine is super quick, so hopefully we can get to Martin a little bit. For those of you that have or have not been following along: there was a proposal to move...
A: Metacontroller was looking for a house, basically a new home. It was originally in one of the Google Cloud repo organizations and was not being maintained, which was causing a lot of problems. So we met with the main contributors of Metacontroller, together with David, Daniel, and myself, a very open conversation, and with the help of everybody we decided to give them ownership. So basically we removed it from the Google organization, and they have forked and replicated the repo somewhere else.
A: We transferred them the ownership of the groups; "we" as in we, Google, but it was also, you know, part of the SIG discussion. So yes, the conversation was closed. I was working with Amit and alaimo (I don't remember his name, but that's his nickname on GitHub), and, you know, end of the story. So Metacontroller now lives in a different place; if you go to the original repo, it will tell you where to go. Exactly, okay, so that was my update.
K: Okay, great. So it's more or less a behavioral question. Today the ResourceQuota admission controller pretty much doesn't work if, for example, you have your custom resource exposed via an aggregated API server; it works for local API services, for CRDs. However, in order to get it working in most of the cases, you pretty much must vendor the entire Kubernetes repo and then manually add it as a plugin.
K: One of my colleagues is actually, I think, working on a PR to actually move it into the API server. If this is reasonable, or should I discuss with him where would be the best place to actually put this plugin, if it's not part of the kubernetes/kubernetes repo?
K: That sounds good to me. I'll talk to my colleague; he is actually working on the pull request right now.
C: ...trying to get it created; if you could keep the pulls from getting too large, so, like, if you have pulls that actually snip the configuration apart a bit, it would be helpful if you can...

C: I was just saying it would be helpful if you could keep your refactoring PRs fairly modest in size, so we can more easily review them. If you make one giant PR that does all of it, it's going to take a long time.
K: So your suggestion is to copy, or move, or do both, or... I mean, yeah, we will try to keep it as small as possible.
A: Thank you. Okay, with that, I think we hit the end of our meeting today. I appreciate everybody's time and brains here, answering questions and having good discussions. I hope everybody stays safe and healthy; try to stay positive. We'll see you in the next meeting. Thank you for joining, and have a good rest of your day or night, depending on where you are.