From YouTube: Kubernetes SIG Cluster Lifecycle 20180124 - Cluster API
Description
Meeting Notes: https://docs.google.com/document/d/16ils69KImmE94RlmzjWDrkmFZysgB2J4lGnYMRN89WM/edit#heading=h.io2hjkir5u89
Highlights:
- Namespacing for API objects
- Summary of the node-controller-manager design review
- Discussion about managed machine states
- Status of PRs for Machines & MachineSets
- Validation of custom provider configs
A
The first follow-up I want to talk about is an issue we opened, number 463, about how we should namespace API objects, and in particular whether we want to support putting objects in one cluster that represent nodes in a different cluster. We poked at this last week, and a number of people weighed in on the issue, which was awesome. So Chris, do you want to summarize where we're at and try to close this one up?
B
Yeah, sorry if it's kind of hard to hear me, but the consensus was that we do want to namespace Cluster API resources. The main debate was over whether we put a field or annotation on the cluster object to indicate whether it's being managed locally or remotely. There was no good consensus on that, so I'm leaning in the direction of not doing it, with the caveat that if we find we need it, we can add it easily later.
B
Annotations can be added by individual deployments that actually care about whether it's locally or remotely managed, and if we see that used widely enough, we should probably figure out a better way to support it, or at least standardize on the annotation. That's the high-level summary.
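A minimal sketch of the annotation idea being described, assuming a deployment-specific key; the key name and value below are illustrative, not something the group standardized on in this meeting:

```go
package sketch

// Cluster is a pared-down stand-in for the Cluster API object's metadata.
type Cluster struct {
	Annotations map[string]string
}

// isRemotelyManaged reports whether a deployment-specific annotation marks
// this cluster as managed by controllers running somewhere else.
// "example.cluster.k8s.io/managed-by" is a made-up key for illustration.
func isRemotelyManaged(c *Cluster) bool {
	v, ok := c.Annotations["example.cluster.k8s.io/managed-by"]
	return ok && v == "remote"
}
```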
A
Yeah, so then you're getting into some of the multi-tenancy discussions around Kubernetes and how strict a separation you can enforce. I think part of that depends on what level of compromise is in the management cluster, right? If somebody can take over a single pod in the management cluster and you can restrict the access of that pod to the rest of the system, you probably have a better story there, especially if you're running pods on different nodes where you have VM or hypervisor isolation.
B
Yeah, I just want to mention that nothing in the Cluster API right now insists that you have credentials in it. Some people do put project names and things like that in there, but at least in our demo for KubeCon, we had a service account in the pod definition itself for the controller. So at that point it's how you would do isolation with any Kubernetes application.
A
Okay,
so
cuz
I
think
it
sounds
like
we
have
consensus
here
that
we
do
want
to
support
namespacing
of
resources.
We
are
not
gonna,
add
any
special
fields
or
necessarily
enforce
any
standard
conventions
on
what
resource
in
a
namespace
means,
and
if
those
sort
of
conventions
come
out.
Naturally,
as
people
start
using
the
system,
then
we
can
start
enforcing
those
at
some
point
in
the
future.
A
Okay, so I'm writing down two action items in the notes. It sounds like we need to update our registration logic to be namespace aware, and we need to make sure that the client generation gets updated to be namespace aware as well. Do you have any other action items that we should write down, or does that sound about right to you?
A
Yeah, I guess I would also say that I think in some ways a lot of the clusters that we would create through tooling, the sort of singleton clusters that aren't part of a larger system, probably aren't going to need this remote node management scenario. This is something we're building more for flexibility and extensibility, for people who want to run a cluster that effectively manages other clusters.
A
I think in that scenario, where you have this sort of master cluster that's managing a group of other clusters, whether you run a controller per namespace or a single controller is up to the person managing the master cluster. But people who are just managing a single cluster that they've created will probably just have one controller that manages the local cluster.
E
It also makes sense to consider the possibility where the administrator of the master cluster might not want to reveal certain CRDs, or might want to keep some abstraction hidden from the slave clusters. So in the node controller manager, maybe we want to have two different kinds of clients: say, for example, one client which points to the master cluster's API server.
E
Because
certain
CR
DS
are
certain
apart,
like
secrets
that
we
are
using
for
the
EWS
a
SS
credentials
in
node
controller
manager,
right
so
as
administrator
of
the
master
cluster,
we
would
not
like
to
give
it
to
the
slave
clusters,
so
we
would
like
to
hard
hide
certain
parts
to
the
master.
So
two
hooks
of
two
clients
may
make
sense.
A
So in that case, what I feel like your architecture would be is: you let users talk to your master cluster to list machines that describe the nodes in their client cluster, right? When Chris and I were talking about this yesterday: right now we have a field in the machine specification, in the machine status part rather, called node ref, and it points to a node. But now we're talking about that.
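A tiny sketch of the node-ref idea mentioned here: the machine's status carries a reference back to the Node it maps to. The field and type names are approximations of the prototype, not a final API:

```go
package sketch

// ObjectReference is a pared-down stand-in for a Kubernetes object reference.
type ObjectReference struct {
	Kind      string
	Namespace string
	Name      string
}

// MachineStatus links a Machine back to the Node it produced; in the
// remote-management case that Node lives in a different cluster than the
// Machine object, which is what the namespacing question is about.
type MachineStatus struct {
	NodeRef *ObjectReference
}
```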
E
In that namespace we do have an API server running for the slave cluster, and we would still expect that the administrator of the slave cluster might want to have admin access on the slave cluster, right? So what we wanted in our architecture was: maybe we can hide specific things like secrets and machine-class-level configurations. We keep those things.
A
So the set of desired machines for that slave cluster lives in the master cluster. What Chris is saying is: can you put RBAC rules into the master cluster that allow the admin of the slave cluster to resize a machine set in the master cluster's API, but not add secrets or machine classes or whatever the other resources you want to protect are? Those would be RBAC rules in the master cluster which are not modifiable by someone who has complete control over the slave cluster.
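A hedged sketch of that RBAC idea: in the master cluster, the tenant (slave-cluster) admin gets a namespaced Role that only grants MachineSet access, so Secrets and MachineClasses simply aren't visible. The "cluster.k8s.io" API group and resource names are assumptions about the prototype, not something settled in the meeting:

```go
package sketch

import (
	rbacv1 "k8s.io/api/rbac/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// tenantMachineSetRole builds a Role, scoped to the tenant's namespace in the
// master cluster, that allows resizing MachineSets and nothing else.
func tenantMachineSetRole(namespace string) *rbacv1.Role {
	return &rbacv1.Role{
		ObjectMeta: metav1.ObjectMeta{Name: "machineset-editor", Namespace: namespace},
		Rules: []rbacv1.PolicyRule{{
			APIGroups: []string{"cluster.k8s.io"},
			Resources: []string{"machinesets"},
			Verbs:     []string{"get", "list", "watch", "update", "patch"},
		}},
	}
}
```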
E
Yes, that can probably solve the problem to some extent, but it depends on what level of visibility we want to give. We might want to hide the complete master cluster from the slave cluster. It just depends on the architecture: in one layer we might want to give the slave cluster's admin some visibility into the master cluster, and at some point we might not want to give any visibility into the master cluster at all, just depending on the use case.
A
Interesting. I think we're actually veering a little bit into the second topic for the meeting anyway, because on Monday we had an interesting hour-long discussion about the node controller manager, which is what's being alluded to in this conversation. I linked from this meeting's notes to those notes.
A
So again, I think people should go take a look at those notes. I don't know if there's any follow-up we want to have in this meeting; I think the hour sort of ran out before we got to any great conclusions or even got to the next steps, and we're still going through some of the questions, so I think we'll probably have some follow-up discussion there about the right path forward.
H
I'm fairly interested in this in general. A few days ago I finally got the prototype AWS fork of kube-deploy kind of working, and I need to push that over to you as a PR just for discussion purposes. As I'm looking at these node controller manager notes, there are a lot of interesting comments relating to things like the management of secrets that are relevant. This is the first time I've looked at these notes, so I don't have a lot to say yet.
A
All right, if there are no other comments, I'm going to switch the order of the next two things on the agenda. One thing we talked about last week was trying to define a standard machine lifecycle, if you will: the states that a machine goes through as it's coming up, being managed by the Cluster API, and then eventually going away. So Philip and I sat down.
A
We sort of talked through these things verbally last week, but I think that was probably pretty confusing for most people, so Philip and I sat down, drew up a picture of what this looks like, and tried to describe what the different states are. We shared that out at the end of last week, on Friday, to give people a couple of days to look at it before this meeting.
A
There were definitely some people who jumped on it and added some comments, which was awesome, but I kind of wanted to present it again this week with the picture and talk through the states. One comment that was really interesting came from Daniel Smith, who's the lead of the SIG API Machinery group. He mentioned that for other parts of Kubernetes they explicitly did not create such a picture or diagram of the state transitions, because they thought that would prevent them from being able to make changes going forward, and that is one thing.
A
I plan to talk with him in the next couple of days about whether it's just a bad idea to write this down and try to codify what the states and the state transitions are, or whether there's a subset that we can codify that would at least allow us to build some common tooling without tying our hands going forward.
A
Assuming we can get past that discussion: if you guys pull up the picture here, I can talk through it again. I think there were some questions about a couple of the states in the doc, and I think the people who asked those questions are here, which is even better. I think draining was maybe contentious. Martin had asked whether we should have an explicit draining state in the API, partially because you can do drains out of band. I mean, if you got drained, then we might add some other condition, and then some other condition, depending on the types of operations that you want, and the entity could become bloated. In my opinion, we should just focus on the least amount of states that we want to support.
A
Am I still draining, or am I actually reconfiguring or replacing the underlying VM? Those are sort of two distinct things, because otherwise we end up with your machine basically always being in a serving state, and you can't tell what's happening to it during that long piece of the machine's lifecycle after it has been created. Is it always just happy and running after that, or can we tell what's happening to it in the meantime?
A
That's why we were trying to pluck out some of these other states: you can reconfigure a machine without draining it, or you can reconfigure a machine after draining it, and that reconfiguration could be changing the Kubernetes version, changing the container runtime, changing the underlying operating system, or changing parameters on the kubelet. There are lots of different things that we might want to change and have controllers track declaratively.
A
Some of these transitions might be more of an internal API that we want to enforce between different controllers, so that we have a set of states machines walk through that's consistent across different environments, and maybe we don't expose all of those to the end user in the public-facing API. It's more that we agree that internally, as we're building controllers, we're going to implement a similar workflow.
A
Because I think Daniel's concern was that if you put this in your public API, it ties your hands going forward, since people start to depend on the states and the state transitions, which makes it hard for you to change them. How do you add an extra line, or change where a line goes?
E
Maybe, just as a thought: we tried to learn from the existing Kubernetes APIs for the machine controller manager, and what we did is basically add a last-operation field. The last operation could be create or delete, and it could also be update. Then there are two layers: one layer represents the machine state and the second layer represents the machine phase. In the machine state we keep it very minimal, only three states: processing, failed, and successful.
E
So
you
can
represent
that
the
last
last
state
was
if
it
was
create,
then
it
was
processing
failed
or
it
got
successful
and
on
the
second
layer
the
Lillian
machine
face
can
be
shown
as
part
of
the
status.
But
you
say
it's
either
it's
pending
or
it's
available,
but
not
yet
join
to
the
cluster
or
it's
running
already
or
somebody
has
to
got
the
tournament
terminating
trigger
the
deletion.
So
it's
an
a
terminating
phase
or
it
is
completely
unknown
phase
or
its
failed,
States
or
draining
phase.
E
So
there
could
be
more
phases
for
the
current
status,
but
the
state
of
the
machine
state
could
be
minimal
and
last
operation
could
identify
that.
What
was
the
last
thing
that
we
tried
to
do?
It
was
creation
or
deletion
on
a
reputation
system.
That's
what
we
did
for
eventually
for
controller
manager,
good
controller
information.
E
The machine set doesn't need to know the exact machine phase. The machine phase would be more for the user or human perspective, showing what is really happening to the machine right now: whether it is available to join (available means the machine is created but has not joined yet) or running (the process is running). So we tried to create two layers of the machine status.
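A sketch of the two-layer status being described, as I understand it: a coarse machine state for controllers, a finer-grained phase for humans, plus the last operation attempted. The names below are approximations of that design, not the exact upstream types:

```go
package sketch

// MachineState is the coarse, controller-facing layer.
type MachineState string

const (
	StateProcessing MachineState = "Processing"
	StateFailed     MachineState = "Failed"
	StateSuccessful MachineState = "Successful"
)

// MachinePhase is the finer-grained, human-facing layer.
type MachinePhase string

const (
	PhasePending     MachinePhase = "Pending"     // machine is being created
	PhaseAvailable   MachinePhase = "Available"   // created, not yet joined
	PhaseRunning     MachinePhase = "Running"     // joined and serving
	PhaseTerminating MachinePhase = "Terminating" // deletion triggered
	PhaseUnknown     MachinePhase = "Unknown"     // e.g. machine went into a network black hole
	PhaseFailed      MachinePhase = "Failed"
	PhaseDraining    MachinePhase = "Draining"
)

// LastOperation records what the controller last tried to do to the machine
// and how it went.
type LastOperation struct {
	Type  string       // "Create", "Update", or "Delete"
	State MachineState // processing, failed, or successful
}

// MachineStatus combines both layers.
type MachineStatus struct {
	Phase         MachinePhase
	LastOperation LastOperation
}
```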
A
Interesting. I know that in the way we're trying to handle errors right now in the Cluster API, it's also split that way, where we have an error string that's supposed to be machine readable and an error that's supposed to be for a human. So we were thinking about some of those similar concerns about splitting the different audiences apart, because it's often really difficult to do something that satisfies both of those audiences.
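A minimal sketch of the split mentioned here, one machine-parsable code for automation and one free-form message for people; the field names are illustrative of the prototype, not confirmed in the meeting:

```go
package sketch

// MachineErrors splits the two audiences for error reporting.
type MachineErrors struct {
	// ErrorReason is a short, stable code intended for automation.
	ErrorReason string
	// ErrorMessage is a longer explanation intended for humans.
	ErrorMessage string
}
```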
A
I think events are, to me, one of those things that falls under the human-audience category, because of the way that they are tracked and stored by the system. I think it's pretty much impossible to build automation that relies on events to drive it forward, because events can easily be lost. So yeah, that's my thinking for automation.
A
Don't
I,
don't
think
our
prototype
implementation
generates
any
events
yet
nor
we
discussed
sort
of
when
it
would
make
sense
to
generate
events,
but
I
mean
if
you
look
at
the
state
transition
diagram,
I,
think
sort
of
each
of
those
state
transition
arrows
is
is
a
good
candidate
for
something
that's
changing
in
the
system
that
you
might
be
interested
in
like
we
are
now
drain
this
node.
We
are
now
updating
the
OS.
We
are
not
recreating
this
node,
like
those
all
make
sense
for
points
when
we
generate
events.
A
So
I
guess
the
question
then
becomes:
does
it
help
the
the
automation
to
know
that
they
know
those
intermediate
states
or
the
intermediate
states
more
for
for
human
consumption
where
something
like
events
or
your
your
two-level
phase
and
status
split
makes
more
sense.
Philip
I,
don't
know
if
you
have
any
any
thoughts,
you're,
very
quiet,
yeah.
J
So, feedback first: I think it might help us to brainstorm what we would need in the higher-level controllers, because that would probably drive the requirements on the states that we need. I'm not sure if that discussion already happened and I wasn't present, but for example, what do we need to create a machine deployment which respects disruption budgets and can do rollbacks, things like that? Is there anything that you need from the machine itself?
A
On our end, we haven't explored what the higher-level controllers would look like quite yet. I know that the node controller manager folks from SAP have those controllers in place and they have these fields, so I'm curious what they think is necessary to build those higher-level controllers.
E
So I think, to stick with this: the machine set would not need to know a lot of details. It would just need to know that at a certain point a machine is in a healthy state. You can always categorize, or bunch up, a certain set of phases and term them as healthy, and categorize certain other phases and term them as unhealthy.
E
The machine state still considers the phase: the initial phase where it is creating, which we call pending, the available state, and the running state we categorize all as healthy, and then we categorize the other phases, where it's more of a failed state because the kubelet fails or a disruption happens.
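A sketch of that bucketing: the machine set only needs the single healthy/unhealthy signal, not the full phase detail. Phase names follow the earlier description and are assumptions:

```go
package sketch

// isHealthy collapses the fine-grained phases into the one signal a
// MachineSet cares about when maintaining its replica count.
func isHealthy(phase string) bool {
	switch phase {
	case "Pending", "Available", "Running":
		return true // creating, created but not yet joined, or joined and serving
	default:
		return false // Failed, Unknown, Terminating, Draining, ...
	}
}
```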
E
In all of those cases, the machine controller declares the state of the machine to be failed, and at that point the machine set decides that this machine has to be removed and recreated. One layer on top of that sit machine deployments; we don't really expect the machine deployment to interfere much with the machines, we expect only the machine set to interfere with the statuses of the machines made from the machine deployment.
E
We
only
list
out
certain
certain
categories
like
these
are
the
available
set
of
what
are
the
number
of
number
of
available
nodes?
What
are
the
number
of
labeled
nodes
available
which
are
available
nodes?
Which
means
which
are
the
curve
which
are
the
nodes
which
are
fully
labeled
according
according
to
the
label,
selector
criteria
and
so
on?
E
So currently, let's say the pending phase, which is the initial one: when the machine is being created, we call it pending. Then when the machine is created but yet to join, we call it the available phase, and once it's connected, we call it the running phase. All three of these machine phases we map to the running, or successful, machine state.
E
So
all
these
three
different
phases
gets
categorized
as
healthy
or
running
machine
state
and
in
the
rest
of
the
cases
where,
where
the
Machine
phases
like
it's
failed
or
by
the
deletion,
it
failed
or
something
can
happen.
Something
else
has
happened
like
it
has
gone
into
the
unknown
state.
So
we
have
also
seen
the
cases
where
your
machine
goes
into
a
black
hole.
So
you
won't
know
what
happened.
I
mean
networking
has
gone
down
or
something
right,
so
the
Machine
goes
into
unknownst
unknown,
face
so
catch.
E
The
word
face
and
state
yeah,
so
the
Machine
goes
into
unknown
phase,
so
that
unknown
face
with
wait
for
five
minutes
and
after
that
we
convert
that
unknown
face
in
profile,
so
that
machine
that
initially
for
five
minutes,
the
machine
was
in
a
healthy
state,
but
then
it
becomes
in
that
way.
Instead,
so
that's
what
so,
we
more
or
less
try
to
bunch
up
the
faces
into
a
minimal
state,
because
eventually
machine
sucks
to
know
whether
it
should
recreate
the
machine
or
not
to
maintain
the
number
of
healthy
replicas
the
it
doesn't
really
care.
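A sketch of that timeout behaviour: a machine stuck in the Unknown phase for more than five minutes is marked Failed so the machine set can replace it. The threshold and names below are assumptions drawn from the description above:

```go
package sketch

import "time"

// unknownGracePeriod is how long a machine may stay Unknown before it is
// declared Failed (five minutes, per the description above).
const unknownGracePeriod = 5 * time.Minute

// resolveUnknown converts a long-lived Unknown phase into Failed so that the
// MachineSet stops counting the machine as healthy and recreates it.
func resolveUnknown(phase string, unknownSince, now time.Time) string {
	if phase == "Unknown" && now.Sub(unknownSince) > unknownGracePeriod {
		return "Failed"
	}
	return phase
}
```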
J
The part that's most interesting to me is where this higher-level controller needs to act. For example, there is some unavailability budget: say someone configures the cluster so that no more than two nodes should be unavailable because of upgrades and so on, and the rest of them should stay available.
J
We may need to actually understand that a node is unavailable for scheduling because it's drained. That's a case where we will not just wait and see what happens, because we know it's drained; that's why it's unavailable, versus it's just unhealthy because, as you said, networking is flaky on this machine or something like that. So maybe distinguishing those two things is helpful.
E
The number of healthy machines, plus the number that we mention as max surge, is the maximum number of machines the deployment will create at any point in time, and max unavailable is the number of machines which at any point in time may be unavailable in the cluster while it is doing the rolling update. So the deployment's job is mainly to do a rolling update, pause a rolling update, or roll back.
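A sketch of those rolling-update knobs, mirroring how pod Deployments already work: maxSurge bounds the extra machines a MachineDeployment may create, maxUnavailable bounds how many may be down at once. The types and logic are illustrative, not the final API:

```go
package sketch

// RollingUpdateMachineDeployment captures the two knobs discussed above.
type RollingUpdateMachineDeployment struct {
	MaxSurge       int // extra machines allowed above the desired replica count
	MaxUnavailable int // machines allowed to be unavailable during the update
}

// canReplaceAnother reports whether the deployment may take down one more
// machine. If the last replacement is stuck unhealthy, unavailable stays at
// the limit and the rollout is effectively paused.
func canReplaceAnother(desired, healthy int, s RollingUpdateMachineDeployment) bool {
	unavailable := desired - healthy
	return unavailable < s.MaxUnavailable
}
```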
E
So, as you said, if a rolling update is happening and, let's say, two machines got updated properly and the third machine is stuck, at that point we wait until the third machine actually comes up and becomes healthy. If it is not becoming healthy, then the rolling update automatically gets paused. It's similar to the way deployments work in Kubernetes right now for pods.
E
So
if
you
see,
if,
when
you
are
doing
the
running
up,
wait
for
the
pods,
you
have
capabilities
of
pausing
manually
in
between,
so
you
can
actually
execute
the
command
and
pause
the
rolling
update
in
between
all
if
running,
upgrade
is
failing.
If
the
new
pods,
the
new
version
of
the
pods,
are
getting
stuck
somewhere,
the
rolling
update
automatically
gets
paused
so
that
thing
happens
in
machine
deployment
as
well.
So
eventually
you
will
end
up
in
a
hybrid
cluster
Abell.
E
It
should
not
happen
that
all
of
the
machines
go
Sandow
and
Hill,
this
state
and
stuff
like
that.
So
that's
that's.
Actually.
That
was
actually
a
very
good
point
that
we
need
to
stop
rolling
update
in
between
from
the
deployment
controller
if
things
are
not
going
and
they're.
Actually,
we
also
plan
to
think
of
a
scenario
where
we
need
to
think
one
layer
on
top.
E
So
let's
assume
that
we
have
applications
running
in
the
older
version
right
and
when
we
are
doing
the
rolling
update,
we
should
actually
also
ensure
that
those
applications
are
able
to
run
properly
on
the
newer
version
of
the
machines
that
we
are
transferring.
So
we
should
also
check
the
pods
are
getting
into
the
ready
state
and
that's
where
we
actually
meet
to
check
the
PDB
so
for
disruption
budgets.
So
we
need
to
check
those
parts
and
then
the
application
layer.
A
And similarly, when you're choosing the next node to replace during your update, you should also be respecting the pod disruption budgets and slowing down the update, not just based on max unavailable on the machine deployment, but also based on the pod disruption budgets of the pods that are running on those nodes.
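A hedged sketch of that pacing decision: before picking the next machine to drain, check the PDBs covering pods on it, not just the deployment's own maxUnavailable. The summary type and helper below are assumptions for illustration; a real implementation would typically use the eviction API, which enforces PDBs server-side:

```go
package sketch

// pdbStatus is a pared-down view of a PodDisruptionBudget's status.
type pdbStatus struct {
	DisruptionsAllowed int // how many more covered pods may be disrupted right now
}

// okToDrain returns false if any PDB covering pods on the candidate node has
// no disruptions left, so the rolling update slows down instead of violating it.
func okToDrain(pdbsOnNode []pdbStatus) bool {
	for _, pdb := range pdbsOnNode {
		if pdb.DisruptionsAllowed <= 0 {
			return false
		}
	}
	return true
}
```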
J
Yes, that was also pretty interesting: this waiting for whether the pods can actually run successfully on the new version of the machine. How do you wait for that? What is the idea here? Do we have another phase where you're validating that this new version is fine for the next few minutes before you proceed to the next machine?
E
That
part
is
yet
to
be
implemented,
so
we
don't
have
it
in
the
implementation,
but
just
what
you
say
it
is
correct
at
some
point
you
will
have
to
label
the
machine
or
we
will
have
to
create
a
new
phase
for
the
machine.
The
machine
might
be
healthy.
It
might
not
be
really
ready
for
further
routing
updates.
E
So we really need to check what kind of pods are running on it, or we actually need to prioritize which machine should be scaled down first. In that case, we still might want to put the intelligence inside the deployment or the machine set, in the sense that we put a label or an annotation on the machine object so that the deployment can understand: okay, I have five machines, but this is the machine that should be scaled down first.
E
That
thing
that
is
I,
think
very
interesting.
Question
and
I
was
also
very
curious
and
I
asked
in
one
of
the
channel
that
I
wanted
to
know
that
is
there
any
plane
from
the
past
or
the
scalar
guys
to
to
into
think
or
already
have
a
proposal
for
the
design
of
integration
of
autoscaler
and
machine
api.
A
There are two answers. One is that they already do the first thing you talked about inside the autoscaler. When they're looking at scaling down the cluster because they think it's underutilized, they look for a machine to scale down that will have the least disruption. So they look for a machine that isn't running, you know, singleton pods that won't come back because they're not underneath a controller.
A
One that's maybe only running system pods, or things that are in daemon sets. They won't delete a node where doing so would disrespect a pod disruption budget for something running on that node. So they already take all of that into account when they choose which node to delete, and they're not saying scale down this MIG by one, they're saying delete this particular node from the MIG or the ASG. I think they would want that same feature in the machine set or the machine deployment.
A
Maybe
I
of
we
are
smart
enough
to
know
which
thing
to
get
rid
of.
Please
leave
this
one
for
us,
not
please
scale,
but
down
by
one
and
I
trust
you
to
pick
the
right
one,
because
there
are
cases
where
the
clusters
underutilized,
but
they
don't
want
to
scale
down
by
one,
because
it
will
violate
one
of
these
constraints.
That's
your
first
answer.
A
The
second
answer
is
I've
been
talked
a
lot
with
Marcin,
who
is
from
the
Google
Warsaw
office
and
is
part
of
the
sig
auto
scaling
group
about
integration
between
the
cluster
API
and
the
autoscaler,
and
you
know
he
was
pestering
me
yesterday
about
like
timelines
and
when
they
should
start
trying
to
rebase
and
so
I
think
there
are
discussions
there,
I'm,
not
sure.
If,
there's
anything,
that's
been
written
down
about
what
the
autoscaler
would
look
like
or
what
the
proposal
is
quite
yet,
but
I
can
ask
him.
I
should
see
him
today.
J
I think we should maybe wrap up with some action items. It sounds like we need a better understanding of the use cases, of what we actually need as an input. For example, we talked a lot about healthy versus unhealthy and those phases and states, and the doc that Robert and I prepared doesn't even talk about any health states, nor does it talk about intended states; all the failure cases are sort of outside of its scope.
J
So
it's
not
something
that
it's
touched
on
and
I
totally
agree
with.
You
guys
that
it's
it's
important,
that's
an
input
for
the
higher
level
controllers,
so
I
think
we
should
have
a
is
full
picture
first
before
we
can
but
understand
like
before.
We
will
be
confident
this
is
really
de
the
states
or
phases
that
we
want
to
end
up
with
all
right,
so
these
I'm
not
comfortable
yet
with
you
know,
seeing
this
to.
J
So I will try to think a bit more, because this is not something I already have in mind, about this healthy versus unhealthy, how these can work, the failure modes, and how that can be combined with the other states. I don't really have an opinion on that yet, so I will definitely try to figure out that part, but I hear that there are already some proposals, so I also need to catch up on what is already there.
A
That
would
be
really
really
helpful
for
us
to
to
help
the
better
understanding
of
use
cases
why
you
guys
made
the
design
decisions
that
you
did
yes
and
how
we
can
either
fold
some
of
those
decisions
into
the
cluster
API
or
make
sure
we're
being
the
same
use
cases
in
a
slightly
different
way.
Mm-Hmm.
A
So I think that one we have a relatively clear path forward on, although I've been poking at it for the last couple of days and we haven't finished, so I'm hoping to get that wrapped up this week. The MachineSet PR has been open for a little while now; the Loodse guys sent that one over, and I don't know that it has any comments on it yet, so I'm not sure if anybody has really looked at it very deeply.
A
If there's a way, we can propose a MachineSet API where we bundle machine sets, or even machine deployments, with machine classes at the higher level, where we expect users to want ways to stamp out lots of copies of things. But then at the lowest level, where we have machines, we don't have that complexity, so that a user who just wants to use machines doesn't have to worry about linking together multiple different objects.
E
Machine class and provider config: a middle ground might look like the following. You might have a single CRD for the machine class, and the main benefit that I see from the provider config approach is that we can add an arbitrary number of key-value pairs and make the machine controller smart enough to understand them and talk to the cloud provider, where we don't really need to do API versioning every time something changes.
E
What we can do, rather, is have stable fields in the machine class, such as the image, or the disk type and disk size and so on, which are already very stable and don't need to be modified. But then we can have a new field in the machine class, let's call it the provider config itself, and that's where we can put the arbitrary key-value pairs. The machine controllers would still assume that if anything changes on the cloud provider, we do not really want to immediately change the API version.
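A sketch of that middle ground: a machine class with a small set of stable, strongly typed fields, plus a free-form provider config of key-value pairs that only the provider's machine controller interprets, so new cloud features don't force an API version bump. The field names are assumptions:

```go
package sketch

// MachineClass mixes stable fields with a free-form provider config.
type MachineClass struct {
	// Stable, provider-agnostic fields that rarely change.
	Image      string
	DiskType   string
	DiskSizeGB int

	// ProviderConfig carries provider-specific settings as arbitrary
	// key/value pairs; only the provider's machine controller needs to
	// understand them, and adding a key does not change the API version.
	ProviderConfig map[string]string
}
```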
E
Now we can make the machine controller smart enough to understand and leverage new features that came up on the cloud. I'm just thinking through it, but that might make sense at some point, because duplicating a lot of provider config detail inside every machine looks scary to me. The only hesitation is that we still have an open question there.
A
We talked about this a little bit, and alluded to it with you guys at KubeCon: having that cloud-init, instead of being stored in the API itself, be something that maybe lives inside the machine controller. So the machine API object doesn't have the entire cloud-init in it, the startup script and all that sort of stuff; it just says, I'm underneath this controller, and the controller looks at that and says, oh great, I know that for you I apply this cloud-init.
I
So more or less, now we have the machine API, but then everyone with their custom needs has more or less custom configuration, etc. So how does the machine API server actually do validation of the configuration when you submit a machine? When you submit the machine, you probably want to check whether the configuration is okay, and the only way I can think of is...
A
I think we did talk about maybe using admission controllers for that. The other option we talked about is that the machine controller could see that this new resource was created. The provider config, I think, is a string right now; we talked about making that an actual real type, and there's an issue open for that.
A
The controller understands how to read it, so the controller can read that in, do its own versioning, and then spit back an error on the machine where it says: you created a machine with this desired state, and that desired state is invalid. I think there might actually be an error type in the top-level machine object that basically says invalid desired state. So it wouldn't be that you submit it and it immediately gets rejected; you'd submit it, and it would be sort of accepted as a possible desired state.
A
Likewise, there are going to be cases, especially as we talked about with generically passing a couple of extra fields through to the underlying cloud provider that aren't represented directly at the top level, where we're going to get something in and we're not actually able to know a priori whether it's a valid state. We're going to call out to the underlying cloud, it's going to reject it, and we're going to have to say: sorry, you specified an availability zone that doesn't exist, or you specified...
A
...an instance type that doesn't exist. So we have to take care of that case anyway, and we can use the error reporting mechanism in a similar fashion for both cases. And that's not to mention things like: you ran out of quota, we can't scale you up. That's another case where it's more of a transient error, where you could add quota and it could then later be reconciled. There are going to be states where you're in a permanent error state, where your config is just plain invalid.
A
So if there's a quota error, that's a transient case. I could imagine the machine controller saying: I tried to instantiate the machine, and the cloud told me I had no more quota for that project in that zone, so I'm going to wait; maybe I'll try again in ten minutes.
A
Like, have an exponential backoff, and eventually I start trying once every hour, but I will keep trying; I'm not going to give up, because I know this could be resolved externally by an actor going and changing the quota. Whereas if, when we try to create the VM, it says sorry, the machine type you specified doesn't exist, there's not really much point in trying again, so you can basically put it in a sort of terminal state: this desired machine that you gave us is never going to work.
A
Yes, I think we can look at admission controllers for that. I think Chris had to drop off, but I believe he mentioned that it was possible to do that with admission controllers. So that's one way to look at it, and if we're running our own extension API server, presumably you can stick admission controllers in that API server to do some of that initial validation as well.
A
Which is a little unfortunate, because you're decoupling it from the actual controller code, and you don't want those things to get out of sync, but it does still give you the flexibility of being able to have different machine controllers with different input validation, without having to surface everything at the top level and try to generically abstract across every cloud, which is, I think, something we were really scared of trying to do.
A
We're not trying to build another Terraform. All right, with that, we are just about out of time. Thank you, everyone, for coming; this has been a very productive and fruitful conversation. I will be out of the office next week, so I'll find somebody else to run the meeting.