A
All right, hello everyone. Today is Thursday, August 5th, and this is the Cluster API Provider Azure office hours. As always, we abide by the CNCF code of conduct, so please raise your hand and be respectful to everyone on the call.
A
If you'd like to raise a topic, please add it to the agenda's open discussion section, and if you need access to the agenda, you can join the cluster-lifecycle mailing list. All right, so let's get started. Oh, and if you can, please add your name to the attendee list as well. I don't see any new faces here today, so I'm gonna skip the welcome and let's just go straight into the discussion, I guess.
A
The first one... oh yeah, so the first three are mine, but let's take them one at a time. The first thing I wanted to bring up is: I noticed there was a lot of churn in the Azure managed cluster area recently. Lots of really great improvements and PRs going in, a couple of bug fixes. And I was thinking, since that area is emerging to be quite distinct from the rest of the self-managed clusters code, and since most of the code has been written by a couple of people who are mostly not the same people writing most of the code for the self-managed clusters, that it might make sense to add a separate OWNERS file. Those files would be a subset of the overall code base, so any of the overall maintainers could still approve PRs, but it would allow the people in that OWNERS file to also review those PRs independently and have, you know, maybe more targeted reviews for those files. I don't know what people think, if there are any opinions or dissent on this.
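For reference, a minimal sketch of what such an area OWNERS file could look like, following the upstream Kubernetes OWNERS conventions the repo already uses; the path and handles below are placeholders, not actual assignments:

```yaml
# exp/managedclusters/OWNERS (hypothetical path)
# Root approvers still apply everywhere; this only adds area-specific
# reviewers and approvers for the managed clusters code.
reviewers:
  - managed-clusters-reviewer-1
  - managed-clusters-reviewer-2
approvers:
  - managed-clusters-approver-1
labels:
  - area/managed-clusters
```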
C
David, I was actually going to say the same thing. Yeah, Ace has been contributing a lot to the project, and I think it would be great to, you know, recognize that.
A
Yeah, plus one to that. I will talk to Ace, because I don't know if he has the bandwidth to take that on officially. I haven't talked to anyone yet that I was thinking of adding to that file, so that would be something to see, whether they're willing. But also, in general, I was thinking of adding a few reviewers, not necessarily approvers, for managed clusters: mostly the people who have been contributing a lot lately. So Lochan (I don't know if that's the first name, but LochanRn, I think, is his GitHub handle), and maybe Nicola, you know, if you keep contributing in that area, if you're interested in becoming a reviewer.
A
All right, next topic. Yeah, so we have an open issue for renaming the master branch to the main branch.
A
It's been open for a while, and we said we would wait for the alpha 4 release to be out before doing that, just because we didn't want to break all our test signal right before the release. I think now that we're past that, we're in a good place to start thinking about it again. Fortunately, the Kubernetes maintainers, or the Kubernetes contributors group, have made it pretty easy: they've outlined the exact steps that you need to take before and during the transition, so I think we just need to follow that. The two things I wanted to figure out here are, first of all: is there anyone who's interested in taking that on, being the owner, and maybe delegating some of the tasks to others?
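For reference, the mechanical part of such a rename is roughly the following; this is a hedged sketch assuming a remote named origin, and the authoritative checklist is the one the contributors group published:

```sh
# Rename the branch locally and publish it.
git checkout master
git branch -m master main
git push origin main

# Then switch the repo's default branch to main in the GitHub settings,
# and retarget open PRs, branch protection, and CI job configs (test-infra).

# Once everything is green, delete the old branch.
git push origin --delete master
```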
E
It seems like this week would have been a good time to do it, because there are fewer people paying attention, as far as I can tell, although there's still been a lot of activity. But sooner rather than later, obviously, because ripping the band-aid off is a good thing. Do other projects, or CAPI, have a timeline for when they're going to switch that we should coordinate with, or is it okay to just do this independently?
A
I think it's okay to do it independently, but we should definitely post on Slack about our intentions, to, you know, see if we should coordinate, in case anyone thinks we should coordinate with the other projects. I think GCP has already done it, so they're ahead. Okay.
C
Yeah, any thought to getting the Calico patch in there and then cutting a release? Is that something that we are interested in doing? The only reason I'm saying it is, it seems like that's pretty close, and if we get that in, we could cut a release and then move the branch. Would that be easier, or would moving the branch first be easier?
A
Yeah, so I don't think moving the branch would prevent us from cutting a release or anything like that. It's just that it might disturb the testing on PRs for a little bit. So if we wanted to get that merged, I think it would probably be better if we merge it first. As you said, it's pretty close, and I think from Nader's investigation last night it's just missing a configuration that I need to go and update.
A
Actually, it might be a little better if we do it Monday, so that we're all around to fix it if something goes wrong. But I don't know: what are people's thoughts on that?
E
Sorry, yeah, I was just... I'd love to help with this, because for whatever reason I really like this kind of Git stuff, but I'm gonna be gone until Wednesday. So I can't, if we're gonna do it right away.
A
Okay, well, so I guess... I don't think anyone else volunteered, so I wouldn't mind doing it, but I don't think there's a huge urgency to do it right now. We should get it done soon, though.
A
Actually, what I propose is that I can try to get all the groundwork laid out by next Wednesday, like get all those PRs in place, so that we're in a good place to actually cut over when you're back, and you and I can coordinate on that. How does that sound?
A
All right. David, I added an item because I saw you said something in a comment about how we should discuss this in office hours. So, let's discuss it.
C
Fantastic. So that item is using a service principal directly, without actually using NMI or anything else.
C
The reason why I wanted to bring it up, or I thought it would be good to discuss, is to kind of talk about the use case and understand whether or not we want to support it. So the use case is really:
C
I don't want to run AAD Pod Identity, or I can't run it, and so, for whatever reason, I need to use a service principal only; that's just the only way I can make this work.
C
Is that a reasonable scenario? Does it even make sense? Can we run AAD Pod Identity wherever we want? Is there a reason why somebody wouldn't want to run, you know, AAD Pod Identity, and if so, is this a viable solution? Anybody have any ideas?
D
So in our experience, we noticed that some use cases from some customers can cause the NMI app to crash a lot, or to get restarted a lot, so in general it can add, you know, a little bit of pain when it comes to operations. So I could assume that maybe customers like that would like to try not to use it, if they had the option not to use it.
B
I haven't really looked at the PR too much, but what are we losing by not using NMI? I remember when we started doing that, we wanted to use NMI because it's already tested and established, and we know it does all the things the right way. So what are we losing by just doing the thing ourselves?
C
So we aren't really losing anything; the token client for autorest will do the right things. What we lose on is code complexity: it becomes more complex. We have more permutations of identities; that's a loss.
C
It is not quite as simple, so there's just more documentation for it. On the other hand, we gain the ability to not have a dependency on AAD Pod Identity if we don't absolutely need to, and I guess that's pretty much it.
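As a rough illustration of what using a service principal directly means here, this is a minimal sketch built on the go-autorest ADAL token client mentioned above. The credentials are placeholders; in CAPZ they would come from the cluster's identity configuration, not hard-coded strings:

```go
package main

import (
	"fmt"

	"github.com/Azure/go-autorest/autorest"
	"github.com/Azure/go-autorest/autorest/adal"
	"github.com/Azure/go-autorest/autorest/azure"
)

func main() {
	// Placeholder credentials, for illustration only.
	tenantID, clientID, clientSecret := "<tenant-id>", "<client-id>", "<client-secret>"

	oauthConfig, err := adal.NewOAuthConfig(azure.PublicCloud.ActiveDirectoryEndpoint, tenantID)
	if err != nil {
		panic(err)
	}

	// Acquire tokens with the service principal itself: no NMI pod
	// intercepting IMDS calls, no aad-pod-identity deployment required.
	spToken, err := adal.NewServicePrincipalToken(*oauthConfig, clientID, clientSecret,
		azure.PublicCloud.ResourceManagerEndpoint)
	if err != nil {
		panic(err)
	}

	// Any autorest-based ARM client can then authenticate with this authorizer.
	authorizer := autorest.NewBearerAuthorizer(spToken)
	fmt.Printf("authorizer ready: %T\n", authorizer)
}
```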
A
So that seems like the negative, like what we lose by adding that option, right? But what about... let's say we decide this is a valid use case for some users, so we should have that as an option, and we have both side by side. What are the reasons that we should encourage someone to pick, you know, using pod identity versus just using this? What do they gain out of it, if it's the same thing?
C
The way I see it is, we've had folks complain, or not complain, but open issues for stuff like: I'm already running AAD Pod Identity of this version, and I don't want to run the version that you're using, or I can't run the version that we are putting out in the infrastructure YAML.
C
So they're going to use their version of it, and perhaps at some point, you know, we end up having an incompatibility issue there, or maybe it's not right, what they have it configured to watch, only certain namespaces or something. This can start to be a little bit painful. So I think this lends flexibility: to be able to not have to configure AAD Pod Identity, not have to use it, and have something a little bit more self-contained.
B
Yeah, I guess I was trying to ask as well: maybe the future version is one where we don't want to have pod identity anymore, and we can have both at the beginning, and then, if everything is all right with doing it this way, we can remove the dependency. Would that be our goal?
C
I think pod identity is the right way forward. As Cecile was alluding to, and I think she already knew the answer, pod identity is probably a more secure solution than writing service principals to a file. Since, you know, with pod identity (I will say, if you're using user-assigned managed identities), at that point pod identity enables you to have a rotating secret that's deployed by the Azure fabric, so that you don't have to worry about rotating a service principal.
C
And then, you know, redeploying your machines, because you need to make sure that service principal is in the azure.json file on each individual machine. The secret gets rotated for you.
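To make the contrast concrete: with a service principal, the credential sits in azure.json on every node, so rotating it means updating that file and rolling machines, whereas a managed identity keeps no secret on disk. Field names below are from the Azure cloud provider config; the values are placeholders:

```json
{
  "cloud": "AzurePublicCloud",
  "tenantId": "<tenant-id>",
  "aadClientId": "<client-id>",
  "aadClientSecret": "<client-secret>"
}
```

With a user-assigned managed identity, the last two fields are replaced by `"useManagedIdentityExtension": true` and `"userAssignedIdentityID": "<identity-client-id>"`, and Azure rotates the underlying credential for you.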
C
It is a better solution for long-term use and for, you know, lowering the amount of operational day-to-day kind of stuff that you would have to do, but it does mean, you know, you have to take a dependency on this other piece of infrastructure.
B
The one tricky part that I remember, because I worked on adding pod identity, is that we couldn't figure out a good way of making installing it optional. To serve the value of having this feature, you wouldn't install pod identity, and I don't know how you would do that.
C
Is that indeed the consensus?
G
So I think the main complexity is making it optional. That's where, like, I think we got stuck: we didn't find a good way of making it optional. The only thing I could come up with was having a replica field and making it zero if they want to disable it, but the CRDs still get installed, the cluster bindings, the CRDs. So that's where we were kind of stuck.
D
Yeah, I have one more example actually, specifically about AAD Pod Identity. If we have an option not to use it, it increases complexity on the CAPZ side, but on the other hand, for somebody that wants to use and deploy CAPZ, the complexity is lowered. It's less complex: there is one less thing to deploy.
D
We had an example a few months ago. Soon we will be moving to using CAPZ, and CAPI in general, fully in production, and we have a different way of deploying apps and services to our clusters. That's how we deploy things currently, and it will be like that in the future as well.
D
We are not using the exact same way to deploy CAPZ that, you know, you can read about in the CAPZ repo; we have a different way. We package our manifests differently, and we use the same code, the same project, not modified, but we deploy it differently. So having AAD Pod Identity in there was an additional task for us. We had it before, but it was an optional app that our customers could use, and now it's not optional anymore.
A
But in terms of installing it, we could publish a separate infrastructure components manifest that is without AAD Pod Identity, with it completely removed, and then have clusterctl still use the one with AAD Pod Identity by default, and have that be the default, since that's our 90% use case.
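A rough sketch of how a user could opt into such an alternate manifest through clusterctl's provider configuration; the URL and file name are hypothetical, only the clusterctl.yaml override mechanism itself is real:

```yaml
# ~/.cluster-api/clusterctl.yaml
providers:
  - name: azure
    type: InfrastructureProvider
    # Hypothetical manifest published without aad-pod-identity baked in.
    url: https://example.com/capz/releases/v0.x/infrastructure-components-no-pod-identity.yaml
```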
B
I think having the separate infrastructure YAML is probably the best idea, so you can choose whichever one you want. But we just have to be careful about which things we test and require in our tests, just to make sure it's covered and things don't get broken without us noticing.
C
Agreed, plus one to the PR, and thank you so much for working on it.
A
Well, all right, let's move on, if no one else has anything on the topic. All right, so I just wanted to talk a little bit about a proposal that I've been working on. It's in the PR queue.
A
If you haven't looked at it, please take a look; I'm looking for feedback. I've been working on the code while this is being reviewed, so I'll have a POC to show pretty soon, but it's still in progress. Basically, for the idea of this PR, I'll just give a little summary, and then I can answer any questions if there are any. Let me see... how do I... where's the file.
A
Okay, so the TL;DR is (we've talked about this before in office hours): right now, when we create Azure resources, we block on the completion of the operation. So when we do a create or a delete, that is, a PUT or a DELETE, we'll send that PUT to Azure and then we'll poll Azure until that operation is complete and we can return.
A
So what this means is, essentially, if a virtual machine creation takes, let's say, two minutes on a bad day, then for those two minutes our controller loop is stuck waiting for that creation to complete, and it's not doing anything else. That also means it's not returning any information to the user: from the user's perspective, things are hanging, and they're not seeing any updates.
A
They don't even know that the create has started, and it can be quite slow. It also means that if you have, let's say, a thousand machines that you're trying to deploy, and you only have concurrency set to 10, which is the default in our controllers, you'll process 10 virtual machines at a time. That means you'll wait for those 10 to be created before moving on to create the next 10, and so on. That could take quite a while if you're in a dynamic environment like cluster autoscaler.
A
So what I'm proposing is that we follow the pattern that was set as a precedent in Azure machine pools by David, which essentially stores the state of that long-running operation in the status of the object, and uses that to check on the operation the next time around. So instead of waiting for an operation to complete... let me show you the diagram.
A
So let's look at delete for a second, because it's a little easier. Basically, whenever you start a resource deletion, the first thing we'll do is check: was there a previously running operation?
A
Do I know about a previously running operation? If there is no long-running operation in progress, that means you're doing this from scratch, so we'll attempt to delete the resource, which means we'll send a delete call to Azure. And one little difference from what's been done in Azure machine pool (this is taken from Cheyenne's earlier proposal) is that here I'm waiting for the thing to complete for X seconds.
A
Right now that's set to, I think, five, but I'm playing with that number. It basically just says: if it is a relatively short operation, done in a few seconds, then we can wait for it, and we don't need to requeue later.
A
If it doesn't complete before that timeout, then that means we need to store its state for later. If it does complete, then we don't need to store the state anymore; we make sure it's empty, nil. And then, no matter what happens, we always end by updating the status to set the conditions of the object, and then going back.
A
If you do have a long-running operation in progress, then, instead of trying to delete, you try to see where that operation is at. So you poll the status from Azure, and if it's done, then you can update the status and say things have changed: my resource is now deleted, so I'm in that good state. If it's not done yet, you just requeue, and I think right now we're requeueing in, say, 15 seconds.
A
That
way,
if
we
don't
immediately
review,
because
that
wouldn't
really
be
like
that,
doesn't
really
give
us
much
advantage
because
then,
like
it's,
probably
not
gonna,
be
done
in.
You
know
a
few
milliseconds
if
it
wasn't
done
now,
but
in
15
seconds,
there's
a
good
chance
that
it
might
so
those
numbers
I'm
still
trying
to
play
with
to
like
find,
what's
optimal
and
trying
to
like
do
different
performance
testing
to
see
which
ones
will
work
best
but
yeah,
a
slight
difference
with
create
or
reconciling
and
by
the
way.
Those
are
like.
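A minimal, self-contained sketch of the delete flow just described. The Poller and Future types are illustrative stand-ins, not the real CAPZ interfaces; the actual proposal serializes the future into the object's status:

```go
package reconcile

import (
	"context"
	"errors"
	"time"
)

// Future is a placeholder for the stored handle of an Azure long-running operation.
type Future struct{ OperationID string }

// Poller abstracts the two Azure calls this sketch needs.
type Poller interface {
	BeginDelete(ctx context.Context, name string) (*Future, error)
	IsDone(ctx context.Context, f *Future) (bool, error)
}

// ErrInProgress tells the controller to requeue (e.g. after ~15s) instead of blocking.
var ErrInProgress = errors.New("operation in progress, requeue")

// DeleteAsync reuses a stored future if one exists; otherwise it starts the
// delete and waits only a few seconds inline before handing the future back
// to be stored in status. A nil returned future means the operation is done.
func DeleteAsync(ctx context.Context, p Poller, name string, stored *Future) (*Future, error) {
	future := stored
	if future == nil {
		f, err := p.BeginDelete(ctx, name)
		if err != nil {
			return nil, err
		}
		future = f
		// Give short operations ~5s to finish inline so we can skip the requeue.
		deadline := time.Now().Add(5 * time.Second)
		for time.Now().Before(deadline) {
			done, err := p.IsDone(ctx, future)
			if err != nil {
				return future, err
			}
			if done {
				return nil, nil // finished inline: caller clears stored state
			}
			time.Sleep(time.Second)
		}
		return future, ErrInProgress // store the future, requeue later
	}
	// A previous reconcile started this operation: poll it exactly once.
	done, err := p.IsDone(ctx, future)
	if err != nil {
		return future, err
	}
	if done {
		return nil, nil
	}
	return future, ErrInProgress
}
```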
A
And by the way, this applies to all of the Azure services, if you're familiar with the code base; so, for example, this would be a virtual network delete. For reconcile, the slight difference is that before creating, we do a get. We're not doing that consistently in every service right now, but I'm hoping we can change that to be a little more consistent. And the reason for that is that if you look at the Azure API (actually, I have it open here, the API resource limits):
A
You have more reads than you have writes, right? That's a very common pattern, and reads are generally cheaper, so you want to do a read if it helps you avoid doing a write, if possible. So every time we do a get, we ask: is the resource already there, and does it have everything it needs? If it does, then we skip the write; we only do the write when it's absolutely necessary, so we use those quotas wisely. Same for the delete.
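In code, the read-before-write idea amounts to something like this sketch; the Client interface is a hypothetical stand-in for an Azure service client:

```go
package reconcile

import "context"

// Client is an illustrative stand-in for an Azure service client.
type Client interface {
	// Get reports whether the resource exists and already matches the spec.
	Get(ctx context.Context, name string) (found, upToDate bool, err error)
	Put(ctx context.Context, name string) error
}

// EnsureResource spends a cheap read to avoid a scarce write: if the GET
// shows the resource exists with the desired settings, the PUT is skipped.
func EnsureResource(ctx context.Context, c Client, name string) error {
	found, upToDate, err := c.Get(ctx, name)
	if err != nil {
		return err
	}
	if found && upToDate {
		return nil // nothing to do, no write quota consumed
	}
	return c.Put(ctx, name)
}
```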
A
Oh, and this is a full reconcile of an entire AzureCluster reconcile loop. It's not very zoomed in, but note that every time there's a context deadline exceeded in one of the services, we short-circuit out of the loop, and this is once per service as well.
A
Usually we reconcile multiple public IPs per loop: you have a public IP for one load balancer, and then you have another one for another load balancer. We'll try to do all of them before we short-circuit out, because the assumption is (and I explained this in the doc somewhere) that a public IP will never depend on another public IP, so it's safe to do those in parallel, or concurrently.
A
So even if one of them is not done, we can start doing the other one. We'll kick off all the public IP creates, or all the public IP deletes, and then, if any of them is not done, that means we can't proceed to the NAT gateways, so we short-circuit and update the status. And then maybe the last thing I want to mention is the proposal on adding a bunch of new conditions.
A
Yeah, so these are all the conditions that I'm proposing we add. It basically gives a way more granular status update on which resources exactly have been created already and which ones haven't.
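A hedged sketch of the shape of such conditions, in the style of Cluster API condition types; the names below are illustrative, not the proposal's final list:

```go
package conditions

// ConditionType stands in for Cluster API's clusterv1.ConditionType.
type ConditionType string

// One condition per Azure service, so the object's status shows exactly
// which resources have been created and which are still pending.
const (
	VNetReadyCondition          ConditionType = "VNetReady"
	SubnetsReadyCondition       ConditionType = "SubnetsReady"
	PublicIPsReadyCondition     ConditionType = "PublicIPsReady"
	NATGatewaysReadyCondition   ConditionType = "NATGatewaysReady"
	LoadBalancersReadyCondition ConditionType = "LoadBalancersReady"
)
```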
A
So yeah, I just wanted to show that and give a little chance for questions, like live questions, if anyone has read through it and has questions or comments. But if you haven't read through it, I would love your feedback.
C
The ordering of resources: this is something that we talked about, and I just wanted to bring it up because there were two async reconciliation proposals, and there are two types of asynchrony, or concurrency, that we are looking at. This one is still serial in the way that we are approaching resources, which is not entirely the order in which they could be done.
C
So there is still another opportunity for optimization after this, where we could build a DAG of resources and, you know, do a few in parallel across multiple services, and then another set, and then another set. This is really applicable, I think, mostly or only for cluster, but that was basically the heart of the other concurrency proposal.
C
So I just wanted to bring that up. And great work; the proposal looks really great, and I hope folks will read through it, because it's actually a really cool look at how to, you know, use Azure and use it well.
A
Thanks, yeah. I actually documented this as an alternative, the parallel reconciliation, and noted that it could be done in the future; it's not mutually exclusive. Also note that in the non-goals I wrote: increase or decrease the overall duration of a reconciliation. That's very important. It means this proposal is not trying to reduce the time it takes; it's just trying to do it in a way that gives us a faster reaction time.
A
We improve the UX, and also there's the case of, in the end game, having many, many machines. I don't think you could have many clusters... well, you could, technically, but yeah. You could create those concurrently and kick off the creates before they all finish, and that's where you would gain time, like if you're trying to create 200 VMs. On one VM you're not going to gain any.
B
I just want to say quickly: if anybody has any PRs that they can close, or finish off before we start renaming, that's probably helpful, because everything will get re-triggered on all existing open PRs. So if you can, finish it off, or close it if it's not active; that would be helpful.
A
Yeah, that's a great point. And also, if you have a PR that's ready and just waiting for people to review it: I know we've gotten a bit behind on some of the PRs lately, because we have fewer people, and people go on vacation and everything, but please ping us and we'll try to take a look and unblock you.
C
Sorry, I was trying to fill in the doc at the same time. Matt and I, and also Dan and Jordan, have been looking at machine pool machines, and the proposal is open in CAPI. This will start to move... the idea there is to move machine pool machines, kind of like what we did with Azure machine pool machines, up into CAPI, so as to be able to provide machine representations for machine pools.
C
Indeed. So there's a lot of common functionality for machines: health checking...
C
So it's really exposure at the generic level and, you know, reuse of functionality that already exists.
G
So this is the question, very specific to cluster autoscaler: I was thinking that cluster autoscaler, like, scales the resources based on the replica count?
C
That's a really fantastic question. Cluster autoscaler has a lot of really cool logic to be able to look at which nodes are least used and what scheduling would be impacted by deleting individual nodes. For us, when we lower the replica count, what happens is we just go and tell the virtual machine scale set: hey, we're lowering the replica count. Actually, that's not true; let me rewind. If we were to tell VMSS: hey, we want fewer replicas...
C
VMSS has no idea what is scheduled on those machines, and it would just shoot the machine, and all of a sudden that workload has been preempted without any kind of safe, you know, cordon and drain. So we actually delete individual machines, but first we go through cordon and drain, where we, you know, drain off the workload, move it out to other machines, and then we delete it.
C
So, you know, you don't want to be running a web app and all of a sudden your user gets disconnected, and hopefully the load balancer helps you out at some point; you want to do that proactively, you want to drain the workload correctly. So we can't actually rely on VMSS just to decrement the replica count. From the machine pool level, we can decrement the replica count and then handle it appropriately in the CAPZ layer.
A
So I actually thought the same thing: I thought that we only changed the replica count. And I actually started on a prototype a few months ago to get machine pools working with cluster autoscaler, and I got it almost there; it's pretty much all there, except for this one function.
A
That function is basically not possible for machine pools. What it does is delete a specific node, and so you need to be able to... the way we do it for machines is by annotating the Cluster API machine to say that it should be deleted, and for machine pools we can't do that. And I think there's another reason for that: cluster autoscaler doesn't only deal with scaling the replica count up and down. Historically, it also has functionality to remediate unhealthy nodes, which arguably is a little outside of its scope, but that's what it does. And so the provider implementations of cluster autoscaler have to implement that delete node function, which will delete an unhealthy node. So that's also why we need that.
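For reference, the machine-level mechanism being described looks roughly like this, assuming Cluster API's delete-machine annotation; the machine name is a placeholder:

```sh
# Mark a specific CAPI Machine so the controller prioritizes deleting it
# when the owning MachineSet/MachineDeployment scales down.
kubectl annotate machine my-machine-abc123 cluster.x-k8s.io/delete-machine="true"
```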
A
Cool. I actually have a question for you on this proposal, David: what happens if a provider wants to use the provider capacity directly at some point? Let's say I'm a provider who's implementing my infra machine pool, and my infrastructure allows, I don't know, autoscaling or something with draining. Let's say one of the providers integrates Kubernetes functionality into their implementation of machine pool, and they want to be able to autoscale without using the CAPI draining code.
C
So that's a really awesome question, and I would love that. Can you add that note to the proposal? We need to think through that a little bit better right now.
A
Yeah, I think that's something the CAPA provider was doing, especially because they didn't have autoscaler working at the time: kind of ignoring the replica count and just letting that be handled by... and then using the cluster autoscaler provider for AWS, not the CAPI one.
C
Yeah, I think it's a bad practice to ignore the... oh, I'm sorry, yeah, Nicola.
D
Go ahead, sorry. That's exactly what we have to do in our implementation of the Azure operator. For now we're using only the CRDs from CAPI and CAPZ, and since we are running cluster autoscaler in regular Azure mode, that is, the Azure provider for cluster autoscaler, in quite a lot of places we had to make sure that we are ignoring the machine pool replica count, so cluster autoscaler can do its job regularly. And just to add...
D
As for the situation that Cecile described, about the cloud provider, for example, or VMSS having some implementation where they take care of draining the workload:
D
It's not quite that, but there can be a similar situation, where people are using some of those apps that watch, for instance, the metadata service. We use that, for example, in our Azure implementation: we watch for machine termination events and we drain the nodes automatically. So it's not something that Azure provides, but Azure supports the mechanism, so you can actually implement that on your own, and I guess that would probably be in conflict if cluster autoscaler tried to do the same thing. And I think that AWS, for example, even has... I don't know if it's official, but in the AWS GitHub organization there is an app to do that, and it's quite well maintained, I think.
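For context, a node-local watcher like the one described polls the Azure Instance Metadata Service scheduled events endpoint. A minimal sketch (the drain step is left as a comment, and this only works from inside an Azure VM):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// scheduledEvents mirrors a subset of the IMDS scheduled events payload.
type scheduledEvents struct {
	Events []struct {
		EventType string   `json:"EventType"` // e.g. "Preempt", "Terminate"
		Resources []string `json:"Resources"`
		NotBefore string   `json:"NotBefore"`
	} `json:"Events"`
}

func main() {
	req, err := http.NewRequest(http.MethodGet,
		"http://169.254.169.254/metadata/scheduledevents?api-version=2020-07-01", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Metadata", "true") // IMDS requires this header

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var events scheduledEvents
	if err := json.NewDecoder(resp.Body).Decode(&events); err != nil {
		panic(err)
	}
	for _, e := range events.Events {
		// A real handler would cordon and drain the node here, then POST an
		// acknowledgement so Azure proceeds without waiting out the timeout.
		fmt.Printf("%s scheduled for %v, not before %s\n", e.EventType, e.Resources, e.NotBefore)
	}
}
```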
C
Yeah, there's a call-out in this proposal about spot instance utilization. We haven't filled it out completely yet, but that's one of the scenarios, right?
C
You want to catch those events, and you want to have the node notify Kubernetes: hey, I'm going away, cordoning and draining myself, then take me out. That is definitely something that needs to come into play there. And the way that would feed upward is that we're looking at having infrastructure references exposed from the infrastructure provider machine pool implementation, so that as one of the machines goes away, its infrastructure reference also gets cleaned up in CAPI.
C
I think it needs to be paired with what you're saying, Nicola: with something on the machine that's going to catch those instance metadata notifications. That's something we need to describe in detail here.
D
Yeah, for Azure it's a little bit tricky for spot VMs, because you get only 30 seconds to do whatever you would like to do.
A
Yeah, so I'm curious if you've looked at this proposal. It's actually been under review for a while, needing more reviews, but that was exactly the discussion we were having on one of the comments. So basically, this is proposing that we use the termination handler for, like, interrupting spot instance workloads, and the big difference with Azure is that we only get 30 seconds, so I'm not sure that the proposal as is would work well.
C
Yeah. For autoscaling, Nicola, if you're interested, I would love to see an issue or a proposal around how to enhance machine pool to better handle autoscaler functionality, because, like you're saying, the replica count goes right out the window.
C
Really, it's more like, you know, min and max, and maybe some set of configuration. In some way, we have to be able to mark the replica count as: hey, this doesn't even matter anymore, this is useless information. And it would be great to get that solidified before we move it out of experimental.
D
Okay, cool, I'll try to write up what I have. I can tell you what we did: as a sort of quick fix, we added annotations for cluster autoscaler, and the logic is simple. If the annotations are set, we use those and we ignore the replica count; otherwise, we use the replica count. It would definitely be nice to have some more robust solution in CAPI. I'm not sure what that could be, but yeah, I will give it a thought.
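For comparison, the cluster-autoscaler Cluster API provider expresses the same idea with min/max node-group annotations; a hedged sketch, where the resource version and values are placeholders:

```yaml
apiVersion: exp.cluster.x-k8s.io/v1alpha3
kind: MachinePool
metadata:
  name: my-machine-pool
  annotations:
    # When these are set, the autoscaler manages the size within the bounds,
    # and the declared replica count effectively stops mattering.
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"
spec:
  replicas: 1
```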
A
All right, I think we're at time. Thanks, everyone, and I'll talk to you... one sec.