From YouTube: SIG Cluster Lifecycle - Cluster API Deep Dive into Cluster Autoscaler Node Group Balancing 20220912
Description
In this video we take a deep dive into how the Kubernetes Cluster Autoscaler's balance-similar-node-groups feature works with the Cluster API cloud provider implementation.
A: So, to begin with, I'm going to share my screen and we'll take a look at the FAQ for the autoscaler. This is generally where I direct people first when they have questions about the autoscaler. If anybody has questions during the meeting today, feel free to raise your hand or interrupt me or whatever; I'm not going to stand on too much ceremony.
A: Anyway, one of the questions in the FAQ is about running nodes in multiple zones for HA purposes, and it talks about a flag called `--balance-similar-node-groups` that was introduced in the autoscaler quite a while ago. This feature is something we get asked about with pretty regular frequency at Red Hat; we have many users who seem to like using it.
A: I wouldn't say that everybody uses it, but it's probably one of those things where 40 to 50 percent of users ask about it. What this feature allows you to do is tell the cluster autoscaler that, when it is attempting to scale up (it only does this during scale-ups), it should try to create nodes across node groups that are similar to each other.
A: When the cluster autoscaler is attempting to grow the cluster, one of the first things it does is run a sort of bin-packing algorithm to figure out what the best node type for expansion would be. It does that based on the pods that are pending, and then it looks at the capacities of the different nodes that are available. It also looks at things like node selector labels, in case the pending pods have special requirements, and then the autoscaler makes a choice based on that.
A: If you have this flag enabled, the autoscaler will attempt to find similar node groups when it wants to expand by more than one node. So, what are node groups, and what does similarity between them mean? Internal to the cluster autoscaler is a concept it calls a node group, and this is not exposed in a CRD.
A: All the nodes within a node group will be the same when you create them, meaning they'll all have the same topology and are all expected to have the same labels and taints on them. So when the autoscaler looks at all the node groups it has, it can make decisions about them, and every provider has a different notion of how to define node groups.
A: So necessarily our node groups have to be more abstract, and this is where I find working with Cluster API and the autoscaler to be kind of easy, because in general Cluster API MachineDeployments or MachineSets, if you're using them correctly, map one-to-one with node groups in the cluster autoscaler. If I have two MachineDeployments and both of them are doing autoscaling, then I can assume that each one of those is a node group.
A
So
there's
a
couple
primary
tests
that
the
autoscaler
looks
at
when
it
tries
to
determine
what
are
similar
node
groups.
The
first
is
are
the
capacities,
so
this
is
like
the
cpu,
the
memory
capacity.
In
the
case
of
special
resources,
there
might
be
a
gpus
or
you
know,
networking
special
networking
cards
that
have
been
added
and
it
will
use
those
to
try
and
compare
the
node
group
yeah
fabrizio.
You
had
a
question:
go
ahead.
C: The story is, we are deploying the autoscaler, which in the end is a new controller with its own stuff, and the flag you were talking about is a flag that goes in the autoscaler deployment.
C: Okay, maybe I'm asking a really silly question now. You're talking about what a node group is in Cluster API; the missing bit for me is how I would declare that a MachineDeployment is an autoscaler node group.
A: Yeah, that's a good question, and you're right at some level. I'm going to go back to sharing because I'll show you where this is in the code. The autoscaler is kind of like a controller, and in the case of Cluster API it's just reconciling records.
A: So I'm in the autoscaler repo, looking at cluster-autoscaler/cloudprovider/clusterapi, which is where all of our documentation is. When you add the scaling annotations to your MachineDeployments or MachineSets, that is how the cluster autoscaler identifies those node groups. It is reconciling all MachineDeployments and all MachineSets, and when it sees these annotations present, it knows to include that node group in autoscaling.
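As a reference for the annotations being described, opting a MachineDeployment into autoscaling looks roughly like this (annotation keys as documented in the clusterapi provider README; the resource name and size bounds are illustrative):

```yaml
apiVersion: cluster.x-k8s.io/v1beta1
kind: MachineDeployment
metadata:
  name: workers-zone-a   # illustrative name
  annotations:
    # the presence of these two annotations marks this object as a node
    # group that the cluster autoscaler may scale between the given bounds
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-min-size: "1"
    cluster.x-k8s.io/cluster-api-autoscaler-node-group-max-size: "10"
```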
D: One question: the capacity information that you just mentioned, which it needs in order to get the attributes for a node group, does that also come from the capacity annotations, like the ones directly below?
A: They can also come from the status capacity block of the infrastructure reference. If the autoscaler sees an infrastructure reference, it will attempt to get that record and then look to see if there's a capacity block. If there is a capacity block, it will use that; if there are annotations, it will use the annotations as an override.
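A rough sketch of the capacity override annotations being described (keys per the clusterapi provider's scale-from-zero documentation; the values here are illustrative and, as discussed, take precedence over the infrastructure reference's status capacity block):

```yaml
metadata:
  annotations:
    # describe node capacity when no status.capacity block is available,
    # or override it when one is
    capacity.cluster-autoscaler.kubernetes.io/cpu: "4"
    capacity.cluster-autoscaler.kubernetes.io/memory: "16G"
```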
B: Yeah, one query: if I have a cluster deployment with two node groups in two different zones, would I be able to identify that a pod is pending in a particular zone so that the cluster autoscaler only scales out that zone? The MachineDeployment definition could be the same across the two zones, so how is that identification carried forward to the autoscaler and then applied to the right node group?
A: Yeah, that is a great question, Abigail, and I am just getting to labels, because labels are the other part of how the cluster autoscaler attempts to match workloads, or attempts to grow the cluster. On the pending pods there may be labels that are needed, and those labels can be satisfied by a node.
A
The
cluster,
auto
scaler
will
attempt
to
make
those,
but
in
most
cases
the
zone
labels
are
not
necessarily
required
by
the
pods
right
like
it's
it's
it's
usually
rare
that
a
user
would
create
a
pot
and
say
I
need
this
pod
to
be
running
in
this
zone.
It's
certainly
possible
that
they
could
do
that,
but
they
don't
necessarily
need
to.
So
there
are
what
are
called
the
you
know:
the
well-known
zone
label
in
kubernetes
and
when
the
cluster
auto
scaler
starts
getting
into
looking
at
balancing
node
groups.
A
There
is
a
piece
of
code
that
is
called
the
node
group
set
processor,
and
this
is
I'm
getting
deep
into
how
the
auto
scaler
looks
at
these
things
now,
but
there's
a
group
of
what
are
called
processors
and
these
processors
customize.
The
behavior
by
which
the
cluster
auto
scaler
can
look
at
things
like
node
groups
and
node
info,
and
these
this
node
group
set
processors
are
what
the
auto
scaler
uses
to
do.
A
Some
of
these
deep
comparisons
right
so
in
the
case
of
zone
labels
right
oftentimes,
when
you're
attempting
to
balance
the
nodes
between
them,
you
don't
actually
care
about
the
zone
labels
right,
but
the
auto
scaler
will
consider
two
nodes
different
if
their
labels
don't
match
right,
and
so,
if,
if
you
have
one
node,
that's
deployed
in
zone
a
and
another
node,
that's
deployed
in
zone
b,
the
auto
scaler
considers
those
to
be
disparate
node
group.
A: This is a function used by the autoscaler when it attempts to do balancing operations; it uses it to figure out which labels it should ignore. This gets at the crux of the discussion we were having a few weeks ago at the Cluster API meeting, because one of the things I've been doing is going in here and adding more zone labels, or more labels that it should be aware of. For example, this one here, topology.ebs.csi.aws.com.
A: But we need those nodes to be considered the same; we need the autoscaler to ignore this label for the purposes of deciding whether they're the same. The scheduler will handle the persistent volume requirements, but these labels would cause the nodes to look different. Similarly, there's a group of what are called the basic ignored labels, and you see these used over and over again.
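The comparison being described can be sketched as follows. This is a simplified illustration of the idea, not the autoscaler's actual NodeGroupSet processor code, and the ignored-label set shown is a hypothetical subset:

```go
package main

import "fmt"

// ignoredLabels is a hypothetical subset of the labels stripped before
// comparing node groups; the real list lives in the cluster-autoscaler
// code base (plus any labels added via --balancing-ignore-label).
var ignoredLabels = map[string]bool{
	"kubernetes.io/hostname":        true,
	"topology.kubernetes.io/zone":   true,
	"topology.ebs.csi.aws.com/zone": true,
}

// labelsMatch reports whether two nodes' label sets are equal once the
// ignored labels are removed, which is the essence of how node groups
// are judged "similar" for balancing purposes.
func labelsMatch(a, b map[string]string) bool {
	filter := func(in map[string]string) map[string]string {
		out := map[string]string{}
		for k, v := range in {
			if !ignoredLabels[k] {
				out[k] = v
			}
		}
		return out
	}
	fa, fb := filter(a), filter(b)
	if len(fa) != len(fb) {
		return false
	}
	for k, v := range fa {
		if fb[k] != v {
			return false
		}
	}
	return true
}

func main() {
	zoneA := map[string]string{"node-role": "worker", "topology.kubernetes.io/zone": "us-east-1a"}
	zoneB := map[string]string{"node-role": "worker", "topology.kubernetes.io/zone": "us-east-1b"}
	// the zone labels differ, but they are ignored, so the groups match
	fmt.Println(labelsMatch(zoneA, zoneB)) // prints "true"
}
```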
D: That means if someone adds some labels to a node in some way, you can't know about them, because there could be a totally different reason why that label is there.
A: Right, you're jumping ahead a little bit, and I appreciate it because it lets me talk about this next. I'll do a little diversion here and then we'll go back. You're absolutely right, and there is another flag here called `--balancing-ignore-label`.
A
So
if
you
have,
if
you
have
a
situation
where
you
have
users
who
are
customizing
the
labels
on
their
node
groups,
maybe
they
have
some
special
information
that
they
want
to
keep
or
you
know,
there's
some
reason
to
demarcate
two
node
two
node
groups
is
different
from
each
other,
but
you
want
to
make
them
the
same
in
terms
of
balancing.
You
can
use
this
flag
multiple
times
when
starting
the
cluster
auto
scaler,
to
give
it
different
labels
that
it
should
specifically
ignore.
A
So,
even
if
the
labels
aren't
being
automatically
ignored
by
the
by
the
node
group
set
processors,
you
could
still
inject
labels
that
you
would
like
them
to
ignore,
and
similarly
you
can
also
define
a
balancing
label
that
you
should
use.
This
is
a
feature
that
was
added
more
recently,
so
you
can
say
I
want
any
node
groups
that
have
this
label
to
be
considered
the
same,
regardless
of
what
else,
what
other
labels
are
there?
A
And
so
this
was
a
feature
that
was
added
recently
that
when
you're,
using
balancing
similar
node
group
sets,
you
could
set
this
balancing
label
and
then
the
the
cluster
auto
scaler
will
ignore
every
other
label
and
only
look
for
those
labels
to
make
similarities
so
that
that's
kind
of
a
little
bit
of
a
side
look
here
into
how
you
could
you
could
supplement
this
on
your
own.
If
you
need
to.
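Putting the flags from this discussion together, an autoscaler deployment's container args might look something like this (flag names as discussed above; the label keys are hypothetical examples):

```yaml
# container args for a cluster-autoscaler deployment (illustrative)
command:
  - /cluster-autoscaler
  - --cloud-provider=clusterapi
  - --balance-similar-node-groups=true
  # repeatable: additional labels to ignore when comparing node groups
  - --balancing-ignore-label=example.com/team
  - --balancing-ignore-label=example.com/cost-center
  # or instead: compare node groups using ONLY this label
  # - --balancing-label=example.com/balance-group
```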
D: One follow-up question based on that: one MachineDeployment is always one node group in the autoscaler?
A: Yeah, that could happen. You could do that, and likewise you might have different reasons to use these balancing labels. Imagine you had four MachineDeployments in your cluster, and two of the MachineDeployments had label A and two had label B; you might set this balancing label twice to designate that you have two different topological groups, and you want work balanced within whichever group the work is going to.
C: If I got it right: so, let me say, in the standard behavior the autoscaler balances between node groups, but if I enable this balance-similar-node-groups flag I can then play with these other flags.
A: Yeah, you absolutely could. Think about topologies where people have one group of nodes for machine-learning-related stuff, maybe with GPUs on it, and then another group for their database stuff that has high-speed storage or something. If they wanted to make both of those groups highly available, they could make two MachineDeployments for each group, set the MachineDeployments in different zones, and then have workloads flow to those MachineDeployments.
B: Just one question: you just showed a list of the labels which are ignored, not taken into account. Do we still have to maintain that list, given we already have the other way of passing this through the API? Is that list still required to be maintained?
A
So
that
is
a
good
question
abhijit
and
it
kind
of
brings
us
to
the
where,
where,
where
I'd
like
to
end
up
here
so
yeah
like
let's,
if
I
go
back
to
the
code,
we're
looking
at
here,
do
we
need
to
maintain
this
list?
That's
a
great
question,
because
I
you
know
one
of
the
things
I
have
up
right
now
is
I
have
a
pr
open
to
make
to
try
to
make
this
better.
A
So
you
know
one
of
the
one
of
the
things
that
I
would
like
us
to
do
is.
I
would
like
us
to
maintain
some
of
these
labels,
especially
the
labels
that
other
providers
have
called
out
as
being
important
because
it
it
makes
the
experience
better
for
our
users
automatically
out
of
the
box.
So
if
they,
when
I'm
trying
to
here
so
here's
the
list
of
labels
that
I'm
trying
to
update
currently
into
our
into
our
you
know,
hey
jack.
A
These
are
the
labels
that
I
think
we
should
incorporate
and
I'm
trying
to
bring
them
in
from
all
the
different
clouds
that
we
cover.
These
are
the
automated
labels
that
they
use,
and
I'm
kind
of
describing
many
of
them
are
used
by
csi
drivers
or
they're
used
by
cloud
controller
managers
and
they're
just
used
to
tell
like
this.
Node
is
in
this
zone,
or
this
node
is
in
that
zone.
A
So,
in
general,
the
reaction
from
the
auto
scaler
community
has
been
to
ignore
these
labels
for
the
well-known
like
zone
labels
and
whatnot,
and
so
I've
been
bringing
them
in
to
try
and
create
a
list,
and
you
know
make
sure
that
when
users
use
the
cluster
api
provider
for
cluster
auto
scaler,
they
kind
of
have
this
awesome
experience
out
of
the
box,
but
abigail.
I
think
your
question
is
absolutely
pointing
it
like.
A
We
could
decide
not
to
maintain
these
labels
for
cluster
api
and
then
it
would
just
be
up
to
the
user
to
set
the
balancing
ignore
labels
whenever
they
were
trying
to
use
this.
So
you
know
really-
and
we've
only
got
about
seven
more
minutes
left
in
the
scheduled
time
slot.
This
is
like
a
good
place
to
be
in
terms
of
the
conversation
like
you
know,
should
we
be
maintaining
these
labels?
Is
the
is
this
something
that
we
want
to
do
as
a
group?
A
You
know
to
try
and
make
the
experience
more
seamless
for
cluster
api
users,
or
should
we
just
document
here's
how
you
ignore
these
labels?
Here's
how
you
configure
it,
because
certainly
it
will
be
much
easier
for
us
to
maintain
if
we
don't
have
to
look
at
these
labels
in
the
node
group
processor,
but
that's
at
the
cost
of
making
this
more
complicated
for
our
users.
D: I wonder if an allowlist approach plus documentation would be easier than the other way around. What I don't know is: which labels do we actually care about?
D: Even if it wasn't, it's kind of the same problem just the other way around, but maybe there are not that many labels that we have to care about. Maybe.
A: Yeah, that's kind of up to the users. Users will maybe label some MachineDeployment, "this is my super special deployment," so that when they create workloads, they all have to go to the super special deployment. In general, it seems like this is what the autoscaler community has tried to do, and it was easier for other clouds because, for example on AWS, they only have a few labels they want to maintain.
E: My vote: I don't think the downside of maintaining a set is huge; I think it's just really predictable that it will almost always be incomplete. I'm leaning toward the argument that that's sort of misleading to the user. By maintaining any set at all, you suggest that we've got this under control, but I think there's very little confidence that we will have this under control for any particular user.
E: I would have a slight preference towards asking the user to do that work, as a way of actually producing a confident contract. Otherwise it's just, yeah, that's just my view.
A
This
does
not
necessarily
need
to
be
in
the
code
for
the
auto
scaler,
either
right
like
if
we
develop
helm,
charts
or
you
know,
cluster
class
add-ons
or
whatever
that
deploy
the
auto
scaler
for
you
like
you,
we
could
certainly
encode
this
knowledge
in
those
places
as
well
like
when,
when
starting
the
auto
scaler
this
you
know
the
reason
this
came
up
for
us
is
that
at
red
hat
we
do
a
lot
of
automated
testing
of
the
cluster
auto
scaler,
and
we
do
have
a
series
of
tests
that
we
do
with
the
balance
or
node
group
stuff
and
so
like,
as
we're
now
deploying
kubernetes
with
ccms
we're
starting
to
see
another
explosion
of
these
like
labels
that
the
the
ccm's
are
using
to
designate
some
sort
of
zonal
differences
right,
they
could
be
ignored
and
that's
the
symptom.
C: May I ask a question: so basically the problem arises only if the user opts in to this cross-node-group balancing, okay. My comment is that I see a problem, or at least a risk, in the current autoscaler behavior of considering all the labels, because if I have a cluster perfectly configured and someone goes and applies a label on my node, that basically screws up my balancing. That is the biggest risk.
C
So
starting
from
this
is
the
bigger
risk,
because
I'm
let
me
say
what
I
see
and
I'm
the
administrator
I
want
to
set
up
on
the
scatter
and
be
confident
that
my
setup
works
from
that
moment
on
so
the
behavior
that
the
autoscarer
considered
all
I
see
it
risky.
I
think
that
the
recommendation
that
we
have
to
do
to
give
to
our
customer
pi
user
is
that
if
you
want
to
obtain
in
this
feature,
you
have
to
pick
up
also
the
label
that
drives
this
behavior
and
set
it
explicitly.
A: I think so. Let me just repeat it to make sure I'm following you. You're saying the preference would be to make the user explicitly aware of the labels they need to know about, so that they can be absolutely assured of what the autoscaler is doing and that there's no magic going on, and that we should fully document this and show people: if you're working on this cloud, use these; if you're working on that cloud, use those.
D: One question about the list that you currently have on that PR: how much controversial opinion is already in that list? Something like not caring about zones, I guess, is something that different people might see differently, whether a different zone is already a different node group. I don't really know.
D: Would it work for everyone, or is it more like it's not very likely that it will work for a lot of people, because they just have different opinions and everyone does it differently, etc.? That would mean it's hard to add something hardcoded that is the best way to do it for everyone.
A
Well
right
so
like
where
this
started
is
that
in
kubernetes
you
know,
there's
like
a
set
of
well-known
labels
and
annotations,
and
one
of
them
is
the
well-known
zone
label,
and
so
the
well-known
zone
label
is
one
of
the
main
things
that
the
auto
scaler
ignores
right,
and
so
it
ignores
that
by
default
right,
unless
you
turn
on
the
the
flag
that
allows
it
to
only
look
at
a
certain
label
or
specific
labels.
A
So,
right
now
there
are
some
of
these
that
the
auto
scaler
is
already
ignoring
ones
that
the
community
has
agreed
on.
Like
you
know,
topology.kuber
zone,
that's
like
that.
That's
well
known!
The
community
has
agreed
to
ignore
that
some
of
the
other
ones
like,
for
example,
the
csi
drivers
and
I'll
just
note
we're
over
time
here.
So
if
anybody
has
to
drop,
you
know,
please
don't
feel
bashful.
A
Some
of
the
other
things
like
the
csi
drivers
right
when
csi
was
first
being
defined
as
a
spec.
The
various
driver
implement
implement
implementers
had
not
yet
agreed
on
the
well-known
zone
labels
right.
A: We would always have to be maintaining this list, so I think the wisdom there, from what I'm hearing, is that it's probably better to teach our users how this works, how they can enable it, and when they should enable it, and then we don't have to worry so much about having the autoscaler do this stuff for us automatically.
A: Okay, since we're over time, does anybody else have any comments before we wrap things up here? Oh, I see, well, Stefan and Fabio, you had your hands up. Did you still have something you wanted to say? Fabrizio? Okay, cool. Thanks, everybody, for your time.