Description
GCE and EC2 provide a great platform to run your own isolated Kubernetes cluster in the cloud. With the Kubernetes cluster autoscaler, on-demand scaling of your GCE and EC2 instances can even be driven from within your Kubernetes cluster. This session will introduce the cluster autoscaler concept and discuss how it is implemented for GCE and EC2. Finally, we will have a look at the cluster autoscaler backend for KubeVirt, a drop-in virtualization add-on for Kubernetes which brings virtual machines to Kubernetes and lets you run isolated workloads on your bare metal Kubernetes installation.
So, in order to understand why we need the cluster autoscaler, I want to go through four main questions with you. First, how do public clouds in general help you with scaling your application at all? Second, how does Kubernetes fit into that picture, and how does it help you get more out of the resources which you are buying on the public cloud? Then we will see why we actually need the cluster autoscaler, and we will ask how the autoscaler helps us with combining the public cloud and Kubernetes. Finally, I want to tell you a little bit about KubeVirt and the cluster autoscaler, and how they can help you deploy isolated workloads on your bare metal faster.
When we look at the first question, how the public cloud helps with scaling in general, the overall answer is pretty easy: it gives you virtual machines. It gives you a lot of them, basically as much as you need; you just have to pay. That's how it starts. To make use of that, you have to make your application scalable. With a simple application you typically do that by taking the resource-intensive parts of the application and making them scalable, which means that they can run in parallel. You put a load balancer in front of these parts, and when your nodes get under pressure, you scale up; when the pressure is gone, because no one is visiting your web page or whatever, you scale down again.
In theory, if your cloud provider gives you a public API, you can just script everything: monitor your virtual machines, create new virtual machines, stop old ones. But typically public cloud providers help you there a little bit. For instance, Amazon Web Services EC2 has the auto scaling group concept, which means that you can define a virtual machine template and assign scaling policies to it. A typical scaling policy would track a specific CPU metric: if one trigger is activated, it scales up, and if another trigger is activated, it scales down.
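As a rough sketch of that concept (not shown in the talk), such a group and policy could be declared in CloudFormation along these lines; all names, sizes, and the AMI are made-up placeholders:

```yaml
# Hypothetical CloudFormation sketch: a VM template, an auto scaling
# group, and a CPU-based target tracking policy. All values are made up.
Resources:
  WebLaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        ImageId: ami-12345678          # placeholder AMI
        InstanceType: t3.medium
  WebAsg:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      MinSize: "1"
      MaxSize: "10"
      AvailabilityZones: !GetAZs ""
      LaunchTemplate:
        LaunchTemplateId: !Ref WebLaunchTemplate
        Version: !GetAtt WebLaunchTemplate.LatestVersionNumber
  WebCpuPolicy:
    Type: AWS::AutoScaling::ScalingPolicy
    Properties:
      AutoScalingGroupName: !Ref WebAsg
      PolicyType: TargetTrackingScaling
      TargetTrackingConfiguration:
        PredefinedMetricSpecification:
          PredefinedMetricType: ASGAverageCPUUtilization
        TargetValue: 65.0              # add VMs above, remove VMs below this average
```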
You already see here that there are a lot of resources which you are paying for but which you are not using. When we look again at the example with the three running nodes, all three of them have just 65% CPU usage. If one hundred percent is the amount a full node has, that means you are already paying for one hundred and five percent of a node's worth of resources which you are not using: three times the unused 35 percent. So it's tempting to try to pack the workload tighter onto the VMs, but that's actually not very easy to do.
You would need an extra scheduler to help you with that, and if you also want to share your machines with other people, so that they can make use of the leftover resources, you would need some kind of multi-tenancy on top, which is normally not there in the cloud.
So how do you make use of the public cloud resources you paid for? The solution here is to just deploy Kubernetes inside your nodes on the public cloud and then schedule everything with pods and containers. That's what you see here in the picture: the outside is the Amazon Web Services or the Google Compute Engine node, on it you run your Kubernetes nodes, and in there your pods are finally running. So, a fine solution: you take your simple web application.
Let's just consider a very simple example. We still have the three-node workload, the three backends running with a CPU usage of 65 percent each, and then we schedule a new pod which needs fifty percent of one node. We don't have that much left anywhere, so the Kubernetes scheduler tells you: oh, I can't schedule this. But the Google Compute Engine auto scaling group says: hey, I'm fine, there is no pressure on the nodes, so I will not scale up.
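To make that concrete, the pending pod from the example boils down to a resource request that no node can satisfy; a minimal sketch, with assumed values:

```yaml
# Minimal sketch of the example pod: on two-core nodes whose CPU is
# already mostly claimed, a request for one full core (half a node)
# cannot be placed anywhere and the pod stays Pending.
apiVersion: v1
kind: Pod
metadata:
  name: big-backend            # hypothetical name
spec:
  containers:
  - name: backend
    image: nginx               # placeholder image
    resources:
      requests:
        cpu: "1"               # 50% of a 2-CPU node
        memory: 1Gi
```

Note that the scheduler decides based on resource requests, not on live CPU usage, so it is the requested resources that have to fit on a node.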
And that is exactly where the cluster autoscaler finally helps you. It's just a small application which you can deploy inside your Kubernetes cluster. It's in principle agnostic to the cloud provider, so you can use it for Google Compute Engine, EC2 and others, and it makes use of the auto scaling primitives which the public clouds offer you, for instance the auto scaling group or the managed instance group you have already set up.
When pods can't be scheduled, it scales such a group up and then gives the VM some time to register itself as a Kubernetes node. Once the resource pressure isn't given anymore, it will just scale the nodes down again, and that's pretty much it already. One side note here: this is really just about this one use case. It's not about pod autoscaling; there is an extra pod autoscaler for that. And it's also not about rebalancing the scheduling, for instance taking pods from one node and putting them on another node; there are other projects which are doing this, for instance the descheduler.
Here you see a typical YAML file of a node, which contains a few fields that influence the autoscaler's decision. In the metadata you have the label section, in this case with just one label, kubernetes.io/hostname, and you see the capacity of the node: CPU, memory, lots of stuff like that.
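Stripped down to the fields just mentioned, such a node object looks roughly like this:

```yaml
# Abridged Node object; only the label and capacity fields which feed
# into the autoscaler's decision are shown.
apiVersion: v1
kind: Node
metadata:
  name: node-1
  labels:
    kubernetes.io/hostname: node-1
status:
  capacity:
    cpu: "2"
    memory: 4Gi
    pods: "110"
```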
And that's it: your application works again, it can scale up and down again. But if you want to use this with, say, OpenStack, you are right now out of luck; it just supports public clouds at the moment. So I created a backend for KubeVirt, and that offers very interesting use cases. In practice it works like this: you have a bare metal Kubernetes or OpenShift cluster with KubeVirt installed, in there you create a nested Kubernetes cluster, and you tell the cluster autoscaler to monitor that nested Kubernetes cluster.
In order to make that possible with KubeVirt, we had to think about implementing something like the auto scaling group or the managed instance group ourselves, and we decided to create a virtual machine replica set. The name is not chosen by accident: it really is pretty much the same as a replica set for pods, just that it creates virtual machines instead of pods.
We need a little bit of metadata there, so that the autoscaler has all the information it needs to scale up: for instance, a label which allows selecting the replica set as a node group, so that the autoscaler knows it will create nodes which will register with the cluster, and a few annotations which tell it how many pods will be able to run on such a node and how much storage will be there. Below that, in the spec, you see replicas: 3 in this case, which means: create three VMs, or rather, make sure that three VMs are running. That is pretty much the field the cluster autoscaler will manipulate at runtime to scale virtual machines up and down.
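Pieced together from that description, the head of such a manifest might look roughly as follows; the API version matches early KubeVirt releases, and the exact label and annotation keys of the experimental backend are assumptions here:

```yaml
# Sketch of the VirtualMachineReplicaSet head; the label and annotation
# keys are hypothetical stand-ins for those of the experimental backend.
apiVersion: kubevirt.io/v1alpha1
kind: VirtualMachineReplicaSet
metadata:
  name: worker-nodes
  labels:
    cluster-autoscaler.kubevirt.io/node-group: workers  # lets the autoscaler select it
  annotations:
    cluster-autoscaler.kubevirt.io/pods: "110"          # pods a resulting node can run
    cluster-autoscaler.kubevirt.io/storage: 10Gi        # storage a resulting node offers
spec:
  replicas: 3        # the field the autoscaler manipulates to scale up and down
  selector:
    matchLabels:
      kubevirt.io/vmReplicaSet: worker-nodes
  # template: the virtual machine specification follows (second screen)
```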
But that's not all. On the second screen the specification goes on, and here you see the actual specification of the virtual machine it will create. As you can see, you can specify resources on the virtual machine, how much memory it will have and how many CPU cores it will have, and you can add disks. What you see here is that I've added two disks: the first disk, the boot disk, references the boot disk volume in the volume section below, which just references a Fedora 27 image; and we have a second disk here, a cloud-init NoCloud data source, which we can use for bootstrapping the node and auto-registering it with the nested Kubernetes cluster.
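The second half of the manifest, the virtual machine template itself, could then look something like this; field names moved around between early KubeVirt releases, so treat the exact structure as approximate:

```yaml
# Continuation of the replica set sketch above: the VM template under
# spec. The exact field layout is approximate for early KubeVirt.
spec:
  replicas: 3
  template:
    metadata:
      labels:
        kubevirt.io/vmReplicaSet: worker-nodes
    spec:
      domain:
        cpu:
          cores: 2                      # CPU cores of the virtual machine
        resources:
          requests:
            memory: 4Gi                 # memory of the virtual machine
        devices:
          disks:
          - name: bootdisk
            volumeName: bootvolume      # references the boot volume below
            disk:
              bus: virtio
          - name: cloudinitdisk
            volumeName: cloudinitvolume # the cloud-init NoCloud data source
            disk:
              bus: virtio
      volumes:
      - name: bootvolume
        registryDisk:
          image: kubevirt/fedora-cloud-registry-disk-demo  # Fedora image; exact Fedora 27 tag assumed
      - name: cloudinitvolume
        cloudInitNoCloud:
          userData: |
            #cloud-config
            # join/auto-register this VM as a node of the nested cluster
            # (actual bootstrap commands elided)
```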
Then you create a very simple config file which points to the kubeconfig of the bare metal Kubernetes cluster, and you start the autoscaler: via --kubeconfig you point it to the nested cluster's kubeconfig, via --cloud-provider you select the KubeVirt backend, and via --cloud-config you point it to the INI file I've written, here at the top of the page. And --node-group-auto-discovery is now the part where the labels from before on the virtual machine replica set get interesting again.
A
There
I
put
the
cube
root
of
the
autoscaler
label
there,
and
here
with
that
expression,
we're
telling
the
the
autoscaler
that
it
should
look
for
returned
machine
replicas
with
exactly
that
labor
and
selection
strategy,
which
put
no
template
it
should
use.
We
were
using
I'm
using
here
list
with
least
waste,
which
means
it
looks
it
go
through
all
the
no
templates
and
just
checks
which
one
fits
best
to
workload
and
leaves
least
CPU
and
memory
unused,
and
that's
pretty
much
it
now.
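Putting those flags together, the invocation looks roughly like the container spec below. The flag names are real cluster-autoscaler flags; the kubevirt provider value and the auto-discovery expression follow the experimental backend and are assumptions here:

```yaml
# Sketch of the cluster-autoscaler container spec for the KubeVirt backend.
# Flag names are upstream cluster-autoscaler flags; the "kubevirt" provider
# and the auto-discovery expression are assumed from the talk.
containers:
- name: cluster-autoscaler
  image: cluster-autoscaler:kubevirt-dev   # hypothetical image tag
  command:
  - ./cluster-autoscaler
  - --kubeconfig=/config/nested-kubeconfig # the nested cluster to watch
  - --cloud-provider=kubevirt              # the experimental KubeVirt backend
  - --cloud-config=/config/kubevirt.ini    # points at the bare metal cluster's kubeconfig
  - --node-group-auto-discovery=label:cluster-autoscaler.kubevirt.io/node-group  # hypothetical expression
  - --expander=least-waste                 # pick the template leaving least CPU/memory unused
```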
Now your bare metal cluster will scale up and down, and I've already mentioned the first interesting use case; that is what you are seeing here. Like in the public cloud case, you can have a completely nested Kubernetes cluster which can scale up and down by asking for more resources, but with a bare metal Kubernetes installation at the bottom. That's basically the typical public cloud scenario: in that use case, you don't have to take care of what the customer is doing in there.
A
So
you'd
have
to
basically
have
two
different
types
of
nodes:
the
bare
metal
nodes
and
virtualized
nodes,
and
in
the
virtualized
nodes
you
can
then
run
isolated
workloads
typical
example
for
berries,
for
instance,
if
you
want
to
do
CI
for
github
in
github,
it's
typical
on
every
pull
request.
You
you
run
some
tests,
but
what
what's
run?
There
is
pretty
much
much
opaque
for
the
CI
system.
It's
you
just
execute
arbitrary
code.
There.
You
have
no
idea
what
people
are
doing
there,
so
you
probably
make
sure
that
you
don't
schedule
such
workloads
directly.
A
Next
to
your
sensitive
high
performance
pots
on
the
bare
metal
nodes,
we
want
to
isolate
them,
and
this
is
exactly
where
this
scenario
can
help
you.
If
you
want
to
go
into
the
details
there,
a
little
bit
more,
you
can
read
about
the
cluster
out
of
scale
in
their
github
repo.
There
is
a
port
autoscaler
to
which
I've
mentioned,
and
a
disk
a
doula.
These
three
projects
kind
of
make
a
whole
story
out
of
the
whole
scaling
in
the
cluster.