From YouTube: Cloud Native Live: Kubernetes automatic rightsizing
A: Okay, so hi! Welcome to Cloud Native Live, where we dive into the code behind cloud native. I am Muhammad, a CNCF Ambassador, and I will be your host tonight. Every week we bring a new set of presenters to showcase how to work with cloud native technologies. They will build things, they will break things, and they will answer your questions. In today's session I'm stoked to introduce Andy, who will be presenting on Kubernetes automatic rightsizing.
A: This is an official livestream of the CNCF and, as such, is subject to the CNCF code of conduct. Please do not add anything to the chat or questions that would be in violation of the code of conduct. Basically, please be respectful to all of our fellow participants and presenters. With that, I'll hand it over to Andy to kick off the presentation. Let's add Andy to the session. Hey Andy, how are you?
B: All right, so my name's Andy. I'm the CTO at Fairwinds, and author and maintainer of several of our open source projects, including Goldilocks, which I'll talk a little bit about today. But today I want to talk about something I've been working on over the last couple of months that will slowly be making its way into Goldilocks, which is automated rightsizing.
B: For those who aren't familiar with Goldilocks: Goldilocks is a wrapper around the Vertical Pod Autoscaler (VPA) project that lets you automatically provision vertical pod autoscalers and then view the resource recommendations for all of the pods in your cluster in a single dashboard. Now, up until this point, Goldilocks has been really focused on recommendations: how do we see what resources our pods are using, give ourselves a baseline for setting those going forward, and tweak them?
B: What I'd like to start to explore further, as we go into the future, is how we start to utilize the automatic rightsizing abilities of the Vertical Pod Autoscaler in a safe and effective way, so that we can increase the utilization percentages of our clusters. So many of the clusters that I work with are utilizing so little of the resources available in them, because we tend to over-provision, and I know that's a really hot topic right now, because we're all worried about cost across the board.
B: At least a lot of us are. So today I want to show how we can set up a cluster, actually using four different open source projects, that automatically sizes all of the workloads in the cluster and allows it to autoscale.
B: So please interrupt me at any time with questions. Keep them coming, and we'll just kind of dive into the setup here, and I will show what we have going on.
B: All right, is my screen share up? All right, so I have an EKS cluster here, and I have four different technologies running in this cluster that are going to help us out today. The first one is cluster autoscaling: I need to be able to get new nodes in my cluster, and I need that to be relatively flexible, because we need different node types in order to maximize the utilization of our cluster.
B: So we have Karpenter running here. For those not familiar with Karpenter, it's an open source autoscaler for Kubernetes. You create an object called a provisioner, and so we can take a look at the provisioner here in the cluster. The provisioner essentially gives you the ability to say, okay, I want these kinds of nodes, but it also allows you to specify a certain amount of flexibility. So there are some values in here that are important.
B: We have instance category. I've listed three different instance categories that we can get in this cluster: C-class, M-class, and R-class instances, so that's compute optimized, general purpose, and memory optimized. I want Karpenter to be able to pick nodes that have different balances of CPU and memory based on the workloads that I'm going to deploy in this cluster. In order to save on cost, I'm allowing it to only provision spot instances.
B: That's kind of up to you, whether you want to do that in your environment, but I'm doing it because this is a sandbox and I don't want to spend a lot of money on it. And then you can cap out the amount of resources that Karpenter has. This is just sort of a safety thing for me: I don't want my cluster to blow up. We'll talk about some of the pitfalls of automatic rightsizing later on, and I will explain why this has to be in there. And then there's the only other thing that I like to enable in Karpenter, which is not super related to this.
B: Yeah, all right. So the only other thing that I like to do in Karpenter, which is not required for automatic rightsizing but does a nice job of keeping my cluster fresh and allows me to do upgrades more easily, is set a TTL on my nodes. My nodes will expire after a day; no node will live longer than a day, with varying load in my cluster.
B: That tends to not happen anyway, because Karpenter is constantly rebalancing. And that reminds me of the last thing, which is consolidation. This gives Karpenter the ability to evict pods, move them around to different nodes, and sort of rebalance how the cluster is structured, which is really important for automated rightsizing. So Karpenter is the first tool we have in place, and again, we've got consolidation enabled: true.
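A minimal sketch of the kind of Provisioner described here; the API shown is the v1alpha5-era Karpenter spec, and the names and values are illustrative rather than the exact manifest from the demo:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    # compute-optimized (c), general purpose (m), memory-optimized (r)
    - key: karpenter.k8s.aws/instance-category
      operator: In
      values: ["c", "m", "r"]
    # spot only, to keep a sandbox cluster cheap
    - key: karpenter.sh/capacity-type
      operator: In
      values: ["spot"]
  # cap total resources so an errant recommendation can't blow up the cluster
  limits:
    resources:
      cpu: "64"        # illustrative cap
      memory: 256Gi    # illustrative cap
  # nodes expire after a day, keeping the cluster fresh for upgrades
  ttlSecondsUntilExpired: 86400
  # let Karpenter evict pods and repack nodes
  consolidation:
    enabled: true
  providerRef:
    name: default      # assumes a matching AWSNodeTemplate exists
```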
B: If we want to look at my values file, it's fairly straightforward. It's really just giving Karpenter access to the IAM role ARN that it needs to do its job, then some service monitoring so that I can get some metrics in the cluster, and really not much else, just some security things that I have enabled in this cluster.
B: All right, I see we have some folks with video issues. I think we look okay from my previews, so I'm going to keep going. The second tool that we have configured in the cluster is the Vertical Pod Autoscaler. We are going to install that with the Fairwinds chart, which is at github.com/FairwindsOps/charts.
B: We have a vpa chart there that allows you to install the Vertical Pod Autoscaler. I'm using the latest version of it, and I'm honestly using a good portion of the defaults, except that I'm enabling the admission controller, which I believe is not the default in our chart. That's so that we can actually do automatic vertical pod autoscaling. We need to have a certificate in place; I'm using cert-manager to generate that certificate and manage the mutating webhook configuration. Beyond that, I think the VPA is a fairly standard configuration.
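A hedged sketch of Helm values for the Fairwinds vpa chart along the lines described (admission controller on, certificate handled by cert-manager); exact key names vary by chart version, so treat this as illustrative:

```yaml
# values.yaml for the fairwinds-stable/vpa chart (sketch)
recommender:
  enabled: true
updater:
  enabled: true
admissionController:
  # not enabled by default in the chart; required for update mode "Auto",
  # since the admission controller is what mutates pod requests at creation
  enabled: true
  # the webhook needs a TLS certificate; here it is assumed that cert-manager
  # issues it and manages the MutatingWebhookConfiguration, as described
```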
B: Oh, the last thing we have to do: we want long-lived data to feed our Vertical Pod Autoscaler. This is super important to getting accurate recommendations from it, so we have it hooked up to Prometheus. In the recommender, which is one of the components of the Vertical Pod Autoscaler, we give it a Prometheus address, we give it a minimum CPU and a minimum memory, and we say the storage type needs to be Prometheus. That will allow our Vertical Pod Autoscaler to reference this Prometheus in the cluster to get metrics.
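Those recommender settings map to upstream VPA recommender arguments, roughly like the following sketch (the Prometheus address and minimum values are illustrative):

```yaml
# extra arguments passed to the VPA recommender via the chart (sketch)
recommender:
  extraArgs:
    storage: prometheus
    prometheus-address: http://prometheus-operated.monitoring.svc.cluster.local:9090
    pod-recommendation-min-cpu-millicores: "15"
    pod-recommendation-min-memory-mb: "100"
    history-length: 8d   # the recommender wants about eight days of history
```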
B: So if we go take a look, we can see we have Prometheus running in this cluster. We have things like machine CPU cores available; we have all the various metrics. I'm using the standard kube-prometheus-stack installation to get that. I seem to have lost my comment feed,
B: so I'll have to rely on you to throw questions at me as they pop up.

A: Okay, I will do that.

B: All right, so: kube-prometheus-stack collecting all the metrics in the cluster, a relatively default configuration there, and then the Vertical Pod Autoscaler pointing at that Prometheus, using Prometheus as its storage. So we've got Karpenter.
B: We've got the Vertical Pod Autoscaler, and a couple of different values associated with those, and now we get into the next bit, which is Goldilocks. Goldilocks allows you to create VPAs for all of your workloads. So if we look in this cluster and we do a get on vertical pod autoscalers across all the namespaces, we'll see that we have one for every single workload in this cluster. There are quite a lot of different workloads: we're running Argo CD, we're running Prometheus, we're running a few different demo apps. We've got one in the team-one namespace, we've got a Yelb app, and we've got a basic demo app running as well.
B: I'll talk about the actual applications in a minute, but we can see we have a lot of different vertical pod autoscalers. In order to do that, we installed Goldilocks using a Helm chart, like I've mentioned, from the same Fairwinds stable repository. The only thing that we're doing here that's not standard is that we're setting this on-by-default flag. When we tell the controller and the dashboard on-by-default, it means we don't have to annotate the namespaces that the objects are in in order for Goldilocks to create vertical pod autoscalers; it will just create one for everything in the cluster automatically, no matter what. When we configure Goldilocks this way, it will create a VPA for every object, but the VPA won't be turned on. It'll be in mode "off", which is just the recommendation mode, and which is the default for how Goldilocks operates.
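The on-by-default setup described here looks roughly like this in the Goldilocks Helm values; this is a sketch, and the flag wiring may differ by chart version:

```yaml
# values.yaml for the fairwinds-stable/goldilocks chart (sketch)
controller:
  flags:
    on-by-default: "true"   # create a VPA for every workload, no namespace labels needed
dashboard:
  flags:
    on-by-default: "true"   # show every namespace in the dashboard
```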
B: So the last thing that we do is modify how Goldilocks creates these VPA objects. Let's go ahead and get the namespace yelb.
B: This is one of our demo applications, the Yelb application, and we've added two annotations to this namespace. This is a sort of not-well-known feature of Goldilocks that allows you to modify how the VPA is created. The first one we have here is goldilocks.fairwinds.com/vpa-update-mode set to auto. That's going to put all of the vertical pod autoscalers in the yelb namespace into automatic mode, which means that when a pod gets created in that namespace, a mutating admission webhook is going to set the resource requests for the pods created there. And you'll notice it's turned on, on auto, for all of my namespaces, so every single pod that gets created in this cluster has its CPU and memory requests set by this mutating admission webhook, which is a little bit terrifying.
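The two namespace annotations being shown look roughly like this sketch; the namespace is the Yelb demo app, and the resource-policy JSON mirrors the caps discussed just below:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: yelb
  annotations:
    # put every VPA Goldilocks creates in this namespace into automatic mode
    goldilocks.fairwinds.com/vpa-update-mode: "auto"
    # JSON form of a VPA container policy, capping what the VPA may set
    goldilocks.fairwinds.com/vpa-resource-policy: |
      {
        "containerPolicies": [
          {
            "containerName": "*",
            "maxAllowed": { "cpu": "4", "memory": "6Gi" }
          }
        ]
      }
```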
B: You're changing things on the fly as you're creating them, all the time, which is why I'm doing this in a sandbox cluster; we'll talk about some more of the pitfalls of that later. And the last thing that we have is the ability to control minimums and maximums via this container policy, or rather this vpa-resource-policy annotation.
B: It's probably easier if we look at the VPA object that gets created itself. We'll take a look at this yelb-ui VPA in the yelb namespace, and we will see that Goldilocks has added this resource policy that defines the behavior of the Vertical Pod Autoscaler's automatic rightsizing in this namespace. This applies to all containers in every pod, because we have a star here, and in this case we're saying the maximum allowed is four CPUs and six gigs of memory.
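The generated object then looks something like this sketch of a yelb-ui VPA (the object name is illustrative):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: goldilocks-yelb-ui   # illustrative name
  namespace: yelb
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: yelb-ui
  updatePolicy:
    updateMode: Auto          # from the vpa-update-mode annotation
  resourcePolicy:
    containerPolicies:
      - containerName: "*"    # applies to all containers in every pod
        maxAllowed:
          cpu: "4"
          memory: 6Gi
```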
B: I think it's important to pick a value for this. I had some early experimentation that was really interesting, where the VPA, when it doesn't have a full amount of data, will sometimes recommend very large or very small amounts. In the case of it recommending very large amounts, what can happen is, say, it thinks a pod needs 16 CPUs; I think I had one where it said this pod needs 16 CPUs.
B: Well, that wasn't accurate, but it went ahead and modified that pod to request 16 CPUs, and then Karpenter, in all of its flexibility, very happily obliged, and I think it created an m5.12xlarge in my cluster, which is a very large instance size that I was not expecting. So you want to have these caps on here just for a little bit of safety. Those are controlled, again, through that annotation we showed earlier.
B: The annotation is just a JSON form of this container policy that we're looking at, so you can modify it on a namespace level. You could even add additional policies, so that specific containers are allowed to request more. But I definitely recommend having this resource policy in place, just to cap things. The other thing you can use is a LimitRange, which is a Kubernetes object; the Vertical Pod Autoscaler respects limit ranges as well. I found this resource policy to be a little bit more flexible and easier to work with than the LimitRange object.
B: But that's, you know, a decent amount of memory that it's requesting there. And so we have recommendations for all of our pods, we're collecting all of the metrics in Prometheus, we're allowing Goldilocks to create these VPAs, and then Karpenter is giving us new nodes in our cluster.
B: Based on the requests coming into the cluster, you can kind of see all of the different dynamic pieces going into this that allow all of these workloads to be right-sized. The last thing that we need to talk about is horizontal pod autoscaling. So we're vertically sizing, we're setting our requests and limits, but we also have applications like this.
B: Let's go to the yelb namespace. We also have applications that need to horizontally scale. If we take a look here, we'll see that we have two replicas of the app server running. That should actually be more; I'm not sure, we'll dig into that in a minute.
B: But if we're also vertically scaling on CPU, we don't necessarily want to horizontally scale on CPU, because those two will be at odds with each other and possibly conflict. So the real key to this is being able to horizontally scale on a separate metric, and since we already have Prometheus metrics, if we go take a look here, we could get something like nginx.
B: Let's see: nginx_ingress_controller_requests. We're using ingress-nginx, we're getting requests, and we can divide that up by the different ingresses in here. So, for example, take the Argo CD ingress: we can see how many requests it's getting. We also have metrics such as network traffic coming in, or latency, for this particular ingress.
B: Those are available, but setting up the HPA with those can be a little bit difficult. So I'm going to add in a fourth project; actually, I guess we're up to five now, because we've got Prometheus, Goldilocks, the VPA, and Karpenter. So I'm going to add a fifth project, which is KEDA. Or how do folks say it? I don't know; I don't think we can do a poll here, but I'm curious whether it's pronounced "keita" or "keddah". I'm going to go with KEDA for today.
B: So KEDA is a nice controller that allows you to create horizontal pod autoscalers with a different spec, called a ScaledObject. I'm installing KEDA in the keda namespace with a fairly standard set of values; I think I'm just setting some resource requests and adding some Prometheus information so that I can get metrics. Other than that, I'm using a fairly stock install of KEDA.
B: And so with KEDA, what we get is these ScaledObjects. The nice thing about them, and let me switch tabs here for a second, is that if we take a look at the spec, say the yelb app server ScaledObject, it's a very straightforward spec, very similar to a horizontal pod autoscaler, that allows us to specify a Prometheus query out of the box. So we can say: here's where my Prometheus lives. This should look familiar from when we configured the VPA.
B: This is what I want to call this metric; this is the threshold that I want to shoot for per pod, so this would be 10,000 requests per pod; and then I can put in a query. If we take this query for the yelb app server and we go punch it into Prometheus, we can see its value at this given time. Right now it's very low.
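A sketch of a ScaledObject along those lines, with the Prometheus trigger fields just described; the server address, metric name, and query are illustrative, not the exact manifest from the demo:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: yelb-appserver
  namespace: yelb
spec:
  scaleTargetRef:
    name: yelb-appserver          # the Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 40
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc.cluster.local:9090
        metricName: requests_per_pod   # what to call the metric
        threshold: "10000"             # target ~10,000 requests per pod
        # illustrative query; the real one counts requests reaching the app server
        query: sum(rate(nginx_ingress_controller_requests{exported_service="yelb-appserver"}[2m]))
```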
B: So if we look at the HPA that's currently available here, if I'm in the right namespace, we'll go back to the yelb namespace: here's that yelb app server HPA, and we can see that 10,000 target. Currently we're at 277, so we're at the minimum of 2 out of a maximum of 40 pods. But I didn't have to create this HPA, and I didn't have to write a Prometheus metrics adapter to do it; KEDA just did that for me. So I'm a huge fan of this project, and it is really the last piece of this, so that we can scale all of our pods horizontally based on metrics other than CPU and memory.
B: And then the last thing I need to do, since this is not a real environment, is generate some load. So I'm going to go over here real quick and just double-check on my load generation, because it doesn't seem to be working. I'm using a tool called k6; it just runs a whole bunch of load against various endpoints. And that seems to be working; we'll see what happens.
B: Yeah, okay, that window wasn't important; I'm not going to worry too much about it. So, to recap again, because I think there's a lot going on here, and it's sort of tough to put all the pieces together in your head if you haven't done this before: we've got the Kubernetes metrics coming in through Prometheus, and we've got the horizontal pod autoscaler, configured via KEDA using Prometheus metrics that are not CPU and memory, to scale horizontally.
B: We've got our Vertical Pod Autoscaler scaling pods up and down in their resource requests, and then we have Goldilocks creating those VPA objects automatically for us and configuring them in that automatic mode. So in theory, everything should be completely dynamic. As I increase load on the cluster, we should potentially see the vertical size of some of these pods getting bigger as they start to actually consume resources.
B: We should see horizontal scaling, where all the different horizontally scalable workloads in the cluster scale in and out, and then we should also see the cluster creating new nodes to accommodate those, and then maybe reshuffling them over time. And so, as you start to think about this, you're like: oh, how am I supposed to wrap my head around this? If there's something wrong, how do I look at this happening? So here's what I've been working on.
B: We've got 228 requests coming in to that app server, and let's just zoom in on that time period. This is the HPA doing its job: the HPA wants to keep the number of replicas such that the per-pod number of requests is at 10,000. So this is good; we want to see that. And then we have, on the lower end, the UI pod. This is a multi-tiered thing.
B: There's a UI and there's a back end, so the UI is also scaling horizontally. That's the HPA in action, doing its work. We can also look at latency, because the next thing that we have to consider is the balancing factor: we can vertically scale and horizontally scale all we want.
B: But what balance is that? What's on the other side of the equation? Generally, on the other side of the equation is some sort of performance metric: we need to have enough resources to have good performance in our cluster. So we see here we're tracking latency on the yelb UI ingress.
B
You
know
this
particular
one
has
been
sort
of
high
around
600
milliseconds
I'm,
not
actually
sure
why
it's
something
I've
been
planning
to
look
into,
but
you
know
we
want
to
have
another
metric
to
balance
against,
and
one
thing
that
I've
done
in
other
namespaces
is
add.
This
latency
metric
as
a
second
scaling
metric
for
the
horizontal
part
of
the
scalar,
because
that
key
to
that
ketta
spec
lets,
you
add
multiple
metrics
as
targets,
because
you
can
do
multi
multi-metric
pod,
Auto
scalers.
B: So that is one option for balancing performance with your resource requests. Down here we just have the raw number of requests coming into the ingress, and then over here is where we start to look at the VPAs, the vertical pod autoscalers.
B: If we look here, and I'm just going to filter down to the app server because it's a little bit easier to see, we've got the target from the Vertical Pod Autoscaler, which is its recommendation, and then we have the actual request. Now, these should be the same, which might be worth looking into, but they are very close. So we can track what the Vertical Pod Autoscaler is doing for CPU, and then over here we have memory. Let's go ahead and filter this one down as well.
B: We're using 60 millicores and the VPA is targeting 25. I'm looking at a very small window of time in this particular graph, just so that we can see the graph. My guess is, if this hovers at 65 for long enough, the Vertical Pod Autoscaler will start to bump that up. And then the same with memory: if we look at just the app server, for some reason we're hovering around 1.7 gigs; not sure what's going on there.
B: It's acting a little weirdly, so it might not be the best example, but we have a place now where we can start to see all the different pieces. And then over here we have just a little graph showing Karpenter doing its job: this is how many machines it's created and terminated. I've been working on adding graphs for seeing the types of nodes, but if we take a look at the cluster right now, we can see what we have.
B: We have our base instance group, a single managed instance group that allows us to run the Karpenter controller and things like that. I've changed that to a C5 instance type, because I've noticed that this cluster is particularly CPU heavy. But then the nodes that have a provisioner listed were created by Karpenter, so we have a c5.2xlarge and a c5.4xlarge.
B: Obviously, Karpenter also recognizes that we're very CPU constrained in this cluster, not memory constrained, and we can see how it's reacting to that by giving us compute-optimized instances.
B: One tool I absolutely can use to look at how this is functioning is called kube-capacity, written by Rob Scott, if anybody's familiar with him. If we run kube-capacity and we add the utilization flag, we can see... well, maybe, there it goes.
B: We have a CPU utilization of 83 percent. That's pretty good across the cluster; I don't see it that high very often. And then we have a memory utilization of 20 percent. Now, that seems a little low; I would love that to be higher. But if we dig into it a little bit, we can see that we're using 10 gigs of memory across the cluster and we're using 24 and a half CPUs. That's about a two-to-one CPU-to-memory ratio.
B: And if you look through the instance size list available in Amazon, you will find no instances give you that sort of spread of memory to CPU. So I've been having a debate with some of my co-workers about whether I just have some non-ideal workloads running in this cluster that are not your average workload, or whether there's something to dig into further there. But having a CPU utilization above 80 feels really good to me, and it's sort of what I'm going for.
B: Along with, if we go back to maybe one of our other demo apps, a latency value hovering under 100 milliseconds for this particular application. So that feels pretty good to me. And, well, it was here; it went back up, so we'll have to look into that, but we also dropped our requests.
B: I'd also like to share just a little bit about the demo apps. We have, I think, three of them running in this cluster. We have Yelb, which I've mentioned a couple of times, and which does not seem to be functioning at the moment.
B: Always got to break something, don't we? Nope. All right, we have this demo application, which just constantly pings the back end and shows you which pod it's talking to. The color tiering is a little off, so you can't see it well, but that's the name of the pod that's being hit. So we can see the horizontal autoscaling in action here.
The
Emoji
Emoji
photo
app
from
our
friends
at
buoyance
who
make
Linker
D
where
you
vote
on
an
emoji,
and
you
can
see
how
many
votes
they
have
because
I'm
generating
traffic
against
this.
So
some
of
these
have
a
lot
of
votes,
180,
000
or
so
all
right.
So
those
are
the
three
apps
we're
running
and
yeah.
B: So that's kind of the general setup. It obviously took me a half hour to get through the whole setup, because it took me probably two weeks just to build all this and get it working. The goal in the future is to make this easier to do. It's such a complex process; there are so many different pitfalls.
B: There are so many different levers you can pull and knobs you can turn that we want to start to understand how all of these tools work together, and then build an easier story going forward. So that's kind of the goal here. Do we have any questions? Do you have any questions?
B: Yeah, so I think an important thing to talk about is the various pitfalls. I've talked about a couple of the issues, one of them being that the VPA requires eight days of data to really give a good recommendation. So if we go back and we look at Prometheus, and we grab a CPU utilization graph, or really any graph here...
B: We need to be able to see a week of data, right? So I've got a Prometheus instance set up here that's retaining eight days' worth of data. That can be a considerable amount of information to store for any Prometheus instance, depending on the size of your cluster. So that's one thing to worry about: how are we storing all this Prometheus data for long-term storage? A week may not be a problem; it hasn't been too bad for this cluster, but it might be in a much larger environment.
B: So that's the first thing to consider. The second is understanding how the VPA works. Over those eight days, the Vertical Pod Autoscaler uses a decaying histogram of the utilization for CPU, and it uses memory peaks over an interval, to generate its recommendation. And it can only set requests; it cannot set limits. When it sets the requests, it can adjust the limits proportionally and move them up or down.
B: So if we have an initial amount, it's going to move the limits proportionally, but it won't set limits by default. Actually, in this cluster I have very few CPU limits or memory limits set. This might be risky in certain environments or for certain workloads, so that's something to evaluate if you go to set something like this up: do I need CPU and/or memory limits?
B: I know CPU limits are a hotly debated topic, and I won't dive into the details of that today. But, you know: do I need them? Where should I put them? They're going to limit the ability to scale, or to utilize more resources, as effectively. I had a really hard time getting to that CPU utilization of 80 percent across the cluster without removing limits, so that's definitely something to consider in your individual evaluation.
B: The next thing to think about, and let me just pull up my notes here, is the dangers of using Karpenter. I pointed out earlier that you might get an m5.12xlarge or a c5.12xlarge spun up because of an errant recommendation from the VPA, so capping is important.
B: The other thing to be aware of with Karpenter is that there are lots of ways to use node selectors and resource requests or annotations that restrict Karpenter's ability to function. I could create a pod that said: I want to run on a c5.2xlarge specifically, and that will force Karpenter to create a c5.2xlarge.
B: Well, maybe that's not the best choice for the balance of price and compacting workloads that Karpenter wants, and so what I recommend is using some sort of policy engine to restrict that in your cluster. I actually have, in this cluster, oops, wrong window, there we go, some OPA policies. These are being applied by Fairwinds Insights; I won't talk too much about Fairwinds Insights today. We have some OPA policies for Karpenter that restrict specific things; specifically, we're restricting the ability to use the node selector karpenter.k8s.aws/instance-family.
B: I think there may be other node selectors that Karpenter respects that I need to restrict, but essentially we're saying you can't create a pod in this cluster with this node selector, and we're enforcing that via OPA at admission time, so that we don't let workloads in the cluster mess with Karpenter's ability to compact.
B: So that's one thing to be aware of with Karpenter. I think I've talked about this in other content that we've put out before, but use some sort of policy to control the workloads coming in, so they can't break Karpenter.
B: The other one that we have is the Karpenter do-not-evict annotation. You can tell Karpenter never to evict a given pod, but what that means is that you can't scale down, and you can't let Karpenter move workloads around and compact the cluster, because evicting pods is the mechanism by which it moves them around. So instead of using do-not-evict, what we use is pod disruption budgets on some of our apps.
B: If we look at the various services for the emojivoto app, we have a pod disruption budget of maxUnavailable: 1, so Karpenter can only evict one pod at a time and we're not affecting performance. We're not letting Karpenter just wipe out our whole service; we're using these pod disruption budgets to protect us.
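A sketch of the kind of PodDisruptionBudget being described, here against one of the emojivoto services; the labels are illustrative:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web
  namespace: emojivoto
spec:
  maxUnavailable: 1    # Karpenter (or a node drain) may evict only one pod at a time
  selector:
    matchLabels:
      app: web-svc     # illustrative label
```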
B: The sum of this is autoscaling 101: things that you should be doing no matter what, whether you're automatically rightsizing or not. If you're horizontally scaling and you have multiple replicas, you should probably have a pod disruption budget. That'll protect you in the event of nodes being drained for an upgrade, or various other events like that. So pod disruption budgets are super important.
B: Another thing to be aware of, and we talked about performance a little bit: especially when you're using an ingress controller, one interesting thing here is that we're automatically rightsizing not just the workloads, but also the ingress controller that serves those workloads.
B: So we have a horizontal pod autoscaler that is, I believe, working on the query we're using here; I think it's a requests-per-second metric, average value, on http requests total, so just the pure number of requests coming into the ingress controller. But the ingress controller needs to scale relative to all the workloads in the cluster, because it's serving all of the traffic, and so your ingress controller may end up getting fairly large.
B: We'll probably see we're using five CPUs per instance of the ingress controller, and we have four of them. So you need to be aware of the relative scaling of your ingress controller, with traffic funneling out behind it, and be extra sensitive about how this particular workload scales; I think that's super important. And then, really, just monitoring: I had to do a lot of extra stuff to get all of these metrics into Grafana for these various workloads. So something to be cognizant of is all of the Prometheus configuration needed to get that working.
B: For instance, in order to get the VPA state into kube-state-metrics, you have to add a custom resource state configuration for the Vertical Pod Autoscaler. This is what's pulling those VPA container recommendations into Prometheus so that I can see them. I'm also starting to pull in the Karpenter annotations for the various nodes.
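A heavily abbreviated sketch of what that looks like in kube-prometheus-stack values, using the kube-state-metrics custom-resource-state feature; the metric definition is elided as illustrative, and the real configuration also needs RBAC for the VPA API group, shown here as an assumption:

```yaml
kube-state-metrics:
  rbac:
    extraRules:
      - apiGroups: ["autoscaling.k8s.io"]
        resources: ["verticalpodautoscalers"]
        verbs: ["list", "watch"]
  customResourceState:
    enabled: true
    config:
      spec:
        resources:
          - groupVersionKind:
              group: autoscaling.k8s.io
              version: v1
              kind: VerticalPodAutoscaler
            # metric definitions walking status.recommendation.containerRecommendations
            # (target CPU and memory) go here; see the kube-state-metrics docs
            metrics: []
```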
B: With that, we can create dashboards, monitor it, and keep an eye on the size of our cluster. All right, and then, just to show the history of working on this, this is the last month of data I've got in this cluster. We started out at 28 percent CPU utilization, or down at 16 here, and we've gone all the way up to 87, and we should be hovering right around 80-ish here over the next few days. And then memory, we know from what I've shown before...
B: ...is still fairly low; something that I'll be working on in this particular environment. So yeah, I was hoping for more questions; there's a lot going on here.
A: We don't have any questions at the moment, I guess. There was one question earlier, like "which one should we choose", while you were actually showing the labs, but other than that there are no questions left. I guess, viewers, you can ask your questions, whatever doubts you have. Other than that, yeah, you can continue the session, I guess. Okay.
B: Let's see, I'm trying to think if I have a ton else here to cover. Okay.
B: I've been experimenting with different ways to scale. You may not have an ingress metric available.
B: You may only have sort of the base Kubernetes metrics for your pod, or just what comes out of the box with the kube-prometheus-stack. We need a metric that is not CPU and not memory, but maybe we don't have ingress requests, and maybe we don't have latency to scale on. So what else can we scale on? Here we're actually scaling on container network receive bytes, so just the raw amount of data coming into that container.
B: That's a great question. I'm really hoping that Karpenter expands out to the other cloud providers in the near future, but I know that's not necessarily a priority for them; a lot of those folks work for AWS, and I get that, that makes sense. But I'll focus on the ones I know better.
B: I'm not super familiar with AKS, so I'll focus on GKE. I know you have the ability to give GKE control over your instance sizes and allow it to sort of dynamically pick them. The other thing you can do is just use the slightly more traditional cluster-autoscaler, while being more cognizant of your node sizes, or rather your node types. cluster-autoscaler works in all the cloud providers; it will give you more and fewer nodes based on demand.
B: Based on how many pods you have. So I would definitely recommend starting with that. It's not as intelligent; it can't pick instance types, and so what you need to do in that case is monitor that utilization: look at, say, that kube-capacity output. Actually, out of the box with kube-prometheus-stack, we have the cluster utilization metrics that we can see here.
B: So we can see the CPU saturation and start to look at that balance between CPU and memory usage. If you have, say, 16 cores total in use, and you have 32 gigs of memory in use at load, then something with a one-to-two ratio, which is a compute-optimized size, is going to be appropriate. Then you can adjust your cluster-autoscaler settings to do that, and potentially utilize multiple node groups within cluster-autoscaler.
B: That gives you options if you have enough disparate workloads to need both, maybe some memory-optimized and some compute-optimized, or things like that. So any type of cluster autoscaler will get you closer to automatic resource management; it's just not going to be as intelligent as something like Karpenter. And then there are commercial options, like Spot, and I think it's cast.ai that has an AI-driven spot instance generator. I think both of those work across multiple cloud providers, so there are commercial options to look into there as well.
B: So, great question, thanks. All right, we were talking about other metrics that we can scale on. Container network receive bytes, or transmit bytes, or using both of those metrics, can work. But if we go take a look at this metric, we'll go back to our Prometheus, and I'm going to zoom back out just a little bit because we're going to go show the graph.
B: Just adding them all together, this is the entire cluster. So if we drop the sum and we look at the rates for all the workloads across the cluster, and this might take a minute to query the last week, we'll see that the levels are so different. That's entirely dependent on the app.
B: If we take a look at this demo app here, we're just sending a little ping request every few seconds to the pod, so that's a very small request size.
B: Whereas this particular application, when it needs to send back the full list of all the votes, that's probably a much larger object, so it's going to be transmitting more data. Same with the emoji one; there's much more information being transmitted back to me. So it can work, but you have to tune that autoscaler for each app. It's the same amount of data I'm querying here, but they're very different apps.
B: So that's something to keep an eye on. Another thing you could do is use the raw number of connections, or other metrics that are available in Prometheus. One of the things that I'm starting to look into is how we can sort of generalize the horizontal pod autoscaler metric, to make a recommendation for a starting point.
B: That's much easier than trying to craft this yourself and then go tune this threshold number based on your percentage. So that's one area to look at, and then the other thing is multi-faceted scalers. Here we can see, is it this one...
B: This ScaledObject for the web service on the emojivoto app has multiple metrics. We're scaling on that received bytes total, and notice a very different threshold for this app, which I've had to tweak over time. And then we're also scaling on the P95 latency of the ingress controller response duration metric for this app. So we're trying to keep our 95th percentile latency around half a second; actually, this comes back in milliseconds, so I'm trying to keep it around 500 milliseconds or lower.
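A sketch of such a multi-trigger ScaledObject; both thresholds and both queries are illustrative, not the exact manifest:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: web
  namespace: emojivoto
spec:
  scaleTargetRef:
    name: web
  triggers:
    # scale on raw network bytes received by the workload
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc.cluster.local:9090
        threshold: "50000"   # bytes/second per pod; tuned per app
        query: sum(rate(container_network_receive_bytes_total{namespace="emojivoto", pod=~"web-.*"}[2m]))
    # also scale on p95 ingress latency, targeting ~500ms per pod
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-operated.monitoring.svc.cluster.local:9090
        threshold: "500"     # milliseconds
        query: histogram_quantile(0.95, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket{ingress="web"}[5m])) by (le)) * 1000
```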
B: So we have two metrics here that we're horizontally scaling on, which balance each other out, or hopefully balance each other out. Another thing to experiment with, if you're going to go down this route, is really thinking about what's important for what you're horizontally scaling. If latency is the most important thing for that application, maybe you should only be scaling on latency. It's going to affect your automated rightsizing, right?
B: You may get a little bit more over-provisioned resources to provide that, but that may be the trade-off you're willing to make. So I think the hardest thing about all of this is choosing metrics and then tuning those different numbers to the values that you want, and in that case load testing is super important. You need to be able to generate some sort of even remotely realistic load against your service in a non-production environment, in order to play with these values.
B: So again, I used k6, which is actually now owned by Grafana, to run the load. You just write little JavaScript snippets that hit your app, and it can run those.
B: If we go back over here to this tiny window I'm running: let's see, this one's running 100,000 requests, this one's running 200,000 requests, and running them pretty quickly against the app. So we're generating a decent amount of traffic, as evidenced by some of our requests here, and actually we can go take a look at just the ingress controller graph here: we're doing 589 requests.
B: For a small cluster like this, that's more than it does just sitting there day to day doing nothing, so we actually get some real numbers here. Load testing is super important to be able to tweak these numbers and set up all these different things.
B: All right, well, now's the time to ask all the questions if you have them. So again, six different technologies today: we've got Prometheus, we've got KEDA, we've got the Vertical Pod Autoscaler, we've got Karpenter (or cluster-autoscaler, depending on where you're at), and then Goldilocks to sort of tie it all together. Keep an eye out on Goldilocks over the next six months or so; hopefully we'll be releasing more features related to this sort of concept of automated rightsizing.
B: It's really where I want to take Goldilocks in the next iteration of it. I think recommendations were a great place to start, and I think folks appreciate those, and the Goldilocks dashboard won't be going anywhere. We'll still be showing you your recommendations there; it's just that maybe you didn't set them yourself, because you're letting the VPA take them over now.
A: Thanks; we'll watch for the questions to follow. Okay, so, other than this: what is your insight here? You've talked about a lot, so what are some best practices regarding these tools? What do you suggest? What should be done?
B: I've definitely had some mistakes in this testing process, where I've blown up the cluster really big, or the entire thing's fallen over because nginx isn't getting enough CPU and just can't serve any traffic. So there's a lot of risk associated with trying to automatically rightsize, and so, as a best practice: don't run it in prod right now, I think.
A: Yeah, I think we have a question: how do you use Goldilocks for apps that have bursty spikes in memory usage?
B: That's a good question, and that's a tricky one with the Vertical Pod Autoscaler. If it's bursty in a semi-consistent manner, in that you're going to get bursts throughout the day, the Vertical Pod Autoscaler should account for that, because it is using memory spikes within a window to calculate its target, so in theory it should be able to handle it. Now, if you think you're going to have, say, a spike once a week that you need to account for, then I think you have to handle that more aggressively yourself.
B: You do have to be careful with requesting, or having memory limits higher than, the memory available on a node, or doing that in too many places, but that is one potential way to mitigate it. And then keep in mind that the VPA will also take OOM kills into account: if you get OOM-killed, the VPA will bump up the next recommendation to a higher memory amount, and so it will sort of self-correct over time, potentially.
B: But if you're expecting specific burstiness, that's something you can test: write a load script that generates that burst and see how it looks. And then the other option is, just for those particular workloads, don't use Goldilocks, don't use the VPA.
B: If you've seen OOM kills, definitely bump that memory limit up and keep an eye on it, but burstiness here can be inherently difficult. The other option is to drop the memory limit and let the node OOM-kill the workload if you run out of memory, but then you've got to watch your node sizes and your system-level OOM kills, not just your Kubernetes cgroup ones. So, yeah.
A: Let's see if any question pops up. Okay, so, other than this: that was really an awesome session, and we've been shown a lot visually, so honestly, that was awesome. Is there something you would like to add? Oh, there is another question: what happens when scaling on memory or CPU limits?
B: I don't quite follow the question, but I'll try to answer as best I can. KEDA can definitely scale on memory or CPU. I have intentionally not done that here, because it conflicts with the VPA. If you try to run the VPA and KEDA, or any autoscaler on CPU, at the same time, you're going to get unexpected results, which is why I'm not using CPU or memory there.
A: Okay, yeah, I guess so. People are saying thanks for your great session. So I guess, if there's nothing more to add, we can end this session. Yeah, great, okay, thanks.
B: DaemonSets are an interesting thing to consider here. DaemonSets would be counted as sort of the node overhead in Karpenter's calculation of whether you should add another node or not, and Karpenter does have the ability to take that into account. So, ideally, that should be handled by Karpenter.
B: Well, Goldilocks is older, that's for sure; it's been around longer. But Goldilocks uses the VPA just to provide recommendations; there's very little cost functionality in Goldilocks. There's a little bit, but it's very specific and very limited in its ability. Kubecost has a lot more cost-focused functionality, whereas Goldilocks is much more focused on just resource requests and limits. So that's kind of the big differentiator.
A: The big differentiator, yeah. Okay.