From YouTube: SIG - Performance and scale 2021-08-05
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.lmkwmnkao9j0
A
Okay, all right. Everyone, welcome to SIG Scale, August 5th. The link to the doc is in the chat, so everyone, please add yourselves as attendees.

A
All right, so we've got a bunch of agenda items to go over, and before we do: how was last week? It seems like we had a bunch of things that were discussed.

A
Looks like things progressed pretty well. Any leftover items for this week? Assuming people added them, I'll bring them up, or is there anything we want to discuss from last week?

A
No? Okay, all right, we'll just keep going on then with today's agenda. Okay, and thanks, David, for hosting it, I really appreciate it. Okay, first item: discuss how to handle memory and CPU requests on pods for different scale requirements. A different scale requires different resource requests for the control plane components.
B
Great, so this is something that Ramon actually pointed out last time, to discuss in this meeting. So it's coming from the last meeting. Okay, yeah, so I was trying to look at the metrics.

B
You know, this, especially for memory, is fine. What's negative here means that the request is smaller than the usage. And yeah, although it looks small, it's significant for the CPU requests, and it shouldn't be like that. So, of course, we need to do some more tests to define what should be, at least, a desirable CPU request.
B
It shouldn't be too big, but it also shouldn't be too small in the way that I think is happening now. And Roman mentioned something else; I don't remember now what he mentioned, but one thing that this impacts for sure is scheduling. If it's requesting less than it uses, then we are placing it with the scheduler based on those requests, and that could be putting more interference on the workloads, and so on. So, again, I don't remember now; I'd want to hear that from Roman, but I think he's not here today.
A
I'm trying to understand: so we have these negative numbers here. Does this mean that we're not quite using what we've requested, or is it that we're using over?

A
I see, okay. I'm trying to, so this green line is virt-api. So this is, I don't understand, what's the lowest one we have? I can see the orange one here, is this... Yes, I'm assuming this is the controller.
A
And these numbers, I'm just trying to quantify: do we consider this very small? Just looking at them, the numbers look small, but I don't know, just based on what the metric is. Is this something we can write off, or is it something that's pretty significant?
B
Maybe on a smaller machine these values might be more significant. But yes, although it looks small, you know, I think it's something that maybe impacts the performance somehow.
A
Okay, so we go a little bit over. Okay, so maybe we need a little bit more investigation to understand the metric, to see what this is. There's also this: it doesn't go to zero, looking at this, right? Like it does here. Or even just these: our controller goes right back up to, I guess it looks like just underneath what it requests, and then we're...
B
Yeah, so this might be related to what Kevin mentioned before: when we clean up the cluster, it still has things going on, and it doesn't completely go back to the initial state, at least for a while. Maybe this is something that Kevin mentioned he's investigating, but maybe, you know, exactly.

B
Yeah, the interval between the experiments, maybe. It must be, because, I think Kevin will talk about that for sure, but maybe it was the garbage collector working, and yeah. We maybe need to wait more time between runs.
D
...the garbage collector, because we have a lot of JSON decoding, which creates a lot of resource objects that seem to get cleaned up during that time. But that's just an assumption based on flame charts and traces. So we should wait more time if we do step-by-step tests.
D
It was at full CPU load trying to do whatever; I didn't have profiling then, and it only came back down after I deleted all the VMs. I think it got stuck on something, I don't know.
A
So this garbage collection, to me it just, I don't know: does it sound to anyone like that's abnormal behavior? Like, we shouldn't be spending so much CPU and memory to do this garbage collection. I don't know what Kubernetes has, just to compare, but I mean, it seems like if you're working with JSON serialization or deserialization, that seems like something that's been optimized significantly.
D
Yeah, so in Kubernetes we had a few issues where we built something new that did a lot of decoding, and garbage collection was a huge issue at some point, and we had to rework that decoding to use different ways of solving the problem. But I think to some extent it's okay for us. We should, or could, still look at how much we decode, and whether our caches need to be the way they are.
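For context on what GC pressure from decoding looks like in practice, here is a minimal sketch (not KubeVirt code; the payload and loop are illustrative) of measuring how much garbage a decode-heavy loop generates:

```go
// Minimal sketch: quantify GC pressure caused by repeated JSON decoding.
// The payload and iteration count are illustrative, not KubeVirt code.
package main

import (
	"encoding/json"
	"fmt"
	"runtime"
)

func main() {
	payload := []byte(`{"kind":"VirtualMachineInstance","metadata":{"name":"vmi-0"}}`)

	var before runtime.MemStats
	runtime.ReadMemStats(&before)

	// Decode into a fresh map each iteration, mimicking cache refills
	// that allocate new objects instead of reusing them.
	for i := 0; i < 100000; i++ {
		var obj map[string]interface{}
		if err := json.Unmarshal(payload, &obj); err != nil {
			panic(err)
		}
	}

	var after runtime.MemStats
	runtime.ReadMemStats(&after)

	fmt.Printf("GC cycles during decode loop: %d\n", after.NumGC-before.NumGC)
	fmt.Printf("bytes allocated: %d\n", after.TotalAlloc-before.TotalAlloc)
	fmt.Printf("fraction of CPU spent in GC since start: %f\n", after.GCCPUFraction)
}
```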
A
Okay, well, we can talk about it more, Kevin, because I think you have a section here, when you bring up what your work is. Okay, so kind of the takeaway here is: we're using a little bit more CPU than we requested.

A
We need to further understand exactly what this means, so we can decide what we want to do next. Okay, okay, all right! Does that sound good, Marcelo? We'll go to the next one, then!
A
Okay, next is a bug fix enhancement. So this is, oh, this is you, Kevin. Okay, you're next, all right. We can just roll over into the discussion, then.
D
Yeah, I think the document is only the stuff I found; I took to using it as a notebook. I think everything there should be taken care of now. That was just the goroutines, but I shared a few of those snapshots showing, in general, what's going on in Slack, I think, yeah.
A
It was the block that we were doing here, in this routine, and it was just leaking? Yeah, okay, yeah! Okay! Is there anything more you want to say about this?
D
Just about the resource usage, maybe, a little, or the CPU load I saw after deletion, and the numbers we see on virt-handler. I think I only created 100 VMs, and then 300. We have to see how high we can get, but in general, we should look into whether we can optimize it; still, the numbers we reach on CPU load and also memory load seem pretty okay.

D
If you think about it, this thing is managing, let's say, 300 VMs per node; I think that's still a fair load, to some extent. I mean, the people actually running it would have to say that, but it doesn't look like we're doing something completely wrong, only small parts that we could optimize.
D
Let me pull up my snapshots, but I think we were still in the 0.0 or 0.1 areas of CPU load, and I think we can live with that, yeah. Even the high load afterwards is like 0.02 CPU load, and never more than, I don't know, 250 megabytes of RAM, which still sounds fair, as long as we don't see problems with stuff taking too long.
A
Okay, how should I classify this, then? So, just after deletion we have a little bit more load; we have some cleanup to do.

D
Yeah, the garbage collector gets load from deleting all the VM objects from its cache, I think.
B
If the load is too high, using too many CPUs for too long a time, then it might be an issue. But if it's just a little bit, and for five minutes...
D
I would say that, yeah. So obviously, a reason I would see to fix that would be if we see virt-handler not doing things fast enough because the garbage collector takes too many cycles, or it's taking away CPU from more important stuff, which I think it shouldn't. Like with the phase transition stuff you're investigating: if we see that virt-handler is severely impacted at a certain scale, that might be why. I'm pretty sure we have burst problems; that could cause it. But without any problems it's causing...
D
And in the real world, it's also only a problem if you have, like, it could only be a problem if your use case requires you to have a high turnover: you create a lot of VMs and you delete a lot of VMs, and you will have that load of cleaning up behind you. But if you just run a lot of VMs, it won't be one.
A
All right, there we go. Okay, I think that covers it. So yeah, we're basically seeing this at, I mean, usually 300, is this 300 tests? And we see a non-zero level. I mean, is this 200, or is this 100? 200, 200. It's just a blip above zero, and then it's fairly steady all the way in. I mean, 100 or less is probably what the majority use case is going to be here.
A
Good, okay. Let's go to the next one. Got you, Kevin: are you all set with this next one?
A
Okay, and the goroutines: so, in the latest, after your two patches merged, are we seeing no more leaks? I think after that second one you did another test, right? And you shouldn't see many leaks.
B
Yeah, it's still continuing. So actually, I think the fix, I think Ramon also mentioned that it's related to migration, and I don't remember the other thing. But, oh, you don't see any more leaks, yeah. This is after the...
D
Sometimes the virt-handler goes up on its own, and what I was looking at was that it's generally using a lot of goroutines, because, I don't know, we do a lot of stuff in the background: we watch a lot of stuff, the node labeller runs with 10 threads, stuff like that. But that's fine. I mean, we're still building programs that run threads; some are to be expected.
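As a rough illustration of the kind of check that distinguishes expected background goroutines from leaks, here is a minimal sketch; startWorkload is a hypothetical stand-in, not KubeVirt code:

```go
// Minimal sketch: detect goroutine leaks across an operation in a test.
// startWorkload is a hypothetical stand-in for the code under test.
package main

import (
	"fmt"
	"runtime"
	"time"
)

func startWorkload(done chan struct{}) {
	go func() {
		<-done // a well-behaved goroutine exits when signalled
	}()
}

func main() {
	before := runtime.NumGoroutine()

	done := make(chan struct{})
	startWorkload(done)
	close(done) // tear the workload down

	// Give exiting goroutines a moment to unwind before counting again.
	time.Sleep(100 * time.Millisecond)

	after := runtime.NumGoroutine()
	if after > before {
		fmt.Printf("possible leak: %d goroutines before, %d after\n", before, after)
	} else {
		fmt.Println("no goroutine growth observed")
	}
}
```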
A
Okay, great, all right, we'll go to the next one. So: a new metric to monitor request counts by resource and operation.
E
Yeah, so what I did here was add a new metric that's parsing URLs from the clients for all our control plane components. So when we do any sort of client operation, this hook intercepts it, parses the URL, and figures out what the resource was, the resource being a pod, a virtual machine instance, whatever, and then the operation: not the HTTP verb, but the actual Kubernetes operation, meaning a list, a watch, a get, a patch, or a put, or rather an update instead of a put, things like that.
E
So what we have now is that we can say, across our entire control plane: let's figure out how many gets we're doing on VirtualMachineInstance objects, or how many updates, for example, we're doing on VirtualMachineInstance objects.
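A minimal sketch of this kind of client hook, to make the mechanism concrete; the metric name, the transport type, and the path parsing below are illustrative assumptions, not KubeVirt's actual implementation:

```go
// Minimal sketch of a client-side hook that counts requests by resource and
// Kubernetes operation. All names here are illustrative.
package metrics

import (
	"net/http"
	"strings"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical metric; it would normally be registered with a registry.
var requestCounter = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "rest_client_requests_by_operation_total",
		Help: "Client requests by resource and Kubernetes operation.",
	},
	[]string{"resource", "operation"},
)

type countingTransport struct{ next http.RoundTripper }

func (t *countingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	res, op := parse(req)
	requestCounter.WithLabelValues(res, op).Inc()
	return t.next.RoundTrip(req)
}

// parse maps a request onto (resource, Kubernetes operation). It handles
// paths like /apis/kubevirt.io/v1/namespaces/default/virtualmachineinstances/vmi-0.
func parse(req *http.Request) (string, string) {
	parts := strings.Split(strings.Trim(req.URL.Path, "/"), "/")
	if len(parts) >= 2 && parts[0] == "api" { // core group: /api/v1/...
		parts = parts[2:]
	} else if len(parts) >= 3 && parts[0] == "apis" { // named API group
		parts = parts[3:]
	}
	if len(parts) >= 2 && parts[0] == "namespaces" { // namespace scoping
		parts = parts[2:]
	}
	resource, named := "unknown", false
	if len(parts) > 0 {
		resource = parts[0]
		named = len(parts) > 1 // a trailing segment names one object
	}

	switch req.Method {
	case http.MethodPost:
		return resource, "create"
	case http.MethodPut:
		return resource, "update"
	case http.MethodPatch:
		return resource, "patch"
	case http.MethodDelete:
		return resource, "delete"
	case http.MethodGet:
		// LIST, WATCH and GET are all HTTP GETs; the URL disambiguates.
		if req.URL.Query().Get("watch") == "true" {
			return resource, "watch"
		}
		if named {
			return resource, "get"
		}
		return resource, "list"
	}
	return resource, "unknown"
}
```

A transport like this would be wrapped around the rest client's configuration so that every outgoing request passes through it.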
E
That gives us an idea, for our density tests, of how many writes we're doing for these objects, and we can figure out exactly which resources we're writing to the most, and things like that. And we can create thresholds to say: hey, we expect, in this density test, to call update and patch on virtual machine instances X number of times, and if we go over that, then we failed our threshold. So that's kind of the idea. I integrated this metric into the perf audit tool, and we can create thresholds and all that good stuff for it.
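The threshold idea reduces to comparing observed counts against expected maximums; a minimal sketch follows, with types and names that are illustrative rather than the perf audit tool's real API:

```go
// Minimal sketch of a threshold check over request counts, in the spirit of
// what the perf audit tool does. Types and numbers here are illustrative.
package audit

import "fmt"

// ResultKey identifies one counter, e.g. {"virtualmachineinstances", "patch"}.
type ResultKey struct {
	Resource  string
	Operation string
}

// CheckThresholds compares observed counts from a test run against expected
// maximums and returns an error for every exceeded threshold.
func CheckThresholds(observed, maximums map[ResultKey]int) []error {
	var errs []error
	for key, max := range maximums {
		if got := observed[key]; got > max {
			errs = append(errs, fmt.Errorf(
				"%s/%s: %d calls exceeds threshold %d",
				key.Resource, key.Operation, got, max))
		}
	}
	return errs
}
```

A density test would then fail if, say, the update count on virtualmachineinstances exceeded the budget recorded for a known-good run.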
E
And we've had mistakes occur here, where some subtle code path kind of causes an update storm or something like that with our VMIs, and that puts quite a bit more load on the API server, and it also impacts our time from creation to running. So we'd probably see lots of things occur when issues arise: when we have an update storm, we'll see that the time to get to running is increasing, but then we'll also see that certain thresholds around API calls will probably get hit as well.
B
Yeah, so it's pretty good; I would say this is nice. And I just commented, I made some few comments. It's not that the comments, well, maybe they're important, but it's just something to discuss. So the first thing is "operation".
B
So, I would say, the other metrics related to that, I don't know if it's super related, but in the same section, they actually call it "verbs", and they also have this list and watch. And I don't know, maybe we should also keep "verbs", to match the nomenclature that's already being used. And I don't know if it has "update", but I think it's still used, maybe as "put", in the HTTP nomenclature, yeah.
E
So "verb" there is referring to the HTTP, I guess, as opposed to "operation" or whatever; it's referring to the HTTP spec itself. So we're going to get puts, patches, gets, deletes, creates, things like that. Or, I'm sorry, not delete and create: you do get delete, but instead of create you have put, and things like that. So the reason this shows "operation" is, we can pick a different term or whatever, but I didn't want to confuse what we're getting here, since we're not getting the HTTP verb.

E
So when we look at the rest client request latency seconds metric, I think that one has a verb in it as well, and it's reporting the HTTP verb, not the Kubernetes verb.
E
Are we okay with these terms meaning different things for what look like similar client-type behavior or monitoring?
A
So this is the verb, so, the Kubernetes verb they were talking about: like, when you create something in Kubernetes, we have a create event in Kubernetes. Is this like that request being caught?

A
We call it a create; are we calling it like a post or something? Is that the confusion here? So you get a create and not a post, right?
E
I wouldn't call it an event, because an event is something specific in Kubernetes.
A
Okay, the verb will be all those creations. I was just saying, all the verbs, like create, list, get, are verbs here. That's all I wanted to say, but yeah.
B
Yeah, my last comment is just that, right, it's not very clear to me, the whole difference about this. I know that it's getting different information, but especially because you mentioned that the rest client request metric is for HTTP, and your metric is getting something else. So what is the something else? Which other protocol are you getting here that is not HTTP? It might be nice to describe the difference, just to be clear about the metric.
E
Yeah, we're getting the resource and then the Kubernetes verb; that's the difference. The rest client request latency seconds metric is just getting the HTTP method and then a kind of normalized URL, which kind of has the resource in it. But you can't do things like ask how many lists I did on that resource, or how many watches, and things like that, because those are all gets.
E
So a list, a watch, and a get, as far as the Kubernetes verbs go, are all the HTTP method GET, all three of those.

E
They get more information; they mean different things, right? Yep. Also, the latency seconds metric is not reporting watches, because a watch is a long-standing, long-poll HTTP request.
E
So we don't have any visibility into, for example, whether we were seeing lots of watches occur during our stress test, which would mean that informers or something are failing a lot and we're getting a lot of errors.
D
I'm a bit disappointed with the Kubernetes client that it doesn't give us that information on the request somehow, through context or so; instead, you have to do regex. I expected more from them, but I think I like what the metric is now.
B
However, maybe, you know, with this new metric we don't even need to care about this other one, isn't it? If you collect the latency right now, you are only counting...
E
...for that. Because I don't have a lot of experience with what's too much collection, and whether we need to be tighter or more restrictive about creating new metrics, or even re-evaluating the metrics that we have today to make sure that they are all valuable. I think there are some things, specifically around what we collect about every individual VMI, that might be pretty intensive as well, probably the most intensive, but I don't know. Something like: what should we be tracking?

E
Is there a way, if we wanted to understand this better, to measure the Prometheus load and whether we're bumping up into any sort of limits? And what would the limits even be? Are we talking about bandwidth limits, or are we talking about just the strain on the database of keeping all these time series? Or where would you...
B
...start to fall apart. I think all of this, yeah. Maybe it's not just the bandwidth, but, you know, when Prometheus is scraping the service, the service needs to bring up all these metrics, and if it's too much, maybe it's using too much memory, or too much CPU, to compute that. So this might be a problem in the service itself, and also for Prometheus.
A
Yeah, I mean, where do you think we might hit this issue? Because even with what we're doing now, at least the stuff that I'm aware of, I don't think we're really causing too much usage for Prometheus, I mean.
A
Like, you might run into trouble if you do things per VMI: say, for example, we were to track every single VMI and gather data from every single VMI, or something like that. You can run into some trouble there, at scale. Well, and I guess it depends, in some ways, on which ones, for example. Do we gather information on an individual VMI basis?
E
There's a lot of information that we expose in virt-handler that aggregates metrics for every VMI on the local node, and that's how it's exposed. I'm curious!
D
So far, the only load issues with Prometheus I saw were really storage-related: if you create too many labels, at some point you can't store a week of history, only three days, and at some point Prometheus just takes a while scraping and runs into timeouts, if the metrics endpoint gets to be too big a file. But that really only happens if you, I mean...
A
Like, ultimately, you can disable them; you don't have to scrape all of them, and we can always, if it's something we're doing too much of, it's not like we're totally going to be cornering anyone. But I mean, even the ones that we've worked on here, I don't think are bad; especially, like, the transition times that David did: the number of tags that we export there is not many.
D
Yeah, but as a rule of thumb, I think you can say: if you create a metric and the amount of labels and distinct label values is a fixed number, you're fine. And I think with this we are there; it's not growing, the amount of labels and label variations isn't growing with the amount of objects we have.
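That rule of thumb is easy to see side by side; a minimal sketch follows, with made-up metric names (the real KubeVirt metrics differ):

```go
// Minimal sketch of the cardinality rule of thumb. Metric names are made up.
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Bounded: "phase" only ever takes a handful of values, so the number of
// time series is fixed no matter how many VMIs exist.
var phaseTransitions = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "vmi_phase_transitions_total",
		Help: "Count of VMI phase transitions, by target phase.",
	},
	[]string{"phase"}, // e.g. Pending, Scheduling, Running, Succeeded
)

// Unbounded: keying a label on the object name creates one time series per
// VMI, so storage and scrape cost grow with the number of objects. This is
// the pattern to avoid, or to make optional, at large scale.
var perObjectMemory = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "vmi_memory_working_set_bytes",
		Help: "Working set per VMI; cardinality grows with object count.",
	},
	[]string{"namespace", "name"},
)
```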
A
Yeah, that's exactly when you run into trouble: when the number of tags you have depends on the number of objects you create, and then it can become overwhelming. So if that's not happening, it's fine. And then, again, people can always, you know, disable them, if it's something that's just very granular.
E
I was saying that we do something, so, "bad", I don't know if bad is the right word: we're doing some things that have potentially performance implications. I don't think this thing does, because the labels are pretty set; we're not going to get a lot of new ones. It's not an infinite number that gets created, but...
D
Yeah, with the dashboards and the Prometheus that we have in our test environments, it should also provide us metrics about Prometheus itself, and we can also have a look at those and see how much we kill Prometheus with some changes, and measure our impact that way as well.
A
Yeah, so this is something we can, I guess, kind of take away from this: it's something we keep in mind when we generate metrics, or if we're reviewing things like newly created metrics. If we have things where, I guess, the number of labels or tags scales with the number of objects created, we just need to be aware of that. And maybe, you know, it might be okay to have it in some cases.
B
I think maybe it might be a good idea to come up with a plan to analyze that, you see, every time that we do a scale test. I don't know how, so we need to think about that. Yeah, we could just verify, you know, whether we are getting too bad on that or not. And, you know, people can still keep introducing metrics, and if we have a way to evaluate that, we can raise it, you know.
E
Yeah, so what you're talking about, Marcelo, is essentially monitoring our monitoring, which, I think, makes a lot of sense. We're monitoring the load that our monitoring puts on the cluster. It's all load, no matter where it's coming from.
A
This might be a good one. I think, you know, if at some point we come up with some guidance in terms of how to run KubeVirt at scale, this would be a good one: say, okay, we have these metrics here, and, you know, maybe they're important, maybe there's a legitimate reason for them at smaller scale, but at a larger scale, like, if you're talking a thousand-plus nodes, you probably want to disable these, because they can affect your scale, or something.
B
Yeah, one thing I just came up with: Prometheus has this targets page, and it at least says how long it's taking to scrape, you know, the metrics, and we can maybe have a look at that. So if a target is taking too long to scrape metrics, that's something that can become a problem for latency. And also, we can check, like, how much the Prometheus database is increasing, you know, during the test, so...
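Both of those signals are exposed as ordinary metrics, so the check can be scripted; a minimal sketch, where the address and the two example queries are assumptions about a typical setup:

```go
// Minimal sketch: query Prometheus's own meta-metrics after a test run to
// watch the scrape cost. The address and queries are illustrative.
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		panic(err)
	}
	promAPI := v1.NewAPI(client)
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	for _, q := range []string{
		// How long each target takes to scrape; rising values suggest the
		// metrics endpoints are getting expensive to serve.
		"scrape_duration_seconds",
		// TSDB growth is visible through Prometheus's own metrics.
		"prometheus_tsdb_head_series",
	} {
		result, warnings, err := promAPI.Query(ctx, q, time.Now())
		if err != nil {
			panic(err)
		}
		if len(warnings) > 0 {
			fmt.Println("warnings:", warnings)
		}
		fmt.Printf("%s => %v\n", q, result)
	}
}
```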
E
And ultimately, I guess, what we care about here is the response time when we query Prometheus, is it not? So we want to know, isn't that where it would fall apart? I'm making stuff up, but my expectation would be that if I made a query to Prometheus to get some metrics, and it had to do a lot of really intensive calculations across the database, that would show up in the request latency of giving me back my results.
A
...a problem. So, I think, like, I have two notes here; I think this kind of captures it. First of all, let's just keep this in mind, because it's true: a lot of the essential pieces of what we're doing, of what we're measuring, go through Prometheus right now. So we need to be very conscious of what we're doing; that's true. We need to monitor our load.
A
So, whenever we see a situation where we're adding metrics and the number of tags or labels scales with the number of objects created, we just need to be aware of it. But I think the real way we kind of communicate this is when we talk about having a general guide on how to scale with KubeVirt.
A
I think that's where we capture this, as a best practice or something people need to watch out for. And that's, yeah, I mean, I think that's at least the best thing we can do, just to...
E
Like myself, yeah, it looks great. Oops, yeah.
B
Yeah, so it's, well, it's regarding the framework that we have been discussing for a while. David created the, you know, the profiler, no, not the profiler, but the report generator: the tool that collects metrics and generates a report. And now what's proposed here generates the load for the different tests. So we have the density test, but the idea is to add more tests later.
B
For example, you know, a stress test that has constant load and ramp-up, and keeps creating and deleting through the lifecycle of the VM, so creating and deleting in the system. And actually, those are the tests where we can see the system under stress, you know, and see how much pressure it can support. Anyway, so, and then, I think I saw that Kevin and you guys made some comments, so I will go through those and see.
E
I made a comment about this, and I don't think it's something that needs to be done immediately, but in the future, when we think about expanding this tool, and I haven't looked at this in great detail either, the one thing that I noticed is that it looks like it's got an internal way of structuring the VMIs, and the things that we have control over are primarily the image and things like that, which comes in as an argument to this. So, in the future...
E
...maybe we should look at a templating mechanism: something as simple as taking an existing VMI and knowing how to use that as the base for our load, and just creating lots and lots of VMIs, with maybe different names, from the same thing. Because we are probably going to want to begin load testing in different ways, like using different types of storage, or different types of CPU and memory, and maybe even topologies with that, like dedicated versus non-dedicated.
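The templating idea can stay very small; a minimal sketch, where the helper, the file path, and the counts are illustrative rather than the load generator's actual code:

```go
// Minimal sketch of the templating idea: take one VMI manifest as a base and
// stamp out N uniquely named copies. Helper names and the file path are
// illustrative, not the load generator's actual code.
package main

import (
	"fmt"
	"os"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"sigs.k8s.io/yaml"
)

// renderCopies decodes a base manifest and returns count copies, each with a
// unique name, leaving everything else (storage, CPU, topology) untouched.
func renderCopies(manifest []byte, count int) ([]*unstructured.Unstructured, error) {
	base := &unstructured.Unstructured{}
	if err := yaml.Unmarshal(manifest, &base.Object); err != nil {
		return nil, err
	}
	out := make([]*unstructured.Unstructured, 0, count)
	for i := 0; i < count; i++ {
		obj := base.DeepCopy()
		obj.SetName(fmt.Sprintf("%s-%d", base.GetName(), i))
		out = append(out, obj)
	}
	return out, nil
}

func main() {
	manifest, err := os.ReadFile("vmi-template.yaml") // hypothetical path
	if err != nil {
		panic(err)
	}
	objs, err := renderCopies(manifest, 300)
	if err != nil {
		panic(err)
	}
	// Each object would then be created through a dynamic client; printing
	// names stands in for that here.
	for _, o := range objs {
		fmt.Println(o.GetKind(), o.GetName())
	}
}
```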
B
Right, this is a good idea. So, like, I tried to start it as simple as possible, because, you know, with less experience, a big PR is hard to move forward with. However, yeah, I think this is a good idea. Maybe I can already change it, you know; you mean a template that has, like, the VMI YAML.
D
My first thought, and I commented that: right now it's VMIs, but don't we maybe also want to test VMs, to test the controller part, to load test that part as well? And I'm honestly surprised; I looked for a bit, and I was so sure there is something already providing that for Kubernetes, like you provide a folder of YAML templates and it does exactly what this is doing, just not specifically for VMIs. It just creates some X times and deletes some X times and does it over and over.
E
Kube-burner does that, but it does a lot more as well, and it doesn't have some of the tight integration with VMIs and VMs that we might want in the future. I mean, I've had decent results using kube-burner just with a VM, or, excuse me, a VMI template, and then creating a bunch of them and deleting a bunch of them. But when we look at creating lots of VMs specifically, we'll probably want to begin doing actions on those VMs.
E
Like, query a bunch of VM objects, start them all, then restart them all: things that are VM-specific would be difficult. And then we're going to probably look at migrations at some point in the future as well, being part of the density test. So I think, as much as I don't like writing our own code unless we really have to, I don't dislike the idea of creating our own tool to generate this, it being specific.
B
Yeah, I actually, like, just, you know, I was thinking to do that in the beginning, but then, yeah, I think it's good. And also, maybe I can do something, as you guys also mentioned:

B
Instead of calling them VMIs, have the YAML with the template and actually call it an object, and it can be whatever the template is, and then it's just: create me as many objects as we want.
E
Sure, yeah. I think the input config is the thing that's going to serve us great in the future, because I can see this getting really complex if we try to make a repeatable CLI command out of it. As far as the template thing goes, that's something we can follow up on. I want you to make progress on this and be able to get it in fairly quickly, so whatever you think is a minimum that's usable. Mm-hmm.
E
Then you just have a repeatable config that you run through here. And I want us to remain flexible with this stuff too: like, if we find that the structure, like, I don't want us to treat this as a versioned API immediately, yeah.
E
Be flexible, and if we figure out that we want to restructure things in the future, not try to figure out how to make things backwards-compatible or whatever. These are just tools to help us, so there's something.
A
Okay, all right, thanks, Marcelo. Yeah, oh, the other thing I brought up: eventually, like you have in here, we can eventually get to extending it to this stuff as well, which would be cool. Because I was kind of, like you have in here, we could also get to more config, and that's what Kevin was saying: like, if we could drop in a test file and configure something, we can do other types, and then you'll get all sorts of different tests and different results.
A
Yeah, this was just to summarize; I was just looking at this, and I need to wait for him, but well. The only thing, what I was talking about with this: one of the tests is, like, it's creating VMIs and measuring their performance, and then increasing the QPS, well, getting a baseline, increasing the QPS, and then doing it again and measuring the difference.
A
Yeah, and my take on this was that this is just a little complex, because there's just a lot that can happen when we're trying to measure performance here, in this test. And I'm kind of hoping that we do it entirely outside, in the tool, so that we can just, kind of, yeah, like, just so that we don't... My fear is, like, we don't...
A
...we should do it very deliberately, instead of kind of just in this one functional test. I think it doesn't always necessarily get the best results, just doing it like this, because what we're after is making sure that we're not getting rate limited, not necessarily the performance, because the QPS could change in the future, and all of a sudden this just breaks on us.
E
This specific test is meant to target whether the client's rate limiter configuration gets picked up. That's really all it's doing. We just want some sort of indication that, when we set values, some performance noticeably changes. It's all about the rate limiter configuration being propagated; that's all we care about.
E
The thing that Roman's doing here that gives me confidence that this will remain at least somewhat viable is that he's using percentages. So he's running some sort of scenario with one configuration for the limiter, and then he's making a pretty drastic change to that configuration, posting it, and running the same scenario, and just measuring the percentage of change, and he expects a certain percentage of change.
B
The test is flaky, so that's why we started to talk a little bit about that, and I made some suggestions about it; so, if you can, you're welcome to look. The last comment is actually related to what, you know, Ryan is saying: so, instead, we just check, like, how much slower it gets. For example, what's flaky here is that, actually, Roman's rule is, like, it should be five times slower, but actually it was only three times slower, you know, something like that. I don't remember exactly what it was, but something like that.
B
So then, instead of making, like, a true relative check, you know, which is maybe hard to verify, we could maybe count how many times the requests got throttled. You know, because, what I mean is, in the rest client it has this throttle latency logging, and when it hits those things, it writes it in the log. So maybe, just, you know, when we make the configuration stricter...
B
...we might see more of these things in the log, and then we can see that the requests are getting throttled, you know. And if we increase the throughput, we expect to actually maybe not see any of these things in the log. And then, you know, it's just a way to count whether it's doing better or worse, and we don't need to play with relative, you know, performance.

B
You know, like five or three times slower, things like that, that make it very tricky; the test gets flaky because of that.
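A minimal sketch of that counting approach; the log message substring is what recent client-go versions emit when the client-side rate limiter delays a request, but the exact text, and the log file name, should be treated as assumptions to verify against the client-go version in use:

```go
// Minimal sketch of the log-counting idea: scan component logs for client-go
// throttling messages instead of asserting on relative timings.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func countThrottledRequests(path string) (int, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()

	count := 0
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Assumed client-go message; pin it to the vendored version.
		if strings.Contains(scanner.Text(), "Throttling request took") {
			count++
		}
	}
	return count, scanner.Err()
}

func main() {
	// Hypothetical log file captured from virt-controller during the test.
	n, err := countThrottledRequests("virt-controller.log")
	if err != nil {
		panic(err)
	}
	fmt.Printf("requests throttled by the client rate limiter: %d\n", n)
	// A test could then assert: a stricter QPS config makes n rise well
	// above the baseline, and a relaxed config keeps n at or near zero.
}
```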
A
Okay, well, that's something we can just go through with Roman when he's back. Okay, we are almost out of time here, so we can cover the last two, probably, hopefully, pretty quickly. The next one is the performance thresholds. So I wrote this originally, and then I just wired it up to the audit tool Dave wrote. Basically, the only thing I want to say about this is that, so, I took your density test...
A
I kind of split it up a little bit, into a few different things; well, a few additions. One of them is that we take a look at the Prometheus data: we reach out to Prometheus.
A
We run the audit tool after we run the test. And then I took your test, and I took some of your common functions and split them out, so that things that are for the framework are just common functions in here, and then things that are for the VMI we can do in here. And kind of like you were saying earlier: when we can generate VMIs from a template, that would be cool here.
A
Actually, I was thinking that would make these tests even easier to write. But we're going to go back to your density test. So once I have it hooked up, where's your density test, right here: yeah, I mean, this will just run the same, and then we just get the information at the end, by just running the perf audit tool, and that's it.
A
So I'm still testing this; I'm just fighting with the cluster-up now, to make sure everything is looking correct, but this is just the work-in-progress PR that I have right now.
A
So I have, yeah, right before each test I get a time, and at the end I get the time again.

E
The period, it does help.
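Those two timestamps are what make the later audit query precise; a minimal sketch of a range query bounded by them, where the address, the query, and the step are illustrative:

```go
// Minimal sketch of why the start/end timestamps matter: they bound the
// Prometheus range query the audit step runs after the test. Address, query
// and step are illustrative.
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	start := time.Now() // recorded right before the test body
	runDensityTest()    // hypothetical stand-in for the actual test
	end := time.Now()   // recorded right after it finishes

	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		panic(err)
	}
	promAPI := v1.NewAPI(client)

	// Only the samples inside the test window are audited.
	result, _, err := promAPI.QueryRange(context.Background(),
		`sum(rate(rest_client_requests_total[1m]))`,
		v1.Range{Start: start, End: end, Step: 15 * time.Second})
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}

func runDensityTest() { time.Sleep(2 * time.Second) }
```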
A
Yeah, so, yeah, then that'll get us our metrics and make them available, and then we can see. There's, like, so much I could see we could do with this. Like, I mean, I could see us, if we could have it take a Grafana snapshot or something, and then have it available, that'd be so cool too. But yeah, I mean, this will get us something that wires together, so that we can very quickly create more of these.
E
Marcelo, in the future, do you see your performance tool, the strategy...
A
It says, what, yes, what you're saying is, like: we'd basically replace this with the tool, like, we'd basically make a call out, we'd wire this up to the tool or something. Or would we not? Like, I mean, is it that our density test would be triggered by something, like, you know, in here, like we do, or would we run it...?
A
Well, I guess, I mean, I guess it doesn't really matter; we could talk about it when we get there. But I guess, what we want is for this eventually to be run in CI. So your load, I mean, this is basically your load test, right, and this is our gather-information step, our audit. So yeah, I mean, this would basically be just kind of the way it is now.
A
We would just call out to the tool right here, replace this part entirely with your tool, and then, yeah, and then we do our audit at the end. So it looks something like that. Okay, okay, plenty of...
A
On these, on these PRs, then, I might just wait for yours to go in, and then I can decide what to do: probably just pull this out, or maybe I'll just wire it right up to what you do.