From YouTube: SIG - Performance and scale 2021-06-24
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.pbmjpoqv1fc8
A
Okay, all right. Everyone, welcome to SIG Scale. Please send your name to the attendee list. The link to the document is in chat. So today we're going to talk about some test framework ideas.
A
So we've had some progress on a few things. I have this little tracking area right here, where I just added the things we're working on as issues, so we can have an idea. We had one issue, or one PR, that was merged. David also did some good work on adding the VMI phase transition times, which merged, so that's awesome.
A
So now we can start looking at consuming this data, and even look at different ways that we can pull it into CI and start measuring.
A
So that's what I wanted to talk about today, and we can talk about this from two perspectives: performance and scaling. I figured we'd start with performance. So there's already some work that's begun around this. I think Marcelo's here; Marcelo wrote a PR looking to build a job around performance testing, so we can measure against each PR. There's a bunch of things that are in this PR, and so, as I told them...
A
I
went
and
created
a
an
issue
around
this
topic
for
performance
test
framework
that
covers
a
bunch
of
things,
but
so
before
we
get
into
that,
though,
I
kind
of
want
to
back
up
even
a
little
bit
more
and
talk
about
this
idea
in
general
and
get
some
thoughts
and
then
and
then
I
can
always
update
that
issue
afterwards.
So
take
some
notes,
so
at
a
high
level
like
the
goal
we
want
to
accomplish,
is
we
want
to
have
in
every
pr
that's
run.
A
We
want
to
have
a
way
to
measure
with
their
if
we're
above
some
sort
of
performance
threshold
with
this
pr
or
if
we
increase
performance
or
decrease
performance
whatever
it
is.
We
want
to
be
able
to
measure
it,
and
we
want
to
have
this
some
tool
to
measure
it.
We
want
to
run
it
in
every
server
pr
and
we
also
want
to
have
developers
running
against
their
local
codes.
You
want
to
consumable
by
those
two
different
personas.
A
So
with
that
in
mind,
you
know
what
what
are
people's
thoughts
on
like
how
we
can
do
this.
I
just
wrote
a
few
points
here,
but
I
think
this
can
be
expanded
quite
a
bit.
What
are
what
do
folks
think.
B
Well,
I
think
this
takes
a
very
well
the
overview
of
the
pr
data
and
then
the
document
that
I
sent
last
time
in
the
amazon
list
and
the
idea
is
exactly
that,
so
to
be
able
to
track
the
performance
regression
or
improvement
for
prs
and
also
for
release
the
I
also
the
idea
is
to
have
like
three
types
of
size
of
test:
some,
what
I'm
saying
small
scale
that
we
can
run
for
each
pr
it's
for
for
hpr.
B
You
know
daily
what's
happening
also
for
the
prs
and
but
with
a
larger
set
of
tests,
and
then
a
large
scale
test
that
I
actually
I
don't
know
if
we
can
run
so
big
right
now,
but
I
would
say,
like
maybe
you
know
well,
ideally,
is
to
have
like
something
very
big
and
then,
if
we
run
for
each
release,
I
don't
know
what's
what's
big
in
the
biggest
thing
that
we
can
reach
right
now.
B
But
1000
looks
good
now,
but
we
can
maybe
go
even
further
if
it's
needed
and
then
this
large-scale
test.
The
idea
is
to
run
for
each
release
before
you
release
or
for
each
release,
something
like
that
and
and
then
we
can
track
that
I've
been
discussing
with
the
red
hat
folks
about
that
for
a
while,
we
are
getting
access
to
some
resource
to
run
it
for
the
upstream.
B
So
I
actually
got
access
to
some
small
that
we
can
start
to
include
for
the
prs
now
performance
tests.
I
also
want
want
to
make
you
know
there
are
a
couple
of
things
here
for
performance
and
the
way
that
I'm
going
to
configure
the
this
the
infrastructure.
B
It's run inside of a VM, and then it creates VMs inside, yeah. So we don't want to use the KubeVirt CI in that way; we want to have the Kubernetes cluster running directly on the bare metal node.
C
Yeah, yeah, yeah. I would expect that, for an initial phase, we should really not boot VMs fully.
A
So what about... well, if we don't boot the VM, that means that when we reach Running on the VM, the transition time... that means that's when we've handed off to the handler, but it doesn't necessarily mean we have to boot; we just kind of stop at some point. It means that the QEMU process... well, what does it mean when it...
C
...reaches Running? Yeah, it means that we got the report from libvirt that QEMU started the booting process.
E
So that means we're not measuring the guest; we're just measuring the control plane. Maybe said differently: initially we're just measuring the control plane's ability to scale.
A
Yeah,
okay!
Well
so
we
could
say
I
so
I
think
so
that's
important,
and
I
think
I
the
way
I
view
this
is
kind
of
like
stressing
the
control.
Point
of
view
is
more
like
the
scale
side
of
this,
which
I
do
want
to
talk
about,
but
like
for
pure
performance
testing,
though
we
do,
we
want
to
boot
the
vms.
Would
that
make
sense
that
sounds
like
we
would.
B
We
have
some-
I
had
actually
some
discussion
that
with
ramon.
Well,
I
think
both
scenarios
are
interesting.
So
right
now
we
decide
to
go
to
not
boot
the
vms
and
try
to
put
as
much
as
possible
pressure
to
the
control
plane.
Booting
the
vms
can
bring
some
benefit
like
it.
We
can
make
sure
that,
for
example,
the
network,
it's
operational
everything
it's
working,
everything
you
know
gets
exactly
allocated
to
the
vm.
That
should
be,
if
we
don't
boot
that
with
some
some
of
these
things,
we
cannot
check.
B
But
it's
fine
for
now,
because
what
I'm
saying
the
plan
now
is
to
put
pressure
as
much
as
possible
to
the
contour
plane
and
just
to
answer
that
in
the
pr
that
I
implemented
was
running
means
the
vm
got
the
state
running,
and
it
just
mean
that
they,
you
know
start
they
start
to
command,
was
sent
to
the
cooper
at
the
library
and
but
doesn't
mean
that
the
vm
will
actually
put
so.
Okay.
E
What if we use one of those kernel images, like just a really, really bare-bones one? Then it wouldn't necessarily have to crash, and booting would be practically non-existent. I mean, it would boot, but it wouldn't really do anything. It would just exist.
D
We wouldn't get traffic from things like guest agent updates, and all the updates to the status as the OS comes up and later halts. Yeah, but...
F
You said the word crash earlier, so I might have missed...
A
So that sounds like a really lightweight way we could just get a lot of VMs going. Okay, so then, the way I'll characterize this, and correct me if you disagree: I say we start with VMs that won't boot. There are other areas we can look at here if we do want to expand this, because this does limit the scope; like you said, we likely can't do any stuff with the running VM.
A
If
we
wanted
to
do
like
attached
devices,
we
wanted
to
touch
a
network.
We
wanted
to
do
like
time
to
ip
address
or
something
like
that.
We
can't
do
that
and
now
that
could
be
something
we
could
consider
doing
measuring
in
the
future
if
we
wanted
to
so
that
could
be
an
extension.
But
maybe
this
is
like
the
first
step
that
we
can
look
at
most
achievable
goal
in
front
of
us,
so
we
so
we
could
do.
C
Yeah,
I
would
say
it's
the
least
controversial
in
general,
because
you
don't
have
to
care
about
the
operating
system
at
all
and
some
considerations
there
which,
like,
for
instance,
what
happens
with
your
data
if
you
up
after
at
some
point,
update
serious
if
you're,
using
it
or
fedora
version
or
whatever,
so
it's
a
very
reliable
source
to
not
boot
the
van.
That's
what
I
would
say.
Everything
else
depends
a
lot
on
a
lot
of
other
factors.
A
Okay,
okay,
so
it
sounds
like
with
this
idea,
so
we
could
reach
a
lot
of
a
lot
of
vms.
Is
there
like?
I
guess
we
wouldn't
really
know
what
the
limit
is,
so
it
will
eventually
just
reach
something
or
we'll
be
able
to
like.
Oh
actually,
let
me
phrase
this
differently.
So
like
these
vms
they're,
not
bootable,
we're
gonna,
do
it
in
essenvert
and
what's
like
I
don't
know
what
would
be
like
roughly
the
number
of
the
ends
like
we
could
do
like.
B
Well,
so,
if
we
give
like,
we
can
do
some
math,
I
would
say
like
how
many
how
many
resources
we
can
allocate
if
we
get
we
are
going
to
get
like
one
0.1
from
one
cpu.
Then
we
can
have
like
you
know,
10
vms
per
cpu,
something
like
that.
B
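To make that arithmetic concrete: at a 0.1 CPU (100m) request per VMI, a node with 8 allocatable CPUs holds on the order of 80 VMIs, so hundreds are plausible on a small bare-metal cluster. A minimal sketch of such a non-booting VMI follows; the name, memory value, and no-disk trick are illustrative assumptions, not the manifest from Marcelo's PR:

```yaml
# Hypothetical "guestless" VMI for density testing: with no disks attached,
# QEMU starts but nothing boots, yet the VMI still reaches the Running phase
# once virt-handler hands off to libvirt (per the discussion above).
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: density-vmi-001
spec:
  domain:
    devices: {}          # no disks: nothing for the guest to boot
    resources:
      requests:
        cpu: 100m        # the 0.1 CPU from the math above, ~10 VMIs per CPU
        memory: 32Mi     # illustrative; QEMU overhead sets the real floor
```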
A
All right, I'll just say there's another question then. Okay, that sounds pretty reasonable; it sounds like something we can start with. Okay, so we don't boot these VMs. The things we want to do: we want to capture the metrics per PR, performance-centered. So with David's change we have all the phases; we should capture all of them, and then we have some sort of report, and we need to...
C
This is something which we will just find out. I would expect that we start with running the tests three times, with 20 VMs, 50 VMs, and 100 VMs, on the same KubeVirt configuration in the same cluster. Initially it's just a matter of comparing them and seeing; then you can start defining what baseline we reach with these bulks, and where the startup time is moving. This is the base data, and once we have it and can visualize it or compare it, we can start defining baselines.
A
Yeah, okay, that makes sense to me. Okay, and then, do we want to talk about... so this would be our first step. Do we want to talk about this other stuff, like defining types of tests? So we get a baseline metric, and I think we answer this question too: we figure out how many VMs. So I'm going to note this down here.
A
Okay, so the first step will be... I like the idea of: let's do some information gathering here. Let's figure out the answer to this; let's attempt the test, let's define standards, and let's figure out how many VMs we can reasonably do this with.
C
On the numbers above, I don't know what we can run right now; Marcelo, maybe you know how many machines you got, or how much you can get, from that perspective. But I think it's even very reasonable to not start with too big numbers, and we would still get reasonable input: really, twenty, fifty, a hundred, maybe two hundred. It's not like...
B
Okay,
yeah,
I
know
yeah,
I'm
just
saying
for
the
amount
of
vm
that
I'm
thinking
it's
to
start
with
100.
You
know
it's
for
the
the
small
scale,
pr
that
we
are
we
are
thinking
about
and,
of
course
we
can
push
to
see
how
much
we
can
get
with
the
cluster
that
we
we
got,
but
it
shouldn't
be
too
big.
Now
I
mean.
A
Yeah, okay. So, Marcelo, I'm going to assign this to you, since you've already been looking at this. This will be: we want to try to answer these two questions. We want to know the performance baseline and how many VMs we can get to. Okay, great. So then that'll be what we can start with. Then there's the other thing, the other aspect of this; let's take some time.
A
We can even talk about different ways that we can expand this, maybe some different types of tests. Like, I wrote this issue after I saw Marcelo's PR, and I tried to define a few things in here: what are the different types of ways we can generate load, the different types of tests we can do. You talked about density, you know, burst testing; that's one.
A
We
do
stress,
testing
soap,
spike
testing.
These
are
just
some
of
the
ones
I've
read
about.
Does
that
make
sense
to
people
like?
What's
like?
What
do
you
think?
Is
there
could
there
be
more
here?
What
do
folks
think
of
these.
B
Yeah,
so
I
don't
know
if
it's
covered
here,
but
one
of
the
tests
that
actually
so
and
the
plan
that
I
sent
before
I
was
thinking
about
three
different
kind
of
tests,
one
it's.
You
know
this
of
a
shock
task
that
we
create
a
bunch
of
vms
that
it's
the
density
test
or
we
can.
We
can
do
like
different
ways-
stress,
as
I
mentioned
here
and
another
one.
B
It
should
be
like
you
know,
to
measure
the
steady
state
of
the
constant
load
that
we
generate
so
especially
to
measure
you
know
the
scheduling
we
we
can
configure,
for
example,
10
vms
per
second,
and
we
should
define
a
maximum.
You
know
population
in
the
in
the
cluster
and
delete
the
vms
and
keep
the
load
constant
for
for
a
you,
know,
constant
period
and
see
how
the
system,
maybe
is
the
stress
test
here
that
it
wrote
in
it
so
how
the
system
keeps
with
the
constant
load.
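To pin down the parameters of that steady-state test, here is one way the profile could be written out; every key name and value below is hypothetical, just restating what was said:

```yaml
# Hypothetical steady-state load profile: create VMIs at a fixed rate, cap
# the total population, and delete old ones so the churn stays constant.
steadyStateTest:
  creationRate: 10         # VMIs created per second, as mentioned above
  maxPopulation: 1000      # ceiling at which deletions keep the count level
  holdDuration: 30m        # how long the constant load is sustained
  deleteToHoldConstant: true
```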
B
It
might
break
the
system
so
and
especially
if
the
depression
is
very
high
and
another
thing
is,
I
don't
know
if
we
want
to
cover
that,
but
it's
I'm
just
thinking
about
this
test.
Kubernetes
is
also
doing
them.
So
that's
why
I'm
trying
to
include
here
also
just
three
different
tasks
that
I
mentioned
the
other
one.
It's
like
chaos
test,
so
we
just
suddenly
remove
an
old
and
things
should
come
back.
You
know
and
we
need
to
measure.
How
long
does
it
take?
Should
the
system
recover.
A
Yeah,
do
we
well,
I
don't
know
if
we
want
to
do
that
here,
though,
like
I
understand
the
need
for
that
definitely
like
we
could
kill
some
of
the
control
plane
or
something
and
see
what
happens,
but
I
don't
know
I
mean
doing
that,
while
we're
measuring
performance,
I
don't
know
if
it's
going
to
get
us
a
lot
of
data
that,
like
that,
like
we
could
get
a
lot
of
variation
in
the
data
just
based
on
things.
Just
all
the
things
happening
in
the
cluster.
A
Yeah,
so
the
well,
the
other
thing,
so
I
I
was
just
thinking
of
this,
so
I
I
have
another
thing
here:
we
have,
we
have
generate
types
of
load.
This
is
another
question.
How
are
we
going
to
generate
load?
I
I
just
lumped
it
in
here
as
part
of
this
tool,
but
it's
an
open
question.
I
know
david
you've
talked
about
this
like
previously
like
what
do
people
think
like?
How
can
we
generate
load?
Is
that
that's
another
thing
we
need
to
figure
out.
I
think.
E
Everything's
on
the
table
there,
I
think
that
the
ci
or
the
functional
tests
that
we're
generating
load
that
marcelo
is
already
starting
on
that.
That
makes
sense.
It's
not
very
configurable
necessarily,
but
I
think,
as
a
standard
way
of
just
repeating
the
same
test
over
and
over
again
yeah.
That
is
probably
fine.
There's
some
other
tools.
I
looked
at
like
cube
burner,
which
it
needs
a
little
like.
E
I
need
to
submit
a
little
patch
just
to
make
it
wait
on
virtual
machines
to
until
they're
ready,
but
it
allows
you
to
create
a
repeatable
config,
so
you
would
define
a
config
with
some
vm
templates
kind
of
and
it
would
start
having
ever
made
iterations
of
that
exact
same
virtual
machine.
E
You
want
wait
until
they
all
come
online
then
go
to
the
next
iteration
start,
like
you
know,
100
more
and
like
you
can
represent
things
like
that,
and
even
like
the
deletion
of
them
afterwards,
so
you
can
in
one
config
to
define
how
you
want
to
kind
of
stepwise,
add
load.
So
we
could
use
a
tool
like
that
and
I'm
sure
there's
others
as
well.
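As a reference for what such a config looks like, here is a rough kube-burner-style job; the exact schema should be checked against the kube-burner documentation, the template path is an assumption, and note the patch mentioned above was still needed for it to wait on VirtualMachineInstances rather than pods:

```yaml
# Sketch of a kube-burner job: each iteration stamps out 100 copies of the
# same VMI template and waits before the next iteration adds 100 more,
# giving the stepwise load described above.
jobs:
  - name: vmi-density
    jobIterations: 5            # five steps of 100 VMIs each
    qps: 20                     # API request rate while creating objects
    burst: 20
    namespace: vmi-density
    waitWhenFinished: true      # wait for readiness before the next step
    objects:
      - objectTemplate: templates/vmi.yaml   # assumed path to a VMI template
        replicas: 100
```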
A
I guess we can maybe leave this to some investigation. So, okay: what would be the easiest to get started? Marcelo, it seems like you already have something, so maybe we can just start with that and see how it goes; and then, as we maybe start to look at expanding it because we have other use cases, we can look at kube-burner or other tools if we need something more configurable.
B
Yeah,
so
just
just
to
give
like
very,
very
high
level
background
why
I
start
to
write
it
in
as
a
functional
test,
because,
first
of
all,
we
want
to
add
you
know
in
the
convert
ci
the
the
jobs
running
so
that
the
load
generator
should
be
well.
B
It's
not
not
necessary
should
be
in
cooper
repository,
but
that's
why
we
I
discussed
with
roman
and
he
suggested
to
to
keep
there
and
it
was
I.
I
thought
it
was
more
natural
to
add
it
as
the
functional
test.
As
everything
was
in
this
folder,
you
know
convert
test
and,
and
also
it
can
be-
you
know,
use
it
like
that
and
I've.
I
I
received
some
comments
in
the
pr
I
think
was
david.
I
don't
remember
now
so
yeah
that.
B
You, yeah. So he suggested to actually separate the part where I'm collecting the metrics and make it like a framework; so, as Ryan suggested also, you know, to have this performance framework and actually create, like, a tool. I saw kube-burner is doing something very similar, isn't it? But the tool can be something more or less...
B
...like what kube-burner is doing for collecting the results, and then anything can generate the load, like we were discussing: the functional tests, or any scripts, or even kube-burner just generating the load. But I think what would help, to keep on track now, is to have, you know, these functional tests to create the VMs, and then we can just verify some thresholds in the test, and the test will actually fail. It's this kind of structure that helps, you know, to have the per-PR performance tests in the KubeVirt CI.
B
So that's why it's good also to have it as a functional test. Well...
A
Is there a way to get the numbers without... because I do like the... to me, like I said in the comment, one of the goals is we want to have a framework, and we want it to be usable in CI or by users. That's why I suggested we could discuss that specifically in its own PR, and then have... so that we can kind of separate the idea of generating load out, because it can be anything, like we've said here. But is there a way, like, you can...
A
We
could
take
what
you
have
and
we
can
generate
that
base
like
answer
these
questions
like
without
without
merging
it
like,
so
we
can
just
get
the
data,
and
then
we
can
look
at
the
the
different
components
of
it.
Yeah
comment
on
them.
B
We
can
do
that
right,
so
without
merging,
so
I
actually
the
you
know
the
functional
test
that
I
created.
I
don't
know
if
it's
the
best
practice
that,
but
anyway,
I'm
actually
reading
a
configuration
file
and
in
the
configuration
file,
it's
possible
to
define
the
number
of
vms
and
the
functional
test
will
actually
be
dynamically
configured.
So,
and
you
know
it's
not
hard
to
call
that
that
the
the
test
configuration
itself
is
not
hardcoded.
It's
in
the
configuration
of
ml,
so.
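The configuration file itself wasn't shown in the meeting, but conceptually it has a shape like the following; all field names here are invented for illustration and will differ from the actual file in the PR:

```yaml
# Hypothetical density-test configuration: the functional test reads this
# and sizes itself dynamically instead of hardcoding the VMI count.
density:
  vmiCount: 100           # number of VMIs to create in the burst
  cpuRequest: 100m
  memoryRequest: 32Mi
  timeoutSeconds: 600     # how long to wait for all VMIs to reach Running
```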
C
Yeah, mostly about... I mean, I think we kind of agreed now on one first basic kind of test, which is really just putting up a very, very small VM, as small as possible, creating them in bulk, and collecting the metrics for it. And I guess just a simple end-to-end test would be sufficient for that, without any framework or anything; we collect the Prometheus metrics. We can...
C
...in the CI environment, so that's where the data would be collected from.
B
So the PR is doing that right now... I don't know, yeah.
E
It's just that it's big, that's all. I mean, what are we talking about as far as the data collection? Are we talking about manually collecting it ourselves, or are we talking about the data collection that is currently in the PR? I just want to make sure what it is...
E
...we're talking about. So right now the PR is creating a report of some sort, or that's what we're talking about; I'm not sure how much has been implemented there. Are we talking about just the density tests, where we would be looking at the results ourselves, so not having the functional test actually prepare a report for us? Is that accurate, or are you wanting the functional test to prepare a report as well, Roman?
E
Well, I guess I would be in favor of getting a density test in, initially a really small density test, just the bare minimum that we need to begin to start thinking about this stuff, immediately, and let that be the thing that kind of starts the ball moving here; and then for us to work on metrics collecting and things like that, and generating a report, independently of that.
B
Okay,
I
think
I
got
it,
I
can,
you
know,
simplify
and
remove
the
configuration
and
the
report
part
from
the
pr,
and
it
will
be.
B
Another thing is the resource usage, for the memory and CPU. For example, something that I'm analyzing: I give the VM, the VMI, a 0.1 CPU request, okay, unlimited, and then I get the resource usage of the pod, which of course has many containers running inside, and in my very simple test it was using double the CPU that was allocated for the VMI. So something else, virt-handler or maybe something else that's there in the pod, is consuming...
B
You
know
some
overhead
of
cpu
and
this
kind
of
things
that
I
thought
was
important.
That's
why
I'm
collecting
this
and
and
then
do
I
used
to
collect
that
to
test.
Or
do
you
guys
think
that
this
pr
should
have
only
the
latency
and
not
the
resource.
E
I
think
that
the
pr
should
just
have
the
density
test
and
not
the
anything
with
metrics
collecting
yeah.
I
agree
to
that
and
I
think
that
when
we
look
at
the
metrics
collecting
part
like
if
we
look
at
external
tool,
or
whatever
like
I
mentioned,
maybe
in
the
tools
directory
we'd
start
with
just
a
single
something
really
really
simple
like
just
the
ver.
E
Maybe
we
just
start
with
transition
times
to
begin
with
and
create
a
report
that
just
shows
that
and
then
we
keep
expanding
that
and
introduce
memory
and
cpu,
perhaps
or
other
things
we're
interested.
I'm
just
saying,
let's
start
small
and
add
on
as
we
start
getting
more
data
and.
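A report that starts with transition times only could be as small as this sketch; the phase names are the real VMI phases (Pending, Scheduling, Scheduled, Running), while the key names and numbers are made up for illustration:

```yaml
# Hypothetical first-cut report: per-phase transition latencies only, with
# memory/CPU and API-call counts to be added later as discussed.
report:
  vmiCount: 100
  transitionSeconds:
    pendingToScheduling:   { p50: 0.4, p95: 1.2 }
    schedulingToScheduled: { p50: 2.8, p95: 6.5 }
    scheduledToRunning:    { p50: 3.9, p95: 9.1 }
```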
D
Right,
I'd
add
to
that
number
of
api
calls
made
the
vmi
or
something
like
that.
A
Yeah
yeah,
I
agree,
there's
like
a
bunch
of
them,
so
yeah
I
covered
this
gavin
in,
like
the
like.
I
have
I've
listed
three
here
and
like
like.
One
of
the
comments
like
I
mentioned,
is
that
basically
everything
that
in
this
pr
could
be
a
threshold
test,
every
single
thing
here
so
yeah
like,
but
but
like
like
we're,
saying
like
these,
are
we'll
get
there
so
I'd
like
to.
I
like
the
idea
with
starting.
B
So, I think, for the KubeVirt CI, we don't need to have, like, too many threshold metrics, just some key ones.
B
I
I
don't
know
if
you
guys
saw
I
I
started
some
time
ago
to
do
like
just
a
document,
for
you
know
defining
this.
What
should
be
our
slos
for
the
convert,
so
you
know,
for
example,
vm
creation
time
it's
one
of
the
metrics
that
we
should
pay
attention
and
I
don't
think
we
should
have
dozen
of
them.
It's
too
much
things
you
know
and
get
complicated
to
to
analyze
and
see,
even
though
some
of
them
represent
the
same
thing.
So.
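Following that point about keeping only a few key thresholds, the CI check could consume a short list along these lines; the metric names and limits are placeholders, not values from Marcelo's SLO document:

```yaml
# Hypothetical threshold file: the functional test fails the PR when a
# measured value exceeds its limit.
thresholds:
  - metric: vmiCreationToRunningSeconds   # the "VM creation time" SLO
    p95LimitSeconds: 45
  - metric: vmiDeletionSeconds
    p95LimitSeconds: 30
```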
A
Yeah
marcelo,
we
have
so
I'd,
say
marcel.
These
are
the
three
things
that
that
have
for
action
items
that
you
can
take
here
so
like
we
do
that.
So
we
look
for
the
baseline,
we're
going
to
figure
out
how
many
vms
we
can
handle
in
ci,
and
then
we
take
your
pr.
We
break
it
down
to
a
simple
density
test
and
then
we
we
then
come
back
and
we
look
at
separating
this
out
into
a
framework
which
could
then
include
all
of
these
things.
So
that'll
be
part
of
the
discussion.
A
Talk
about
it
when
we
get
there,
I
think,
like
the,
I
think,
when
I
think
that's
something
that'll
probably
be
good.
Just
got
a
good
discussion
topic,
maybe
for
next
meeting.
If
we
after
we
get
this
data,
I
think
that's
something.
Maybe
after
like
two
weeks
go
by,
we
get.
We
got
that
information,
and
then
we
start
seeing
okay,
here's
how
we
can
expand
on
this.
Then
I
think
it'll
be
a
little
clearer
what
we
should
do,
so
we
can
take
that
for
a
next
step.
For
next
time,.
E
We
have
a
directory
for
just
kind
of
one-off
tools,
so
we
can
start
if
it's
just
for
convenience.
We
are
going
to
build
system
and
everything
in
cube.
We
can
start
in
the
key
vert
repo
and
just
as
the
path
of
least
resistance,
and
if
we
want
to
separate
that
out
of
the
keyboard
repo
later
on,
I
don't.
I
don't
see
that
being
a
problem.
A
Okay, I think this sounds pretty good. I think we have a plan to go with this performance test framework and the density tests and get things kicked off, so that makes sense to me. Okay, so I'll do some of the tracking here; I can just kind of use issues and tag things. Oh, but this was another thing: Marcelo, I saw you labeled the sig-scale. Do we...?
G
It would, yeah. I think it's a Prow feature, but I'm not sure; I only know that... the only question is when there's...
A
How do I do it? Like, /sig scale?
A
Okay, all right, something we can look at next time. Okay, great! So then, our second point here is scale testing. Kind of looking to answer questions like: how can we get to thousands of VMs to stress the control plane? To test in every PR... maybe not every PR, but something we can set as our goal.
A
Daily,
whatever,
if
it's
something
for
release
whatever
it
is,
but
we
want,
we
want
a
ton
we
want.
We
want,
I
think,
north
of
a
thousand
to
really
cause
stress
so
kind
of
open
floor
on
this
like
like
so
you
already
mentioned
vms,
that
don't
boot.
That
sounds
like
that.
Could
we
don't
know,
but
it
sounds
like
it
could
get
us
somewhere
that
that's
to
me
sounds
like
an
option.
This
is
there's
a
there's.
A
Another
idea
that
I
thought
of,
because
I
mentioned
q
mark
originally
when
we
started
this
sig
scale,
and
I
talked
to
david
about
this
a
while
ago
and
along
a
very
similar
line.
This
was
something
I
had
in
mind
that
we
could
do.
I
wrote
a
very
like
one
pagey
kind
of
design
document
around
this.
That's
linked
there
and
it's
pretty
simple.
The
idea.
A
All
this
is
is
or
the
concept
behind
this
is
that,
because
every
resource
in
kubernetes
is
a
or
can
be
considered
an
api
extension,
what
we
could
do
is
we
could
look
at
essentially
faking
our
components
so
that
what
we
could
do
is
we
can
lie
about
the
idea
of
a
vm.
We
basically
remove
the
the
compute
aspect
out
of
it,
so
we
have
no
pods.
A
We
have
no
vms,
we
just
pass
around
gamble
everywhere
and
we
just
mock
the
whole
thing
and
so
that
we
have
no
run
time.
We
don't
launch
any
pods.
We
could
do
this
with
a
real
api
server
that
controller
runtime
offers
and
we
just
take
tons
of
ammo.
We
just
throw
it
at
the
api
and
the
controller
and
just
see
what
happens-
and
this
is
one
idea
I
thought
could
be
like
a
low,
a
very
like
something:
that's
achievable
that
doesn't
require
a
lot
of
resources
to
do.
C
I
guess
you
can
remove
quite
some
load
by
implementing
invert
handle
or
something
like
a
more
client,
a
grpc
client
which
sits
on
the
underside
of
her
tender,
which
does
basically
nothing
except
reporting.
The
game
running
at.
E
That's interesting. So it's also possible that we could create a code path, or a way, where we would ignore the pod creation and just immediately hand off to virt-handler, and then pretend like we're doing all the commands. So we can use a real cluster and just tell the KubeVirt control plane to fake the pod creation: don't actually do it, pretend like it happened, and also, I guess, fake the gRPC calls.
A
Yeah, the idea would be that, when I look at what's happening at this level right here in the middle, we're basically moving YAML around, right? And so the idea is that it doesn't matter, you know, if these things actually exist; as long as there's YAML, these things are stressed out. And so, if we can create as much YAML as possible... you know, what's the way we create as much YAML as possible?
A
Yeah, I mean, I agree, we could run into so many issues, for instance... I mean, lots of different things. One thing I was thinking of...
A
Is
that
that
recent
issue
that
we
had
and
then
I
pushed
the
mainland
with
the
list,
calls
from
vert
handler
like
if
we
had
if
we
had
if
we
were
fake,
like
we
had
thousands
of
word
handlers
and
we
had
all
that
yaml
floating
around,
so
we
had
prometheus
enabled
we
would
have.
We
would
have
seen
that
that
massive
spike
in
latency
would
have
exploded
in
our
faces,
and
so
that's
like
where
we
want
we'll
see
that
stuff,
but
we
won't
get
like
you
know
like.
A
If
we're
not
running
these
resources,
we're
not
actually
running,
we
don't
have
the
compute
resources
running.
You
know
we
don't
always
see
like
okay.
What's
the
interaction
with
when
you
have
one
handler
and
there's
thousands
and
thousands
of
launchers
or
something
or
I
don't
know
what.
However,
many
you
can
fit
in
a
node
and
things
like
that
and
how
it
holds
up
with
the
the
larger
cluster
so
yeah
we
we
can
get
something
but
yeah
we
can't.
I
agree,
we
can't.
We
can't
get
everything.
E
And
one
of
those
list
regressions
that
we
had
was
actually
caused
by
it,
wasn't
caused
by
prometheus.
It
was
in
the
prometheus
scraping
of
our
components,
so
we'd
have
to
like
that's
something
that
wouldn't
necessarily
get
represented
in
caper,
either
prometheus
hitting
our
scrape
in
points
over
and
over.
Maybe
we
can
make
it
do
that,
but.
A
Yeah,
I
know
I
I'm
using
that
example,
because
it
happened
to
involve
handler
and,
like
I
said,
if
you
enable,
if
we
had
enabled
it
or
something,
then
we
would
have
seen
it
but
like
that
idea
of
like
of
doing
lots
of
of
lists
or
something
could
be
hidden
in
somewhere
in
here
like
say,
we're
doing
a
lot
of
different
api
calls
and
we
just
haven't
reached
the
scale,
because
we
don't
have
the
physical
capacity
to
do
it.
A
We
could
find
some
things,
so
I
guess
like
the
way
to
characterize
is:
if
we,
if
we
were
to
say
like
create
5000
fake
vms,
it
doesn't
necessarily
mean
we
can
guarantee
a
scale
of
5
000.,
but
we
have
at
least
some
confidence
that
the
yaml
will
hold
up
in
our
components.
We'll
have.
We
won't
have
massive
amounts
of
latency.
In
some
cases
we
will
have
found
some
paths
where
it'll
be
functional.
B
It relies on the libvirt store, like, at runtime. So then, I think, instead of faking, you know, the controller and virt-handler and virt-launcher, maybe, if we need to go in that direction, okay, if you want, just fake libvirt itself creating the VM, and not exclude, you know, some key components that it relies on.
A
It sounds like the no-boot option is the easiest one to start with, so we should start with that; and if we hit some limits, this is something we can consider as just a way that we can get tons of YAML and see how our control plane handles it. So that is an additional option, something we can keep in mind.
D
Yeah,
if
I
think
about
our
setup,
I
think
that
that
setup
is
quite
likely
to
show
up
pain
points
in
things
like
the
mutators
we
have,
and
so
on,
some
of
which
look
potentially
single
threaded
in
places
and
so
on,
and
I
think
it'll
be
valuable
for
that.
You
know
even
beyond
the
core
components,
all
the
additional
stuff
where
we're
heading
and
there's
likely
to
be
added
in
a
production
environment.
I
think
we'll
hire
problems
on
those
quite
quickly.
A
Yeah,
the
other
thing
I
was
thinking
like
we
don't
like.
I
wonder
how
many
like
fan
did
that
work.
They
posted
on
my
list.
We
did
like
the
reconcile
changes
like
you
wonder,
like
you
know
how
many
different
requests
we're
making,
how
many
api
requests
are
making
kubernetes,
and
that
would
be
interesting
to
me
to
see
like
when
we
really
explode
the
amount
of
launchers
and
our
handlers
and
when
we
have
tons
of
vmis
what
what
ends
up
happening.
A
You
know
like
what
like,
how
costly
are
these
requests,
so
things
like
that,
like
we
can
get,
we
get
some
numbers
roughly
like
that.
That
stuff
would
still
hold
like
the
number
of
api
requests
like
we
could
find
that
this
way,
there's
something
we
could
know
that
that's
a
problem,
so
there's
some
there's
a
bunch
of
things
that
we
can.
I
think
in
here
that
we
can
learn
but
yeah.
So
we'll
start
with.
Let's
take
a
circle
back,
though
I
think
I
think
it
makes
sense.
We
start
with
this.
B
I have one comment. So in Red Hat we have a meeting, you know, bi-weekly; it's also, like, a meeting for scaling and performance, but it's more for OpenShift anyway. So I normally join this meeting, and I think some guys here also do that.
B
I
don't
know
if
it's
possible,
so
it's
bi-weekly.
So
if
it's
maybe
yeah
and
normally
it's
happens-
that
you
know
the
same
day
as
this
meeting
is
happening.
I
don't
know
if
it's
possible.
If,
for
you
guys
for
this
meeting
here
because
the
other
meeting,
we
cannot
change
it's
a
lot
of
of
people
that
doesn't
want
to
change.
B
But
if
it's
possible
here
just
change
the
you
know
which
which
meeting
at
which
week
this
meeting
it's
happening.
B
A
Well, what do people... I mean, we've had some growing, we've had a lot of things, and more things are picking up. Right now we're twice monthly. I don't know, do people think there'd be more or less or the same value if we were to have this weekly? We could do that, and then we could get...
A
What I'm saying is that a possible solution would be that we could have it... well, maybe I took that the wrong way, but if we were to go to weekly on Thursday, what I'm saying is, it wouldn't conflict every other week, right? Yeah, so that's what I was saying; I figured I'd just throw that out there.
A
If,
if
people
find
this
meeting
valuable
and
we've
been
having
a
lot
of
content
in
it,
we
could
look
to
go
to
weekly
and
then
you
know,
folks
that
have
that
conflict.
You
know
you
can
take
the
internal
meaning
and
then
join
on
the
other
aspect.
We
can
have
it
weekly
as
long
as
if
people
find
it
valuable,
then
I,
if
it's
you
know
if
that's
worth
the
time
to
have
a
weekly,
and
I
think
it
makes
sense
too,
but
I
don't
know
what
do
people
think
is
this?
A
Do
you
think
we
have
enough
content
with
the
things
I
mean?
We
seem
to
be
picking
a
lot
of
things
up,
so
maybe
we
could
go
to
weekly.
A
Okay, all right, I guess then... so why don't we go to weekly, then? And then it kind of makes it easier anyway for scheduling; no one has to remember whether it's a week and a half or whatever to the next one. So we'll just do weekly on Thursday, and yeah, I think it fits better; then it'll fit the schedule so you don't have that conflict, okay?
A
All
right
I'll
do
the
I'll
do
I'll,
handle
logistics
with
with
folks
and
we'll
get
that
sorted,
so
we'll
do
weekly.
So
the
next
meeting
will
be
next
thursday.
Okay.