From YouTube: SIG - Performance and scale 2021-07-08
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.8taxjc2uv4bg
A
Okay, all right, welcome everybody to SIG Scale. I'll mention this just like I did last week in case anyone wasn't here: we moved to weekly meetings, so the cadence now, instead of bi-monthly, is every Thursday at this time.
A
Okay, so let's kick things off with the agenda. Marcelo, this is your PR. Do you want to give an update on how things are going with that?
B
Yeah, so we had, I would say, a large discussion about that. I simplified the PR, mostly, though of course it's not super simple, as Roman mentioned. In the PR I have now removed all the parts that were collecting metrics from Prometheus, reporting on them, and verifying them. The PR also had some configurations coming from YAML, for example, so that the test would be configurable instead of hardcoded.
B
I removed that as well, so the test is kind of hardcoded now and there are fewer things in it. I also updated the part you mentioned, which was actually a very good recommendation: I was checking the status of the VMs for updates and for deletion by doing GET operations, for example when running a simple test creating 500 VMs.
B
So it's now watching the VMs, both for changes and for deletion. For the changes I'm just getting the timestamps from the VM phase and computing the running time of the VM. For deletion it still needs a bit more: deleting an object in Kubernetes doesn't mean it's gone, right? So we need to check that the VM actually disappears from the cluster. That's why there are two calls, with a separate one to check when it's deleted.
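A minimal sketch of the watch-based timing collection Marcelo describes above, assuming a watch.Interface over VirtualMachineInstance objects obtained from the KubeVirt client; the import path and helper names are illustrative, not the PR's actual code:

```go
package vmitiming

import (
	"time"

	"k8s.io/apimachinery/pkg/watch"
	kubevirtv1 "kubevirt.io/api/core/v1" // import path varies by KubeVirt version
)

// vmiTimings holds the timestamps of interest for one VMI.
type vmiTimings struct {
	created time.Time // object creation timestamp
	running time.Time // first time the Running phase was observed
	deleted time.Time // time the Deleted watch event was observed
}

// collectTimings consumes VMI watch events until the channel closes,
// recording when each VMI reaches Running and when it is actually gone.
func collectTimings(w watch.Interface) map[string]*vmiTimings {
	timings := map[string]*vmiTimings{}
	for event := range w.ResultChan() {
		vmi, ok := event.Object.(*kubevirtv1.VirtualMachineInstance)
		if !ok {
			continue
		}
		t := timings[vmi.Name]
		if t == nil {
			t = &vmiTimings{created: vmi.CreationTimestamp.Time}
			timings[vmi.Name] = t
		}
		switch event.Type {
		case watch.Added, watch.Modified:
			// The transition to Running gives the startup latency.
			if vmi.Status.Phase == kubevirtv1.Running && t.running.IsZero() {
				t.running = time.Now()
			}
		case watch.Deleted:
			// Only the Deleted event says the object is really gone,
			// not just that deletion was requested.
			t.deleted = time.Now()
		}
	}
	return timings
}
```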
B
That's pretty much it. Regarding the latency that I'm testing: Roman also mentioned he was expecting, at the beginning, a test that was just creating VMs and not testing the performance or the latencies for now. Well, I already have it implemented, I would say, and I think it's good to have the test.
B
You know, it makes sure that if some PR or something happens, the verification will fail. By the way, here's how I'm measuring the latency. What I'm doing is I run the test; actually, I run it 15 times.
B
It would be better to run more, but I run it 15 times, I take the highest time it takes for the VMs to get created, and I define the threshold at about 1.5 times that, you know, 50% higher than the observed latency, just so we have some range and the system is defined to be okay.
B
I didn't define it very tight around the latency that I'm collecting from the system, just to avoid introducing a test that maybe starts failing many times, and to have a comfort zone for now. Then we can adjust that later.
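A minimal sketch of the thresholding just described (the 15 baseline runs and the 1.5x factor come from the discussion; everything else is illustrative): take the worst startup latency observed across the baseline runs and allow 50% headroom before the CI test fails.

```go
package threshold

import "time"

// startupThreshold derives the failure threshold from baseline runs.
func startupThreshold(baselineRuns []time.Duration) time.Duration {
	var worst time.Duration
	for _, d := range baselineRuns {
		if d > worst {
			worst = d
		}
	}
	// 1.5x the worst observed latency, so normal variance doesn't flake CI.
	return worst + worst/2
}
```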
A
So you actually have something that fails; you have a threshold right now, is what you're saying?
A
Okay, well, I guess it's worth talking about, because I'm wondering: I have it here as "baseline", and we talked about thresholds and other stuff. Do we want to do this here, or not?
B
Right, the proposal of this test was to actually have thresholds and fail, and this was part of the first implementation I did. And I think this is valuable anyway, because if we don't check it, after months of that no one will care about it anymore, you know what I mean?
A
Is it in a comment, or do I have to go look?
B
It will not be verified if we just check individual VMs' latency; you need to also check the batch.
A
And this is time to running, so this is when we first see the VMI running?
B
Okay, so I get the timestamp of the VMI object being created and of the Running phase.
B
And those tests will actually be running in the same environment that I'm running in, so that's why I trust these latencies for now.
A
Roman, I saw you had a comment here. Do you want to talk about this? What do you think?
C
Well, yeah, as I said in the discussion, I was really hoping to just keep any metrics collection completely out right now. I mean, we collect all kinds of stuff with Prometheus and we would catch it in our dashboards, and I just wanted to have a few very basic scenarios which just run and are not even concerned with collecting the metrics and integrating them right now, and then, in a second step, think about a framework.
A
Is it because of this, where I just was in density.go, because it has all this and it's configurable? Is that what your concern is?
C
In there are different start functions, like a constant start rate for VMs or a Poisson density function, whatever, and they can all make sense. It's just that I really wanted to have it simple, without any interpretation right now, just to see what's happening in CI, and then later on do all the other parts. That's my main thing.
C
And also there are tons of configuration options. I just expected one VM template, a VMI template which is the smallest one possible, where we know whether the VM will crash or not, but the key thing is that it stays in the Running state, and all the rest later. That was so we can easily see things and get the metrics collection going fast.
C
We will probably then see immediately, on the collected metrics, that we run into the queries-per-second limit of our client and stuff like that. That's what I wanted to have initially, and then think about all the other stuff: what would be our values for failing the test, what would be acceptable for us, how would we want to express it in code?
D
Yeah, maybe share your thoughts there. Right now I would just mirror Marcelo.
E
Okay, so what you're saying is we would have a density test that's just going to execute the creation of a bunch of VMIs, make sure they go to the Running state, and then, I guess, delete them. And then externally we'll have monitoring, some sort of report given to us by Prometheus or whatever, that would give us an indication of how this did.
B
Right, I understand that. I just think this baseline idea that you want to have doesn't really need to be in the CI/CD, because, for example, I already have access to the nodes that we're going to run on, and I'm actually doing that today.
B
I just mentioned to you again about the metrics, which I forgot aren't complete in Prometheus, but anyway, I'm going to report that information. So I actually want to do a large test; I was trying to do that today but didn't finish, like 100 or 500 VMs, something like what the virtualization guys did, and expose a Grafana dashboard. I'm creating a Grafana dashboard with the metrics that we've been discussing here.
B
In our meetings, you know, and then we can see that, even though it's not integrated into CI/CD yet, we can already see it. But with the tests we can make sure that, for the idea we were thinking about of this continuous evaluation of the control plane, we at least do the first step: we make sure that things will fail if someone introduces something very nasty that decreases the performance too much.
C
I'm talking about just having the test and starting to collect the metrics, and then for months this will be there. I was just expecting a set of PRs very fast afterward. I think that of course makes sense in general; that's another thing.
A
I don't want to cut you off, you were just trying to talk, but we might all be saying the same thing. I have this as stage one, because we talked about it; we've been talking for a few weeks now about having this initial CI job, and having something like that actually fail, having it measure something and have it fail. If we have that now, I'm almost like, okay, that's fine, I guess.
A
Maybe what we're saying is we just stop it at that: we look at it as something we can have experimentally, that we run in CI just for a period of time while we work on these, right? That's what we're saying. It may have some framework stuff, but maybe we just kind of come back to it later as part of these, and that may be okay.
A
In steps two and three we work on some of the framework stuff that's in here, or the stuff that's generally loaded because it already has to be, and we just kind of rework it. This is just kind of our initial script to get things kicked off; that's how I'm looking at it.
E
Yeah, I was just trying to sort that out as well. I want to see the decoupling, which I mentioned a couple of weeks ago, of the load generator and the report generation. That would allow us to do things like create a density test in our CI framework and use a common reporter tool or whatever to get the results, but it also gives us the power of using the same reporting tool with things outside of our CI, so we can create load tests that don't have to be generated from our functional tests and still get reports that are consistent. So I think that's my concern: we start momentum in one direction.
E
So if we merge this the way it is right now, then we've created a direction, and reversing a direction is a lot harder than continuing in the same direction.
B
Right, so I have some comments about that. Yeah, in the beginning the PR had a lot of things in it, including the report generation. It's still printing things to stdout; however, it's not generating a report anymore, I removed that part. So, let me organize my thoughts here: I would say that we have two main areas. One is to monitor the control plane.
B
You know, in the CI/CD system, and make sure that things don't get too bad. The other is to deep dive into the performance, to do a very detailed performance evaluation and deep dive on that. Actually, I think it was a good idea that you mentioned: maybe we extend kube-burner. I actually tried to look at the code and I don't think it would be too hard to extend, and kube-burner generates this nice report.
B
That's what I was mentioning, having this idea integrated into the CI/CD system. I don't know if you guys saw, but I sent the document with the plan for that before, and we have three types of jobs.
B
As I was saying: a small scale, up to 100 VMs, that runs for each PR; a medium scale that runs daily; and a large scale, where at Red Hat we have the possibility to access a large cluster, that I want to run before each release. And we can keep that.
A
Marcelo, that makes sense, that makes sense to me, sorry for interrupting. I think we're aligned on this, on the idea of it.
A
Having a suite of jobs. I think maybe where we're not aligned is exactly how we get there, because we have these two steps where we're talking about how we want a tool to generate load and how to generate a report, and that will take us to a CI job with all the things you mentioned. But I think what we need to figure out is this first step, having something there.
A
Can we break this down: what do we consider to be an acceptable thing for this PR? Because that's maybe where at least I'm struggling. We have a bunch of things in this PR, we have some thresholds, so what is it that we wanted to do? What would we consider to be a step forward?
B
Yeah, so as far as I understood what Roman mentioned, the concern is having the thresholds.
C
What I meant is what I see there: there are just a lot of things right now, like configuration options for the tests and for how you do the creation of the VMs, which I think you can drop for the scope of the initial test. It's really just one test right now, where you start a given number of VMs.
C
All I think is: let's just throw all this out right now. You can do a similar thing with, I don't know, 40 or 50 lines of code, and then, in a follow-up PR, think about how you want to report it, how to create thresholds and all that. In the meantime we just collect in the CI job, where we've prepared everything with Prometheus.
B
Right, well, I partially agree. I don't think we should drop the structure that defines the information from the test; I think that's a good idea to keep.
A
But yes, all right: 100 lines of Go, maybe called by a bash script, whatever, and all it does is create 100 VMs.
A
You don't need any configuration to set the threshold to fail; we have those hard-coded values. That's all, and that's it. We don't do any reporting, we just kind of gather, and we say pass or fail. That's it.
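A minimal sketch of that "100 lines of Go, hard-coded values, pass/fail" idea. The createVMI and waitRunning parameters are placeholders for KubeVirt client calls; only the shape of the check is shown, and the count and threshold are illustrative.

```go
package densitysketch

import (
	"fmt"
	"time"
)

const (
	vmiCount         = 100
	runningThreshold = 3 * time.Minute // hard-coded, deliberately generous
)

// RunDensityCheck creates vmiCount VMIs, waits for each to reach Running,
// and fails (returns an error) if any of them exceeds the threshold.
func RunDensityCheck(
	createVMI func(name string) error,
	waitRunning func(name string) (time.Duration, error),
) error {
	for i := 0; i < vmiCount; i++ {
		if err := createVMI(fmt.Sprintf("density-vmi-%d", i)); err != nil {
			return err
		}
	}
	for i := 0; i < vmiCount; i++ {
		name := fmt.Sprintf("density-vmi-%d", i)
		d, err := waitRunning(name)
		if err != nil {
			return err
		}
		if d > runningThreshold {
			return fmt.Errorf("%s took %v to reach Running (threshold %v)", name, d, runningThreshold)
		}
	}
	return nil
}
```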
A
I mean, does that make sense, Marcelo? We just want to simplify it as much as possible, so we're not compromising anything.
B
Right, so for the configurations: if you can point out, for example, the arrival rate, the things that you think are maybe still too fancy for the test right now, if you can point those out, I can definitely remove those parts from the test and then we can move forward from there, yeah.
A
Okay, we can comment on the PR then. I think that covers this one. That should get this kicked off, and then we need to do some design here on these; we've already heard some ideas about this, but we can do a sort of design.
A
I don't know if we're going to have time to talk about it today, but if anyone wants to take on writing about any of these things, how it'll look, goals, or anything like that, and you want to throw it in an issue, a Google doc, whatever, or if you want to just add a bunch of bullet points here, that's fine.
A
We can look at taking this on next week, maybe take the first one on and try to build a bunch of different ideas around it, but if anyone wants to take it on, feel free. Okay, we'll move on to the next point. I talked about this last time; it's something I said I was going to do with baselines. And Roman, you actually just did this patch; last time we talked about reconcile.
A
One of the things that we saw from our internal testing was that we're being rate limited, so Roman put together a patch to measure this so we can see it. I haven't got a chance to use it yet; I have to pull it and use it, and I'll come back with some baselines for you.
A
I can do it in the middle of next week, on kubevirt-dev or something, to give you some ideas. But it kind of got me thinking about baselines, and how we will define baselines for things. It wasn't really clear to me, because one thing I was thinking of is: what are the rules here? I could say, my cluster is this big, has this many VMs...
A
How am I doing this? How am I going to define this baseline? We have these tools that we're thinking of for generating load. As I said, it sounds to me like eventually what we'll do is take these tools and use them to generate our baselines for different things, and we sort of categorize them based on load and stuff like that. That's what I'm thinking.
A
So any sort of baseline that we generate ahead of time, we can use as just a temporary placeholder. What I'm thinking I'll do is create maybe a table in here or somewhere, maybe an issue, where we can track any sort of baseline, at least until we have this to normalize all our expectations, maybe in a format like this, with the threshold and stuff, or something like that.
A
Does that make sense to people? What do you think? Does anyone have any suggestions beyond that?
B
Yeah, so regarding the baseline: this is one of the ideas, to also have jobs in the KubeVirt CI. And the baseline can really explode, you can have a lot of configurations.
B
So the first idea was to have a minimal configuration, a very specific operating system and storage, that we can show and provide some information about, because KubeVirt can run anywhere and it can be hard to define which kind of system. So we need to define that very well, I would say.
B
When we have that, okay, yeah, a summary for you.
A
I was going to say, Marcelo, I'm always wondering if we could put this in plain text somewhere, something the CI could just absorb, maybe, or something like that, because whatever this is, it could, or should, be usable by CI, and then it'll be our source of truth. That's what I'm kind of leaning toward right now, yeah, okay.
A
So at least I can find a place for that somewhere in the repo; we'll just track the stuff in plain text, and our jobs will eventually consume it.
A
And yeah, you can put your stuff in there; when I find it I'll let you know.
A
Okay, next: reducing update patch collisions. Let's take a look at this.
E
Oh yeah, that was mine. This is more me just pointing out something that I saw that probably impacts our startup times and other stuff. I don't have evidence that this reduces startup times yet; I don't see how it couldn't, I just don't know how measurable it is.
E
So when our VMIs are starting up, we hit lots of these 409s, at least two to four before a VMI gets to Running. A 409 is when we try to post an update to a VMI but it gets rejected, because the VMI that we have in our informer is different from reality: our informer hasn't caught up to what is actually persisted in etcd. This causes things to get rate limited, and it causes us to generate load on the API server that, it turns out, doesn't need to occur.
E
The reason this was happening is that we have lots of other informers that aren't our VMI informer queuing keys onto our VMI reconcile loop, for example the pod informer. If we create a pod and then update our VMI, we'll probably get notified that the pod was created before the VMI update shows up.
E
The point is: there's a way to resolve this using a really simple heuristic, an expectation that says every time we update the VMI, don't process that key again until we actually see the update has arrived in our informer. That pretty much made all of these collisions go away, so it reduces the number of reconcile loops.
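A minimal sketch of that expectation heuristic, not KubeVirt's actual implementation: after posting a VMI update, remember the resourceVersion the API server returned for that key, and skip re-queuing the key until the local informer shows that version.

```go
package expectations

import "sync"

// updateExpectations tracks, per work-queue key, the resourceVersion of the
// last update we wrote, so the reconcile loop can skip that key until the
// informer cache has caught up and we stop acting on stale data (409s).
type updateExpectations struct {
	mu       sync.Mutex
	expected map[string]string
}

func newUpdateExpectations() *updateExpectations {
	return &updateExpectations{expected: map[string]string{}}
}

// Expect records the resourceVersion returned by our update call for key.
func (e *updateExpectations) Expect(key, resourceVersion string) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.expected[key] = resourceVersion
}

// SatisfiedBy reports whether the copy currently in the informer reflects
// our last write; if so, the expectation is cleared and the key may be
// processed again. A real implementation would also handle the version
// being bumped further by other writers, since resourceVersions are opaque
// and cannot be compared for ordering.
func (e *updateExpectations) SatisfiedBy(key, observedResourceVersion string) bool {
	e.mu.Lock()
	defer e.mu.Unlock()
	want, ok := e.expected[key]
	if !ok || observedResourceVersion == want {
		delete(e.expected, key)
		return true
	}
	return false
}
```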
A
Okay, this is cool. This sounds like a lot of what Fanna talked about with the reconcile. Interesting, that is really interesting.
E
So two to four 409 errors result in two to four more reconciles for every VMI on startup, yeah. So that means...
A
Awesome, that's cool. I really want to try this with some of the other measurements we've done. I wonder if this might be the thing we've been looking for that's causing some of the collisions. Okay, this is really cool, we'll take a look.
E
It might not be the thing that's causing you all to have this spike as more and more VMIs are introduced, but it's probably one of the things, at least.
A
Yeah, it's a fair heads-up. It sounds like it will help for sure; we'll see how much. This is definitely one I want to see with some of the other graphs we generated, I want to see how this moves the line. Cool, okay.
A
Great, thanks David. That's the last bullet point for today; does anyone else want to bring up anything else?
E
I guess so. We have about 20 more minutes, or 15. Maybe just for the sake of discussion: forgetting the density tests and CI and all that, what would a reporting tool look like? What would we want to see in it? Would it depend on Prometheus? How would it work?
E
Maybe we could, I don't know, make an exercise out of that. Does anyone have any thoughts?
E
That's a generation tool, though; that's a density thing. Actually, I know that it does some metrics collection as well, but I wouldn't necessarily consider it the tool that we use to gather metrics, because it's generating the load as well.
B
Okay, yeah. Moving to the parts that KubeVirt is collecting, or no, that kube-burner is collecting: I like the way they're doing it, in that, basically, you have a tool to watch the Prometheus metrics and dump them into a report.
A
Yeah, so something like that, something we could consume in CI. I could go through my PR and see how I can say: okay, here's my report, or my failure, whatever. I can see my thresholds were a little bit off; maybe on my 99th percentile I was at 120 seconds because I had one or two VMs that were just slow for some reason. So, something that's consumable by CI.
A
So let's just write these down: consumable by CI and by the developer.
B
And the metrics, so I think we discussed that in the beginning. My PR was doing that; I removed the part that I call the resource collector, which shows the CPU usage per VM, the CPU usage for all the control plane modules, and memory, plus the latencies that we were discussing. It also showed the overall latency, again with the percentiles, and for CPU and memory just averages.
A
So latency is another one, but we do thresholds again. So we'll do the VMI latency thresholds, let's do the same thing.
A
VM latency thresholds, and resource usage also.
B
Yes. Normally they're related, but sometimes CPU usage can increase while the latency is still fine, and it can become a problem later. I mean the resource usage, especially when the control plane starts to be too heavy and that starts to be a problem, yeah.
A
We also want to think about what other personas there are. So we have the consumers: CI and the developer. Let's talk about tests where you'd want to get reports, what kind of tests. One is going to be when we're doing massive scale.
A
We want to get reports there. The reason I'm thinking about that is because, let's say at massive scale, suddenly virt-handler has an increase in CPU usage or something; we want to know that. Since it's going to be reported, this is one of the tests we want.
A
We want to run it when we do just a general performance test, what we do in CI, what we do with our unit tests or our functional tests.
B
Well, maybe the reporting tool should also show the system configuration, because if, for example, we have different companies using this tool and, as you mentioned, creating maybe different baselines, it would be nice to also show some report about the system: the Kubernetes configuration, cluster information, some more information about the system where it was running.
A
Would we get that? I mean, I was thinking maybe the person running it could provide that, but would we even be able to get that outside of the test? That sounds like we would need to sort of scan the system with the tool.
A
Sorry, we talked about thresholds. Does this cover all the information we want about a VMI? This gets us, yeah, how fast we are, how slow we are.
B
Those metrics, so we need to have some high-level metrics, kind of like SLOs, service level objectives, something like that, like Kubernetes has. I started to prepare a document about that; I don't remember now what I put in it, and I don't know if I shared it, I think I shared it some time ago. Because the VM thresholds are kind of high-level measures, while the API latency, I would say, is low level, it's not.
A
Another question: how should this be run? Do we run the reporting tool after we execute a test? Do we run it before? Do we run it during?
E
My thought has always been that it's kind of like a profiler. If you were wanting to profile a CPU, if you profile a process, you would start a profiler, which would begin sampling the process, and then you would stop it.
E
Maybe you'd run a load test during that, or whatever you're going to do, then you'd stop the profiler and examine the results. So for our reporting tool, I would imagine starting the profiler, our report-gathering tool, running the test, then stopping it and examining the results. It would only capture what occurred during the time period it was actually running.
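A minimal sketch of that profiler-style flow; the type and method names are illustrative, not an agreed-upon API. Usage would roughly be: Start the reporter, run the load test, Stop it, then hand the captured window to whatever performs the metric queries afterwards.

```go
package report

import "time"

// Reporter captures a time window around a load test, profiler-style.
type Reporter struct {
	start time.Time
	end   time.Time
}

// Start marks the beginning of the capture window.
func (r *Reporter) Start() { r.start = time.Now() }

// Stop marks the end of the capture window.
func (r *Reporter) Stop() { r.end = time.Now() }

// Window returns the range the report should cover, e.g. for a Prometheus
// range query issued after the test has finished.
func (r *Reporter) Window() (start, end time.Time) { return r.start, r.end }
```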
A
What about if you were to run the reporting tool afterwards and just gave it a time frame, and it just scrapes the metrics? Does it, or can it?
E
Would that work, or would it not get the information? Maybe that would only work if we're solely using Prometheus, yeah. So if there's anything we want to do that's different, introspection of the system, then it wouldn't work.
A
So let's say we run it at the start. This gives us options, I guess, is the point: options to either scrape from Prometheus or, presumably, do some sort of watching and gather the same data.
B
I would say that if we can run it later, that's better, because it doesn't introduce load into the system and can't interfere with the tests. But of course, if we think there is something that cannot be collected from Prometheus later, then we can change the approach; right now, though, everything comes from Prometheus.
A
That's something to think about, yeah, because it kind of defines the identity of the reporting tool. I'm trying to think of some use cases: what would be an example of something that we'd want to get while a test is being run?
E
It wouldn't be necessary. If we're going to completely depend on Prometheus for our reporting, then it seems like we could run this tool afterwards with the time period.
C
I also think we can get pretty far with it. I'm not sure about some things: especially right now we have a watching approach also for some Prometheus metrics, and some things may be hard to get with that, because some objects just disappear and you may not be able to watch them fast enough to collect something, and then it may be too difficult to distribute the metrics collection to the various components. (Are you talking about the granularity of the reporting?) Yeah, I mean, virt-controller is right now collecting, for instance, all the phase transitions, but if you want to, for instance, collect the time it takes to delete VMs, it could be impractical with that approach, because virt-controller is not necessarily the one that deletes the VM, and we may not get exactly the timestamp we want, because virt-controller may not be able to observe it.
A
So that would mean, if we ran this reporting tool after, would we still get the deletion time? Are we saying we can't, because we're going to miss the event?
E
Actually, we're going to get it with the Prometheus metric. We still get deletion because it's a histogram and we're updating it based on an informer locally, so the informer is still going to see that the deletion occurred, that the phase transition to the final state occurred, and it will get stored in that histogram.
C
Virt-controller is in charge of deletion, right, with the finalizer, actually, so we can get it, yeah. And even if not, I mean, we get a real delete event inside virt-controller for every VM. If that were not the case, we may not process it right now, but we should, actually. So yeah, I think we have the opportunity.
A
So, to answer this question: if we were to run it late, I could sort of envision it as the exact same idea as if we were to run it before. We're just going to gather information from Prometheus over this period of time; that's like our API, that's what we want to gather.
A
We want to gather this information for this period of time, and then, when we run it afterwards, the idea is that we're just going to query Prometheus for those timestamps. And this would give us the opportunity, if we wanted to pivot later, to say: okay, we're going to run it for this period of time. We just change it to some sort of time window that we run it in, so that wouldn't change much.
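A minimal sketch of that "query Prometheus for the window afterwards" idea, using the prometheus/client_golang API client; the address, window, and the KubeVirt phase-transition metric in the query string are illustrative and may differ by version.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		panic(err)
	}
	promAPI := promv1.NewAPI(client)

	// Query only the test window, e.g. the last 30 minutes.
	end := time.Now()
	start := end.Add(-30 * time.Minute)
	window := promv1.Range{Start: start, End: end, Step: 30 * time.Second}

	// 95th percentile of time-to-Running, from the VMI phase-transition histogram.
	query := `histogram_quantile(0.95, rate(kubevirt_vmi_phase_transition_time_from_creation_seconds_bucket{phase="Running"}[5m]))`

	result, warnings, err := promAPI.QueryRange(context.Background(), query, window)
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}
```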
C
And, in addition, that's what I kind of hinted at in the PR comment I made: it's possible at any time to just tell Prometheus to append specific labels to all metrics, starting from a specific point in time.
C
So it's also easy to just add a label with a test ID or something during the period of the test, then remove it again and replace it with the next one for the next test. Then you don't even need timestamps in the reporting tool, for instance, just as an example.
A
Okay, I think that gives us a pathway; we can start with that. I think this is easier too: we start with running it after, we just assume Prometheus, we'll make that assumption, and then we'll just put it behind some sort of API so that we have the opportunity, if we decide we want to, to come back and do something during the run.
A
All right, we've only got about one minute left. I like what we have here; I think this is pretty good. Any other last-minute thoughts? What else can we throw out here that we want in the reporting tool?
E
So, a really simple tool that, over a time period, gives us the VMI thresholds that occurred during it, and then lets us build out from there. Just make sure that we have a really solid, agreed-upon entry point for what this tool can start with, because that makes it actionable. I think it's actionable now, actually; through this discussion, somebody could go off and write this right now.
A
Okay, all right, we're at time. So thank you, everybody; this was pretty good, we got a lot done. All right, have a good day, everybody. Thank you very much and we'll see you online.