From YouTube: SIG - Performance and scale 2021-09-02
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.4t9i91had6to
A: Okay, all right, welcome everybody to SIG Performance and Scale, September 2nd. I added the link to the document in chat, so you can add your name to the attendees. Great, okay. So, what we'll go through: please add any agenda items, but we'll start with the first item.
A: So this is a discussion that I brought up from the mailing list a few weeks ago, but I wanted to back up and clarify one of the original goals for the discussion, in case there were any other comments about it, and clarify a few things. This was originally brought up two weeks ago. The original ask that I had was around VMI-specific metrics, and the context is that we had discussed having metrics around VMIs, but we didn't want metrics that ballooned out of control.
A: So one of the things that we currently do with metrics is we have a sort of summary of the amount of time it takes in phases. We accumulate them into buckets; we don't actually output the specific VMI labels, such as the name. And so the difference is that we have a static number of metrics that come with the phase transitions that we do.
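For illustration, a minimal sketch of that bounded-cardinality pattern using prometheus/client_golang. The metric and label names here are illustrative, not KubeVirt's actual ones:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// phaseTransitionSeconds records how long VMIs spend moving between
// phases. It is labeled only by the phase pair, never by the VMI name,
// so the number of series stays constant regardless of cluster size.
var phaseTransitionSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "vmi_phase_transition_seconds", // illustrative name
		Help:    "Time VMIs take to move between phases.",
		Buckets: prometheus.ExponentialBuckets(0.5, 2, 10), // 0.5s up to ~256s
	},
	[]string{"from_phase", "to_phase"}, // bounded label set
)

func init() {
	prometheus.MustRegister(phaseTransitionSeconds)
}

// ObserveTransition accumulates one transition into the shared buckets.
func ObserveTransition(from, to string, seconds float64) {
	phaseTransitionSeconds.WithLabelValues(from, to).Observe(seconds)
}
```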
A: We don't actually report the name of the VMI on them, because those labels would quickly grow based on the number of VMs in the cluster. So the question that I asked here was: are there any situations where we'd actually want to know the actual VMI? When would we want to pull back the curtain and say, okay, this is the VMI that we care about, because it's doing something that we want to look at more closely?

A: One of the use cases I brought up was what I called VMIs that are stuck: things that weren't progressing, that were taking a long time. That's the mailing list thread where I brought up the topic, stuck VMIs. One of the things that came out of our discussion last week, and the thing I want to clarify, was that a stuck VMI won't be creating events. So the way I want to change the definition is: if we have a VMI that's, say, Pending, or maybe it's in Scheduling and it's waiting on a device to be assigned to it or something, and it's just sitting there, the pod just isn't doing anything, so we're not going to get any events. That could classify something that's stuck: it's not progressing at all. But there are also other cases, so I split this into a second category, which is VMIs that are slow. They're ones that are progressing, but they're just taking a long time, some amount of time that's longer than what we expect, and this could be for any number of reasons. But say we went from Scheduling into Scheduled, and we knew roughly how long it took to go between those phases, and we noticed that one VMI just took an incredibly long time; we'd see this in our dashboards. It would be useful to know which one it was, so that we can trace it and get a better look at it. So that was one of the things I wanted to clarify about this. But this is a fairly general topic in terms of other metrics we could add and how we could label them. Are there any thoughts on this, though? Does this sound like a better use case?
B: Ryan, can you hear me? Yeah? Great. Right, I think the second of the two points you mentioned is something you already discussed last time, where I wasn't there, so I might add something which you already said. But regarding these two cases, especially the second one, that VMIs which are stuck won't create events: I think for this one you summarized it pretty well, that you would want to see events there, like normal kubectl would get events. For the first case, would it be more something which should be handled via monitoring, and less via labels or something? Or do you think, and it's only a little bit unclear to me, that you wanted to see it directly on the objects?
A: We can see this show up in the dashboards. If you have, say, a 99th percentile for a histogram, you'll occasionally see that some VMs are just a little slow, but you don't know which one it is. Say there's a lot of churn, a lot of creating and deleting happening: we don't know which VMI it is. You have to do a lot of work to figure it out. So my suggestion is that if we can see these events, perhaps we can pass the actual name of the VMI, in addition to the other information we're passing, so we can locate it. Okay, so it's sort of an advanced way of monitoring, so that we can have a subset of VMIs that we can look into if we want to.
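As a hedged illustration of the dashboard view being described, this is the kind of PromQL a panel might run for the p99 phase-transition latency, reusing the illustrative metric name from the sketch above rather than KubeVirt's real one:

```go
package dashboards

// P99PhaseTransition is the sort of query a Grafana panel might use to
// surface slow transitions. Because the histogram carries no VMI-name
// label, the panel shows that *some* VMI was slow, but not which one,
// which is exactly the gap discussed here.
const P99PhaseTransition = `
histogram_quantile(0.99,
  sum(rate(vmi_phase_transition_seconds_bucket[5m])) by (le, from_phase, to_phase))
`
```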
C: On the not getting events on stuck VMIs: would it be an idea to watch events for the VMI pods and forward those, just as we forward status? Like, for a VMI pod event, we'd create a VMI event.
A: Yeah, well, so this one: I guess where we are with this discussion is that I'm not sure how we could solve it. I think, Kevin, you were the one who said that Kubernetes has this problem too. If a pod is stuck, what is Kubernetes going to do about it? It doesn't do anything about it. And so...
B: I guess we have this kind of gap where we are waiting for Kubernetes to do the scheduling action for us. We are actually mirroring pod conditions already in the VMI, but on the pod conditions you cannot necessarily see that, for instance, the scheduler tried something again. You would only see that in the events, and that is where we are probably silent.
A: So, I mean, if we could get some events, then perhaps this could be something that's solvable. If the scheduler is doing something there and it's posting something on the pod... I mean, we do see this: if you're Pending and it's "no nodes are available", it'll do that like 100 times or something, so you can see that there's something. But what about cases where you're waiting for, say, your CNI to do something, like to provide you with an interface? Will that show up on the pod?
B: I guess the main issue here is that on the pod status... so, for instance, you have an issue with mounting something, or with CNI; that would add, let's say, a condition X on the pod status. So it would add that condition on the first error and send an event. But then, when the kubelet retries with CNI to mount it, the condition in the status does not change at all, not even the timestamp, but you will see another event. That's mostly to avoid storms; you can see what you get when you're watching pods. You would get an immense amount of warnings if the status were updated on every retry. And yeah, I'm also not sure what to do if we don't listen to pod events directly. We're looking at the pod conditions and we're mirroring them, so that you see on the VMI what's currently going on. But just because the pod conditions are not changing does not mean that there weren't, for instance, 50 events sent in the meantime for that pod.
B: Controllers interpret the external situation at the moment when they are evaluating the objects, but they're not looking at any events or anything. And I think in general it's a very safe pattern to do it exactly like this. What we're doing is the pattern as it's intended to be, because it's safe and scales well and everything. But yes, it has the disadvantage that you can't always see directly on the object what's going on.
A: So the thing that's most important for me: I think the first item here would at least be a start, and then maybe we can talk about the other one as more of an advanced case, if we want to expand this to monitoring things that get stuck. I think the first one, at least in terms of calculating performance, would be initially valuable, so we can locate the outliers, and then the other one could come later, as a way to weed out any sort of external things happening; like if any of the device plugins or something is slow, we can maybe capture those here.
C: Yeah, and one more thought on the event mirroring: we don't need to do that. It's kind of solvable on the client side as well. Right now, if you do kubectl describe, you get events for the resource you're looking at, but the events are there anyway. You can also just query events, like "only give me VMI events" or "only give me events for this VM or this VMI", because we have the labels on the pods. So we should be able to query that. Not tested, but it should be possible.
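A rough sketch of that client-side query with client-go, filtering events down to a single VMI's virt-launcher pod via a field selector. The namespace and pod name are placeholders, and, as noted above, this approach is untested:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Hypothetical names: the namespace and the virt-launcher pod
	// backing the VMI we are interested in.
	ns, pod := "default", "virt-launcher-testvmi-abcde"

	// Ask only for events whose involved object is that pod.
	events, err := client.CoreV1().Events(ns).List(context.TODO(), metav1.ListOptions{
		FieldSelector: "involvedObject.name=" + pod,
	})
	if err != nil {
		panic(err)
	}
	for _, e := range events.Items {
		fmt.Printf("%s\t%s\t%s\n", e.LastTimestamp, e.Reason, e.Message)
	}
}
```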
A: Okay, I think what I'll take away from this, what I'd like to investigate next, is taking it towards this second category of things that are slow. So, assuming that we'll have the transition times, see if we can capture these in a way that's sensible. The way I proposed it on the mailing list was that we have a sort of...
A: We'd have a threshold that we expect, configurable per transition, that we can set to a large number, and if a transition goes over this number, then we can say, okay, just add the name of this VMI.
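A hypothetical sketch of that proposal; the threshold map and the label logic are illustrative, not an agreed design:

```go
package metrics

import "time"

// Hypothetical per-transition thresholds, loaded from configuration
// (for example the KubeVirt CR, as proposed below). Zero means the
// VMI name is never exposed for that transition.
var slowThresholds = map[string]time.Duration{
	"Scheduling->Scheduled": 2 * time.Minute,
	"Scheduled->Running":    5 * time.Minute,
}

// labelForTransition returns the VMI name only when the observed
// duration crosses the configured threshold; otherwise an empty label
// is used, so series cardinality stays flat in the common case.
func labelForTransition(transition, vmiName string, d time.Duration) string {
	if t, ok := slowThresholds[transition]; ok && t > 0 && d > t {
		return vmiName // outlier: pull back the curtain
	}
	return "" // normal case: no per-VMI label
}
```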
A: Yeah, I mean, do we want to discuss it? The idea is that whatever I'd do would be configurable; we'd do it in the KubeVirt CR or something, and it'd be optional. We wouldn't use it by default; it would just be something you could use if you want some sort of advanced look into how things are going with the VMIs.
D: Yeah, I think last time Kevin mentioned it could be like a standalone tool watching the VMIs and generating some warning or something, and then you have the logs of that, and you can track that and create the thresholds, but outside the control plane itself. It's just a tool that we run that watches all the VMI objects, in our namespace maybe, and then you can mark them and create thresholds.

D: It's kind of a debugging tool, isn't it? And it can go in the direction of the tool that David Vossel is developing for monitoring, too, but instead of being something run at the end of the test, something online. I think that's what Ryan wants to do.
E: Would it make sense to extend that kind of thought process to our monitoring, where we have verbosity in our monitoring? Where, if somebody increases the cluster verbosity for monitoring, we start to include more information in our monitoring labels, perhaps even more metrics that are more intensive for the logging or monitoring stack, things like that, that we didn't want by default, but maybe during certain low-stress scenarios we would. Is that a concept that we want to even consider?
B: For instance, David had the PR where he added the timestamps when transitions are happening. So it would be rather trivial, when I see an alert going off which says some amount, let's say five percent, of the VMIs are suddenly starting slower than usual, to then just run my other diagnosis tool, which would really just fetch the VMIs, look at the phase transitions, and give me, okay, those are the five ones which are slowest right now, and I see it immediately.
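A hedged sketch of such a diagnosis pass, assuming the phase-transition timestamps from that PR are available on the VMI status. The types here are simplified stand-ins; the real KubeVirt API types differ:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Simplified stand-in for a VMI carrying its phase-transition history.
type transition struct {
	Phase string
	At    time.Time
}

type vmi struct {
	Name        string
	Transitions []transition // assumed sorted oldest first
}

// slowest ranks VMIs by their longest gap between adjacent phase
// transitions, computed purely from timestamps already on the objects.
func slowest(vmis []vmi, n int) []vmi {
	gap := func(v vmi) time.Duration {
		var max time.Duration
		for i := 1; i < len(v.Transitions); i++ {
			if d := v.Transitions[i].At.Sub(v.Transitions[i-1].At); d > max {
				max = d
			}
		}
		return max
	}
	sort.Slice(vmis, func(i, j int) bool { return gap(vmis[i]) > gap(vmis[j]) })
	if n > len(vmis) {
		n = len(vmis)
	}
	return vmis[:n]
}

func main() {
	// In a real tool this list would come from a List() call against
	// the cluster; it is left empty here as a placeholder.
	var all []vmi
	for _, v := range slowest(all, 5) {
		fmt.Println(v.Name)
	}
}
```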
C: Okay, and I want to re-advertise building that. I think I prototyped it, and it was very trivial to build something that just takes the data we have in the VMI and creates metrics from it when needed, because we have all the data in there already; it's not going away.
A
I
mean
I
mean
you
could
well
I
mean
I
understand
the
perspective
like
we
could
I
mean
this
could
be
something
in
the
audit
tool
as
well
like
it
doesn't
have
to
run
as
a
watch.
We
could
just
scrape
the
time
steps
based
on
what's
currently
there
if
we
don't
want
to.
If
we
just
don't
need
the
history
we
just
kind
of
grab.
If
you
notice
something
is
wrong,
we
can
just
capture
which
one
it
is
yeah
I
mean
I
can
see
that
perspective.
A: Yeah, I mean, to be clear, I'm not necessarily sold in any direction. I'm just trying to figure out what's the right one. So it seems like folks like the client-side way of doing this: Kevin, you're on the client side, and other folks are on the client side too, it sounds like. So it seems like that's the consensus.
A: To me, those are the two options: going through the audit tool and doing it after I notice something is wrong, or having it directly in Prometheus, having it directly in the dashboard.
A: How about this: I think starting with the client side makes the most sense to me. First of all, I don't think it's a massive commitment to do it, and I think it would be useful in general just to have it. So to me, that's a good place to start, and then, if it ever comes time that it makes sense to expand this, that we want more history, I think then we can have the discussion of what you have, Kevin, versus having it directly in Prometheus.
B: Also, Ryan, I was wondering, since you have some experience running KubeVirt at bigger scale: what do you do with events in general? Do you mirror them to the logging tool for post-analysis or something? The events, like for the pods; same as everything you'd get with kubectl get events.
A: Yeah, we do have Kibana. I don't know if we capture all the events; I believe we're just grabbing logs from all the components. I'm not sure if it gets to the point of having the events, but that's a good point too. Okay, yeah, okay, that makes sense. I think I have a path forward: I'm going to go with the audit tool. I think that'll get a good start to this.
A: Okay, let's go to the second point. I had this earlier; this was also from last time. So, Marcelo, you created a PR for this already, which is good. Basically, just to reiterate from last time: Marcelo, you've done some really good presentations talking about the different data gathered, and you've shown a lot of good dashboards. So I thought it made sense if we could all share a dashboard that we can show whenever we're doing testing.

A: So we can just compare: we all have access to it, we don't have to build a new one each time, and we don't have different data or anything like that. We'd just have an apples-to-apples comparison whenever we show any pictures. So for any sort of changes we want to make to the performance dashboards, let's contribute to the community dashboard in kubevirt/monitoring.
D: Yeah, actually, I had been waiting to commit this new PR, so once I saw your message I just did it right after. Well, thanks. So yeah, this is an updated version of the dashboard that you guys saw before. I was improving it a lot and including many things that I think are important for the control plane.
D: So we have a few categories now: request rates and latency, then the workqueue metrics, then etcd metrics, and general process metrics, you know, memory, CPU, file descriptors and network, and Golang stats: garbage collector memory, and, well, it actually has goroutines, and threads are missing here, and storage operations, which are interesting in the tests, especially when deleting VMs. Sometimes it takes a lot of time to delete VMIs, and when it's taking a lot of time to delete VMIs, I see a lot of errors for unmounting the emptyDir directory. So this slowdown might be related to that; some of this matters, and I think it's interesting. So this is the new dashboard. I don't know if I have a picture of it right now, but...
A: You have it on your... are you able to share your screen?
D: I think I need to reopen it. Oh, can you see something? Yeah? Okay, great. I thought I would need to fix that. Okay, so this one you guys already saw: the read calls, basically GET and WATCH and LIST operations, and we have the durations. Something that worries me is the virtual machine LIST: it's taking one minute. The WATCH I was also expecting to be slow, but a LIST taking one minute... I always see that, and something there is very slow. I don't know exactly what request is taking one minute, but some LIST requests take one minute. Anyway, the goal here is just to describe the dashboard. So we have the write calls, the same thing but for PUT and DELETE; this is a metric that's interesting. I also keep the rate-limit duration, and the VMI creation. For some reason the Pending phase doesn't show up for me; I need to investigate that more. Do you guys see that?
D: Oh, it might be zero; that's right, yeah, I don't show the values that are zero. Okay, thank you, that's the reason. Okay. So I also have this VMI count, and the rate. The rate is interesting just to see, more or less, how many VMIs per second are being created when we do the density test. And then all the workqueue metrics that we already saw before, and then process open files.
D: So it shouldn't be that problematic here, but it's just something to keep an eye on. And before, we didn't have the threads, but now we do, so it's good to have not only the goroutines but also the number of threads being created. And the garbage collector; that's been problematic right now. virt-api is the one that spends the most time on garbage collection during the tests. Everything else looks fine to me here. And etcd...
D: I think we were discussing this before: the etcd performance, especially the request duration, is something that we need to keep an eye on. Anything that is higher than 10 milliseconds, the official etcd documentation says you need to look at it; it's problematic. And the storage: a lot of unmount operations when it's deleting the VMIs. Okay, maybe it's expected to happen, I don't know, but there's some correlation here. Anyway, this is the dashboard, so there it is, and the idea is to have it open so anyone can play with it.
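For reference, a hedged example of the kind of etcd latency check being described. The speaker applies a 10ms rule of thumb to request durations; etcd's own tuning docs state that guideline for the 99th percentile of WAL fsync latency, which is what the query below watches, assuming a standard etcd deployment exporting the usual histogram:

```go
package dashboards

// EtcdFsyncP99 flags slow etcd disk writes. The etcd tuning guidance is
// that p99 WAL fsync latency should stay below roughly 10ms; sustained
// values above that point to storage problems.
const EtcdFsyncP99 = `
histogram_quantile(0.99,
  sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (le, instance))
`
```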
C: I just wanted to ask if maybe we could move the VMI metrics to the top, like how many there are, because it's the main indicator you're looking at, mostly. Everything depends on how many VMIs there are: if there are no VMIs and you get a lot of errors, it's bad; if you have a lot of VMIs...
D: Yeah, yeah, the idea is that I push that to the monitoring repo, and I think you already mentioned before that we actually maybe should pull the dashboard from this repository, you know.

D: I don't remember; I think this is the up-to-date one. I ran it today with the latest master, but I need to double-check that; I don't know.
A: Okay, all right, thanks for that. Okay, let's go to the next item: metrics focused on VMIs and not VMs.
C: Yeah, I added that because it came up in my team a few days ago. I was asked if we have a count and some other basic metrics on VMs, and I noticed, yeah, I don't think so. We're focusing a lot on the VMI, because it's the main workload, but we also still have a first-class citizen object called VM, and it also progresses through stages and such, and I wanted to mention that.

C: What we looked for in that specific case was the number of VMs that are running. But right now we're talking about phase transitions and adding metrics to our dashboards, and we do all of that for VMIs, and I don't know if we should do that for VMs.
E: I think we have metrics that represent a running VM today; that's just the VMI metrics that we have. I think if there's something specific to the VM controller, that's where it makes sense to make VM-specific metrics, so, for example, VMs that are in... I don't know what else we would really want that's just specific to VMs.

E: There is a VM-specific flow: if we have a VM with dataVolumeTemplates, it's going to create that DataVolume, and I think it's going to wait for that DataVolume to complete before moving on to creating the VMI, so that would be a specific one. Beyond that, yeah, I think it's just seeing how many VMs there are.
B: For something like "not running VMs", I'm not sure. For instance, "not running VMs which should run", yes; but I'm not sure what that gives us. Do we also count how many... yeah, I don't know. I think there are ways to see how many ConfigMaps there are, and so on, right?
A: Okay, does that satisfy your ask, Kevin?
B: I think it makes sense; people are normally not using pods directly, right?
B: Manually, I think it makes sense here too. I can definitely see some scale use cases where you just do your own VMI stuff, but for the usual VM case, where you want to stop it, restart it, modify it and start it again, and so on, you probably want a VM. But there are other things; the VM is one controller on top of it, I would say.
B: What you very often need, I think, is to monitor your whole namespace to see if something goes wrong. You would probably not necessarily see that a pod has issues in the storage provisioning phase, where it's waiting for PVCs; you would just see that it takes long to start. But you would probably see in the events, and in some metrics, that the storage provisioning itself takes a long time, and you would basically monitor both and see.
A: Okay, okay! Well, one of the things I was going to bring up, or just maybe discuss: we kind of have the lay of the land right now. Marcelo's got the density test; we have that in CI, right? David wrote the audit tool; Marcel, you did the load generation tool. So that's kind of looking at what we have.
A: I think that was one of the things we had from last time, so we're getting close to being able to tie a few of these things together and start getting a bunch of valuable information on a per-PR basis. So I think the next step, and I think this one's for you, David: you're going to do the thresholds, and as part of that, this is going to be in CI, right?
E: Yeah, so initially I'm just going to export the perf results, and then, after we see a few runs of that, we can establish the pattern of what we want to set for our thresholds in that environment, and we can commit that.
A: Okay, so I guess that's good, yeah. So, as part of this, right, David, we have to integrate a bunch of the tools, right? Or is it that you want to run the audit tool and just generate the results in CI, and then maybe tying it together in CI could be done separately? Would that work?
E: What do you mean by tying it together?
A: So we have the load generation tool, we have the audit tool, and we have Marcelo's density test; so, tying together three of those. But that could be a separate task from just generating the thresholds, or yeah.
E: So I think, Marcelo, correct me if I'm wrong, you were going to be replacing the current density tests, or at least what's generating the load for the density test, with your new load generation tool, and then we can integrate, independently of that, this perf audit tool as well. And the perf audit tool doesn't have to have thresholds immediately: it can just gather results and export them, and then we can decide on thresholds after we get a few iterations of data.
D: No, it's not created... we created it already, so yeah. And I also integrated the dashboard in CI, to see the job that is running there. I had a talk with Federico, so there is a Grafana dashboard right there. But I don't know; we cannot really see the metrics with the dashboard yet. We can maybe import them; I don't know if we can edit the Grafana dashboard there, but I'll have a look at that.
D: No, I didn't. So the CI infrastructure has a Grafana dashboard, as I'm saying, but I didn't play with it, and I don't know if we can see the job that is already running there. Well, we can see it, but I don't know which metrics are exported, and whether we can dynamically include a new dashboard there, or if we need to...
B: Yeah, but the dashboards... so if we get your PR merged in kubevirt/monitoring, we can just run our deploy job every day, and it would pick up the latest change there and deploy it, from my side, to the Grafana dashboard. I guess that would be something which you'd want to do.
A: Okay, so who can... yeah. Okay, and so then Marcelo wants the dashboard.
C: What would be great for the Grafana: I don't know what other metrics we have in there, but we are exporting metrics with Prometheus jobs with our test runs. So it would be great if we could use Grafana, or if we had a Prometheus UI, to explore those metrics without creating a dashboard, because we can't access them in other ways.
A: So, just to understand: this has all the information about CI. Is this the work that you're doing yourselves, that you'd integrate some more information based on a job, like we could get the perf data from it?
B: It's there already; we're just not exposing it.
A: Okay, all right. Well, I think those are the three things here. I assume you're looking at this one; David, you look at this one. I can help with the middle one: hooking up the load generation tool to replace the way you're generating load in the density test right now. Yeah, give that one a shot. Yeah, I was trying to...
A: Fine; I mean, I think you understand the load generation pretty well, so it'll probably be quite a lot faster for you. Okay, cool, all right. Are there any other topics that we want to discuss?
D: Yeah, just one last comment. If you guys remember, I was playing with creating 500 VMs per node, and for our scaling tests we want to pack as many VMs as possible in the future, and I was hitting a lot of libvirt timeouts. I created the cluster with Kubespray, and the first cluster I created was with the Docker runtime.

D: Then I tried with CRI-O, and it actually got worse: it could create only 400 to 500 VMs when I was creating 580. And then, when I tried to use containerd as the runtime, I could create 500 VMs without any complaint about creating the containers, so without any libvirt timeout. Also, using CRI-O, I actually received a lot of events saying that the CRI-O runtime was overloaded and was delaying the creation of containers, something like that. So it might be that Docker was also seeing the same issue there, the runtime being overloaded, and that's why I couldn't create 500 VMs; but using containerd as the runtime, I could do that. Then I moved to the next issue, which was a shortage of memory, which I already discussed with Roman: we can actually allocate the minimal memory for VMs, VMIs, now, and then I can schedule more VMs per node.

D: And finally, just to say it in five minutes: I managed to create 500 VMIs per node, so yeah, I will share the new experiment with you guys soon.
A: Okay, cool, yeah, that would be cool. And then, if you do the rate-limit change that Roman mentioned, seeing the difference in create time, or just how much is being rate-limited, would be cool. Okay, all right. We have four minutes left, I just remembered, so there are a few open PRs I want to draw attention to, in case these are ready.
A: The pprof profiler: is this one good? Has there been any more review that's needed on this? Looks like you have one "looks good". Oh.
A: All right, Marcelo, are you comfortable with this, and I can do the approve? Yeah? Okay, okay.
A: Yeah, actually, I don't... yeah, Roman, I think you'll have to do the approve here.
A: Okay, all right, here's the next one. This is the monitor request counts. This one's good; it's got an approve, and it looks good to me. Looks like it'll just go in after CI passes. Okay, yeah.
A
I'm
still
working
on
the
the
failed
phase
transition
metrics,
I'm
going
to
transition.
What
I'm
doing
slightly,
I
think
like
based
on
your
last
comment
date,
I'm
almost
thinking
that
this
needs
to
be
actually
in
create.
The
only
thing
we
need
to
do
is
actually
just
catch.
A: This one merged, yeah. It did.
C: Okay, that's the CPU one: the goroutine fix merged, and so did a few of the backports, but not all of them, because it seems like somebody didn't backport everything consistently. So the backports fail for missing images and lanes and very weird stuff I don't understand, and I couldn't really get help. I can't bring up the energy to fix releases that nobody else touched for a long time, even though releases before them did get touched.
A: To the end of it, that's right; we could... I did the backport internally, so it's fine.
E: But in the future, once we approach getting into CNCF incubation and eventually GA, the predictability of our backports and our release schedule, how long the community actually supports releases, will kind of be defined. Right now, it's just kind of in the backlog.
A: Okay, well, we're at time. Everybody, thanks very much, I'll see you all online. Have a good day.