From YouTube: SIG - Performance and scale 2021-09-09
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.p5py231aneed
A: Okay, all right, welcome to SIG Scale. It's September 9th; the document link is in the chat. Add your name as an attendee, please, and feel free to add topics while we discuss. Okay, so the first item on the agenda: it's an issue for profiling the KubeVirt control plane. I created this two days ago. I think Tomas is here, yeah. So basically the idea is that we have a few changes that are actually close to merging for the profiler, and we have the load generator, so this issue is just to track some of the work that we can do to profile the control plane: the different tests we can do, things like that. Tomas is going to take a look at this, but it's something that we can look at as a community. I think there's a lot of different tests that we can do.
B: Is there anyone that is reaching high numbers like us, like 10,000 concurrent users in a single pool?
A: Can you elaborate what you mean by 10,000 concurrent users? Is this 10,000 users? Like, how many VMs would you say that is? Are you sending users to, like, a single Kubernetes cluster?
B: The maximum number of VMs inside Google in a single VPC is 15,000, and that's why we have done, in a single VPC, 10 subnets. In each subnet we have up to 1,250 VMs running the OS, and on top of the OS, meaning OKD, and on top of OKD we install KubeVirt. With these clusters of 1,250, we plan to have up to 10,000 concurrent users, and that's why we are asking if it's possible, if someone else is already reaching that amount.
B: Let me just specify each VM for you, you know, just one second.

C: So the question is if anyone has, if we've ever run that many virtual machines. Or I guess I'm trying to understand what we're aiming at.
A: Yeah, how many would you say, Andre, if this was one cluster, you know, what would...

B: Let me elaborate a little bit better so you understand. Can you enter my website, ddas.global? Then you can understand.
B: We offer these flavors. If you hover the mouse on top of "basics", then you're gonna understand, okay. And what we plan to achieve is one million concurrent users, because we have today 1.5 million named users, okay. We plan to...
C: Let me stop there for a second. The limits are difficult to put hard numbers on, because that's something that's going to be specific to the hardware you're using and what that hardware is capable of, both at the node level and the control plane level. I don't know if 10,000 virtual machines is practical or not for a single cluster; I don't think we've tested at that scale yet to know. I would say, as a gauge to understand whether you're in the ballpark of something that is practical or not: look at what's been scheduled for pods on Kubernetes clusters. So, if you're looking at how far Kubernetes can scale just with pods, I would expect that we would get pretty close to that same sort of realm or ballpark with virtual machines, because in the end, we're just pods with QEMU processes running inside of them. So that's where I'd look if you're looking for hard numbers and just want to understand the limits.
B: We are under all those limits, but we are doing this in, let's say, a hard way. The users come and go, and the idea is that when the users log off, we just kill the machine. And I don't know if this is already available, like the linked clones that we have on VMware and Citrix solutions.
B: Can you elaborate how you are doing the pool? Because I saw some information.

C: Yeah, that's actually something that's been discussed, the idea of a virtual machine pool.

B: I would like to understand better, if you can elaborate, what is available today.

C: So what you're asking for is the equivalent of an AWS autoscaling group or a Google Cloud compute engine instance group.

B: That's what I'm interested in, right.

C: We don't have that yet. That's something that I would say is in the process of being designed; it's something that we keep poking at, but it's something that has yet to gain the kind of traction to actually get implemented quite yet. It's something we're interested in doing, and I think that your use case actually helps us drive that forward a little bit.
B: We have another tool that keeps the profile of the user, for you to understand.
A: Okay, while we wait for Andre, maybe we can close this first topic here. So I guess the topic, or basically the ask here, was that Tomas is looking at doing some profiling of the control plane. Here are some of the tests that we want to do, like the number of VMIs and the number of nodes, the tools that we want to use to do it, and the pattern we're going to follow. But if there are any comments about that, we can address them in the issue. Sorry, okay.
B: Sorry for that. We plan to have up to 1,250 nodes in the same, in a single cluster, I think that's the number. And so you understand, since we have several flavors: if we have type one, that is two virtual CPUs and four gigabytes of RAM, we handle on each node 64 VMs. If we have the one with 32 CPUs and 64... let me grab it here, I don't remember everything. With 16 CPUs and 32 gigabytes of RAM, we handle eight per node, for you to understand.
B: Yeah, my question is how to create, in the same cluster, like four pools that need to go up to ten thousand, these four pools.
C: Oh, you understand, we don't have that abstraction today. So the way you can do it is to create your own controller or your own API logic that's going to post a VM every time a user wants to access a virtual machine, where that VM has, I guess, the characteristics of the class or the flavor or whatever you want to call it, and then manually delete that virtual machine when you're done with it. So it'd be a one-to-one relationship between the user logging on and the VM being created, and there wouldn't be a pool mechanism today. If you want to use KubeVirt as it is right this moment, that won't exist.
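As a rough sketch of that do-it-yourself approach, a controller could build one VirtualMachine object per user session and delete it on logout. Everything below (the flavor table, the naming scheme, the sizes) is an illustrative assumption, not something specified in the meeting:

```python
# Illustrative per-session VM lifecycle: one VirtualMachine manifest per user
# login, deleted again on logout. Flavor names and sizes are made up here.
FLAVORS = {
    "type-1": {"cpu": 2, "memory": "4Gi"},    # 2 vCPU / 4 GiB, as discussed
    "type-2": {"cpu": 16, "memory": "32Gi"},  # 16 vCPU / 32 GiB
}

def vm_manifest(user: str, flavor: str) -> dict:
    """Build a KubeVirt VirtualMachine manifest for one user session."""
    f = FLAVORS[flavor]
    return {
        "apiVersion": "kubevirt.io/v1",
        "kind": "VirtualMachine",
        "metadata": {
            "name": f"desktop-{user}",
            "labels": {"session/user": user, "session/flavor": flavor},
        },
        "spec": {
            "running": True,  # start immediately; delete the object on logout
            "template": {
                "spec": {
                    "domain": {
                        "cpu": {"cores": f["cpu"]},
                        "resources": {"requests": {"memory": f["memory"]}},
                    }
                }
            },
        },
    }
```

A custom controller would POST this manifest on login and DELETE the object on logout; with no pool abstraction to lean on yet, the one-to-one session bookkeeping lives entirely in that controller.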
B: But the mechanism to clone the disk, is it available?
C: Yes, you can clone the disk using CDI. What would happen is we would use a DataVolume. The DataVolume would be associated with the virtual machine, and it does, like, smart cloning behind the scenes, depending on what your CSI driver is, meaning that you're not actually taking...

B: We plan to use Gluster and a solution called vgo, which also does deduplication of the disks, for you to know. Okay.
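For reference, the CDI clone mentioned above is driven by a DataVolume whose source points at an existing PVC; whether that becomes a CSI smart clone or a full host-assisted copy depends on the storage driver. A minimal sketch, with illustrative names and size:

```python
# Illustrative CDI DataVolume that clones a source PVC. CDI decides between
# CSI smart cloning and a host-assisted copy based on the CSI driver in use.
def clone_datavolume(name: str, source_ns: str, source_pvc: str, size: str) -> dict:
    return {
        "apiVersion": "cdi.kubevirt.io/v1beta1",
        "kind": "DataVolume",
        "metadata": {"name": name},
        "spec": {
            "source": {"pvc": {"namespace": source_ns, "name": source_pvc}},
            "storage": {"resources": {"requests": {"storage": size}}},
        },
    }
```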
A
Yeah,
andre,
you
know,
like
dave,
was
saying:
virtual
machine
pools,
isn't
is
influenced,
yet
there
is
a
design
dock
for
it.
So,
if
you
do
have
you
know
anything,
you
want
to
talk
about
with
your
use
case.
You
know
if
you
review
and
add
your
thoughts
in
there
it's
one
of
the
things
that
we
have
as
then.
The
list
of
things
is
covered
in
this
in
the
sig
that
we
want
to
get
to
eventually-
and
I
think,
having
an
additional
use
case
would
would
definitely
help
us
a
lot.
A
Okay,
all
right
thanks
andre,
so
before
we
move
on,
I
did
was
there
any
other
comments
that
people
had
about
the
first
one
like
in
about
the
profiling
before
we
move
on
to
the
third
points,
didn't
sound
like
there
was
anything,
but
if
not,
we
can
always
talk
on
the
issue.
A
Okay,
let's
move
to
number
three,
so
we're
controller
stress,
starting
with
increased
number
of
vms.
Let's
take
a
look.
We
have
a
snapshot.
F: Yes, hi. So I was doing the stress test; I was trying to stress my cluster with the intent to know when our control plane components start acting up, in the sense: do we need any HPA or any type of autoscaling for our control plane components? That was the larger idea, and I started playing with it, and yeah, I'm still playing with it.
F: But this is one of the things I noticed. So the first graph is the number of VMIs in Running. Due to the limitations of the nodes, only 120 could be in Running, but I created 1,000 VM objects. Only 120 could be scheduled and running, but in the first couple of minutes a thousand were created, and we can see that the first four graphs are different. But if you look at the bottom: virt-controller stays high on CPU, even after all the VMIs and VMs are deleted. It's like 400 percent of the currently requested CPU resources. So I think I just found something that should be changed, or that we should take a look at.
A: I see. I think this is kind of what Kevin saw in some of his graphs, where we saw that when we do the deletes... well, I don't know if it increased, I don't remember, but at least it hung around at a higher level than we expected, maybe doing the garbage collection. But I don't know if we expect this, an increase in CPU that's almost twice as much for doing the deletes.
F: I'm not sure if this has Kevin's fix in it. I mean, I was just using HCO, which was available, and the operator; I'm not sure if Kevin's latest fix is included here.
H: Yeah, the fix doesn't fix that. What Ryan just meant is that we saw that when we delete a lot of objects, resource utilization stays up for longer than the resources are being deleted, because it takes the Go processes a while to clean up their memory, and we don't know yet if we can fix that, or have to.
A: It's kind of peaking at 120, and then you said you had a lot of other VM objects lying around. I'm wondering, you know, if you're deleting a thousand of them, maybe that's taking much more CPU or something. I wonder, if you did this experiment with just the exact amount, whether there's a difference at all: if you did it with just 120, so you create 120 VMs, you have 120 running VMIs, and then you delete them.
A: Yeah, and then I think Marcel has got a change that we can get also. It would be good to even have standardized boards too, so we can always look at the same ones; that will make this easier as well.
A: Okay, cool, all right. Well, thanks for sharing, definitely something to look at. You said this is one node, right? I think, right, he says on one node you can only put 120 VMs.
H: As I said, I think I already sent you the link to the board we have here that Marcelo built for the tests. If you use that next time, maybe, yeah.
A: Okay, cool, yeah. The reason I mentioned this is because I think it has these already, just so you don't have to go through and create them; it'll make it easier for you next time.
A: Okay, all right, thanks for sharing, yeah. I'm definitely curious, like I said, to see whether the number of VMs that you have affects this at all. And maybe we need to look at something here as to why the CPU increases right around when you do the deletes; it seems a little weird.
H: So I would suspect it's because there are more API calls in the back deleting the VMIs, and that would increase the total of stuff going on. And then there is still the garbage collection in process, because both VM objects and VMI objects get garbage collected in the Go process as well. So it might just be that, yeah.
F: I think next time I'm going to keep the VMs for longer, to separate out the events.
A: And what is "template validator"? Is this like making sure that the VM templates are correct or something? Is that what this is?
F: Okay, yes, and that's what my second point is about. I also want to see how we can see the latency of webhooks: the template validator actually creates an admission webhook, and I wanted to plot a graph of the latency, but I'm not finding a good way to do it with the apiserver request duration metrics. So if you have any hints about that, it would be helpful.
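One possible angle (an assumption on my part, not something settled in the meeting) is the kube-apiserver's per-webhook histogram `apiserver_admission_webhook_admission_duration_seconds`, which breaks admission latency down by webhook `name` instead of folding it into the overall request duration. A sketch of building the corresponding PromQL; the webhook name in the usage line is hypothetical:

```python
# Build a PromQL query for the p99 latency of one admission webhook, using
# the kube-apiserver's per-webhook duration histogram (labelled by `name`).
def webhook_p99_query(webhook_name: str, window: str = "5m") -> str:
    metric = "apiserver_admission_webhook_admission_duration_seconds_bucket"
    selector = f'{{name="{webhook_name}"}}'
    return (
        "histogram_quantile(0.99, "
        f"sum by (le) (rate({metric}{selector}[{window}])))"
    )

# Hypothetical webhook name, for illustration only:
query = webhook_p99_query("virt-template-validator")
```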
C: The latency of which? Well, yeah, I'm curious there. What exactly are you trying to measure the latency of?

H: Yeah, the goal was to investigate how much load the template validator can take until we should scale it up.
F: The larger idea is to see if we need any type of scaling up with an increased load or an increased number of virtual machines.
H: Template validator looks at VMs created from OpenShift templates and validates that they comply with the template they are created from, and the same for the VMI: it also validates that the VMI that is created does not change any values that collide with the template it was created from.
C: So it's an update webhook; it cares even after the VM has been created?
H: It is only validating, as far as I know, but it still cares that you don't change the VM after it's been created, because we also upgrade the templates with our upgrades. We guarantee that the VMs created from those templates work, so we make sure the user doesn't break any VM created from such a template.
H: As far as I know, yes. There might be some values they are allowed to change; for example, we're looking at allowing them to change resource limits, I think. But in general, yeah, you shouldn't change the template that you created the VM from. Okay.
C: Interesting. Sorry, that was a tangent for a second. These VMs that are being created: do they use PVCs, and are they doing any sort of cloning or anything like that? What's the storage that they were using?
F: They can be anything; I mean, we don't limit it in any of the templates, no.
C: I'm sorry, I'm talking about the load tests that you did, where you were using VMs, yeah.
C: All right. At some point it would be nice, and I don't know if we have the right environment for this quite yet, to understand the impact of using PVCs, because there are more API calls and things like that in our control plane associated with VMs, especially VMs that have PVCs attached to them, than with just the container disk flow, because we're doing more informers and all that kind of stuff.
C: You can have a DataVolume where the source is this container disk and you put it on a stateful PVC. But if you just have a volume, just like you do in a VMI, a container disk just there, what happens behind the scenes is we create an ephemeral drive, or disk or whatever, and share some data across that.
C: Yeah, so at some point, when we're talking about the VM use case, understanding the scale of VMs and not VMIs, we need to start introducing PVCs there. But the problem with that is that we begin to be throttled by our storage provider: how quickly it can provision PVCs and things like that. That's going to be a new graph, I guess, or something like that, for us to understand what's actually within our control, because we don't have control over how quickly storage is provisioned for a VM as part of the start flow.
C: That's what we were discussing: we're not actually using persistent storage for these load tests today, and that's what I was pulling out, that we need to start doing that in order to understand the impact of it. What's happening today is we're using ephemeral storage, so it's like local storage on the node that's being provisioned on demand for these virtual machines as they land on the node, which helps.
F
There's
one
more
thing
I
noticed
in
this
graph
and
on
the
logs,
but
it's
probably
out
of
scope
of
this
discussion.
It
was
that
this
web
book
actually
just
monitors
for
virtual
machines,
but
if
you've
seen
the
template
related
to
cpu
graph,
the
the
second,
his
the
second
hill,
we
see
just
left
to
it,
yeah
that
was
the
right
to
it.
Sorry,
one
more
right
yeah,
so
that
that
one
was
when
when
the
vmis
were
actually
getting
created
out
of
the
vm
objects.
F: So I saw in the logs for the template validator that there are entries two times. I'm probably missing something, but it can also be possible that when the VM status is changing, there is another request coming in to validate the VM. I'm not sure, but I think that's what's happening.
C: That makes sense. So when the VMIs are launched, it's mutating the VM status, which means it's going through that same webhook every time we do that.
J: Are we talking about VM or VMI? VM. So the VM has the status subresource enabled; you can write at both locations, but not at the same time from a webhook. So you can use the status subresource endpoint to modify it with a patch, for instance, or the spec. But if you, for instance, do an update and modify both, it will only update the spec.
H: But for next time, when you run this test, can you keep the Grafana board running for a bit longer? I would be curious whether virt-controller memory goes down to its normal level as well.
A: Yeah, it's making its way down, but yeah, it would be nice to see. I mean, the thing that occurs to me, like you mentioned, is that this peak here lines up with when we do the deletes. What is going on? I don't know. It'd be good; I think we said we have a few tests.

C: To see what is going on here, I'll try to get the pprof thing merged soon, so then you could actually, if you wanted, do the pprof profile during that hump and maybe get some interesting results.
F: Also, does it make sense for us to have a metric for VMs and not just VMIs? I could not find one metric for the number of VMs in the cluster.
A: Yeah, it's a good topic to discuss, because I think it's been mentioned before, and I think there are definitely some use cases, and you're kind of alluding to some of them here. Like, you can imagine some latency metric: what is the latency between when you set something to running and, maybe, when the VMI is running? There's a lot of areas here; there's API calls, there's other things that are happening.

A: Even the count of the number of VMs, that would also, I think, be useful. I think it's in this area of perf and scale, so I mean, I think this is a good area that we can kind of dig into.
C: Right now, what we have is a breakdown of everything that transitions after the VMI is actually posted. What we're lacking, like you're saying, Ryan, is everything that occurs before that VMI is posted. So when we're using a VM, we don't have any sort of visibility into how long, for example, the storage provisioning takes, or just going from a VM being posted to actually posting the VMI; we don't know what that latency is.
H: But just for the number of resources, of any resource, you should be able to get that with kube-state-metrics. I don't know if they are available in OpenShift somehow, or what you need to do to get them, but I think that's the tool to go to if you just want to know how many objects there are.
A: I'll take a look, okay, yeah. But what about, like we do right now, where we count the number of VMIs in a state? I don't know what other states there are, but I mean, I think there's paused, right; for VMs there's running; I don't know what else there is. Maybe that could be valuable, so there might be some other ones too.
C: It mirrors a few of the values; I think maybe some of the conditions are mirrored, but it definitely does not have the same granularity that the VMI does. There's a lot more on the VMI than there is on the VM.
C: Is there anything else people want to bring up? Well, so, do we have any progress update on the periodics or anything like that? I know there's a task waiting on me to integrate things like perf audit and stuff like that. I'm just curious whether any work's been done in that area that we should review over the past week.
C
I
know
marcelo
initiated
some
of
the
the
first
periodic
things
like
that,
so
if
no
work's
been
done,
we're
done
for
topic
for
that
topic.
I
just
want
to
make
sure
I
wasn't
going
to
stomp
on
anybody
if
I
end
up
working
on
that
a
little
bit
in
the
next
few
days.
Okay,.
A: Okay, well, why don't we take a few minutes. Let's talk about some more metrics, because I mean, I've heard a few here just being thrown out, not just the VM metrics.

A: I also heard the volume creation one; I heard that one as well. So we can enumerate some of these. What are valuable metrics for VMs? Like, I don't know, count. What's the data we want to get? If we're reading a metric, what do we want to get from it?
C: What are some ideas? So one of the things that's kind of difficult about the VM is that we don't have a phase, so we don't have a clear transition between different states like we do with the VMI. We have to look at it a little bit differently; there's not a clear delineation between all the different possibilities. I guess what I'm getting at is that the thing I care about is understanding how long it takes for storage provisioning to occur before launching the VMI.
A
So,
what's
the
difference
between
the
storage
provisioning
with
a
vm
metric
than
just
from
the
vmi
like?
Would
this
be
like?
Would
this
be
a
something
that's
specific
to
vms,
or
is
this
like
this?
Would.
C
Be
it's
specific
to
the
ends
so
there's
a
vm
feature
similar
okay,
so
think
about
a
stateful
set
with
the
stateful
step.
Today
you
have
a
pvc
or
persistent
volume,
claim
template
and
you
specify
what
you,
what
kind
of
storage
you
want,
every
one
of
these
replicas
to
have
and
when
a
new
replica
comes
online,
new
storage
is
provisioned,
that's
specific!
For
that
replica
and
a
virtual
machine.
C: ...the creation and population of the PVC happens before the VM starts, or before the VMI is posted. So that's exercising a lot more than our control plane: there's the CDI control plane, because that's the thing that's actually going to be populating the PVC, and the storage provider itself, so the CSI, the underlying CSI storage class, that's actually creating the network storage for us, and however quickly it can do whatever is involved with populating the storage.
A: Okay, what other things do people have in mind?
B
In
mind,
the
desk
is
the
amount
of
iops
each
vm
and
the
total
we
have
on
the
the
server
or
our
node.
B
We
do
a
tricky
things
to
make
it
happen
on
our
solution,
but
you
can
understand.
I
would
like
to
better
understand.
We
can
get
these
metrics
for
measure
not
only
a
single
vm,
but
the
average
of
all
vms.
In
this
this
node,
the
total
iops
I'm
getting
okay.
B
The
other
thing
that
the
desks
use
we
use
gpu
on
on
the
guest
vms
and
that's
why
we
better,
we
need
to
measure
the
amount
of
of
processing
is,
is
having,
and
the
problem
we
find
is
also.
We
measure
the
temperature
temperature
of
the
gpu
on
on
on
the
node.
Also,
for
you
know,
in
the
current
version
that
is
without
could
be
worth
before.
You
know.
A
You
might
get
that
well
for
the
node.
I
mean
the
one
that
I
at
least
that
interests
me
like.
I
guess,
with
what
you
said
is
like
the
like
device
plug-in
latency.
A
This
would
rely,
though,
on
something
external
like
we
had.
You
have
to
have
like
if
you're,
using
the
videos
you're
using
the
nvidia's
gpu
device
plug-in
like
you
you're,
relying
on
that
latest.
I
mean
that
one
we
could.
I
mean
that
could
be
something
that
I
guess
we
record
in
here,
like
I'm
just
trying
to
think
I
mean,
because
the
device
plug-in
can
also
explode.
It's
on
metrics.
B: Let me give you the code for what we are using, just one second.
A
Yeah
I
mean,
I
think,
well
I'll
write
it
down,
because
it
is
so
like
I
mean
it's
something
to
consider,
because,
like
device
plug-in,
I
mean
there
in
terms
of
like
if
we're
measuring
this.
This
also
isn't
specific
to
vms
I
mean
I
think,
but
this
is
something
where
I
mean
we
would
want
to
know
in
terms
of
like
our
total
performance.
It
is
something
that
can
certainly
affect
the
you
know
the
gauge
like
push
it
one
way
or
another
if
we're
slow.
A
So
it
is
good
to
know,
I
guess,
trying
to
find
the
right
way
to
measuring
it
to
measure
it
is
the
challenge
so,
but
I'll
write
it
down
here.
So
there's
something
I
had
in
the
other
one
too.
I
think
it
makes
sense
to
something
just
finding
the
right
place.
A
Yes,
he
said
gpu
temperature,
something
we
can
also
look
at.
B: It's on the VM level, because you give a profile, and this profile has that temperature. For instance, the node has like four GPUs; which one is this VM using, for you to understand.
A
Yeah,
the
challenge
is
to
like
you
know,
saying
like:
where
are
you
going
to
get
this
information
like?
It's
probably
insist
this.
Is
it
so
like
that's
where,
like
you're,
not
going
to
get
that
with
vert
launcher
like
that,
that's
where
it's
a
little
bit
above
I
don't
know.
I
like
this
is
where
I'm
thinking
like
the
device
plug-in
is
going
to
have
access
to
some
of
this.
A
So
it
could
be
that
that's
the
place
where
this
goes,
but
I
mean
I
think
we
need
to
explore
this
a
little
more
because
I'm
not
so
sure
if
it.
If
we
know
the
whole
picture
here,
great
go
on
okay.
So
what
are
the
other?
So
what
some
other
ideas?
What
else
do
we
want
to
know
about
vmetrics?
A
I
I
mentioned
like
well,
I
heard,
let's
all
say
some
count
and
I
kind
of
expanded
on
the
idea
of
count
like
do
we
like
what
are
we?
How
would
I
do
what's
the
right
way
to
describe
this
like?
Is
it
just
the
number
of
vms
like?
Is
it
like?
Does
it
make
sense
to
break
down
by
like
the
number
that
are
running?
Remember
they're
not
running!
Is
that,
like
that,
be
like
a
like
a
histogram
or
something?
Is
that
a
metric?
We
can
do
that
valuable.
A: Is this like a status? How is running or not running displayed on VMs?
C: So I think there's three buckets here: there's provisioning, there's running, and then there's shutting down. Okay.
A: I wasn't sure if that was a phase, because for VMs, well, there isn't specifically a phase, but I know you could pause, and I wasn't sure if that gets reflected on the VM as well. So I guess it sounds like it's not solely running.

A: I mean, what are the current conditions that we post on the VM, and do we have a list of them?
C: The best thing we have, and it's not meant for this purpose, but I'll say it, is something called a virtual machine printable status. What it does is look at the virtual machine status and aggregate all the conditions to try to come up with a human-readable explanation for the state of the virtual machine. So that's going to be, for example, looking at all the conditions and saying this virtual machine is stopped, or this virtual machine is provisioning, or starting, or running, or paused, or whatever.
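A toy version of that aggregation, bucketing VMs into a printable-style status and counting them the way a metric might; the precedence rules here are invented for illustration and are much simpler than KubeVirt's actual logic:

```python
# Toy condition aggregation in the spirit of the printable status described
# above. The precedence rules below are illustrative, not KubeVirt's real ones.
def printable_status(vm: dict) -> str:
    status = vm.get("status", {})
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    if not vm.get("spec", {}).get("running", False):
        return "Stopped"
    if conditions.get("Paused") == "True":
        return "Paused"
    if status.get("created") and status.get("ready"):
        return "Running"
    if status.get("created"):
        return "Starting"
    return "Provisioning"

def count_by_status(vms: list) -> dict:
    """Bucket VMs by printable status: the shape a count metric could take."""
    counts: dict = {}
    for vm in vms:
        key = printable_status(vm)
        counts[key] = counts.get(key, 0) + 1
    return counts
```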
C: I think when we're looking at creating a metric, we're probably going to want to kind of, perhaps, replicate some of this to understand what's occurring with the virtual machine. But it's not an official, it's not a stable, necessarily, way of doing that, because it's meant to convey information to the user, not necessarily programmatically to something that's consuming the VMI status directly.
A: Yeah, I mean, I'm not familiar with all the conditions, so it sounds like there's a lot, okay, that we could do on this, based on what people want. I mean, to me, what's the simplest? If we were to pick a few simple ones, these sound pretty reasonable: stopped, shutting down, running, provisioning.

A: Okay, so this would be: stopped, shutting down, starting...

C: That makes sense, starting.
G: What I'm worried about is that if I'm looking at it and I want to see, like, a pie chart, I want to see my VMs by status and how many I have in each state. And then I go to the metric with the phases: if I look at running, will the number that I see when I aggregate on the phase metric be the same number, or will it also include starting and shutting down?