From YouTube: SIG - Performance and scale 2021-11-04
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.xo1a3u7axxkr
A
Okay, welcome everyone to SIG Scale. It's November 4th; we're going to paste the link to the documents in the chat shortly. Okay, add yourself as an attendee, please, and feel free to add agenda items while we're talking about them. Okay, let's start with number one, the periodic job threshold. We have some results here; I want to go over them and get some thoughts on what we can do with this. So the change that was made is that now we're...
A
We actually have the audit tool being built in the periodic job. It looks like it runs every day, and here's actually a link to this job that's running, so let's take a look at some of the results together here.
A
Okay, so what we'll want to see is the results at the bottom. So this is the audit tool running, and this is the density test that you added, Marcelo, and these are the results that are coming back. I think, in terms of thresholds, here are the three areas that we care about: the p50, the p95 and the p99. There are some other values here as well.
A
So
here
I'll
look
at
one
more
of
these,
so
I
did
11
one.
Let's
look
at
11
2.,
so
we
get
it
so
another
picture
of
this
roughly
what
it
comes
out
to
25,
29
and
29,
and
so
by
comparison,
we've
got
23,
29
35,
so
you
know
roughly
I'm
fairly
close
on
some
of
these
I
mean,
I
guess
the
p99
is
a
little
bit
again.
It's
kind
of
what
we
expect
is
a
little
has
a
little
more
variation.
A
So I guess the question, what I want to discuss, is sort of how we want to look at this, how we want to take thresholds, because we sort of need to. We have a few data points here and we want to decide.
A
You know, what we want to use and how we want to measure, and come up with a way that we think is reasonable, that we can report on or gate on. So what do people think? The p50 seems like it will have a smaller standard deviation, the p95 will have a little bit larger, and the p99, I think, is going to show a lot more variation. So what do people think, I guess?
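Note: the percentiles being discussed are just order statistics over the per-run latency samples, so with roughly 100 VMIs per run the p99 rests on a single sample, which is why it swings the most. A minimal sketch in Go (not the audit tool's actual code; the sample values are illustrative):

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the nearest-rank p-th percentile of the samples.
func percentile(samples []time.Duration, p float64) time.Duration {
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	idx := int(float64(len(sorted)-1) * p / 100.0)
	return sorted[idx]
}

func main() {
	// Illustrative creation-to-running latencies from one run.
	samples := []time.Duration{23 * time.Second, 25 * time.Second, 29 * time.Second, 35 * time.Second}
	for _, p := range []float64{50, 95, 99} {
		fmt.Printf("p%.0f = %v\n", p, percentile(samples, p))
	}
}
```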
C
I don't think I would measure the p99. I couldn't... I don't know, yeah. Maybe that would be good, yeah; that makes immediate sense to me.
E
Which could indicate a problem. If... why, like, why?
A
Yeah, so that was another thing. These values do vary, as you can see. So here, first, I don't know why they're decimals; like create pods count, right, that seems like it should be a whole number. But there's quite a bit of variation; here's create pods count p50, we're seeing 17.
G
It looks like they were created and the test didn't fail, right? So maybe something else was being created as well. Like, you know, are the pod counts just the number of all pods in the cluster, or only the VMI pods?
G
However, I wouldn't expect, like, failing pod creates, you know, pod creations in it, yeah, so.
F
Could we have an error rate in the audit tool as well, where I could see just how many API failures are occurring, like non-200 return codes?
A
Yeah, so we have some of this. Like, we can tell the 404s; that's already a metric. I think I remember you, Marcelo, showed a bunch of metrics on this, so we can get this, right? So the failure counts would be, like, the 404s and whatever the 400s.
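Note: for reference, one way a tool could count API failures like the 404s mentioned here, assuming the cluster Prometheus is reachable and scrapes the standard apiserver_request_total metric; the address, the resource filter and the 15m window below are assumptions for illustration:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Assumed Prometheus address; in CI this would come from the cluster.
	client, err := api.NewClient(api.Config{Address: "http://prometheus.example:9090"})
	if err != nil {
		panic(err)
	}
	promAPI := promv1.NewAPI(client)
	// 4xx responses against VMI endpoints over an illustrative 15m test window.
	query := `sum(increase(apiserver_request_total{code=~"4..",resource="virtualmachineinstances"}[15m]))`
	val, warnings, err := promAPI.Query(context.Background(), query, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println("4xx VMI API requests in window:", val)
}
```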
A
Anything that we see as a count there, that's a failure. And then what else? So we have delete pod counts. So can we get... I mean, we get the events here, right? So we could do, like, a transition to a failure state, or a failed-or-succeeded state.
G
Is it a total or an average? Because, like, delete pod counts: only five pods were deleted? Maybe it's not a total; that's why we see a big variation here. Is it like an average? Because if the system is very slow now, then we'd expect less creation per minute, you know, something like that.
F
We're not going to be deleting the pods here; what happens is we delete the VMIs.
F
Yeah, how are the pods, how are the VMIs getting cleaned up in this test, then? Are we running it to... They might not be; that's kind of interesting. We're just letting them all get to a running state and then exiting the test, and they just keep running, and then the cluster gets torn down.
A
Okay, that's another thing to look at. So we have... I don't know, I just see a bunch of different things. This one has a delete pod count; obviously these don't. I don't know what this means, why we would be seeing them here.
B
Okay, yeah, I mean, I don't know when the test stops, but it could be anything, and David said that some pods are failing and getting recreated.
A
Okay, all right, so let me add: we can do this, right? We can count the number of VMIs in a specific state. We can get that, because we'll have, maybe, the phase transition times; we can do a count, yeah. We have a count, I think, in one of the metrics. We can see how many are in failed or succeeded at this point.
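Note: a rough sketch of the "count VMIs per phase" idea, assuming cluster access via a local kubeconfig; it lists VirtualMachineInstances with the client-go dynamic client and tallies status.phase:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	vmiGVR := schema.GroupVersionResource{Group: "kubevirt.io", Version: "v1", Resource: "virtualmachineinstances"}
	// List VMIs across all namespaces and tally their phases.
	list, err := dyn.Resource(vmiGVR).List(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	phases := map[string]int{}
	for _, item := range list.Items {
		phase, _, _ := unstructured.NestedString(item.Object, "status", "phase")
		phases[phase]++ // e.g. Running, Succeeded, Failed
	}
	fmt.Println(phases)
}
```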
A
So, get endpoint counts. Okay, so these are, like, a third of the API requests. Patch, okay: patch virtual machine instances count, 99. That seems pretty close to one-to-one, right? Because we're supposed to have... this is supposed to be 100. Interesting.
F
I think I did some optimization on one of the controllers to update less. Maybe I didn't get to it; maybe I didn't do that at the node level, or maybe I didn't do it at the cluster level. I need to introduce a similar change, so that we have an expectation that we're not going to try to update a virtual machine until we have seen that the previous update has occurred.
F
Yeah, but I don't know how big of a deal that really was, but it's possible that we're seeing something similar.
A
Yeah, I think one of the big mysteries here is what shows up here and what's leading to some of these numbers being a little bit slower than the others. Maybe it's that we're having some failures with the VMIs, yeah. I think, to me, like, yeah.
A
Probably the best next step here is: let's get some more data on how many failed VMIs there are, how many are in the failed state, and that might give us a little more insight into what's going on here. And then maybe, if we could start... I mean, I think, to me, like, if we're...
A
If we get this much variation... I mean, this was a lot. We've only run this eight or nine times or something like that; I'd expect a little bit, but I wouldn't expect to see this much variation that quickly. So maybe we need to fix that first: let's find out what's going on there, and then we can settle in on some of these numbers. Yeah, and the other thing, so, to get back...
A
To the other point, I think maybe we'll just have this discussion real quick: p50, p95, p99. To me it seems like the p99 is going to give us a lot of variation; maybe we just kind of...
A
Yeah, what I was thinking, I mean, can we do something like: if the p95 is off, we send a message about it and say, hey, you might want to rerun this test or something, but if the p50 is slow, then we fail? Is there a way to do that?
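Note: a sketch of the two-tier gating idea just described; nothing like this exists in the job yet, and the threshold values are placeholders. Warn on a p95 breach, fail on a p50 breach:

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// gate warns on a p95 breach and fails the job on a p50 breach.
func gate(p50, p95 time.Duration) {
	const (
		p50Max = 30 * time.Second // placeholder threshold
		p95Max = 45 * time.Second // placeholder threshold
	)
	if p95 > p95Max {
		fmt.Printf("WARNING: p95 %v exceeds %v; consider rerunning the test\n", p95, p95Max)
	}
	if p50 > p50Max {
		fmt.Printf("FAIL: p50 %v exceeds %v\n", p50, p50Max)
		os.Exit(1)
	}
}

func main() {
	gate(25*time.Second, 47*time.Second) // warns, but passes
}
```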
G
You know, set an alert. Because right now, since the cluster is shared, it's expected to have big variations. For now, I would say, for visualization it's very nice to have the test, and then we check and see; and then, when we have the test that is isolated, we can set alerts. Otherwise it will be maybe too much, you know, it will start...
F
Let's say creates are failing because other things are causing rate limiting and other problems. It's hard to say, to choose.
A
But yeah, it's not clear. That was another thing: it would be nice to have the breakdown of, like we talked about, the failed ones. I mean, it would be nice to know, like, okay, this was expected; you know, we have the 100 density test, but yeah, I don't know what's happening up here.
F
Maybe the test isn't validating all the VMIs that come online before exiting, or something. Where can we see this test? Let's say the test is the density one.
G
It does not delete, okay; so it leaves that for the cleaner. That's why we don't see the deletes sometimes.
G
It's fine for now, isn't it? We are not checking.
B
Just for performance, you have to be careful, because there are big variations on the delete times based on kubelet timings. So in end-to-end tests we are normally not waiting until they are really all gone; they get deleted and they will disappear within a few seconds, for instance.
F
We would expect to see the deleted VMIs in the API, yeah.
B
It's really more a matter of what the test is supposed to do right now. If it's just about the creates, we can ignore the deletes right now. But, I mean, yeah, it's still interesting here: if you don't give the audit tool a good timestamp, it may catch some deletes of the cleanup, but also maybe not. So it could really be that the cluster had another issue and five pods got caught in it.
F
Actually, a minute after the test. Because we get the start time, we run the performance test... the functional tests, then the performance test. I can see the after clause waits 30 seconds, but then the script that we're actually running also waits 30 seconds, yeah.
B
Maybe we are packing the pods tighter now than before.
A
How are we, how are you calculating this look-back here? So we're running it a minute later, so we've let the metrics reach Prometheus, and then we're looking...
F
Back at that command that I'm running, perfscale-audit, and maybe we can, yeah.
F
I can make the time window... I can budget a little bit and make it go back a little bit further.
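Note: for illustration, the look-back fix being discussed amounts to padding the metrics query window on both sides so late-arriving samples (and cleanup activity just outside the test) are included or excluded deliberately; the one-minute pad is a placeholder:

```go
package main

import (
	"fmt"
	"time"
)

// padWindow widens [start, end] by pad on both sides so late-arriving
// samples still land inside the queried range.
func padWindow(start, end time.Time, pad time.Duration) (time.Time, time.Time) {
	return start.Add(-pad), end.Add(pad)
}

func main() {
	end := time.Now()
	start := end.Add(-10 * time.Minute) // illustrative test duration
	s, e := padWindow(start, end, time.Minute)
	fmt.Println("query window:", s.Format(time.RFC3339), "->", e.Format(time.RFC3339))
}
```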
J
Let's see if that makes a difference with the look-back.
G
We shouldn't have a problem here, should we, if that's what's happening? It shouldn't, yeah.
F
We need a stable output before we create thresholds, and if we are getting this, that's really curious; it indicates that we need to investigate what's going on. Like, we don't even have a stable enough output to set thresholds, and maybe our controllers are messed up.
A
While you work on that dedicated environment, silo, let's see what we can find. Okay, all right, let's move to the next topic: tracing. So I created a pull request for this.
A
It creates a timestamp, and then every time you want to record some sort of event, you can add a step in there, and it takes an additional timestamp. Then, finally, you can stop it, and it takes a stop timestamp, and then you can log at that point if the tracked duration is longer than a set amount of time. So what I did is: this pull request adds a trace that will output to the logs if the work queue takes longer than a second.
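Note: the pull request itself isn't reproduced in the notes, but the pattern described matches the k8s.io/utils/trace package the Kubernetes API server uses; a minimal sketch:

```go
package main

import (
	"time"

	utiltrace "k8s.io/utils/trace"
)

func processWorkItem(key string) {
	t := utiltrace.New("processing work queue item", utiltrace.Field{Key: "key", Value: key})
	// Only written to the logs when the whole item took longer than 1s;
	// under the threshold the trace is simply discarded.
	defer t.LogIfLong(time.Second)

	t.Step("sync done")          // records a timestamp at this point
	t.Step("update status done") // and another one here
}

func main() {
	processWorkItem("default/testvmi")
}
```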
A
So
all
I
did
here
is,
I
just
had
to
sleep
for
one
second,
so
you
can
actually
kind
of
see
roughly
how
long
the
work
you
takes
for
a
lot
of
these,
it's
fairly
short,
it's
milliseconds,
and
so
I
kind
of
wanted
to
get
some
more
feedback
on
this
and
dave.
I
know
you
want
to
comment
so
I
wanted
to
talk
about
that
as
well.
It's
like
do
you
want
to
talk
about
kind
of
where
your
stance
is
with
this
and
what
kind
of
thoughts
about
like
your
concerns
about
it.
F
Sure. So I had two primary comments. The first one, before I get into the whole performance and log verbosity thing, was about what we're actually measuring: it looked like in the code you were measuring when we were rate limited as well, considering that as part of the same span. Did we address that?
F
All right, so we can move on from that. The thing I'm concerned about is enabling this tracing by default, just always on. A lot of that concern is simply an unknown for me: I don't know the performance impact of tracing, especially at scale, like how much more memory and CPU it might require of our components, and I also don't know how much logging spam this could potentially cause if certain issues arise.
F
So,
if
we're
just
hitting
this
all
the
time
in
an
unexpected
way,
like
a
certain
condition,
gets
hit
and
all
of
a
sudden
we're
just
spamming
the
logs
with
traces
every
time
a
a
key
is
queued,
then
that's
not
great
either.
So
I
was
trying
to
come
up
with
an
idea
of
how
maybe
we
could
enable
tracing
dynamically
as
like
a
debug
tool,
and
the
idea
that
came
to
mind
was
maybe
tying
it
to
log
verbosity.
F
Like, you've made an abstraction around tracing; we could just create a structure, an abstraction around that trace object, and make it a no-op until log verbosity hits a certain level, so we just don't do anything.
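Note: a hedged sketch of that suggestion, not anything that exists in the tree: a wrapper that leaves the trace object nil (a no-op) unless klog verbosity is at or above some level; the level 4 here is arbitrary:

```go
package main

import (
	"time"

	"k8s.io/klog/v2"
	utiltrace "k8s.io/utils/trace"
)

type gatedTrace struct {
	t *utiltrace.Trace // nil when tracing is disabled
}

func newGatedTrace(name string) *gatedTrace {
	if !klog.V(4).Enabled() {
		return &gatedTrace{} // no-op: verbosity too low
	}
	return &gatedTrace{t: utiltrace.New(name)}
}

func (g *gatedTrace) Step(msg string) {
	if g.t != nil {
		g.t.Step(msg)
	}
}

func (g *gatedTrace) LogIfLong(threshold time.Duration) {
	if g.t != nil {
		g.t.LogIfLong(threshold)
	}
}

func main() {
	tr := newGatedTrace("sync")
	defer tr.LogIfLong(time.Second)
	tr.Step("fetched object")
}
```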
G
Avoid tracing too many things for now; if we are interested in something, try to keep it as minimal as possible, also so we don't increase the code too much, you know. Otherwise it starts to be too big and hard to control.
A
Yeah, so I hear the concern. So, to give some more context on where this came from: I was looking at something in the Kubernetes API server, and the API server has this on, and I can see it live; it passes by all the time. And so, in terms of performance, looking at the library, all...
A
It's
doing,
is
it's
it's
it's
taking
this
the
time
stamps,
in
which
case
we
have.
We
have
our
start
time
stamp.
So
we
have
one
and
then
I
think
I
have
two
steps
in
there.
So
two
steps
and
then
the
stop
and
then
so
we
just
do
a
math
operation
to
subtract
the
time
steps
from
that
from
that
time
period
and
if
we
don't
go
over
the
threshold,
which
I
said
at
one
second,
we
just
we
throw
it
out.
So
it's
so
by
throwing
out.
Basically
we
actually
can
log
it
it's
just
we
don't.
A
We
don't
actually
increase
the
the
basically,
the
logging
level
gets
increased
when
you
go
over
the
thresholds.
So
we
so
you
could
log
it,
but
obviously
we
wouldn't
want
to,
because
when
we
when
we
have
expected
behavior
when
we're
under
the
threshold,
so
the
basic
idea
is
that
it
does
that
it
records
those
time
stamps
and
it
just
does
a
math
operation
so
and
we're
kind
of
we're
doing
these
sequentially
right.
These
keys
are
one
at
a
time.
It's
not
it's.
A
So in terms of performance it's not massive, and we have three steps; I mean, I wouldn't expect to see anything at all for performance on this. And then logging-wise, this is maybe where we might differ: from my perspective, if we're running into a problem with performance, like, I have set it at one second; if our work queue is taking longer than one second, I really want to know that. And with this enabled, I would see that pop up. But in every other case, which is probably 99% of the time, I'm not going to see anything at all, because my work queue times are milliseconds.
A
Like, I mean, this is 18 milliseconds, this is 24, and these are tiny numbers that will never make their way out. So, I guess, my perspective on this is that I think we could enable it, and I don't think it would be a drag on performance, and I don't think it would be a drag on logs either. And in the case that it's bad, I think it's a good thing to log.
A
It's almost like an error: if we're slow, we want to know. That's kind of how I'm looking at it, instead of as, you know, spamming the logs. It's really like: okay, we've got a problem here, we need to do something.
F
I don't disagree; I'm still nervous about the unknown. So do we have a precedent? This is used in Kubernetes, is my understanding. Maybe we can gain an understanding of whether it's always on in Kubernetes and what's measured, and let that kind of guide what we feel comfortable with. So we could see how it's utilized in other production environments, and if it's on, then okay, it's proven that at least it doesn't cause any problems.
A
Yeah, so we have... well, it's kind of not really clear to me, to be honest. Because we see that we had the slow work queue times, and those have a ton of variation, and yeah, I mean, we do have some. But it's really not clear, you know, what's slow, and this is actually kind of what I'd want to know; like, I have no idea which of these work...
A
Cues
were
slow
and
if
we
do
hit
one
you
know
like
in
the
case
of
what
we
saw
in
prometheus,
like
we
see
a
few
of
them,
I
mean
it
would
be
interesting
to
know
about
them
and
have
them
in
the
log,
but
I
mean
it
doesn't
happen.
Often
from
from
my
testing.
I
don't
see
this
happen,
but,
and
we
can
even
see
I
mean
you
can
see
it
in
the
metrics
too,
like
we
see
a
few
very
slow
ones,
but
it
doesn't
happen.
A
F
I think both, for sure. The metric is helpful to gain insights, I think, for a high-level view of what's happening, but I think the logs are really helpful to understand: at what point did this occur, why did it occur, what exact key was it, and where was it in, like, the flow.
A
Yeah, and what's handy is this step function: it will tell you the time between steps. And even though this says it changes a lot of files, there's really not a lot of code to this. Basically, I'm just adding into the key parts of this code, which is, I think... I have it around here, that's a little farther down.
A
I
think
it's
around
the
updates
or
the
the
sync
and
the
update
functions,
I
think,
are
the
only
two
update
status,
yeah
update
status
and
then
the
other
one
I
think
is
in
sync
and
that's
it,
and
so
we
would
tell
we'd
be
able
to
know
like
when
what
was
slow
like
we
could
tell
okay
update
status
was,
though,
which
could
be
like
well.
Maybe
it
was
an
api
call
in
here
like
took
longer
than
a
second
or
something
or
whatever
it
was.
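Note: to make the step placement concrete, a self-contained sketch; the function names are illustrative, not the exact KubeVirt controller code:

```go
package main

import (
	"time"

	utiltrace "k8s.io/utils/trace"
)

type controller struct{}

func (c *controller) sync(key string) error         { return nil } // stub
func (c *controller) updateStatus(key string) error { return nil } // stub

// execute shows where the two steps described above would sit; a slow
// "updateStatus done" step would point at, say, a slow API call in between.
func (c *controller) execute(key string) error {
	t := utiltrace.New("execute", utiltrace.Field{Key: "key", Value: key})
	defer t.LogIfLong(time.Second)

	if err := c.sync(key); err != nil {
		return err
	}
	t.Step("sync done")

	if err := c.updateStatus(key); err != nil {
		return err
	}
	t.Step("updateStatus done")
	return nil
}

func main() { _ = (&controller{}).execute("default/testvmi") }
```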
A
You can put anything in: basically, each step will take timestamps, the difference between each of the steps, and so you can get information that you can pass during this time period, like, you know, what happened, if you want to get specific. I've only added two steps here, just around the general functions, but you can get more specific with them.
F
If it ends up being something we want to tie into some sort of dynamic, like verbosity or whatever, I can help. That might sound kind of intimidating, but it's actually really simple. I can show you how to do it, and it's going to be an easy thing to tap into: we have a lot of APIs and ways of getting that information that make it not a burden at all.
A
Sure,
okay,
yeah
sounds
good.
All
right,
we'll
follow
up
with
this
one
and
get
out
then
and
yeah
and
yeah.
So
this
is
just
one.
This
is
just
for
controller.
So
what
I'll,
once
whatever
we
decide
to
get
this
and
I'll
do
each
of
the
the
work
cues
and
whatever
else
after
okay
sounds
good
all
right,
we
don't
have
any
more
topics
david.
Did
you
want
to
talk
about
the
the
virtual
machine
pools,
or
do
you
want
to
save
that
from
a
different
time.
F
I posted it, so there's a PR now for virtual machine pools, and what I did was: I just implemented all of the default behavior. So I looked at all the different tunings we had, and if somebody created a virtual machine pool with our design and set no tunings other than how many replicas they want, that's essentially what I've implemented. Then we can go in and begin adding all the more advanced tunings in the future; I just didn't want to overwhelm this one PR with too much. So that's how I broke it up.
F
I
think
it's
pretty
close
to
what
we
want.
Ryan.
F
You
had
a
good
point
about
not
really
having
a
use
case
for
attach,
so
we
we
want
to
be
able
to
detach
virtual
machines
from
this
pool,
but
attaching
them
is
I
I
can't
think
of
a
reason
why
and
I
think,
might
actually
cause
some
problems
so
I'll
probably
either
disable
that
behavior
or
make
it
something
that
users
wouldn't
do,
but
there
might
be
a
case
where
somebody
orphan
deletes
a
virtual
machine
pool
and
then
recreates
it
where
those
previous
one
virtual
machines
would
get
adopted
again.
A
Yeah, so my perspective on this was that I had some concerns with attach, because I was just kind of looking at deployments and the behavior they have: when you detach and reattach a pod from a deployment, it kicks out another one when you reattach, and that's not the behavior that I think you have here, but...
F
Don't
explicitly
do
that,
but
it
would
be
kicked
out
due
to
the
replica
count,
not
being
it
would
be,
plus
one
because
you
attach
something
so
something's
going
to
get
removed.
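Note: a small sketch of why attaching without bookkeeping evicts something: the reconcile loop only compares the VMs matching the selector against spec.replicas, so an extra attached VM leaves one too many. The selection shown is arbitrary, as in the default behavior described above:

```go
package main

import "fmt"

// reconcile returns the VMs to delete when more VMs match the pool's
// selector than spec.replicas allows; selection here is arbitrary.
func reconcile(replicas int, matching []string) []string {
	if excess := len(matching) - replicas; excess > 0 {
		return matching[:excess]
	}
	return nil
}

func main() {
	// Attaching "vm-9" without bumping replicas leaves one too many.
	fmt.Println(reconcile(3, []string{"vm-0", "vm-1", "vm-2", "vm-9"}))
}
```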
A
Right, well, that's what I was going to say: how are we going to solve that? Because we're in this weird state. What made me think of this is: okay, let's say we did this; we have these stateful objects, you know. Now how do we kick one out, like, you know...
A
What
do
we
do
like
it's
almost
concerning
like
and
there's
also-
and
I
can
also
think
of
a
lot
of
mistakes
could
happen
this
way
and
and
like
we're,
we're
attaching
and
we
could
we
could
like.
Well,
I
don't
know
it
just
seems
risky
like
to
to
want
to
do
this
as
a
use
case.
I
don't
know
like
that.
That's
kind
of
where,
where.
F
It
was,
I
agree
and
that's
a
problem.
For
example,
aws
has
their
auto
scaling
groups,
which
is
similar
to
a
virtual
machine
pool,
even
though
it's
got
the
word
auto
scale
in
it
and
when
they
attach
something
to
an
autoscope
group,
they
actually
increment
the
replica
account
so
they're
they're
doing
bookkeeping
on
the
behalf
of
the
user
and
it's
possible
even
with
detach
they're,
doing
something
similar
where
they
decrement
the
replica
account
which
I'm
not
doing
for
detach.
F
If
you
detach
something,
then
it's
going
to
a
new
one's
going
to
get
spun
up
somewhere,
but
you
know
one
the
virtual
machine
detached
is.
You
know
yours
to
mess
around
with,
so
that's
debatable
as
well,
whether
we
it's
tough
because
we're
trying
to
run
this
line
between
what
the
kubernetes
ecosystem
is
doing,
which
is
what
I've
aligned
with
and
what
the
virtual
machine
ecosystem
is
doing,
which
is
a
little
different
and
less
standardized.
F
I
think
when
in
doubt,
probably
do
the
thing
that
the
kubernetes
ecosystem
is
doing
but
ensure
that
we
allow
the
flexibility
to
achieve
the
kinds
of
patterns
that
would
be
expected
in
the
virtual
machine
world
so,
for
example,
attach
and
detach
where
we
aren't
decrementing
or
incrementing.
The
replica
account
it's
possible.
Somebody
could
pause
their
virtual
machine
pool
attach
something
then
increment.
The
replica
account
themselves
and
everything
would
stay
stable,
but.
A
Yeah, I think maybe that's something... I think it makes sense what we said: maybe we start with the Kubernetes ecosystem's definition of this. To me, detach makes sense: detach, and then we replace per the replica count. And then in the virtualization world it's like we're going to detach almost with the intention of possibly bringing it back, which I think is maybe a little bit different behavior.
A
So
I
mean,
I
think,
and
then
which
could
be
we
could
enable.
I
mean
that's
that
could
be
enabled,
but
I
think
it
would
be
different.
I
think,
would
be
different
than
kind
of
this
different
approach
than
this,
which
is,
I
think,
just
attaches
and
detaches
and
like
we're
going
to
do
something
with
it
and
we're
going
to
just
replace.
F
I
agree
so
for
now
my
take
is
I'm
going
to
allow
vms
to
be
detached.
They
will
get
replaced,
but
you
know
you'll
still
hold
on
to
your
the
virtual
machine
that
you
detached,
and
I
will
not
implement,
attach
we'll
just
I'll
think
through
that
a
little
bit,
it's
possible
that
I
would
allow
adoption
of
previous
virtual
machines
if
something
like
a
virtual
machine
pool
got
deleted
and
recreated
like
orphan
deleted.
A
Okay,
one
of
the
thought
I
had
about
this
because
I
thought
this
was
really
a
neat
idea
to
use
the
label
selector.
One
of
the.
Let
me
see
if
I
can
go
open
this
up.
A
One
of
the
thoughts
I
had
was
so
the
label
selector
is
what
would
control
effectively
detach
so,
in
other
words
like
patch
here
this
permission
will
allow
us
to
detach
something
that
will
give
us
the
the
ability
to
do
this.
Well.
A
Yeah, oh okay, so you'd patch the VM, okay. But the point still stands, just maybe not with this object: if you have permissions to patch the VM object, you'd be able to detach; that's sort of our way in, okay. And so my thought was: okay, I mean, is that the way we'd want to go? Like, should detach be its own API resource? We're talking...
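Note: for illustration, "detach = patch the VM" could look like the following with client-go, removing the label the pool's selector matches; the label key and the VM name here are made up, and with RBAC whoever may patch VMs can therefore detach them:

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	dyn, err := dynamic.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	vmGVR := schema.GroupVersionResource{Group: "kubevirt.io", Version: "v1", Resource: "virtualmachines"}
	// JSON Patch removing the (hypothetical) pool-selector label; "/" in the
	// label key is escaped as "~1" per the JSON Pointer spec.
	patch := []byte(`[{"op": "remove", "path": "/metadata/labels/pool.kubevirt.io~1name"}]`)
	_, err = dyn.Resource(vmGVR).Namespace("default").
		Patch(context.Background(), "my-vm-3", types.JSONPatchType, patch, metav1.PatchOptions{})
	if err != nil {
		panic(err)
	}
}
```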
B
Yeah, both are possible; the underlying mechanism would be the same. One of the main reasons for this, and that's why the core components have it, is that you can just do these operations with kubectl on any type of resource. You don't need extra sub-resources or different types of objects; it's very easy.
B
Yeah, remove the label. One question regarding the detach, and you probably thought about this already: would I then, in practice, basically be skipping that index and creating another index?
F
So
what
happens
is
when
we
are
creating
virtual
machines,
I'm
looking
at
all
the
virtual
machines
in
that
namespace
and
if
a
virtual
machine
with
a
certain
name
like
I'm
indexing,
just
incrementing
if
it
exists,
then
I
skip
that
index.
So
the
same
thing.
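Note: a tiny sketch of that naming scheme (the "base-N" name format is assumed): walk indexes from zero and skip any name that already exists in the namespace:

```go
package main

import "fmt"

// nextName walks indexes from zero and skips any name that already exists,
// mirroring the create path described above.
func nextName(base string, existing map[string]bool) string {
	for i := 0; ; i++ {
		name := fmt.Sprintf("%s-%d", base, i)
		if !existing[name] {
			return name
		}
	}
}

func main() {
	existing := map[string]bool{"pool-0": true, "pool-1": true, "pool-3": true}
	fmt.Println(nextName("pool", existing)) // prints "pool-2"
}
```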
F
Let's say you had them at zero through nine, and you want five replicas now. Who knows which ones get picked; right now it's random, that was part of the default behavior. So you might have nine and zero existing in your pool with 3, 4 and 6, or something; there's just no correlation to the index.
F
Okay, it's possible this would make sense for our density test at some point, manipulating pools, yeah, but we'll see. I do want to begin, at some point in the density tests, testing the virtual machine object rather than just the VMI object; we're not there yet, and I don't want to derail what we're doing quite yet. But the idea of including persistent storage in this flow, I think, will be important to us at some point, and that's where the VM and the VM pool might be important for us.
F
Excellent, yeah. So it might even be a different density test entirely, one that has persistent, like, network storage attached to it, and then we begin wanting to know, for example, what happens when we try to start 100 virtual machines in this environment and we need to smart-clone 100 PVCs from the root disk. So we're measuring something outside of just our KubeVirt control plane and trying to understand the impact of storage on all these start times and stuff as well. But we're not there yet; that's kind of a future topic.
A
Cool, all right, we have two minutes left. Do we have any more topics, any final closing thoughts, or any more topics to bring up before we finish?
F
One
thought
I
just
had
when
we
look
at
one
reason:
the
p99
and
even
maybe
the
p95
creation
to
running
might
not
be
super
accurate.
Is
there
going
to
be
an
initial
pull
of
the
container
disk
and
that's
going
to
mean
that
one
virtual
machine
instance
takes
longer
than
all
the
subsequent
ones
on
that
node?
I
wonder
if
we
should
consider
that
somehow.
F
Maybe
roman,
do
you
know?
Well,
we
pre-cache
those
images.
Don't
we
on
keyboard?
Sorry,
I
I
got
distracted
for
a
moment.
That's
a
terrific!
So
for
container
disks
are
we
pre-caching
the
container
disks
on
we
are
on
key
for
ci.
I'm
pretty
sure
we
are
would
make
cluster
yeah
they're.
B
Let me see, with density... But if you use a dedicated cluster which is not using kubevirtci, you will have to do the pre-caching yourself; this only applies to kubevirtci clusters. So, but the dedicated...
F
Really simple: just make sure that before the test runs, if we're not using cluster-sync with one of our standard kubevirtci clusters, we pre-populate that image, pre-pull it, on every node. That's it; the one that you're using for the container disk.
F
Launcher
we'll
we
get
it,
don't
worry
about
that
one,
because
it's
a
sorry,
it's
a
init
container
for
vert
handler.
So
it
has
to
be
there
all
right.