From YouTube: SIG - Performance and scale 2021-08-19
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.lu45zu2oo32a
A: Okay, all right, welcome to SIG Scale. Today is August 19th. Let's start with the first item on the agenda: VMI-specific metrics. We had a discussion about this previously.
A: The discussion was about not having metrics that scale with the number of objects we create. That's important for a lot of reasons: we don't want to overwhelm Prometheus with a bunch of data, we want to be careful there since it can affect scale, and since we're relying significantly on metrics for our performance measurements, we don't want to overwhelm it.
A: So that's what this topic is about: finding areas where we'd want VMI-specific data, so we can identify a VMI and get some description of what's happening. I have some ideas around this; let's see if they make sense. For instance, right now we do a bunch of performance measurements: we report timestamps that record when a VMI enters each phase.
A: One thing that would be interesting is some sort of gauge that captures how many VMIs are in each phase. That would give us a general idea of what's going on in the cluster: if we're seeing a ton of VMIs stuck in Scheduling, and we already know there's slow performance there, and we're seeing a lot of them getting stuck...
A: You know, maybe we should investigate that. That's one idea. Another one is flagging VMIs that take longer than expected, given some threshold time per phase. Right now we're talking about thresholds anyway, and we're going to do this in CI.
A: Okay, sorry, all right: my internet dropped off again, so I'm dialed in from my phone. All right, hold on, let me see if we can cover this here.
A: Kevin, let me see if I can, okay, migrating host to you, Kevin. Okay, you're the host now. That's fine, all right, that works. So I don't know where I ended up getting dropped, but I'll just talk about that last point: VMIs that take longer than expected, given some sort of threshold.
A: The general idea is that if we have VMIs that are a little slow, and we have a general idea of what slow is, something we can make configurable, then whenever we notice it, we could capture VMI-specific data: we'd have labels that are very specific to the VMI, report those to Prometheus, and then we could build something like a dashboard around it.
A: So the goal is having subsets of VMs running in the cluster that we can get a very focused view on, for things that could be going wrong. That makes it easier to look at, or even just notice, things that are not working as expected. So what do people think of this idea of VMI-specific metrics, and the other ideas?
A: No? Okay, so there is this one then. I thought it was just a count, though, that just incremented. Right now I thought it's the number of VMIs that we have seen in a phase; I don't think we decrement it.
A: Yeah, so I'm thinking more of a gauge than a counter. If we have 50 VMIs running at any given time, our metric shows 50; if we have 15 in Scheduling, it shows 15, and so on and so forth.
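A minimal sketch of such a gauge, assuming the standard Prometheus Go client (client_golang); the metric name and the recount strategy are illustrative, not the actual KubeVirt implementation:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// vmiPhaseCount is a gauge, not a counter: it reflects how many VMIs are
// currently in each phase, so it goes back down when VMIs move on.
var vmiPhaseCount = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "kubevirt_vmi_phase_count", // illustrative name
		Help: "Number of VMIs currently in each phase.",
	},
	[]string{"phase"}, // one label value per phase, not per VMI
)

func init() {
	prometheus.MustRegister(vmiPhaseCount)
}

// recount would be called from an informer handler: recomputing the gauge
// from the full VMI list avoids increment/decrement drift.
func recount(vmisPerPhase map[string]int) {
	vmiPhaseCount.Reset()
	for phase, n := range vmisPerPhase {
		vmiPhaseCount.WithLabelValues(phase).Set(float64(n))
	}
}
```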
A: Okay, so then we are documenting it; I thought we weren't. Okay, then never mind, and this one, I think, is covered. That's good to know. How about the second one, what do people think of that?
A: Yeah, okay, that could work. I guess the idea is, Kevin, you shared that dashboard where we can see in general how long things take. If we were to do histograms, say the 99th percentile, for some of these, we'd be able to identify the super slow ones and look more closely at them, because that matters when we go to large tests, with thousands of VMs.
C: I don't know if I would do it on a per-VM basis, because, as we talked about, that can be a lot of labels. But in general, showing on the dashboard the average time spent in Scheduling, Starting, Pending... I don't know.
A: My point is that I agree with you: we don't want to have that many labels. What I'm saying is, could we get away with only labeling things in select cases? Only because it was slow, because it was over some threshold, we'd add a label for it that includes the VMI name, because we want it to stick out.
A: Yeah, because I think the idea follows the way we're doing the timestamps. Say we notice we're changing phase; that's how we do things: we have a change-phase function, and that's where we set the timestamp.
A: At that point in time, we could do a comparison, looking at the two times, because we have the objects, and ask: was that a really slow transition, or a reasonable one? I guess the question is whether that was an unreasonable amount of time spent, and if it was unreasonable, then we could flag it. We could say, okay, this is kind of strange, let's add a label for this.
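A rough sketch of that comparison, with hypothetical names; the point is that per-VMI labels are only emitted for the flagged, slow subset, so cardinality stays bounded:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// slowTransitionSeconds only ever receives per-VMI label values for
// transitions that crossed the threshold, keeping cardinality small.
var slowTransitionSeconds = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "kubevirt_vmi_slow_phase_transition_seconds", // illustrative
		Help: "Phase transition time for VMIs that exceeded the threshold.",
	},
	[]string{"namespace", "name", "phase"},
)

// onPhaseChange is a hypothetical hook in the change-phase path: it compares
// the previous and current transition timestamps that are already kept on
// the object, and flags only the unreasonable transitions.
func onPhaseChange(namespace, name, phase string, prev, cur time.Time, threshold time.Duration) {
	if elapsed := cur.Sub(prev); elapsed > threshold {
		slowTransitionSeconds.WithLabelValues(namespace, name, phase).Set(elapsed.Seconds())
	}
}
```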
B: Outside, you know, like the alerts that Kevin mentioned sound a lot more suitable for that, because you can change the alerts: these kinds of values will change for different systems. And it also sounds like something that you might monitor via Prometheus.
C: Yeah, you could also do it on a namespace basis. But one other thought on that, and I had it when we discussed this for the first time: all this phase transition monitoring is not really anything we need to do in our control plane.
C: It could also be a tool you run, like top, that anybody can run, or that you run in the cluster and expose to Prometheus, because all it has to do is watch the VMI resources. It can be done outside the control plane, in an outside process, just having a Kubernetes watch on VMIs and recording this stuff when needed.
A: Yeah, I mean, the way that I'm envisioning this is that it could be configurable, because you're right that it's going to vary per cluster and also per workload, and I could see this value being configurable based on whatever the workload is. But in terms of the use case...
A: It needs to be defined a little bit more in terms of how this would work, because it goes along with the direction of doing our performance measurements through Prometheus for our metrics. So I wouldn't expect another tool; I mean, I agree that you could do it that way, but along the same lines, we're going toward using Prometheus for all of this, so I think it's possible.
A: I think it's possible we could do it. I think there is a use case for it, but it needs to be a little bit further defined.
C: But did it make sense? Everything that's based on fields on a Kubernetes resource does not necessarily have to be a metric in our control plane, one that we expose and have to have a feature toggle for. It can be any Kubernetes client process doing that, creating that metric for us on demand. If you care about it, you just deploy that and you get the metric; if you don't care anymore, you remove it again, or you run it locally. Does that make sense?
C: No, this tool doesn't have to touch it. It just has to have a watch on the VMI resource, on the status of the VMI resource only. It can actually work for any Kubernetes resource: this tool could watch transitions for any Kubernetes resource that has a status and phase, or any field in the status, and you see how long it spends in a certain phase. It's nothing KubeVirt-specific.
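A minimal sketch of such an out-of-tree watcher, using client-go's dynamic informers so it works for any resource exposing status.phase; the group/version/resource and the use of stdout instead of a Prometheus endpoint are assumptions:

```go
package main

import (
	"fmt"
	"time"

	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/dynamic/dynamicinformer"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := dynamic.NewForConfigOrDie(cfg)

	// Any resource with status.phase works; VMIs are just one example.
	gvr := schema.GroupVersionResource{Group: "kubevirt.io", Version: "v1", Resource: "virtualmachineinstances"}
	factory := dynamicinformer.NewDynamicSharedInformerFactory(client, 0)

	lastChange := map[string]time.Time{} // key: namespace/name
	factory.ForResource(gvr).Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			o := oldObj.(*unstructured.Unstructured)
			n := newObj.(*unstructured.Unstructured)
			oldPhase, _, _ := unstructured.NestedString(o.Object, "status", "phase")
			newPhase, _, _ := unstructured.NestedString(n.Object, "status", "phase")
			if oldPhase == newPhase {
				return
			}
			key := n.GetNamespace() + "/" + n.GetName()
			now := time.Now()
			if prev, ok := lastChange[key]; ok {
				// A real tool would expose this as a Prometheus metric.
				fmt.Printf("%s: %s -> %s after %s\n", key, oldPhase, newPhase, now.Sub(prev))
			}
			lastChange[key] = now
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	<-stop
}
```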
A: Yeah, I see what you mean. Maybe that's something we could discuss in this context. It wasn't really where I was going with this, but I understand the perspective that it could be sort of a tool, like what we're doing now with our audit tool, right. We could do this.
A: We could look at the timestamps and say, okay, this VM took too long; we could do that right now, but we won't get anything in our dashboards for it. I guess that's my point: the specific angle I'm looking at here is whether there are any ways we could take advantage of having VMI-specific metrics in our dashboards, with just a subset of VMIs, for the cases that we care about. That's kind of what I'm...
C: Agreed, I want it in Prometheus as well. The difference I'm pointing out is that I don't want to teach our control plane to decide what metric to emit based on a label or some toggle on a VM. If our control plane exposes a metric, it should be as safe and general as possible, and this separate process could bring in more if needed.
A: The idea of the metric is that it covers a subset of VMIs, because, as we already talked about last time, we don't want the number of labels to scale with the number of objects, so we want something to sort of limit that.
A: What we want to find is a metric that can give us a more granular view, while also not causing us to have a ton of labels that scale with the number of VMIs we have.
C: I think the reason what we have is not enough is that we count how many VMs are in status Pending and when they switch. But you don't know if there are, say, four VMs stuck in Pending forever, because the metric is still fluctuating, and you don't get the transition time on average, or at all, or per percentile.
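For the percentile side, a histogram keyed only by phase would give averages and quantiles without per-VMI cardinality, though it still would not name the stuck VMIs. A sketch with client_golang (metric name and buckets are illustrative):

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// transitionSeconds records how long each phase transition took, labeled by
// phase only, so cardinality does not scale with the number of VMIs.
// Averages come from _sum/_count; percentiles from histogram_quantile().
var transitionSeconds = prometheus.NewHistogramVec(
	prometheus.HistogramOpts{
		Name:    "kubevirt_vmi_phase_transition_seconds", // illustrative
		Help:    "Time spent before entering each VMI phase.",
		Buckets: prometheus.ExponentialBuckets(0.5, 2, 10), // 0.5s .. ~256s
	},
	[]string{"phase"},
)

func observeTransition(phase string, d time.Duration) {
	transitionSeconds.WithLabelValues(phase).Observe(d.Seconds())
}
```

On a dashboard, something like histogram_quantile(0.99, sum(rate(kubevirt_vmi_phase_transition_seconds_bucket[5m])) by (le, phase)) would then surface the slow tail per phase.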
A: That's what I'm asking, because I understand this might not be the right approach in terms of how the general pattern for doing this kind of metric goes. That's where I wanted to go with this: is this something that we could see as reasonable or not?
A: Right, yeah. The first question is: can we make it more specific? And the second question is: should we make it more specific? Okay.
A: Those are the things, because I think there are use cases where there's valuable information that's still missing. As you pointed out, Kevin, you could have something that's stuck and you wouldn't know it. You could also have something that was very slow, and you wouldn't know which VMI it is. That's what this is targeting, those cases, and to see whether it's feasible, or whether maybe this is not the right approach and another tool is the right approach here.
B: I don't know, well, maybe, but I don't know if identifying which VMI is the slowest one should be the focus of the metric. We'd know that there are VMIs that are slow, and then, if they fail, we can go to the logs, check the logs of all the VMIs, see which ones failed and just debug that. That's the way I'm actually doing it. And the other thing was... oh yeah.
C: What I could imagine: the link that you sent is where the VMI transition time gets recorded right now. Depending on some condition, we just add the VMI name, and maybe the namespace, to that labels list. That raises the question of whether you'd want it triggered by a label, or a namespace label, or how you would do it.
A: We have this thing, like I wrote there, change phase, which sets a timestamp. I think that's the function you have here, VMI phase transition or something, one of those. Whenever we change a phase, we set the timestamp at that point in time.
C: Yeah, right, it would have to be, because I don't think we could hard-code something like that. So you could set a label like, I don't know, metrics.kubevirt.io/transition-time-threshold, set it to 10, and then, if the transition takes longer than 10 seconds, the label gets added.
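A sketch of that opt-in trigger; the label key follows the example above but is an assumption, as is the handling of malformed values:

```go
package metrics

import (
	"strconv"
	"time"
)

// thresholdLabel is the hypothetical opt-in label discussed above.
const thresholdLabel = "metrics.kubevirt.io/transition-time-threshold"

// transitionThreshold returns the per-VMI threshold, or ok=false when the
// VMI did not opt in, in which case no per-VMI metric labels are emitted.
func transitionThreshold(labels map[string]string) (threshold time.Duration, ok bool) {
	raw, found := labels[thresholdLabel]
	if !found {
		return 0, false
	}
	secs, err := strconv.Atoi(raw)
	if err != nil {
		return 0, false // malformed value: treat as not opted in
	}
	return time.Duration(secs) * time.Second, true
}
```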
B: Yeah, I think for this kind of specific threshold, it shouldn't be in the control plane; it should be external, as we mentioned. I think even in the Grafana dashboard, maybe it's possible to create some kind of threshold, so you can just show things that are higher than some value.
A: Yeah, I could write to the mailing list or something; maybe we can follow up on it there. Okay, cool, let's go to the next topic then. So Kubernetes 1.20, or greater than or equal to 1.20, has API Priority and Fairness, and at least as far as I could tell, there was no policy created for KubeVirt. This would be an interesting thing to do for a number of reasons.
A: One of them is that we want to make sure that our requests to the API server are not inhibited by anything else, and we also want to make sure that we're not a noisy neighbor: if for some reason one of our components is out of control, we're not hogging the API server. So it protects us and it protects others in the cluster, and we could define this per component.
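As a sketch of what such a policy could look like, using the flowcontrol/v1beta1 Go types (the API is beta as of Kubernetes 1.20); the schema name, service account, matching precedence, and the choice of the built-in workload-high priority level are all assumptions:

```go
package apf

import (
	flowv1b1 "k8s.io/api/flowcontrol/v1beta1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// kubevirtFlowSchema routes requests from a KubeVirt service account to a
// dedicated priority level, so KubeVirt neither starves nor hogs the API server.
func kubevirtFlowSchema() *flowv1b1.FlowSchema {
	return &flowv1b1.FlowSchema{
		ObjectMeta: metav1.ObjectMeta{Name: "kubevirt-control-plane"}, // assumed name
		Spec: flowv1b1.FlowSchemaSpec{
			PriorityLevelConfiguration: flowv1b1.PriorityLevelConfigurationReference{
				Name: "workload-high", // one of the built-in priority levels
			},
			MatchingPrecedence: 1000,
			Rules: []flowv1b1.PolicyRulesWithSubjects{{
				Subjects: []flowv1b1.Subject{{
					Kind: flowv1b1.SubjectKindServiceAccount,
					ServiceAccount: &flowv1b1.ServiceAccountSubject{
						Namespace: "kubevirt",
						Name:      "kubevirt-controller", // assumed service account
					},
				}},
				ResourceRules: []flowv1b1.ResourcePolicyRule{{
					Verbs:      []string{flowv1b1.VerbAll},
					APIGroups:  []string{"kubevirt.io"},
					Resources:  []string{flowv1b1.ResourceAll},
					Namespaces: []string{flowv1b1.NamespaceEvery},
				}},
			}},
		},
	}
}
```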
A: We can do a lot of things with this, but the general idea is that we can create some policies around it with a FlowSchema and a PriorityLevelConfiguration. There are some good examples in the link there of what we can do, and there are already some existing examples by default in the cluster: if you have a 1.20 cluster, you'll see that there's a bunch. So what do people think about this topic?
B: I think it's something that I always think about, but I never come up with a conclusion. For example, in the tests that I'm doing in the CI environment, we have the Kubernetes master components dedicated to some nodes, and somehow I never did that, but I think that the KubeVirt controllers should be on the master nodes and shouldn't be sharing the worker nodes.
C: Priority and fairness is a bit of a different story, kind of. It's more about telling the API server what requests are prioritized, who to rate limit and when to rate limit: like not rate limiting KubeVirt as much as some random process. Okay, so it's your friend.
A: Yeah, think of this as Kubernetes' way of protecting itself. It's an API to make sure that someone doesn't just overwhelm the API server. Some of the features that come with it are that we can make sure our requests are going to get a shot at the API server, and we can isolate them.
A: You know, based on whatever: the user, the namespace, the service account, all sorts of things, the verbs that we use, the APIs. So we can make sure that those requests, by our webhook handlers for example, are not being interrupted by perhaps someone else that's using the same API.
B: Yeah, so maybe if we document how to configure those things in Kubernetes for our components, and have a straightforward demo of how to do that, it would be nice, because it's nothing to change in KubeVirt itself; it's just how to apply that for KubeVirt.
A: Yeah, so I think this one also needs a follow-up discussion. I'm already on it; it's something I was looking at, because there's a bunch of information that you can get from it.
A: In addition to the features I just mentioned, because you can filter by API, user, namespace, all this stuff, all the general traffic rules, you can isolate traffic based on what gets into these queues that are part of priority and fairness.
A: So you can see, for example, if KubeVirt is creating a ton of list requests, we'll see them get queued up, and we can actually have metrics that show this, even on a per-verb basis if we wanted to. So in addition to protecting yourself, there are also some things we can probably learn about what our traffic patterns are right now. I think there are a lot of benefits to it.
A: I'll create a follow-up discussion on this. I already have a document going with some of the ideas I have, and I want to do a study: how much memory this takes, how many requests per second we can take, what our queues should be, and stuff like that. I can start a discussion for that on the mailing list.
C: I'd love to see that, priority and fairness being implemented or used by KubeVirt, because it's kind of a big thing for the community, for Kubernetes, for the API server. It's still beta, but I think at some point it should get to where it's mandatory for some things in Kubernetes.
A: There you go. Okay, this one's cool.
A: Yeah, do you want to talk about this one? Oh, there's a PR; it's kind of extensive.
D: Yeah, so I've gathered some data on the logging, and it seems like during high traffic we see a lot of logs. Some of them are either duplicated or could be consolidated into one, so we could just save a lot, and in this PR I tried to do that. Some of the logs I've moved to a higher verbosity, the ones in the hot paths, or, like here...
D: In this execute method, I tried to consolidate these two VMI and domain logs that we previously logged as two separate lines, which resulted in lots of duplication. And right now I think it's also much better for searching for a given status, having those two in one place. And I think that's it.
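The shape of that change, sketched with klog/v2 structured logging rather than KubeVirt's own log package; the types and field names are illustrative:

```go
package example

import "k8s.io/klog/v2"

type vmiInfo struct{ Namespace, Name, Phase string }
type domainInfo struct{ Name, Status string }

// logSyncState emits one log line carrying both the VMI and the domain
// state, instead of two adjacent lines that mostly duplicate each other.
// Verbosity 2 keeps the hot path quiet unless explicitly enabled.
func logSyncState(vmi vmiInfo, dom domainInfo) {
	klog.V(2).InfoS("synced vmi and domain",
		"vmi", vmi.Namespace+"/"+vmi.Name,
		"vmiPhase", vmi.Phase,
		"domain", dom.Name,
		"domainStatus", dom.Status,
	)
}
```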
A: And did you see how much the reduction in logs was? Because I think I saw on the graph that these were pretty significant.
D: Yeah, so if you look at the graph, it was over some time, right, not a single run, because the logs are in the millions. For example, here in this for loop, I think we are saving around six million log lines, and over the whole interval we've collected 40 million. So, quick math...
D: Correct, yeah, it's around 50 percent, and this is just for this particular place. Overall, I think around thirty-five to fifty percent of the logs should be removed with my changes, assuming of course that the verbosity is set to two, because otherwise it's different.
C: Yeah, one concern I have with this specific part, if my math isn't wrong, is that this is reducing the amount of logs in this part by half, kind of, because it's combining logs. The only problem I see with it is...
C: Yeah, okay. The only thing I can see with those combined logs is that there's a not insignificant trend of doing log-based analytics and metrics, also powered by Grafana and such, where combined logs can be hard to extract data from. I don't know if anybody's doing that and has more experience with it, but for those use cases it's easier to have dedicated messages for two things that really are two things. That should make sense.
D: Yeah, I agree that there is some trade-off, and I'm not fully sure what the right approach is here. The problem is that, for example, in this else-if, where the domain exists and the VMI does not, we are kind of trying to recreate this VMI reference, right, instead of logging the domain, and that could potentially cause big confusion.
D: But on the other hand, you don't seem to get much more information from the domain than from the VMI. I think we would need to think about what we are actually getting by splitting this into two, and whether there is any particular reason to do that.
A: Okay, well, we can take it up on the PR, maybe. I think it's a great change: we're reducing a lot of logs and consolidating to single log lines. That's a really good change, cool. Okay.
A: All right, let's go to the next one, thanks! The next one is: the perf-scale load generator needs approval.
A: You don't either? Okay, we'll need David to do it then.
Okay,
all
right
the
next
one,
the
performance
evaluation.
So
it
looks
like
you
did
another
one
marcelo.
B: Yeah, I ran the performance measurements again, and this is the update for, I would say, at least this week's master branch. It was like the last one, but fair enough.
B: The update is that it failed to create 500 VMs, and I was expecting that. Let me explain first: I was doing another test, as you guys might remember, of how many VMs I can pack in a node.
B: Actually, it's the next task that I listed here in the meeting notes. I can create at least a little bit more than 300 VMIs per node.
B: However, here in this test I'm using three nodes, so I would expect at least 900 VMIs in the cluster, but it fails at 500. I didn't collect all the logs, so unfortunately I don't know why they failed, but they failed. We can check the Grafana metrics, though.
B: This is also the updated Grafana that I have; it might be interesting to see as well. And basically, I see here, just remember...
B: Okay, so yeah. First of all, we can go to this, I don't know who is sharing, if you can go to the VMI creation time, yeah, this one that is in the middle.
C: You need to increase the timeout when you create a snapshot; it asks you for a time, you type in 10 seconds or something, and it should be fine.
B: Yeah, I put one hour, and even then it fails to create the snapshot, so maybe it's my internet, I don't know; it didn't work. That's why I put this screenshot here, but I'll definitely try the snapshot again later to see if it works. Okay, so the interesting things here: let's take a look at these two last ones, which are actually 400 VMs and 500.
B: So 500 fails but 400 works, and we see this per phase, for example Running, and then Scheduled, and then Scheduling, especially for 400 and 500.
B: Did you see that it got scheduled within five minutes? This is the 95th percentile, so it doesn't mean that all VMs were super slow like that, but these are the worst ones.
B: So we get five minutes to be scheduled, and then it took five more minutes to actually start to run. It's somewhat expected, because we are trying to create a lot of VMs per node, but maybe that's too long, you know, to create that. It's just something interesting to see. The other one...
B: It's the one that's taking more time, and then we have Scheduled and Running, which look a little bit mismatched from one to the other, isn't it. But again, the VMI creation time is just taking the worst-case scenario, and we can see the long Running happens here and there. So yeah, the other things that we can... oh, first of all, if we go...
C: ...to the top one. A question on those graphs real quick: that is 400 and 500 VMs, right? Because the graphs are very close together on the numbers, which could mean that we reach a limit at 400 and 500 is only filling in a bit. It would be interesting whether 600 goes up higher, or if it also caps out at the five minutes on scheduling somehow.
B: Yeah, I tried to create more. However, as I mentioned, 500 failed, and actually the namespace got stuck, it didn't delete, and the test didn't continue with more VMs. But I'll try to run it again, to see if I can reproduce that with more VMs.
B: We are definitely reaching a limit here at 400, 500, but which limit in the system, isn't it? Because again, I have another test creating 300 per node, and it was working fine; it only failed now at 500. Here, if you see the VM count... that's good. We're missing some, you know, below.
B: And then many VMs fail with 500. Also, another metric that I put here is the rest client rate limit duration, and this should be fixed with the PR to increase that. But we are definitely hitting a lot of client-side throttling limits here for the API requests, and maybe that's related to the errors that we are seeing, and the slowdown that we are seeing.
A: Yeah, Marcelo, I was wondering, because it would be cool to see a comparison: exactly what you have here, plus an increased QPS and burst, to see what we end up with and how the graphs compare.
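For reference, the knob in question on a client-go rest.Config looks like this; the values are arbitrary examples, and how KubeVirt surfaces the setting is configured elsewhere:

```go
package main

import (
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func newFasterClient() (*kubernetes.Clientset, error) {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		return nil, err
	}
	// client-go defaults are QPS=5, Burst=10; raising them reduces the
	// client-side throttling ("rate limit duration") seen during bursts
	// of VMI creations, at the cost of more API server load.
	cfg.QPS = 50
	cfg.Burst = 100
	return kubernetes.NewForConfig(cfg)
}
```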
A: Okay, yeah, maybe on the next one you do. That would be a cool one to do. It'd be awesome to see how it makes a difference.
A: Now that it's configurable. The other one is, if you scroll down, Kevin... I'm wondering, you see how we have the VM count, and we get some failed. I wonder if those scheduled VMs are going from Scheduled to Failed, and maybe that's why it levels off right there.
B: I think David tried to fix that before, isn't it? Yeah.
A: It shouldn't be there. Okay, well, Marcelo, next chance you get, you can try it with the QPS change, let's do the measurement, and then we'll see if Pending shows up for that one.
A: Yeah, okay, all right. Well, we're at time. There is one more topic; I don't know, can we cover this in like 30 seconds here, or do we need to push it to next time?
C: Yeah, it came up yesterday; that's mine. Wait, let me share a different screen. David asked yesterday, on my bug fix for the goroutines...
C: ...whether we can test that somehow, and we decided it's hard to test. One thing I wanted to bring up is that we could, or should, somehow measure this with those density tests, or be able to define a way of seeing regressions like that, by checking some threshold of goroutines or CPU load that is allowed before and after.
A: Once Marcelo's load generator merges, I've got the thresholds ready, and I'm going to hook them up to Marcelo's load generator. Then I think we have sort of the foundation for what we want to do for functional tests. And then having a functional test for this...
A: ...would be exactly that: using everything that's there and just putting in some code to say, here's before, here's after, or here's what we expect, like: we're not leaking. I guess for this one, you just create a bunch of VMs, you delete them, and we check that there's no leak, right.
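One way to make that measurable from a test is to scrape the component's Prometheus endpoint for the standard Go runtime go_goroutines gauge before and after the create/delete cycle; the endpoint URL and the threshold policy are left to the test harness:

```go
package tests

import (
	"fmt"
	"net/http"

	"github.com/prometheus/common/expfmt"
)

// goroutines scrapes a component's /metrics endpoint and returns the
// go_goroutines gauge exposed by the standard Go collector.
func goroutines(metricsURL string) (float64, error) {
	resp, err := http.Get(metricsURL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()

	families, err := new(expfmt.TextParser).TextToMetricFamilies(resp.Body)
	if err != nil {
		return 0, err
	}
	mf, ok := families["go_goroutines"]
	if !ok || len(mf.Metric) == 0 {
		return 0, fmt.Errorf("go_goroutines not found at %s", metricsURL)
	}
	return mf.Metric[0].GetGauge().GetValue(), nil
}

// Usage sketch: record goroutines(...) before the density test, create and
// delete the VMIs, wait for cleanup, then fail if the value afterwards
// exceeds the starting value by more than an agreed threshold.
```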
C: It could be part of the audit tool or that tool's scope, for example, as I think was mentioned. But yeah, I just wanted to raise attention that we should...
A: ...that we eventually get to this. Okay, all right, we're at time, folks. Thanks for your time, I'll see y'all online. Have a good day.