From YouTube: SIG - Performance and scale 2022-03-10
Description
Meeting Notes:
https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.tybh
A
Okay, welcome to SIG Scale. It's March 10th. I'll put the meeting notes in the chat; please add yourself to the attendee list. The first thing is just an announcement: the next meeting for SIG Scale will be April 7th, so three weeks from now. I'm going to be out of the office for three weeks, so that will be the next time I'm back. Let me actually double-check... I said April 7th; one, two, three weeks; yeah, that's right.

A
Okay, so I'll be back in the office at the beginning of April, so April 7th is right. I'll put something on the mailing list again as a reminder, but that's when we'll meet again.
A
Okay, some PRs. I want to get to a few of these. The first one, which I've brought up a few times: I just want to decide on the fate of this PR, see what we want to do with it.

A
It's been here for a little while, and we've had a little bit of review on it. Do people feel comfortable merging this as is, or should I mark it as work in progress, and we spend time with our tests and our test framework and build this PR out over time? Is there any preference for that, or do we want to just merge it now? What do people think?
A
All right, I'll have to ask; I'll get you to attack it then. Okay, that's fine, we'll go with that. Okay, second one: the load generator. I wanted to spend a minute or two on this because I made another change to it. I changed the interfaces a little bit to make them, I think, a little more friendly for different types of jobs that we could add. So what I did originally was design a few interfaces.
A
Let me see, for a job. Originally I called it a load generator, but now I call it a job. These are the things I consider necessary to manage any type of workload.

A
I'm thinking that eventually we could allow different types of jobs to override this stuff. For instance, I could see different steady-state jobs that handle the refill differently, or handle the delete differently, and so we can override these things. Then there is the interface for actually doing these things; I just use it with a Run and a Delete, and that's it. So I changed the API a little bit.
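A minimal sketch of the kind of job interface being described here, assuming Go and two methods only; the names are illustrative and not necessarily what the PR uses:

```go
package loadgen // hypothetical package name

import "context"

// Job is a sketch of the interface described above: anything the load
// generator needs in order to manage a workload. The real PR may use
// different names or extra methods; only Run and Delete are mentioned.
type Job interface {
	// Run drives one pass of the workload (for example, creating VMs).
	Run(ctx context.Context) error
	// Delete tears the workload back down (for example, deleting the
	// churned VMs).
	Delete(ctx context.Context) error
}
```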
A
I think it's a little bit easier to use based on my previous change. And then the other change I added is, let's see here, in the steady-state job.
A
It's just a min churn sleep. This is a configurable amount of time that we're going to wait, using that wait API I just highlighted, every time we go through this. Let me see if I can find the steady state; it's easier to show this.
A
Steady state, okay, here. So every time we go through an iteration, my expectation is that we'll do some sort of wait in between the creates and deletes. This could be any value, and it could vary quite a bit depending on how we want to run this job, but I simply have it as a configurable.
A
So what it'll do is calculate the amount of time spent creating and subtract that from the configured value. Say it took 20 seconds to do the creates and I set my min churn sleep to 30 seconds: it'll then sleep for the remainder of that time, so 10 seconds. That gives us just a little bit of a buffer between creates and deletes. And if we're creating right up until the point that we need to delete, then that's fine as well. If you really wanted control over this, you could set the value very high if you want some sort of sleep in between during the periods of churn. So it's configurable; I think that'll be a better interface for testing.
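A small sketch of the remainder-sleep behaviour described above, assuming Go; the field and function names are assumptions, not the PR's actual code:

```go
package loadgen // hypothetical package name

import "time"

// sleepRemainder waits for whatever is left of minChurnSleep after the
// creation phase. If creating already took longer than minChurnSleep,
// it does not sleep at all.
func sleepRemainder(createStart time.Time, minChurnSleep time.Duration) {
	elapsed := time.Since(createStart) // e.g. 20s spent creating
	if remaining := minChurnSleep - elapsed; remaining > 0 {
		time.Sleep(remaining) // e.g. 30s - 20s = 10s
	}
}
```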
A
Okay, I've been doing some testing with this internally; so far, so good. I still have some ideas on how to explain it, but so far I like it. Eventually I'll take this and expand it to what we have right now with the burst job. I'd like to include it as a periodic, but I want to get the configuration right first.
A
I think once I have the right configuration, because I haven't tested this at the same scale that we would run that job at, so I want to make sure I can configure this correctly. I don't have that yet; once I do, I'll make the change to the periodic to add it for steady state. But here's what the results look like. You can see we get this steady create. This is with a churn of two, so we create five, we delete two.
A
Yeah, okay, all right. The only thing I wanted to bring attention to is those changes. Marcelo, you're the one who mostly reviewed this, but if you have any other comments, let me know.
B
I think it's mostly ready, yeah. The only thing I commented on is some hard-coded parameters. I really think we should avoid hard-coded things. If we have flags, maybe; and if you don't want to make it configurable in the YAML, maybe have a flag. I don't know; I'm just against hard-coding.
A
My thought on this, Marcelo, is that I wanted to find a value that is not likely to change. I understand what you're saying: with some tests you want to make it different, and that's totally fair. But if we set this to certain values, I expect that the majority of tests will not change them, and that's kind of what I wanted to do. With 20...
A
It seems fairly reasonable that it's not going to change, but I totally respect that it could. What I'd like to see is: if we get to that use case, then maybe we increase this value. Or if we find that the use case is that we constantly need to change this in tests, because we've done a number of tests and we see a huge difference based on cluster size and other things, then yeah.
B
Zero, for example. It should have rate limiting in the sense of an interval between creations, but the client itself shouldn't have rate limits, because if it does, then, for example, we are not doing lists now, but if you do a list or a get or anything, it shouldn't hit any rate limit, from a benchmarking standpoint. That's why in the beginning it had this global configuration and also the per-job rate limit configuration.
B
The global configuration is for the client in general, and by default it was using zero, so no rate limit for all the requests.

B
I was using the library's rate limiter, actually, to put some wait between creations, so that I could configure, for example, 20 create requests per second, or 10 create requests per second. I had some logic for that before.
B
I don't remember if you changed that or not. I'm just thinking, because now that you removed this global configuration and put 20 here, it will impact everything, not only create: gets, lists, whatever will also be rate limited at 20. And I'm thinking about the global case as well.
B
So if you, for example, run two jobs in parallel, they will also be rate limited at 20 between themselves. That's why I think the client shouldn't have a rate limit, but you can control the creations per second in the code. That's what I'm saying.
A
So you're saying this should be zero, and then I should rate limit with some other mechanism when we're actually going through and doing the creates, so after each create I should have something to rate limit against. Yeah.
B
In that case you can make the default value zero, or maybe it can still be configurable, but anyway I think the default should be zero for the global client configuration. Then you should have some control for creations, for the deletes, and a control for updates. For the steady state you kind of have this control with the sleeps.
B
But there is a library you can use, it's called a rate limiter, and you just configure burst and requests per second, and then it does the wait for you. I thought I implemented that in the original load generator, but I'm not sure now. So if you didn't see it, maybe it was not there.
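This is presumably a token-bucket limiter like golang.org/x/time/rate. A minimal sketch of pacing only the create calls with it, independent of any client-wide limit; the function name, parameters, and the placeholder create call are assumptions:

```go
package loadgen // hypothetical package name

import (
	"context"

	"golang.org/x/time/rate"
)

// createPaced issues count creates, pacing them at createQPS requests per
// second with the given burst, instead of relying on a client-wide limit.
func createPaced(ctx context.Context, createQPS float64, burst, count int) error {
	limiter := rate.NewLimiter(rate.Limit(createQPS), burst)
	for i := 0; i < count; i++ {
		// Wait blocks until the limiter allows the next create.
		if err := limiter.Wait(ctx); err != nil {
			return err
		}
		// ... issue the actual create request here ...
	}
	return nil
}
```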
A
Yeah, okay, I see what you mean; I do it with waits. It is a good question how that interface should look, but this just seemed like an easy way to do it. I can look into it, but I think I agree with you that this should be zero; we shouldn't rate limit at all here.
A
No limit, yeah, I guess that makes sense. Okay, so I could do zero for this, and then we'll leave it. My approach was that I just wanted to try it and see how this goes, because what I have seemed fairly basic.
A
So I'll make this zero. The sleep kind of works, but I totally acknowledge that there could be a better way. I kind of want to revisit that, though; if we can make it more powerful, I definitely want to revisit it. I kind of want to let it evolve as we build up the use cases and use it more.
A
Yeah, okay, I'll change this to zero. Okay, sounds good, cool. And then here's the config I wrote to go with it. Marcelo, I forget if I added you; I did, okay, and I added Daniel too. So when you get a chance, this will go with it; it'll make it work the same way that you have now with your burst test.
A
Okay, sounds good. So the other thing I had for this meeting: I wanted to do a little design, because one of the things we've talked about is this tool where we generate load, and we've talked about a tool we have to audit.
A
The other part we've kind of talked about is how we can measure pressure. That has been our larger question, and we talked about it within this presentation, and this is kind of the way I wanted to think about it.
A
We need to know certain things about the cluster, like how many nodes it has, and how many other things it has that could be causing pressure, so that when we do tests we know the difference: we know how much pressure we're causing with our workload.
A
Does that make sense to people as a way to measure? Basically, what this would do is give us a way to understand, when someone is running our tool with load generation and auditing and they're telling us "here's what our scale is", what their cluster topology is, so that we can put the topology against the numbers and understand what their scale or their performance actually means.
B
Yeah, I'm not sure if I got it. You mean to visualize and analyze the data? Because, for example, you have a cluster with 100 nodes, and then you create some pressure. For example, you want to do a burst test creating 1000 VMs, or the steady state, and you configure it, say, creating 500 with a churn of 20, and then you run this analysis, and then you check the...
B
For example, you need to check some SLO, like the VM creation time. If you see that the creation time is too high, you need to decrease the pressure, because the cluster doesn't hold that kind of pressure. But I don't know what kind of topology the benchmark tool should analyze.
A
Say we want to publish SLOs. We say that in a given release, say 0.51, this is what the SLO is: we expect you to be able to do a thousand VM start-to-create, or create-to-running, times, each VM in less than 20 seconds on average, or something like that. That's our SLO, and then someone goes and runs this in their data center and they don't hit the SLO.

A
Why didn't they hit it? Well, maybe they had a hundred thousand PVCs just sitting there. Maybe they had a thousand namespaces. Their topology is totally different from what we're testing in, so their SLO is totally different. It's not even a fair comparison.
B
Yeah, I don't remember now, but I think Kubernetes in their documentation doesn't put numbers either; they just describe what the SLO is. For example, for the VM creation time, we describe what that creation time means: the VM is in the running state, which means the libvirt domain got created and then received the run command, things like that. So we describe scenarios; we don't need to state numbers.

B
So if we have a large batch of VMs, a big burst of VMs, in the worst-case scenario some VMs will be slower because they are waiting in the work queues, things like that, and of course there are many more things behind the scenes that slow down VM creation. Right now we cannot guarantee, and we also should not guarantee, numbers in our official documentation.
B
So we can just say what you should consider regarding latency in the official documentation. Then, if we have some report, for example an NVIDIA report, or a Red Hat or IBM report, and in that report we say, okay, these are the numbers that we measured in our environment, that should be fine. But not on GitHub in the official KubeVirt documentation.
C
Yeah, just one question here. I did not hear everything, so you may have answered it already. Wouldn't it still make sense to come up with some numbers for the hardware we have, to state: this is what we want to have, and this is not a regression? Just to see if we have non-hardware regressions.
B
Yeah, so that can be a discussion. I was just thinking about what Ryan was saying when we described the SLOs. In the document where we describe the SLOs, I don't think we should put numbers; we should just say what the SLOs are and how we can measure them, and then later, as I commented... I don't know, you guys can disagree with that.
B
It's fine if, for example, we have the KubeVirt blog and we describe what we see in our KubeVirt CI, on the hardware that we have, with the SLOs that we defined and the numbers that we see, or some other experiments that we might run. We can report that as a kind of report.
B
But maybe we don't need to state official numbers in our SLO document and say, okay, creating a VM should be lower than this.
A
Yeah, I understand what you're saying. I guess what I'm saying is that, theoretically, it might be possible, because, assuming this theory that we can measure a cluster's pressure, if we're able to quantify it, then that should be a consistent number. I could say, for instance, our CI system has this measurable amount of pressure at rest.
A
Then, the moment we run our tests, this is what we'd expect within some plus-or-minus range for performance, and this is what you should get, and that would give us a lot of confidence. That would be very similar to what I think you're saying, which is, for example, if we were to just say in our CI here's the performance we expect; I would say it's sort of the same thing.
A
It's the same thing as that, except when someone else, outside of the CI environment, wants to run this load-generation test and audit tool and do a performance test, they're going to see different numbers, right? So what's changed? Well, the only thing that we could give them to tell what's changed is a measure of their pressure at rest, so that they can know that, okay, their cluster is different.
A
Here's how it's different, and if we know that, we might be able to estimate what their expected performance would be within some range, if we had that number. In other words, instead of just documenting what we've tested, I'm saying we might be able to provide a way for other people to estimate for themselves.
A
Do you follow me at all, or do you disagree with that? I would say it's a little more difficult, but I think it's the same thing; it solves the same problem you're describing, which is: we want to have a number that we could state for our CI, the performance we expect with our CI, but we could also estimate other different forms of performance based on pressure.
A
We could gather data like that. For instance, say internally NVIDIA had some high pressure, totally different from CI. It might provide more justification for the performance that we're seeing, which might be totally different from CI, and it might just be that because our pressure number is higher, our performance number is lower, or something like that.
B
Yeah, so I get it; you want to have some baseline. I think a baseline is fine. I'm just thinking that maybe we are not ready yet to have baselines. It should be fine if we create a report or a blog showing these results, but to put down what the target pressure is that we want to have... especially because the CI is very small, so we can't even say much about scalability there.
A
What I'd say about this, Marcelo, is: this would actually be good to test in CI. I think this would help the theory: if we test consistently in CI, the way we do now, on a clean cluster, we get the same results.
A
We can put those points on a line. Whereas if we don't do that, then what we're doing is just putting numbers out there and saying, okay, here's what we see in CI, here's what you should expect, but we're not comparing it against anything. So we could do this; we could actually test the pressure in CI as well.
B
I tested the pressure in the CI before; I did the presentation on it, to find, for example, what's the maximum number of VMs that we can create with tiny VMs. I could create around 500 VMs per node. Okay, we have only three nodes, but it was a big pressure; it was very slow with 500 and it reached some limits. It was a huge pressure, so that was the maximum, but that was with tiny VMs.
B
Deep performance evaluation, I think, should be in some other system. In the CI we have this test, and the results we have there we can understand as a baseline, especially to compare how things evolve with the code, because people are changing the code and we need a way to verify whether the performance is being impacted or not.
B
Things like that. But finding the limits that you're describing now, that's something we are doing internally at Red Hat with this kind of testing. Maybe we can find another cluster, and if it's possible to publish some of this data and make it public, then we can write a blog or create a report saying what the limits are.
B
And what pressure we see. As I'm saying, I think this kind of analysis is a deep performance evaluation on a specific cluster. The CI shouldn't have this deep performance evaluation with an extreme stress test; it should be tests that we can reproduce and use to see how the code evolves.
A
Well, what's interesting to me is that it might be reproducible. That's what you saw, right, when you did your pressure tests: you probably saw that at 500 VMs on a node it was consistently slow. It was not unexpectedly fast; it was always slow like that. That's what I'm saying: defining those expectations, and then seeing whether they're predictable.
A
If those are predictable, then we might be able to measure them, in the same way that we're measuring at low pressure.
A
Because, honestly, sometimes things like code changes can have no effect at low pressure but have an effect at higher pressure. So testing the extremes might not be a bad idea; we might find things that are different with this type of test. But really the important thing here is: is it predictable? That's the question. Can we predict what is going to happen?
A
Based on the pressure. And I think, just from some anecdotal evidence from your testing, I've seen it myself, and I think even this presentation talks about it, that you can predict it: it is predictably slow. But is it quantifiable?
A
Can we put it into a mathematical equation so that we can actually measure it and plot it? If we can, then we could measure it and test both extremes, because we might get different results based on code changes, so it may be useful in CI just for that reason. Do you agree or disagree?
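Purely as an illustration of what such an equation could look like (this is not something proposed or agreed in the meeting; the weights and the choice of terms are assumptions), the simplest model would be a weighted sum of the object counts discussed elsewhere in the call:

\[
P_{\text{rest}} \approx w_{\text{nodes}} N_{\text{nodes}} + w_{\text{pods}} N_{\text{pods}} + w_{\text{pvc}} N_{\text{pvc}} + w_{\text{ns}} N_{\text{ns}}
\]

where the weights would have to be fitted from repeated, controlled runs before the number could be used to compare clusters.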
B
I'm just saying we need to be careful with the kind of tests we put there. Right now we have a test that I think is already a lot of stress: it's a small cluster of only three nodes, and it's creating 600 VMs, so 200 per node.
B
Officially, I think OpenShift recommends something like 250 VMs per node, so 200 VMs per node is already very high, and we have this test and it's slow. So if we compare creating 600 VMs against creating 100... I don't remember the exact times now, but let me check; I have it here. So for 100 VMs it's less than one minute, and when we have 600 VMs we reach the...
B
It's loading, but anyway, we can see that. Can you guys see it? Yeah. For some reason it's super slow now; it might be my internet. I don't know what this is.
B
Okay, this is the Zoom panel here, so you're probably not seeing it open. Anyway, we can see here the VM creation time. These are many tests; some old executions were reaching up to 10 minutes, but later, if you look at the first graph at the top, they start to be around five minutes.
B
In the worst-case scenario. So something changed in the code that made it better, and I'm actually planning to write maybe a blog about these results, something I will try to do later; I don't have the time now, but I will do that.
A
So you saw that there was an improvement, and it was noticeable when you ran it at this amount of pressure; there was a noticeable improvement. Exactly.
B
And then we see, for creation, the worst-case scenario is like five minutes.

B
Yeah, not five, eight minutes. Okay, we can see here there are some spikes, some variations, and we don't know what those variations are.
B
The tests here are not all the same, because sometimes we run the test creating only 100 VMIs, and other times we range through 200, 400, and up to 600 VMI creations. So we're not sure if we can say there was a performance improvement here, because there is some variation. We need to check that over a longer term to see whether we can trust these improvements or not.
B
So if we look here, for example, when we are creating 600 VMs it's taking, in the worst-case scenario, eight minutes, which is very high, so I think we already see some pressure here.
B
Of course we can push the limit even further, to maybe 1000, because I think the cluster is fine up to 1000. More than 1000 it doesn't create; it actually breaks.
B
It reached some limits. As far as I know, before enabling the jobs I was trying to do this performance analysis, and I have documents for that. They should be open; I will open them and share them again.
B
It was a long time ago that I stopped that documentation, but anyway, when creating more than one thousand I think it was reaching some limit; I don't remember now which one. Theoretically we can create 400 per node, so yes, we can create 1200 maximum, but with 400 per node it already puts a lot of pressure because it's creating too many pods per node. So that's another limit.
B
The container runtimes start to be overloaded because they're creating too many containers, and then we start to hit other bottlenecks that are not KubeVirt. That's what I'm saying: up to 1000 should be fine in this cluster that we have, without seeing other bottlenecks that are not related to KubeVirt. And then we can analyze the metrics, especially the KubeVirt work queue; I saw something interesting there.
B
I want to point out that we have this virt-controller work queue; it should maybe have less pressure now, because there are some PRs that might reduce the number of gets, that kind of thing. It might be interesting, if those PRs get merged, to see things like this in the KubeVirt components.
A
Like you're saying with KubeVirt pressure, it would be interesting to see if we're within the range that Kubernetes expects with pods, because, like you said, if we're just loading up nodes, Kubernetes already knows that's a lot of pressure, so we expect...
A
We expect what you're seeing, right? And so, I guess what would be interesting to see is: if we have, say, three nodes or whatever, and you're loading them up to 300 or so, what would the...
A
What would your expected performance be for Kubernetes? What would we expect it to provide? We might be able to measure that, and then: does KubeVirt add any pressure? If we're measuring it consistently, we should know whether it does at this pressure.
A
Because at a baseline rate we might see it a little bit, but it might not be noticeable; we'll definitely see it here if KubeVirt is adding anything, especially as it changes between code changes, like you were saying earlier. If we saw a code change and it had some sort of improvement, any improvement would be amplified here, and anything that made it worse would be amplified as well.
B
It depends on the pressure. At that point it's no longer the KubeVirt components where we'd see a performance problem; it will be something else, the container runtime or the kubelet adding pressure, things like that, because we are already officially...
B
I don't remember now how many pods per node Kubernetes officially says they support; the default is 110, isn't it? And I think the document you showed before also recommends around 100. OpenShift recommends, I think, 200-something by default, and more than that we are beyond the limit. So we should be careful, that's what I'm saying, because maybe we will not see what we want to measure.
A
I agree, within reason; I think that's sort of the limit here, within reason. But a higher-pressure job can yield some new information that could be helpful to us. Yeah, I think so. And I like the other example you showed, that's really interesting. Okay, I wrote that down; I added it. I think it would be cool to see if we could do a high-pressure job, just to see what the measurements are there.
B
Okay, but we see things that are getting better. For example, here, in the write request duration, it's the delete. Before, we can see here... what's this date? It looks like March 5th, yeah, okay, so it's March 5th, and we see the delete was getting as high as three seconds; it depends, it was varying.
A
It would be interesting, Marcelo, to see what happened in the cases where it was high. I guess what I'm wondering is what would cause the delete to be high and what would cause it to be low. Because it's possible that between these tests there's no code change, or maybe there is, but maybe there isn't; and it also could be the way the job is run, maybe there's something that's different.
A
The pressure may have changed in such a way that one of these tipped it over; that could also be the case. That's why I was saying it would be interesting to know the pressure right when you measure each of these. If there were a way to do that, it would provide a lot more information than just looking at the graphs; we could have a little bit more than "okay, the code has changed".
A
It could be that, oh, wait a second, the pressure has actually changed, and maybe it wasn't the code. What do I mean by the pressure? What I'm saying is: I don't know the tests that you've run here, but let's say that each of those little bars is a different test.
A
Well, yeah, okay, I guess I'm just proposing a theory: maybe the pressure is different. It is possible, maybe in your case that could be what's happening, but in any general test it would be good to know what the pressure is at rest, because then we would know, if there is a difference between two tests...
B
So what happens here in the test is: it starts, deploys KubeVirt, runs the tests, and undeploys KubeVirt, and then it waits. Each test is just running once a day, so the next day it does the same. It sits idle for hours in between, and we have just two jobs: one that creates 100 VMIs, and another one that ranges through 200, 400, and 600.
A
Yeah, I understand, Marcelo; you have it set up so that the tests should all be clean in between, right? I understand. All I'm saying is that it's another data point. If you are doing testing in your data center and you're not doing this, if you just want to test whenever, you'd want to know, as another data point for your test, what your pressure was when you tested.
B
Yeah, I think so. I think the tool that we are writing can cover many different tests; I'm just thinking that maybe the KubeVirt CI shouldn't have too many stress tests, unless they're really essential to see some performance problem that we want to see.
B
And, as we were saying before, a stress test that tests Kubernetes objects, I think those kinds of tests shouldn't be in the KubeVirt CI; it should only test the KubeVirt objects.
A
Okay, sure. I think that's all; let me see, yeah, that's all I have. So I wrote this down; it would be good to have at some point, and we can talk about it in the future, but I think we covered the topic pretty well. It's something to think about; it would be interesting to see if this is something we could do. Like I said, I don't know.
A
I don't know if this is quantifiable, it's hard to say, but it would be interesting to know. It might be a good data point that we can add when we're talking about SLOs and we're telling people to measure their clusters, something helpful for people.
B
When it's idle, so no pressure, isn't it? So it's just to check resource usage, that kind of thing?
A
Requests, yeah, right. You could have one API server and a thousand nodes, and that's quite a bit of pressure for one API server. So there's a number of things like that which would affect the numbers that you're seeing. And the way the tools we have work is for people to use them to test.
A
Yeah, that's basically what I'm saying: this is something helpful. It's about stability.
A
Well, that one might be difficult. Again, it depends how we define pressure, but I was thinking of things that we know how to find. Some of the things listed here are all different forms of it: the number of nodes, the number of pods, the number of PVCs, the number of namespaces.

A
Those could be numerical values, and then during our test you could measure pressure again and say, now we have a new number of VMs on top, and so on, so our pressure values should change. Based on the pressure value during the test, we might be able to predict what you would expect for performance, just from the amount of pressure.
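As an illustration only of the kind of "pressure at rest" snapshot being discussed (this is not an agreed design; the struct, function name, and the choice of counts are assumptions), counting those objects with client-go might look like this:

```go
package loadgen // hypothetical package name

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// PressureSnapshot holds the raw object counts mentioned above: nodes,
// pods, PVCs, and namespaces present in the cluster at rest.
type PressureSnapshot struct {
	Nodes, Pods, PVCs, Namespaces int
}

// snapshot lists the objects across all namespaces and returns their counts.
func snapshot(ctx context.Context, c kubernetes.Interface) (PressureSnapshot, error) {
	var s PressureSnapshot
	nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{})
	if err != nil {
		return s, err
	}
	pods, err := c.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	if err != nil {
		return s, err
	}
	pvcs, err := c.CoreV1().PersistentVolumeClaims("").List(ctx, metav1.ListOptions{})
	if err != nil {
		return s, err
	}
	nss, err := c.CoreV1().Namespaces().List(ctx, metav1.ListOptions{})
	if err != nil {
		return s, err
	}
	s = PressureSnapshot{
		Nodes:      len(nodes.Items),
		Pods:       len(pods.Items),
		PVCs:       len(pvcs.Items),
		Namespaces: len(nss.Items),
	}
	return s, nil
}
```

Taking one snapshot before a run and one during it would give the two "at rest" and "under test" data points referred to in the discussion.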
A
If we could test it, right, that's one way: we'd have to get access to a certain number of nodes, and then run a test against it, and we'd probably have to test it continuously.
A
There are some challenges there, but that would give us a way to say, okay, it works with this many nodes. And the other way, which is what I'm saying here, is: if we knew the amount of pressure and the performance, we might be able to get a better idea of how well it scales, or how well it performs at different scales, just by a measure of pressure.
A
Yeah, makes sense. Okay, I'll keep thinking about this; I think there's a lot to unpack here, so I'll keep thinking about it, and maybe it's something we can talk about in the future as I get a little more clarity on some of the different forms of pressure and how they can affect things. What I'll probably do is a little bit more testing.
A
I want to use that steady-state job, do a little more testing, and continue to form this theory based on what I see and the results from that. I'll go with this. Okay, cool, all right, I don't have any more points. Do you guys have anything else you want to discuss? Yeah, so Roman...
A
Roman, you've been quiet on some of this stuff. Do you have any opinion on what we were talking about?
A
That's fine, okay, that's all right. The TL;DR was that we're trying to figure out a way to measure pressure based on the number of nodes in a cluster, the number of VMIs, and so on, as a way to sort of normalize the performance numbers that someone gathers, and possibly have a way to predict or estimate scale based on someone's setup.
C
Yeah, you've probably mentioned it anyway, but we would see some pressure in general if we test at scale and have Prometheus properly deployed, right? So I guess we would just ensure, on our deployment, that with certain goals we want to meet we don't see disk pressure or whatever. But as I said, I didn't fully listen, so it may only be a little bit of what you're saying.
A
No, yeah, that's mostly it. What we want is based a little bit on what we've seen from testing, like Marcelo talked about, when we see things slow down. For instance, when we have, what was it, 500 VMs on a node, Marcelo, we see that the 499th VM is a little bit slower than the first; there's a difference there, and we're trying to see if there's a way to quantify it.
C
Yeah, I guess we have a lot of metrics there already: you see how the watches are performing, you're seeing the rate limiters in the clients, and that's mostly important for seeing whether you're hitting some limits there, I guess. But yeah, we may also be missing some, and of course disk pressure, CPU pressure, and memory pressure need to be monitored.
A
Okay, all right guys, so the next meeting will be in three weeks, April 7th. I'll be out for three weeks and then I'll return and it'll be our next call. Okay, thank you, everybody.