From YouTube: SIG - Performance and scale 2022-06-02
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.tybh
A: All right, welcome to SIG Scale, everybody. I'm going to put the link to the notes in the chat. Please add yourself to the notes as an attendee.
A: Let's take a look; we're still getting some failures. We're going to look at both of these.
A: Okay, so for those of you who don't know, this performance job is something we run periodically. It'll go through, create 100 VMIs, and grab a bunch of metrics.
A: We have an audit tool, a little script that goes through and grabs metrics, and then we have a bunch of thresholds that we compare against. The way it works is, at the end of the test,
there's a summary. The way to read this is: like I said, there are a hundred VMIs that we create, but you'll see this isn't exactly 100, and that's actually expected. We've done an extensive amount of work to try and figure this out, because it's really tricky to get an exact value, given the way that Prometheus, as a time-series database, does its measurements.
A: What we actually need to measure is rates of change, and Prometheus does this by doing estimations over periods of time. So the create-pods count is an estimation of what we would see over a certain amount of time. Say our test ran for three or four minutes.
A: Whatever it is, Prometheus does some sort of extrapolation to get us a value, so it'll never be exact, but it should be roughly close to the exact amount that we see. So we get 105 here, and "create" is the create request that we make to Kubernetes. We're actually grabbing those metrics, and we're using this kind of as our anchor point to say: okay, this is what we expected in the test.
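A minimal sketch of the extrapolation idea being described, assuming made-up scrape timestamps and counter values (Prometheus' real increase() is more careful, e.g. it caps extrapolation near the window edges):

```go
package main

import "fmt"

// sample is one scraped observation of a cumulative counter.
type sample struct {
	t float64 // timestamp in seconds since test start
	v float64 // counter value, e.g. total create requests seen so far
}

// extrapolatedIncrease loosely mimics the idea behind Prometheus' increase():
// take the raw counter delta between the first and last sample in the window
// and scale it up to cover the whole window. This is why the reported create
// count is close to, but rarely exactly, the number of requests actually made.
func extrapolatedIncrease(samples []sample, windowSeconds float64) float64 {
	if len(samples) < 2 {
		return 0
	}
	first, last := samples[0], samples[len(samples)-1]
	sampledInterval := last.t - first.t
	if sampledInterval <= 0 {
		return 0
	}
	return (last.v - first.v) * windowSeconds / sampledInterval
}

func main() {
	// 100 creates actually happened during a 180-second test, but the
	// scrapes only bracket 170s of it, so the estimate lands near 103.
	scrapes := []sample{{t: 5, v: 3}, {t: 90, v: 55}, {t: 175, v: 100}}
	fmt.Printf("estimated creates: %.0f\n", extrapolatedIncrease(scrapes, 180))
}
```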
A: So what we actually do is take this metric and use it to compare against some of the other ones. In other words, when we create 100 VMIs, here's how many API calls we expect of each type. We expect a certain amount of each of these, and the ones we're very confident are stable are the ones we created thresholds for; those are right here. The way to read this is:
A: We take the relationship between the number of update requests and the number of create-pod counts, and we have a certain threshold, I think it's ten to one, that we allow. As long as it's within that threshold, we're happy; we haven't regressed in the number of update calls. Same with patch.
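A small sketch of the kind of ten-to-one ratio check being described; the metric names and the helper here are illustrative placeholders, not the audit tool's actual configuration format:

```go
package main

import "fmt"

// ratioThreshold expresses "no more than maxRatio of numerator per
// denominator", the kind of relationship checked at the end of a run.
type ratioThreshold struct {
	numerator   string  // e.g. "update-vmi-count" (illustrative name)
	denominator string  // e.g. "create-pods-count" (illustrative name)
	maxRatio    float64 // 10.0 means "at most ten to one"
}

func (t ratioThreshold) check(metrics map[string]float64) error {
	num, den := metrics[t.numerator], metrics[t.denominator]
	if den == 0 {
		return fmt.Errorf("denominator %q is zero", t.denominator)
	}
	if ratio := num / den; ratio > t.maxRatio {
		return fmt.Errorf("%s/%s = %.1f exceeds %.1f: possible regression",
			t.numerator, t.denominator, ratio, t.maxRatio)
	}
	return nil
}

func main() {
	metrics := map[string]float64{
		"create-pods-count": 105, // the extrapolated anchor metric
		"update-vmi-count":  412, // well under 10x the anchor, so this passes
	}
	t := ratioThreshold{"update-vmi-count", "create-pods-count", 10}
	if err := t.check(metrics); err != nil {
		fmt.Println("FAIL:", err)
	} else {
		fmt.Println("PASS")
	}
}
```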
A: I think those are the main two that we have thresholds for. For this run, it all looks good; this job obviously passed. You can see the running phase: we had 100 in the running phase. That one is an exact metric, because it's just a count at a specific point in time. And then we have the amount of time it took for each of the VMIs to go through their phases.
A: So from create to running, we saw 90 percent within our threshold, which we say is 45 seconds, and almost none took more than 25; most VMIs took 25 seconds. The p95 is 38 seconds, and we expect it to be less than 60. We don't have a threshold for p99, because that can vary widely: it's sometimes as high as 60 or 70 seconds, and sometimes as low as what we see here, 39. So we don't even count it; it's kind of a statistical outlier, but it's interesting to see.
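A sketch of how a percentile threshold like the p95-under-60-seconds check could be evaluated; the nearest-rank method and the sample durations are assumptions for illustration:

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the p-th percentile (0-100) of the given durations,
// using the nearest-rank method on a sorted copy.
func percentile(durations []time.Duration, p float64) time.Duration {
	sorted := append([]time.Duration(nil), durations...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(p/100*float64(len(sorted))+0.5) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(sorted) {
		rank = len(sorted) - 1
	}
	return sorted[rank]
}

func main() {
	// Made-up create-to-running transition times for a handful of VMIs.
	times := []time.Duration{
		21 * time.Second, 23 * time.Second, 25 * time.Second,
		24 * time.Second, 38 * time.Second, 26 * time.Second,
	}
	p95 := percentile(times, 95)
	const limit = 60 * time.Second // the threshold discussed above
	fmt.Printf("p95=%v, within threshold: %v\n", p95, p95 < limit)
}
```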
A: So this test looked good. This runs periodically off of the master branch, and, like I said, it's just a tool that we developed; you can actually run it locally after any test that you do in your cluster.
A: Okay, we might not be able to get to it here; we might just let it load in the background, and maybe we'll get an answer in a few minutes. Okay, so this is one of our tests. Let me go to a few other ones. So this one is periodic; we have a number of periodics, and we have two more periodics to go through.
A: Let's see if it's the same. I'm going to open up the last one, which is the pre-submit job. This is an optional job; it's the same thing as the periodic, except that we allow people who are doing pull requests to optionally run the performance tests.
A: Oh, there's, excuse me: presets.
A: And then, let's see the thresholds. Actually, you can see here that sometimes we get a lot of other metrics that we scoop up during this time, since we grab quite a bit of the API requests; sometimes there are other ones, like list KubeVirts, for example. That's why we don't have thresholds for some of these: they're inconsistent. They show up sometimes and sometimes they don't, so we ignore them. But they're still important to keep an eye on when they do pop up, because if we're doing 100 VMs, we would never want this value to have any correlation with the create-pod count, or we're going to be in trouble, because this call is expensive. So it's good that these are all low; it's all expected.
B: Yeah, anyway, I think this one has similar failure characteristics to the other one. The similarity that I observed is at line 2091: one VM is not running, and then at the end of the test it says the phase of one VM is not running. I'm not sure if that's a red herring, but that's the similarity I noticed.
A: Okay, yeah, I just saw this: that one is in Scheduling.
B: Yeah, you see line 2349: the phase is not running.
A: Okay, let's see. I wonder if this might be a case where we need to increase the memory. Let me see if I can get the artifacts here and tell us why.
A: Here it is. Okay, so it's insufficient memory; that's what I thought it was. The memory is not high enough on these nodes. All right, this is an issue we've seen in the past. We just raised it a few weeks ago, and it looks like we need to raise it again; it's just a little bit too tight.
B: Just so I understand, we need to increase the memory for the provisioned clusters that are...
A: Yep, all right, good. That should fix that one, and that's probably what's going on; especially since you saw it in both cases, that's likely what's going on in both cases. Okay, good, that should fix that. And then we have the other work in progress, for the dedicated cluster performance job.
A: Okay, all right. I don't think Marcelo is here today. This is Marcelo's fix that he's working on for the load generator; this is what will fix the dedicated cluster performance job. I haven't talked a ton about that job. It's over...
A: It's this one; these two, actually. These are run on a dedicated cluster, which is better for us to do scale testing on. Right now, the target for this work is to run what we call burst tests, and burst tests we're defining as: create a bunch of VMs
at whatever variable rate, then wait till they're running, and then delete them. There's a lot of variation to that: we could create them at a rate, we could create them and wait; there's a lot of things we can do. That's one of the two types of tests that we're going to do, and it's the one we're starting with first.
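A rough sketch of that create/wait/delete burst cycle; createVMI, deleteVMI, and waitAllRunning are hypothetical stand-ins for real KubeVirt client calls, and the count and rate are example knobs:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical cluster helpers standing in for real KubeVirt client calls.
func createVMI(name string) error         { fmt.Println("create", name); return nil }
func deleteVMI(name string) error         { fmt.Println("delete", name); return nil }
func waitAllRunning(names []string) error { fmt.Println("waiting on", len(names)); return nil }

// burst creates count VMIs paced at the given rate, waits until all are
// running, then deletes them all: the cycle described above.
func burst(count int, rate time.Duration) error {
	names := make([]string, 0, count)
	ticker := time.NewTicker(rate)
	defer ticker.Stop()
	for i := 0; i < count; i++ {
		<-ticker.C // pace the create requests at the configured rate
		name := fmt.Sprintf("burst-vmi-%d", i)
		if err := createVMI(name); err != nil {
			return err
		}
		names = append(names, name)
	}
	if err := waitAllRunning(names); err != nil {
		return err
	}
	for _, name := range names {
		if err := deleteVMI(name); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// Example knobs: 100 VMIs, one create every 100 milliseconds.
	if err := burst(100, 100*time.Millisecond); err != nil {
		fmt.Println("burst failed:", err)
	}
}
```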
A: The second type is: if we expect 100, we create 100, we delete 10, and then the job should automatically recreate 10 more. That one has a lot of variation, because we can change how fast we delete and how fast we recreate, and it sort of captures how pressure affects the cluster based on the different rates, how fast you recreate, and so on.
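And a similarly rough sketch of that delete-and-recreate churn pattern, again with hypothetical helpers (deleteRandomVMIs, runningCount) in place of real cluster calls; in the real job, a controller would recreate the deleted VMs:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical helpers standing in for real cluster operations.
func deleteRandomVMIs(n int) { fmt.Println("deleting", n, "VMIs") }
func runningCount() int      { return 100 } // stub: already back at steady state

// churn repeatedly deletes a batch of VMIs and waits for the cluster to
// recreate them back up to target, applying steady delete/recreate pressure.
func churn(target, batch, rounds int, deleteInterval time.Duration) {
	for r := 0; r < rounds; r++ {
		deleteRandomVMIs(batch)
		// Poll until the controller has recreated the deleted VMIs.
		for runningCount() < target {
			time.Sleep(deleteInterval / 10)
		}
		fmt.Printf("round %d: back to %d running\n", r+1, target)
		time.Sleep(deleteInterval) // pacing knob: how soon we delete again
	}
}

func main() {
	// Example knobs: hold 100 VMIs, churn 10 at a time for 3 rounds.
	churn(100, 10, 3, 2*time.Second)
}
```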
A: Okay, lastly: KubeCon NA submissions. The call for proposals ends, I think it's tomorrow. Marcelo and I are going to give a talk. This is where we're tracking it; actually, this needs to be updated, but this is where we're actually collaborating, in the Google doc, and this is the outline we're looking to submit. Actually, here's what I'll do.
A: Let me just go to this one. We wanted to talk about how we've created the performance infrastructure for KubeVirt, and cater the talk to how other projects can do it: go through some of the steps that we did and talk about some of the things like the metrics, which I think are really important for any project.
A: Okay, all right, I don't have any more topics. Do you have anything else you want to talk about?
B: No, I was just listening in. Thank you, cool.