From YouTube: SIG - Performance and scale 2021-07-22
Description
Meeting Notes: https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.i2ab65exaot0
A
Okay, all right, welcome to SIG Scale, everybody. A few announcements for today: next week, next Thursday, July 29th, I'm going to be out the whole week. So we could have a meeting if folks want to, but I'm going to need a volunteer. If someone wants to host the meeting, I can give you the admin control so you can start the recording and so on, or we can not have a meeting.
B
Yeah, I don't mind running it if we have things to talk about. Let's see how this meeting goes, and if we feel at the end that we need to sync again next week, then I can run it.
A
Right, okay, sounds good. All right, let's get started with the first things. I don't know if Marcelo joined; maybe he's got a conflict today. So last time we talked about this document from Marcelo's experiments, where we got a bunch of information from Grafana. I took this and, oh, it looks like Marcelo might be joining right now. So I took these and created a bunch of issues from them.
A
So I posted the graphs here based on the metrics, and I just kind of drew a conclusion. This is the first one: API requests to the kubevirt.io virtualmachineinstances endpoint return 404s. Our expectation on this one is that we don't see this; we don't expect a bunch of 404s on this API. And it really got worse every time: as we created more VMIs, we just saw a lot more of these requests.

I mean, that kind of makes sense: we're making a lot more requests to this API, and it's just returning a 404. Does it make sense to have this as an issue, or is it something we expect? I don't think so, right?
C
B
A
Yeah, I saw, I think, because he does have the Kubernetes metrics in here, and I thought I saw that Kubernetes also gets these. It's not a good thing, but it's just another data point.
A
Well, maybe we can define exactly what's happening here when we're hitting this 404. Is it that we're trying to request a virtual machine instance and it's just not there, or what is it that we're doing? Because from this metric alone we don't know.
B
A
It's a GET on a resource and the resource is not there; what resource we're calling we do not know, so it could be anything. We have like 12 controllers, and anything that's keying off a VMI and calling a GET on something could be causing this. For example, the pod disruption budget controller: if we were calling a GET on pod disruption budgets every time a VMI gets queued, then that would cause a 404. Or the snapshot controllers and things like that could possibly do it as well. We have to investigate it. Yeah.
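A minimal sketch of how that investigation could start, assuming the dashboard panel is backed by the standard apiserver_request_total metric (the metric choice, label names, and Prometheus address below are assumptions, not something confirmed in the meeting; identifying which controller actually issues the GETs would still need audit logs or per-client data):

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	promv1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Assumed in-cluster Prometheus address; adjust to the actual deployment.
	client, err := api.NewClient(api.Config{Address: "http://prometheus.monitoring:9090"})
	if err != nil {
		panic(err)
	}
	prom := promv1.NewAPI(client)

	// Break the 404 rate against kubevirt.io APIs down by verb and resource.
	query := `sum by (verb, resource) (rate(apiserver_request_total{group="kubevirt.io", code="404"}[5m]))`

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	result, warnings, err := prom.Query(ctx, query, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println(result)
}
```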
A
Okay, all right, we'll keep it in, sounds good. All right, we'll go to the next one. This one is: for the disruption budget controller, the workqueue add rate is very high, so when creating a lot of VMs we just get this.

We see it in the workqueue add rates: it's just so much higher than the other metrics here for virt-handler and virt-controller. And this is, yeah, another one: why is that happening? I think I defined it, yeah, I did, okay: the total number of adds handled by a workqueue. Why is this one so much higher than any other one? There might be another issue buried underneath this, but this could at least get us a little bit closer to what's going on.
B
A
Yeah, that's a good one. Let's get a little more information on this one. Okay, the next one is the workqueue performance. This one's kind of general; I thought about splitting it up into a bunch of different issues, but it all seemed related in some way. So the first metric was the workqueue latency. This is worst case, I think, right, if I'm not misreading it.
C
Yeah, for that I would really just leave it as one issue until we have the possibility to configure QPS and the rate limiter and run the tests again, because right now they pretty much have to be high; there is no other option. As soon as we have it, we can then see what remains high and investigate that.
A
All right, let me see, you have your pull request somewhere in here, right? Yeah.

All right, let's do that then. Let's see the changes, yeah, okay. Next one: the goroutine count and memory remain high after VMIs are removed. This one was weird. I mean, the level of goroutines just stays that way even when we've got no VMIs, and memory too. And there's even another question here, but I didn't zoom in.

Let me zoom the picture a little bit; it might lose some quality, but yeah. What I was saying is, you can see the goroutines line: the baseline when we have no VMs steadily increases as you create more, which is interesting. CPU usage stays down. This was a weird metric; this is, oh, I spelled it wrong, time spent using the CPU, as opposed to, like, a percentage or a number of CPUs used or something. Memory seems like it's generally on an incline.
A
That would be one thing that made me think of this: I know at least internally we set requests, we have some requests on it for memory, but I don't know if it's something that's currently set.
C
Yeah, we set some requests, but no limits, yeah.
A
Yeah, and then the other question I brought up here was: if we fill the nodes, do we still see the memory climb? Let's say we have three nodes and they have a hundred-VM limit, and we fill them.

Do we see increased memory for that? Just to know whether it has any correlation with the number of VMIs, or whether it's simply that we have virt-handler at max capacity.

So that's another question, but we haven't figured that out; that's just something it kind of looks like here, something we'll need to figure out at some point. But that's kind of what I wanted to get into with some other tests that we can do.

Okay, so that's that one. All right, so those are the four issues. Were there any other ones we could think of that need to be created from this? I mean, should I break it up, I don't know?
B
I had a thought after the meeting last week about the 409s; that's what my PR was about, I had a PR that helped address that. Under load we actually see fewer of them, because the queue is backed up, which gives the informers more time to catch up. So the 409s would actually only become an issue when we become more efficient.
A
Okay, yeah, that's interesting. Okay, that's another one. Well, I already have, this is kind of what I wanted to do in the evaluations section: I want to start defining things that we can evaluate, because now we've got this data.

So let me just write this one down: the 409s, probably after the QPS change. So we do the QPS change and measure workqueue efficiency, and then, following that, we want to see whether the 409s are affected with an efficient workqueue under high load. That's another one. Okay.

Okay, yeah, I don't think we need to have issues for these. I think we could just keep them like this, or maybe I could just create one issue, load these all in there, and we can check them off or something, I don't know. All right, we'll just go with this for now, and if it becomes a problem I can create an issue for it. All right, yeah, okay. Next item: baseline thresholds.
A
I created this this morning. I talked about this recently; I was trying to find the right way to make it usable. I'm just going to start really simple with this. Basically the goal is to have some source of truth so that CI can read performance and scale metrics per release.

So, every time we cut a release, say 0.43, we'll have some sort of code that hangs around that just holds these thresholds in place, so that when we run CI for backports and such, or if people want to consume it or have CI that runs externally, we'll have this.

This is what our expectation is, and all I figured we'd do is add a bunch of constants in here that builds our list of things, and we just have a process for approving these thresholds based on what we know about the release and what we want to measure.
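As a rough illustration of the "bunch of constants" idea, a sketch of what such a per-release source of truth could look like; every name and number below is an invented placeholder, not an agreed threshold:

```go
// Package thresholds sketches the per-release expectations discussed above.
// All identifiers and values are illustrative only.
package thresholds

import "time"

// Expectation ties a measured quantity to the worst value accepted for a release.
type Expectation struct {
	Metric      string        // dashboard / Prometheus metric the number refers to
	Description string        // what we are promising
	Limit       float64       // numeric ceiling (unit depends on the metric)
	Window      time.Duration // period over which the value is evaluated
}

// Release043 would be frozen when the release is cut and only changed through review.
var Release043 = []Expectation{
	{Metric: "vmi_phase_transition_time_seconds", Description: "p95 time from created to running", Limit: 60, Window: time.Hour},
	{Metric: "kubevirt_io_api_404_rate", Description: "sustained 404 rate against kubevirt.io APIs", Limit: 1, Window: 5 * time.Minute},
}
```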
B
As well, so I think it makes sense to commit at least some sort of expectations into the code base around this. Are we thinking about using that tool I was creating, the one that would retroactively gather and report results, in order to determine these thresholds? Or what were you envisioning here?
A
We know what these are; when we cut the release, we set them in stone, and here's what we expect: if you don't change the code base, this is what you should see all the time. So we use this as that source of truth, and we gather that data, on the day we do the release, from your tool, to set all of these up.

So I kind of see it as our way of communicating exactly what our expectations are for this.
F
Yeah, I would say I had this thing in my PR before, the huge PR that I had, and then we simplified it. The idea was actually the thing that you mentioned. So, for example, we have jobs running for PRs or daily; I think we should start daily now, just to make it easier, you know, to run those tests, and then we define this threshold somewhere.

I think the place you did it is good, and then we were discussing the framework to collect this and to compare the results against it. So maybe we can combine everything: the thresholds that you define here, plus David's tools, to compare the experiments, and then it should, you know, write some alert or fail a task. Then we should discuss what's the better way of representing thresholds.
B
I think I can create some sort of config that we can pass into that perf tool I was working on, the one that gathers results, where we can say these are the thresholds we want to meet, did we meet them or not, and have the results tell us whether we passed or failed our thresholds as well.
B
F
Yeah, so my initial idea, in the first version of the PR, was to put this in a YAML, a configuration file, and then we could easily change all the thresholds, and everyone can maybe change that according to the environment it's running in, you know.
B
So I think I like the thresholds; can we move that to the tests package, maybe have a perfscale subdirectory in the tests package, and then have these threshold configs per environment? That's what I think would make the most sense.
A
Well, one question about this, let me clarify: per environment, I understand that, so we use it as a config, but what about... oh okay, I think I understand, you mean per environment. So, let's say, I'm in the mindset of when we release this, how we're going to communicate it. Would this be like: on 100 nodes, here's what we expect to see for your perf-scale thresholds? Is that kind of...
A
C
B
You just pick the config you want to run against, your threshold config, when you're running the test. So for the perf test that's running daily or whatever, when we create that automation job, maybe there's a CLI argument or an environment variable or whatever where we specify the threshold config we want to compare against, one that matches the environment we're testing against, and then it's just always used.
A
F
Because if we have it hard-coded, then every time we change the environment we need to change this too. Also, for example, let's assume someone wants to run KubeVirt in their local environment and wants to test as well. If this is configurable, they can, you know, use this test there, so it's more generic. Right now our focus, of course, is our CI environment, but it makes it useful for more people. Just saying that.
A
I can move this over to where you're working, David. And then the other thing, kind of the other question, because I just wanted to clarify: what do people think of this? Because this is where I'm working toward with it: we have some way to say, with the release, what our expectations are. Should we create that sort of expectation?

I guess that's sort of the question here, because what we're saying is we're aiming toward per-environment threshold configs for testing. Should we even go that route and say, okay, here's what our expectation is for performance for this release? Is that even somewhere we want to go?
F
A
C
When you run it on a bare-metal machine, or on IBM Cloud, then it maybe has faster CPUs or something. I mean, for the release we of course have to compare against the same machines, but when you're doing local experiments you're not running on the same machines as in CI, and you may want to change some parts. I think that's all this is about.
F
A
No, no, yeah, I get that; I get the configurable aspect of it. All I'm saying is, like I mentioned, one of the goals I laid out here was that we want it to be used by CI to evaluate, and what we talked about last time was that the tool David's using is configurable and that the performance is going to be specific to the environment it runs on.
A
That makes sense to me, because we want to compare apples to apples. The other part of this, though, is, like I'm saying: for every release that we do of KubeVirt, do we want to say what our expected thresholds are? Can we say that without specifying any sort of hardware requirements, or saying you have to use this script or something like that?
B
It's the expected numbers based on our CI hardware, that's it. It's not "publicly, this is what you would expect"; I mean, if you completely reproduced our environment, then yeah, that's what you would expect.
A
Yeah, okay, I just wanted to clarify so we're all on the same page. This is what I was trying to find: Kubernetes has a whole section where they wrote about their SLIs and SLOs.

Okay, so let me take this: I'm going to move this over. I'll wait until your patch merges, David, and then I'll move it over there and test, and we can just make it configurable. That's fine.
B
So I will add thresholds to my PR: the ability to pass in some sort of YAML and define your thresholds, and then in the reports file you'll see whether you've met your thresholds or not. Then you can consume that when the patch lands. I'll try to get that done today; maybe we can get it in this week and then start using it in CI soon.
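A minimal sketch of how such a threshold YAML could be consumed, assuming a simple file layout; the field names, file name, and the sigs.k8s.io/yaml dependency are assumptions, and the actual PR may look quite different:

```go
package main

import (
	"fmt"
	"os"

	"sigs.k8s.io/yaml"
)

// ThresholdConfig mirrors a hypothetical per-environment YAML file, e.g.
//   thresholds:
//     - metric: vmi_creation_to_running_p95_seconds
//       max: 60
type ThresholdConfig struct {
	Thresholds []struct {
		Metric string  `json:"metric"`
		Max    float64 `json:"max"`
	} `json:"thresholds"`
}

func main() {
	raw, err := os.ReadFile("perfscale-thresholds.yaml") // assumed file name
	if err != nil {
		panic(err)
	}
	var cfg ThresholdConfig
	if err := yaml.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	for _, t := range cfg.Thresholds {
		// The perf tool would compare the measured value against t.Max here
		// and mark the corresponding report entry as passed or failed.
		fmt.Printf("threshold: %s <= %v\n", t.Metric, t.Max)
	}
}
```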
B
When are you leaving, on PTO I assume, Ryan?
A
Yeah, I'm going to be out; this Friday is my last day.
D
A
B
Yeah, wait, so next week... this Friday you're leaving? Okay, I'll see what I can get done today. Maybe we can make some progress right before you leave, or at least get things somewhere.
A
For me, I think, so I understand: baseline thresholds will have our definition, like I said. Oh, someone's got it, yeah. So we basically define SLIs and SLOs per release, based on CI, and that's how we can communicate it, and then the other tool we'll use more as a developer tool, per environment. Yeah, that makes sense, okay, cool.

So let's go to evaluations next. All I wanted to do with this was come up with a list of other tests that we can do here, now that we already have the different performance phase changes for VMIs and a bunch of things that have merged.

I just wanted to enumerate a list of tests that we want to do, to start building toward these baselines and start finding them. Does anyone have ones to add to this list? I think the first one we've got to do is bring in that QPS change and measure the workqueue efficiency again, and then it sounds like the 409s would follow, to see how that changes things. Do we have any other ones?
A
F
I can run that when it gets merged. Ideally I'm going to prepare the prow job to run daily for the density test that we have, so that we can start to check it at a high level, in a way that we can go through our public Grafana dashboard and check things.
H
A
Okay, and this is how many VMIs? Like, what's...
F
A
This actually brought up another thought (maybe we're not there yet): I'm thinking about how we should define how we test each of these, so we're consistent. Marcelo, I like what you did; for these, definitely, we should do the same thing, the same tests you did, Marcelo, that generated these metrics here, because we want to compare exactly those dashboards before and after, I think, for those two. So should we come up with a name for this?

This was that 10, 20, 30, 40, whatever, up to 100, 300 test. I don't know what we would call it; Marcelo's test, the 10-to-300 VMI ramp-up. So this is what we want to do for these.

Okay, all right, I just wanted to clarify that. And then we have our daily tests, so we can start on things and get that going. Okay, so let's go to other items. Roman, you've got a PR here.
C
Yeah, this would just be my initial proposal on how to make this stuff configurable. Basically, right now we have four clients in use: we have two clients in virt-api, one for console connections and that kind of thing and another one for validations, so that the webhooks are fast, and we have one for...
G
C
The 400 and 200 numbers on the webhook configuration at the bottom, that's what we have already; that's why you never had any issues with the validation webhooks. But for the controller configuration I increased it here in this example, or rather I set the same defaults Kubernetes uses for the controller manager, the 30/20, and for the rest I left it.
C
B
The client defaults, yeah. I see that you created (I'm just briefly looking at this) a package called ratelimiter. This is more complex than I thought it would be, because I thought it would just be setting something in the clients.
C
Well, I mean, I am setting the rate limit on the client configuration. The nice thing is, in the client configuration you can set burst and QPS directly, and then, when you create the client, a rate limiter will be created for you with these values.

But you can also just directly pass in a rate limiter, and this has the advantage that I now created a wrapping rate limiter for the token bucket rate limiter from Kubernetes, where it's passed in, and I tied it together with our KubeVirt config, so you can change the values on the fly. Okay, so that's what you did; that's where the complexity is.
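A minimal sketch of the wrapping idea described here, assuming client-go's flowcontrol package; this is not the actual PR, just an illustration of swapping the token bucket limiter under a lock so that QPS and burst can change without restarting components:

```go
package ratelimiter

import (
	"context"
	"sync"

	"k8s.io/client-go/util/flowcontrol"
)

// Dynamic wraps a token bucket rate limiter and lets callers replace it at
// runtime, for example when the KubeVirt config changes.
type Dynamic struct {
	mu    sync.RWMutex
	inner flowcontrol.RateLimiter
}

func New(qps float32, burst int) *Dynamic {
	return &Dynamic{inner: flowcontrol.NewTokenBucketRateLimiter(qps, burst)}
}

// Set swaps in a new limiter with the updated values on the fly.
func (d *Dynamic) Set(qps float32, burst int) {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.inner = flowcontrol.NewTokenBucketRateLimiter(qps, burst)
}

func (d *Dynamic) current() flowcontrol.RateLimiter {
	d.mu.RLock()
	defer d.mu.RUnlock()
	return d.inner
}

// The methods below make Dynamic usable wherever a flowcontrol.RateLimiter is
// expected, for example in rest.Config.RateLimiter.
func (d *Dynamic) TryAccept() bool                { return d.current().TryAccept() }
func (d *Dynamic) Accept()                        { d.current().Accept() }
func (d *Dynamic) Stop()                          { d.current().Stop() }
func (d *Dynamic) QPS() float32                   { return d.current().QPS() }
func (d *Dynamic) Wait(ctx context.Context) error { return d.current().Wait(ctx) }
```

The extra read lock per request is exactly the concern raised next in the discussion.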
B
Because it's dynamic, yeah; otherwise I would have to restart all the components, and that's slow. And are we worried at all about the locking or anything like that?
C
Okay, so yeah, we don't get slowed down. I guess we added a delay of one additional lock lookup, but it's an even shorter lookup than the default rate limiter does, so we have a very small delay in general per request, which should not really be measurable, but no additional throttling. You know what I mean.
A
C
Yeah, I mean, I could have exposed it also via command-line flags, for instance, or something like that. Okay.
E
C
You could just apply the changes and reboot the components, but the issue with that is, if I, for instance, tell virt-handler to reboot itself when it detects changes there, then we would have to add delays so that not all of them reboot at the same time and so on. When I just do it this way, it's an absolutely simple change, and fast and safe.
A
Okay, cool. I don't remember what we did internally for these; I know this one was bumped up, I just don't remember what it was.
C
When you look at the API docs, or rather the Kubernetes docs, you can read what the default values are for the command lines: for the kubelet it's 10/5, for the controllers 30/20, and for the API server they don't have it, because the API server is directly the receiver.

If you want to test with different values, we also have helper functions in the tests package to automate that easily. You can just fetch the KubeVirt config, change the values you want, and then we have an update-config-and-wait helper for propagation; once that function is done, you can run the test again and all components have the new value, guaranteed.
F
For example, I was thinking it's easier if you can just see whether it's throttling the requests, and how many requests are arriving, things like that.
A
You've got a metric you added for this, right? It was the rate limiting one or something like that.
C
Which metric? Yeah, there is the rate limit metric exposed.

So if you want to find the optimal value for a specific size, you could, for instance, just start with the default values and run the test; you would see the rate limiter kicking in in this metric. You could increase it, run the same test again, and keep increasing until the rate limiter doesn't get hit anymore. That would be a possibility, and that's probably what you talked about, right?
A
Yeah, so that's, what was it, 5963, Marcelo? I'm actually thinking that's another test we could do: we could change the QPS to see how it affects things. Maybe there could be a tell for us, like, okay, we're just being rate limited like crazy.

Let's see how the rate limit metric is affected. It can at least get us to a point where we could figure out, okay, what should we be at? It would be interesting, too, to see how this changes based on scale, how it moves based on our environment.
E
If we do those kinds of tests, we should also have, maybe, a separate dashboard for it, and also look at at least the API server metrics, to see at what point we put too much load on it versus what's better for us, because it's always going to be a balance.
A
So yeah, that'll be helpful. Okay. My only last question was, I don't know, maybe for Tomas: do you guys remember what we set the QPS and burst to for all of these?
H
I think it was around 30 or 40. I don't remember the exact number, but I think it was something around 30.
A
Was it just on the controller, or was it for...

Okay, all right. Well, I think maybe we can do some testing and see. I think we definitely need this; it's a question of whether these two values change. We can do this in testing and find out whether it should be higher than the defaults or whatever. Okay, cool, all right, thanks, Roman. All right, this is the last item, do we have...
F
Maybe the one related to the maximum number of VMs per node; I included it here. It was the last item, actually.

So, just to contextualize very quickly, I think we already discussed this: in Kubernetes, the kubelet has a pod limit, which defaults to 110, and we can increase that easily. However, virt-handler also has a parameter for that, the max devices, which is effectively the number of VMIs; even though the name doesn't suggest it, it implies the maximum number of VMIs per node.

However, the virt-operator that actually creates the virt-handler daemonset has some very strict reconciliation. So if we change the virt-handler daemonset, the virt-operator will overwrite that. So we cannot change things unless we apply something like, for example, HCO, which we are not using: it can patch and change the default values of the controllers the virt-operator manages. Roman pointed out some way to do that directly; I didn't check that, sorry.
C
When you go down a little bit more into the patch section here, you see, for instance, that you can just do a JSON patch on whichever controller, like described here. It's more or less similar to what HCO is doing; I think HCO is just passing it through to that section, but I'm not sure.
E
F
Yeah, so not sure; we can discuss whether it makes sense or not. It's similar to what we discussed, but this PR is actually also something David mentioned: I look up the max number of pods and use that as the maximum number of devices. Right now it's hard-coded to 110, but we can just look up that value for the maximum number of pods and use it; that's what the PR is doing.
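A hedged sketch of the lookup described here, deriving the device count from the node's pod capacity via the Kubernetes API; the function name, node name, and wiring are illustrative, not the PR's actual code:

```go
package main

import (
	"context"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

// maxDevicesForNode derives a device count from the node's pod capacity
// instead of a hard-coded 110. Illustrative only.
func maxDevicesForNode(ctx context.Context, client kubernetes.Interface, nodeName string) (int64, error) {
	node, err := client.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		return 0, err
	}
	pods := node.Status.Capacity[corev1.ResourcePods]
	return pods.Value(), nil
}

func main() {
	cfg, err := rest.InClusterConfig() // virt-handler runs as a daemonset pod
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)
	max, err := maxDevicesForNode(context.Background(), client, "node01") // node name is a placeholder
	if err != nil {
		panic(err)
	}
	fmt.Println("max devices:", max)
}
```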
C
I've checked the PR a little bit. Historically, my opinion was to just set the number very high, to one thousand or two thousand, instead of 110; I'm not sure whether that is an option with this approach. My main problem is only that we have a few edge cases which are hard to catch. One is when the default configuration path is changed, obviously, and the other one is when the value is changed while the daemonset is running; we're also not picking that up.
F
Yeah, that makes sense, especially because you see some tests are failing because it's complaining about the path of the file; for some reason the environment doesn't find the path of the file. And the point about changing the parameter makes sense too. I think I would kind of leave the priority with the flag, but you know.
C
F
C
It's not like one consumer requests one /dev/kvm and another one requests another /dev/kvm; basically we just have this number because the underlying device plugin API wants us to give it a number, a quantity, but there's no real limit behind it. I mean, we will see it on the PR. If in theory there could be some inefficiencies in the kubelet when we set it to a higher number, then we would probably just go with a thousand or something, but yeah.
B
F
E
F
C
D
A
All right, cool. We've got nine minutes left. Are there any other things we want to discuss?
F
Yeah, I had the Grafana dashboard one. So if you deploy kubevirtci now with Grafana, you can see the dashboard that I was using for the tests that I did.
C
When it's merged here, we have a periodic job which, twice a day I think, takes the latest release from kubevirtci and creates a PR in kubevirt.
F
C
A
So this would be, if I did... right now Prometheus isn't part of make cluster-up, right? So if I did make cluster-up and then checked this, I'd see it?
F
A
Oh yeah, that was what we had discussed on Slack: I found a few more, that's what you're talking about, right, like inside the controller, the workqueue metrics. There was, oh, I wish I had the link, let me see if I can find it; there was a ton of them that I found.
F
The retries one? I included that one, but I can double-check. So if you can just highlight those metrics again, I'll double-check that everything is in there.
A
Yeah, it was, I found it here.

Yeah, that's where I got a bunch of the descriptions for some of these, but you had most of them; there were just a few that were not there. Yeah.
F
A
Total number of retries handled by the workqueue, yeah, this one. And there's some other stuff, like rate limiter metrics and things in here; I don't know if we're hitting any of these, but there were some interesting ones, so yeah.
A
Okay, all right, thanks, Marcelo, that's pretty cool. All right, are there any other open items for the last minutes?
E
What I wanted to bring up was that I wanted to look at the amount of leftover goroutines, because it's quite concerning that (a) we're growing exponentially and (b) we have so many leftover ones. I've been looking at a few parts of this and other things, and I'm going to have a look.
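For that follow-up, a small sketch of one generic way to see where leftover goroutines are parked, using the standard runtime/pprof goroutine profile; this is plain Go tooling, not anything specific to the virt components:

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// dumpGoroutines prints the current goroutine count and an aggregated stack
// dump, which is usually enough to spot the code path that leaks goroutines
// as VMIs come and go.
func dumpGoroutines() {
	fmt.Fprintf(os.Stderr, "goroutines: %d\n", runtime.NumGoroutine())
	// debug=1 aggregates identical stacks; debug=2 would print every goroutine.
	_ = pprof.Lookup("goroutine").WriteTo(os.Stderr, 1)
}

func main() {
	// In a controller this could be wired to a signal handler or a debug
	// endpoint; here it simply dumps once per minute.
	for {
		dumpGoroutines()
		time.Sleep(time.Minute)
	}
}
```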
A
Okay, yeah, I've got that in this issue, Kevin: how the goroutines climb like a staircase when we scale.
E
Down, but also the growth seems disproportionate to the number of VMs we create. Yeah, okay.
A
Cool, okay, any other things? Four minutes left.
A
Okay, all right. Like I said, we can revisit the first topic from the meeting: I mentioned I'll be out next week. If folks want to have a meeting, David, you said you're okay with hosting it; that makes sense if you have some items you want to discuss. I'll leave it to you, and I'll send you the admin code so you have the ability to record and everything. If you have an agenda, then it makes sense to have it.