From YouTube: 2021-10-28 Kubernetes SIG Scalability Meeting
Description: Agenda and meeting notes - https://docs.google.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?ts=5d1e2a5b
B: Yeah, I just started looking at it, so I opened an issue so that you can track it.
C: Oh, slow down, you're going into problem solving; let's first think about the problem definition, right. What are we...
C: What do we want to accomplish here, right? I think there are a few different things we want to accomplish. One was articulated years ago, when we first started the APF work: testing whether we can turn off the client-side rate limiting and have a system that works. Another was articulated much more recently, when we started introducing constants in the support for long lists and watches: finding good values for those constants.
A: Yeah, I think the plan makes sense. I'm just wondering, from the client-side perspective, because we want to turn off this throttling on the client side: there are some things that can probably still impact the cluster, like creating a connection, which is kind of heavy. So, you know, to some degree we could probably try, but anyway, APF might not help in this case.
E: Client-side throttling, or rate limiting, won't really help with establishing connections, because we are reusing connections across requests, right? So unless you are explicitly terminating the connection in your client, which is not even possible if you are using client-go, for example, which is what our components are doing, then you will be reusing connections anyway. So it's not like a problem that is affecting us, right, or our... okay, yes.
C: I would put it in a slightly different way. I think maybe it's a valid concern that, in general, in the real world, there may be scenarios with a lot of connections being created, but the client-side rate limiting doesn't affect that anyway.
C: So the switch from using client-side rate limiting to relying only on the APF self-protection is really independent of that.
E: So, yeah, that makes sense; I think that's roughly what I was trying to say. Yes. I think we may also consider not turning off client-side rate limiting by default in every single client: we may potentially remove it only in some or all of the three components, or maybe even some out of the three, but not remove it in the library itself, just in some components. I think that's also an option here, although...
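For concreteness: the client-side throttling in question is client-go's QPS/Burst token bucket on rest.Config, and disabling it per component, rather than in the library, is a small change. A minimal sketch, assuming a standard kubeconfig-based setup:

```go
package main

import (
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/client-go/util/flowcontrol"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	// Option 1: replace the default QPS/Burst token bucket with a
	// limiter that always admits requests immediately.
	cfg.RateLimiter = flowcontrol.NewFakeAlwaysRateLimiter()
	// Option 2 (recent client-go): a negative QPS disables client-side
	// rate limiting entirely, leaving protection to server-side APF.
	// cfg.QPS = -1

	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("unthrottled client ready:", client != nil)
}
```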
C: Well, I think, again, maybe we're getting ahead of ourselves, right. I think our first question is: we want to understand what will happen if it gets turned off. I guess maybe it's fair to say: what turn-off scenarios do we want to explore? Maybe not a complete turn-off but a partial turn-off. But the first question, before we decide what to actually do, is: we want to do some tests, right, and that's the conversation here. What tests do we want to do, and for what purpose?
B: Do we see any modeling today on the client side for our, like, big-scale tests?
E: We do. In particular, we are, to some extent, artificially throttling the tests themselves, and also, yes, in particular the scheduler and the controller manager, when creating pods, are throttled by client-side rate limiting.
C: Yeah, I'm not sure either. I think, and I suppose, in fact, it's kind of difficult to interpret, right, because really the question you want to ask is comparing this world to a hypothetical alternate world: what would happen if we had a different rate limit? And no metric can really tell you that, because it's a system-behavior question.
F: So I think that is... can people hear me? Yes? Hey. I think there is one metric which probably indirectly says that there is some sort of throttling, though not necessarily just with API calls. A bunch of our controllers, for example within kube-controller-manager, have this backlog, the work queue depth metric, I think, which says how many things are still to be processed.

F: I have to check whether the scheduler also has something similar; I think the scheduler doesn't, but yeah, this metric is...
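For reference, controllers built on client-go's workqueue get that depth gauge essentially for free once the queue is named. A minimal sketch, assuming the usual controller wiring (the queue name is illustrative, not from the meeting):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/util/workqueue"
)

func main() {
	// Naming the queue is what wires it to per-queue metrics: with the
	// metrics provider that kube components register (from
	// k8s.io/component-base), this exports gauges such as
	//   workqueue_depth{name="example-controller"}
	// which is the backlog signal discussed above.
	q := workqueue.NewNamedRateLimitingQueue(
		workqueue.DefaultControllerRateLimiter(),
		"example-controller", // hypothetical queue name
	)
	q.Add("some-key")
	fmt.Println("queue depth:", q.Len()) // Len backs the depth gauge
}
```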
C: Right: that queue depth is a consequence of a lot of things interacting, right. I mean, you've got the number of syncers, you know: what's the number of syncers running concurrently, how fast does each API call run, how many API calls does each sync loop iteration take? There's a bunch of stuff wrapped up into that.
F: Yeah, another way, which I used sometimes in the past, and which gives a stronger indication that there is actually throttling: you have audit logs, and if you have audit logs for the calls coming from that component, you can just query through those. You can count the number of calls coming from it and match that up with the QPS that we've configured for it, to see whether it's being saturated. But that's still not, like, a direct...
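A rough sketch of that audit-log check, assuming JSON-lines audit output and filtering by the component's username (the field names follow the audit.k8s.io/v1 Event schema; the username is illustrative):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// auditEvent picks out just the fields we need from an
// audit.k8s.io/v1 Event encoded as one JSON object per line.
type auditEvent struct {
	RequestReceivedTimestamp time.Time `json:"requestReceivedTimestamp"`
	User                     struct {
		Username string `json:"username"`
	} `json:"user"`
}

// Count requests per second for one component; comparing the peaks
// against its configured QPS suggests whether the limiter is saturated.
func main() {
	const username = "system:kube-scheduler" // component to inspect
	perSecond := map[int64]int{}

	sc := bufio.NewScanner(os.Stdin) // e.g. piped from the audit log
	sc.Buffer(make([]byte, 0, 1024*1024), 1024*1024)
	for sc.Scan() {
		var ev auditEvent
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue // skip malformed lines
		}
		if ev.User.Username == username {
			perSecond[ev.RequestReceivedTimestamp.Unix()]++
		}
	}
	for sec, n := range perSecond {
		fmt.Printf("%s %d req/s\n", time.Unix(sec, 0).Format(time.RFC3339), n)
	}
}
```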
E: Yeah, I think getting rid of all the rate limits in our existing load test would be the first thing that would be interesting to see. The second is that we can even speed it up: it's super easy to speed this test up, say it should be going as fast as possible, and see what will happen to the system.
E: So I think... this won't be a signal that we are fine, but it will give us some signal; it at least gives us something. I mean, in the past, when we were trying to speed up the test and run it as fast as possible, it was basically blowing up the cluster.
C: Okay, I think that's quite plausible; let's start with something simple and we'll just see what happens. I mean, the simplest, probably the smallest, thing to do would be to leave the test speed as it is and just try turning off all the client-side rate limiting, and let's see what happens.
A: Okay, yeah, and I believe we will still see the difference because of, for example, the scheduler, which we know is hitting the limit.
C: All right, so that's one branch of the investigation. The other branch is tuning the constants, and in fact this could kind of be done on this first branch, right, because really the constants are in service of this self-protection. But we do have some constants.
C: I think we have three constants: one is in the support for list requests, and there are two constants in the support for watch requests. And right now they're numbers that, you know, are just guesses. We should have some evidence for setting them.
E: Yeah, so I think that the one for list requests is not a complete guess, I mean, right... No, no, no.
C: All right, so what would be a plausible way that we could do this? I mean, one of the concerns I have is that the regular testing is run in scenarios where there's concurrent load from other activities; it's totally uncontrolled, so performance results are variable and it's tough to draw conclusions from that. Do we have tests that are run with, you know, nothing else competing for CPU or network, or anything like that?
C: Great, yes, that's what I was hoping for, right. The confounding factor is that the tests that are regularly run on PRs are run on machines that are also handling other stuff, and, you know, we have a lot of flakiness due to the load; the behavior varies a lot due to the amount of concurrent load, just the interference from concurrent activity.
E: Yes, yes, but I think that our existing scalability tests may also not be perfect for that, in particular for lists, because we are not doing a lot of heavy lists, or a lot of lists in general.
E: Try to overload the API server with purely list calls: let's maybe start with purely list calls of different sizes, see what will happen, and try to tune that variable based on that.
C: I agree we should start with something simple like that. I do have a question, though: don't we need to, in some sense, compare this to the normal case? Right, I mean, if we have a situation where we find out what setting of the constant just gets us to some CPU limit we want to enforce, for example... So, in my mind, here's the way I see what's going on here.
C
The
reason
we're
doing
this
is
because
we
have
this
regulation
of
concurrent
requests,
which
is
basically
making
the
assumption
that
the
cost
of
serving
a
request
is
proportional
to
its
duration,
right,
because
the
concurrency
is
the
product
of
service
duration
and
arrival
rate,
and
what
we
find
is
that
for
lists,
the
cost
of
serving
is
bigger
than
for
most
requests,
so
to
set
the
constant
we're
looking
for
a
ratio
or
how
that
ratio
between
duration
and
cost
varies
between
normal
requests
and
the
list
requests.
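Spelled out, the relationship C is describing is Little's law plus a cost-per-unit-duration correction for lists. A short sketch in our own notation (the symbols are not from the meeting):

```latex
% Little's law: regulated concurrency = arrival rate times duration.
C = \lambda \cdot D
% APF's concurrency limits implicitly assume cost \propto duration.
% Lists violate that, so the constant to tune is roughly the ratio
r = \frac{\mathrm{cost}_{\mathrm{list}} / D_{\mathrm{list}}}
         {\mathrm{cost}_{\mathrm{normal}} / D_{\mathrm{normal}}}
% i.e. a list request should be charged about r times the concurrency
% (seats) of an ordinary request of equal duration.
```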
C: Agreed. I think this can be done in a synthetic way, and that would be best, because we can get the strongest signal-to-noise ratio in a synthetic test.
A: Okay, so I guess we agree that we should create synthetic tests for, kind of, estimating this ratio between the get calls and list calls.
C
It
would
probably
be
interesting
to
have
some
variety,
because
I'm
sure
that
the
cost
of
all
the
other
requests
not
exactly
the
same
it
would
you
know
what
all
we
need
to
do
is
get
the
list
calls
in
this
ballpark,
but
we
need
to
know
what
that
ballpark
is
so
having
a
variety
would
tell
us
what
the
ballpark
is.
So
I
think
that
would
be
good.
C: ...you know, somehow, what's appropriate for every request. But then...
C
Yeah,
in
some
sense,
we're
straying
out
of
six
scalability
back
into
api
machinery,
but
right
right,
but
I
think
yeah.
I
would
like
at
some
point
to
have
something
that
could
you
know
automatically
auto
tune
all
the
requests,
but
I
think
that's
not
the
question
for
here
right.
I
think
the
question
for
here
is
given:
let's
just
talk
about
how
to
tune
with
the
the
constants
and
what
we've
got
so
for
four
lists.
E: Yeah, I think we can; we probably don't have to be super smart here. I think if we just introduce some mix of, as I mentioned, reads and writes, and different sizes of objects, we can...
C: Could we do something like run a real, you know, e2e scenario, record all the API calls, and then play them back at our leisure, without having to worry about all the other stuff that's going on affecting the timing and so on?
C: If we could, I think we could, you know, kind of bifurcate what we do, right. One is to say: let's have a representative mix of requests, play them as fast as we can, subject to a given concurrency limit, right, and see what the resulting CPU and network is. And then, instead of that, just do heavy lists.
C: You know, after loading it up so there's data, just do a bunch of heavy lists, right, and tune the constant till we get to the point where that also produces the same CPU and network.
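If the record-and-replay idea pans out, the replay side could be as small as re-issuing captured read-only request paths at full speed. A rough sketch, assuming requests were captured one URL path per line (the capture format is hypothetical):

```go
package main

import (
	"bufio"
	"context"
	"os"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// Replay previously recorded read-only API calls as fast as the client
// allows, so timing no longer depends on the original workload.
func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cfg.QPS = -1 // replay at full speed; let server-side APF regulate
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// One recorded request path per line, e.g.
	// /api/v1/namespaces/default/pods (hypothetical capture format).
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		_ = client.RESTClient().
			Get().
			AbsPath(sc.Text()).
			Do(context.Background()).
			Error()
	}
}
```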
C: Let me just recite that back in a little more detail. I think you're suggesting two sides of the experiment: on one side we run some e2e tests and look at the request rate and the CPU; then, on the other side, we replace those ordinary requests with, you know, after loading up some data, just a bunch of heavy lists, and tune the constant until we get the same relationship between request rate and CPU.
C: Now, that's not going to work, because the cost has got to depend on what requests we're doing; otherwise it's just going to be a function of request rate.
E: ...starting the pods, and node-originating signals, in particular node heartbeats and updating endpoint slices. So we don't really need to, like, simulate the test; we can play a little bit with the ratio of those calls, but we know what is happening there, right, and it can be...
E
We
can
generate
like
four
types
of
five
or
five
types
of
calls
and
like
that,
the
simulation
will
be
pretty
simple.
To
do.
E: Yeah, so that's roughly what I was trying... So, yeah, I think what Mike is saying is that we first should do pretty much this, without lists even, but yeah. I mean, it kind of boils down to a similar thing; we can do it in one shot or we can do it in two shots, and I'm fine with both. So, yes.
B: Are you saying that when we do lists, we will be doing only lists? That doesn't... I think if you have a mix, that probably is closer to the real-world scenario.
E: I agree, yes. I mean, we should have at least some non-list calls. We should think about the duration, and probably experiment with different ratios of non-list to list calls. But yes, I think we should have both.
C: Yeah, I mean, again, I think it should be sufficient to be able to run it in, kind of, two modes, yeah. If we have a mix, that in some sense is diluting the signal, right: if lists are a fraction of the load, then they're going to be a fraction of what happens. It'll be easier to see and analyze what happens if we have two pure, distinct modes.
E: Yeah, so how I personally envision the test is that you have, like, this one knob, where you are saying that N percent of the load is coming from those calls, the simple, single-object calls, and 100 minus N is coming from lists. And just by changing one constant in our test we can run either only lists, or only those single-object calls, or a combination of both.
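A minimal sketch of that single knob, assuming a synthetic driver built on client-go (the namespace, object name, and flag are illustrative, not from the meeting):

```go
package main

import (
	"context"
	"flag"
	"math/rand"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

// One flag steers the mix: 0 = only single-object GETs,
// 100 = only full LISTs, anything between = a blend.
var listPercent = flag.Int("list-percent", 50, "percent of calls that are LISTs")

func main() {
	flag.Parse()
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cfg.QPS = -1 // rely on server-side APF, not client-side throttling
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}
	ctx := context.Background()
	pods := client.CoreV1().Pods("load-test") // hypothetical namespace
	for i := 0; i < 10000; i++ {
		if rand.Intn(100) < *listPercent {
			// Heavy call: list everything in the namespace.
			_, _ = pods.List(ctx, metav1.ListOptions{})
		} else {
			// Light call: fetch one pre-created object.
			_, _ = pods.Get(ctx, "target-pod", metav1.GetOptions{})
		}
	}
}
```

Running it at 0 and 100 gives the two pure modes discussed above; intermediate values test whether the mixed behavior matches the prediction from the extremes.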
C: That'd be great, right: that way, we could test the hypothesis by actually trying some mixes in between and seeing if they also match the prediction from the extremes.
E: ...a little bit more, but we should probably try to conclude it somehow. So, Abu, would you be able to start something around that?
B: Yeah, yeah. I mean, from our initial discussion at the last meeting we thought about adding a webhook, but we can just go with the first version, which is what we discussed today, and then we can go from there.
C: You know, wearing another one of my hats that you guys haven't seen: right now I'm trying to do some performance studies looking at latency out to watch clients. I wanted to talk about how to support that, but I guess we're out of time for today. If anyone wants to follow up on the Slack channel, that would be great, because I want to make progress soon.