From YouTube: 2020-12-10 Kubernetes SIG Scalability Meeting
Description
Agenda and meeting notes - https://docs.google.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?ts=5d1e2a5b
A: Hey guys, yeah, I can introduce myself. I've joined a few meetings, I know. My name is Alex; I work at Apple, and Apple has been building larger and larger Kubernetes clusters over the last year, and scalability is now one of the main problems. Initially there were problems even growing clusters to 2,000 nodes; now we have 4,000 nodes and are thinking of building clusters of as much as 6,000 or 8,000 nodes, and eventually 10,000 or so. So yeah, a lot of pain points there. Internally, we started using the ClusterLoader tool to run performance tests, and we have plans to extend the existing test coverage to cover our internal implementations and hopefully contribute upstream as well with all our findings, if there's interest.

That's great. So what do you mean by your internal implementations, and ClusterLoader support in that area?
A: So we do have internal implementations for kube-proxy, for example. We have our own implementation for the network; we have things such as sticky IPs. We also use containerd, for example, instead of CRI-O as the container runtime, and things like that. But I think the network part is the biggest chunk there, I guess.

Cool, nice. Obviously all contributions are welcome.
A: So yeah, there's a part that I think we already test, for example containerd: our periodic tests in Kubernetes are already using containerd. But obviously, if you have some custom kube-proxy, you probably want to gather some metrics from it. This is something that ClusterLoader can support, and if you want to contribute in that area, that's great.
A: Do you have any questions for us, anything we can help you with?

I'm sure I will as we go on; I'm still researching. I actually pivoted to the scalability team just a few weeks ago, or I guess maybe a month at this point. We have a number of people who have been working on it for about a year now, so I'm sure I'll have more and more questions. Right now I'm just a fly on the wall, listening in and checking on what's going on upstream.

Okay, perfect. If there's anything we can help you with, any questions, anything we can explain, or maybe something you want to share with us, then feel free to just talk. All right.
So yeah, to return the intro: I'm Matt, from SIG Scalability. Wojtek is also here; he's our TL, and he's involved in the Kubernetes reliability work group.
B: Sure. So I guess my main question is: previously, some scalability or load tests that were being run internally have been discussed, and that piqued my interest in what the things are that SIG Scalability people look at to determine the success of those tests — what are the types of failure modes that would actually start to surface once you try to deploy a large cluster?
B: And so right now I have pretty reasonable use cases where just a couple hundred nodes is fine, but in the future we have plans to migrate much larger deployments to Kubernetes. So I gathered some statistics, based on the suggestion of, I believe, Matt, about what a hypothetical workload could be for this particular scenario.

So basically we have a kind of sinusoidal workload when it comes to scheduling of pods, just due to working hours and people doing deploys and things like that. At peak — or, I'm sorry, as an average; it does actually get higher than this, but as a reasonable peak or average — we're seeing roughly 30 to 45 pods being scheduled a second, or we would be in this particular environment, across roughly 40,000 machines, with roughly around half a million containers or pods running in this context. So yeah.
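A quick back-of-the-envelope reading of those numbers, as a sketch (the node, pod, and churn figures are the rough estimates quoted above; the derived values are only illustrative):

```python
# Rough arithmetic on the workload described above (estimates, not measurements).
nodes = 40_000
pods = 500_000
churn_pods_per_sec = 45            # upper end of the quoted 30-45 pods/s average

pods_per_node = pods / nodes                                  # ~12.5 pods per node
hours_to_churn_all_pods = pods / churn_pods_per_sec / 3600    # ~3.1 hours at 45 pods/s

print(f"average pods per node: {pods_per_node:.1f}")
print(f"hours to reschedule every pod at {churn_pods_per_sec}/s: {hours_to_churn_all_pods:.1f}")
```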
B: So I'm really wondering what types of things we should be looking for as we start to grow these clusters, in terms of failure modes. One particular concern I had was with kube-proxy. With the number of pods as they scale, to my understanding, kube-proxy is constantly polling the API server to find out what the topology is and then sets up iptables rules locally to facilitate that. But if the API server is degraded or unresponsive or something like that, hypothetically I'm thinking that maybe those rules could drift and end up pointing to addresses that are no longer valid, or something like that.

Also, by default — I've been testing on EKS, but by default there are only a couple of kube-dns containers that get scheduled, and things like the metrics server as well; these things are not deployed in a way that they're going to scale automatically. So yeah, I'm really just wondering if you can kind of walk me through, or just give me some pointers on, what we should be looking for in terms of degradation within the cluster, as you said.
A: Great questions. I would say they boil down to the core of what we do in SIG Scalability: basically how we test scalability and how we ensure that Kubernetes scales. I think it starts with the definition — there is a link, a few meetings below in the notes, to the presentation we gave at KubeCon last year. It basically summarizes our approach to how we scale-test and how to define that a cluster scales, so that the cluster works okay under high load.

As you said, putting scalability into a few words is complicated, because there are a lot of different dimensions, and these dimensions interact with each other. So basically, the framework we have — the idea — is to find some safe space, taking into account all of these dimensions, such that as long as you are within it, your cluster is happy. The TL;DR here is that we approximate this envelope by a set of limits.
A: So basically we say: your cluster should work as long as the number of nodes is less than something, the number of pods is less than something, the number of secrets, configmaps, etc., etc. And this is what we use in our continuous scale tests in Kubernetes, to make sure there are no regressions and Kubernetes keeps scaling to those limits. But now the question is: what does it mean that the cluster is healthy? I think that's what you were referring to — how can we say that, for this configuration, the workload we test is still okay, and when does it start to degrade? Obviously if, for example, at some point the API server goes down, then we're above our limits, right; that's beyond what Kubernetes can support. But what if there's just a slight degradation of performance? The approach we use is, I would say, pretty standard: we define scalability SLIs and SLOs, and basically we use those to say that as long as all the SLOs are satisfied, the cluster is happy — the cluster is okay, right?
A: The most important one is API call latency, and we have a few others. For example, one is related to what you said about kube-proxy — for example, overloading the API server and then kube-proxy lagging. For such cases we have the in-cluster network programming latency SLO, which should more or less detect that.
A: That's basically what our tests do, in very few words: we load the cluster up to some limits and then we check that all the SLOs are satisfied. Not all of them are ready — some are work in progress — but I would say we have an implementation for almost all of them, in some state, supported in ClusterLoader. But in general, that's not a question you can answer in a very precise way.
A: It's kind of an art — sometimes more art than science — to tell whether a given configuration scales or not. Obviously you can run tests, but doing it more generically, being able to tell whether this particular configuration will work or not, is sometimes hard. So I think testing is probably the only way.
A: Right, so yeah, totally. Wojtek may correct me here or add something, because he can expand on this area, but in our tests, I think we are, as of now, creating pods at a rate of 50 per second. We are also experimenting with a 100 per second rate, but we ran into some issues.
A: So basically, the scheduler is not a problem here; it's not a bottleneck. The only issue is that in the default Kubernetes configuration the scheduler has a hard limit on QPS — client-side QPS — and what it's set to depends on your deployment, obviously. We run the tests for Kubernetes on GCE, and the default there is, I think, 20 or something like that, and we bump it to a hundred. But you can basically bump this client-side QPS and then the scheduler is not a bottleneck. Obviously it depends what kind of pods you're running: if you're using some more sophisticated scheduling features, like pod affinity and anti-affinity, then there will probably be some drop. But if you're just scheduling some basic deployments of pods, then usually the scheduler is not a bottleneck. The bottleneck — in our tests, at least the biggest bottleneck — is the load generated by the kube-proxies, i.e., the watches on the services, because the pods we are creating, at least some of them, are part of services. This usually ends up overloading the API server, but we have some solutions to that.
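A minimal sketch of the arithmetic behind that client-side QPS limit, assuming roughly one Bind API call per scheduled pod (the 20 and 100 values are the default and the override mentioned above):

```python
# The scheduler issues at least one Bind API call per scheduled pod, so its
# client-side QPS limit is an upper bound on sustained scheduling throughput.
def max_sustained_pods_per_second(client_qps: float, api_calls_per_pod: float = 1.0) -> float:
    return client_qps / api_calls_per_pod

print(max_sustained_pods_per_second(20))    # ~20 pods/s with the default client-side QPS
print(max_sustained_pods_per_second(100))   # ~100 pods/s after bumping the limit
```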
A: I can dig into that later. The other thing that basically held us back — because recently we were experimenting with speeding up the tests — is that we ended up, I think, at 50; we didn't go to 100, because we noticed we had a problem with events. There were too many events overloading etcd. So yeah, we actually have had some ideas; I can link you to some discussions about the reason.
A: Basically, what we usually see is increasing API call latency, so the SLO — yeah, there's a link; anyway, I can share the links later so I don't waste time. But basically the SLO for, for example, simple GET calls is that the call should be below one second, and when etcd gets overloaded this usually spikes, right, because that's a consequence.
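A minimal sketch of that style of SLO check, assuming the 99th percentile of simple GET call latencies must stay under one second (the sample latencies are made up):

```python
# Toy SLO check: is the 99th-percentile latency of a bucket of API calls under the threshold?
def percentile(samples, p):
    s = sorted(samples)
    idx = min(len(s) - 1, round(p / 100 * (len(s) - 1)))
    return s[idx]

def slo_satisfied(latencies_s, threshold_s=1.0, p=99):
    return percentile(latencies_s, p) <= threshold_s

get_latencies = [0.02, 0.03, 0.05, 0.04, 0.90, 0.07]   # illustrative GET latencies in seconds
print(slo_satisfied(get_latencies))                     # True: p99 is below 1 s
```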
A: That's how we detect it. Although, in what I would say is the most dire case, if etcd is super overloaded it can even take down the whole VM with the master and, for example, kill the API server, so the cluster becomes completely unavailable — but that's the easy-to-detect case, right. Very often it's just that etcd is overloaded but still works somehow, and we see that in the API call latency, usually. So that's how we detect it; that's the failure mode here, okay.
A: So yeah, the issue is that it's a problem of the Endpoints API, which backs the cluster-IP services, right. With EndpointSlices this is better, and since 1.19 we have EndpointSlices enabled by default, okay. That's why we were able to push this limit, because before that we were using, I think, a 20 pods per second rate, mostly because of that. Or maybe —
B: So if, let's say, I'm using my own service discovery —

A: Even then, it doesn't give you unlimited scalability, because even with watch we have issues with the Endpoints API, right. This is because you have this quadratic factor there: if you have a service of size n pods, then the Endpoints object will have size n, and if you try to do a rolling update of it, then basically n times you will be sending this object of size n, right. And —
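A rough sketch of that quadratic factor, comparing a rolling update of one big Endpoints object against 100-entry EndpointSlices; the per-entry size and watcher count are illustrative assumptions, not measured values:

```python
# Traffic sent to watchers during a rolling update of a single Service.
n_pods = 1_000           # endpoints behind the Service
entry_bytes = 100        # assumed wire size of one endpoint entry
watchers = 5_000         # e.g. one kube-proxy per node
slice_size = 100         # max endpoints per EndpointSlice

# Monolithic Endpoints: every one of the n updates re-sends the whole n-entry object.
endpoints_bytes = n_pods * (n_pods * entry_bytes) * watchers
# EndpointSlices: every update re-sends only the affected ~100-entry slice.
slices_bytes = n_pods * (slice_size * entry_bytes) * watchers

print(f"Endpoints: ~{endpoints_bytes/1e9:.0f} GB, EndpointSlices: ~{slices_bytes/1e9:.0f} GB")
```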
A: Okay. So that's why what is important is that you need to throttle on the producer side — basically on the side of the controller that is generating the data that is later being sent over watch, right. Here we have efforts like API priority and fairness coming, and it will basically provide some overload protection, we can call it that, which should help in these cases, because you can then configure the API server to make sure, for example, that we are not creating too many objects that result in too many watch events being sent. Yeah, it's basically coming, and we have plans to extend it.
C: If you have your own networking solution, you can mark individual services, or a group of services, or all services, or whatever, as something that shouldn't be followed — something that kube-proxy doesn't have to watch — and then kube-proxy is not programming iptables for those services, which also reduces the load. So if you have something else and you don't need the particular service to be resolvable via service IPs within the cluster, then it's possible. I think it's called —
A: So basically, you specify "None" for the cluster IP, so they don't have a cluster IP, and then kube-proxy doesn't watch them. And there is another option: you can put a label on the service, something like a custom proxy name, and then kube-proxy also won't be watching them.
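A minimal sketch of that first option (a Service with no cluster IP, i.e. a headless Service) using the official Kubernetes Python client; the names and selector are hypothetical:

```python
from kubernetes import client, config

config.load_kube_config()

svc = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name="my-headless-svc"),
    spec=client.V1ServiceSpec(
        cluster_ip="None",                       # no virtual IP, so kube-proxy has nothing to program
        selector={"app": "my-app"},              # hypothetical selector
        ports=[client.V1ServicePort(port=8080)],
    ),
)
client.CoreV1Api().create_namespaced_service(namespace="default", body=svc)
```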
B: So what about kube-dns? Is that something that could potentially be auto-scaled? Can we run — is it stateless or not, or is it so lightweight that I really shouldn't worry about it?
A: Yeah, you can scale it up horizontally, and we do auto-scale it. It may cause trouble sometimes, though, because if you scale it too much horizontally, then you end up with, for example, thousands of kube-dns pods, and this causes problems because you have a cluster-IP service on top of that, right. So — yeah, okay.
A: Quickly: the idea is that instead of having one huge Endpoints object, we partition this object into multiple slices — that's what we call them here. So, for example, if you have a service composed of a thousand pods, instead of one thousand-pod Endpoints object you will have ten EndpointSlices of a hundred pods each, okay.
A: For network bandwidth and things like that, and etcd or something? Yeah, yeah, exactly — because then, when you update a single pod, you only update the slice it's in. And there's a hard limit on the slice size, right, I think.
C: Exactly. The EndpointSlice API itself is, I think, 1.16 or maybe 1.17, but the controller is beta in 1.18, and 1.19 is where kube-proxy support — for Linux in particular — went in and, thanks to that work, is enabled by default, because kube-proxy is what is generating this load. I was chasing this down for someone else, for some particular thing, some time ago.
B: Yeah, so I had a question that's in a slightly different area, I guess. If we have a very large cluster and we want to start scraping logs for containers, is that something that's only generating load on the kubelet hosts and kubelet APIs and should scale horizontally, or —?
A: What you can do is basically have some DaemonSet with a node-level agent that does that, right — it scrapes all the container logs. You can write to some location, and then this agent can read from there and push the logs somewhere, wherever you need to.

Okay, so that's definitely a more scalable approach. Okay, that's good.
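A minimal sketch of that DaemonSet-based, node-level log agent pattern with the Kubernetes Python client; the agent image and paths are illustrative assumptions:

```python
from kubernetes import client, config

config.load_kube_config()

# Node-level agent that reads container logs from the host and ships them elsewhere,
# instead of pulling logs through the kubelet/API server.
agent = client.V1Container(
    name="log-agent",
    image="example.com/log-agent:latest",       # hypothetical agent image
    volume_mounts=[client.V1VolumeMount(name="varlog", mount_path="/var/log", read_only=True)],
)
ds = client.V1DaemonSet(
    api_version="apps/v1",
    kind="DaemonSet",
    metadata=client.V1ObjectMeta(name="log-agent", namespace="kube-system"),
    spec=client.V1DaemonSetSpec(
        selector=client.V1LabelSelector(match_labels={"app": "log-agent"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "log-agent"}),
            spec=client.V1PodSpec(
                containers=[agent],
                volumes=[client.V1Volume(
                    name="varlog",
                    host_path=client.V1HostPathVolumeSource(path="/var/log"),
                )],
            ),
        ),
    ),
)
client.AppsV1Api().create_namespaced_daemon_set(namespace="kube-system", body=ds)
```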
A: If I may chime in: another issue that we came across as we grew our clusters is that the default configurations for the control plane components and the kubelets did not work quite well for larger clusters. So I was wondering if it was documented somewhere — basically recommended or tested configurations for clusters of a particular size.

I'm not sure whether it's documented anywhere — correct me if I'm wrong — but...
A: You can take a look at what we do in our continuous tests, because there we test, continuously, every day, clusters of 5,000 nodes. So obviously we need to tweak some parameters, and I can link to that — it's under sig-scalability. We use Prow to run these tests; if you are familiar with Prow, it should be more or less easy for you to read, and the cluster configuration is usually there.
A: That's true, that's also true — for example, the in-flight requests, right: there's this if/else logic, "if the number of nodes is larger than X, then use this setting," and stuff like that. But yeah, I wonder whether there is a way... actually, there is a way: you could just take a look at our tests.
A: These are ClusterLoader2 tests at 5,000-node scale, and we dump all the logs from the masters and the nodes. From the master logs you should be able to, for example, get the flags that were used to start kube-apiserver and the other controllers.
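A sketch of that "if the cluster is larger than X, use this setting" pattern for the kube-apiserver in-flight request limits; the flag names are the real kube-apiserver flags, but the thresholds and values below are hypothetical — the tested values should come from the test configs and dumped master logs mentioned above:

```python
# Hypothetical node-count-based tuning of apiserver request concurrency.
def apiserver_inflight_flags(num_nodes: int) -> dict:
    if num_nodes > 3000:
        read_inflight, mutating_inflight = 1600, 800
    elif num_nodes > 500:
        read_inflight, mutating_inflight = 800, 400
    else:
        read_inflight, mutating_inflight = 400, 200     # assumed small-cluster values
    return {
        "--max-requests-inflight": read_inflight,
        "--max-mutating-requests-inflight": mutating_inflight,
    }

print(apiserver_inflight_flags(5000))
```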
A: All right, we have one minute left.
B
No,
I
was
just
saying
that
they
answered
all
my
all
my
questions
and
gave
great
things
for
me
to
look
into,
and
I
really
appreciate
it.
D: Yeah, I just wanted to echo the same sentiment and come out of lurking mode just to say hey and introduce myself. I'm Elena Washington; I work at Gusto, and I just wanted to know if this recording will be available anywhere, or if there's a list of recordings, or if that's not shared.
A: We are recording for a reason; they should be available somewhere, but I have no idea where. I know that, like two years ago, they were automatically uploaded to YouTube, and I think the last time I checked they were there. So I will take an action item on me to check where they are being stored and share that with you.

Awesome.
A: I will announce this on Slack, but we will probably cancel the next meeting because it falls during the Christmas break, so see you next year. All right, thank you.