From YouTube: 2022-01-20 Kubernetes SIG Scalability Meeting
Description
Agenda and meeting notes - https://docs.google.com/document/d/1hEpf25qifVWztaeZPFmjNiJvPo-5JX1z0LSvvVY5G2g/edit?ts=5d1e2a5b
A: So, this is the SIG Scalability meeting, 20th January 2022. I think today we have one topic to discuss, because the first one was added by Mike and he's not here. So, maybe let's start with the first point.
B: Yeah, so this is a client-go PR that we have been working on. Basically, retries in client-go are not exponentially backed off, and this PR adds that support. I just wanted to bring it up here, and also I want to run some scale tests, maybe the 5000-node run, to see if it has an impact, beyond the CI jobs we already have for the PR.
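Note: to make the discussion concrete, here is a minimal sketch of what exponentially backed-off retries look like using existing k8s.io/apimachinery and k8s.io/client-go helpers. The PR itself changes client-go's internal request retry logic, which is not reproduced here; the backoff parameters and the retriable-error check below are purely illustrative.

```go
package main

import (
	"fmt"
	"time"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/util/retry"
)

func main() {
	// Illustrative backoff: 100ms base, doubling each attempt, with jitter.
	backoff := wait.Backoff{
		Duration: 100 * time.Millisecond,
		Factor:   2.0,
		Jitter:   0.1,
		Steps:    5,
	}

	// Retry only errors that are plausibly transient, e.g. server timeouts
	// or "too many requests" from API Priority and Fairness.
	retriable := func(err error) bool {
		return apierrors.IsServerTimeout(err) || apierrors.IsTooManyRequests(err)
	}

	attempt := 0
	err := retry.OnError(backoff, retriable, func() error {
		attempt++
		// Placeholder for the actual request, e.g. a GET issued via client-go.
		return fmt.Errorf("simulated transient failure on attempt %d", attempt)
	})
	fmt.Println("final result:", err)
}
```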
C: I'm afraid that running the current scale job will not really benefit us, because I think we are not really exercising the retry path. There are probably a couple of retries over the whole test, but that is negligible, I would say.
C: I'm pretty much sure that the test will pass and we will not see any meaningful difference, but that doesn't tell us whether it really helps, whether it breaks anything, or that it won't break anything.
C: I'm personally okay with this change. I would like some tests to be run, but I don't know what exactly we should be testing here. I'm just saying that we can run the scalability jobs, but I don't think we will get anything useful from that.
B: Yeah, last time I checked I didn't see any, so I will try to open a PR with that change, and maybe then we can use that to see, in regular CI jobs or in scale testing, at what rate we are actually seeing retries happen.
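Note: a sketch of the kind of retry metric being discussed, using the Prometheus Go client. The metric name, labels, and registration point are hypothetical; client-go's actual instrumentation hooks are not shown.

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
)

// requestRetries is a hypothetical counter of client-side request retries,
// labeled by HTTP verb and host; the name and labels are illustrative only.
var requestRetries = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "rest_client_request_retries_total",
		Help: "Number of request retries, partitioned by verb and host.",
	},
	[]string{"verb", "host"},
)

func init() {
	prometheus.MustRegister(requestRetries)
}

// ObserveRetry would be called from the client's retry path.
func ObserveRetry(verb, host string) {
	requestRetries.WithLabelValues(verb, host).Inc()
}
```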
A: Actually, I have one question: what kind of errors will it retry, all of them? Because we could probably play a bit with Priority and Fairness, throttle some requests and make them time out, for example. I think we...
B: Apart from APF, the only time we retry, that I'm aware of, is when it's a read request and you are running into some retriable error; only then is the request retried, yeah.
B: Yeah, right, exactly. Okay, so I'll take it as an action item to add something in client-go to measure the retries. Also, for this PR, do you think we should involve other folks, like Daniel or Liggitt, to get their input? Because ideally we want to get it in and give it some soak time for 1.24, so that if an issue arises we can fix it in time.
B: Okay, so I'm holding the PR right now, then. I'm going to open a new PR with the metrics change and merge that first, and maybe then we can merge this PR.
C: Regarding letting people like Joel or Daniel know, I think it's maybe just useful to post the link on the SIG API Machinery Slack and let people know that something has happened, and if someone is interested they will look at it.
A: Okay. So, I don't know, do we have anything else to discuss today?
C: There is something they were proposing, so... let me just find it.
C: Let me paste it here, if you can copy it over to the doc.
C: In practice this will also be like a regular watch, yeah. And what will be different? What will be different is that the initialization part of the watch will be much longer, and we already have support for watch initialization. So watch initializations already count, they consume the tokens, or however we call them, the APF or in-flight tokens.
C: I think the only thing we really need is that we probably may want to adjust the width of those requests, potentially based on the estimated number of objects that we will be returning. Code-wise it should be a trivial change, literally a couple of lines of code.
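Note: a rough illustration of the kind of width adjustment being described, scaling an APF "width" (seats) estimate with the expected number of objects a watch would send during initialization. The function, constants, and cap below are assumptions for illustration, not the actual apiserver work estimator.

```go
package main

import "fmt"

// estimateSeats sketches how a request's APF width could grow with the
// estimated number of objects it returns. Purely illustrative values.
func estimateSeats(estimatedObjects int) int {
	const objectsPerSeat = 100 // assumed cost bucket, illustrative only
	seats := 1 + estimatedObjects/objectsPerSeat
	if seats > 10 { // assumed per-request cap, illustrative only
		seats = 10
	}
	return seats
}

func main() {
	for _, n := range []int{0, 50, 500, 5000} {
		fmt.Printf("~%d objects -> %d seats\n", n, estimateSeats(n))
	}
}
```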
C: No, it's basically... okay, so maybe to put it another way: what they are proposing is not to touch lists at all. List will remain the same API as it is. We are basically going to utilize the more or less existing functionality of list... or sorry, of watch, right.
C: If you pass resource version equal to 0 to a watch, then at the beginning it will serve, basically, Added events for every single object that currently exists. There are some tweaks that we need to do there and so on, but conceptually it's just utilizing this existing functionality and changing reflectors, informers and so on; underneath, everything is using reflectors.
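Note: the existing behavior described here can be observed with a plain client-go watch today. A minimal sketch, assuming an in-cluster config and core v1 Pods; the new parameterization proposed in the KEP is not shown.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig() // assumes running inside a cluster
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// ResourceVersion "0" lets the watch start from the apiserver's watch
	// cache; the current state is delivered up front as synthetic ADDED
	// events, followed by subsequent changes.
	w, err := client.CoreV1().Pods("").Watch(context.TODO(), metav1.ListOptions{
		ResourceVersion: "0",
	})
	if err != nil {
		panic(err)
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		fmt.Printf("%s: %T\n", ev.Type, ev.Object)
	}
}
```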
C: So I would say it's more of a client-side change, plus... well, that's a bit of a simplification, because there are server-side changes too. There will be a new parameter to request that, but most of the change, conceptually...
C: We are tweaking the watch API a little bit, making it possible to parameterize it slightly differently, and all the other changes are client-side.
C: Yeah, so this KEP is not yet approved or anything like that. I don't think it was even fully discussed with SIG API Machinery.
C: I did two passes on it and had a bunch of comments, so there are still some gaps that we need to fill in there. But at the high level I think I'm more or less supportive; I just don't think it's in a state of being anywhere close to approved yet.
D: On this, I actually remember, Wojtek, if I'm not wrong, there used to be an old watch API where, when you start the watch, instead of doing the initial list you could start the watch itself directly, and all the initial objects are sent one by one as events, or something like that. Is this something similar to that?
C: This API kind of still exists; it's basically exactly what happens when you send resource version equals zero. So it's more about tweaking that code and making it more explicit and so on, rather than inventing something new.
D: Cool, so I had one idea to discuss, and I only have about 15 minutes left. If no one has anything else, I can bring it up. I actually wanted to discuss it in the last SIG meeting. I only recently started looking into it, so I may not be very thorough in how I see this, but this is essentially about etcd.
D: One of the problems we see is with writes. When the API server makes writes to etcd, and etcd writes these operations to its write-ahead log, it does this sequentially, one write at a time, because of this globally unique, increasing counter. So what this means is that writes are actually happening sequentially, so pretty much all the I/O operations are sequential.
D: That means no matter how much throughput or IOPS you provision to the disk, you're still latency-bound, which is not very good. And I believe one thing we did to work around this was splitting events into a different etcd, you know, in our open-source scale tests. So that was one way to get two parallel streams instead of one stream, kind of thing.
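Note: the events split mentioned here is typically configured with the kube-apiserver --etcd-servers-overrides flag, which routes a given group/resource to a dedicated etcd. The endpoints below are placeholders.

```
kube-apiserver \
  --etcd-servers=https://etcd-main.example:2379 \
  --etcd-servers-overrides=/events#https://etcd-events.example:2379
```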
D: So what I was thinking is: what if, within etcd itself, we could somehow parallelize different prefix ranges? Let's say the prefix for pods, if you think of that as an independent...
D: ...an independent stream which has its own unique resource version, right. I won't talk about how we would migrate from the existing thing to that; I have some ideas, but like I said, I don't want to go too far into those details, because I don't know if it will work without looking at the code yet. But let's say we parallelized, at some level, the MVCC for each of these objects.
D: So instead of creating different etcd clusters for different objects, let's say we had the same etcd process, but within it we could shard the resources. Because from what I understand, nowhere in Kubernetes does the API server ever make a call to etcd that tries to get more than one object type at once. It always asks for only pods, only nodes, or something like that. There are some calls like "kubectl get all" and such, but internally they still translate into individual, separate calls.
D: So if that is the case, and also there is no such ordering requirement... one benefit of having one globally unique counter across all the objects together is that you can make some complex checks that depend on multiple object types. Let's say a pod is created with a node already assigned, and you wanted to check whether, at the time the pod was created, that node existed. For that it'll...
C: Actually, we are officially saying that you shouldn't depend on that. We are officially saying that you should never compare the resource version across resource types, because any operator is free to shard etcd, or switch etcd, or whatever.
C: On a per-resource-type basis. So actually, we are even officially saying that you can rely on the monotonic increase of resource version within a resource type, but you shouldn't assume anything across resource types.
D: Oh, that's great, okay, that's perfect. That felt intuitive to me even otherwise, because usually a lot of these controllers have independent reflectors for each of these object types, so you cannot guarantee, at the point when you receive a pod event, what the state of your node cache is, and stuff like that. So anyway, I think that's the highlight. I don't know if there have been similar ideas floated in the past or if we've discussed anything like this, but what I'm trying to do is within etcd itself.
D: Basically, at the prefix level, we split out the MVCC and maybe the write-ahead log, or maybe not even the MVCC, just the write-ahead log. So yeah.
D: If you think at a high level this makes sense, I'll consider exploring this a bit further and maybe writing a KEP. But this is a scalability issue we are seeing within EKS; I've seen it with a bunch of our customers' clusters.
D: I don't know how much others see this. Splitting events helps, but even within a particular object type, once you go beyond a certain scale I think you might hit this. And it also has cascading impacts on reads, because reads are also consistent, so they are ordered behind the writes. So if writes get piled up, then reads can get slow, and stuff like that, I think.
C: Yep. So, I don't know etcd well enough to say something smart; at the high level it makes sense to me. I see one problem that you will encounter for sure, which is review bandwidth on the etcd project. The etcd project is really, really suffering from a lack of maintainers and a lack of people who are able to review stuff and so on. And this will effectively be a change in etcd, right, and they are prioritizing...
C: ...bug fixes and purely reliability-related work, and a little bit of tech debt cleanup and so on, to make the project a bit safer, over any feature-related work. And I would call what you are describing a bit of a feature, so you will probably face this problem. It's just a heads-up for you.
D: Okay, so things will probably move slower than I'd expect. Cool, okay, all right. Okay, if you think it makes sense, I'll see if I can dig a bit further into this. But is this a problem you've seen, with writes? Because it also depends a lot on your setup, like what kind of disk you're using and all that stuff.
C: So yeah, I think I remember a case of that, or maybe two cases, but it's not something that has seemed to be an urgent problem for us. I think we more often see a throughput problem hurting us on the API server side, the watch cache and so on, rather than on the etcd side per se.
C: I can easily imagine a write-heavy cluster with many CRDs and not that many writes per type, but a lot of different types. If you have, say, 100 different CRDs, then even 100 writes per second per CRD is not that much, so things like the watch cache and so on will easily handle that, but it already gives you, what, 10,000 QPS, which etcd can handle, depending on the setup, but it's not that far from its limits.
D: All right, so if there is nothing else, if no one else has anything to discuss, I want to bring up one last thing. This is regarding the issue I had cut earlier about this behavior with reflectors and the watch cache.
D: So this ticket, I think... oh, I think you did take a look at this one already. I still owe a comment as well; I'll respond to some of the things you've asked. But Marcel, can you open that issue real quick?
D: Yeah, so I think what I was mainly wondering about is that fix which was made. Okay, so a quick recap of the issue. If you can scroll down, there's an event log, actually, yeah, in this comment. What happens is there's a client which is trying to do a watch, and it's trying to do a relist.
D: It's trying to do this initial list with a given resource version, and let's say for some reason the watch moves from one API server instance to another API server instance, and on that API server instance, for that resource type, the resource version is behind, it's in the past, just because there weren't any events for that object type since that resource version.
D: It will get a 500 error, and then it will keep retrying, like 10 times, because the client retries, and every single time it fails. Eventually it does a list from etcd, and then that list takes it to an even further resource version in the future, while the watch cache is still stuck in the past.
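Note: a rough sketch of the client-side loop being described, using only public client-go and apimachinery calls. The real reflector logic in client-go is more involved, and treating the "too large resource version" case as a generic timeout-style error is a simplification.

```go
package sketch

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listAndWatchOnce sketches one round of the loop described above: list pods
// at the last known resource version, fall back to a consistent list served
// from etcd when that version is rejected, then start a watch from wherever
// the list left off. Simplified for illustration; not the real reflector.
func listAndWatchOnce(ctx context.Context, client kubernetes.Interface, lastRV string) (string, error) {
	list, err := client.CoreV1().Pods("").List(ctx, metav1.ListOptions{ResourceVersion: lastRV})
	if apierrors.IsResourceExpired(err) || apierrors.IsTimeout(err) {
		// Resource version too old (expired), or not yet reached by this
		// apiserver's watch cache (surfaced as a timeout-style error):
		// fall back to a quorum list (ResourceVersion unset), which jumps
		// to the latest resource version in etcd.
		list, err = client.CoreV1().Pods("").List(ctx, metav1.ListOptions{})
	}
	if err != nil {
		return lastRV, err
	}

	rv := list.ResourceVersion
	w, err := client.CoreV1().Pods("").Watch(ctx, metav1.ListOptions{ResourceVersion: rv})
	if err != nil {
		return rv, err
	}
	defer w.Stop()
	for range w.ResultChan() {
		// Handle events and track the latest resource version here (omitted).
	}
	return rv, nil
}
```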
D: So I think you made a fix for this, I guess, to get this behavior, which is: if you get a "too large resource version" error, then actually make a call to etcd. But what I'm trying to claim here is: is that even useful, is that even helping? Because if there are no changes to objects of that resource type, there is probably no point in making forward progress by just relisting and causing all this churn of 5xx failures.
C: No, I don't think we should be listing from resource version 0, because the fact that your API server doesn't have it doesn't mean it's not there; your API server may be slow with processing that and lagging behind other stuff. So it was a very conscious decision to not relist from resource version equal to zero, to avoid going back in time, to ensure that...
C: All right, I think I agree it's far from perfect behavior, what you are saying, what we are showing here. I think the question is whether we should be doing anything about that, because with the efficient watch resumption feature, or however we call it, the reuse of progress notify from etcd, which is basically what this feature does, the watch cache will actually be making progress, and it's enabled by default in 1.21, right?
C: I think it was 1.21 when we enabled it. Let me...
C: It doesn't require that fresh a version of etcd. I can't remember, but I think it's maybe even 3.2.
C: Maybe not, maybe 3.3. I think the trick was that in the older versions it wasn't exposing the knob to configure it, and it was only sending this progress notify at some hard-coded interval, I think every minute.
C: Every minute or something. So we exposed that knob in some patch release of etcd 3.4.something. So yes, it requires some reasonably recent version of etcd, but not a super fresh version of etcd.
D: Okay, yeah. And what I was trying to say, though, is that anyone who's running Kubernetes 1.21 does not automatically get this, right? They also have to explicitly, in the way they set up etcd, enable that flag, whatever it is that sends this progress notify.
C: That flag is enabled by default on the Kubernetes side. So actually, okay, yes: in 1.20 it's already there, in 1.21 it's enabled by default, so in 1.21 they will already have this behavior. It's just the progress notify: if they have the default etcd configuration, this wouldn't be super helpful, because those progress notifies will be coming every minute or five minutes, or something like that, way too rarely to be useful.
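Note: if I recall correctly, the knob in question is etcd's --experimental-watch-progress-notify-interval flag (e.g. setting it to a few seconds, such as 5s, rather than the default, which is on the order of minutes), and the Kubernetes-side feature being discussed is the EfficientWatchResumption feature gate; treat both names as my recollection rather than something confirmed in this meeting.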
D: Not sure, okay. On that note, actually, one last thing. As I think about it, I'm not fully sure if progress notify will solve this problem 100% either. Because let's say this happens and the progress notify keeps updating the RV, say every few seconds it updates it by 100 or something, but this retry keeps happening on the reflector, where it keeps making these list calls that fail, and then it relists, and when it relists it will get the latest RV from etcd.
D: So that may still be ahead of the progress notify RVs that are coming to the watch cache, right? So could this happen, could it happen that we are perennially stuck in this?
D: I think it's a matter of timeouts, right, because it's...
D: From what I saw on the client, it's trying to relist every one minute, and once it lists from etcd it starts a watch.
C: So, if we are not retrying too frequently, and we also shouldn't be, because of how we do retries, then it should work.
D: We'll probably just know this when it happens, but yeah, at least this cannot make things worse; it should only help, if it can. Cool, okay. So then we are fine. I think the call we are making is that we'll just say, okay, etcd progress notify fixes this, so let's not worry about optimizing this whole behavior. Okay, cool. I think we're over time. Sorry for taking more time.