From YouTube: SIG - Performance and scale 2022-07-14
Description
Meeting Notes:
https://docs.google.com/document/d/1d_b2o05FfBG37VwlC2Z1ZArnT9-_AEJoQTe7iKaQZ6I/edit#heading=h.tybh
A: So, okay, this is July 14th, 2022. This is SIG Performance and Scale.
A: I'm going to link the... I'm reading out some chat. All right, here we go. Okay, so today we'll just do an abbreviated agenda. I think we need some more folks to talk about the bottom two items, so we'll just do the first one.
This is an issue that's actually been open for a while, but we've been testing it internally at NVIDIA for a little bit. This is a concept that's been added in Kubernetes; I think it was added in 1.21.
A: It focuses on making sure that traffic has a fair chance at getting access to the API server. As part of that change you get some other features too, like some rate-limiting ability, and you can also do other things like protecting the Kubernetes API server and protecting the control plane, which is really important for a lot of reasons: say you have multi-tenancy in your cluster. Lots of use cases. So, anyway.
A: It makes sense for KubeVirt to integrate with this API, because KubeVirt does generate a lot of traffic. We want to make sure that, one, those requests have a good shot at reaching the API server, and in the same vein, we also want to make sure we don't overwhelm the API server. And, I mean, from all the testing we've done, KubeVirt is not really the culprit.
A: When it comes to sending a lot of requests to the API server, it usually comes in some other form, but it also makes sense that we protect ourselves from anyone else that's very noisy. So there's a lot of good reasons to add this.
So, to give some context: I don't have any more data today, but I've done a presentation on this in the past where I go through what the different settings mean.
A: What this does is per user. KubeVirt's just got one service account, and we'll do rate limiting for all of the APIs in the kubevirt group, all the verbs, and the kubevirt namespace. What it's going to do is take the request and put it in the workload-low queue. This is where I found it made the most sense.
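
(A minimal sketch of the kind of FlowSchema being described, for reference. The object shape is the upstream flowcontrol.apiserver.k8s.io API as of Kubernetes 1.21; the metadata name and the service account name here are illustrative assumptions, not necessarily the exact manifest KubeVirt ships.)

apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
kind: FlowSchema
metadata:
  name: kubevirt-control-plane        # hypothetical name
spec:
  # Route matching requests into the built-in workload-low priority level.
  priorityLevelConfiguration:
    name: workload-low
  # Match after the Kubernetes system defaults (lower numbers match first).
  matchingPrecedence: 1000
  # Treat each user (here, the service account) as its own flow.
  distinguisherMethod:
    type: ByUser
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: kubevirt-controller     # hypothetical service account name
        namespace: kubevirt
    resourceRules:
    - apiGroups: ["kubevirt.io"]      # all of the APIs in the kubevirt group
      resources: ["*"]
      verbs: ["*"]                    # all verbs
      namespaces: ["*"]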
A: This is where a lot of others go: anyone who has a service account, or any application, whenever they want to be in a queue, they usually go into workload-low. That's what I saw, so it made sense to enqueue KubeVirt here. I think this is just an easy way to start. We could also create our own, but I thought this was a very simple way to get started.
A: We can always change it, but in terms of what we saw in results, it was very promising. In the cases where we observed high amounts of pressure on the API server from some other application, we would still see that KubeVirt's requests were able to get through, and the number of rejections from the API server was small. So that was really good to see, and it's what we wanted.
A: So I think, in terms of a starting point for API Priority and Fairness, this is something that fits; it makes sense to start with this, and we can always optimize. And this is also per cluster: I really think people will want to edit this over time based on the performance they're expecting in their clusters. So really the way to look at this is: in our default installation of KubeVirt, what would we expect to have, and what would we expect to work well? And I think that's workload-low, because it's a default priority level configuration that gets deployed. And then this last field is the matchingPrecedence. This just means that we're going to be below the Kubernetes defaults; the control plane system precedence, we're just right below it.
A: I think 900 is the last one, so we'll be pretty high up there as an add-on, but we'll be below Kubernetes in terms of the amount of shares that we'll get of the API server.
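
(For reference, workload-low is one of the priority-level configurations Kubernetes deploys by default; it looks roughly like the sketch below. The numbers are the 1.21-era upstream defaults as best I can tell, so treat them as approximate and check a real cluster rather than relying on this.)

apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
kind: PriorityLevelConfiguration
metadata:
  name: workload-low
spec:
  type: Limited
  limited:
    # The "shares" mentioned above: workload-low's slice of the API server's
    # total concurrency, relative to the other priority levels.
    assuredConcurrencyShares: 100
    limitResponse:
      type: Queue
      queuing:
        queues: 128            # flows are shuffle-sharded across these queues
        handSize: 6
        queueLengthLimit: 50   # past this, requests in a queue get rejected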
B: Yeah, it makes sense to have some priority over users and to allow our API to make the calls we need. So do I understand it correctly: we don't get rate limited, we just get more priority over other users, right?
A: So, technically you can get rejected. Your request can get rejected by the API server, but when you integrate like this, in the way that we're doing it, it's not likely. Here, I'll back up and put it this way: if there's no API Priority and Fairness in place, it's just a free-for-all, so whoever gets in first wins.
A: The whole idea is: if there's someone really noisy, we're not going to get access and we're going to get rejected. If we have this queue, what it does is focus on making sure that the person who is really noisy gets rejected more often than the people who are less noisy. But you're still affected, because resources are ultimately finite.
A: So, to your question about being rate limited: we technically can get rate limited, because there can be someone who's really, really noisy. But we'll be more protected, because we have our own flow schema and a priority level config, so there are sort of some guarantees that we'll have a good shot at the API server. Does that answer your question?
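
(A note on checking this in a live cluster: the objects can be listed with "kubectl get flowschemas" and "kubectl get prioritylevelconfigurations", and the API server counts rejections per priority level in its apiserver_flowcontrol_rejected_requests_total metric, which is one way to verify the low rejection counts mentioned earlier.)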
B: Yeah, yeah.

A: All right, well, I'll follow up here on this and we'll go from there. Okay, cool. All right, Lubo, I don't know if you have anything else you want to add, or other topics, but if not, we can push these to the next meeting.