From YouTube: API Priority and Fairness in kcp. Discussion 2 of ?.
Description
Discussion of approaches to including the API Priority and Fairness feature in https://github.com/kcp-dev/kcp .
A
I hope... all right. Anyway, so, you know, basically this relationship. Basically, you know, I think the most serious problem...
A
Well, I mean, we have concerns, right? We're forced to trade off between either having a priority level be specific to a workspace, in which case we potentially have lots of priority levels to deal with, or having a priority level span workspaces, which destroys some of the insulation between workspaces.
A
You know, that's one big trade-off, and another one is: if we want configuration specific to a workspace, that requires objects that are specific to a workspace. If we're willing to have the behavior be specific to a workspace, we can use the approach of having a default set of objects that are not stored, and then having additions and overrides stored. That kind of mitigates it. So, anyway.
B
Does it make sense to do option one, which is priority levels per workspace, with the PriorityLevelConfiguration and FlowSchema objects per workspace? If that makes sense, and I'm going to defer to you in particular, Mike: if that makes sense, I think we can find a way to make it happen.
A
So it makes sense; it's coherent, in at least the sense that it's something you can define. It has that concern that, you know, we're talking about: the problem.
A
But that's not the concurrency limit that's enforced. There's a dynamically adjusted concurrency limit that gets enforced, and the way it works is: every 10 seconds...
A
There's an adjustment that takes into account the recent... I tend to say "pressure," but I think in the KEP doc it's called seat demand. It's really looking at the number of seats. Seat demand, specifically, is defined to be the number of seats that are occupied, plus the number of seats of the queued requests. And there's a subtlety in APF, in that a request has two different numbers, two different widths: there's an initial width and a final width, width being the number of seats. And so for this seat demand...
A
We use the larger of the two, because that's actually how dispatching works. Dispatching also uses the larger of the two because, unfortunately, when there's a difference, the larger width comes second, but you can't interrupt a request once you've dispatched it, so you have to dispatch conservatively.
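The seat accounting described above can be sketched in a few lines. This is an illustrative approximation of the described behavior, not the actual Kubernetes apiserver code, and the function names are invented:

```python
# Illustrative sketch of APF seat accounting (invented names, not the real
# apiserver code). A request has an initial and a final width; both seat
# demand and dispatching use the larger of the two, because the wider phase
# may come second and a dispatched request cannot be interrupted.
def request_seats(initial_width, final_width):
    return max(initial_width, final_width)

def seat_demand(executing, queued):
    # Seat demand = seats occupied by executing requests plus the seats
    # of all queued requests. Each request is an (initial, final) pair.
    occupied = sum(request_seats(i, f) for i, f in executing)
    waiting = sum(request_seats(i, f) for i, f in queued)
    return occupied + waiting
```

For example, one executing request with widths (1, 10) and one queued request with widths (1, 1) give a seat demand of 11.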
A
This technique keeps track of the average and standard deviation and the high-water mark of the seat demand. And then at each... oh, and also, each priority level is configured with a limit on how many seats it can lend and how many seats it can borrow, and these are expressed as fractions of its nominal concurrency. But anyway, what the adjustment does is... it's got a little bit of logic that's not quite trivial; you can see it in the KEP, but basically, what it says:
A
The first thing it says is: any priority level that has lent seats, for which demand has shown up in the last adjustment period, will immediately reclaim those seats. And then beyond that, there's kind of a target that's defined as... well, it just gets smoothed. There's exponential smoothing, and the input to the exponential smoothing, each period, is the average plus the standard deviation; that goes into the exponential smoothing.
A
But then, if the high-water mark ever jumps up, that immediately jumps the estimate up. So anyway, the net result is that a level that has lent seats out of its nominal allocation will immediately reclaim them at the next adjustment, and beyond that, there's kind of an attempt to share the pain, or the wealth, relative to the demand that's been seen. I don't know if that's at all clear or understandable.
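The adjustment just described could be sketched roughly as below. This is a simplification under the stated description (exponential smoothing of mean plus standard deviation per period, with the high-water mark jumping the estimate up immediately); the smoothing factor and names are invented, and the real logic in the Kubernetes code has more to it:

```python
import statistics

def update_demand_estimate(prev_estimate, samples, alpha=0.2):
    # samples: seat-demand observations from the last adjustment period.
    # The smoothing input each period is the average plus the standard
    # deviation of the observed seat demand.
    mean = statistics.fmean(samples)
    stdev = statistics.pstdev(samples)
    smoothed = alpha * (mean + stdev) + (1 - alpha) * prev_estimate
    # If the high-water mark exceeds the smoothed value, the estimate
    # jumps up to it immediately.
    return max(smoothed, max(samples))
```

With a previous estimate of 10 and steady samples of 4, the estimate decays gradually (to 8.8 with alpha = 0.2) rather than dropping straight to 4, which is why it never reaches zero; a spike in the samples raises it at once.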
A
So, to be clear, regarding fractions: the way it worked, even before borrowing, is that we take the server's concurrency limit, and each priority level has a number of shares, right, which gives a fraction of the total shares. So you take that fraction of the total concurrency limit, which is in general not an integer, and round it up, and that gives the concurrency limit that gets enforced for that priority level.
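That share arithmetic can be sketched as follows; this is an approximation of the described behavior with invented names, not the actual code:

```python
import math

def concurrency_limits(server_limit, shares_by_level):
    # Each priority level gets its fraction of the total shares times the
    # server's concurrency limit, rounded up, so a level with nonzero
    # shares never ends up with zero seats.
    total_shares = sum(shares_by_level.values())
    return {
        name: math.ceil(server_limit * shares / total_shares)
        for name, shares in shares_by_level.items()
    }
```

For instance, with a server limit of 600 and shares of 30/40/50, the enforced limits are 150, 200, and 250; with a limit of 10 split three ways, each level still rounds up to 4.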
A
So if we were concerned with a large number of workspaces and did this same kind of thing for workspaces... yeah, you wouldn't get to zero unless the fraction was actually zero, which you would basically never do, because the exponential smoothing would never go to zero.
C
I feel like that would not be good, because there are kind of two audiences for P&F in kcp: the first being the kcp admin, the second being an individual in a cluster. And I'm not sure that letting users configure their own priority classes, and, like, lock people out of sharing, would suffice for the admin case, right?
C
And I guess, to maybe put some context there as well: when you're using etcd as the backing store, you have some really rough concurrent-mutation-in-flight-limit-type things before stuff goes to really unpleasant places. And so one of the thoughts very early on was: we kind of know the scaling envelope that will make this reasonable, and it does look like quite a lot fewer mutating requests as compared to a normal kube cluster.
A
You being an owner of a workspace... sorry. So if we were to do borrowing between workspaces, then clearly it's a dynamic value. And let me just add another FYI. I'm not quite sure what my final opinion is, I'm still processing it, but just as an FYI: upstream, we do...
A
I was just told earlier this week about someone running into trouble with APF, and the scenario is: he didn't really explain why, but there was one client that was sending requests that always time out, and these were, like, big list requests. I think he was saying that the client is slow to read the response, so each one of these requests occupies 10 seats, because that's the maximum width (that's a limit in the code), and it takes a minute.
A
You know, it holds those 10 seats for a minute, and in the priority level where this was happening, the hand size was six, so that means this client could really occupy 60 seats at once. It was more or less ready to occupy 60 seats all the time, because on each of the queues that this client is being dealt onto, every request holds onto 10 seats for a minute, and as soon as that timeout happens, there's another request waiting to take its place.
A
Other requests could get in occasionally, but the concurrency limit applied to that priority level was less than 60; or maybe it was 60, but it wasn't greater than 60. So the other clients would have to wait, basically, for six minutes, I guess. Well, actually, he was saying that the concurrency limit was 10 or less, so really the other clients would have to wait for 10 requests from this, you know, bad client before another client would get served.
A
So, you know, we talked about kind of guidance for configuration, and really the problem condition here is when the concurrency limit is less than or equal to the hand size times the maximum width. Then one bad client can cause a lot of latency for other clients.
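As a sketch of that rule of thumb (the function name is invented): a misbehaving client dealt onto hand-size queues, each of whose requests pins the maximum width in seats, can hold hand_size times max_width seats, so a level is exposed when its concurrency limit does not exceed that product.

```python
def exposed_to_one_bad_client(hand_size, max_width, concurrency_limit):
    # One bad client can pin up to hand_size * max_width seats; if the
    # level's concurrency limit does not exceed that, the client can
    # starve the other clients at that priority level.
    return concurrency_limit <= hand_size * max_width
```

In the incident described, a hand size of 6 and a width of 10 give 60 pinned seats, so any limit of 60 or less at that level is exposed.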
A
Unless and until, you know, I can think of something better. And potentially we can, because this is kind of a special case that could maybe be recognized and dealt with in other ways. But if we take that as guidance, I think it really shoots down the approach that says we're going to have really low concurrency limits for a workspace.
C
So, say, with the max-requests-in-flight limit and a thousand workspaces, each with five priority classes: each priority class in each workspace gets two-fifths of a seat, right? Does that trigger the behavior you're talking about?
A
Well, I guess, as I was starting to explain earlier, there's a ceiling that gets applied. The logic for deciding on the concurrency limits has a ceiling in it, so you never get a fraction of a seat; you always get one, or two, or three. Unless, you know, your shares, or the total limit, were actually zero, and then it could be zero.
A
But my point here is that one isn't enough, okay? Even if you only had a hand size of one, the maximum width is fixed in the code at 10, so you would really need 10 seats in order to not have this problem, right?
A
However, you know, I do want to say, still in defense of this borrowing idea: we could potentially have a different kind of borrowing between workspaces. As I was saying, for borrowing between priority levels we use this exponentially smoothed target, or demand figure, and since it's exponential smoothing, it'll never go to zero. For workspaces, we could tweak that so it does go to zero, because, obviously, we must be expecting...
A
If you've got a thousand workspaces, you must be expecting that most of them are completely idle. So we could adjust the borrowing so that it gives a zero allocation to those workspaces, and what that would mean is that a request that arrives while the allocation is zero sits in a queue until the next adjustment period, at which point that workspace would get some multiple of ten seats, so the request can actually succeed and get dispatched.
A
Now, I guess I should also point out: it's also possible, again, since today, with APF turned off, we're just betting, hoping, that there isn't a bunch of requests that actually show up at the same time. We could continue that, and just say, yeah, okay, the minimum allocation per workspace is not going to be zero, and it will add up to a lot, but again, we continue to expect that most of it will be unused. I'm sorry.
C
I guess the other thought is... so, just back to the straw man of invisible über-APF and visible internal APF. In that duality, if it looks to the user, in their logical cluster, like they are virtually being given all of the mutating, or all of the max-in-flight, requests, and the number of seats is calculated as if they were going to get all of it, none of the math is broken.
C
Two entirely disjoint systems of APF: one at the shard level, and then, once a request goes through... I guess this would probably be a pretty invasive server change, but once the request goes through the first layer at the shard level, it can then be handled by APF in the logical cluster. That allows users to, you know, give relative priority to their own workloads, but never exceed what the system gives their workspace.
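The two-layer idea could be sketched as two nested admission gates. This is purely illustrative: the classes and names are invented, and real APF does queuing and fairness rather than simple counting, but it shows why the inner layer can reorder a workspace's traffic without exceeding the outer grant:

```python
class Gate:
    # A trivial stand-in for one APF layer's concurrency control.
    def __init__(self, limit):
        self.limit = limit
        self.in_flight = 0

    def try_admit(self):
        if self.in_flight < self.limit:
            self.in_flight += 1
            return True
        return False

    def release(self):
        self.in_flight -= 1

def admit(shard_gate, workspace_gate):
    # A request must pass the shard-level gate first, then the
    # per-logical-cluster gate, so configuration inside a workspace can
    # never admit more than what the shard grants that workspace.
    if not shard_gate.try_admit():
        return False
    if not workspace_gate.try_admit():
        shard_gate.release()
        return False
    return True
```

However the workspace configures its inner gate, once the shard gate is saturated, no further requests from that workspace are admitted.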
C
I mean, yeah. I guess I'm trying to think about... like, you know, one ancillary design goal here is minimal changes to APF, right?
A
What it's saying is that, you know, where Jamie's got this delegator/delegatee thing, right? That would still work for the second, inner layer.
A
Right, and for the outer layer, again, you would have one shared delegatee, because it's really the controller, right, that consumes the config and manages requests. So you could, because of the existing structure... the thing that handles requests, the controller logic, is really modular, right? It's just got enough hooks to be able to delay or reject a request and then release it for processing, so it could just as well release it to the second layer of APF as to a following handler, yeah.
A
So, actually, right, this concern about users being able to destroy the protections is a new one. I need to add that into this document. And let me pursue another thing; you know, I just wanted to understand.
A
You know, the concern in the other direction, right, is that, without giving users control, they kind of don't have a lot of ability to customize, right? The administrator of the kcp server really has to configure it in a way that's going to work for all the workspaces.
C
It seems like you'd basically end up treating every workspace, every user workspace, the same, and, because workspaces are cheap, it's easy enough to stamp them out. But I mean, from my understanding of Jamie's work, it's well positioned to solve both problems, and I don't know that solving the global one first and then the user one second is, like, another year of effort, right? It seems to me like that should...
A
I think you're totally right; he's already got the second layer. I mean, it's working now. The only thing that's not there is the config-producing controller, right. And, you know, if we go with the two layers... let me think about this now. Yeah, I'm not sure I've really thought through the two-layered approach.
A
Well, yeah, so I'm still trying to think through the two-layer approach. In the two-layer approach, in the outer APF: remember, APF itself has kind of two layers, right? There's this isolation into a collection of priority levels, and then fairness within each priority level. In the outer APF, is there any meaning to a collection of priority levels, or would we only be doing fairness between workspaces?
C
If we can detect, like, loopback traffic from the virtual API server, stuff like that, I think that could be a privileged priority class.
A
So, something we haven't talked about is that, in APF, it starts with configurable classification, and actually some priority levels don't queue or reject; they just pass things through without any further control.
B
Yeah, so that's basically what we want for all the system-level things. I mean, not necessarily all of them, but the majority of our controllers, the loopback clients, and whatnot. If we need to divide them into categories, we can, but those need to take priority, generally, over any other client, yeah.
A
And in the default configuration for APF, the admin clients get, you know, classified... their requests get classified to a priority level that passes things through without control; it's called "exempt".
C
There's also a future, I guess, where, like, in a production deployment of kcp, you provide users, you know, different payment tiers or whatever, and that allows their workspaces to have more or less priority.
A
Okay, but remember, "priority" is actually only an aspirational word. We don't actually have priority; we have these levels, but it really is not priority. Okay, there was originally some thought that we could have a sense of priority, but we don't, and even with borrowing, that's not priority either.
C
I guess I'm saying yes, because, first, you know, I could consider, like, two canonical users of kcp that I think we've had in mind. One is, like, the service provider, right: so I'm exporting certificates and I'm running cert-manager in my workspace. That workspace is going to have a lot of traffic, and that traffic is critical to the functioning of, like, their service, and that could be a higher priority class than some random user installing something, right.
A
Except, again, we don't have "higher priority". We just have different pools, which can be given more or less concurrency. Okay.
A
You added an affirmative there. There is a sense of a kind of use case for grouping workspaces into different concurrency pools, and then, within each concurrency pool, we would just do fairness amongst the workspaces.