From YouTube: Kubernetes SIG Scheduling Weekly Meeting for 20230126
Description
In this meeting, we brainstormed the idea of preferred topology affinity.
B
So I've been thinking about, you know, scoring of resources from a kind of topology point of view, for groups of nodes.
B
Let's say a rack, for instance: doing scheduling while looking at the resources of the whole rack, and letting that have an impact on the scheduling decision. I've actually already run some experiments related to this, and I based the experiment on the NodeResourcesFit plugin, which already beautifully calculates resource scores for individual nodes. On top of that I just, you know, put a little bit of hacky code into the NormalizeScore function and did a little bit of calculating in there for topology key-value pairs shared among nodes, and then used that as a sort of adjustment for the node score.
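The adjustment described here could look roughly like the following. This is only an illustrative sketch in Python, not the speaker's actual scheduler code: the function name, the `weight` parameter, the label key `example.com/rack`, and the choice of blending with the group mean are all assumptions.

```python
from collections import defaultdict


def adjust_scores_by_topology(node_scores, node_labels, topology_key, weight=0.5):
    """Blend each node's score with the mean score of its topology group.

    node_scores: {node_name: score in 0..100}, e.g. per-node fit scores.
    node_labels: {node_name: {label_key: label_value}}.
    Nodes that lack the topology label keep their original score.
    """
    # Collect the per-node scores of every topology value (e.g. every rack).
    groups = defaultdict(list)
    for node, score in node_scores.items():
        value = node_labels.get(node, {}).get(topology_key)
        if value is not None:
            groups[value].append(score)
    group_mean = {value: sum(s) / len(s) for value, s in groups.items()}

    adjusted = {}
    for node, score in node_scores.items():
        value = node_labels.get(node, {}).get(topology_key)
        if value is None:
            adjusted[node] = score
        else:
            adjusted[node] = round((1 - weight) * score + weight * group_mean[value])
    return adjusted


# Hypothetical input: n1 is the best single node, but its rack is busier overall.
scores = {"n1": 80, "n2": 20, "n3": 70, "n4": 70}
labels = {
    "n1": {"example.com/rack": "r1"},
    "n2": {"example.com/rack": "r1"},
    "n3": {"example.com/rack": "r2"},
    "n4": {"example.com/rack": "r2"},
}
adjusted = adjust_scores_by_topology(scores, labels, "example.com/rack")
```

With these made-up numbers, n1 drops from 80 to 65 because its rack averages 50, while the nodes in the evenly loaded rack keep 70, so the rack-level view changes which node wins.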
B
B
So at least I couldn't figure out anything else. And the use case is, you know, a little bit connected to these pod groups. Suppose you have a rack which has very fast interconnects between GPUs, but those interconnects don't span to other racks. So you basically want a bunch of pods to go and be scheduled into some rack. You don't want to say which rack it is, just some rack, as long as it has resources, and inter-pod affinity allows for basically doing that.
B
A
So, first of all, let me clarify the requirement. If I understand correctly, you want to generally say: I don't want this particular kind of resource to be used by regular workloads that may not need it, so that those expensive resources can be left for the workloads which really need them. Is that the requirement?
B
You know, there are many ways of configuring that. You can do this with, like, your least-allocated node and so forth, so I would want to have some sort of topology angle to it. So I'd have, like, a least-used rack: prefer the best node in the least-used rack. That's basically the goal. And what constitutes the resource, well, that's configurable at the moment and I wouldn't propose changing that. No, I would prefer that to be configurable.
B
A
B
A
C
B
Exactly. So if you have this kind of all-or-nothing, coscheduling kind of thing, this is orthogonal to that. You know, if you just use the current coscheduling plugin, for instance, it just tells you: can we deploy these pods to the cluster, the whole cluster? But would inter-pod affinity help there? I mean, it's basically a preferred mechanism.
C
So, I guess, let's put it another way: all-or-nothing scheduling is kind of the equivalent of a filter. It says yes or no, I can fit in this topology, but it cannot say which topology is preferred.
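The filter-versus-score distinction can be sketched as follows. The functions, rack names, and free-GPU numbers are hypothetical and only illustrate the difference between a yes/no answer and a ranking.

```python
def fits(free_by_topology, topology, demand):
    """Filter-style check: a yes/no answer for a single topology."""
    return free_by_topology[topology] >= demand


def rank(free_by_topology, demand):
    """Score-style answer: feasible topologies, most free capacity first."""
    feasible = [t for t in free_by_topology if fits(free_by_topology, t, demand)]
    return sorted(feasible, key=lambda t: free_by_topology[t], reverse=True)


# Hypothetical free GPU counts per rack.
free_gpus = {"rack-a": 8, "rack-b": 2, "rack-c": 16}
can_use_a = fits(free_gpus, "rack-a", 4)
can_use_b = fits(free_gpus, "rack-b", 4)
preferred = rank(free_gpus, 4)
```

The filter can only say that rack-a and rack-c are acceptable; the ranking is what expresses that one of them is preferred.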
A
A
I think it's technically possible, but, you know, the challenge here is that, basically, we are treating an arbitrary set of nodes as an entity, and we just treat that entity as the minimal unit when we make a scheduling decision. That is the purpose, right? So it's similar to what we were talking about before, where we wanted to see whether we can treat the physical node, like the bare-metal machine, when VMs are running on top of it, and take that into account in the scheduling decision.
B
A
The challenge here, yeah, the challenge here is that a topology is nothing but a label on a node. So that means you can have as many combinations as you like; it's like foo equals bar. Any label can form a topology, instead of, you know, a physical unit. With nodes, you have 5,000 and that's just five thousand, but for topology key combinations
A
It's arbitrary; it can be a lot more than that. You may spend a lot of resources computing during the scheduling cycle. So that's the challenge I see. But technically I think you can explore using a plugin first. And I know that in production environments there are a lot of labels, so that means a lot of topologies.
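To make the cost concern concrete, here is a small sketch that treats every label key-value pair as a potential topology domain. The label names and node data are made up for illustration.

```python
from collections import defaultdict


def topology_domains(node_labels):
    """Map every (label key, label value) pair to the set of nodes carrying it."""
    domains = defaultdict(set)
    for node, labels in node_labels.items():
        for key, value in labels.items():
            domains[(key, value)].add(node)
    return dict(domains)


# Hypothetical cluster: three nodes, three label keys each.
nodes = {
    "n1": {"rack": "r1", "zone": "z1", "team": "ml"},
    "n2": {"rack": "r1", "zone": "z1", "team": "web"},
    "n3": {"rack": "r2", "zone": "z1", "team": "ml"},
}
domains = topology_domains(nodes)
domain_count = len(domains)
```

Even this tiny example yields five candidate domains for three nodes, which is why scoring over arbitrary labels could get expensive unless the set of keys is restricted.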
A
A
Yeah, basically the input to scheduling is that you have to tell the scheduler what kind of rules you want to use to define the logical scheduling unit, right? So in the scheduling plugin's computation cycle it can just calculate this kind of information, instead of just treating, you know, the node as the minimal scheduling unit. Yeah, I think technically it's possible. But to be honest, I don't see a popular, large kind of demand for this. Others?
A
Do you see any similar requirements? If not, I would say maybe start with an experimental scheduler plugin, and if that works well and we see common requirements from other companies or other users, we can see how to proceed.
C
C
So, let's say just a pod comes in, the first pod comes in. Even if we have information about the topology in the resource scoring, sure, you could score the topologies based on the incoming pod, but that's not necessarily the best location, because we actually should have considered all the pods from this group.
B
Yeah, ideally you would want to consider the whole group; however, going for the, let's say, least-allocated topology, yeah.
A
B
A
A
B
So suppose you don't have a group of pods, but you have a single pod and you want to do this bin packing. So why not, you know, do the scoring for most-allocated and run the same logic? It's just a number. I mean, scoring is just a number, so it would end up in the most-allocated rack where it fits.
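The "same logic, different strategy" idea might be sketched like this. The strategy names mirror the least-allocated and most-allocated strategies mentioned in the discussion; applying them to a whole rack, along with the function names and the numbers, is an assumption made for illustration.

```python
def rack_scores(racks, strategy="LeastAllocated"):
    """Score each rack 0..100 from its aggregate utilization.

    racks: {rack_name: (requested, allocatable)} for a single resource.
    """
    scores = {}
    for name, (requested, allocatable) in racks.items():
        utilization = requested / allocatable
        if strategy == "MostAllocated":
            scores[name] = round(100 * utilization)  # fuller racks score higher
        else:
            scores[name] = round(100 * (1 - utilization))  # emptier racks score higher
    return scores


def pick_rack(racks, demand, strategy):
    """Return the best-scoring rack that still has room for the demand."""
    scores = rack_scores(racks, strategy)
    feasible = [r for r, (req, alloc) in racks.items() if alloc - req >= demand]
    return max(feasible, key=lambda r: scores[r]) if feasible else None


# Hypothetical racks: (GPUs requested, GPUs allocatable).
racks = {"r1": (12, 16), "r2": (4, 16), "r3": (16, 16)}
bin_packed = pick_rack(racks, 2, "MostAllocated")
spread = pick_rack(racks, 2, "LeastAllocated")
```

Here the full rack r3 is skipped because the pod does not fit, so bin packing lands on r1 (the most-allocated rack with room) and spreading lands on r2.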
C
Yeah, you know, but the problem is that we make the decision for each pod. So if, let's say, it's a big job with multiple pods, then the first pod might choose a particular rack, which then gets full, and then the entire job doesn't fit. I don't think the coscheduling plugin would be able to retry in a different rack.
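The failure mode described here can be reproduced with a toy simulation. It assumes every pod of the job must land in the same rack and that each pod needs one slot; both functions and the slot counts are hypothetical.

```python
def place_job_per_pod(free_slots, pods):
    """Place pods one at a time, deciding the rack when the first pod arrives.

    The first pod picks the fullest rack that has room for it; later pods
    must follow it. Returns the rack, or None if the job gets stuck.
    """
    free = dict(free_slots)
    chosen = None
    for _ in range(pods):
        if chosen is None:
            candidates = [r for r, f in free.items() if f >= 1]
            if not candidates:
                return None
            chosen = min(candidates, key=lambda r: free[r])  # most-allocated rack
        if free[chosen] < 1:
            return None  # the rack filled up mid-job and there is no retry
        free[chosen] -= 1
    return chosen


def place_job_as_group(free_slots, pods):
    """Consider the whole group: the fullest rack that fits all pods."""
    candidates = [r for r, f in free_slots.items() if f >= pods]
    return min(candidates, key=lambda r: free_slots[r]) if candidates else None


# Hypothetical free slots per rack, and a job of three pods.
free = {"r1": 2, "r2": 4}
per_pod_result = place_job_per_pod(free, 3)
group_result = place_job_as_group(free, 3)
```

Deciding per pod sends the first pod to r1, which runs out of room on the third pod, whereas looking at the whole group picks r2 from the start.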
A
B
Yeah, and I was kind of thinking of building on top of the existing NodeResourcesFit plugin, because it already has all this beautiful math and these strategies to do, like, least-allocated and most-allocated and so forth. I wouldn't want to recalculate the resources; I would want to reuse the calculations of the existing plugin.
B
C
B
I kind of disagree, in the sense that if you somehow have multiple different kinds of, let's say, AI workloads, and for some reason some of them happen to require just a single pod, you could, you know, push those into the cluster, into those topologies where the topology is not entirely, you know, unused.
B
A
So yeah, I agree. To me it doesn't seem like this is related to, like you said, a job; you can totally fit this to a single pod. It's just like what we do for preference to a node: we just switch to another angle and do preference to a rack, to an arbitrary topology. That can be fine.
C
I think, yeah, if you still see value for single pods, I guess it makes sense to start with a plugin. Yeah, I would start with a different plugin just so that you can get some feedback, or just iterate first before going upstream. I have one more thought.
B
Okay, well, I shouldn't have problems, you know, writing an entire plugin out of it instead of just piggybacking on top of the existing NodeResourcesFit. Should be doable. I should be able to show something. I'm not going to give you any date or anything like that, but yeah.
C
So, yes, one more thing. Wei was talking about how topologies are just labels. So the way you should probably start is by having this as a plugin configuration where you specifically say either just one topology key or a list of topology keys that the plugin will be paying attention to, set at scheduler startup time, or, like, before you start the scheduler.
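One way to picture such a configuration is a small, validated arguments object that accepts either a single key or a list. Everything here is hypothetical: the class name, the `topologyKeys` field, and the label key are invented for illustration and do not correspond to an existing plugin.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopologyScoringArgs:
    """Hypothetical plugin arguments, fixed when the scheduler starts."""

    topology_keys: tuple = ()

    def __post_init__(self):
        if not self.topology_keys:
            raise ValueError("at least one topology key is required")
        if len(set(self.topology_keys)) != len(self.topology_keys):
            raise ValueError("topology keys must be unique")


def parse_args(raw):
    """Accept either a single key (string) or a list of keys."""
    keys = raw.get("topologyKeys", [])
    if isinstance(keys, str):
        keys = [keys]
    return TopologyScoringArgs(topology_keys=tuple(keys))


single = parse_args({"topologyKeys": "example.com/rack"})
several = parse_args({"topologyKeys": ["example.com/rack", "example.com/row"]})
```

Restricting the plugin to an explicit list of keys also bounds how many topology groups it has to compute per scheduling cycle.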
C
C
I don't remember the name, but they are referred to as zones and regions, which, of course, in a cloud environment refer to zones and regions in the particular cloud provider. But a lot of users that run on-prem, on their own servers, already choose to map racks into zones. That's basically what some people do: they just treat a zone as a rack. Okay.
C
B
B
Yeah, I was initially thinking about just using the scheduler configs. I see some of the plugins actually utilize the information in the pods, like this inter-pod affinity thing, but I wasn't initially planning on touching the pod specs. That would be complicated.
C
Yes, you'd need API changes. That's a long process. But also, not every cluster needs support for topology scoring, I guess.
C
B
And, you know, you can adjust things using profiles in the scheduler configs. So a lot of things can be done quite dynamically. Even with those it's sort of dynamic; you don't have to have it in the pods always.