From YouTube: Kubernetes SIG Scheduling Weekly Meeting for 20210923
A: Okay, hi everyone. We have a few topics. The first one is about one-to-one pod-to-node scheduling, and then I thought we could have a quick status update on both the v1beta3 component config and the simplified plugin configuration. I also added another topic for discussion, related to scoring for node resource utilization, so I'll start quickly and then leave a lot of time for that last one. Hopefully it's not too complicated.
A: What I'm proposing, and I'm going to create an issue for this for discussion, is to have a new scheduling feature, a new filter, basically, that says a pod can only schedule on a node if there are only DaemonSet pods on it. So basically we're saying that only one, quote unquote, workload pod can execute on a node.
A: We consider DaemonSet pods as, you know, agents, etc., so we don't care about them; they have to execute anyway. This is typically needed for HPC workloads, where these types of workers usually don't want to set requests. They don't want to calculate exactly how much the pod should have; they basically want the pod to use all the available CPU on the node. On the cloud you have all these types and shapes of VMs, and wherever they execute, they basically want to use the whole machine. So explicitly having a feature where they can say "I want only one, quote unquote, workload pod to execute on the node" is becoming more and more common, as we are seeing more batch and HPC workloads executing on Kubernetes. So that's my thought.
A: I'm going to create an issue to have a discussion around this, around what the exact API should look like. I don't want it to be, for example, an annotation; I don't want to go that route. I want to have a discussion around what the API at the spec level should look like. Maybe a scheduling mode, where you say something like one-to-one modulo DaemonSets. Any thoughts or comments?
B: So is the intention to just assign one single HPC pod, so to speak, to a node, or can you schedule multiple of these special HPC pods to a node?
A: It's exactly one-to-one. Basically, the filter would be really basic: when we examine a node, it would iterate over all the pods on the node. If any of those pods has an owning controller that is not a DaemonSet, we consider the node not eligible and filter it out. So we keep going until we find a node whose pods are all DaemonSet pods, or that has nothing on it, basically.
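(A minimal sketch of what such a filter might look like as a scheduler framework plugin, purely to illustrate the proposal being discussed; the plugin name and the owner-check helper are assumptions, not merged code.)

```go
// Hypothetical Filter plugin: reject any node that already runs a pod
// whose owning controller is not a DaemonSet.
package onetoone

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/kubernetes/pkg/scheduler/framework"
)

type OneToOne struct{}

func (pl *OneToOne) Name() string { return "OneToOne" }

func (pl *OneToOne) Filter(ctx context.Context, _ *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	for _, p := range nodeInfo.Pods {
		if !ownedByDaemonSet(p.Pod) {
			// A non-DaemonSet ("workload") pod already runs here, so the
			// node is not eligible for a pod that wants the whole machine.
			return framework.NewStatus(framework.Unschedulable, "node already runs a non-DaemonSet pod")
		}
	}
	return nil
}

// ownedByDaemonSet reports whether the pod's controller is a DaemonSet.
func ownedByDaemonSet(p *v1.Pod) bool {
	ref := metav1.GetControllerOf(p)
	return ref != nil && ref.Kind == "DaemonSet"
}
```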
B: Right, yeah. Another question: if a node is cordoned, cordoning essentially just means there is a taint on it, right? So how do we deal with the case where a regular pod that has the corresponding toleration gets scheduled on that cordoned node, so that it competes with the dedicated one-to-one mapping HPC pod?
A: If there is a pod on the node that requested the one-to-one mapping, then that node is not going to be a candidate.
A: So there are two cases, right? The first case is that the pod has this one-to-one scheduling mode; then the node is a candidate only if the pods that are already scheduled on it are all DaemonSet pods. The second case is that the incoming pod doesn't request it; then the node is a candidate only if it does not have a pod with this scheduling mode. If there's already a pod running on the node that is requesting one-to-one, then this node is not a candidate for an incoming pod that's not requesting it.
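(Continuing the sketch above, the symmetric check could look roughly like this; wantsOneToOne is a placeholder predicate, since the actual spec-level API is exactly what the proposed issue is meant to settle.)

```go
// nodeIsCandidate applies the two cases described above.
func nodeIsCandidate(incoming *v1.Pod, nodeInfo *framework.NodeInfo) bool {
	if wantsOneToOne(incoming) {
		// Case 1: the incoming pod requests 1:1, so every pod already
		// on the node must be owned by a DaemonSet.
		for _, p := range nodeInfo.Pods {
			if !ownedByDaemonSet(p.Pod) {
				return false
			}
		}
		return true
	}
	// Case 2: the incoming pod does not request 1:1, so the node must
	// not already host a pod that does.
	for _, p := range nodeInfo.Pods {
		if wantsOneToOne(p.Pod) {
			return false
		}
	}
	return true
}

// wantsOneToOne is a stand-in for reading whatever spec-level field
// (e.g. a scheduling mode) the API discussion lands on; it is left
// abstract here because an annotation is explicitly ruled out above.
func wantsOneToOne(p *v1.Pod) bool {
	// Placeholder implementation for the sketch.
	return false
}
```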
B: Okay, so it seems the logic of this kind of filtering needs to mix with some other filtering plugins' logic. It's kind of a combination that has to check a series of rules.
A: Sorry, not resources; it's the same thing here, you have two cases: one where the pod is requesting the one-to-one mapping and one where it's not. And it's going to be way more efficient than pod anti-affinity.
A: I believe it will also work way better for the cluster autoscaler, which does a ton of simulations, and pod affinity is always a thorn in the side of the cluster autoscaler, because it doesn't work well when you try to consider cluster-level status; it's too expensive for them. And I would argue that this is a cleaner, more explicit API for this type of use case.
A: Even if you wanted to use pod anti-affinity, you would need to have the labels on every node and on every pod, right, so they repel each other. Here you just specify that intent of having one-to-one on the pod that wants the one-to-one mapping.
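(For contrast, the anti-affinity workaround alluded to here would mean stamping something like the following on every workload pod; the label key and value are illustrative.)

```go
// workloadAntiAffinity returns roughly the anti-affinity each
// "workload" pod needs today to repel other workload pods; each pod
// must also carry the matching label.
func workloadAntiAffinity() *v1.Affinity {
	return &v1.Affinity{
		PodAntiAffinity: &v1.PodAntiAffinity{
			RequiredDuringSchedulingIgnoredDuringExecution: []v1.PodAffinityTerm{{
				LabelSelector: &metav1.LabelSelector{
					MatchLabels: map[string]string{"workload": "true"}, // illustrative label
				},
				TopologyKey: "kubernetes.io/hostname",
			}},
		},
	}
}

// Assign the result to pod.Spec.Affinity on every workload pod.
```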
A: Okay, I'll create an issue to have more discussion there about the API, etc. I thought it would be a good idea to bring it to this SIG, just as a heads-up, and see if there's appetite for this. I think I've seen enough cases, really, from our customers internally and even outside; they feel like Kubernetes isn't giving them this simple scheduling primitive, which I kind of agree with.
A: Okay, next: a quick status update on the v1beta3 component config?
D: Yeah, last I knew, he was waiting for an API review from, I think, Jordan. So that's the status I know about for the PR introducing v1beta3. I think I'm going to work on some of the KEP and logistical things, you know, opening the PR for the docs, but I can poke him about that PR and see if there are any other review comments that he needs to get to.
A: Okay. Simplified plugin configuration status?
D: The good part is that I think during our KEP process we went through some really thorough review of the approaches, so hopefully that translates to a quicker code review process. Obviously we can't assume that, but it shouldn't take too long, and I'm also going to be doing the PRs for docs and updating the KEP along with that.
A: Okay, and last but not least...
C: Hello, yeah, I can speak to this one. So the original problem here was that when we are doing the scoring of a node, we assume certain requests on the pods if they don't specify any, and this causes a problem of underutilization on small nodes, because of how the existing logic did the calculations.
C: With this estimated request, we could get to more than 100% allocated resources in the scoring, and the existing scoring would give zero to such a node, so that could lead to underutilization.
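(A back-of-the-envelope sketch of the mechanics described here; the per-pod defaults are the scheduler's assumed requests for request-less pods, 100m CPU and 200MB memory if memory serves, and the node size is made up for illustration.)

```go
package main

import "fmt"

func main() {
	const (
		assumedMilliCPU = 100               // assumed request per request-less pod, millicores
		assumedMemory   = 200 * 1024 * 1024 // assumed request per request-less pod, bytes
	)
	// A small node, 2 cores and 4GB, running 25 request-less pods:
	pods := 25
	cpuFraction := float64(pods*assumedMilliCPU) / 2000.0
	memFraction := float64(pods*assumedMemory) / float64(4*1024*1024*1024)
	fmt.Printf("cpu=%.2f mem=%.2f\n", cpuFraction, memFraction)
	// Prints cpu=1.25 mem=1.22: over 100% "allocated", which the
	// pre-fix scoring turned into a score of zero for the node.
}
```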
C: That's what this PR was fixing. Now, there is the opposite problem that this PR brings, in the case where you want spreading. If you have the same setup, no requests, you might hit 100% utilization for both CPU and memory, and then you basically have a perfect balance, or the scheduler thinks it's a perfect balance, so it starts giving a 100 score to that node.
C: So what happens is that this node starts being overutilized, in the case where you don't want that. So there are these two competing problems. This PR was merged first in 1.22 and then backported to all the supported versions, and it brings this breakage of existing clusters, or existing tests.
C: On the other hand, Dave sent another PR, let me put it in the notes, also in 1.22.
C: In this PR (I don't know, could you open it, Abdullah? It's in the notes now), Dave changed the scoring of balanced resource allocation from an absolute difference to a standard deviation. With the absolute difference, when you have perfect balance, you get a 100 score; with the standard deviation, you would get 50. So in 1.22 this problem should be diminished.
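(A generic sketch of the two scoring shapes being contrasted, not the exact upstream code. For two resources, the population standard deviation of the usage fractions is |f1 - f2| / 2, i.e. half the absolute difference; the merged PR's exact normalization, including the score quoted here for the degenerate all-at-100% case, should be checked against the PR itself.)

```go
import "math"

// Old shape: penalize the absolute difference between the CPU and
// memory usage fractions.
func absDiffScore(cpuFraction, memFraction float64) float64 {
	return (1 - math.Abs(cpuFraction-memFraction)) * 100
}

// New shape: penalize the standard deviation of the usage fractions
// (generalizes to more than two resources).
func stdDevScore(fractions ...float64) float64 {
	mean := 0.0
	for _, f := range fractions {
		mean += f
	}
	mean /= float64(len(fractions))
	variance := 0.0
	for _, f := range fractions {
		variance += (f - mean) * (f - mean)
	}
	return (1 - math.Sqrt(variance/float64(len(fractions)))) * 100
}
```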
C: With this PR we would get a score of 50, so the problem should be diminished, but on all the previous versions, maybe we should revert the backport, to avoid breaking existing users. That would be 1.21 and 1.20, since 1.19 is already closed; those two would be the ones where we backport a revert.
D: Hi, hello. So this was first brought to me by, I guess, the perf and scale team here at OpenShift, and when we were trying to figure out where exactly the new behavior started to come from, we found that first PR, your PR there, and so we're reverting that in a 1.21.4 base. That fixed the problem.
D: But as of the last I know, we're still seeing this issue in 1.22, so I don't know if this did address it or not, like you were saying. We're still trying to gather some information on that, and I think some of the people we were talking to are going to open a GitHub issue to track it.
D: I definitely think that at least the older-version backports could get reverted to maintain consistent behavior, like you were saying, but we should probably look a little more into what could be causing this, or whether we can reproduce it in a vanilla Kubernetes install, to totally eliminate any concern that it could be OpenShift causing it.
D: Yeah, so we actually first noticed it through a 1.22 version.
D: I don't know exactly the cluster state that they're running it on, and I don't know the size of the nodes. I know that it's being run in something like a 25-node, couple-thousand-pod setup; I don't know the exact size of those nodes or their testing setup, but I've asked them to try to write down the reproduction test steps as much as they can.
C: I see. Yeah, I thought it would be much better in 1.22.
C: Having no requests is problematic in general, yeah. I don't know.
C: The balanced allocation score uses the difference between the usage fractions for CPU and memory.
C: When you don't have requests, we estimate, we guess, what the request of a pod could be.
C: And with this estimation, because it's a constant, if the node is small, we can get over 100% utilization. That causes the balance score to basically conclude that the node is balanced, because everything is at 100% utilization, so that's a perfect balance.
C: That is causing overutilization of nodes. In the previous behavior we would give a score of zero, which would cause underutilization of the node. So both are problematic.
A: But what is it, Mike, what are you observing exactly?
D: What we started to see from this is that up to a certain point, all the nodes are filled pretty evenly, and then, at a pretty consistent level across tests, around 144 pods per node, it would suddenly start to just prefer one node and dump the next 100 or so pods onto that node, and then I think it would go back to spreading between them. That behavior was the odd distribution we were seeing.
A: So if we revert the PR, what would be the behavior? It would basically not consider the node balanced in terms of CPU and memory utilization, even though the pods are not requesting CPU nor memory, right? Yeah, it would give zero.
D: Right, but, you know, even if we get all of the pods in our cluster to set requests, it's a pretty big ask to push that onto users and tell them that now they need to set requests on all of their pods, or else they could end up seeing this, especially with a large cluster.
C: So another solution could be to lower our assumption. We have these constants, right, that give a request to a pod, which we use for scoring. If we reduce those, then we would hit this with less frequency.
C: I don't know if that's something we could consider.
C: But for the record, I think it's better to just revert the change for 1.21 and 1.20. I think we need a better solution for 1.22 and 1.23, though.
C: Mike, could you craft the reverts? And if you can open the issue, we can start brainstorming some ideas.
D: Yeah, I'll open up the reverts, and, since that team has more test information than I do, I'm going to push the people we were talking to to open up the bug with all of the information they have, and then we can get the discussion going offline.