From YouTube: Kubernetes SIG Scheduling Meeting - 2019-09-12
A
A
B
Sure. So it's been some time since I've talked about this, so here is a short overview of what we intend to do here. Today we cannot resize a pod, as in change the resources field in the pod spec, without restarting the pod; it's disallowed by the validation check in most cases. The intent of this KEP is to make that field mutable, and then have a mechanism in core Kubernetes to handle the change and drive towards the new resources that the user specifies as the desired state.
B
A little history: our first design kept the scheduler in the loop, where the scheduler would first look at the resize request and approve or reject it based on its view of the node capacity, and then the kubelet would act on what the scheduler approved. SIG Node felt this was overly complicated to manage, and Derek felt that once the pod is bound to a node, the resizing should purely be a conversation between the API server and the kubelet; the scheduler can potentially help by preempting lower-priority pods.
B
But that is something we plan to take up in the future; it's not part of this KEP at this time. So once the pod is bound and a resize is requested, the kubelet looks at it and decides whether it can accept it or not, based on the available capacity in the node. If it does, it indicates that by updating a new field that's been added to the pod spec, called resourcesAllocated, which is the requests part of the resources, and says: okay, yes, I have room.
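The acceptance step described above can be sketched roughly as follows. This is a minimal illustration, not the real kubelet code: the `resourcesAllocated` field is from the KEP, but all type, field, and function names here are hypothetical simplifications (CPU millicores only).

```go
package main

import "fmt"

// PodResources is a simplified stand-in for the relevant pod spec fields:
// Requested mirrors resources.requests (the user's desired state) and
// Allocated mirrors the new resourcesAllocated field (what the kubelet
// has agreed to). Values are CPU millicores for brevity.
type PodResources struct {
	Requested int64
	Allocated int64
}

// acceptResize is the kubelet-side decision sketched in the discussion:
// if the node has room for the delta, record the new value in Allocated
// ("okay, yes, I have room"); otherwise leave Allocated unchanged.
func acceptResize(p *PodResources, nodeFreeMilli int64) bool {
	delta := p.Requested - p.Allocated
	if delta > nodeFreeMilli {
		return false // not enough room; the resize stays pending
	}
	p.Allocated = p.Requested
	return true
}

func main() {
	p := PodResources{Requested: 1500, Allocated: 1000} // user asked to grow by 500m
	fmt.Println(acceptResize(&p, 400), p.Allocated)     // false 1000
	fmt.Println(acceptResize(&p, 600), p.Allocated)     // true 1500
}
```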
B
Essentially, pod updates are taken in, and the kubelet updates the resourcesAllocated section when a resize has been accepted. The change to the scheduler here would be to start using resourcesAllocated instead of resources to compute the pods' resource requirements: when a new pod is being scheduled, it looks at the existing pods, looks at their resourcesAllocated field, and then computes.
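The scheduler-side accounting change described above might look roughly like this. It is a sketch under the assumptions of this discussion, not the real scheduler code; types and names are illustrative, with only CPU millicores tracked:

```go
package main

import "fmt"

// Pod is a simplified stand-in for the two per-pod values discussed above:
// Requests mirrors resources.requests (what the user desires) and Allocated
// mirrors the new resourcesAllocated field (what the kubelet has agreed to).
type Pod struct {
	Name      string
	Requests  int64 // CPU millicores
	Allocated int64 // CPU millicores
}

// nodeUsed sums usage over pods already bound to the node using Allocated,
// per the proposal: a resize the kubelet has not accepted yet must not
// count against node capacity.
func nodeUsed(bound []Pod) int64 {
	var used int64
	for _, p := range bound {
		used += p.Allocated
	}
	return used
}

// fits judges a brand-new pod by its Requests against the remaining room,
// which is exactly what the scheduler does today for new pods.
func fits(newPod Pod, bound []Pod, capacity int64) bool {
	return nodeUsed(bound)+newPod.Requests <= capacity
}

func main() {
	bound := []Pod{
		// Asked to grow to 2000m, but the kubelet has only accepted
		// 1000m so far: only 1000m counts toward node usage.
		{Name: "a", Requests: 2000, Allocated: 1000},
		{Name: "b", Requests: 500, Allocated: 500},
	}
	fmt.Println(nodeUsed(bound))                                  // 1500
	fmt.Println(fits(Pod{Name: "c", Requests: 400}, bound, 2000)) // true
}
```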
B
A
To reiterate and make sure that I understand: the scheduler will actually look at exactly the same thing we are doing right now when we admit the pod, which is the pod's resource requests, like resources.requests — I guess that's the field? Yes, for the new pod. Yes, only for a new pod.
B
A
And my point is, for pods that are not bound — the pods that we are trying to schedule — so for any pod that we are trying to schedule, we will always look at its resources.requests to decide where to place it, correct? But to compute how much resources a node is using, we will iterate over the pods on that node and look at the resourcesAllocated field? Correct, right. Okay, so this is what you meant by bound pods: for bound pods, we look at resourcesAllocated.
A
B
Today, all it does is update the cache: when something like annotations changes, it just updates the pod. I believe the change here would be in its accounting. The resources.requests field would be something that the user desires, so it may not necessarily be what the kubelet has agreed to; the computation should be based on what the kubelet has agreed to. For a brand-new pod, the API server sets this early, at pod creation time.
B
We'd set this field if it is empty, and the current thought process is that if the user sets it at creation, we validate that it is equal to requests; if not, we fail the validation. I'm bringing this up because David Ashpole suggested there might be a use case — he can potentially see a use case where the user might have them differ.
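The defaulting-and-validation rule just described can be sketched as below. This is an illustration of the stated rule only — the function name and signature are hypothetical, not the real API server validation code:

```go
package main

import "fmt"

// defaultAndValidate sketches the creation-time behavior discussed above:
// if resourcesAllocated is unset, the API server defaults it to requests;
// if the user did set it, it must equal requests or validation fails.
// A single CPU-millicore value stands in for the full resource map.
func defaultAndValidate(requests int64, allocated *int64) (int64, error) {
	if allocated == nil {
		return requests, nil // empty: default to requests at pod creation
	}
	if *allocated != requests {
		return 0, fmt.Errorf(
			"resourcesAllocated (%d) must equal requests (%d) at creation",
			*allocated, requests)
	}
	return *allocated, nil
}

func main() {
	got, err := defaultAndValidate(500, nil)
	fmt.Println(got, err) // defaulted to 500, no error

	bad := int64(200)
	_, err = defaultAndValidate(500, &bad)
	fmt.Println(err != nil) // true: mismatch fails validation
}
```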
B
A
Right, and actually the case that David Ashpole proposed could also — I don't know — could also be useful for use cases like, again, the second item on the agenda: approaches to how to inform the scheduler of other compute workloads. You could use that as a hack, basically, to reserve more resources on the node than you actually use.
B
I think that requires more thought. It's a nice-to-have. I kind of see it as: this gives the user a way to say, hey, this is my ideal resource requirement, but this is the minimum for a brand-new pod — schedule me at the minimum and, if possible, drive towards the ideal. That could potentially be a nice-to-have; once this is in and we see how well this works, we can take that up.
A
B
I don't know yet. I realized a couple of weeks ago, just before we kind of got consensus with SIG Node that this is the way to go, that we need to do a review with SIG Architecture — an API review — and I have requested the API review, and we're planning to try and schedule that in the next few weeks. Before I went to them, I wanted to make sure that we cross our T's and dot our I's, kind of.
B
So the key impact here is to the kubelet — SIG Node — which is kind of looking like they're going to agree to it; I already got one LGTM, from David, but Derek and Dawn have to approve this. SIG Autoscaling should have no issues; I'm going to talk to them on Monday at their weekly meeting. And SIG Scheduling, which is you: we have to agree that this is not going to be a problem for the scheduler. Once we have all these, it'll be a strong case for me to talk to them. So, Bob?
A
C
So once a pod is preempted, the node may look like it has some free resources, and at the same time the scheduler wants to use those resources — so the scheduler or the kubelet may think that resource is available. Similarly — and I know what I'm talking about right now is the regular scheduling flow — a certain amount of resources is available, and VPA and the scheduler both decide at the same time to use it: one for increasing the resource requirements
C
for a pod, the other one for scheduling a new pod. So there are these concerns. I guess at some point we decided that these kinds of race conditions are unavoidable in large distributed systems; instead of trying to solve them, we should accept their existence and try to resolve them after they happen.
C
B
B
However, the thought process is that this should be a fairly unlikely, rare event given the overall system load, and there's no way to know this until we have something in production — it's a chicken-and-egg problem. Once we have something that's in there and is used at large scale, we will have the data to determine: okay, maybe we should update our implementation to have the scheduler in the loop. That's something I could not justify without having the data, because the only way for me to get the data is to have something in the system.
B
C
Yeah, well, one thing to note, though, is that these kinds of race conditions are somewhat rare in the Kubernetes world, because most of the time Kubernetes clusters are created in cloud environments, and the assumption is that there are plenty of nodes available for all the workloads in the cluster. As a result, most Kubernetes clusters do not have a ton of pending pods waiting to be scheduled. In on-prem clusters,
C
the story is completely different: for example, in Borg they almost always have a ton of pending tasks in every cluster. When there are so many pending tasks or pending pods in a cluster, these race conditions can potentially happen more often. So, yeah, we should see in practice, and then, based on feedback, we can of course adjust.
B
This feature actually helps there. One of the cases: there's a company called JD.com, which used our first implementation, and they found that if they see a lot of pending pods, they go in and see which pods are not really using their capacity — their allocations — and they size them down, and that way they get more work running. So this feature can help.
A
B
A
D
B
D
A
E
So my ideas are not very formalized — I didn't go through a SIG first. The problem that we have is: we have an existing VM scheduler, and it's scheduling VMs on a set of physical nodes, and we were looking to add Kubernetes to that management plane. But I think it'll be a long time before we switch to the Kubernetes scheduler to schedule the VMs, and I was wondering how we can have these two systems coexist.
A
E
If you look at, like, KubeVirt — KubeVirt schedules, I think it was, pods to schedule the VMs, but they create the VMs inside the same namespace as the pod. Okay, but if we scheduled the VMs outside of Kubernetes and we watched Kubernetes resource allocations, we'd also want to be able to inform Kubernetes about our resource allocations, and I was curious how to accomplish this. There are a few things I saw; one was that I could create pods that didn't do anything and that were specifically bound to nodes.
E
C
E
If you look at KubeVirt — KubeVirt solved this problem by scheduling the VMs by scheduling pods. They created a VM CRD; they scheduled a pod, and that pod represents the resource allocation of the CRD. And then, when the pod was scheduled, they actually created a VM in its place. I see. So
E
from the scheduler's point of view, those VMs are pods? Yes. I was trying to understand: if I didn't do all the scheduling through Kubernetes, how could I back-channel inform Kubernetes about these? One idea I had was to create pods that were assigned to specific nodes, that represented the resource utilization, and that were empty. Yeah, I wasn't sure if you had any other suggestions, potentially, on how to approach this problem. Yeah.
C
That's one way of doing it. In fact, we have a concept similar to what you just described, called mirror pods. Those are used to represent node resource usage: there are static pods that are created directly on nodes, and then these mirror pods are created to represent them — they're sort of like the logical object — and they are stored on the API server.
C
We actually recently felt like these are not great, because there are some problems with respect to, like, race conditions — when these pods are created on the node versus when the mirror pods are created on the API server, and whatnot. But generally they work. They are not, of course, as great as just regular, natural pods, but that's one way you can solve this problem.
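The placeholder-pod idea discussed above can be sketched as follows. This is a hypothetical illustration only — the type, fields, and naming convention are simplified stand-ins for a real pre-bound PodSpec, not an existing API:

```go
package main

import "fmt"

// ReservationPod sketches the "empty pod bound to a node" idea: a pod with
// nodeName already set (so the scheduler never has to place it) whose
// requests mirror an externally scheduled VM's footprint.
type ReservationPod struct {
	Name     string
	NodeName string // pre-bound; the scheduler only accounts for it
	CPUMilli int64  // mirrors resources.requests for the VM, in millicores
}

// reservationFor builds such a placeholder for one externally managed VM.
func reservationFor(vm, node string, cpuMilli int64) ReservationPod {
	return ReservationPod{
		Name:     "vm-reservation-" + vm,
		NodeName: node,
		CPUMilli: cpuMilli,
	}
}

// freeCPUMilli shows why this informs Kubernetes: the scheduler would
// subtract these requests from node capacity exactly as it does for
// real pods, so the externally used resources stay unavailable.
func freeCPUMilli(capacity int64, reservations []ReservationPod) int64 {
	for _, r := range reservations {
		capacity -= r.CPUMilli
	}
	return capacity
}

func main() {
	res := []ReservationPod{reservationFor("db-1", "host-07", 2000)}
	fmt.Println(freeCPUMilli(8000, res)) // 6000m left for Kubernetes pods
}
```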
C
C
A
Is there a downside to using what you just proposed? Because a pod, from the scheduler's perspective, represents the resources that we are reserving on the node, so to me it sounds like a logical solution. But how are you going to do it — are you going to create the VM not from the pod? You're going to create this VM via, like, an external — so.
E
I was creating the VM externally. I wasn't sure if there was an easy way to potentially create the reservation before I realize it, and flush that cache all the way into the scheduler, such that the scheduler could approve or reject it. The only way I thought of approving or rejecting was probably just scheduling a pod, and we do have preemption and priority levels.
F
E
A
I mean, the node is an abstraction, right — of how we represent the amount of resources of a single entity? So to me, the most logical way is to define what that node represents: does it represent half of the resources of that physical host, or all of it? If you represent, for example, all of it, then you are practically saying that Kubernetes can use all of those resources to schedule pods.
A
So yes, if you don't want to do that, then you really need to configure the node object that represents that physical host to consume half of it, or whatever amount of resources, such that you keep enough for the other system to schedule workloads on that node. So from my perspective, it is more of, like, a flag or some configuration to the kubelet that configures how much resource you want to give it, rather than the other way around. Well.
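The partitioning described above — advertising only part of the physical host to Kubernetes — can be sketched numerically. This mirrors, in very simplified form, the kubelet's real node-allocatable computation (allocatable = capacity minus reserved amounts, as with the kubelet's `--system-reserved` style flags); the function itself is illustrative:

```go
package main

import "fmt"

// allocatable computes how much of the physical host's CPU the node object
// should advertise to Kubernetes, given how much is kept back for the
// external VM scheduler. Values are CPU millicores.
func allocatable(capacityMilli, reservedForVMsMilli int64) int64 {
	a := capacityMilli - reservedForVMsMilli
	if a < 0 {
		a = 0 // never advertise negative capacity
	}
	return a
}

func main() {
	// A 32-core host where half is kept for the external VM scheduler:
	// Kubernetes sees a 16-core node and schedules only within that.
	fmt.Println(allocatable(32000, 16000)) // 16000
}
```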
E
A
A
So the idea we discussed with Aldo, and shown here, is that we would like to have a new feature flag that will allow, while we're migrating predicates and priorities into plugins — for a specific predicate, we will have its implementation as a predicate and its implementation as a plugin, and similarly for priorities. And it's a single flag.
A
We will either say I'm going to use predicates, or I'm going to use plugins. The reason is that some of the logic will be just copy/paste, but for others it might not be, because we're going to have to split it into different extension points. So, just for a sense of safety, the idea is to have a feature flag and graduate it alpha, beta, etc., until it is GA.
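The single-flag switch described above might look roughly like this. It is a sketch of the stated idea, not the real scheduler API — the types and functions are hypothetical stand-ins for a legacy predicate and its framework-plugin counterpart:

```go
package main

import "fmt"

// Filter stands in for a node-filtering check; both implementations below
// encode the same logic, which is exactly the copy/paste case mentioned.
type Filter func(podCPUMilli, nodeFreeCPUMilli int64) bool

func legacyPredicate(podCPU, nodeFree int64) bool { return podCPU <= nodeFree }
func frameworkPlugin(podCPU, nodeFree int64) bool { return podCPU <= nodeFree }

// selectFilters picks one implementation set based on a single feature
// gate: all predicates, or all plugins — never a mix. This lets the two
// code paths coexist while the flag graduates alpha -> beta -> GA.
func selectFilters(pluginsEnabled bool) []Filter {
	if pluginsEnabled {
		return []Filter{frameworkPlugin}
	}
	return []Filter{legacyPredicate}
}

func main() {
	for _, f := range selectFilters(true) {
		fmt.Println(f(500, 1000)) // true: the pod passes the plugin filter
	}
}
```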
A
C
When we designed the scheduling framework, my impression was that these are all internal logic of the scheduler. At the same time, we still need to keep the mechanism to disable some of these via the existing policy config of the scheduler — basically, since that policy config existed, we need to be backward compatible, and if a particular predicate is disabled, or if a particular priority function is removed or its weight is changed, we need to apply those changes to the plugins. But other than that.
G
C
A
C
I understand. To be honest with you, the very early plan was to build a second scheduler — call it scheduler 2.0 — keep it in its own repo, and build the scheduling framework in parallel. But later on I felt like it would be almost impossible to roll out a scheduler 2.0 if we didn't build the framework into the existing scheduler, so we decided to basically build the framework into the existing one.
C
I understand that there are a lot of concerns. Similarly, there were a lot of concerns with preemption — I remember that I got quite a bit of a hard time when we were rolling that out; everybody was freaked out, people were thinking our billions of dollars of workloads are in danger. So I understand your concern. It does make sense if it really makes things much more reliable eventually — then, yeah.
C
A
So, as part of the migration plan, we will be moving the predicates into plugins, but I have a section in my doc to present a reasonable way for things like even the DaemonSet controller to consume those filters. We're not just going to cut it and tell you guys, you know, deal with it; we will collaborate on that part, and we're not going to remove those functions until we're sure that everybody has moved into a reasonable state.
G
A
F
A
It's a doc that will probably convert into a KEP. The core part of that KEP is going to be deprecating the policy config that the scheduler takes — right, exactly — and if we deprecate that, we are practically deprecating access to predicates and priorities, which means everybody has to use plugins, etc. As part of that, we have to make those available as plugins via the component config, right.
F
So again — I'm not very familiar with it — plugins are basically going to be core, like compiled into the scheduler in core Kubernetes, right? Correct, yeah. Okay, and then, when we disable predicates, is there going to be a one-to-one flag for each plugin, so that enabling it will disable the corresponding predicate? No.
A
So, as we mentioned, this is practically internal logic. As long as we are able to support the policy config API while it is being deprecated, then we're fine, and the flag that I was mentioning is just basically a generic way of switching between the old implementation and the new one, right.