From YouTube: Kubernetes SIG Scheduling Meeting - 2019-08-15
A
All right, let's start. Hello, everyone. As you know, this meeting is recorded and will be uploaded to the public internet, so chances are whatever you say will remain there for a very, very long time. With that, let's start the meeting. I have a couple of items to talk about. Hopefully these are not going to take much time, and then I know that there are a few folks who have some issues they wanted to speak about, and, I believe, a demo about the descheduler.
A
The schedulers choose a leader among themselves by acquiring a leader lock, and once a leader loses the lock, a re-election happens. Sometimes, I mean, all the time, when a scheduler loses this leader lock, it must restart. But apparently, in some of the cases that one contributor had tested, it didn't restart itself, so he had sent a PR to fix this issue. Some other folks in the community believed that the restart does happen.
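For context, the leader lock being discussed is configured through the scheduler's leader-election settings. A minimal sketch of a KubeSchedulerConfiguration, assuming the v1alpha1 component-config API of that era and purely illustrative timing values:

```yaml
# Sketch only: leader-election settings for kube-scheduler.
# Field names follow the component config of that era; values are illustrative.
apiVersion: kubescheduler.config.k8s.io/v1alpha1
kind: KubeSchedulerConfiguration
leaderElection:
  leaderElect: true    # compete for the leader lock with other scheduler replicas
  leaseDuration: 15s   # how long a lease is valid before a non-leader may take it
  renewDeadline: 10s   # the leader must renew within this window or lose the lock
  retryPeriod: 2s      # how often candidates retry acquiring the lock
```

The restart question above is about what the process does after the renew deadline passes: whether it exits (and is restarted by its supervisor) or keeps running without the lock.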
A
So, while the PR is fine and we can actually merge it, there is a question whether this is actually a bug fix or just a cleanup. The reason it makes a difference is that if this is actually a bug fix, we need to backport it to all the versions that we support, basically all the way to 1.13, I believe, that we support today. I don't know if the author is in the meeting now, I don't see him. Mike, do you know? You probably don't know about the status of this change.
A
Okay, all right, yeah. So I will follow up with the author later.
A
With some agreement from the SIG, we decided to consider all the static pods as critical pods, and we do not need static pods to actually have a priority or have any annotation or anything to be considered critical. All of them are considered critical, no matter what their priority is or whether they have an annotation or not. So that fix has been merged.
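To illustrate the change being described: a static pod is one the kubelet runs directly from its manifest directory, and under the new behavior it is treated as critical with no priority class and no annotation. A minimal hypothetical example:

```yaml
# Hypothetical static pod manifest, e.g. placed in /etc/kubernetes/manifests/
# on a node. Under the behavior described above, the kubelet treats it as
# critical even though it sets no priorityClassName and no critical-pod annotation.
apiVersion: v1
kind: Pod
metadata:
  name: node-exporter
  namespace: kube-system
spec:
  containers:
  - name: node-exporter
    image: prom/node-exporter:v0.18.1
```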
A
We have now been able to re-merge the removal of the critical-pod annotation, and that happened yesterday. These are the two updates that I had. So, let's see, Swathi, if you have some... she's on the call, she has a demo for us, I believe, but I don't know if she's here. All right, in the meantime, is there any question or comment around these from other folks in the meeting?
B
I have one small thing. I don't know if this is the right place to bring it up, but we had a test that was flaking around taints and tolerations. Ravi tried to merge a fix which broke the test, so we reverted that fix. I've got another PR that I just opened today, which basically brings in Ravi's fix and tries to complete it so that it shouldn't break at this point. I just put a link to that in the doc, yeah.
A
Please do. If you can, please add it to the meeting notes. And another question I have for you guys: Ravi reached out to me and said that you guys planned to rewrite some of the preemption tests, because they are flaky in your environment in certain scenarios. I actually don't know, do you plan to convert them into integration tests, or do you plan to just rewrite them in a more reliable way as e2e tests?
B
From what it sounds like, talking to Ravi, we'd like to move as many as we can to integration, because in OpenShift we have a lot of really high-load tests, and we've got a bunch of other tests running in the background with all of our operators. So I think that our goal is to move as many of them as we can to integration and then rewrite the others where we can't move them. Yeah.
A
We definitely need to have at least one, probably more than one, end-to-end preemption test, because we want to make sure that preemption actually works, that it can actually delete pods and we can update the API server and everything. So we definitely want to have an e2e test for preemption, but generally I am okay with converting e2e tests to integration tests if we don't lose any functionality of the tests, yeah.
A
So, yes. Also, Chad, as I told you, I'm not fully aware of that change, so I would rather have someone who reviewed it respond. I know Abdullah is on vacation, but he's coming back, so he can probably provide a response early next week. I hope that's fine with you, because we still have some time before the code freeze, right? Code freeze is at the end of the month, so we still have some time. Hopefully we can merge it before that.
D
It has a toleration which corresponds to the taint on the node. However, in a scenario where that taint gets updated on the node, this particular pod is violating its placement intention, so it needs to be rescheduled. That's essentially the idea behind the strategy that we're proposing. Can you see the shell here? Yes? Okay, so I'll show you the nodes.
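As an illustration of the placement violation being described, consider a hypothetical taint and a matching toleration; if the node's taint is later changed, the toleration no longer matches and the pod is violating its placement intention:

```yaml
# Hypothetical example. A node carrying a taint:
apiVersion: v1
kind: Node
metadata:
  name: worker-1
spec:
  taints:
  - key: dedicated
    value: batch
    effect: NoSchedule
---
# A pod whose toleration matches the taint above, so it may run on worker-1.
# If the node's taint is later updated (say, the value changes to "web"),
# this toleration no longer matches and the pod violates its placement intention.
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker
spec:
  containers:
  - name: worker
    image: busybox
  tolerations:
  - key: dedicated
    operator: Equal
    value: batch
    effect: NoSchedule
```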
D
I have a two-node cluster over here, and what I'm going to show you is that I'll create a deployment with ten replicas, and they'll be distributed across both nodes. Once we taint a specific node, the pods of this deployment should be moved off to the other node, because the corresponding pods will no longer be compatible with the taint that has been specified. So let me just show you the deployment that I'll be deploying.
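The demo deployment might look like the following sketch (names and image are hypothetical; the point is simply ten replicas that the scheduler will spread across the two nodes):

```yaml
# Hypothetical demo deployment: ten replicas spread across the two nodes.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 10
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: nginx:1.17
```

Tainting one of the nodes afterwards, for example with `kubectl taint nodes worker-1 dedicated=batch:NoSchedule` (node name and taint hypothetical), is what makes the pods on that node violate their placement.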
D
So you see the pods of this deployment are being scheduled on both of the nodes, so we have a few on one node and the rest on the other. The scheduler tries to distribute the deployment across multiple nodes, so that's what it's essentially doing. And what I'll show you here is that if I assign a taint to the node... so, taint the node.
D
In this case, there are a bunch of pods being deleted. So if you go back to the previous shell, you're seeing that a few of the pods are getting terminated; subsequently, a few pods are getting recreated, and at the end of this descheduling process, all the pods are on the other node, which is not tainted.
D
So this is the strategy that we're proposing, and it also shows that the descheduler, which is running in the kube-system namespace, continues to run even though it was on the master node, because it's a special pod. So that kind of summarizes the demo. We have a PR up on the descheduler repo. Avesh is supposed to review it, but if we can get more feedback on that, that'll be great.
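For reference, descheduler strategies are enabled through its policy file, so the proposed strategy would presumably be switched on the same way. A sketch, assuming the strategy name used in the proposal is RemovePodsViolatingNodeTaints:

```yaml
# Sketch of a descheduler policy enabling the proposed strategy.
# The strategy name is an assumption based on the proposal discussed here.
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeTaints":
    enabled: true   # as suggested later in the meeting, this could default to false
```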
A
So what this looks like is that a NoSchedule taint causes a node to drain, basically, with your change, right? If I wanted to achieve this, I could have used NoExecute, right, which would have the same effect. If they really wanted to drain a node, I think they would probably put a NoExecute taint. I don't know, probably this is going to be something unexpected to happen. I would say probably someone puts a NoSchedule taint, I don't know, precisely because they don't want the existing pods to terminate on the node.
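To make the distinction being drawn concrete: the two taint effects behave differently for already-running pods. A hypothetical node spec carrying both:

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-1
spec:
  taints:
  # NoSchedule only blocks *new* pods from being placed here;
  # running pods without a matching toleration stay put.
  - key: dedicated
    value: batch
    effect: NoSchedule
  # NoExecute additionally evicts running pods that do not tolerate it,
  # which is the built-in "drain" behavior referred to above.
  - key: maintenance
    value: "true"
    effect: NoExecute
```

The concern raised here is that the proposed strategy effectively gives NoSchedule taints NoExecute-like consequences for running pods.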
D
Yeah, so I think you're right in that sense, that if they wanted this behavior they'd use a NoExecute taint. But the problem that we had was that, in a deployment that is up and running where you have NoSchedule, if you want to clean up your environment and you want to make sure that your running pods are consistent with the cluster that you have, you'd probably run this as a cleanup operation to ensure that everything is compatible and in a desired state.
A
So this definitely should be configurable, because it could potentially cause surprises as well. You know, people may think, oh, I know it's NoSchedule, the node is not going to get new pods, but suddenly they could see even the existing pods disappear, which is not exactly what the API says. So yeah. Yes.
D
Yeah, so at the time it was being scheduled, it was complying with the taints that were on the node; the toleration that the pod had was complying with the taints existing on the node. But in a certain scenario the taints are updated, or your cluster gets updated, and that compatibility is no longer correct, so it's no longer complying with your provided toleration. So that's the whole idea, yeah.
A
You know, one of the reasons that you were thinking about having a descheduler is that it tries to bring the cluster, at runtime, to a state that is specified by the API. For example, if a node has a NoSchedule taint, no pods should be scheduled on that node. Or we have other scenarios here, for example, anti-affinity could be violated, so it could bring the cluster to a state that is compatible with the specified API, and so on.
D
I think, yeah, what you said is correct, in the sense that where the pod is scheduled is only considered when that request comes in initially. But the descheduler project itself talks about scenarios where you would want to actually change the placement of the pod to achieve a desired state in your cluster. So the descheduler project talks about scenarios like violating pod affinity policies, like pod anti-affinity or node anti-affinity policies, so things like that are considered as triggers.
A
You know, since we have this change, I can agree that this could be a surprise for users, especially if they know about both of these two taints and they purposefully put a NoSchedule taint in order to save the existing pods. I don't know, some feature like this could cause surprises for users. So what I would suggest is that you add this strategy to the descheduler but keep it disabled by default, so only users that really want this feature can enable it, sure.
F
It might be useful, for this kind of behavior, because it violates the API, to have it as an audit instead: the system tells the operator that these pods are in violation, but doesn't actually evict them. Okay, maybe, I don't know if that's outside of the goals of the project, but just to be compliant with the API. That sounds better to me.
D
So far, the use of the descheduler, as I've seen it, has been exactly the way you just described, as a cleanup operation once in a while, and not as a cron job. But in the repo itself it is said that you'd run it as a cron job, like maybe running once a day or something like that, as a cleanup operation. But yeah, I've seen it being run once in a while, like whenever you need to achieve a desired state.
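A sketch of running the descheduler as a cron job inside the cluster, as described above; the image reference, service account, and ConfigMap name are assumptions:

```yaml
# Hypothetical sketch: run the descheduler once a day as a CronJob.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: descheduler
  namespace: kube-system
spec:
  schedule: "0 0 * * *"            # once a day, as mentioned above
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: descheduler   # assumed to have eviction permissions
          restartPolicy: Never
          containers:
          - name: descheduler
            image: descheduler:latest       # hypothetical image reference
            command:
            - /bin/descheduler
            - --policy-config-file=/policy/policy.yaml
            volumeMounts:
            - name: policy
              mountPath: /policy
          volumes:
          - name: policy
            configMap:
              name: descheduler-policy      # holds the DeschedulerPolicy shown earlier
```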
G
I'm wondering if you could share a use case that might explain why someone would want to automate this, put it into a cron job, for these descheduling activities, because I'm unclear how you would have changes in the taints, with the pods already running, coming up so often that you'd want to run a cron job to check for and remove those workloads.
D
Oh, like I mentioned, this was an issue that was already existing in the descheduler repo itself, so I kind of picked it up from there, and it was an attempt for me to get an understanding of how the descheduler works. So I thought, okay, this is a problem that exists in the repo, and I'll go ahead and try to solve it, and get an understanding of the architecture of the descheduler itself.
D
So that's where I came from. I don't have a specific use case myself, but the issue described it exactly the way I did. I can link the issue in the agenda item as well to describe it better. But the idea was that in scenarios where the taints on the node get updated, you want to ensure compliance, yeah.
G
I think she was addressing my question. I just wanted to make sure that, when we're submitting these types of changes, we know exactly, or we have some examples of, users who actually have this need, right? If we're just coding in response to gaps that we see in the API, or something to that effect, to me that doesn't seem very valuable, unless we're making contributions that are in response to needs that users are actually having, yeah.
A
I mean, some of these changes are useful, because our own scheduler does not really check anything for running pods, right? It only cares about pending pods which are not scheduled yet. So if cluster conditions change at runtime, of course, the scheduler will no longer care, and the descheduler helps with bringing the cluster to the desired state. Yeah, I understand.
G
And I think it's a valuable process, but, like one of the other people here mentioned, running it on a cron job is different from having an admin execute it on an as-needed basis. It scares me that somebody might set this up not knowing what types of jobs would be descheduled. Yeah, I know.
A
Once it becomes an established feature of the cluster, people will probably think about it and care about it, no matter, I mean, if it matters. We also have something similar: at Google, Borg has something like this as well, which checks the cluster and tries to bring the cluster up to the standard state, or the configured state. So it's not something completely uncommon in cluster management.
A
Yeah, that was my question as well. This is actually something that could probably cause surprises if a user is not fully aware. But, you know, as we discussed, we can leave it as disabled by default in the descheduler, and the users who really want to have this kind of behavior, with checks at runtime for NoSchedule taints as well, can enable it. But otherwise, you're right, I don't think it should do much more than what the API says.