From YouTube: Kubernetes SIG Scheduling - 2018-02-15
A
B: Steering: Brendan Burns has submitted his proposal and there's been minor feedback. Nothing major, but the intent is that everything in the incubator will eventually have to have a new home, and there will be two or three different types of repositories. The first is the main kubernetes org repositories, and hopefully that will slim down as well, because right now there's a bunch of repos in the main kubernetes org that shouldn't be there.
B: That's still gonna work itself out over time. The last type of repository is what folks are calling associated repositories, which basically allows people to be covered by the CNCF umbrella. So if a company wants to sponsor a project but still wants to house it in neither main org; so, like, if I were, you know, at Heptio, some project that I worked on might possibly be an associated project in the future, right, where we would have the CLA bot there.
B: So that means if you signed the CLA at any moment in time while working on kubernetes, you could contribute to anything that's in the associated umbrella. I think most of our work within this SIG would fall underneath the kubernetes-sigs org, and it just needs to follow the naming convention that's spelled out in Brendan's doc, so it'd be scheduling-whatever, right. Okay, so.
B: It's not merged yet; there's still feedback occurring, but most of the feedback is pretty minor and it's not blocking. We haven't had anyone say that they are absolutely opposed to this. We've been talking about it for weeks internally and we've exposed it for one week to the entire community, so there haven't been any major detractors that we have seen. Okay.
B: Actually, I'm pretty certain that folks would allow us to move and create repos there now; I don't think there's anything preventing it, because the org has been created and I have admin rights to it. I would just want to make sure that I get buy-in before I create any repos, because what we probably should be doing is sponsoring one or two folks and getting feedback on what the mechanism feels like, whether it works for everybody, and whether there are any problems, before we kind of unleash it on the whole community, right.
A
B
C
B
C
B
B
C
C
C: Just another thing I wanted to kind of run by you. So Firmament has two different components: one is Poseidon, the other one is Firmament itself, which is C++ based, and Poseidon is kind of a glue between Kubernetes and Firmament, which is what we wrote. So is that an issue? I guess we will have to make the Bazel thing work with the C++ build process.
B: Most of the build apparatus is containerized builds. So as long as you have a two-stage container build, a build container and a deployment container, which is what pretty much 90% of the kubernetes stuff does, you know, you can build the binaries locally, but the actual build apparatus for testing is all containerized builds.
C
C
B
C
C: So the reason is I would like to add those guys as well; I think, you know, one is already a member, but I'm not really sure about Marty, you know. Yes, yep, okay, okay, good, yeah. By the way, you know Liz is here at UC Berkeley now? He moved here, so he's a postdoc, so we work very closely with them.
C
B
C
A
B
A: All right. Actually, I've gone through one of the papers on the Firmament scheduler as well, but I was wondering if you guys could give a presentation in one of our SIG meetings, basically for everybody who is not familiar with it and to refresh our own memories as well. It might be useful if you could give, I don't know, a 20-30 minute presentation in one of our SIG meetings.
A
B: Great, what do you think, Tim? I think it'd be good. Long term (I know the path that you guys have followed), if we eventually get to the point where we have a full scheduling framework, you know, in my heart of hearts I would ideally love to see the Firmament scheduler be the mainline scheduler. That'd be a long, long term cycle, though; I don't see that happening anytime soon.
C
E: One of the things that I'd definitely be interested in hearing or seeing in a presentation is how the resource tracking fits with Firmament; I guess that would be the Poseidon layer, and how it gets the stats, so we have Heapster. Is that more just, you know, sort of how the Kubernetes resources are mapped to what Firmament uses?
C: I think so. Actually, he was there on the call; he's the one who wrote that code, actually. So yeah, exactly, there's that. Then for Firmament, I don't know if you guys have read the paper, but there's a concept of a task descriptor and a resource descriptor, so your question would be how you map those to the corresponding entities in Kubernetes. Exactly, yeah, definitely.
C: Exactly. So we have a very high level design, and we're starting out with the soft constraints, simple soft constraints, and then the hard constraints, and then we're implementing the exclusion (XOR) rules as well, so that, you know, if I don't want to co-deploy, say I have a set of five replicas, I can make sure they all go to separate nodes.
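As a rough sketch of that hard constraint against the stock Kubernetes API (plain pod anti-affinity rather than anything Poseidon/Firmament-specific; the Deployment name, labels, and image are placeholders): required anti-affinity on the hostname topology key is what keeps the five replicas on separate nodes.

    package main

    import (
        "fmt"

        appsv1 "k8s.io/api/apps/v1"
        corev1 "k8s.io/api/core/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    func main() {
        labels := map[string]string{"app": "demo"}
        replicas := int32(5)

        // Hard constraint: no pod carrying the "app: demo" label may share a
        // node (topology key = hostname) with another such pod.
        spread := &corev1.Affinity{
            PodAntiAffinity: &corev1.PodAntiAffinity{
                RequiredDuringSchedulingIgnoredDuringExecution: []corev1.PodAffinityTerm{{
                    LabelSelector: &metav1.LabelSelector{MatchLabels: labels},
                    TopologyKey:   "kubernetes.io/hostname",
                }},
            },
        }

        deploy := appsv1.Deployment{
            ObjectMeta: metav1.ObjectMeta{Name: "demo"},
            Spec: appsv1.DeploymentSpec{
                Replicas: &replicas,
                Selector: &metav1.LabelSelector{MatchLabels: labels},
                Template: corev1.PodTemplateSpec{
                    ObjectMeta: metav1.ObjectMeta{Labels: labels},
                    Spec: corev1.PodSpec{
                        Affinity:   spread,
                        Containers: []corev1.Container{{Name: "app", Image: "nginx"}},
                    },
                },
            },
        }
        fmt.Println(deploy.Name, *deploy.Spec.Replicas)
    }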
C
C
C
D: I also have a similar question, and not only that: what are the gaps compared to how the default scheduler behaves in Kubernetes, and what are the similarities? Also, are there other things that are not in the default scheduler but are provided by this Firmament scheduler? Definitely.
C: We'll send out all the names to you folks, and then Tim will create a repo, I guess. The other thing: the readme file, the structure of the incubation, pretty much stays the same? The readme file which we sent out, is that the right way to do it? Is it pretty much the same as the way incubation was done earlier? Because...
B
B
C: I was talking more about our readme file; you know, you go into our repo, for example, and you see what exactly this is, what the key advantages are, you know, like a readme file. I don't know, not to sell it again, but that's what I was thinking of; actually, including that when you go to the org now would be great.
A: All right, so for the next item on the agenda I want to give an update on priority and preemption. This is an effort that multiple people are working on, and there are different pieces of this work which were already there. There has been some improvement in the performance of priority and preemption, particularly in the area of improving the performance of the new scheduling queue. There has been some work on adding priority for critical system components such as kube-proxy and kube-dns. Yeah, go ahead.
B
A
A: Good point. I mentioned it in the rollout doc, and when I shared it I asked people who are more familiar with kops or kubeadm and so on to comment on this, because I'm not a person familiar with all of the details of these tools. What we have done is that we have updated the YAML files that specify, you know, the base configuration of these components, like kube-proxy, kube-dns, etc.
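As a rough sketch of how a critical add-on ends up with a priority (this uses the now-GA scheduling.k8s.io/v1 Go types rather than the alpha API that existed at the time, and the class name, value, and image are illustrative only): a PriorityClass is created once, the component's pod spec references it by name, and the scheduler may preempt lower-priority pods to place it.

    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        schedulingv1 "k8s.io/api/scheduling/v1"
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    )

    func main() {
        // A class for cluster add-ons; a higher Value means higher priority.
        critical := schedulingv1.PriorityClass{
            ObjectMeta:  metav1.ObjectMeta{Name: "addon-critical"},
            Value:       1000000,
            Description: "for add-ons such as kube-proxy and kube-dns",
        }

        // The add-on pod opts in by name; admission resolves the name to
        // critical.Value, and the scheduler may preempt lower-priority pods
        // to make room for it.
        dns := corev1.Pod{
            ObjectMeta: metav1.ObjectMeta{Name: "kube-dns", Namespace: "kube-system"},
            Spec: corev1.PodSpec{
                PriorityClassName: critical.Name,
                Containers:        []corev1.Container{{Name: "kubedns", Image: "example/kube-dns"}},
            },
        }
        fmt.Println(critical.Name, dns.Spec.PriorityClassName)
    }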
B: I'm gonna put you as the assignee in the notes to add a link to the PR for the kube-up stuff, because that needs to be socialized more broadly. What will likely happen is people will miss the boat for a period of time and then they will be added on later and have to figure out where it was; but I think if we were to, I can more broadly PSA in sig-cluster-lifecycle that here is the PR that you guys should reference.
B
A
A: So, yes, there has been some of this effort. There has been further work on improving or updating the documents with the new behavior, explaining, for example, that the rescheduler is going to be retired, and some of the new changes in the DaemonSet controller; Klaus is working on the DaemonSet controller side, his PRs are coming and some of them are already merged. The other ones are going to be merged, hopefully all before the code freeze. The only...
A: Luckily, we have found one internal customer at Google who is willing to try this, and we are going to enable the feature in their clusters very soon. If you guys know other people who have larger clusters: we don't want to go with super large clusters, but we want to make this test kind of meaningful by going to clusters which have several tens of nodes; not very large clusters, medium ones.
A
B: It's always really hard, though, to get actual customer feedback, because, as history has taught me, no one in the actual community deploys bits until it's like dot one or dot two. So when 1.10 releases, no one's going to actually take it until like 1.10.1.
A: ...and have confidence that it works. One of the bigger things for us is to ensure that this feature is not gonna break any existing customer if they don't use the feature. Basically, that's one of the bigger things, because, you know, if you try it and it doesn't work well, it's easy to go back by simply not setting priority for your pods; but breaking existing customers who don't use the feature is definitely a big problem, and we don't want to face that.
D
A
D
A: The problem is that if we go to beta in 1.10, the feature will be enabled by default. So everybody, including those who don't want to use this feature, will get it, and that's why I'm saying we must ensure that it doesn't break existing customers, because even those customers who won't use the feature are going to have it enabled in their cluster. That's why we are trying to ensure it is working fine and is not going to break anything, yeah.
B: So here's a question, a broader question that applies to this. A lot of the CNI providers are going to need to have priority and preemption in place, because, as Tina said, they deploy pods right now. Have you guys worked with, or have you communicated with, any of the CNI providers, Weave, Calico?
A
B
A: That's actually good feedback. No, we haven't. While we don't expect anything to basically happen if they don't use the scheduling piece of the DaemonSet changes, we cannot really be sure without communicating with those guys or without testing it in more realistic scenarios.
A
G
C
A: The rescheduler has been used to ensure that our critical pods are scheduled when there are not enough resources in the cluster, or there are no nodes in the cluster that can run those critical pods. With the introduction of priority and preemption, our critical pods will have the highest priority in the cluster, and our main scheduler will take care of scheduling them if the cluster is out of resources, so we thought we don't need to have the rescheduler anymore.
A
C
C
B
G
A: Okay, so yeah, the next item is a bug that we have faced recently, happening a lot more for certain GKE as well as open source Kubernetes customers. We've seen that the scheduler state becomes stale, or basically the scheduler cache has some stale information. This happens particularly in two scenarios, one for nodes and one for pods; we've talked briefly about those before. So we are working on trying to basically chase the problem down to see where it's coming from.
A: Our recent investigations show that it's probably something outside of the scheduler; maybe some events are not sent to the scheduler by either etcd or the API server. Folks here at Google are looking into this issue to find out, but one of the symptoms that we have seen so far is that, for example, the scheduler thinks that a node is full because pods are running on the node, but the pods are actually deleted; they are just not deleted from the scheduler cache, so the scheduler believes that those pods are still running.
A: As a result, it refuses to schedule new pods on those nodes, and what happens in this case is that the autoscaler does not add new nodes to the cluster, because it believes that these pending pods are schedulable on the existing nodes. So there is a disagreement between the autoscaler and the scheduler: the autoscaler is waiting for the scheduler to schedule those pods, and the scheduler never does that, because it believes that those nodes are occupied.
A: So as a result, we see that a lot of these pods remain pending for a long time, and of course this is undesirable behavior. The other scenario is that we keep trying to schedule pods on nodes that don't exist; for this problem we have been able to add a workaround in the scheduler: the scheduler deletes such nodes from its cache if it finds, when it tries to bind pods to them, that they don't exist anymore. So the first part of the problem is the major part.
A
D: Also, obviously I don't know the root cause, but I think around two weeks ago there were some guys who were having, I think, similar issues, and they were discussing it on the Slack channel. I think in the end what those guys did is they increased the QPS settings in their cluster; when they had a lower QPS setting at that time they had that issue, but I think when they increased it, the issue...
D: ...the issue was solved. So yeah, it seems like those guys hit the same problem, though I may not be the right person to say; it seems like they had the same problem and, as I said, when they increased that, the problem went away. I was working with them as well, doing whatever I could, but in the end they told me they did that and I think the issue was actually resolved.
A
A
A
B
A
A
B
D
A
D
G: So the first one is related to the balanced resource allocation priority: as of now we are using only CPU and memory. In our environment, what we have noticed is that there are chances that nodes exhaust their PVC limits, basically the number of volumes that can be mounted on a node, while still having enough CPU and memory, or it can happen the other way around. As of now those limits are hard-coded, to something like 39 in the case of AWS and 16 in the case of GCE, etc. So the question is, should we...
A
G
A: So basically, given the number of attached volumes and the number of volumes requested by PVCs, we could try to balance the number of volumes among the nodes; we can have a priority function that tries to balance the number of volumes. Yeah, so, sure, I don't see any problem with having this as another priority function.
D: Then one more thing. I think in one of our previous meetings when we discussed this (I think Brian Grant was also in that meeting), we discussed that we would like to have some sort of generic function, so that in the future, if we have to do the same thing for GPUs, or maybe some other resource, we can add them as well. Okay.
A: Right, that's a good point. For PVs in particular I think it's gonna be a little harder, but we may be able to do that. This is slightly different, because we need to basically look at the number of PVs already attached, and we also need to look at the number of PVs requested by the PVCs, and then combine those two, which is slightly different from the other models where you have, for example, a certain amount of CPU available on a node.
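A hypothetical sketch of what such a priority function could look like (not an existing scheduler function; the signature, the 0-10 score range, and the example numbers are assumptions): it combines the volumes already attached to a node with the volumes the pod's PVCs would add, and favors nodes that keep more free attach slots under the per-cloud limit.

    package main

    import "fmt"

    // volumeSpreadScore returns 0..10; a higher score means the node keeps
    // more free volume-attach slots after placing the pod.
    func volumeSpreadScore(attachedVolumes, podRequestedVolumes, nodeVolumeLimit int) int {
        used := attachedVolumes + podRequestedVolumes
        if used >= nodeVolumeLimit {
            return 0 // the node would hit (or exceed) its attach limit
        }
        return (nodeVolumeLimit - used) * 10 / nodeVolumeLimit
    }

    func main() {
        // e.g. a node with 30 EBS volumes attached and a pod asking for 2 more,
        // against the hard-coded AWS limit of 39 mentioned above.
        fmt.Println(volumeSpreadScore(30, 2, 39)) // prints 1
    }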
A
A
G
B
B: Updates, I think that's got to be an extender. (Ravi, you're really loud.) But if it's part of the main node updates, if something basic like load average is passed through, I'm totally okay with that in the main scheduler. But if you wanted to add any Heapster-based information, you know, there's much more you could apply than the load average.
B
A
D
D: But I think another concern, which I think we had before, is that we also need to check whether the metrics being reported are instant metrics or are based on some period of time, because, as far as I remember, the concern we had was that if they are not based on some time period, then it might require some changes on the metrics side.
D
B: The idea, almost, if you wanted it to be smart about this type of thing, is that you could do automatic updating on insertion, versus putting it into the scheduler, right. So if you had some type of history, you could have your own separate component that says, you know, if you have a history of the actual usage data that you're tracking over time, you can...
B
You
can
augment
the
initial
incoming
inbound
requests
with
something
that
looks
approximate
to
reality
and
that
type
of
augmenting
on
inbound
is
probably
better
than
building
into
the
mainline
scheduler,
because
it
could
be
a
totally
separate
system,
because,
if
you
put
stuff
into
the
mainline
scheduler,
it
will
only
be
as
good
as
the
data
that
you
are
feeding
into
it
and
that's
subject
to
change
over
time.
So
I
think
what
you
really
want
is
finding
a
good
fit
right
and
most
of
the
time
people
are
super
bad
at
sizing.
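An illustrative sketch of that augment-on-inbound idea (not an existing component; the helper names, the history source, and the example numbers are assumptions): a separate admission-time step rewrites a pod's resource requests from observed usage history before the scheduler ever sees the pod.

    package main

    import (
        "fmt"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/api/resource"
    )

    // usageHistory maps container name to observed (e.g. p95) usage pulled
    // from a metrics store; where that data comes from is outside the sketch.
    type usageHistory map[string]corev1.ResourceList

    // rightSizeRequests overwrites a container's requests with historical usage
    // when history exists; containers without history keep what the user asked for.
    func rightSizeRequests(pod *corev1.Pod, hist usageHistory) {
        for i := range pod.Spec.Containers {
            c := &pod.Spec.Containers[i]
            observed, ok := hist[c.Name]
            if !ok {
                continue
            }
            if c.Resources.Requests == nil {
                c.Resources.Requests = corev1.ResourceList{}
            }
            for name, qty := range observed {
                c.Resources.Requests[name] = qty
            }
        }
    }

    func main() {
        hist := usageHistory{"app": corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse("150m"),
            corev1.ResourceMemory: resource.MustParse("200Mi"),
        }}
        pod := &corev1.Pod{Spec: corev1.PodSpec{Containers: []corev1.Container{{Name: "app"}}}}
        rightSizeRequests(pod, hist)
        fmt.Println(pod.Spec.Containers[0].Resources.Requests)
    }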
B
A
A: Right, all right. So I think whatever path we go forward with, we should probably build an extender first, maybe, and then maybe bring it back into the scheduler someday if we feel it is reliable enough, I would say, or we could completely feature-gate it. It looks like Tim is not on board with putting it in the scheduler, right? Okay.
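A minimal sketch of the extender-first option: a standalone HTTP service the scheduler calls out to for prioritization, here scoring nodes inversely to a load average. The JSON shapes are simplified stand-ins for the extender's ExtenderArgs / HostPriorityList wire format, the metrics lookup is fake, and the real scheduler would be pointed at this endpoint through the extenders section of its policy configuration.

    package main

    import (
        "encoding/json"
        "net/http"
    )

    // Simplified stand-ins for the extender request/response payloads.
    type extenderArgs struct {
        Pod       map[string]interface{} `json:"pod"`
        NodeNames []string               `json:"nodenames"`
    }

    type hostPriority struct {
        Host  string `json:"host"`
        Score int    `json:"score"`
    }

    // fakeLoadAverage stands in for a real metrics lookup (Heapster at the time).
    func fakeLoadAverage(node string) float64 { return 0.5 }

    // prioritize scores each candidate node from 0 to 10, inversely to its load.
    func prioritize(w http.ResponseWriter, r *http.Request) {
        var args extenderArgs
        if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        scores := make([]hostPriority, 0, len(args.NodeNames))
        for _, n := range args.NodeNames {
            scores = append(scores, hostPriority{Host: n, Score: int((1.0 - fakeLoadAverage(n)) * 10)})
        }
        json.NewEncoder(w).Encode(scores)
    }

    func main() {
        http.HandleFunc("/prioritize", prioritize)
        http.ListenAndServe(":8888", nil)
    }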
G
D: I have one minor thing to ask. For the scheduler, for the SIG group, we have two kinds of lists, I think, the reviewers and the approvers list, right? So I was wondering: actually I'm not part of any of those lists, and even he is not part of any of those lists. Obviously, if we could be added to those lists, we would definitely like to be more responsive as well. I mean, only if you are really sure about that.
D
A
A
A
I: Basically, my question was about trying to understand the behavior when nodes go unresponsive. I mean, we are on an old Kubernetes release and we hadn't turned on this feature, but last night we turned it on. We probably don't understand it fully; I didn't read the documentation enough, but from what I have read it's not well documented, so I was wondering.
I
A
A: My knowledge in this area is not very strong, but I believe it's not part of the scheduler, for sure; the scheduler is not what removes pods from those unresponsive nodes. But of course, if the pods are removed, probably by the node controller, and are added back, then they come back to the scheduling queue and the scheduler will reschedule them on existing nodes which are available. That part I know; but the first part, which is who removes those pods from the nodes, I don't know for sure, but I believe it's...
D
I
D: I think, first of all, if the pod has been rescheduled, the pod is definitely not going to go back. If those containers are still running, I think they either need to be garbage collected or, I don't know, I'm not sure exactly, but yeah, I mean, the pod is definitely not going to go back to that node.
D
B: The kubelet will evict it; the kubelet will take it off the node, because the bound pod, that is, the location of the bound pod, which is a location inside of etcd that the kubelet is watching, will no longer be hard-bound to that kubelet. So as soon as it comes back online and talks to the API server, the kubelet will remove it. I see.