From YouTube: Kubernetes SIG Scheduling Meetings 20170424
A
Okay, it says that it's recording; we'll see. I've never done this before, so I guess we should get started. I think the main items for today are talking about the proposal for priority and preemption and resource sharing between batch jobs, and then also talking about 1.7, among other things. So, Tim, you were the one who wanted to discuss it at this meeting, although I expected that we would have done it anyway. So do you want to start that discussion? Hopefully people have had a chance to read the proposal. It's kind of non-trivial.
B
I think the first question is that, you know, the document has an opinionated view of some of the batch mechanisms that you guys may have worked with, but I think there might be several implementations. It's not really clear, as a non-goal from the beginning of the document, that this is almost like an architectural perspective: this is one way we could possibly solve this type of problem, where the primary theme is priority and preemption, but the outlined specifics of this particular batch implementation are one of what could be many, yeah.
A
I mean, I think that relates to a comment that you, and I think at least one other person (I think Joe Beda made the same related comment), raised about whether this should have been separated into two documents. And, you know, I sent my opinion on this to the mailing list. I mean, I think it could have gone either way, and there are good arguments on both sides for separating the priority and preemption stuff from the batch scheduling and the resource arbitration among batch jobs stuff.
A
My argument was that they should be considered together, although definitely there need to be separate design docs in more detail for both things; that wasn't what this was trying to do. Like you said, this was trying to do a high-level architecture, and that probably was not made clear enough in the document. But the argument for combining them was just that, you know, we're talking about how to, well...
C
A
To share the resources in a cluster: that's too vague to be meaningful, but specifically, like, how to decide which workload is most important at any given time, and who should be made to wait. And that kind of issue comes up in both systems, the batch queueing model and also the priority and preemption stuff; they're both doing preemption of a sort and prioritizing stuff. And so that's why I wanted people to be thinking about them together.
C
A
Eric Tune and a number of the other people here had already been thinking about batch scheduling, this gang scheduling for batch stuff, because they needed something for Spark to set up multiple resources: both the ones that produce pods and the ones that produce other kinds of API resources, like secrets and things like that. And so they had been thinking about some kind of batch queuing and batch admission.
A
And I had been thinking about priority and preemption; I mean, there have been discussions going on for a long time. And then the third piece of the puzzle was Klaus from IBM, who had a proposal. I forgot what it's called; it was called something like "Multiple Applications Sharing a Kubernetes Cluster." He had written up a proposal, and there had been some discussion on that, and so I thought:
A
This is a good time to try to pull together those three different threads into a unified proposal. But I do understand that it's a lot of moving pieces, and that it's useful to decompose them as we go into more detail. I was curious what people think about the overall architecture that was proposed there.
C
It's nice for the authors to consider all the things together, but for consumption by other people, maybe it would have been better to have two or three separate docs, yeah.
A
And I think we will have to have two or three separate docs. I'm not sure I agree that it's only useful for the authors to consider these issues together. I mean, I think that, you know, we want the community to understand and evaluate whether this is a good idea, because we do want to get consensus on this kind of thing. But maybe we shouldn't waste time arguing about this.
A
B
I think it was just a matter of language in the beginning, honestly: to set out what this document actually is, the purpose and the intent behind it, and to explicitly state that it's a non-goal of this document to outline, you know, the actual batch system itself. This is an example of one, and the implementation details could be left for later.
B
A
You know, because there are all kinds of allocation policies out there in the real world. But I think there are certain pieces, and maybe that's part of the process of reviewing this document: understanding which are the pieces we want to be fixed, where we say "this is part of the architecture," and which are the pieces that are, just like you said, "here's an example, here's an illustration of this kind of thing you could do," yeah.
B
I like that last summary; your last sentence, actually, was concise, and it would help a lot. The other point I wanted to make as well: you use "gang scheduling" a number of times, but I wanted to make sure it's used in the context that most other people understand it, because gang scheduling is usually block scheduling done in one single allocation round, typically for the purposes of MPI jobs, where you have a coordinated start and the timing of resource startup is important.
A
I mean, that's a good question. I think we've been abusing the term, and I think you're completely right that we shouldn't be doing it. I mean, we were talking about gang scheduling with one aspect of it being scheduling things like secrets and volumes at the same time as you admit a job that's going to be producing pods and things like that, and that may not be the right term.
A
C
Here, I think it's just: if you have true gang scheduling, you'd need a way to say "I'm submitting this thing and I need a hundred things of this shape, for a hundred machines or whatever," adding them all at once. And we're saying you could instead submit, like, a pod that then goes and schedules a hundred things, and so we want some way to pre-reserve resources without requiring the exact shape of the thing that you want to run to be specified, in a very verbose configuration language, as part of queuing.
C
So what if you need, like, a hundred things like this, and 20 that look like that, and 47 that look like that? Like, I don't want to write that in a config file. I want to have a fairly good probability that now is the time to launch, and then launch a program that then goes and launches the gangs that I need, the collections of identical things that I need.
C
A
gives
a
desire
to
like
so
it's
three
things.
There
there's
like
a
desire
to
have
an
aggregate
resource
specification
for
a
deferred
like
collection
creation.
It's
like
there
should
be
this
creamy
about
this
mini
resourcing
for
you
and
think
about
starting
this.
It
still
might
not
be
enough.
If
there's
weird,
then
banking
problems
like
this
is
about
what
I'm
going
to
need.
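The shape being described here, claiming an aggregate amount up front and then launching heterogeneous gangs against it later, can be sketched as a tiny data model. All of the names and numbers below are hypothetical illustrations of the idea, not a proposed Kubernetes API:

```python
from dataclasses import dataclass

@dataclass
class Reservation:
    """Aggregate resources claimed up front, before pod shapes are known."""
    cpus: float
    memory_gb: float
    used_cpus: float = 0.0
    used_memory_gb: float = 0.0

    def launch(self, count: int, cpus_each: float, mem_each: float) -> bool:
        """Admit a gang of `count` identical members if the reservation covers it."""
        need_cpu = count * cpus_each
        need_mem = count * mem_each
        if (self.used_cpus + need_cpu <= self.cpus and
                self.used_memory_gb + need_mem <= self.memory_gb):
            self.used_cpus += need_cpu
            self.used_memory_gb += need_mem
            return True
        return False  # reservation too small: the "weird bin-packing" caveat

# "100 like this, 20 like that": launched later, without a verbose config up front
r = Reservation(cpus=500, memory_gb=1000)
assert r.launch(100, 2, 4)       # gang of 100 small members
assert r.launch(20, 10, 20)      # gang of 20 larger members
assert not r.launch(50, 10, 20)  # would exceed the aggregate claim
```

The point of the sketch is that the only thing written down up front is the aggregate claim; the exact shapes of the gangs are decided later by the program that drains the reservation.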
C
A
Yeah, I think we want to accommodate different kinds of jobs. For one kind, I've heard a good term used internally at Google, I think "crystalline" or something: jobs where, if any one instance dies... I mean, there are multiple definitions of it; one would be, unless...
A
Spark, with, like, you know... there's some mode in Spark, I think it's called dynamic allocation, where it can dynamically vary the number of tasks that are running, the number of executors, over time. And so it can adapt to the fact that there may be fewer or more resources, and it wouldn't want to be killed if it gets less. So I definitely think that we would want to accommodate both kinds of job.
A
D
A
So, I mean, that's a good point; there's a hybrid, I guess, where it's exactly what you said: there's "if I can't get at least this much, then don't fill me, but ideally I'd like more," and within that range it's okay to reduce my allocation, as long as you don't go below some certain value.
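That hybrid, a hard floor plus an elastic ceiling, is easy to state as a rule. A minimal sketch with made-up names (this is not an actual Kubernetes field):

```python
from typing import Optional

def allocate(available: int, minimum: int, desired: int) -> Optional[int]:
    """Give a job as much as possible within [minimum, desired].

    Returns None (don't admit, or kill) only when even the floor can't be met.
    """
    if available < minimum:
        return None          # "if I can't get at least this much, don't fill me"
    return min(available, desired)

assert allocate(available=3, minimum=5, desired=10) is None
assert allocate(available=7, minimum=5, desired=10) == 7    # shrunk, above the floor
assert allocate(available=50, minimum=5, desired=10) == 10  # capped at desired
```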
B
C
So the question for people that have worked heavily in other batch systems: is there, like, one correct, well-accepted model for queues, or is this something where everyone's got an opinion about it, and it's better to make sure it's pluggable so everyone can do their own thing? Or, if we come up with something reasonably flexible, will everyone converge on it? Do we want to talk to, like, YARN or some grid engine folks, or should we just... are there already thoughts on that?
B
I think that experimentation alone is worth its weight in gold, because what you'll probably find is people will create a novel implementation of some system, whether it be some stream processing engine or some other system, where they'll have their own weights and models based upon whatever problem they're trying to solve. And in other systems, what you'll find is that there's no canonical example, and sometimes it's more expressive than... it's a massively wide-open space. The history of batch processing goes back 30 years.
E
I would definitely say that in the YARN universe there are, originally three, now really two, major schedulers, because it was originally decided to make it pluggable at that level, and the fact that it was pluggable led to a bunch of innovation that the original Hadoop team at Yahoo did not come up with. So the pluggability was valuable from the project's perspective. But...
E
I have to imagine that, were somebody trying to make a system in YARN that also had continuously long-running applications, the two existing schedulers wouldn't work well for them, and so they'd have to explore a whole bunch of new designs there. So my suspicion is that the pluggability is pretty useful.
C
What I was thinking about is that the priority and preemption mechanisms would probably not be very pluggable, and the concept of having, like, a collection, a queued job, a queue of things to be run in the future, would be pre-architected into Kubernetes. But the policies you could use with queues, how queues cascade to each other, how you prioritize which queues to run, what resources are needed before you start something from a queue or preempt something from different queues: that would be highly pluggable. Does that make sense?
C
Then only batch jobs would use queues. Many batch jobs would need a pluggable policy, whereas long-running jobs would just be started directly using resource quota, so there would need to be some pluggable interaction between, like, long-running resource quota and batch quota. If you imagine a time dimension to quota: long-running quota is active in that it's constant over the entire timeline, versus batch quota, where you have an aggregate amount that you're then shuffling amongst different sub-uses. I don't know if that makes any sense, but that's how I was thinking about it. Yeah.
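One way to picture the time dimension mentioned here: long-running quota is a constant ceiling on concurrent usage, while batch quota is an aggregate pool whose shares slosh among whoever is active. A toy illustration, with invented class and method names rather than any real quota API:

```python
class LongRunningQuota:
    """Constant over the whole timeline: a fixed cap on concurrent usage."""
    def __init__(self, cap):
        self.cap = cap
        self.in_use = 0

    def acquire(self, amount):
        if self.in_use + amount > self.cap:
            return False
        self.in_use += amount
        return True

    def release(self, amount):
        self.in_use -= amount

class BatchQuota:
    """An aggregate pool, re-split evenly among whoever is active right now."""
    def __init__(self, total):
        self.total = total
        self.users = set()

    def join(self, user): self.users.add(user)
    def leave(self, user): self.users.discard(user)
    def share(self, user):
        return self.total / len(self.users) if user in self.users else 0

lr = LongRunningQuota(cap=10)
assert lr.acquire(8)
assert not lr.acquire(4)   # concurrent cap is fixed over the whole timeline
lr.release(8)
assert lr.acquire(4)       # freed capacity is immediately reusable

b = BatchQuota(total=12)
b.join("jobA"); b.join("jobB")
assert b.share("jobA") == 6    # sloshed evenly between two active sub-uses
b.leave("jobB")
assert b.share("jobA") == 12   # a departing sub-use's share flows to the rest
```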
A
B
The problem with a lot of systems, too, is that they want to rebalance dynamically. So even if you had a coordinating system, you might want to have an expressive policy by which you can change the evaluation. This was commonly done in existing legacy systems, including LSF and Condor, where it was much more expressive to do it on the fly, right. That way you could rebalance your whole cluster with a couple of knobs, yeah.
E
Sorry, got it. I'm wondering if there's some kind of approach where the primitive operations that are possible, like, you know, "preempt this thing," "launch this thing right now," "reserve this in a batch way," are the available operations, and there's a kind of pluggable piece of code, which maybe has a thread or two and runs in one active place at a moment, and that thing then uses those primitives, so that the decisions about what to preempt or whatever could be very pluggable.
A
E
A
E
To be clear, YARN has that; in YARN, the actions themselves are even more primitive, right. The actions aren't even "preempt this" or whatever; the actions are, like, messages that you send to the nodes. And I think they went a little too far; they could have had slightly higher-level operations that are a little safer to use. But the policy of "preempt this, don't preempt this, launch this in a batching way or whatever": that's not baked in.
E
C
The way I've been thinking about it is: when a pod is on a node, the policy there, like that preemption, that's not pluggable; that's up to the kubelet, and you really can't affect it. When you have multiple things that are pending and you need to schedule something now, that's not pluggable either, other than the fact that the scheduler itself is pluggable; but if you're using a stock scheduler, you can't really plug in how priority handles things.
C
But for pods that are pending, for, like, a group of pods to be created in the future, meaning, like, a replication controller or deployment or whatever: if you want to create that at a future time, and it's going to make more things like that, that collection of entities is, like, a queued job, and that is highly configurable. How you preempt that collection, I bet you...
E
...could make something like that work, right? The idea is that Kubernetes could have an API promise that, if you set up your priorities this way, then Kubernetes will choose to preempt in this way, you know, following this ordering or whatever. And then the thing that produces that setup, an intermediary in between the user and what Kubernetes knows about those things, is the pluggable thing. So the user may speak one language to the pluggable thing, and then that pluggable thing goes and says...
E
A
So, I mean, I think the way that would fit in with what we proposed in the doc is that, you know, you could have something like an admission controller, not a quota (this is unrelated to quota), but, like, an admission controller that maps properties of the pod, like maybe QoS class or something, to an internal priority, which is the thing that we would put on the pod. And then that priority would be used by the scheduler, and possibly other components, to decide whom to preempt.
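A minimal sketch of the mapping step described here: an admission-time function from observable pod properties to an internal priority that the scheduler consumes. The property names, priority values, and dict shape are all made up for illustration; this is not the proposal's actual schema:

```python
# Hypothetical mapping table: pod properties -> internal integer priority.
# In the idea above this lives in an admission controller, not in user input.
PRIORITY_BY_QOS = {
    "Guaranteed": 1000,
    "Burstable": 500,
    "BestEffort": 0,
}

def admit(pod: dict) -> dict:
    """Stamp an internal priority onto the pod before it reaches the scheduler."""
    qos = pod.get("qosClass", "BestEffort")
    pod["priority"] = PRIORITY_BY_QOS.get(qos, 0)
    return pod

pod = admit({"name": "web-1", "qosClass": "Guaranteed"})
assert pod["priority"] == 1000

# The scheduler (and possibly other components) then compares only the stamp:
victims = sorted([{"priority": 500}, {"priority": 0}], key=lambda p: p["priority"])
assert victims[0]["priority"] == 0   # lowest priority considered for preemption first
```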
A
...and to do the preemption. So, if I understood what you're suggesting correctly, it's kind of like adding a level of indirection between what the user specifies and the total ordering on the priorities that the system enforces. Yeah, I don't think that was explicitly mentioned in the doc, but I think that's a good idea. I mean, the one related thing that we mentioned in the doc was that the administrator should be able to map the names onto the total order.
C
A
C
That would make too many decisions... if the hierarchy is, like, physically based, it'd be gross to distribute that hierarchy to all the nodes. Sorry, I didn't hear what you said, Eric. If we had non-numerical priorities which were user-configurable, then you would have to distribute that knowledge to every node.
B
So kubelets can make independent decisions? Yes; it could just be delayed evaluation. If it's an expression that gets evaluated when the number actually comes in, then it could evaluate during a cycle, because usually there's periodic evaluation of expressions in other systems. So even on the kubelet, there are evaluation cycles that occur to reevaluate priority and preemption on the nodes, and if it's an expression, then that value would be re-evaluated every single time.
E
Sorry... ah, I see, okay. So that could be a scheme that could work, I mean, but I would suggest that maybe that scheme, having an operator say "these are my orderings" or whatever, is something that could be in the pluggable part of what we just imagined. And so the operator's interface, for their exact case, in that instantiation of the plug-in, might be this string ordering. I have some concerns about that.
C
Like, users aren't allowed to specify integers, because those have the escalation problem, like, "oh yeah, my number's bigger." But the cluster components, like the scheduler and the kubelet, should use integers, and the API server should be the thing that's responsible for mapping strings to integers, so that there's never a remapping problem. I know that, as...
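The division of labor suggested here, where users speak names, the API server resolves them once to integers, and node components only ever see integers, can be sketched like this. The class names and values are illustrative, not the eventual design:

```python
# Operator-defined ordering, registered once with the API server.
PRIORITY_CLASSES = {"critical": 2000, "normal": 1000, "best-effort": 100}

def resolve(pod_spec: dict) -> dict:
    """API-server side: replace the user-facing name with a resolved integer.

    Nodes and schedulers never see the string, so renaming or reordering
    classes later cannot leave stale comparisons on the node (no remapping
    problem).
    """
    name = pod_spec.pop("priorityClassName", "best-effort")
    if name not in PRIORITY_CLASSES:
        raise ValueError(f"unknown priority class: {name}")
    pod_spec["priority"] = PRIORITY_CLASSES[name]
    return pod_spec

p = resolve({"name": "db-0", "priorityClassName": "critical"})
assert p["priority"] == 2000
assert "priorityClassName" not in p   # only the integer travels to the kubelet
```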
E
As long as there's an atomic way to get the information from this pluggable scheduler-ish thing to any individual node, and that can be an atomic, consistent operation, you're probably good. Do you understand what I mean by atomic? If you need to update, let's say, the thing that's handing the integers to the nodes, which lets them decide what to preempt, and there are, you know, thirty pods on the node with varying integers for whatever things they are, you don't want to update four of them but not the other twenty-six.
C
E
But at a higher level, I think this idea of the operator-specified ordering should ideally be relegated to this pluggable thing, and is thus not a part of the core Kubernetes thing. It may be a reference implementation or something, but that kind of thing is an area where I would expect the community to innovate. I see.
E
C
E
You know, and then, within each business unit, they may have priorities of their own, but you can't ask business unit A and business unit B... like, an operator trying to compare all of A's and B's individual things doesn't know how to compare them. Only within the two individual units do they know how to do the comparison, and no global ordering would work.
A
B
F
A
C
A
C
The question, and I think there might be innovation here, is how you mix hierarchical group quota for long-running jobs with whole-group quota for batch computation; how that relates to priorities; how both of those are distributed; and how, when you submit a job, you express whether you're consuming batch or long-running quota, yeah.
F
C
A useful distinction we should think about in Kubernetes is separating consumption of long-running quota, which is, like, given to that group indefinitely because it has its own business need, from, like, batch quota, which is somewhat shared and sloshed around due to unpredictable future needs. And so, therefore, we'd want to separate those. I don't know if people agree with that.
A
The document kind of made a proposal in that regard: a way to have separate quota for the long-running stuff, which is called, I guess, "continuously running" in the doc, and the batch stuff, which could then be dynamically reallocated based on fair shares of hierarchical queues and things like that. And I think the one other piece of that was, you know, we're proposing that the batch controller would have some interface to the collections that it's managing.
A
So it could be kind of like the scale subresource, where it could adjust the size of the jobs that it's running to keep them within the quota, or whatever the resource allocation budget is. So that was sort of a preemption-like mechanism that was proposed for batch: something where the batch controller would be aware of the shares that each of the jobs should be using, and...
C
A
...would tell them to scale to the appropriate amount. And then the continuously running stuff would use kind of a more direct preemption mechanism, where the scheduler could preempt the lower-priority things, and you could somehow separate the quota for the long-running stuff and the batch stuff.
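The preemption-like mechanism described for batch, a controller that knows each job's fair share and resizes jobs through something like the scale subresource, might look like the following. This is a sketch of the idea only; the function and field names are invented:

```python
def rebalance(jobs: dict, capacity: float) -> dict:
    """Scale each job toward its weighted fair share of total capacity.

    `jobs` maps name -> {"weight": w, "size": current allocation}.
    Returns the new size per job; shrinking a job stands in for the batch
    controller telling it to scale down, rather than the scheduler evicting
    its pods directly.
    """
    total_weight = sum(j["weight"] for j in jobs.values())
    return {
        name: capacity * j["weight"] / total_weight
        for name, j in jobs.items()
    }

jobs = {
    "etl":   {"weight": 2, "size": 90.0},
    "train": {"weight": 1, "size": 10.0},
}
new_sizes = rebalance(jobs, capacity=90.0)
assert new_sizes == {"etl": 60.0, "train": 30.0}  # etl scaled down, train up
```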
B
So here's an interesting phenomenon that I don't think a lot of people have keyed into with Kubernetes: the fact that you can create your own batch system atop it, with the notion of pods and the shape of your pod. And, like, I've talked about this sporadically with some of the folks here, but typical batch systems take advantage of... they try to prevent you from going through scheduling again; that's actually the point of high throughput, right.
B
So if your goal is primarily throughput, and you have something that has the same dimensions or similar dimensions, you do what's called claim reuse: you start things up again on that same machine without going through this whole round-robin thing, and that individual scheduler can have its own priority and preemption model outside of this. So if you simplify the process, and you take all of the nomenclature that you have around batch and just take it out of the idea, and you provide a very simple priority...
B
...and preemption model, they can do their own schema internally and use Kubernetes itself, as they've landed on it there, to adjust their weights and shares, right. I think simplifying the model that you've created allows other people to experiment using the primitives that already exist inside the system. Timothy, are you saying that you...
C
B
I think it's a race... I think these algorithms already exist: they try to submit what they can, but they maintain their own queues, right. I'm not saying have a queue within the system; I'm saying that having a queue within each individual subsystem offloads that logic onto another system, so they maintain their queue entirely. Why put the queue inside the core?
C
B
A
...you'd have consistency across your model at the core, but if you wanted to do your own system, you could design that, right. So I think there's this slider bar of what belongs in core and what belongs outside, and I think, for the first pass, we can probably slide that bar down to pretty much just the minimal set of priority and preemption primitives in the core, and then over time we can slide it out, right, as patterns become proven.
B
A
I mean, I kind of agree with Eric's observation about the consistency thing. I mean, you're kind of proposing a two-level allocation scheme, and maybe one way to combine these ideas together is not two-level, but... yeah, I mean, if you're saying you have per-framework schedulers, somebody is deciding how many resources each framework should get, right.
A
So that was what I was referring to as, like, the first-level allocation: it would be how many resources the framework gets, and then the framework gets to decide among its jobs. Or maybe I misunderstood; before, you were suggesting...
B
...something similar to that idea, but it's still central in the core. I think what I'm trying to push back on, ever so gently, is the notion of pre-baking too many concepts into the core. To support pluggability, it should start with very simple atoms and then slowly grow over time. But what the proposal lands on is several core things, and they're all kind of interwoven, based upon some experience that you guys have, right. Yeah.
A
I mean, does that make sense? Just to finish the thought from before: the idea was that you would enqueue these batch objects, and those controllers could then manage resources within them, but it could have sort of a very single-level flavor, the thing that we proposed, and yeah, there are pros and cons of that.
C
I wonder, are there any Mesos users on the call, maybe who haven't talked yet, who could give a Mesos perspective? Maybe talking about long-running, like Marathon, and batch happening in the same cluster, or per-framework schedulers versus our whole-group quota idea. Anyone that wants to talk about that? I don't know...
G
One point, just kind of building on what Tim said before: I think the most important concept to bring in would be to make the queue public, kind of like you have in your system, where you have a first-class thing called the queue, and, you know, the pods that you need to run are easily inspectable from the API. There's one problem that happens when you're trying to gang schedule: eventually you see that you're starved, and you're wondering why.
C
C
H
A different type of batch job to consider is, for example, the background job, say a compaction job, which is not really controller-driven but more notification-driven. I mean, one mode is to say that you trigger it explicitly only through controllers; the other common mode with the systems deployed today, for example Ceph or some storage system, is essentially that the background job is automatically triggered based on, you know, available state, right. And the question is, then: do we need to tie that into the overall scheduling to get the maximal benefit?
H
H
You know what I meant, yes; I mean, there are two modes of running it. One is, for example, a background compaction, a storage background compaction, that you want to do on a time trigger; the other is automatic. And today the systems are deployed more in the automatic fashion: basically, when you start seeing "hey, you're running out of space," you automatically start the tied job internally there on that node, right. So the question is: this has an overall scheduling impact, and the question is how that is conveyed to the controller.
H
A
Yeah, I'm not sure I understand the distinction. I mean, I think the proposal here is that everything would go through some kind of controller, right, and so it would be making the decision about whether anything gets preempted or scaled down because the new work is higher priority. It wasn't clear to me whether the thing you're describing does or doesn't fit into that model.
H
It's more like a use case. The question is: if we're considering revamping the scheduler to consider all types of jobs, then how does this fit into that paradigm, right? Basically, it's a background job. It can be triggered through a controller, or it could be automatic, and deployed systems use the automated fashion more: when there is a trigger threshold, say you run out of space, or some other trigger...
H
...then you basically start that background job automatically on the node. But if you're doing that, and if you convey that information to, you know, the central scheduler, then the scheduler can do a better job of overall resource scheduling, right. So it's a slight variation of a batch job, which I'm bringing up, and the question is whether we want to start thinking about it at this point, or whether you consider it a separate extension effort, you know.
A
I would hope that we could come up with something that could accommodate that. I mean, if it's something that needs to run right away, then you would run it either in a high-priority queue, if it's a batch job, or at a high priority if it's a continuously running job. It's not a continuously running job, so, if you considered it a batch job, then...
A
...a high-priority queue or something like that. And we didn't have any concrete proposal here about deadlines; we weren't trying to do anything like, you know, deadline-based scheduling or time shifting. It was more like, you know, release the job once the necessary resources are available. It's all kind of real-time; it's not like reshuffling the ordering of pending jobs so that the ones that are due soon run sooner.
A
H
Absolutely, locally it will certainly work. But my question is, in such a model, if you're scheduling to multiple nodes, then, you know, is there something which can be done smarter with the scheduling, or is it overall just load balancing the traffic? Probably, I would think, dynamic load balancing of the traffic will be more applicable than scheduling; that's what I'm seeing now, thinking about it more and more, because, I mean, it depends on the time frame, basically how long it runs.
H
Then it may not be a scheduling impact. Suppose it's going to run for a minute; then maybe the scheduler can, in that window, do a better job of scheduling other resources, right. But if the time is, you know, on the order of seconds, then maybe this comes down to the dynamic load-balancing point. It just depends on the type of job.
A
A
C
So, Conor made a comment about the ability to inspect the things that are in the queue, and I think I kind of responded off the cuff today, and I want to go back and understand it better. Do you want to say any more about the use case for affecting resource allocations of things in the queue, yeah?
G
G
B
Yeah, there are multiple scheduler models, not the framework-based designs, that do have global priority sorting, but they do it through periodic evaluation, where all the schedulers push to a collector what their queues are, or what their individual queues are, and then that's globally sorted, right. The problem with those is that it's a different, two-phase model, but it...
A
The idea of how the queues would work that Eric and I were thinking of was, like, you know, the work that a batch job wanted to do would go into this queue, and then there would be some kind of controller that was responsible for managing the resource allocation toward it, and killing pods if it needs to scale down, and things like that. And so, I mean, what you're talking about is kind of how you get the demand signal from the framework into the centralized scheduling framework, and I...
A
E
I have a question about something that I saw in the doc. Why is it that we think that the scheduling of the long-running containers and the containers from the batch jobs, or pods, whatever, are totally separate? In particular, what I'm thinking about is the case where, you know, one sub-organization wants some amount of resource here, and a different sub-organization wants some amount of resource there.
E
You know, the high-level org may just want to say, "you guys get so much; you guys get that much," and each organization might figure out how much they want to spend on long-running and how much they want to spend on batch, and they don't have to keep going back to the high level to reevaluate that balance. And I'm worried that if we start with the separation being batch versus long-running, and carve it up that way, it makes that model hard.
B
A
We were trying to kind of shoehorn this into... I mean, "shoehorn" is kind of pejorative, but, like, into the quota model that we have today in Kubernetes. And, I mean, you know, one way that you can do that is, you know, you give quotas to namespaces, and then you can use your quota either for batch or long-running. And I don't know that we want to tie it to namespaces.
A
But I think that the idea that you can use quota flexibly is a good requirement to have. I mean, I think that the quota needs to be per priority level, just so you don't get an arms race where everyone, you know, figures out what maps to int max and sets their priority to that. But definitely being able to flexibly share the quota between the batch and serving stuff, at least at first thought.
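For context on what "quota per priority level" could look like: Kubernetes later added a ResourceQuota scope selector keyed on PriorityClass, which caps what pods at a given priority may consume. This is an illustrative sketch with made-up names, not part of the proposal under discussion:

```yaml
# Hypothetical example: cap the resources that pods submitted at the
# "batch" priority level may consume in this namespace, independently
# of quota granted to workloads at other priority levels.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: batch-priority-quota
  namespace: team-a
spec:
  hard:
    cpu: "100"
    memory: 200Gi
  scopeSelector:
    matchExpressions:
      - operator: In
        scopeName: PriorityClass
        values: ["batch"]
```

With one such quota per priority level, raising a pod's priority does not buy it more resources, which is the arms-race concern raised above.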
A
C
I, just two things you might be saying, I guess. One would be: you should be able to use the cluster's resources flexibly between long-running and batch jobs. The other thing you might say is: should there be a difference between how I start something I expect to run basically forever, versus something that I know the duration of, or that has a finite duration?
C
A
E
What do you tell your quota? I don't think running it. I want to clarify, David, you said that this ship has sailed before, but I'm not sure if we were considering those differences, basically whether a job finishes at some point or continues to run indefinitely. Basically, something that finishes at some point, of course, is done and releases its resources, and those resources will become available so the scheduler can, you know, schedule more jobs in its place. So, by saying that we want to have a distinction between these, with adding the priority and preemption, are we saying that the scheduler is going to decide? For example, let's say that you have a job with the same priority as a continuously running service, and the scheduler is going to make a distinction between the two when deciding which one to schedule first? So both of them have the same priority, and you're up to try. I'm.
A
I think we probably need to define this distinction better, but they're kind of doing somewhat similar things. Although, the scheduler was always going to be involved in scheduling those kinds of jobs, like the default scheduler is the thing that's scheduling pods. I mean, one way to think of it is that the batch job controller is admitting whole jobs at a time, and then the default scheduler is scheduling the individual pods. I don't know if I answered your question or not, yeah.
C
B
Whatever's there, there is a document, but I think what I'd like to do is take a step back and refine the requirements such that we can all agree upon a very primitive set of requirements. Because if we do that, if we say, like, there's going to be some definition between service and job, and where we can define priority and preemption models across those, if we have a concrete, finite set of, like, five to ten requirements, that would help drive.
B
You know, potential implementations. But I think what we have is, we reversed it. We have a potential implementation, and we don't necessarily have the definitive set of requirements. Because what will happen is, I'm sure the IBM guys will go off and define what exactly they need, right, and, you know, folks at different companies want to do similar things. There's something we can put together. Timothy, I know you have a lot of experience with this. I can help with it, but I can't I.
A
F
I agree with you, Tim, on having a few simple requirements so that we can develop on that. And so, once we have those requirements, maybe we want to update the document, basically itemizing a list based on these requirements, and then we can work on those items and write a bit more in the documents on how the calculation should work.
B
A
Yeah, I think we have some process that will converge within a reasonable amount of time. I think maybe jumping to an implementation was the wrong thing. I mean, I tried to have some of the requirements in the doc, but it probably wasn't complete enough. And I definitely take Tim's comments to heart that the distinction between what is a fixed part of the architecture, and what is, you know, drop-in replaceable, was not clear, and maybe even not completely thought out. So I think that's, in terms of stuff I took away from Tim's comments.
A
We have a P0 request from the node SIG to have a priority mechanism that they can use on the kubelet to drive eviction ordering. And then also, just in general, people have always been asking for a preemption mechanism for continuously running jobs, not just for batch jobs. That's what they want. So I think, you know, like Tim alluded to, there's been decades of research in this area. I think we should try to not spend decades coming up with the design.
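For reference, the kind of priority mechanism being requested here eventually took the shape of the PriorityClass API in Kubernetes. The sketch below is illustrative (made-up names and values), not something decided in this meeting:

```yaml
# Hypothetical example: a named priority level that pods reference by name.
# The kubelet can use the resolved integer value to order evictions under
# node pressure, and the scheduler can use it to drive preemption.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: critical-serving
value: 100000          # higher value = higher priority
globalDefault: false
description: "For user-facing serving workloads."
---
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  priorityClassName: critical-serving  # pod resolves to priority 100000
  containers:
    - name: app
      image: example.com/app:latest
```

Note that a single integer priority serves both consumers mentioned above: eviction ordering on the node, and preemption decisions in the scheduler.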
A
We should figure out how we can get this to converge relatively quickly, at least the high level. I mean, like I said, that was my goal: that we could figure out, you know, what the high-level architecture is, and then people could experiment with some more detailed, lower levels and write separate docs. So, yeah, I don't know, we're kind of out of time, but I mean, I don't know if people want to talk about it again at the next meeting, or maybe, Tim.
A
About the requirements offline, and in the meantime, I can try to clarify the document on some of the asks that people brought up here. I don't know if people have other suggestions.
B
Maybe this week we can define who can set up like an interim meeting or something, just to hash through some of the proposed requirements, and come to it with just a couple of them in mind. And then we can, you know, for all or a subset of folks who are interested, we could just work on that bit. Sure, we could also, yeah.