From YouTube: Kubernetes Resource Management WG 20170425
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
So on the agenda we have four items. The first is to give an update on the face-to-face, to see if there were any questions about it, raise any concerns, or gather feedback on the proposed schedule that Vish threw out there for slotting items. Has everyone had a chance to review that proposal, or proposed schedule, and if so, are there any questions, comments, or concerns that people want to raise?
D
Question, yeah, about the schedule. One other schedule item: before, we had a slot we'd said, but I don't think so, Vish. I have asked about who wants to step up and lead the discussion, but this is still an open question; there's no owner. I've seen a lot of interest more in the resource sharing, and then the last one: could we move it up, only for pod preset? For the pod preset, remember, we were thinking about the ordering of those discussions as a whole.
A
From my perspective, I believe a lot of the issues that we face as a project are stuck on how we do the resource isolation bits, and absent an answer on that, or once we get an answer on that, I think some of the pod preset parts become more naturally expressible, or just less contentious. So actually, from my perspective, if need be, depending on where we land on the first half of the discussion, I'd be happy to make the Tuesday afternoon 3:00 to 4:30 slot support that.
A
How do we make this easier for end users to consume around more enhanced pod presets? Given that, I'll actually probably prepare materials for both topics, and we can use that as kind of scratch space depending on how we go. That's it, thanks. Yep, any other questions, comments, or concerns?
A
So is there any concern about people being able to achieve that, and if not, do they want other folks to try to fill in those gaps? I know for myself, Vish and I have been collaborating on some stuff, but I don't know if anyone else feels like they're overburdened and would like someone else to step up.
C
I suspect that you would need to be escorted by a Googler once you get inside badged areas, and I don't know if the building that we are going to be meeting at has a lobby. If there is one, you can settle down there, and I should be there, and I'll have a few more people to help.
C
But yeah, I mean, if there is no lobby, we'll make sure that one of us is there outside the door just to let people in. But I'm basically hoping that the list that we have in the doc is a sign-up list of attendees, and then maybe we can print badges for that list today or later.
G
It got wider; it looks like the aspect ratio of a screen, but there's no content.
A
Okay, I'm going to have much less luck with sharing my screen, so I'll talk through the demo and you guys can trust me that it works. Basically, I opened the PR trying to explore this. Originally, at the end of Q1, Seth Jennings from my team had opened up a PR to propose adding support for pre-allocated huge pages, and in preparation for next week's face-to-face...
A
I tried to put together a prototype demonstrating some of the ideas around that PR proposal, just as a verification, especially now that we have certain things like pod-level cgroups that should make it easier. So, in the link to the issue in question, it's a very quick prototype that demonstrates cAdvisor adding discovery capability on the default huge page size and the number of pages configured for that size.
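
As a rough illustration of the discovery step described above, here is a minimal Go sketch that enumerates pre-allocated huge pages from sysfs. The paths are the standard Linux locations; the function and its names are illustrative, not the actual cAdvisor code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// hugePagesInfo maps a page-size directory name (e.g. "hugepages-2048kB")
// to the number of pages pre-allocated at that size.
func hugePagesInfo() (map[string]uint64, error) {
	const sysDir = "/sys/kernel/mm/hugepages"
	entries, err := os.ReadDir(sysDir)
	if err != nil {
		return nil, err
	}
	pages := make(map[string]uint64)
	for _, e := range entries {
		raw, err := os.ReadFile(filepath.Join(sysDir, e.Name(), "nr_hugepages"))
		if err != nil {
			return nil, err
		}
		n, err := strconv.ParseUint(strings.TrimSpace(string(raw)), 10, 64)
		if err != nil {
			return nil, err
		}
		pages[e.Name()] = n
	}
	return pages, nil
}

func main() {
	pages, err := hugePagesInfo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	for size, n := range pages {
		fmt.Printf("%s: %d pages\n", size, n)
	}
}
```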
A
We could support nodes with more than one huge page size, or not. And then, generally speaking, when going through this, it seems like the individual container runtimes do not yet support controls letting you set per-container huge page allocations, and so at best, if we were looking to roll this out in the near term, I thought doing per-pod huge page allocations was perfectly fine. But yeah.
A
The API in the prototype right now was basically, I mean, the history of this is that a machine used to just have one size, and so in /proc/meminfo we just saw one thing. But then, if you have machines that have two-megabyte huge pages and, you know, one-gig pages, you get to vary by size. So right now the resource request that you make would be "hugepages" plus the size of the page in KB, and that kept the naming convention that at least I see in sysfs.
A
When
you
look
to
see
how
you
traders
are
configured
basically,
the
naming
convention
would
be
the
same
okay,
and
so
that
has
a
nice
side
effect
of
as
a
pod
spec
author.
If
I
knew
I
wanted
one
gig
pages
versus
two
Meg
pages
or
16
Meg
verses,
16,
gig,
I
forget
the
other
sizes.
I
wouldn't
have
to
have
another
piece
of
information
to
have
my
pod
land
on
a
node
that
had
that
particular
thousand
elbow.
It
was
just
actually
expressed
in
combination
with
the
huge
page
request:
okay,.
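
A hedged sketch of what that size-qualified request could look like using the Kubernetes Go API types. The resource name "hugepages-2048kB" follows the sysfs naming convention the speaker describes; the exact name used by the prototype is an assumption here, and the image is hypothetical.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	container := v1.Container{
		Name:  "hugepage-consumer",
		Image: "example/app", // hypothetical image
		Resources: v1.ResourceRequirements{
			// Requesting 8 pre-allocated 2 MB huge pages. The page size is
			// carried in the resource name, so no separate node selector is
			// needed to land on a node configured with that size.
			Requests: v1.ResourceList{
				v1.ResourceName("hugepages-2048kB"): resource.MustParse("8"),
			},
			// Request equals limit, keeping the pod guaranteed for this
			// resource, per the discussion above.
			Limits: v1.ResourceList{
				v1.ResourceName("hugepages-2048kB"): resource.MustParse("8"),
			},
		},
	}
	fmt.Printf("%+v\n", container.Resources)
}
```

Because the page size is carried in the resource name, the scheduler can match the pod to a node advertising capacity for that exact size with no extra node selector, which is the side effect described above.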
A
Yeah, so basically the representation was the sheer number of huge pages at a particular size, and then nodes expressed that capacity. I'm not aware if it requires more complexity than that, but that was the general gist. And then, from an accounting standpoint, because huge pages basically represent reserved memory, I don't like the idea of having best-effort pods get higher resource guarantees for any resource, generally.
A
So
me,
it
seemed
like
first
of
all
and
guaranteed
pots
of
the
only
things
I
could
consume
these,
and
then
it
could
be
vish
when
I
was
thinking
through.
If
you
want
to
get
by
the
resource
representation,
it
could
be
that
certain
size,
huge
pages,
are
more
valuable
than
others
so
having
the
size
of
the
page
expressed
in
the
request
X
and
makes
quota
work
easier,
like
namespace
like
quota,
because
you
can
allocate
them
differently
without
any
other
decoration.
So
that
was
the
idea
there
and.
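
To make the quota point concrete, here is a sketch of a namespace quota that caps each page size independently; the names again follow the convention discussed, not the final upstream API.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	quota := v1.ResourceQuota{
		Spec: v1.ResourceQuotaSpec{
			Hard: v1.ResourceList{
				// Plenty of the common 2 MB pages...
				v1.ResourceName("hugepages-2048kB"): resource.MustParse("512"),
				// ...but only a few of the scarcer 1 GB pages.
				v1.ResourceName("hugepages-1048576kB"): resource.MustParse("4"),
			},
		},
	}
	fmt.Printf("%+v\n", quota.Spec.Hard)
}
```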
C
So I was hoping, to make the specification portable, maybe applications can specify minimum requirements; like, they can set a request for huge pages rather than limits, and they would just be slotted to the closest bucket, similar to how storage works.
F
I think both modes are needed. While this would also be interesting, there are cases where there is a hard requirement. Say, if you want to run a low-latency application, then you would want the limit to be satisfied too; just satisfying the request with whatever is available may not be the right approach. Sorry.
F
In a lot of cases you need both types of allocation, both the guaranteed model and also the best-effort model. The best effort is basically, I guess, when request is less than limit, right? Correct, that's the case; and when request is equal to limit, then it's basically an absolute guarantee.
A
Yeah, so my assumption was that any best-effort pod, so basically pods that have no requests or limits for any resource enumerated, would not be able to consume huge pages at all. So when the pod-level cgroup is manifested, the number of huge pages allowed for that pod would be strictly accounted at zero. Whereas for huge pages generally, I guess it's worth calling out: I did not think they're a resource that's safe to overcommit, typically. And so, yeah.
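
A minimal sketch of the strict accounting just described: writing a hugetlb limit into a pod-level cgroup using the cgroup v1 file layout. The cgroup path and pod name are hypothetical; the point is that a best-effort pod would get a limit of zero.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// setPodHugetlbLimit writes hugetlb.<size>.limit_in_bytes in a pod-level
// cgroup directory (cgroup v1 layout).
func setPodHugetlbLimit(podCgroupDir, pageSize string, limitBytes int64) error {
	f := filepath.Join(podCgroupDir, fmt.Sprintf("hugetlb.%s.limit_in_bytes", pageSize))
	return os.WriteFile(f, []byte(fmt.Sprintf("%d", limitBytes)), 0o644)
}

func main() {
	// Hypothetical pod-level cgroup path for a best-effort pod.
	dir := "/sys/fs/cgroup/hugetlb/kubepods/besteffort/pod1234"
	// Best-effort pods are strictly accounted at zero huge pages.
	if err := setPodHugetlbLimit(dir, "2MB", 0); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```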
A
Yeah, I guess what I was going to say is I just wanted to draw attention towards the use case and get eyes on the PR from people who haven't had a chance to look at it. As a goal, I agree with you, Vish, that getting an implementation into 1.7 would be a stretch. I would like, in the spirit of the performance-sensitive workload topics that we've been discussing, to at least get general design agreement worked out, and that's mainly what this PR was trying to motivate, and I believe it's actually been helpful.
C
Yeah, so that's what I meant; by expose, I meant Borg exposing it. Yeah, I need to check on that.
C
I mean, when it comes to virtualized environments, we really don't know what we're dealing with, right? Everything can be virtualized these days, like PCI and like NUMA, and even with the hardware trends we don't really know what we're working with. So that's a general problem.
B
We've created a new PR pointed against upstream, just so that it's easier to find and easier to link against the feature issue and the other related issues. So hopefully we can get some comments on it. It's a ton of code because some of it is generated, and, you know, the unit tests take up a lot of space.
B
So it's like a thousand lines of tests, but the PR description has a section which should be helpful, kind of as a tour, to show how the different pieces hang together. And so here we have a link straight to the gRPC protocol, which is probably a good first place for reviewers to start, and then the event...
B
...dispatcher is the new piece in the kubelet that manages the communication between the kubelet and the isolators. And then these two links are the wiring from existing components into the event dispatcher, plus unit tests, and then there's a helper library that we used in our example isolators. And then these are links to the example...
B
Isolators
there's
a
no
op
one
which
just
logs
basically
and
I,
think
it
injects
an
environment
variable
into
a
container
just
doesn't
as
a
as
an
example
and
then
there's
the
CPU
affinity
isolator,
which
we've
demoed
a
couple
times
in
this
meeting
before
yeah.
So
that's
pretty
much
it
I
can
show
you
where
the
the
current
state
of
the
events.
So
we
have
these
four
events
that
are
emitted
by
the
event
dispatcher
it.
B
Currently
we
have
a
pod
pre-start,
pod,
post-op
and
container
pre-start
and
container
post
out
and
then,
when
you
reply,
UK
and
send
these
lists
of
isolation,
controls
and
the
controls
we
have
implemented
so
far
are
these.
We
have
CPU
sets
CPUs
and
MEMS,
and
then
we
have
container
environment
variables
which
you
can
inject
into
the
container
environment.
And
then
we
also
have
the
C
group
huge
TLB
limit,
so
you
can
set
so
those
are
what's
been
implemented.
They're
really
just
examples.
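
For readers following along without the PR open, here is a hedged Go sketch of the event and control shapes just listed. The real PR defines these over a gRPC protocol; the type and field names below are illustrative, not the generated API.

```go
// Package isolator sketches the shapes of the dispatcher protocol.
package isolator

// EventKind enumerates the four lifecycle events the dispatcher emits.
type EventKind int

const (
	PodPreStart EventKind = iota
	PodPostStop
	ContainerPreStart
	ContainerPostStop
)

// ControlKind enumerates the isolation controls mentioned in the meeting.
type ControlKind int

const (
	CgroupCPUSetCPUs   ControlKind = iota // cpuset.cpus
	CgroupCPUSetMems                      // cpuset.mems
	ContainerEnvVar                       // injected into the container env
	CgroupHugetlbLimit                    // hugetlb.<size>.limit_in_bytes
)

// IsolationControl is one directive an isolator returns in its reply.
type IsolationControl struct {
	Kind  ControlKind
	Name  string // e.g. env var name, or hugetlb page size
	Value string
}

// Isolator is the interface an example isolator would implement: receive a
// lifecycle event and reply with a list of isolation controls to apply.
type Isolator interface {
	Notify(event EventKind, podName string) ([]IsolationControl, error)
}
```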
B
If we were going to break this up into PRs to target, you know, that we were actually trying to merge, we would probably start with one and then expand on them one by one in separate PRs. So yeah, that's the update. It's there for review; if you're interested, please take a look. Happy to have any comments, and this will be one of the topics on the agenda for the face-to-face we have coming up. And that's it, thanks.
A
Thanks, Connor.
J
So I just want to limit this conversation to the runtime isolation part. I think the scheduling part, like whether we do it and where, is something that should be addressed later, and the runtime part is, in our opinion, the basic thing to start with. So I'm going to start with the simple premise that no container engine or runtime works natively with GPUs, and nvidia-docker, which is the product we started with, only supported Docker.
J
The idea is that, basically, depending on the container engine or the container runtime you're going to run, you're going to do different steps, and those steps are basically the different actions you're going to take. We put them in a library, and for each container runtime we'll have a small hook that will call into the library. Is that something that makes sense to everyone?
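
A small sketch of that split, one shared library plus a thin per-runtime hook; all names here are illustrative, not the actual nvidia-docker code.

```go
// Package gpuhook sketches the library-plus-hooks split described above.
package gpuhook

// Library holds the runtime-agnostic steps: which device nodes and driver
// files must be exposed for a given GPU.
type Library struct{}

// PrepareDevice returns the host paths that must be made visible to a
// container (or VM) that was granted the given GPU.
func (l *Library) PrepareDevice(gpuID string) ([]string, error) {
	// Typical NVIDIA device nodes; the exact set is driver-dependent.
	return []string{
		"/dev/nvidia" + gpuID,
		"/dev/nvidiactl",
		"/dev/nvidia-uvm",
	}, nil
}

// Hook is what each container runtime implements: translate the library's
// generic output into that runtime's own configuration (mounts, device
// cgroup entries, VM passthrough, and so on).
type Hook interface {
	Apply(containerID string, devicePaths []string) error
}
```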
C
I'm again stating my opinion here: the runtime is not the right abstraction for hardware devices in Kubernetes. The runtimes are meant to be imperative; they're not meant to have any sort of intelligence as to how a pod or its containers are being isolated, or what sort of devices they get access to. So that's basically not the right abstraction.
C
The point is that GPUs are just one among the many devices that we need to deal with at the node level, and the kubelet, or one of the kubelet's extensions, has got to deal with a whole bunch of other resources. So we have to take GPUs into account, sure, and maybe GPUs have a quirkiness that requires some extensions at the kubelet level, but then the extension is not at the runtime. This is basically what I'm trying to say.
C
What you're saying is sort of similar to what Connor was demoing a little earlier, in that it could just be an isolator extension that's going to speak to a set of devices, and that isolator is going to take care of doing what is necessary for giving a pod and its containers access to one or more devices. So the problem here is that the...
K
It'll be helpful if you can, like, with a VM you have to do different things than with a container. So the kubelet wouldn't know that; we need to add it at the actual runtimes: NVIDIA GPUs within Docker, for example, or NVIDIA GPUs within runC, or within something else. I mean, it's totally different every time.
C
No, there's no disagreement there, and that's probably a runtime problem; maybe we're both just talking about two different problems. What I'm saying is that the kubelet is going to say what device a given pod and its containers are going to use. Are you following? So could we...
K
No, because, again, in the VM case we need to do a PCI passthrough or stuff like that; in the case of vGPU, for GPU virtualization, like I said, you also have SR-IOV, and we have emulated devices. I mean, there are a bunch of virtualization technologies for devices, and the cgroup is just one of them, sure.
C
But at this point it's still not clear what the requirements are and how that maps to the user-facing API. I'm happy that we have some agreement on what the responsibilities are at the different layers. I feel like there are a few open questions. One is: what is the exact interface between the kubelet and the CRI? Because right now that's just device files, which would then have an associated major/minor number. Well...
K
It's very much comparable: when you run your CUDA code, you can specify an environment variable, something like CUDA_VISIBLE_DEVICES, and it effectively isolates those devices for the CUDA application. And so for containers it would be exactly the same thing: you would say NVIDIA_VISIBLE_DEVICES, and then the NVIDIA runtime will see that and isolate those devices accordingly in the implementation.
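
A hedged illustration of that environment-variable pattern using the Kubernetes container type: the device list rides in NVIDIA_VISIBLE_DEVICES and a GPU-aware runtime interprets it before starting the container. The image name is hypothetical.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

func main() {
	c := v1.Container{
		Name:  "cuda-app",
		Image: "example/cuda-app", // hypothetical
		Env: []v1.EnvVar{
			// Same idea as CUDA_VISIBLE_DEVICES for a bare CUDA process:
			// a GPU-aware runtime reads this and exposes only GPUs 0 and 1
			// to the container.
			{Name: "NVIDIA_VISIBLE_DEVICES", Value: "0,1"},
		},
	}
	fmt.Printf("%+v\n", c.Env)
}
```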
C
On the other hand, you're saying that there would be use cases around virtual GPUs. I'm more interested in knowing how you plan on exposing such features; like, if you're going to share a GPU across five or six different pods, how is that being exposed to end users?
C
Okay, I'm just trying to think out loud at this point, to better understand what exactly should be at the CRI level. I think we need a list of the different means of consuming GPUs. The only use case I know we support now is providing access to a complete device, in the form of an actual device node with a major/minor number. If there are additional scenarios, I think we need a list.
J
It would just, like, ask for a certain number of GPUs and then specify the different constraints. Those constraints would be, for example, the compute capability or the memory. I think the simple use case would be the memory: a user would say, I want two GPUs with at least eight gigs of memory. And of course that's not possible today; you can't express that with labels or anything, because if your node has a mix of GPUs it doesn't work.
K
Well, not really. I mean, the requirements the user specified would basically let the kubelet select the GPU, but apart from that, our input is just: here is this device, and we'll handle it. That's about it; how you deal with the devices is up to the implementation, okay?
C
So it's less about isolation, more about "expose this device to this given set of containers". Okay, yeah. That seems sane to me. I think we still have to have a separate conversation about what the API should be, because the environment variables don't seem nice, but that's a much easier discussion. Okay, yeah.
J
I think the basic example would be the one I just presented, which is: a user wants two GPUs with at least eight gigs of memory, and I mean that's something that can only be solved at the scheduler level. And I think one of the discussions we had in the SIG, and one of the concerns we had, is: what is the minimum setup?
J
I'll take this as a yes. And after that, we'll go through the scheduling: the scheduler would apply all its prioritization functions and then call the matching extenders, and finally it'll come to a decision. We're basing ourselves on a scheduler PR that, at this point, delegates the bind responsibility to an extender, and that's to do the resource management part, where you need the extender to be aware of which node has been picked in order to do proper resource management. Is that something everyone's following?
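
A minimal sketch of that extender flow: the scheduler calls out over HTTP after its own predicates and priorities, and the extender also owns the bind step so it learns which node was picked. The types below are trimmed stand-ins for the scheduler's extender API, not the exact upstream structs, and the inventory helpers are hypothetical.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// filterArgs is a cut-down stand-in for the scheduler's ExtenderArgs.
type filterArgs struct {
	PodName   string   `json:"podName"`
	NodeNames []string `json:"nodeNames"`
}

// filterResult lists the nodes that can satisfy the pod's GPU constraints.
type filterResult struct {
	NodeNames []string `json:"nodeNames"`
}

// bindArgs is a cut-down stand-in for the scheduler's ExtenderBindingArgs.
type bindArgs struct {
	PodName string `json:"podName"`
	Node    string `json:"node"`
}

// nodeHasFreeGPUs and reserveGPUs are hypothetical helpers standing in for
// a real device inventory.
func nodeHasFreeGPUs(node string, count int, minMemBytes int64) bool { return true }
func reserveGPUs(node, pod string)                                   {}

func main() {
	// Filter: keep only nodes whose free GPUs meet the constraints, e.g.
	// "two GPUs with at least 8 GiB of memory each".
	http.HandleFunc("/filter", func(w http.ResponseWriter, r *http.Request) {
		var args filterArgs
		if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		feasible := []string{}
		for _, n := range args.NodeNames {
			if nodeHasFreeGPUs(n, 2, 8<<30) {
				feasible = append(feasible, n)
			}
		}
		json.NewEncoder(w).Encode(filterResult{NodeNames: feasible})
	})

	// Bind: because the extender performs the bind, it learns which node
	// was chosen and can record the GPU assignment before the pod starts.
	http.HandleFunc("/bind", func(w http.ResponseWriter, r *http.Request) {
		var args bindArgs
		if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		reserveGPUs(args.Node, args.PodName)
		w.WriteHeader(http.StatusOK)
	})

	log.Fatal(http.ListenAndServe(":8888", nil))
}
```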
J
It could be, basically, external hardware constraints, not CPUs, you know. I think so, and the supporting part for GPUs is described a bit at the beginning of the document, and basically, if it's not done right, then it's usually not worth launching your task on GPUs that are not next to each other. So, for example, on the four GPUs with PCIe...
K
The topology is really complicated, and we would like, for the MVP, not to care about topology. It's a broader topic that needs to involve a lot of vendors, and, I mean, basically to make a topology decision you need a view of the whole system, not just the GPUs. So for now, if we could just focus on having basic constraints for the GPU itself, like the requirements on memory or compute capability; if we could just do that as an MVP, that would be great right now.
F
Yeah, I mean, yes: on one side you can argue topology is really complicated, but, you know, in fact other ecosystems, for example OpenStack, if we can model it as a network interconnect, it's pretty much, you know, with that approach it's not horrendously complicated. I agree there is some multi-vendor environment work involved, but it can be modeled reasonably and it can be solved in an open-source framework. That's what I found.
A
I think we're coming up on the end of the hour here, so I think the best thing to do is for everyone to get a chance to read all these materials more closely before next week, and we'll proceed from there. I'm looking forward to everybody participating next week; I'm excited to see a lot of participation in this space since the start of this new year. So, yeah.