From YouTube: Kubernetes Resource Management WG 20171101
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
All right, so I just kicked off the recording. This is, I guess, the November first meeting of the Resource Management group. We've got two topics on the agenda, but I'm not sure if we actually have their representatives on the call. So if someone who is not recorded here wants to talk about the initial FPGA support proposal that was linked, please speak up; if not, we'll move on to the device plugin architecture document. Is someone on the call able to speak about FPGAs?
F
Can you hear me now?

A
Yes, we can hear you. Okay!

F
Thank you. There was some confusion about the exact time: I joined a few minutes earlier and was told the host had not joined yet and that it actually started one hour later because of daylight saving time. So it was a bit of a mix-up, sorry about that.
F
Thank you. Okay. First of all, thanks a lot to the co-authors for helping me write this document. I think many of you have already looked at it, so we can either walk through it section by section, or, if you've got specific questions, we can look at those. I'm fine either way.
F
Good. So the overall background is that FPGAs are a different breed of device compared to other devices. There are additional factors, like how you program the device and how you handle images and programming. In addition, when you do the programming there could be multiple accelerators in the same device: you could have an FPGA with two separate partial reconfiguration regions, and each region may contain a different accelerator.
F
Also, the image needs to match the particular region type it is meant for. So when you synthesize an image, say for an IPSec accelerator, you synthesize it for a particular region; you cannot apply it to some other region on the same FPGA. The software has to take care of all these things, and when you pick an image to do the programming, you need to make sure it matches the partial reconfiguration region type. There may not be anything in the hardware that prevents bad programming: some devices may have that, and some may not.
F
For example, doing an mmap manually of the PCI registers into the address space requires privileges, so there should be some way for the container to express that, or for a plugin to say that this kind of operation requires it. So, based on all the discussion we had in the document over the past couple of days, I think the overall tenor of the feedback is that we should not make any API changes just yet; we should try to stick to the APIs already proposed and see what limitations arise.
B
Yeah, so that's a good point: let's just keep it very simple. I think we can make some safe assumptions, that people aren't going to reprogram every day or every hour, and so have a really simple control logic, or give people a complete solution that they can use even if it has some limitations, and then start prioritizing which limitations need to be addressed right away, and then bring it back to the community.
F
To answer your questions: yes, the main things we want to address are statically pre-programmed devices as well as the container programming model. But I should also emphasize that the current APIs do not allow for handling local memory in any form. So if you've got two FPGA implementations which differ in the amount of local memory, there is no way to express that or to request a certain amount. So even the local memory has to stay outside the API for now.
B
Devices are not really shared, and if you have to share them, then we need better APIs, essentially. For example, we share the main memory and we share the CPU, and that requires extra logic built into Kubernetes to make it happen. We don't have a good, extensible way of applying such logic to other resources.
B
So, to begin with, if we can just statically define what the memory sizes would be for the different accelerators that you burn into an FPGA, then that might be good enough to start with. Maybe you express that as part of your resource name, for example. We just find the easiest possible way to have users consume this resource, and then we can come back and find out whether we even have to improve it and, if so, what the right user experience would be.
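As a rough sketch of that idea, a pod could request an accelerator flavor whose fixed local-memory size is baked into the extended resource name; the resource name and image below are purely hypothetical, not something settled in this meeting:

```yaml
# Hypothetical: the device plugin advertises one extended resource per
# accelerator flavor, with the static memory size encoded in the name.
apiVersion: v1
kind: Pod
metadata:
  name: ipsec-offload
spec:
  containers:
  - name: app
    image: example.com/ipsec-app
    resources:
      limits:
        example.com/fpga-ipsec-4gb: 1
```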
D
So I think you're imagining in the document that you can start with pre-programmed devices that have some statically allocated local memory, with that model, and you do also mention supporting dynamic local memory allocation. Is this information expected to be node-local knowledge, or hidden? Is that right?
F
Initially it can be completely hidden by the plug-in, but what I'm trying to convey is that in general it may need to be exposed as a resource. So the container may come and say: I want 4GB of local memory for this IPSec accelerator. But not all devices may actually have 4GB of local memory, so you need to actually pick a device, at the scheduler level or the kubelet level, which actually contains that much of the resource.
F
So to begin with, we can make a simplifying assumption, like: all devices contain enough resources for any use case. That would be a very restrictive assumption, but we can start with it. In general, though, we'll want to expose some more resources, and the containers would be able to ask for them, right?
B
So you could model homogeneous nodes more systematically: you could consider adding labels. This was part of the original device plug-in design itself, that plugins can expose node labels. So we can assume that, on a given node, accelerators foo and bar that you have programmed onto an FPGA would have so much memory, right? We can use node labels to do some sort of cheap scheduling for now. Okay.
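A minimal sketch of that cheap-scheduling idea; the label key, the resource name, and the image are hypothetical:

```yaml
# Hypothetical: the plugin labels each node with the accelerator it has
# programmed, and pods steer themselves to a matching node.
apiVersion: v1
kind: Pod
metadata:
  name: uses-accelerator-foo
spec:
  nodeSelector:
    example.com/fpga-accelerator: foo   # label exposed by the plugin
  containers:
  - name: app
    image: example.com/app
    resources:
      limits:
        example.com/fpga: 1
```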
B
Overall, your previous question was: what information do we need in order to consider adding more features? Personally, I would prefer having real customer feedback, because we are really creative and can think of awesome use cases, but at the end of the day the customer might not be that sophisticated. So I would like to see overall end-to-end workflows. Network acceleration was one use case; another use case the text in your doc mentions is predictions, like ML predictions.
H
It's a bit of a catch-22, because with the current API we are basically very restricted to the container programming model. So what you would not allow customers to do with the current API is to go and really work on containers that could program FPGAs, which is a use case that is very important as well.
H
I agree with that. I definitely agree that we want to restrict ourselves, but the thing is, by not acknowledging that users need to understand how much memory they can request for a specific IP, or what security context their container is going to need for programming the FPGA, we kind of remove one very important use case and put it outside of the users' reach.
B
Yeah, I just threw out the thought: what if these nodes are dedicated, in the sense that you're not sharing them with other workloads? At that point you can safely hand over maintenance to the end user, who is then free to reprogram the machine and use it however they want. And this is actually a very common pattern: we have specialized hardware that's used by a small subset of users. Then you just...
B
In fact, I'm going one step further and saying, for the initial version, maybe for the next six months: what if you just have users completely own nodes that have FPGAs, and you give them a solution, some tooling or whatever, that simplifies burning FPGAs for them, plus the equivalent simplified APIs? But then those nodes are not being shared, at which point your security restrictions are reduced and you can use them that way.
B
You can't modify the pod spec once it's admitted by the API server, except for the object meta, and the assumption is that all the security policies that an administrator has put in place are being evaluated as part of pod admission. So it's really not the Kubernetes model to have some extension, running later in the lifecycle of a pod, go and elevate privileges or change the pod spec. That's not what the extensions were actually designed for.
F
Just to go back to one of the previous points: if you look at today's APIs, the actual code, the scheduler extender takes a pod spec pointer, so if it were to modify it, that would actually get reflected back in the main scheduler. So without any changes to the scheduler extender APIs, it looks like we can actually modify the pod spec. I understand that's not what it's meant for, but without any further change it seems to work, right? So...
B
So what we're suggesting is, as Derek was saying, maybe a focused model where you have an extender, but the extender is also acting as a webhook. What it does is: whenever it sees a pod that requests a resource that your extender is supporting, it injects the right capabilities, or any other security privileges that the pod needs. Then the pod goes through the regular admission process, and an operator can enforce whatever security policies they want to. And so there's like...
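A minimal sketch of what such an injection could leave behind in a pod spec; the resource name and the specific capability are assumptions about what FPGA programming might need, not anything agreed here:

```yaml
# Hypothetical result: a webhook sees the FPGA resource request and
# injects the privileges the programming step would need.
spec:
  containers:
  - name: app
    resources:
      limits:
        example.com/fpga: 1
    securityContext:        # injected at admission by the webhook
      capabilities:
        add: ["SYS_RAWIO"]
```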
B
The logic would be any logic that can be abstracted and centralized in the kubelet. I mean, our assumption is that anything you can do at deallocation you can sort of do during the allocation phase. If that is not true, and you have a really concrete use case for a deallocate call, then please raise it.
D
Currently, in the kubelet, we don't have a good way to know when the deallocation actually happens. We do it lazily, during the next allocation: we just look at the active pods and reclaim any resources that are not used by an active pod. So the caveat in the kubelet is that we don't really have a good way to know when the deallocation would happen.
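A minimal sketch of that lazy-reclaim scheme; all names here are illustrative, not actual kubelet code:

```python
# Hypothetical sketch: device allocations are only garbage-collected at
# the next allocation, by comparing recorded allocations against the
# set of currently active pods.

def reclaim_unused(allocations, active_pod_uids):
    """Drop allocations whose pod is no longer active; return freed device IDs."""
    freed = []
    for pod_uid in list(allocations):
        if pod_uid not in active_pod_uids:
            freed.extend(allocations.pop(pod_uid))
    return freed

def allocate(allocations, free_devices, active_pod_uids, pod_uid, count):
    """Allocate `count` devices to `pod_uid`, reclaiming stale entries first."""
    free_devices.extend(reclaim_unused(allocations, active_pod_uids))
    if len(free_devices) < count:
        raise RuntimeError("not enough devices")
    granted = [free_devices.pop() for _ in range(count)]
    allocations[pod_uid] = granted
    return granted
```

The point of the sketch is that a stale allocation is only discovered when the next allocate call runs, which is why there is no precise moment at which deallocation is known to happen.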
I
Great. And just on the deallocation part for GPUs: it's not that we didn't have a clear use case. We do want to wipe the memory; we do want to be able to do a few things on deallocation. We just moved that work to the allocation part because it was more reliable. It might also be interesting to shut down the GPUs, but that's not usually what our customers do.
I
So I do want to come back on two quick things that I heard but couldn't touch on. The first one is: I remember someone mentioning, I don't remember who, the scheduler extender. I've been through that part during the initial implementation of the device plugin for GPUs, and there are a lot of not really straightforward issues that you're going to encounter. So, for example, if you have to inject things into the pod spec in the scheduler extender, you end up having to delete the pod and recreate it, and the scheduler extender is not the best place; you probably want to do it at the admission control level. And the other one, I remember, is: I heard you mention that you wanted to inject the security context, and I didn't understand that. I think you were the one pushing on this.
B
It might logically make sense, but practically speaking, as a cluster administrator I don't want my pods to have elevated privileges. I would want to know that either it's a dedicated node, at which point I'm not really applying any sort of security policies and I don't really care what the security policies are on that node, or it should be a centralized policy. Policies are enforced currently at the cluster level, not at the node level, and so I would prefer not injecting additional security privileges.
I
Right, and device plugins are supposed to be, or at least we expect device plugins to be, deployed by an administrator, a cluster administrator. And I'm not exactly sure if a security context is needed, but if it is needed for you to program your device, the point, as you said earlier, is that it would make sense for the cluster admin to say: for example, this group of users wants to be able to program this FPGA, so I understand that they need more privileges for their containers. That would make sense, right?
B
To begin with, that might be, that could be, a working assumption. Or, like we said a few times, you can have that hook or initializer or whatever you want to use, to do this as part of admission rather than after admission. So you would do it before the pod spec is accepted by the API server, but...
B
I mean, pushing things into the extender, or doing things in the device plugin, is sort of pushing us towards an imperative design, which is exactly what Kubernetes is avoiding. So I would say: just start with the thing that's possible today, and then we can explore further, if necessary, in the future. But...
B
Let's start with something very, very simple, and then we can think of extending it. Adding anything that mutates the pod spec at any point after admission is the sort of thing that pushes Kubernetes towards an imperative model, and I think it will face a lot of resistance from people. So I just feel like it's not worth the conversation at this point, unless we have exhausted other options and we have a concrete proposal for why we have to change the model.
I
Can you all see the design document? Yes? Yes. So I hope by now everyone has taken a look at it. I wanted to come back on its grander goals. In my mind, I actually compiled the list of the different shortcomings and fixed bugs, and as I was completing this list, the main goal of this design document that I tried to address is the number of bugs and shortcomings that we've had: the code coverage, the people who tested it. I mean, the impact hasn't extended to performance, but coverage is pretty much limited.
I
That's a signal that says the current architecture that we have is just not good enough, and if we're going to continue adding features, then this architecture is just going to continue to surface more bugs. That's my feeling: looking at all the bugs that we've had because of the features, the number of faults we've had is pretty much astonishing, and so that's...
D
It's kind of like an integration problem. For example, with the device plugin and those additional resources that the kubelet manages, it's kind of like the integration with the resource name API. And, for example, the end-to-end test we know is flaky; I don't think it's too flaky right now, I think it was flaky mostly at the beginning, when it had some issues.
B
I think that's where there's some disagreement or not: it's beyond just the kubelet-side implementation. There's a real device plugin implementation that we are using for end-to-end tests, and then we are relying on GCE infrastructure, and we are relying on our driver installation mechanisms, and so on. So there are a lot more variables than just the device plugin architecture, and it might be a little bit premature to say that the device plugin architecture is the cause of all the bugs. I think that goes too far.
I
So this is also a second point: I was also thinking that if we're going to continue adding more and more features, or at least if we want to continue adding more features, then it would be good to have a reliable test infrastructure that is not flaky, that does not take 60 seconds, and with which we can actually pretty quickly say... I mean, I've looked at the tests and I've written a lot of the tests, and what I feel from the reviews I've seen and from the code reviews is that writing...
B
Real-world scenarios, and having really good unit test coverage, are very, very helpful; everyone is excited about that and, frankly, cares about it. But I think we should also think in terms of how we can best spend our energy: if we spent all our energy on just unit tests, would that be enough, or would we have to spend our energy at the cluster level?
I
But the point is mostly, I guess what I was trying to get at, the point was mostly that if we're able to at least get a few tests here, or at least get the integration tests in place, that eases the unit test effort. And if we're going to go down the road of saying that our main point for this milestone is stability, then integration tests and unit tests should be the objective. That's right.
D
So maybe we can move to the detailed architecture change you proposed. I think what we should ponder is whether improving unit test performance really requires the so-called big architecture change. But maybe we can look at your proposal. I see you made some good observations, and the current architecture has some limitations, so maybe we can simplify it by looking at how we might simplify the locking logic or the current code organization.

B
I just want to make sure that everyone else on the call, possibly including me, is following; I'm a little doubtful whether we're all aware of the nitty-gritty details of the internal architecture. So would it make sense for this discussion to happen between the folks who are very familiar with the architecture, rather than in this setting?
E
Just wanted to plug: Harry Zhang (resouer on GitHub), Balaji, and I are putting together a topic proposal for the KubeCon contributor summit to discuss intra-node topology enhancements, stuff like making the CPU manager and the device plugin manager coherent in terms of NUMA affinity for decisions. So just a heads up that we'll be submitting that as a topic; hopefully it gets accepted, and if so, I'm looking forward to discussing it with everybody.
B
My gut feeling is that we would not really have enough time to actually go deep there. I think the meeting that we had in May, where we actually had a couple of days to discuss this with the working group, is probably a better setting for it. But in any case, we can give it a shot and see if we can make some progress.
A
Well, I say that for two reasons. One, I won't be there, and so I don't want that to be the decision-making forum. And two, when the invites went out for this, it was intended to be more of a place for people to discuss things, not decide things. So as long as we understand that we're not coming out of these sessions with decisions, then people can discuss what they want to discuss, but I'd prefer that we have a more dedicated topic or discussion for it. Okay.