From YouTube: Kubernetes SIG Node 20200707
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
...the code, and they were reviewing it. Thanks for your effort in getting them involved, but it looks like they're reviewing some of the previously completed API decisions, including resources. Let me just share the screen so we can look at it.
A
Okay, great. So, yeah, Tim was asking whether we really need the resources allocated to be in the spec: why not do local checkpointing to record allocated resources, instead of the pod resource allocation admission plugin that we added? And a few other questions around runtimes. I hope Mike Brown is here today; he could share some insight. The particular question is whether a RuntimeClass should be allowed to say "I can't do in-place resize."
A
I think this question arose out of a scenario with VM runtimes: they may be able to resize CPU in millicore increments within an integral CPU bound. So if you're going from one CPU to 1.5, it can do it without restarting, but it might need to restart if it needs to go to two CPUs.
A
So, a situation like that. And in general, the way we have currently defined it is: "no restart" is okay as far as the kubelet is concerned, meaning we won't do anything to restart you. But for the end user that may not be true, and the user experience is not really well defined there. They might expect something that then doesn't happen, and that might, you know, confuse them, and they may not be happy about it.
B
Sorry, can I just say, because I don't want to waste anybody's time: I jumped on this KEP because it's particularly interesting to me. You know, I worked on Borg and Omega inside Google, so I know just how complicated this problem is, and actually it's more complicated than even I understand. So I'm very interested in it, and most of the concerns that I had, I think I've waived.
B
I have only a few points left that I think are worth discussing. Specifically: the spec-versus-status, checkpointing-versus-API thing; whether we want a subresource for resources; and conditions and events and signals in general. The last point was the semantics that you just described, but I like the words that you wrote in the KEP, so in my mind that one's actually resolved. So I'm down to three standing points.
A
Okay, so let's pick up the first one. The reasons I have mentioned, and I just responded on that, are that the benefit of having resources allocated in the spec is twofold. One is that it opens the door for what seems like a useful feature where a pod can say: hey, at minimum I need X amount of resources, say one CPU and one gig of RAM, but ideally I'd like two. Then we'd admit it at one and work towards two where possible.

So that way the pod would get scheduled and start running with its desired minimum, sorry, its required minimum, rather than staying pending because resources are not available. So that feels...
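(For readers following along: a minimal sketch of the hypothetical "minimum versus desired" shape being described here. Nothing like this exists in the Pod API; the `requestsMinimum` field below is purely illustrative.)

```yaml
# Hypothetical sketch only: no such field exists in the Pod API.
# Idea: schedule and admit at the minimum, then resize toward the
# full request where node resources allow.
apiVersion: v1
kind: Pod
metadata:
  name: resize-example
spec:
  containers:
  - name: app
    image: k8s.gcr.io/pause:3.2
    resources:
      requests:            # the ideal amount, worked toward over time
        cpu: "2"
        memory: 2Gi
      requestsMinimum:     # hypothetical: enough to admit and start
        cpu: "1"
        memory: 1Gi
```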
B
Like
I,
I
I
I
can't
think
of
how
I
would
use
that,
and
I
don't.
I
don't
want
to
design
to
hypotheticals
my
main
argument
for
context
for
anybody
who
hasn't
read
along
this
giant
pr
back
and
forth,
like
I
admit,
there's
a
lot
of
comments
going
on
there.
The
main
point
that
I
have
with
it
is
as
an
api
reviewer.
B
I
find
it
to
be
very
weird
to
be
putting
something
in
spec
that
the
owner
of
spec
isn't
allowed
to
write
to
and
the
need
for
this
admission
control,
which
is
using
identities
to
subset,
who
isn't
isn't
allowed
to
write
to
certain
fields,
is
really
unprecedented
and
is
a
is
a
strong
smell
to
me
that
something
isn't
working
right.
B
So
I
asked
why
why
not
status
and
vinay
helpfully
disavowed
me
of
the
idea
that
that
would
actually
work,
and
there
was
a
there's,
a
nice
corner
case
that
I
had
missed
in
my
head.
B
What
I
really
think,
though,
is
why,
should
this
be
part
of
the
public
api,
as
opposed
to
being
effectively
a
cubelet
problem?
It
feels
to
me
like
the
right
answer,
for
this
is
a
local
checkpoint
and
to
store
the
allocated
in
status
that
fits
with
all
of
the
precedence
around
the
api.
Both
you
know
the
api,
spec
and
status
blocks
being
owned
by
different
entities
and
all
the
general
patterns
without
needing
any
special
cases,
but
it
requires
a
cubelet
checkpoint.
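(A sketch of the two shapes under debate, with illustrative field names only: the KEP as written put `resourcesAllocated` in spec, writable only through a node-scoped subresource, while the alternative argued for here keeps the kubelet's decision in status.)

```yaml
# Illustrative only; neither field name is final API.
spec:
  containers:
  - name: app
    resources:
      requests:            # desired state, owned by the pod's author
        cpu: "2"
    # KEP as proposed: a spec field the spec owner may not write to,
    # set only via a node-scoped subresource:
    # resourcesAllocated:
    #   cpu: "1"
status:
  containerStatuses:
  - name: app
    resourcesAllocated:    # alternative: the kubelet's admitted decision
      cpu: "1"
```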
C
Hey
tim,
this
is
derek,
hey
derek,
so
I
want
to
apologize
to
banai,
because
I
know
that
this
has
been
a
topic.
That's
gone
on
discussion
for
feels
like
a
year
now,
and
so
I
think
your
point
on
this
being
hard
is
is
valid.
C
You have the initial resource requirements of that pod that were enumerated, and then you have a need to say "I want to bump that up or decrease it," and we had no way to express when we had reached a level state. So if you had just updated the resource requirements and the kubelet couldn't size you down, you were still using more resources than you had initially claimed, and there could be some inconsistency. So I thought it was that need to coordinate on...
C
Have
we
reached
a
level
state
that
needed
to
be
understood
by
scheduling,
quota
and
node
that
led
us
to
where
we
are?
If
my
memory
is,
is
incorrect,
maybe
that's
a
sign
that
our
initial
thoughts
on
it
yeah
yeah?
That's
that's
one
of
the
reasons
I
mentioned
yeah
and
so
like
to
me.
The
sniff
test
tim
is
like
if
having
not
read
this
kept
in
six
months.
C
D
B
On that note, right. So let me be a little bit clearer: I'm actually suggesting that you keep "allocated," but you move it to status. That changes the corner case from "I need to be able to signal that this thing happened" to "what happens if the node reboots and all of that information that's currently in memory is lost," and that's where the checkpoint comes in. That covers the corner case of what happens if the kubelet loses its memory, because, you know, we don't have a representation of things like memory requests in any durable place.
C
So
I
think
the
issue
with
that
and
again
I'm
I'm
happy
to
be
wrong-
was
that
at
the
time
I
thought
we
wanted
status
to
be
entirely
reconstructable
by
the
cubelet
on
a
restart,
and
so
if
the
cubelet
reported
status
for
what
it
actually
observed
was
being
used,
that
that
was
not
consistent
with
what
was
previously
requested.
And
so
you
still
had
say
the
quota
subsystem
potentially
being
gamed
or
confused,
because
it
would
not
be
aware
of
an
attempt
to
reclaim
resources.
C
So
I
thought
quota
is
updated
to
look
at
allocations
and
so
just
quota
wouldn't
be
status
aware,
and
so
that
was
one
of
the
other
design
tensions.
B
I
could
schedule
a
pod
that
requests
a
gigabyte
of
memory
and
then
once
it's
accepted
to
a
node,
reschedule
it
to
use
one
byte
of
memory
or
one
megabyte
of
memory,
and
my
quota
would
say
I'm
using
one
megabyte
when
in
fact
I
was
using
one
gigabyte
because
keyboard
wouldn't
be
able
to
shrink
me
because
I
had
active
pages
or.
C
C
E
Real quick, yeah, go ahead. So we actually don't need to use the max, the reason being that there are two kinds of changes we could make: we could either change the requests or we could change the limits. A limits change can be rejected if, for example, you're using all of your memory, but a downsize of requests will always be accepted by the kubelet, so we don't actually need to use the max. You can temporarily...
B
Right,
which
is
also
an
interesting
attack
right,
I
could
get
a
guaranteed
class
pod
that
I
then
shrink
request
on
or
like.
Are
we
going
to
enforce
that
if
you're
in
guaranteed
class,
you
have
to
shrink
them
in
sync.
E
I
believe
the
current
behavior
as
proposed
and
implemented
would
actually
you
would
accept
the
change
in
requests
and
that
would
take
effect
immediately.
So
if
you
had
eviction,
you
would
actually
evict
based
on
the
new
smaller
requests,
but
the
limits
would
the
qubit
would
attempt
to
reduce
the
limits
after
the
new
spec
has
been
accepted
with
quotes
right
and
that
could
be
rejected.
E
No, QoS...
C
Odd
validation,
though
again
my
memory
is,
is
weaker.
Here
I
thought
in
pod
validation.
We
actually
verify
that
you
can't
change
the
clause
class
of
a
pod
when
making
the
update
itself.
So,
yes,
I
think
eviction
is
kind
of
a
second
order
effect,
but,
like
I'm
not
aware
of
any
reason,
unless
I'm
mistaken
that
you,
you
could
change
clause
classes
with
the
current
proposal.
E
If you have a resize that is admittable, such as a downsize of a Guaranteed pod, but is not updatable by the runtime, you can end up, after admission, in a state where your requests and limits have been changed according to the kubelet and its in-memory record of that pod, but have not been changed as far as the cgroup files.

But what I am saying is that the important thing is that the admission would never be rejected at admission time, right, for a Guaranteed pod being downsized.

But you could have a state where it was unable to reduce the actual limit of the container, because that memory is in use.
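(To make that concrete, a small sketch with illustrative values: for a Guaranteed container, requests and limits stay equal through a downsize; the request change is always admittable, but the runtime may be unable to lower the memory limit below what is currently in use.)

```yaml
# Sketch: downsizing a Guaranteed container from 2Gi to 1Gi.
# Before the resize:
resources:
  requests: {cpu: "1", memory: 2Gi}
  limits:   {cpu: "1", memory: 2Gi}
---
# After admission (takes effect immediately for eviction accounting):
resources:
  requests: {cpu: "1", memory: 1Gi}
  limits:   {cpu: "1", memory: 1Gi}   # the cgroup limit reduction can
                                      # still fail if >1Gi is in active use
```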
B
So
the
the
question
here
is,
it
came
the
way
it
was
explained
to
me.
Vinay
helpfully
explained
this
corner
case
of
having
two
pods
and,
if
you
like,
if
cubelet
were
to
restart,
it
would
not
know
what
its
previous
decisions
had
been
right
if,
for
example,
status
got
lost
right,
which
is
always
sort
of
the
litmus
test
for
status
right.
B
What
if
I
erased
it,
would
it
come
back,
and
that
seems
like
a
real
corner
case,
but
can
be
worked
around
by
having
cubelet
acknowledge
its
own
previous
decisions
via
a
checkpoint
right,
like
I,
I
have
admitted
this
and
I
have
decided
it
made
sense
in
the
case
of
a
restart
either
I
will
either
I
have
the
checkpoint,
in
which
case
I
can
resume
where
I
left
off,
or
I
don't
have
the
checkpoint,
in
which
case
I
hadn't
really
made
the
decision
yet,
and
that
seems
okay,
but
it
comes
at
the
cost
of
making
this
facet
a
cubelet
problem
instead
of
an
api
server
problem.
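(A minimal sketch of what such a kubelet-local checkpoint could look like, assuming a small file under the kubelet's state directory; the path, schema, and values here are all hypothetical.)

```yaml
# Hypothetical content of /var/lib/kubelet/pod_resource_allocation_state
# (file name and schema illustrative only).
podAllocations:
  default/web-6b9c7f:        # pod namespace/name
    app:                     # container name
      cpu: 1500m
      memory: 2Gi
checksum: 1814136019         # guards against partial writes across restarts
```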
D
So I just want to follow up and reply to you on what the clearer use case is. It is what was explained earlier: when we first talked with the SIG Autoscaling team, one of the requirements was that the scheduler and the controller want to set "allocated" aggressively at first, and then adjust it based on historical data and real usage.

So they basically want to let Kubernetes make a lot of the decisions, but they want us to give more signal back, so they can retreat or adjust. That's the main reason originally. And thinking about it: basically they want to keep a tab describing the desired state at the beginning, but the one keeping it is us, not the user. So basically it is like the request.

There is one request from the initial user, who sets the initial request, and then there is the system automatically adjusting it for us; that's all in the required spec. I believe that's the original reason we thought about it. We also talked about status at the time, but status is not what all those controllers watch; they basically watch the spec. And at that time subresources also had limitations. So all those things added up together.
C
Like
I'm
imagining
a
thermometer
right
where
it's
like,
I
originally
set
my
temperature
at
x
degrees.
I
am
now
asking
that
between
the
hours
of
12
and
4,
I
want
to
dial
my
temperature
up
a
little
higher
and
I
thought
that
there
were
multiple
values
we
wanted
to
read.
One
was
what
you
were
saying
was:
did
the
vpa
ask
for
too
high
of
a
temperature
that
the
keyboard
itself
couldn't
satisfy?
C
The
cube
would
still
want
to
say
what
the
current
temperature
reading
was,
and
I
thought
there
was
some
desire
to
kind
of
like
find
a
a
a
a
way
of
approaching
when
we
got
to
a
level
between
what
was
desired,
what
was
met
and
how
strong
of
a
desire
was
being
made.
Maybe
that
made
sense
at
the
time
and
that's
not
coming
clear
to
tim.
B
I
think
I
understand,
I
think
I
understand
the
desire
for
this.
The
signal,
the
the
problem
that
I'm
having
with
it,
is
really
from
an
api
consistency.
Point
of
view
what
you're
describing
in
as
allocated
here
doesn't
belong
in
spec.
It
is
state
information
that
isn't
safe
for
the
owner
of
a
pod
to
set
on
their
own
right
and
that's
that's
the
smell.
If
you
look
at
other
apis
that
follow
similar
patterns
like
service
right,
the
user
can
set
their
cluster
ip
and
we
will
respect
their
cluster
ip
if
they
set
it.
B
C
Like
that,
sniff
test
is
the
same
sniff
test
that
that
made
me
feel
a
horrible
stench
and
probably
delayed
vinay
a
lot,
but
then
like
we,
we
do
have
that
right,
like
podspec.nodename
is
not
a
normal
name,
is
respected.
A
Yeah,
it's
the
same
here
too,
in
fact,
for
the
live
api
review
with
john
legit
and
kubecon
contributors,
san
diego.
That
was
the
first
slide.
I
think
I
had
where
we're
proposing
this.
The
precedent,
for
this
is
a
node
name,
and
we
had
we
had
resources
allocated
plus
a
subresource
for
setting
this
from
the
node
and
the
president
was
binding
and
scheduler
setting
the
node
name.
B
B
A
The
binding
endpoint
does,
if
you
call
it
again,
it's
not
it's
not
that
it's
called.
I
think
my
connection
is
going
on
bad,
so
the
bending
node
name
plus
binding
scheduler,
says
that
once
it
once
it
sees
that
it's
clear
and
if
it
is
not,
if
it's
already
set,
then
the
skiller
will
ignore
it,
and
then
it's
the
user
scheduling
the
part
to
a
particular
node.
You
know
that.
B
C
C
We have many actors that act on things in the system. Is the distinction you're raising meaningful? Like, I can change the container image name on a running pod. I guess: what makes that distinction meaningful?
B
Yeah,
the
distinction
is
that
there's
a
there's,
a
sort
of
philosophy
behind
the
api
like
the
spec,
is
what
you
intend
to
be
happening
and
the
status
is
what
system
is
actuating
and
if
you
say,
there's
a
thing
that
exists
in
spec
and
I'm
going
to
write
to
it,
but
you're
not
allowed
to
write
to
it.
Even
though
you
own
the
rest
of
spec
you're
sort
of
flying
in
the
face
of
all
of
the
existing
conventions
and
node
name
is
different,
because
you
can't
change
node
name
that
doesn't
cause
a
reschedule.
That's
not
a
thing.
B
If
you
look
at
something
like
I'm
trying
to
come
up
with
a
good
a
good
example,
so.
B
C
Like, if a pod has never been started, or it's just stuck waiting to get started after it's been bound to a node, and the kubelet doesn't report a status on it, or doesn't report a resource allocation on it, because a volume hasn't yet attached and the containers were never actually invoked: do we want to worry about those things or not? I guess I'm trying to think through when the kubelet says that this resource is now allocated, and then the kubelet still has to report that additional resource, and the kubelet would have been watching that additional resource, and it had scaling impacts.
B
Yep,
so
I
I
concur
with
all
of
your
assessment.
This
is
not
there's.
No
obviously
easy
good
answer,
given,
I
think,
actually
all
the
work
that's
gone
into
the
cap
is
really
good.
I'm.
What
I'm
trying
to
offer
is
what
I
I
hope
and
I'm
hoping
there's
no
corner
cases
that
I
missed.
I
hope
is
a
slightly
surgical
modification.
B
I'm
saying
take
allocated
with
the
semantics
that
you've
currently
assigned
it,
move
it
to
spec
or
move
it
to
status,
and
anybody
who's
looking
at
a
pod
needs
to
consider
sort
of
the
tuple
of
requested,
allocated
and
actual
and
make
decisions
that
are
sort
of
context
dependent
based
on
those
three
fields.
You
can't
just
look
at
any
one
and
to
cover
the
case
of
what
happens
if
cubelet
restarts
cubelet
has
to
save
that
information.
It
has
to
save
its
own
decisions
somewhere
and
the
api
is
not
the
right
place
for
that.
C
I
don't
want
to
portray
my
arguments
as
being
stronger
than
my
memory
allows
me
to
have
so
on
the
checkpointing
side.
I
know
we've
been
bit
by
checkpoint
in
the
cubelet,
so
there
would
be
some
polling,
at
least
on
a
cro,
to
see
what's
possible
there
empty
dirt
volume
usage
is
actually
really
hard
to
measure
appropriate
in
status,
and
so
that's
a
little
tricky.
C
You're right, I apologize about the usage point. It is just needing to introspect what the runtime had previously been told to write...
B
So
I
initially
thought
oh
well,
we
just
it's
because
c
groups
is
is
missing
information,
but
actually
I
don't
think
that's
sufficient
either
because
there's
still
a
case
of
what,
if
the
node
rebooted
and
came
back-
and
you
just
want
to
restart
all
your
pods
in
place,
all
the
c
group
information
is
going
to
be
lost.
So
I
think
if
this
is
a
case
of
kubelet,
is
making
a
decision,
and
it
needs
to
remember
that
decision
over
time.
C
No, no, but it's been admitted to the API server now, right? And so now we're bringing this onto the kubelet, and then the kubelet has to be like, "yeah, I can still fit this," or "I can still grow this." And the scheduler hasn't even been involved in the regrowing of the requests, right? It's not there. So it's the node fit check and... right.
A
We
were
looking
at
scheduler,
potentially
assisting
a
resize
like
let's
say
it's
a
high
priority
part
and
then
the
scheduler
notices
that
the
same
argument
that
I
have
for
vpa
here,
where
it
can
keep
a
running
average
of
what
the
expected
time
for
this
to
be
updated.
Once
it
sees
the
resources,
change,
scheduler
can
watch
that
same
metric
and
then
say
hey.
This
has
been
it
typically.
If
the
cubelet
could
resize
it,
it
takes
two
seconds
and
it's
been
five.
A
Maybe
I
need
to
kick
out
some
lower
priority
pods
so
to
help
this
guy
grow.
So
I
love.
C
So here's a question, Tim. I'm sorry, I'm still getting my thoughts back into my cache. I have a kubelet with a pod restart policy of Always, right, and the pod has been bound to node "derek," and the derek node has very few resources, because he's just too strapped: he's got, you know, three CPUs and a gig of memory or something, and the pod fit there just fine.
B
Well, yes. I mean, the field is published through pod status, but the kubelet has a file on disk or something that says "this is what I decided in the past," so in case status were to go away, I can always re-decide that. And the pod would be able to restart with the lower limit, and the kubelet would have a way to say: hey, this resize request, while formally, schematically, it is correct, practically I will never be able to do it.
D
Okay, I have to watch the time; we've already spent roughly 40 minutes on this one, holy moly. I could talk about this one for more than a year. And Tim, thanks: you raised all the concerns and refreshed a lot of my memory, and I believe Derek's too. So we can carry on with this one more, and please connect with the community, everyone: if you are interested in talking, please take a look.

This is a really good chance to revisit our decision, and I think about whether we also need to loop in the autoscaling team here, because a lot of the requirements came from them, and the autoscaling team understands a lot about how the autoscaling work goes and what the pain points are. I understand what you propose to checkpoint.
B
Through
the
yeah,
I
just
posted
in
the
chat
the
link
to
my
pr,
which
has
a
bunch
of
questions
against
the
cap,
I'm
happy
to
keep
the
conversation
going
there.
I
had
a
chat
with
jordan
this
morning
just
to
get
some
history
from
him
and
I
will
go
through
the
cap
when
I
get
a
chance
later
today
and
respond
to
vinay's
comments
and
and
update
it
with.
You
know
the
results
of
the
conversation
with
derek,
but
I
would
love
to
keep
the
conversation
going
on
that.
B
I'm
really
down
to
these
three
points
and
I
think
I'm
not
I'm
totally
willing
to
be
swayed,
but
I
really
need
to
understand
them,
because
I
think
that
there's
precedent
that
this
will
set,
which
worries
me.
D
Yeah, thank you, thank you. And the topic is welcome back next time, and we can also carry on the discussion and at least report back to the community what we decided, what we resolved, and what the new open questions are. So let's move to the next topic. Manu, do you want to talk about the /dev/fuse issue?
G
Yeah. So, to give a bit of background before Nalin jumps in: we are working on enabling unprivileged builds inside a pod, so basically not giving any privileges to the pod, but still being able to perform builds. This is done using fuse-overlayfs, which requires /dev/fuse in the container, and while doing that we ran into some issues, and Nalin can talk more, I guess.
B
Greetings
everyone,
so
the
main
problems
we
ran
into
well
in
this
case
was
dev
fuse
not
being
available
to
the
container
the
unprivileged
enchanter
I
should
say
what's
going
on
there
is
that
dev
use
is
not
available.
When
we
create
a
host
path.
Char
dev
volume
mount
to
make
it
available
from
the
node
to
the
unprivileged
container.
We
still
can't
access
it
because
it's
not
added
to
the
unprivileged
containers
device
list
that
gets
set
in
this
device
control
group.
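(For reference, a minimal sketch of the mount being described; the image is illustrative. The hostPath `CharDevice` type is real API, but, as described here, this alone only bind-mounts the device node; it does not get the device added to the container's device cgroup, so an unprivileged container still cannot open it.)

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: unprivileged-build
spec:
  containers:
  - name: builder
    image: quay.io/buildah/stable   # illustrative image
    securityContext:
      privileged: false
    volumeMounts:
    - name: fuse
      mountPath: /dev/fuse
  volumes:
  - name: fuse
    hostPath:
      path: /dev/fuse
      type: CharDevice
# The device node shows up in the container's filesystem, but the
# runtime was never told it is a device, so the device cgroup denies
# open() for the unprivileged container.
```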
B
We
can't
get
around
this
with
just
the
volume
mount.
It
doesn't
fit.
The
pattern
for
device
plug-ins,
which
expect
a
certain
limited
number
of
devices
and
that
fuse
can
just
show
up
anywhere.
You
want
to
be,
and
we
don't
have
a
solution
list
for
this
right
now,
so
we
went
looking
and
it
turns
out.
There
is
an
open
pr
that
changes
the
kubelet
so
that,
when
a
device
on
the
node
is
added
as
a
block
device
or
a
character
device
volume
mount
in
a
non-privileged
container.
B
The
cri
request
also
adds
that
device
node
to
the
list
of
devices
that
we
tell
the
runtime
to
expose
the
container
and
that
ends
up
causing
the
run
time
to
add
it
to
the
device
control
group,
which
would
be
perfect
for
this
case.
But
it's
been
open
for
about
a
year
and
I
haven't
seen
any
traffic
in
it
for
a
couple
months.
So
I
was
hoping
for
some
info
or
guidance.
D
If
I
know
correctly,
we
maybe
we
talked
about
this
one-
I
just
signaled,
or
maybe
it's
different
in
the
different
tastic
group.
Stick
meeting
me
here
from
the
or
maybe
sod
from
the
storage
actually
proposed
some
other
way
to
support
this
one.
So
so
I
don't
remember
the
detail,
and
so
so
so
that's.
Why
did
we
follow
on
that
one?
I
haven't
looked
at
this
request
yet
so.
B
Can
catch
up
with
it
looks
like
the
proposed
options
were
device,
plugins
controls,
sorry
csi
drivers
and
a
local
persistent
volume.
B
I
But a CSI driver may have a notion of a volume device. Of course, that was done for block devices, but it might also work for fuse. And in the case of device plugins, we had a similar scenario with our Intel integrated GPU devices, where the device node is actually the same, but the number of clients on it might be unlimited.
G
Is
there
a
third
part
possible
here?
I
guess
it's
for
dawn
and
direct
them
so
like
where
we
use
something
like
psp
or
scc
or
opa,
to
have
a
list
of
allowed
host
devices
that
the
pod
can
mount,
and
then
we
just
allow
it
to
the
cri
directly
without
creating
a
device
plugin
for
such
use.
Cases.
D
I'm
not
suggest
to
using
device
plugin.
Actually
here
I
was
more
like
the
when
we
talked
about
the
house
pass
and
and
and
I
thought
we
agreed
to
move
to
using
of
the
common
api,
which
is
the
csi
and
from
now
on,
so
not
like
the
ink
tray
and
or
hard
coded
of
the
wallet
management
what
we
did
in
the
past.
D
So
I
that's
the
kind
to
what
my
first
reaction
here
and
and
but
I
understand,
maybe
there's
some
different
special
cases
and
the
current
kind
to
the
csi
driver
didn't
support
the
api
thing
just
included
this
one,
but
we
can
discuss
so
so
we
we,
we
could
figure
out
what
we
are
missing
in
the
csi
api,
how
to
make
that
generic
and
also
but
in
generic
enough,
but
at
the
same
time,
can
support
the
particularly
use
cases
you
proposed
here.
Oh,
we
can
think
about
it.
D
Do
we
need
to
have
the
third
category
to
support
this
one
so
far,
based
on
the
alexandra
and
even
at
device
plugin,
we
could
do
something
it's
high
key,
but
I
I
try
to
not
reinvent
a
lot
of
whales
here
and
try
to
see
the
what's
the
existing
mechanism.
What
is
missing
so
then
to
understand
the
the
discrepancy
between
those
apis
and
then
we
can
say:
do
we
need
instant,
like
immediate
jumping,
to
introduce
new
api
or
new
plugins
right?
So
that's
that's
my
first
reaction.
D
Devices,
yes,
that's
the
initially
when
we
proposed
the
device
plugin
and
unfortunately,
until
today,
us
plugin
kinda
a
little
bit
specific
to
the
gpu,
and
maybe
this
is
the
chance
we
make
that
more
generic
start
a
sec
support
the
second
case,
so
I'm
not
proposing
to
really
using
or
not
using,
but
I
at
least
we
need
to
understand.
What's
the
discrepancy
here
and
then.
C
So I was trying to catch up with all of it in the background here, because I was curious what was bringing this to the top of attention, and maybe there are use cases that people are trying to meet that we're not appreciating. Nalin, do you want to give some detail on what the use case is?
B
Sure. Currently we're using privileged pods with the kernel-based overlay filesystem; fuse lets us do that for unprivileged users, outside of pods in the broader sense. What we're using these for is to do the copy-on-write layering for container image builds, so we can extract the differences more efficiently than with the much slower methods that we have available to us otherwise, which are kind of expensive in terms of time and...
C
So
when
I
read
the
issue,
the
issue
kind
of
just
discusses
like
because
of
container
runtime,
let
me
do
this:
kubernetes
must
do
the
same
type
of
level
of
discussion.
What
I'm
wondering
is
if
we
can
maybe
get
a
more
formal
design
authored
on
why
kubernetes
should
do
this
and
what
use
case
it
fulfills
and
if
we
can
scope
it.
C
I
guess-
and
you
know
at
red
hat,
I
know
we
do
obviously
container
image
builds
and
kubernetes,
and
I
know
other
communities
are
doing
similar
and
so
getting
getting
something
written
around
that
use
case
is,
I
think,
probably
very
clarifying
for
everybody
when
understanding
why
the
request
was
coming.
G
Sure
yeah,
we
just
saw
this
open
issue
and
we
wanted
to
see
if
there
were
any
blockers
around
that,
but
we
can
create
a
new
issue
or
an
enhancement.
I
guess.
D
Yeah
please
and
yeah.
I
agree
with
the
dark
and
clear
they
write
down
of
the
use
cases,
and
then
we
can
start
the
final
use
cases
and
to
talk
about
and
the
exam
of
the
current
solution,
and
then
we
are
open
to
any
suggestion,
and
but
we
need
to
think
about
this,
not
just.
We
also
have
to
examine
why
existing
of
the
solution
is
not
support.
These
cases
use
cases.
K
Sure, yeah. So the issue has been sitting there since 2015, and the use cases we have for setting ulimits per container mostly come from AI/ML and HPC workloads, where we need to set unlimited memlock and a specific stack size.

And currently the situation is that we are enabling those limits globally, per runtime. So, for example, for CRI-O we are creating a drop-in file for systemd and setting memlock, stack size, and other variables to unlimited. But we don't have to have those for all containers; we just want them for specific containers.

The other thing is, we could run a privileged container, which we don't want to do. We want to have all those settings per user, in a container, and I wanted to ask the group how we can move forward on this issue of ulimits in Kubernetes.
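(To make the ask concrete, a sketch of the per-container shape being requested. Nothing like this exists in the Pod API; the `ulimits` field below is purely hypothetical, loosely mirroring Docker's process-level `--ulimit` flag.)

```yaml
# Hypothetical only: no ulimits field exists in the Pod API.
apiVersion: v1
kind: Pod
metadata:
  name: rdma-trainer
spec:
  containers:
  - name: trainer
    image: example.com/hpc/trainer:latest   # illustrative image
    ulimits:                                # hypothetical field
    - name: memlock
      soft: unlimited
      hard: unlimited
    - name: stack
      soft: 64Mi
      hard: 64Mi
```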
C
So,
to
my
knowledge,
it's
been
there
since
2015,
because
nothing
has
changed
since
2015
on
like
a
primitive
we
could
use,
and
so
I
guess
as
funko
is
there
something
that
we
might
be
missing.
I
see
dawn's
original
comm
and
I
think,
still
holds
true
from
2015,
which
is
we
don't
really
have
a
good
knob
to
do
this
effectively
and
the
one
that
was
used
by
docker
in
the
past
wasn't
really
considered
appropriate.
I
But
derek
we
we
nowadays
we
have
a
second
profiles:
pattern
where
runtime
can
define
like
certain
profile
and
when
the
user
can
specify
like.
I
want
to
be
this
particular
profile,
so
something
similar
can
be
here
like
I
want
limit
profile
like
this
and
on
the
runtime
site.
It
will
be
just
expanded
to
set
of
parameters
for
injected
for
container
spec.
K
I'm just curious: if we are allowing all pods to have locked memory, could pods, you know, lock too much memory and starve other pods if they have it unlimited?
K
Let's
say
somebody
sets
core
unlimited
core
unlimit
memlock
unlimit
stack
size.
If
memory
can
be
exceeded
for
other
parts,
if
we
are
setting
it
globally
for
the
complete
node
they're,
like
in
our
dmac
use
case,
we
have
like
two
use
cases.
One
is
dedicated
where
we
let
have
only
one
user
and
one
workload
running
and
the
other
one
would
be
multi-tenancy
where
we
have
a
couple
of.
K
I
don't
know
several
tens
of
hundreds
or
containers
running
using
rdma
with
aiml
workloads
and
if
we
are
setting
this
globally
for
all
pods,
if
this
could
somehow
break
the.
C
Yeah, I don't have a clear answer on that. I'm curious whether others on the call have explored any other workarounds that we might want to raise, but I think, Zvonko, we would have to sit down and think about that carefully.
D
Actually, I want to ask: a while back there was the ulimit cgroup work going on fairly actively, or at least relatively actively, in the upstream kernel. Do you follow that one? Do you know what the status is? I forgot to follow it over the last two years.
C
And
I
don't
know
the
status,
which
is
why
I
was
wondering
if
there
was
a
very
real
chance
that
someone
likes
funko
vermin
would
have
helped
educate
me
on
something
that
may
have
changed,
that.
I
wasn't
aware
that
had
changed.
So,
okay.
C
D
I do think about the ulimit feature now and then. Almost five years ago, if I remember correctly, I made some comment, because people were asking to use the Docker ulimit feature that Docker and containerd had, and that's process-based. We already agreed that it's very easy, but it's not that user-friendly.

At that time, I remember the ulimit cgroup was being discussed, and that's why we hoped that, if the kernel made progress, we could enable it: the container runtime would add the knobs at the lower level so we could utilize them from Kubernetes, which is perfect, and much easier to apply. And if that doesn't make progress, we need to understand why it's so complicated; I can see some of the complexity there.

But we want to understand what it is, so that then maybe we can find another way to do something, or maybe we can help the community, I mean the upstream kernel, by pushing down some requirements from user space. I think the cgroup work has been so successful in the past just because a lot of use cases pushed their requirements down from user space to the kernel; that's why the second version of cgroups is so successful, especially for memory management.

So we are in a unique space to reach out to those kernel folks with expertise and with the use cases users care about. I just want to share that here and see what we can do, because the AI/ML thing has actually been raised many times.
G
Yeah, we can follow up on that for sure and see where the kernel is at, and we'll come back. Meanwhile, I wanted to give a more limited use case that could be enabled with caveats: yes, not all the processes would get the limits, but in most cases, if we can apply limits to the main container process, that may be enough to alleviate the issues that Zvonko is running into.
F
Yeah, hello. So I wanted to ask, or maybe give a quick update for Derek, who wasn't here last time. Last time I asked about the PR that I created to change, per what we talked about in the other SIG meeting, basically switching the sidecar KEP to the provisional state and mentioning that it depends on the kubelet graceful shutdown work that is still in progress.
C
Yeah, so, I appreciate the pain, and, like, you know, I wasn't here last week, and then the holiday weekend still has me catching up, but I have your PR queued up for this afternoon.
F
Okay,
I
I
didn't
know
about
it.
Sorry
cool!
So
if
I
understand
correctly
this
pr,
if
it's
fine
will
be
emerged
and
then
I'll
do
another
following
prs
with
the
callouts
that
we
discussed,
does
it
make
sense.
L
Yes, our question is related to the repository, but Swati will talk.
J
Thanks, Alexey. So we presented topology-aware scheduling in SIG Scheduling, and the scheduler plugin itself was approved to be placed in the scheduler repo itself. But we were hoping to get an opinion and get...
C
So
some
of
those
questions
I
think
you
may
even
just
answered
there,
and
so,
if
you
can
kind
of
update
on
that
that
be
great
and
then
my
my
understanding
for
the
next
discussion
is
basically
there's
a
request
that
says,
would
signal
want
to
sponsor
a
subproject
to
export
topology
information
to
the
scheduler
and
it
sounds
like
you're
saying
the
scheduler
would
support
a
built-in
plug-in
that
could
read
that
information.
C
My
ask
on
the
cap
was
what
why
do
we
need
to
have
more
than
one
node
fleecing
agent
in
the
community,
and
could
we
do
this
as
a
enhancement
to
node,
feature
discovery
or
evaluate
it
in
that
context?
And
so
maybe,
if
you
can
look
at
that
feedback
and
reach
out
to
the
sub
project
maintainers
there
to
see
what
their
perspective
is,
that
would
be
good
going
into
our
next
discussion.
J
Sure
yeah,
I
think
the
reason
we
didn't
go
ahead
with
the
node
feature.
Discovery
approach
was
because
it
populates
the
node
information,
the
form
of
labels,
and
that
wasn't
something
we
were
trying
to
do.
We
were
trying
to
create
crds
here
based
on
the
resource
topology.
So
that
was
the
initial
reason,
but
we
can
definitely
think
about
it.
A
bit
more
and.
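(A rough sketch of the kind of per-node CRD being discussed, with the group, kind, and fields all illustrative: one object per node, listing per-NUMA-zone capacity and allocatable amounts for a scheduler plugin to read.)

```yaml
# Illustrative shape only; not a final API.
apiVersion: topology.node.k8s.io/v1alpha1   # hypothetical group/version
kind: NodeResourceTopology
metadata:
  name: worker-1                 # one object per node
zones:
- name: numa-node-0
  type: Node
  resources:
  - name: cpu
    capacity: "16"
    allocatable: "14"
  - name: example.com/nic        # illustrative device resource
    capacity: "2"
    allocatable: "2"
```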
C
Yeah
yeah
so,
like
I
know
we're
over
time,
but
the
last
parting
concern
I
have
would
be
that
where
we
could
centralize
in
the
case
of
nfd,
like
node
labeling,
to
like
a
core
authority
rather
than
giving
highly
privileged
credentials
out
to
many
different
things.
That
was
a
desirable
trait,
and
so
I
know
the
nfd
community
had
to
tackle
that.
M
It's just not enough to simply provide a cloud-init file to it while the system comes up. So we are trying to get a simple Fedora image to use for node testing, but it turns out a Fedora image is not officially available, so we were wondering if there is a stopgap or intermediate solution where we can host a Fedora 32 image for our node tests and then use it.
D
I
don't
think
about
so
there's
the
people.
I
think
this
is
kind
of
a
mix
of
the
excess
right
so
to
end
to
a
project
and
also
mix
about
the
cncf
beginning,
all
those
kind
of
things.
So,
if
you
don't
mind,
can
we
have
like
a
follow-up
meeting
and
then
we
can
report
back
to
the
signal
here
and
see
what
status,
but
it's
harder
for
us
to
to
have
this
meeting
here
and
and
and
also
people
actually
in
charge
of
that
account
also
is
not
here
so.
D
So how about you schedule a meeting with me, and I'll involve the right parties here: you just add the involved parties from your side, I'll add more from my side, and then we can come back later. So let's take this offline into a separate meeting. Is that okay? We'll get everyone in.
M
I
will
continue
I'll
drop
you
an
email
with
the
same
thread
that
we
are
having
me
and
all,
and
then
we
can.
We
can
talk
over
there
about
suitable
time.
Okay,.
N
Yeah, sure, sure. Before the next meeting I will file the enhancement proposal in the repo.
D
Okay,
so
the
last
one
it
is,
I
think
I
asked
for
approval
and
I'm
going
to
take
a
look
and
I'm
on
that
one.
Okay,
just
I
think
the
x
will
ask
a
pool
and
I
will
take
a
look
at
another
one.
Thank
you
everyone
and
have
a
great
weekend
we're
a
week.
Sorry
about
it.