From YouTube: Kubernetes SIG Node 20190806
B
Great. I had filed the issue and it's relatively straightforward. Basically, today eviction is done at pod granularity, but the first thing it does when checking requests versus actual utilization is: the requests are the sum of the requests of each individual container, and the actual utilization is the sum of the usage of the containers in that particular pod. Those two are compared, and based on that you decide whether the pod is a good candidate for eviction.
B
It might be going through the ListPodStats path, where it does a mix of CRI and cAdvisor, and all of that just to sum up each of the container requests. It seems like it would be quite a bit simpler if you just went ahead and looked at the pod cgroup. Having said that, I don't see anywhere that we do this today, so I'm not sure whether it's controversial from, say, a performance or caching standpoint. I don't want to just go ahead and submit a PR without understanding that.
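To make the per-pod comparison described above concrete, here is a minimal, self-contained Go sketch of the idea; the types and helper names are hypothetical stand-ins, not the kubelet's actual eviction code.

```go
package main

import "fmt"

// ContainerInfo is a hypothetical stand-in for the per-container data the
// eviction logic works from: the declared memory request and the observed
// usage, both in bytes.
type ContainerInfo struct {
	MemoryRequestBytes int64
	MemoryUsageBytes   int64
}

// PodInfo aggregates the containers of a single pod.
type PodInfo struct {
	Name       string
	Containers []ContainerInfo
}

// exceedsMemoryRequests mirrors the comparison described in the meeting:
// sum the container requests, sum the container usage, and treat a pod whose
// usage exceeds its requests as a better eviction candidate.
func exceedsMemoryRequests(p PodInfo) bool {
	var request, usage int64
	for _, c := range p.Containers {
		request += c.MemoryRequestBytes
		usage += c.MemoryUsageBytes
	}
	return usage > request
}

func main() {
	pod := PodInfo{
		Name: "example",
		Containers: []ContainerInfo{
			{MemoryRequestBytes: 100 << 20, MemoryUsageBytes: 150 << 20},
			{MemoryRequestBytes: 200 << 20, MemoryUsageBytes: 120 << 20},
		},
	}
	fmt.Printf("%s exceeds requests: %v\n", pod.Name, exceedsMemoryRequests(pod))
}
```

Reading the pod-level cgroup, as suggested in the discussion, would replace the usage summation here with a single pod-scoped stat read; the request summation would stay the same.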
A
I just updated the issue there and provided some of the legacy reasons why we didn't do that. In the original QoS implementation it was per container and also per resource. Then later we unified it, so it's no longer per container; it actually became per pod and for all the resources. But at that time we wanted to have something simple, and there were also other considerations.
A
Also, the next reason is that when we first designed the container runtime interface, there are certain things, like logging, that we want to charge to the pod instead of attributing them to the daemon, because those are non-deterministic resource usages. So we have been trying to really move in that direction. It's analogous to what you are working on with pod overhead: it used to be a fixed amount, kind of a default for CPU and memory.
A
And later we wanted to implement different container runtimes, and we believed the overhead of the different types of container runtime would vary a lot. So that's why, in the end, we didn't tie it to the pod. But in the SIG Node meeting we actually talked about it many times, and once we finish all this work — introducing the pod-level cgroup and unifying all those kinds of things, including pod overhead — it will be much clearer at the pod level.
B
I was looking at the sync function in the eviction manager — I just circled back. Well, if it's fixed, then everything is straightforward and quite easy. I thought it was still going through ListPodCPUAndMemoryStats, but I think I was wrong on that, so if that's not the case, that's great.
E
Just, I think in general, if it's not doing that for memory, it's a bug. I think it's possible there could be code in here that is summing things only for non-memory resources, so depending on where you've looked, maybe some of that is happening, but it should not be getting used for memory. Yeah.
I
So, a little bit of context around this KEP. In certain contexts Kubernetes is a bit overprotective around stateful pods. One of the examples is the case of a node that is shut down: currently the control plane cannot know whether the kubelet has crashed or the node is totally shut down. A while ago we added a call to the cloud provider interface that queries the cloud provider and returns whether the node is shut down. So this KEP tries to leverage that information from the cloud provider, make certain assumptions, and forcefully delete the pods so that they move to another node. The design relies on the node Lease object. Basically, whenever we see that a node is shut down, we try to grab a lock, which is the node Lease: we set the holder field to the control loop that will forcefully evict the pods. We do this to handle the case where the kubelet comes back up and starts the containers again, which in some cases would lead to data loss. So the idea is to add code in the control plane that grabs the lock, and code in the kubelet that, on startup, tries to update its Lease; if that succeeds, it means no one is holding its Lease, meaning no eviction is going on.
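As a rough illustration of the locking idea described here, the sketch below uses the real coordination/v1 Lease type, but the holder identity and helper are hypothetical; the actual acquire/renew mechanics proposed in the KEP are not shown.

```go
package main

import (
	"fmt"

	coordinationv1 "k8s.io/api/coordination/v1"
)

// shutdownEvictorIdentity is a hypothetical holder identity the control-plane
// eviction loop would write into the node Lease while it force-deletes pods.
const shutdownEvictorIdentity = "node-shutdown-evictor"

// evictionInProgress reports whether the node Lease is currently "locked" by
// the control plane, i.e. whether the kubelet should hold off starting
// containers until the lock is released.
func evictionInProgress(lease *coordinationv1.Lease) bool {
	holder := lease.Spec.HolderIdentity
	return holder != nil && *holder == shutdownEvictorIdentity
}

func main() {
	holder := shutdownEvictorIdentity
	lease := &coordinationv1.Lease{}
	lease.Spec.HolderIdentity = &holder

	if evictionInProgress(lease) {
		fmt.Println("lease held by eviction controller: do not start containers yet")
	} else {
		fmt.Println("lease free: safe to start containers")
	}
}
```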
A
That's dangerous. So the kubelet acquires the lease at startup, and if there is no shutdown but something else is just wrong in Kubernetes — because today the kubelet updates the lease periodically — if there is one time it cannot successfully renew the lease, then all the running pods are going to be killed, right? Yes.
I
What we were trying to do at first was another approach, like relying on taints and implementing some lock semantics around them, but it felt error-prone. That's why we took the approach of using the Lease object. But if we want to make this happen, we would still need some way to enforce, at the node level, that no container starts up.
E
I think I'm a little anxious about this one. If we were to do that — we have plenty of users that deploy workers that may lose connection to their control plane, and they expect that the workloads running on those workers will keep running even when they're not able to phone home. So if this is tied just to the lease and the kubelet not being able to renew its lease, we have plenty of users that would expect the workloads to continue to run for the time being, and I think that would be rather problematic.
E
You know, if there is no underlying IaaS underneath the kubelet, the pods aren't actually removed until the kubelet wakes up, can phone home, and acknowledges the delete, or the admin has figured out how to manually force that deletion and make it safe. But we can't just have the kubelet delete things because it lost its lease, because there are plenty of environments running bare-metal hosts that expect those workloads to keep running. Yes.
M
Yeah, and the cloud provider bits — we're requiring the cloud provider parts right now because that is the only way we know of right now to deterministically decide whether a node was intentionally shut down. Because, like we mentioned before, we can't evict pods if a node temporarily just lost a heartbeat for a minute or something, right? So the cloud provider check essentially gates that control loop where we start evicting stateful pods.
E
My original understanding when this was proposed was that in a cluster with a provider-enabled control plane, a running controller would distinguish a node shutdown from a network separation, because it has access to the underlying IaaS state, and it would label or taint the node saying "I'm shut down". Then, once it was shut down, I thought we were going to force evictions from that node, and then just work through the issue of what happens when the node is restarted and comes out of maintenance, whatever was happening on that node.
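A minimal Go sketch of the taint-based variant being described: the controller marks the node, and other components only act on nodes carrying that mark. The taint key shown is an assumption (the cloud node lifecycle controller uses a shutdown taint along these lines, but check the KEP for the exact key).

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

// Assumed shutdown taint key; the real key set by the cloud-provider-aware
// controller may differ.
const shutdownTaintKey = "node.cloudprovider.kubernetes.io/shutdown"

// isShutDown reports whether the controller has marked this node as
// intentionally shut down.
func isShutDown(node *corev1.Node) bool {
	for _, t := range node.Spec.Taints {
		if t.Key == shutdownTaintKey {
			return true
		}
	}
	return false
}

func main() {
	node := &corev1.Node{}
	node.Spec.Taints = append(node.Spec.Taints, corev1.Taint{
		Key:    shutdownTaintKey,
		Effect: corev1.TaintEffectNoSchedule,
	})
	fmt.Println("node marked shut down:", isShutDown(node))
}
```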
M
The problem is that a down node doesn't result in the Node object being deleted, so the running pods on the node aren't deleted. Then you have issues with volumes that are mounted on that node that don't get detached and attached to another node. That's where the coordination with the kubelet needs to happen.
A
It's more about the state of the node and how it's running. For an intended shutdown — if it is intended — the admin can do all the usual things: cordon the node for shutdown, drain the node, all of that. The problem is the other side, where the node is just gone. In production there is a lot of customer behavior we don't know about, like a kernel panic, and today on a lot of providers that node will just come back up.
A
We can configure the kernel panic behavior and whatever else, and we want the node to come back with everything still serving. That serves the majority of customers — at least the batch workloads, because they are going to continue serving and continue finishing their work — but on the stateful side, we totally agree there is a problem. My question is that the proposal right now is just like a big hammer.
A
Maybe the node has no stateful pods at all, or maybe it has one stateful pod, but just because the lease acquisition fails temporarily we cannot get it, and we end up holding back the whole node — not just the stateful pods. And the proposal actually applies to the entire node lease lifecycle: even if you are not shut down, if the kubelet is just failing to renew its lease, the proposal as written basically kicks in.
A
Okay, and no matter when — because right now we cannot tell whether it is the kubelet first starting at boot-up time, or the kubelet just restarting, or whatever else happened — we apply the same rule, and then we basically end up with: any time this kubelet cannot acquire or renew its lease, all the pods on that node will be terminated. If we could limit it to only the boot time, we could talk about it more, I think, right?
K
So, as I understand it: a node shuts down and we start deleting pods; the StatefulSet will create the new pod, but due to a race condition the shut-down node comes back and starts recreating all the containers that it had before shutdown, and then you end up with these conflicting writers, yeah.
E
The thing is, Patrick, that would break the ordering-guarantee semantics around StatefulSets. That's where you have to allow for ordered start-up and ordered shut-down, and I think, in the end, that has to be enforced at the actual StatefulSet controller layer. I don't think we can get around that.
M
What I mean is, when the kubelet starts up and it has to do the lease acquiring, it only starts that process if it sees a taint on itself already — because the cloud provider would have set it — rather than having the kubelet always acquire the lease regardless of what happened before it started. — Well, we always acquire the lease, because the leases are the heartbeat.
E
So if I'm running a bare-metal environment today and my worker has lost connectivity to the control plane, and I've been running in this fashion for, I don't know, twelve releases, and we upgrade and say, "hey, your pods that used to keep running will no longer run, because when the kubelet restarted you didn't have the node-shutdown taint toleration" — aren't we kind of breaking backward compatibility there?
E
But basically, are you saying that if the kubelet couldn't acquire its lease it should tear down all workloads on that node, even though the kubelet in that case doesn't even know why? Or are we saying that only if the kubelet is configured with an associated cloud provider will it do this check on startup? And what happens when the kubelet is configured with an external cloud provider?
A
Even on GKE and GCE I don't want this behavior, because we do have customers who want their pods running after the node comes back. Or maybe there is a network partition — the kind of partition that is only between the control plane and the worker, where the service itself is still accessible. We don't want those kinds of things to just be forcefully evicted or shut down. And this is also a separate, somewhat controversial point: long term we actually want the kubelet to keep acting even when the API server has died — the kubelet would continue serving. Next we are talking about checkpointing and bringing the pods back on the node, so the node could keep serving even if there is a network partition between it and the control plane. So this proposal is kind of at odds with what we have been talking about there.
E
I think that's independent. Okay, I kind of want to reinforce what Dawn is saying here, which is: to me, it's a non-starter that a kubelet needs to stop workloads if it can't talk to the control plane. We have plenty of real-world applications where people are running applications on worker nodes, which might have their own dedicated routers that can access their content.
E
Even if you lose network connectivity to the control plane — real-world systems might come down because of a small network fault between worker and control plane, which has nothing to do with ingress to that worker from other parts of the world. So I'm pretty strongly against the idea of the kubelet just shutting things down because it lost access to the control plane, with such a broad hammer, right?
E
If we can tighten the hammer, maybe I'd feel differently, but even then I'm not sure and I'd have to reason through it a little bit. And I say this entirely empathetically, because I also get the counter customer complaints, which are: why isn't my workload being rescheduled? Because I want this PV to be attached to another node.
A
And please also take a look at what Patrick suggested earlier. What we can think about is potentially handling this at the cloud provider layer, or at the storage layer — something more specific instead of something broad like this one. Okay, thank you. We can come back to this — please come back if you have an alternative approach, and just update your KEP. So let's move to the next one. The next one, I think, is what you asked about regarding the code organization for kubeadm — what do we think about it? Oh, it's —
I
I'm the one that added it. We're trying to move kubeadm out of kubernetes/kubernetes, and right now the node end-to-end tests import the validation code from kubeadm. So our suggestion was to move the validators out of kubeadm into a separate repo. We wanted to check with you: what do you think about it, and does anyone have a strong opinion on this?
C
The kubeadm code — I'm not sure what it's for. Do you know what it's used for?
I
So I think it's mostly the Docker validator that is used, but I didn't get through all of the code. There are definitely some parts that are used — the Docker validator and maybe some others. So there are two possibilities, I think: one is to refactor the tests to not rely on these, and the second is to make it a shared library for validation and move it out of tree into a separate repo.
P
Okay. Last week we discussed this, and I think we stopped with the recommendation to take some time to look at the proposal. In particular, I wanted to focus on the discussion around the source of truth for the resources that are allocated — that the kubelet has agreed to. I have two links in there which analyze the cases where a single pod, and then multiple pods (two or more), get updated during a kubelet restart.
P
So the potential solution here is to rely on the resource version for sequencing — to determine which update came first and honor that. That way we do first-come, first-served even during a restart, and we can tolerate it. I think this was a concern David had regarding how we handle restarts — that one update doesn't get prioritized over another. So, if possible, could you please take a look at these two analyses and see if you can poke any holes in them?
P
If there are, maybe I missed something. The point is to make a decision on which one we want as the source of truth. I'm leaning towards node-local storage; I know there are people here who would not prefer that and would rather put it in the pod spec, and either way is fine with me. I just feel that node-local storage is simpler, and in a way we are already doing that when we come back up and determine what pods are running.
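To make the node-local-storage option concrete, here is a hedged Go sketch of what a kubelet-side checkpoint of allocated resources could look like; the file name, structs, and field layout are hypothetical illustrations, not the actual proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// AllocatedResources is a hypothetical record of the resources the kubelet
// has admitted ("agreed to") for one container, keyed by resource name,
// e.g. "cpu" -> "500m", "memory" -> "256Mi".
type AllocatedResources map[string]string

// PodAllocation is the per-pod checkpoint entry: container name -> allocation.
type PodAllocation map[string]AllocatedResources

// Checkpoint maps pod UID -> its allocation record.
type Checkpoint map[string]PodAllocation

// save writes the checkpoint to node-local storage so the kubelet can
// recover its source of truth for allocations after a restart.
func save(dir string, cp Checkpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	return os.WriteFile(filepath.Join(dir, "pod_allocations.json"), data, 0o600)
}

// load reads the checkpoint back; a missing file simply means no
// allocations have been recorded yet.
func load(dir string) (Checkpoint, error) {
	data, err := os.ReadFile(filepath.Join(dir, "pod_allocations.json"))
	if os.IsNotExist(err) {
		return Checkpoint{}, nil
	}
	if err != nil {
		return nil, err
	}
	cp := Checkpoint{}
	return cp, json.Unmarshal(data, &cp)
}

func main() {
	dir := os.TempDir()
	cp := Checkpoint{
		"pod-uid-1234": {"app": {"cpu": "500m", "memory": "256Mi"}},
	}
	if err := save(dir, cp); err != nil {
		panic(err)
	}
	back, _ := load(dir)
	fmt.Println("recovered allocations:", back)
}
```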
E
I have not had a chance — I mean, okay, having just been acquired by a huge company, I had to be distracted looking at some other things, so I will try to get to it this week, but I want to apologize. Previously I was able to get things upstream; I tried to get the easy ones in first, and for the hard ones I don't know if I have the mental capacity at this exact moment. Hopefully next week I can get back to it. Yeah.
P
If we can resolve this — and I don't know how many more issues we need to drill down into and identify in order to close this out and get it approved — it will help me. If it gets approved, then I can go back to management and say: you know what, I need budget, I need more people, and this is something we are driving. So it's coming to that at this point.
P
Okay, but with the new design I think we've minimized the amount of change that we need to make, so hopefully the risk is very well contained. But it's definitely worth taking a close look: I know this is a pretty invasive change to commit to at this point, so it's definitely worth examining every aspect of it, and if we can get to 80% confidence, or even something around that, then I'll feel good about going forward with the implementation, getting resources for this, and doing it.
P
So, David, did you have any other comments regarding the source of truth? Any thoughts from looking at the analysis? I know I posted the analysis for the two-pods case, but this is something we can look into: the resource version can be used for sequencing; we just need to fix the kubelet's caching of the resource version.
P
We would then be relying on status being the source of truth, and that's where we have a disconnect with the API conventions. That's why I did it that way before. I just don't know if we should rely on it, given that it's quite clearly stated in the API conventions doc that this can be lost and should be regenerated. Yeah, I think the —
C
— the requests and limits of containers. So I'm not suggesting changing that, but what I am suggesting is that it might be worth looking into, at admission time, when we're admitting pods that are already running, doing so based on the actual resources that have been allocated to the containers rather than the desired ones in the spec.
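A small Go sketch of the admission-time idea just described: when re-admitting a pod that is already running, count the previously recorded allocation against node capacity rather than the (possibly resized) spec. All names here are hypothetical illustrations.

```go
package main

import "fmt"

// Resources is a hypothetical requests record: resource name -> quantity string.
type Resources map[string]string

// effectiveRequests returns what admission should count against node capacity:
// the previously allocated resources for a pod that is already running, or the
// desired spec requests for a brand-new pod.
func effectiveRequests(specRequests, allocated Resources, alreadyRunning bool) Resources {
	if alreadyRunning && allocated != nil {
		return allocated
	}
	return specRequests
}

func main() {
	spec := Resources{"cpu": "2", "memory": "4Gi"}  // desired (after a resize request)
	alloc := Resources{"cpu": "1", "memory": "2Gi"} // what the kubelet actually granted

	fmt.Println(effectiveRequests(spec, alloc, true)) // re-admitting a running pod
	fmt.Println(effectiveRequests(spec, nil, false))  // admitting a new pod
}
```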
P
Okay, yeah. Please take the time to review the comments and the overall KEP before the next meeting. I know that much of the discussion since the last update has been going on in the comments, so if there are any questions, please ask on the public thread and I can answer any specific questions. I know it's a lot to dig through.