From YouTube: Kubernetes SIG Node 20210112
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A
All right, welcome everyone to the January 12th Kubernetes SIG Node meeting. Just a reminder that the meetings are recorded and I upload them all to YouTube for those who can't be here at this time, or who forget all the great things we say, but we'll treat everyone well in the meetings together. So, a number of topics on today's agenda.

A
The first topic was an item that I'd put on there with Elana. We want to give some general awareness to maybe some unintended consequences that we saw with how graceful termination works with liveness probes. Elana, do you want to talk through the issue and maybe give some awareness of the KEP you were proposing?
B
Yeah, so I threw up sort of a straw-man KEP for discussion. I didn't want to, you know, fill out a whole full-fledged KEP if people are not interested in this approach. But basically, the summary of the issue is that when we use a liveness probe on a pod and the liveness probe fails, the kubelet will use the pod's terminationGracePeriodSeconds to wait to terminate that thing.

B
And so, if terminationGracePeriodSeconds is set super long, say an hour, because you want the pod to be able to gracefully drain connections and that kind of thing, then if your liveness probe fails it will potentially take up to an hour for the liveness probe to restart the pod. That is not the intended behavior, not desired, and could potentially result in some outages. So there are a number of different approaches we could take to fix this, but probably the most backwards-compatible one would be to say:
B
Well, we want to be able to configure this directly without changing the existing behavior, because it appears that a number of people have documented, and are sort of relying on, the fact that they expect it will take the terminationGracePeriodSeconds to terminate a pod, or a container in a pod, on a liveness probe failure. So that is what I put up in the kubernetes/enhancements repo.
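To make the interaction concrete, here is a minimal sketch of a pod spec like the one being discussed, written with the core/v1 Go types roughly as they looked in the 1.20 era (where Probe still embeds Handler; later releases rename it ProbeHandler). All names and values are illustrative, and the probe-level override mentioned in the comment is only the straw-man proposal, not an existing field.

```go
// Sketch of the graceful-termination / liveness-probe interaction described above.
package main

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func examplePod() corev1.Pod {
	grace := int64(3600) // long pod-level grace period so connections can drain on normal shutdown

	return corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "drainer"},
		Spec: corev1.PodSpec{
			// On a liveness failure, the kubelet kills the container using this
			// same pod-level value, so a failed container can linger up to an hour.
			TerminationGracePeriodSeconds: &grace,
			Containers: []corev1.Container{{
				Name:  "web",
				Image: "example.com/web:latest",
				LivenessProbe: &corev1.Probe{
					Handler: corev1.Handler{
						HTTPGet: &corev1.HTTPGetAction{Path: "/healthz", Port: intstr.FromInt(8080)},
					},
					PeriodSeconds:    10,
					FailureThreshold: 3,
					// The straw-man KEP proposes a probe-level override here
					// (a shorter grace period used only when this probe fails);
					// that field is hypothetical at the time of this discussion.
				},
			}},
		},
	}
}

func main() { _ = examplePod() }
```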
B
It's just a straw proposal. I think I put the link in the notes already; sorry, Firefox restarted, so all of my tabs have not loaded. Yes, so I linked the PR in the agenda, and I don't know if anybody has any specific comments outside of what's already been put on the KEP PR. But, you know, do people think that this is a reasonable approach? There are some other possibilities that we could go for that would not require a KEP and an API change.

B
The API change that I'm proposing is backwards compatible. I know there were some concerns about, well, if we add this field to, you know, the liveness probe, it's going to be on all of the probes, so are all of the probes going to use it, or is it just going to be for the one? And some concerns about maybe having a field for all probes but it only being used by one. So, happy to hear your feedback and comments. It doesn't have to be now; you can also comment on the PR.
A
Yeah, thanks for the summary and the follow-up on that, Elana. I know personally we at Red Hat were just hit by this issue, and so we bear some of the scars of teams feeling an unintended consequence of this type of issue. So I saw, Dawn, you had commented that you had seen similar, and that it might have had some issues in API review. Did you have any particular references to it? Because I hadn't recalled any.
C
I tried to find it before this meeting and I couldn't. I think this is why people started to work around that issue. This particular thing actually hurt GKE a while back, and I also saw people come to the SIG discussing it, so we did suggest supporting the configuration, but without all the detail. Obviously, from the comments and the link that was shared, I can see people are abusing it, using it however they can.

C
Obviously what we also see is that it is being abused. The proposal didn't go through the enhancement process; I think it was a PR directly. If I remember correctly, the PR tried to fix this problem directly, and it was basically just rejected because of the complexity.

C
No, we actually don't, and if the pod in such a scenario hits a liveness probe failure with this config, there's no good way to handle it. So we have to proactively tell the customer, don't do that. But if it is by way of the node shutting down, because we need to drain, we basically set a maximum value and cap it at something like five minutes, which is kind of reasonable.
A
Yeah, so I get that that can happen on node maintenance, where you might have encountered this, but these issues at least struck us when no node maintenance was happening, right? It was like, you have a one-hour graceful termination period to allow requests to drain, or even if it was five minutes, right, you basically double your outage time. So I guess my bias would be: I'd love for us to find a way to solve this broadly in the community, so, like the KEP, Elana, yeah.

A
I'd like us to keep pushing this, and I guess if folks have other anecdotes or operational experience, or workarounds they might have done, let's get them on the KEP. But hopefully we can overcome any API review challenges in just a KEP discussion.
B
I see Jack Francis linked a comment in the chat where I guess this issue was reported as well in December. The issue that I'm working on was a super old one that ended up getting reopened. So, Jack, do you have any thoughts you want to add?
D
Hey Elana and folks, I'm really happy people are talking about this. So basically, my involvement, just to give a little bit of a backstory, was observing the change that went in, I think late December, for 1.20,

D
to fix a long-standing edge case where the liveness probe, and I can't remember if it was for exec or for HTTP, but one of the two, was never working. So that was actually fixed, and it then absorbed the current one-second default, which had the practical side effect of suddenly introducing a one-second timeout default to all the liveness probes out there, because, since it hadn't been working, no one had been declaring that timeout value.
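For context, this is the sort of explicit timeout an exec probe would need once the default is enforced; it reuses the corev1 types from the sketch above, and the command and the 15-second value are made up for illustration.

```go
// An exec probe that declares its timeout explicitly, so the one-second
// default enforced by the 1.20 change discussed above does not apply.
probe := &corev1.Probe{
	Handler: corev1.Handler{
		Exec: &corev1.ExecAction{Command: []string{"/bin/sh", "-c", "/usr/local/bin/check-health"}},
	},
	TimeoutSeconds:   15, // previously ignored for exec probes; now enforced
	PeriodSeconds:    10,
	FailureThreshold: 3,
}
_ = probe
```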
D
I mean, folks who were doing it permissively... I'm sure there are folks who were trying to do it and didn't know that it didn't work, but anyway. So I'd observed this, and basically my point was: let's not push this change, let's roll this back, because this is going to surprise people. And so I've been thinking about ways to fix this. I mean, it's super unfortunate that there's this behavior that everyone assumed was working for several years and was discovered not to be working, which has these side effects.

D
So I'm wondering if we can just increase the... In that issue, the conclusions right now are to perhaps marshal a discussion around just increasing the timeout to something like 30 seconds, because it sounds like what you folks have observed is that people noticed this new timeout and said: wait a minute, I don't want to time out, so I'm going to just throw in some totally crazy high value, which has...
B
There are a couple of things I think possibly getting a little bit conflated here. So, looking through sort of the tree of things, your link there points to the issue, so that was the exec probe. I think in that case exec probe timeouts had just, like, never worked, so you could set them;

B
they just wouldn't be enforced in any way. And I guess I'm not sure how exactly that was implemented, so I don't know if that's like a default or configurable or whatever. In this case, this is the non-exec-probe sort of liveness probes, like HTTP, that kind of thing, where if these fail, they don't have anything configurable at all. They default to using the terminationGracePeriodSeconds that's set at the pod level, and that could be super high for, you know, normal operations.
A
If we determine that you're not live, we shouldn't... The way the graceful termination period was originally defined was basically saying how we shut down something that was healthy, with enough time, and we haven't really approached how to give the right amount of time to shut down something that was already known to be unhealthy. So, yeah, I think that's a good summary, Elana. I guess, maybe as a concrete action item:

A
if folks could review Elana's KEP, and maybe we could see if we can reach some type of consensus in this release window on a plan, I think that would be good, but...
D
I'll definitely, happily support Elana's effort, for sure. Thank you so much.
F
I was wanting to just chime in and say that this kind of goes back to: what does a liveness probe failure mean? Because if it means the application is not running, then we want to kill it as soon as possible, right, like zero termination grace period seconds, right?

F
We want to kill it and start something that will work. So it kind of goes back to what a liveness probe failure means. If it means that the application is not serving requests or not operating properly, then we should kill it immediately and there shouldn't be any timeout; the timeout should always be zero.
B
I think that's not currently defined as one of the alternatives in the KEP, so I could add that. And specifically, I think maybe the way that we would do that is we'd need a feature flag for that behavior, for sure. But if we have the feature flag turned on, it would be like: kill the thing right away,

B
no waiting, ignore the grace period; but in the other case, maintain the default behavior. I guess my question would be, if we feature-flag that: long term, do we want to phase that in as the default, or do we want to keep the two options? That would be my question.
C
To simplify the problem, I kind of really agree with that, and I just want to point out, since I really agree with you and want to emphasize this: I've noticed that people abuse the liveness probe this way. A lot of the time people are using it for things that aren't really liveness, and we saw that, right? That's why we added some other probes, because some people were using liveness as readiness, or using it for some other purpose.

C
So I guess the concern is simply that this change maybe is not backward compatible, but I totally agree with you from the design point of view: if we had designed this correctly initially, then we shouldn't have this problem. We should have fixed this earlier, because I remember this was probably reported a long time back, before we had so much structure and the CRI architecture. We shouldn't allow such a long time, like the terminationGracePeriodSeconds, for a liveness probe failure.
A
So I think, yeah, if we get the option added, maybe we can think of creative ways to make the good choice the safe choice that people opt into. Personally, if I could have opted in on the pod spec to say liveness failure means, you know, true death, that would have been great for our use case. I'm sure anyone else who's running ingress had similar scenarios.
A
So thanks, Elana, for helping push this forward, and we'll try to continue discussion on the KEP. So, in the interest of time, maybe on to the next item. A few of us have been discussing on the Red Hat side what we were interested in pursuing in 1.21, and we want to open discussion to the broader community, you know, on what we wanted to do as a group in 1.21 and try to start shepherding planning. So, Mrunal, you had a doc you wanted to share? Sure.
H
All right, can you see my screen?

H
Okay, so the first item in the list is CRI graduation. While doing this work, we realized that we first have to update the runtimes, like containerd and CRI-O, to support both v1 and v1alpha2, so we can bring it back into the CI and then drop v1alpha2. So we have the KEP merged, and CRI-O is updated. Mike, do you know what's the status on the containerd side? ("Yeah, we're updated as well.") Okay, awesome, then.

H
I think we are ready for the next step, to bring the runtimes back into CI and then drop v1alpha2, so that can proceed, no blockers over there. Any questions or comments on that item?
H
Okay, so the next one is RunAsGroup, and I think here Mike opened a PR and then it just didn't go further. So maybe we can just continue moving that work forward, because it looks like it shouldn't be a lot of work beyond addressing the comments in the PR, I think.

H
Okay, so the next one on the list is node graceful shutdown. David's PR for that feature got merged in 1.20, and the next step there is to test it out, gather feedback, and also have a plan for some kind of an end-to-end test in the CI, so we can move it to beta.
H
All right, so we spoke yesterday, and what we thought is: I don't know if we can do a complete restart in the e2e test, but at least we can do half of it. We can fake the signal that the kubelet gets to do the drain, and then have a special pod that waits till the end of its graceful period to write out a file, and then make sure that file gets written.
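A rough sketch of what that "special pod" could look like, reusing the corev1/metav1 imports from the earlier sketch. The image, paths, and durations are invented; the part that fakes the shutdown signal to the kubelet is assumed to happen elsewhere in the test.

```go
// The pod traps SIGTERM, waits for most of its grace period, then writes a
// marker file to a hostPath volume that the test can check after the (faked)
// node shutdown. Everything here is illustrative, not from an actual e2e test.
grace := int64(30)
hostPathType := corev1.HostPathDirectoryOrCreate

markerPod := corev1.Pod{
	ObjectMeta: metav1.ObjectMeta{Name: "graceful-shutdown-marker"},
	Spec: corev1.PodSpec{
		TerminationGracePeriodSeconds: &grace,
		Volumes: []corev1.Volume{{
			Name: "out",
			VolumeSource: corev1.VolumeSource{
				HostPath: &corev1.HostPathVolumeSource{Path: "/tmp/graceful-shutdown-test", Type: &hostPathType},
			},
		}},
		Containers: []corev1.Container{{
			Name:  "waiter",
			Image: "busybox",
			Command: []string{"/bin/sh", "-c",
				// On SIGTERM, use most of the grace period before recording success.
				"trap 'sleep 25; touch /out/drained; exit 0' TERM; sleep 3600 & wait"},
			VolumeMounts: []corev1.VolumeMount{{Name: "out", MountPath: "/out"}},
		}},
	},
}
// The test would create markerPod, fake the shutdown signal to the kubelet,
// then assert that /tmp/graceful-shutdown-test/drained exists on the node.
_ = markerPod
```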
H
I think that makes sense; having end-to-end tests of the shutdown is important before moving into beta and so forth. Okay.

H
We do have things like memory.low and memory.high that don't map to cgroup v1, so we can play with those in the runtimes first, for example using annotations or something, and then figure out how we can expose those features, or how they map to the pod settings that are there today in Kubernetes, to take this forward. Then the second thing that we can do is see how we can configure the QoS slices to take advantage of these new features, and the third step would be how to use a userspace OOM killer like oomd or something.
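As a minimal illustration of the knobs being discussed (not of any agreed-upon kubelet design), memory.high and memory.low are just files under a cgroup v2 directory, so the open question is purely how pod and QoS settings should map onto writes like this; the path and value below are invented for illustration.

```go
package main

import (
	"os"
	"path/filepath"
	"strconv"
)

// setMemoryHigh writes the cgroup v2 memory.high throttling threshold for a
// given cgroup directory. memory.low (best-effort protection) works the same
// way with a different filename.
func setMemoryHigh(cgroupDir string, limitBytes int64) error {
	return os.WriteFile(
		filepath.Join(cgroupDir, "memory.high"),
		[]byte(strconv.FormatInt(limitBytes, 10)),
		0o644,
	)
}

func main() {
	// e.g. throttle a burstable QoS slice at 512 MiB (hypothetical path).
	_ = setMemoryHigh("/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice", 512<<20)
}
```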
H
Yeah, so I see a lot of complaints from customers around the OOM killer really not behaving well, and the end game really is to get us to a position where we can use custom userland OOM daemons, and also get better behavior from the kernel with the fixes going into cgroup v2.
C
So, Mrunal, I want to ask you: when we talk about cgroup v2, we are not talking about all the resource management, right? Like the new enhancements for memory, the new enhancements for CPU, and the new enhancements on disk, all those kinds of things. We just want to have enough so that the node can support both v1 and v2, and then, based on that, we want to evolve and utilize some v2 functionality. So that's a separate feature request, right?

A
Yeah, okay, because, I'm sorry, I think on the KEP we said the existing KEP covered parity with v1 capability, so anytime we bring in a new v2 capability, I read that as a separate activity.
C
Yes, yes, so I just want to make it clear, because people start talking about too many fancy features, but actually those need to be carefully designed, and the whole memory management, even the QoS, might be better thought about as a whole when we design it. This is just only for the capability, so we can start to support it.

H
Right, so the next one on the list is user namespaces. Is Mauricio or Rodrigo on the call?
H
Okay, so, I mean, there's a KEP open, and there's been a fair bit of review going on; Tim Hockin and others have also chimed in over there. So if we get some agreement, then we can probably start working on phase one, if we reach agreement before the enhancements freeze.

C
Yes, and if I remember correctly, and I need to refresh my memory on this one, for this one there are some small changes required on the CRI API.
H
Yes, yes, yes, those changes.

H
Okay, so the next one is the one covered by Elana, the liveness probe timeout. Then, Dawn, you proposed the ephemeral containers, so I put down Lee as the owner for moving it to beta, and, yes, can you be the approver, Dawn? ("Definitely, yeah.") Okay, okay, awesome.
H
Okay, so the next one on the list is swap. I mean, it's early days; Karen and Elana talked about it, I think, last week, so the best we can do is maybe target a KEP.

C
Are we going to target this one for cgroup v2 only, or do both v1 and v2 have to be supported? Because I remember there's some enhancement there, and the original idea was to try it with v2 and that kind of support. But I want to know more: do we want to just support one, cgroup v2, or maybe both? I think this also needs thought, because I believe the implementation and design would be different. If I remember, the last time I checked it was cgroup v2 only.
H
So the next one on the list is enabling seccomp by default, and we spoke about this just before the break. Sascha and I are working together on a KEP; we will get one ready before next week's meeting, so we have something to discuss, and then we can see. I mean, if we reach agreement, we feel it may not be too much of a stretch to actually target an alpha for this.

H
So the next one I'm gonna hold, because we have a separate topic on the agenda for this item after this one. Then I think there's a couple more items. So there's the sysctls move, graduating it. We don't have an owner for that yet, so if anyone on the call who isn't already signed up for something has the time and wants to be an owner, just let us know, and we can look into moving that forward.
A
Yeah, so on this one, Mrunal, it wasn't until there was some discussion on the Slack channel, either at the end of last year or earlier this year, around sysctls. I would have thought, given the state of sysctls now, I would have been fine just moving it right to GA, but there was some feedback around the user experience when using unsafe sysctls.

A
That, okay, I may want to refresh my memory on, and I don't know if the person who gave that feedback might be on the call or not, but it was more saying that maybe the node should be able to be tainted appropriately based on whether it had unsafe sysctls enabled or not. But we can go back and... okay, okay. Even that, I think, could be handled maybe post-GA of the existing sysctl support. Okay, all right.
H
There, I've got that captured in the notes; we can follow up.

H
So the next one is CRI container log rotation and graduating that, and Urvashi is happy to work on that. I think this was captured in Derek's feature list and we just didn't get to it in 1.20.
C
So now, if we want to enable this by default, I want to make sure other products are also compatible. Back then there was no compatibility issue on GKE, but because of the compatibility with OpenShift, we delayed it. But I know that over time products move forward, not just GKE and not just OpenShift, and there are many products. So I guess this one will have to do some due diligence to figure out that there's no compatibility issue.
B
Yeah, one of the things I have on the agenda for later is a KEP triage thing from last meeting, where I went and did a spreadsheet of all the KEPs, and one thing that I've noticed is people reaching out to me asking, where's my KEP, and it's because there was no issue tracking it. So anybody who has something in that situation, something that they want done but that does not have an issue in the k/enhancements repo, I'm hoping to get that addressed in this release.
H
So the next one on the list is the memory manager, and I think it's awaiting review from Clayton.

H
Thanks, awesome. And the last one on the list is pod resources concrete assignments. Derek, I don't remember enough detail here, if you want to talk to it.
A
Yeah, so this was... the KEP was already approved in 1.20. The pod resources endpoint lets you do things like, for GPUs, understand what devices got assigned to what pods, and the KEP had extended that to understand what CPU sets got assigned to particular pods. I think it wasn't landed in the last release; we had the KEP, but then other engineering distractions prevented us from getting the implementation done.

A
So some of the goal here, I think, was trying to support a longer-term goal of having topology-aware scheduling information fed back to the cluster scheduler, and this was one of the early prereqs. So, yeah, Mrunal had reviewed the KEP, and we'll look at the implementation.
C
The reason I opened this up is that auto-sizing system-reserved, without really handling pod overhead properly and charging it properly... I don't think that solves it; the potential for overload is still there, and so we may have to... sure.
A
The only... the other... so maybe what we could do is, and also a big thank you for helping get this list together and sharing it, maybe as a SIG we could set a goal of giving everyone a week to reflect, and then come back next week and see if there were tweaks or adjustments or gaps that we missed.

H
It makes sense. So today... maybe next week we come back and do a check on each of these and see what we actually want to commit to.
A
Clear, like authors versus approvers type stuff; maybe we can help flesh that out in the week ahead. Yep. But yeah, a big thank you for helping put this together. And the rights to edit this, it should be just anybody in the SIG Node group, right? I think so. Yeah, yeah, okay, all right. If it's okay, we can move on to the next topic.
K
Yeah, I'm filling in for Harshal. So at Red Hat we are seeing quite a few nodes going into NotReady, and on every single node type that we have,

K
we have to calculate the system reserved to try and protect the node from going into those memory-constrained scenarios where the kubelet's not running, sshd's not running, and systemd's not performing correctly. The way that the kubelet does this currently is that you set a system-reserved on it so that the cgroup gets set throughout the system, and currently you have to do that prior to the kubelet starting. So the proposal here is to adopt basically Google's plan, which is that they have an algorithm to calculate a system memory and CPU reservation, and we would like to put this in the kubelet itself.
K
And the way that we are proposing to do this currently is that in an alpha release we would have basically a default profile that contains the algorithm that Google uses. It uses a sliding scale, like four gigabytes of memory gets you 25% set up as system reserved. And then perhaps in a beta or GA release,

K
allow different profiles to be injected into the system, depending on whether there's more demand for this feature. But I think for Red Hat's purposes the scale that Google's using is probably sufficient, and so that's basically the feature that we're proposing here. We can do it outside the kubelet, but that means basically pushing out scripts to all the nodes to be able to calculate this in a uniform way, and we feel that the kubelet's probably the right spot for this.
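For illustration, here is a self-contained sketch of that kind of sliding-scale profile. The bands below follow the commonly cited GKE-style memory reservation (25% of the first 4 GiB, then progressively smaller percentages), but the exact numbers should be treated as an assumption rather than whatever profile would actually ship in the kubelet.

```go
package main

import "fmt"

// systemReservedMemoryBytes applies a decreasing percentage to successive
// bands of machine memory and returns the total to reserve for the system.
func systemReservedMemoryBytes(capacityBytes int64) int64 {
	const gib = int64(1) << 30
	bands := []struct {
		width    int64   // size of this slice of machine memory
		fraction float64 // share of the slice to reserve
	}{
		{4 * gib, 0.25},
		{4 * gib, 0.20},
		{8 * gib, 0.10},
		{112 * gib, 0.06},
		{1 << 62, 0.02}, // everything above
	}

	var reserved int64
	remaining := capacityBytes
	for _, b := range bands {
		if remaining <= 0 {
			break
		}
		n := remaining
		if n > b.width {
			n = b.width
		}
		reserved += int64(float64(n) * b.fraction)
		remaining -= n
	}
	return reserved
}

func main() {
	for _, gb := range []int64{4, 16, 64} {
		fmt.Printf("%3d GiB node -> reserve ~%d MiB\n", gb, systemReservedMemoryBytes(gb<<30)>>20)
	}
}
```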
C
If you trace back in the old issues, I did propose, and talked to the engineers behind it at Google, actually charging this to the node. Even when there's some disk I/O usage, we tried to charge it to the container, charging it to the pod, but that's too dramatic a change, and at the end we ended up with... So basically, that's why we have the proposed maximum pods per node, and that's what we apply today.

C
It is per node, based on the machine size; it's not optimized. Basically, we base it on our production, based on the kernel version. There's also the factor of the kernel: on different kernel versions the usage is actually different, which is unfortunate; the kernel threads you see are different. So I just want to state it correctly: we are not really auto-sizing based on demand. My original idea was basically, before I assign, I will know that this pod, this container, will go onto that node.
C
So how much to reserve would then be dynamic, very easily, because we would pass that node allocatable to the scheduler; you can see all of that in the design. And then scheduling would do its best, because with dynamic change you know the allocatable resources and all those kinds of things, and the system will say, can I accept this job, and do the best it can. We didn't do that in the end.

C
Actually, today it is based on the machine type. We monitor from our kernel version and monitor all those kinds of things, and also different products have different daemons running, beyond the scheduler. So that's why we didn't really need that level. I just wanted to make clear, you know, how complicated it is, and in the original design,
C
what I proposed even included the CRI and also node allocatable and many things. I didn't think about auto-sizing, and it's just hard in the community; even for GKE, a cloud, it's much harder to do. It might be doable, and for some folks using Kubernetes for their internal infrastructure maybe it's way easier, but things like GKE and OpenShift provide services for cloud users, which is much harder.

C
I just honestly share this with you because, for Borg, I needed to propose this many times, but the cost was too big; many years ago the cost was too big, every time you made a kernel change. So that's why, even for Borg, I backed off. I'm just letting you know here.
K
Maybe it would be more beneficial, perhaps, to work on the pod overhead instead of this. What's your thought on that?

C
I do think about the overhead. The reason pod overhead was also proposed a long time back, and we delayed that work, is that at that time we only had the one container runtime, right? We only had runc. The reason pod overhead later gained more urgency is just because of the Clear Containers boom, and of course it's not Clear Containers now, it's Kata, and then there are the device cases and many other kinds of containers.
C
We could enhance a number of things, because we could have changed our infra container a long time ago. When we worked on rkt, I actually did talk to the CoreOS folks, and we could have made our infra container more intelligent. We never got to do those things, because all those advanced features might not be the common use cases for everyone. So that's why we didn't move forward, but we do think about it. In the long run, you need an infra container that can do a lot of other jobs.
A
Yeah, so, Dawn, thank you for sharing the background, at least on Google's experience. When we had researched some of this, I think we looked at what every vendor is trying to do in their hosted space, and I think everyone has different tables to some degree.

A
I think the way I'm trying to think about this, though, and just to maybe softly push back on some of your comments, is that we have heuristics today in the kubelet, like pods per core, right? That is a good example: it's a very rough heuristic, but it's probably used everywhere.

A
Right now, as a rough default, you can punt: what, 10 pods per core or something is our default. And I think what I see is a lot of struggle on even setting it at all, setting system-reserved at all, or setting basically any reservation, so I felt like having some heuristic from the SIG was potentially valuable. Whether or not this particular mapping table is the right table is kind of a different topic.
C
The reason is: how are you going to measure that overhead? Are you basing it on what we introduced, maximum pods per node, to measure it, or do you basically just say, oh, I estimate the average number of pods on each node? How are you going to estimate that, and how are you going to say, oh, 30 pods maybe is my average, and for each pod they have two or three containers running, including those default infra containers?

C
Then you measure that, and you also end up, especially for OpenShift use cases, where the app maybe has some database or something like that running, and maybe they want a single one. They basically tell you: you have reserved too much for Kubernetes, which is only managing one StatefulSet here. And then are you going to override a lot of that? So this is kind of really... This is why I think, a long time back, I saw the easy way as auto-sizing, but auto-sizing, dynamic settings...
C
It's not auto-sizing, sorry, dynamic sizing, based on the assignment, based on the scheduling. But obviously I know I cannot move that over to the scheduler, because that's more change for them, because we really want a simple scheduler, right? For Kubernetes I don't want a more complex, enhanced, intelligent scheduler. But the auto-setting is really hard, because on the node you can only do local optimization, which is not cluster-level.
A
You support a percentage-based value, so you can say I reserve 10% of RAM or something, right? And the thing that strikes me when I look at the table here, and I look at what others have done, is that rather than just having a fixed percentage or a literal value, it's kind of like: if I could provide a function instead, right? Really, what this table is showing is just a plot, and if we could provide a syntax that allows somebody to provide that plot, I think it makes things a lot easier for how you manage nodes across a wide variety of footprints, right?
C
I agree with you. I do have, I think, a similar proposal captured in one of the really old issues, like what's the formula we can use. We can carry on this discussion. I just want to say that if we really want to make this work, you have to... if this is just guidance for the community, maybe that's fine.
A
On Ryan's point about maybe pod overhead versus this: I think that's a false choice. If we changed this, instead of saying auto system-reserved sizing, to say formula-based reservations, and still left it easy to supply the formula, I think we can get the same results, and people can then figure out the right formulas in their production environment. But it's unquestionable to me that this is a plot of a graph, and people figure out, for their sizes, how to best fit a line to that.

A
So I don't know; maybe we can take that feedback and adjust. The other feedback, and I don't know, Dawn, if you had experience on that you wanted to share, was whether we wanted to do reservations for other resources aside from memory. So I think Mrunal and I discussed PIDs, and I forget the other one... ephemeral disk, yeah, thank you. But just looking at this, at least, Ryan, I feel like having a function would go a long way. Okay.
A
Yeah, good point, Elana.

B
Yeah, he asked me to cover for him, because I think his kid is sick. I can cover that now, if we want to cover that next.

A
Yeah, if there's a 1.20 regression, now's the time, yeah, yeah.
B
It's a regression from 1.20. So the issue is linked there, and I think both Sergey and I have spent some time looking into this.

B
Basically, there was a change in 1.20 which was actually something that had previously been reverted because of performance issues, but we are finding that, with this introduced again, we're still having a lot of performance issues on pod deletion. This is causing quite a lot of test flaking, as well as, just generally, pods taking a lot longer to delete, and there are concerns that that might have production implications above and beyond the test flaking.
B
So I know that we've had, I think, Paco Xu suggest a possible patch to fix this. But I looked at it, and it adds a lot more complexity to the kubelet logic, and I'm not convinced it will actually fix it. So Sergey is suggesting we just revert it, and I think I also agree with that, but I wanted to get other folks' feedback on this. It's a really sticky thing that has to do with the sandbox cleanup.
A
It's always a recurring topic, whether we've discovered we're leaking something or we weren't. Seth, did you ever look at the original ones of these in the past, whether we were leaking sandboxes or not? Well, I'm thinking back to...

A
I don't know if David's here, but we had issues where we were wanting to make sure the pod is actually gone before we deleted the thing, and I know at least, and I could be conflating issues, Elana, I'm sorry, but last year both Seth and I spent a fair bit of time helping various engineers explore whether things were deleted or not deleted. My memory at the time was that there was some concern that sandboxes weren't always, for sure, being deleted or not, but my memory could be poor, so I'll have to track that down.
C
I don't know the original problem; I don't think we discussed it at the SIG meeting, or I saw it but didn't look into it. But I'm totally okay with the revert, to avoid the potential production regression, and then we can have more time to figure out how to properly fix the original issue.

B
Yeah, I'll sync with him, and then either he or I or someone else can submit the PR for that.

A
Okay, so we have one minute left, and apologies for the full agenda.
A
If it's okay, the items we didn't get to will move to our next agenda, and if some of them were items that we wanted to explore, it looks like checkpoint/restore, we should probably get them on the 1.21 plan doc. But a big thank you to everyone helping make the meeting so productive today. So I will talk to you all later.