From YouTube: Kubernetes SIG Node 20210831
Description
Meeting Agenda:
https://docs.google.com/document/d/1j3vrG6BgE0hUDs2e-1ZUegKN4W4Adb1B6oJ6j-4kyPU
A: Welcome everyone to the August 31st node meeting. The meeting is recorded and will be uploaded to YouTube shortly afterwards. Light agenda today. I think the first item Kevin had put up was around getting an item tracked in the KEPs, and I know Dawn and I got it labeled with the right milestone, so I think that's settled.
B: Yeah, sure. I've been doing profiling and CPU and memory usage analysis on the node side, and one thing I noticed is that in CRI-O we use a parallel gzip library to speed up image pulling and extraction. That library ends up allocating bigger buffers to speed things up, so you get roughly a 30% improvement in wall-clock time compared to using the default 32 KB buffers, but the problem we run into with something like that is:
B: If you don't limit the number of concurrent image pulls, you end up with huge spikes in memory. So I'm trying to find the right balance here. One thing we considered was switching away from the library, since on some devices we don't care if it takes longer to pull images. But even if, say, I want to use that library to speed up my image pulls while also staying under a certain memory limit, it might make sense to limit the number of concurrent image pulls that I perform.
B: So we have a serialize option in the kubelet, and then we have one that says don't serialize, which I'm assuming does everything concurrently. I want to bring up whether we ever discussed having a limit on the number of image pulls we can do concurrently in the kubelet, and whether it makes sense to add one, so we can put a cap on it and keep our memory usage predictable.
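A minimal sketch of the cap being discussed here, a tunable sitting between fully serialized and fully parallel pulls, using a plain buffered-channel semaphore. It is illustrative only, not the kubelet's actual image manager, and the names maxParallelPulls and pullImage are made up for the example:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pullImage stands in for a real CRI ImageService pull call.
func pullImage(name string) {
	time.Sleep(100 * time.Millisecond) // simulate network and extraction work
	fmt.Println("pulled", name)
}

func main() {
	images := []string{"app:v1", "sidecar:v2", "init:v3", "base:v4", "tool:v5"}

	// maxParallelPulls is the hypothetical tunable being discussed:
	// 1 behaves like --serialize-image-pulls, a large value approaches
	// today's unbounded parallel puller, and anything in between caps
	// the memory spike from concurrent decompression buffers.
	const maxParallelPulls = 2
	sem := make(chan struct{}, maxParallelPulls)

	var wg sync.WaitGroup
	for _, img := range images {
		wg.Add(1)
		go func(img string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a pull slot
			defer func() { <-sem }() // release it when the pull finishes
			pullImage(img)
		}(img)
	}
	wg.Wait()
}
```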
B: Yeah, CRI-O can do it, but the problem there is that we'd then have to deal with the kubelet asking CRI-O to pull an image, CRI-O saying yes and then not getting to it, and just additional back and forth between the kubelet and CRI-O. I feel it might be cleaner to do it on the kubelet side, because the kubelet already controls this and we already have a serialize flag. So on top of that we could say: do five pulls, or do six pulls.
B: In any case, you're going to saturate your network beyond a certain number and you won't be able to pull any more images, so having such a tunable may be helpful, and especially useful in low-memory, edge-node kinds of scenarios.
A: If I recall, on the size of the queue that could build up for the parallel image puller, I was trying to refresh my own memory on this, and I don't think we had anything that capped that queue.
E: You talked about the memory usage being higher, right? If we add some limit, are you talking about just limiting the concurrent number, or are you proposing to add some resource limit so that the concurrent image pulls cannot use more than a certain amount of memory?
B: Yeah, so my goal ultimately is that I should be able to come up with a number: hey, I am doing this many image pulls, I am running this many pods and this many containers, and for a particular version of Go my memory usage should be predictable. I think with the changes going into the Go runtime it may be hard to do that with just a memory limit, but we can have these knobs and then, for each version...
B: ...we test and publish, and at some point it will stabilize when the Go VM doesn't change. Recently I saw some numbers that Go 1.17 reduces RSS usage drastically. We haven't tested with that yet, but it may help. The problem I have right now is that we don't have a cap, and because of that, any kind of system reservation or any alerts we set up in that area are meaningless: if you have a spike in the number of image pulls, that number could go up, and then your Go VM is going to...
B: ...hang on to that memory for five or ten minutes, and the customer is going to get an alert. So I want to put a cap on that, so that at least I know from my testing that CRI-O will never exceed this number, or I can go and change the code because 1 MB is too big and I want my buffers to be half or a quarter of that size. Then I can trade off how long my image pulls take versus how much memory and CPU they use.
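For reference, the widely used parallel gzip package in the Go ecosystem is github.com/klauspost/pgzip; the transcript does not name the library CRI-O uses, so treating it as pgzip is an assumption. A small sketch of the block-size versus parallelism trade-off described above (the numbers are illustrative, not CRI-O's actual settings):

```go
package main

import (
	"io"
	"log"
	"os"

	"github.com/klauspost/pgzip"
)

func main() {
	f, err := os.Open("layer.tar.gz")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// NewReaderN lets the caller pick the per-block buffer size and how
	// many blocks are decoded ahead. 256 KB x 4 keeps peak memory per
	// stream around 1 MB instead of the library's larger defaults; the
	// exact numbers here are illustrative, chosen to show the trade-off
	// between pull/extraction speed and memory.
	zr, err := pgzip.NewReaderN(f, 256<<10, 4)
	if err != nil {
		log.Fatal(err)
	}
	defer zr.Close()

	// Real extraction of the tar stream would happen here.
	if _, err := io.Copy(io.Discard, zr); err != nil {
		log.Fatal(err)
	}
}
```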
E: I see, yeah. I think it makes sense to me, but I just want to mention one thing: I think in containerd there's also another concurrency configuration, which is, for each image, how many layers.
A: I mean, I have no conceptual issue with it. It's just the size of a channel that we're filling in the current parallel puller, so we could.
A: Awesome, thanks, Ronald. Next item here; oh, the agenda's growing.
H: So, just to provide a little bit of context on the issue and what the idea behind it was:
H: The original ephemeral containers KEP mentioned that, so cluster administrators could identify pods that had ephemeral containers created, a new pod condition would be added for that pod, and through that condition it would be recognized that the pod had an ephemeral container created. That wasn't implemented so far, but hopefully in 1.23...
H: ...we're hoping to try and implement that. What I wanted to get feedback on: if you go through the issue, there is discussion around two major themes. One is whether we should add the pod condition only to pods that have ephemeral containers created, or whether the pod condition should also apply to pods that have had kubectl exec run on them. So what I basically wanted feedback on is: under one pod condition...
H
Do
we
cover
both
of
these
cases
or
do
we
not
need
a
port
conditioner
for
keyboard
legs?
I
can
just
work
on
small
containers
on
this
new,
like
added
port
condition
or
like
along
along
those
lines
along
those
lines.
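For context, the condition described above would be an entry in the pod's status.conditions list. Since, as noted later in the meeting, the KEP had not actually named the condition, the condition type used below is purely hypothetical:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Hypothetical condition type: the KEP discussed in the meeting had
	// not settled on a name, so "EphemeralContainersActive" is made up here.
	cond := corev1.PodCondition{
		Type:               corev1.PodConditionType("EphemeralContainersActive"),
		Status:             corev1.ConditionTrue,
		LastTransitionTime: metav1.Now(),
		Reason:             "EphemeralContainerStarted",
		Message:            "an ephemeral container was created in this pod",
	}
	fmt.Printf("%+v\n", cond)
}
```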
A: Yeah, I was just bringing up the ephemeral containers KEP to refresh my memory.
A: I know, at least speaking from our experience at Red Hat dealing with users of Kubernetes, that we have a number of users that...
A: ...want to proactively disable the usage of exec. Those same users would probably also proactively disable the usage of ephemeral containers, for their security posture.
A: The one thing that's giving me pause, maybe speaking exactly to your question, is the other KEP that's put forward right now to handle container notifications.
A: Okay, just looking at the KEP: we didn't actually name the condition in the related part of the KEP.
A: Okay, if we look back on the...
A: ...notification API KEP, which I'll paste in the chat. Maybe we would also want to think about whether there would be a condition tied to that, because that's basically a way of doing execs without the end user initiating the exec.
A: All right, and then it looks like we have one other item on the agenda.
F: Yeah, so we were experimenting on Windows with running the sandbox container as different users, to match Linux functionality, and we broke something. We know what we broke, but we saw an interesting behavior come out of it that we wanted to double-check here: if the sandbox container fails to start, the pod stays in a Pending state, and you can look and see the errors.
A: The kubelet should still destroy that pod, but I think pod phases generally were a tough thing to think through here, in the sense that there isn't a great phase that a pod in that state could go to, and it was kind of a fixed state machine. So, Mark, when you were looking at this, was there something you thought made more sense for it to be in?
A: I would expect it to stay Pending forever. We'd have the same issue if the CNI hadn't yet been deployed on that node, where the sandbox also wouldn't create.
A: What was the reason, though; what were you expecting in the situation you're exploring? That the sandbox would eventually successfully create?
F
No,
there
was
an
issue
in
the
way
that
we
were
trying
to
to
start
the
sandbox
where
it
would
never
create-
and
we
also
do
see
this
occasionally
so
with
on
on
windows.
Each
like
the
container
image
needs
to
be
paired
to
the
container
like
to
the
os
that
it's
running
on.
So
each
new
version
of
windows
that
comes
out.
F
We
need
to
add
a
new
image
to
the
new
container
image,
to
the
pause
image
that
we
publish
or
that
we
build
and
release
out
of
kubernetes,
and
we
have
seen
in
the
past
that,
when,
if
users
update
the
os
versions
and
don't
take
a
new
pod
or
don't
take
a
new
pause
image
that
contains
a
container
image
that
matches
that
the
the
sandbox
image
won't
start.
And
then
you
get
into
the
same
state.
A: Yeah, so the behavior, I think, is as intended, and maybe this is a reminder of it. I think pod phase is probably identified as one of the mistakes of Kubernetes versus the use of conditions generally: we can't say that it's running, we can't add a new state, and we can't destroy the pods. So I think, unfortunately, where you're at right now is that our hope would be that the sandbox gets created successfully, or, if something else had to go and reap these pods...
A: ...they would have to do that themselves.
F: Okay, yeah, we weren't necessarily looking for a change of behavior; we were just wondering if there was intention behind it. It sounds like there is.
I: This is actually not a new topic; my question ties to the previous one. Would it make sense to also combine that with the problem where the runtime is not able to create a sandbox, so the pod needs to be evicted from the node and rescheduled somewhere else? Or, in this particular case, would rescheduling onto another node not help?
I: My question is about the scenario where sandbox creation failed: would it help if this pod were evicted from the node and rescheduled somewhere else?
A: Alex, I mean, I think generally there's no shortage of higher-order things that could go and write a controller to reap that pod and hope it gets rescheduled elsewhere.
A: Yeah, I just don't know if the decision the kubelet would choose to make would always be right. Take the CNI case: let's say you're running a single-node Kubernetes, and when you do lifecycle maintenance on that single node you don't drain your workload, because there is no other node for it to go to; you just restart your box. The behavior right now for the kubelet is that it will just try to restart all pods that had previously been scheduled to it.
A
That
said,
they
expected
a
cni,
and
if
we
have
a
situation
where
we
try
to
start
a
pod
that
the
cni
wasn't
present
yet
because
the
cni
itself
hadn't
had
its
statements
that
launched
like
you,
wouldn't
want
to.
I
Yeah,
I
understand,
with
cni
case
I'm
more
worried
about
this
infrastructure
containers.
What
mark
mentioned
is
which
is
a
bit.
A
More
problematic,
but
like
the
usage
of
a
pause
container
itself,
for
example,
is
not
uniform
across
all
run
times,
so
cryo
itself
doesn't
always
start
a
pause
container
depending
on
what
the
the
the
pod
needed.
So
it's
kind
of
opaque
to
the
keyboard.
At
that
point,.
I
Yeah
yeah
and
what's
was
the
background
of
my
question
like,
should
we
have
a
scenario
with
runtime
returns
where
which
says
like
regardless
how
much
you
try
it?
I
can't
run
that
spot
on
this
note.
I: Well, maybe a hypothetical example: let's say we have a VM-based runtime, and a pod is scheduled with a RuntimeClass set for this VM-based runtime, but virtualization is not enabled on the node. So regardless of how many times you try, the hypervisor will say: sorry, VT-x is not enabled on this node, I cannot start it properly.
A
So
renault
you
had
looked
in
the
past
on
maybe
enriching
error
handling
responses
from
cri
to
cubelet,
maybe
alex.
If
you
had
a
few
examples
we
could
we
could
look
at
trying
to
enrich
that
api.
That's
so
that
the
cubicle
could
make
a
decision
that
says
the
runtime's
telling
us.
You
know,
there's
just
no
hope.
Okay,
yeah
I'll
I'll
check
this
one
yeah.
J: Yeah, I just wanted to call out that we see similar cases with the kubelet volume manager as well, where the CSI plugin might try to attach or mount a volume, and the kubelet just basically continues to retry through the kubelet volume manager, even though it might be a terminal condition where the CSI plugin may never succeed. So it just continues trying to attach and mount the volume; kind of a similar scenario.
K: I wanted to add one example as well. Similarly, with the mismatches on Windows, and not only with the pause image: if a customer is using an image that mismatches the host, we end up in a similar scenario, where we keep retrying to recreate the sandbox, but it will never run on this host.
K
So
I
agree
with
the
point
of
alex
if
we
can
filter
some
types
of
errors
that
yeah
really
runtime
can
run
this
pod
anyway,
on
this
node
and
then
in
this
case
we
should
try
elsewhere
or
stop
trying
to
run
it.
A
The
other
thing
that
we
could
think
about
in
this
is
this
sounds
like
a
just
like
a
startup
problem,
so
we
have
deadline
seconds
which
basically
says
how
long
this
pod
can
run
on
this.
This
node
before
the
cubelet
proactively
reaps
it.
Maybe
we
could
think
about
use
cases
of
like
startup
periods.
Let's
say
if
this
pod
doesn't
start
up
in
period
x,
then
the
qubit
also
should
proactively
read
that.
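The existing knob referred to here is the pod spec's activeDeadlineSeconds field; the proposed startup period would be a new, analogous deadline and does not exist. A minimal illustration of the existing field, assuming the k8s.io/api types:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
)

func main() {
	// activeDeadlineSeconds already exists: once the pod has been active
	// this long, the kubelet fails it. The "startup period" idea from the
	// meeting would be a separate, hypothetical deadline covering only
	// sandbox and container startup.
	deadline := int64(300)
	pod := corev1.Pod{
		Spec: corev1.PodSpec{
			ActiveDeadlineSeconds: &deadline,
			Containers: []corev1.Container{
				{Name: "app", Image: "registry.example/app:v1"},
			},
		},
	}
	fmt.Println(*pod.Spec.ActiveDeadlineSeconds)
}
```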
I
My
also
second
reason
for
the
question
was
like:
yes,
we
with
all
these
deadlines.
We
can
have
a
scenario.
What
port
pod
will
be
market
has
failed,
but
maybe
we
should
have
a
scenario
to
report
to
scheduler
what
we
need
to
find
another
place
for
this
spot.
I
Well,
for
a
good
example
of
this
csi
plug-in,
so
what?
For
some
reason,
the
volume
is
not
possible
to
attach
on
this
node
so
find
another
node,
where
this
volume
also
will
be
available.
A
So
I'm
sure
there's
like
specific
scenarios.
We
could
work
through
so
yeah,
maybe
alex
if
you
had
a
few
like,
I
said
if
we
had
a
way
from
the
cri
to
the
cubelet
to
advertise.
This
is
this
is
has
no
more
hope.
Then.
Maybe
we
could
think
about
proactively
terminating
that
pod.
I'm
not
I'd
have
to
think
more
on
the
volume
scenarios,
but.
I
Yeah,
so
that's
practically
applicable
to
any
of
extension
to
a
couplet.
So
when
we're
well,
I
wouldn't
bring
with
device
plugins,
but
it's
also
a
possible
scenario.
What's
like?
Yes,
you
can
try
to
allocate
the
device,
but
for
some
reason
I
can't
satisfy
this
device
request
and
it
needs
to
be
more
without
from
an
old.
I
So
it's
probably
a
generic
thing
to
say
what
this
port,
for
some
reason,
is
not
runnable
on
this
node
and
needs
to
find
a
new
place,
and
we
need
to
have
that
from
storage
interface
from
runtime
from
device
plugins
or
whatever
else,
extending
mechanism
we
might
have.
A
Yeah
I
agree
alex,
but
just
for
the
case
of
the
device
plug-in
though
I
thought
that
the
expectation
would
be
that
the
device
plug-in
dynamically
updates
the
allocatable
number
of
devices
on
that
node,
so
that,
if
it
the
device
was
unhealthy.
But
it's
it's
not
going
to
be
counted
by
the
scheduler.
A
Anyway,
I
think
each
one
of
these
is
complicated
to
reason
through.
So
if,
if
there
were
things
just
from
cri
flows,
maybe
we
could
focus
on
one
call
in
particular,
like
start
pod
sandbox
and
find
out
if
there
were
well
understood,
terminal
cases
that
we
can
respond
to.
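What is being suggested here, letting the runtime signal through the CRI (for example on the RunPodSandbox call) that a sandbox failure is terminal so the kubelet stops retrying on that node, might look roughly like the error classification sketched below. This is purely illustrative; no such signal existed in the CRI at the time of this meeting:

```go
package main

import (
	"errors"
	"fmt"
)

// terminalError marks a sandbox-creation failure the runtime knows will
// never succeed on this node (e.g. VT-x disabled for a VM-based runtime,
// or a Windows pause image that doesn't match the host OS build).
// This is a hypothetical shape, not part of the actual CRI.
type terminalError struct{ msg string }

func (e *terminalError) Error() string { return e.msg }

// runPodSandbox stands in for the runtime's sandbox-creation path.
func runPodSandbox(runtimeClass string) error {
	if runtimeClass == "kata" {
		return &terminalError{msg: "virtualization (VT-x) not enabled on this node"}
	}
	return nil
}

func main() {
	err := runPodSandbox("kata")

	var term *terminalError
	if errors.As(err, &term) {
		// A kubelet that could see this distinction might stop retrying
		// and surface the failure so a higher-level controller reschedules.
		fmt.Println("terminal sandbox failure:", term)
		return
	}
	if err != nil {
		fmt.Println("transient failure, will retry:", err)
	}
}
```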
A
All
right:
well,
I
think
that
is
today's
agenda.
Just
a
reminder,
I
think
the
dates
for
caps
are
september
9th,
so
I
hope
to
get
through
a
lot
of
them
this
week
and
I'm
sure
the
other
reviewers
will
will
do
their
best
as
well.
So
we
will
meet
again
next
week.
There's.
D: ...also a soft deadline from the production readiness team: if your KEP doesn't have the PRR questionnaire filled out and ready by, I think, this Thursday, one week before the deadline, we may not get to yours. So please ensure that, if you want to get your PRR approved, your KEP is up and has the questionnaire filled out by this Thursday.
A: Awesome, thanks a lot, and we'll see you all next week.