From YouTube: Kubernetes SIG Apps 20190408
A
So, no co-host today. I'm primarily going to discuss some workload stuff, the KEPs that are in play and that we're looking at presently, and then discuss some things about mentoring and what we might want to do with our OWNERS files with respect to reviewers and approvers. I'm going to start with the latter topic, actually. One of the things we looked at after talking to Paris was what kind of housekeeping we need to do, aside from cleaning up the SIG Apps main page in kubernetes/community.
A
We could also probably do better with how we deal with our approvers and reviewers. For the base OWNERS file, we have no owners alias for SIG Apps approvers; we have kind of a makeshift collection of approvers, and the reviewers list is also kind of makeshift. I don't think the sig-apps-reviewers owners alias is actually used in the workloads API files, which probably isn't very fair to reviewers of the various components of the API, and it certainly doesn't make things easier.
A
Cleaning up those OWNERS files, I think, is something we can do immediately to improve that. My thought is to keep the people who are obviously concerned with a particular portion of the workloads API, and who have been contributing to it, as at least reviewers, and in some cases approvers, across that surface. For example, Mike Denis has contributed to DaemonSet, and Klaus is also very interested in DaemonSet because of the scheduling logic and the implications of the move to the default scheduler. He may not be as involved at this point, because that move is complete and we've already worked our way through most of the fallout from it, but given the nature of that change, and the fact that DaemonSet has to deal with certain scheduling predicates, it might be good to keep him there as a reviewer or an approver, even though I don't think he's going to be contributing as actively now.
A
Others in the contribution history have enough contributions to warrant at least reviewership, and from there, if people are interested in becoming reviewers and approvers and so forth, we can probably set up something a little more active: you ask to be assigned reviews, and one of the reviewers will go through and find things for you to look at. There are plenty of open items we could triage, and spreading that work out a little more would be a good way to help people who are interested grow.
B
I'm okay being a global one, or if you are; I don't see any problems with that. Okay, and approvership, sure. Also, one question: when you put together the PR moving and reshuffling owners, approvers, and reviewers, can you send it out to the SIG Apps mailing list? The reason is that it will give the change wider distribution, and it might be easier for everyone to be aware that this is a change we are making and to help.
A
We're going to need somebody with root approvals to do it anyway, because I want to add the new owners aliases to the root OWNERS file. But talking to the folks from SIG Architecture and the steering committee, they seem fine with the idea; they're actually encouraging other SIGs to do something similar, yeah.
A
I think one thing we can do is put these as owners aliases and primarily use the aliases to control who's reviewing and approving the particular files in the workloads API. At this point, with v1, I think the right thing to do is to grow more global reviewership, as opposed to people who are point specialists on particular areas of the surface.
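As a concrete sketch of that idea (the alias names and usernames here are hypothetical, not the actual kubernetes/kubernetes entries), a single OWNERS_ALIASES file at the repository root would define the groups, and the per-directory OWNERS files in the workloads API would just reference them:

```yaml
# OWNERS_ALIASES at the repo root (hypothetical names)
aliases:
  sig-apps-approvers:
    - alice
    - bob
  sig-apps-reviewers:
    - carol
    - dave
```

```yaml
# pkg/controller/daemon/OWNERS (sketch): reference the aliases instead of
# maintaining a makeshift per-file list
approvers:
  - sig-apps-approvers
reviewers:
  - sig-apps-reviewers
labels:
  - sig/apps
```

Adding or removing someone then becomes a one-line change in a single file, which is the visibility point made next.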
A
If we manage it from a global root OWNERS file with approvers and reviewers, it's easier to add people, and it's easier for it to be clearly visible who the reviewers are and who the approvers are; I think that helps visibility. Actually doing a review of who's in each file right now, it's really not out of whack; it's okay, it's fine. It's just that there are a few people who have either moved on or aren't contributing actively anymore.
A
Batch, with CronJob and Job, yeah, and historically StatefulSet was kind of its own thing. But looking at it now, looking at the PRs and looking at the reviews, it seems like people have broadened out already; that's just taking its natural progression, and building that expertise across the various controllers is something I think should be a goal.
B
Especially since, well, maybe this is a slight exaggeration, but if you know one controller, you should feel pretty confident jumping between controllers. Of course, each and every one of them will have some bits that are specific to that particular controller, but I'm hoping those places are usually either covered by tests or have a decent comment explaining: yes, we did it this way because of this and that.
A
That's my great hope. Our test coverage as a SIG is actually pretty good, and our e2e coverage is actually pretty good; our flakiness is much lower than it was a couple of years ago, so we're okay there. I agree with what you're saying. The part about understanding what a controller does and how it's implemented used to be a lot more important than understanding the base infrastructure, but I feel like that has changed with the introduction of the shared informer, as the shared infrastructure has evolved.
A
As that has evolved a lot, I think the understanding of how a controller works, and all the intricacies around what you can do with the shared informer cache, what happens when you mutate objects and why you should never, ever do that, and why pointers are used instead of immutable copies (not even references, just copied structs), is the understanding that should be global. If you understand that much, then, like you're saying, you should be comfortable doing basic reviews, and hopefully the idea would be that you reach out to people who are more expert if necessary, if you're not confident with the review. Yeah.
B
Also, the wide adoption of CRDs, which in the majority of cases ends up with people writing their own controllers for their own specific resources, gives people a broader knowledge of how a controller works. So the basic knowledge is there; you just need...
A
And then the other thing we might get out of this is hopefully finding more commonalities, places where we might be able to refactor so we share a little more code across the controllers. That would be helpful too. There might be some code cleanup available to be done in the not-too-distant future. That's all I really had to say on it. Does anybody else have anything to add before I talk about more?
D
Yeah, I like that we'll clean up the aliases and such, because we also have some shared tools, like, I think, the controller manager; that's a separate tool which is shared across the controllers, so that ownership, which is different, could use a bit of a cleanup too. I think the end-to-end tests have different owners as well, right? Yeah.
A
But it seems to me like everybody is a global e2e approver across the entire suite. We need to... I'll try to take a look at that too and figure out what we can do there. Now that we've broken out the SIG Apps labeled tests that we're responsible for, I think we can make that ownership a little bit better as well, but the ownership mechanism there is a little bit different.
A
Okay, cool. The other thing I want to talk about is current work in flight. The way we've been trying to grow contributorship, for people who are interested in committing, is we kind of start with, okay...
A
The current process is a KEP, which we review and give feedback on, and then when you open PRs we'll review them, approve them, and so forth. That's a great way to get started contributing for people who are just interested in contributing to the project because they're really interested in Kubernetes, and the workloads API in particular, but don't have something they need to merge for their own use. The other way, in terms of reviewing PRs and helping triage...
A
That's another way to grow contributorship, but the primary way we've been doing it is through the KEP process. This is where I kind of go over where various things stand and see if anybody else wants to reach out and add comments. The StatefulSet maxUnavailable KEP is merged currently, but I think we still need to add some... you know, let me go ahead and bring some of these up on my screen.
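For context, the KEP being referenced proposes a maxUnavailable field on the StatefulSet rolling update strategy. A rough sketch of what a spec using it might look like; the rollingUpdate.maxUnavailable field is the KEP's proposal, not part of the API at the time of this meeting, and all names are made up:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web            # hypothetical
spec:
  replicas: 6
  serviceName: web
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.15
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      partition: 0
      maxUnavailable: 2   # proposed: take down up to 2 pods at once on rollout
```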
A
I mean, that should be a desirable goal for new things that are architected. I think the reason people are okay with implementing this to support service mesh technologies is that Kubernetes took an approach that's kind of different from some of the other container orchestration systems, in terms of what we wanted it to be.
A
We wanted legacy applications that were containerized to work well on top of Kubernetes, and because of that there are a lot of applications that can't actually tolerate the sidecar not being present. They need that lifecycle orchestration to work for them in order to be able to run on Kubernetes without being completely re-architected. So the feedback from the folks on the Istio and Envoy side was: we should do this, we need to do this. And yeah, there is additional complexity, but there was also some feedback from Tim.
A
This is fundamentally becoming more and more popular to do, and if you look at native integration with Istio, if you want to use those technologies together, it becomes kind of important to be able to do the sidecar injection well. It's not, you know...
A
Weighing the complexity versus the benefit, people are generally of the view that it's beneficial.
A
And then we can inject the sidecars via API machinery. To say that we expect people to architect the application so that it works well with a sidecar, even though the person who owns the application may not own the infrastructure doing the sidecar injection, whether for storage or network or whatever they're doing, was kind of met with: you can't leave us like that. And to me, that's fair.
A
If you mutate at the workload controller spec level you're going to have a problem, but that's not where the injection is done; the mutation is done on the pod spec, which actually works well and won't break them. We already had this with an admission webhook that broke Deployment, because it was mutating the ReplicaSet, and that breaks things because Deployment does a comparison between the template specs to determine if something has been mutated.
A
DaemonSet and StatefulSet actually just label the pods they create: they label the pod and say, we think it's at this revision, and that takes into account that the pod created from the template may be arbitrarily mutated by admission control before it becomes an actual pod. So we don't do a template-to-pod or pod-spec comparison against the pod itself. I think we're okay with DaemonSet and StatefulSet. And ReplicationController,
A
Sorry
replica
set
actually
should
work
with
admission
control
for
well
for
mutating
web
hooks.
If
you
only
modify
the
pod
spec,
so
that
is
safe
to
do.
But
like
modifying
the
template
itself
is
probably
problematic:
the
API
machinery
guys
Daniel
and
Jordan
based
or
like
yeah.
Just
don't
do
that
so
so.
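A sketch of the distinction being drawn here: a sidecar-injecting webhook registered against pod creation mutates the pods that the controllers create, rather than the controllers' templates. All names and the namespace below are hypothetical:

```yaml
apiVersion: admissionregistration.k8s.io/v1beta1
kind: MutatingWebhookConfiguration
metadata:
  name: sidecar-injector              # hypothetical
webhooks:
  - name: inject.sidecar.example.com  # hypothetical
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE"]
        resources: ["pods"]           # mutate pods as they are created,
                                      # not deployments/replicasets/templates
    clientConfig:
      service:
        namespace: injection
        name: sidecar-injector
        path: /mutate
      # caBundle: <CA for the webhook's serving certificate>
    failurePolicy: Fail
```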
A
Right, yeah, let me take a look at that, because there was another issue open for StatefulSet where the same thing was occurring, but the user was basically implementing a mutating webhook that was ill-formed. So that wasn't the fault of the controller or the fault of the API machinery; it was kind of a paradigm cognitive-dissonance thing.
A
You really have to know what you're doing to make it work, and that's on top of setting up the PKI infrastructure necessary to communicate safely with the API server to execute your webhook; it's a lot. But as CRDs go toward GA, the default way of doing defaulting and validation is going to be, well, defaulting will be a mutating webhook and validation will be a validating webhook. So they're going to get more popular; we should probably be prepared for them.
A
Yeah, when we added controller history we got rid of that logic. Now we're basically just making sure the pod is labeled with the correct controller revision, as opposed to doing a direct comparison between the template of the DaemonSet and the spec of the pod. So if that crept back in, I would be surprised, but I wouldn't be shocked.
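For reference, the revision labeling being described looks roughly like this on a pod created by a DaemonSet or StatefulSet (the names and hash are made up). The controller checks the label against the current ControllerRevision rather than diffing the pod against the template, so admission-time mutations of the pod don't register as drift:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fluentd-7xk2p                            # hypothetical DaemonSet pod
  labels:
    app: fluentd
    controller-revision-hash: fluentd-6d5f7c9b8  # names a ControllerRevision
```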
A
I'm not doubting it. The other thing is we have this KEP from the folks at Pinterest, which is interesting, but I'm not sure it's something we necessarily want to do. I think I brought it up last week; it would be great if we had some other people take a look at it. They're basically proposing to start adding, I guess the best way to say it is, lower-disruption updates.
A
The
only
thing
we
can
do
right
now,
you
know
in
all
honesty
is:
do
it
for
an
image
right.
That's
the
only
thing
that's
mutable
on
a
pod
spec,
so
that
would
be
the
only
thing
we
can
touch.
Then
there's
the
interaction
like
we
just
discussed
a
minute
ago
with
mutating
webhooks,
which
may
in
fact
modify
the
container
prior
to
actually
launching
it
in
such
a
way
that
you
can't
actually
tell
if
you're
doing
the
right
thing.
If
you
start
doing
image,
inspection
or
the
fields
of
the
pods
back,
that
could
cause
controller
tight.
A
Looping
then
there's
the
fact
that,
like
because
we
can't
resize
the
container
I,
don't
know
how
valuable
it
actually
is
like.
If
all
you
want
to
do
is
an
image.
The
way
to
do
that
might
be
doing
something
in
the
open
source
like
base
of
kubernetes.
That
did
image
streams,
which
is
kind
of
a
feature
and
open
shift
right
now
and
thinking
about
it
in
terms
of
rolling
out
image.
Streams
might
be
a
better
way
to
look
at
this
problem,
but
they
want
to
do
it.
For
stateful,
set
and
demon
said.
The
like
image.
D
There are actually two types. Well, ImageStream is the object that actually holds the image, and then you have two options. You can have an admission plugin that will actually change the image in the pod that's being created, as an admission step, and the other one, which we settled on, is to link it to your Deployment, and it will inject the image into the Deployment, and nothing wrong will happen since...
B
In short, it depends whether we're talking about resources that are available in OpenShift natively; for those we do support working directly with image streams. For the resources that come from the kube API, we always work through admission, which replaces the image based on whether you want it or not, and you need to annotate or label the image stream with the information about whether it can be used in a workload. That's more or less an explanation of how it works.
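If I'm reading that right, the workload-side linkage is an annotation-driven trigger, roughly like the following; treat the exact annotation format as a sketch from memory rather than a spec, and the names as hypothetical:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                                  # hypothetical
  annotations:
    # OpenShift resolves the ImageStreamTag and writes the resulting image
    # into the matching container field whenever the tag moves
    image.openshift.io/triggers: |-
      [{"from": {"kind": "ImageStreamTag", "name": "web:latest"},
        "fieldPath": "spec.template.spec.containers[?(@.name==\"web\")].image"}]
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: " "   # left blank; filled in by the trigger controller
```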
A
The volume claims, I'm sorry, the volume expansion KEP, and there's a corresponding KEP for in-place resize, which is basically, if your storage supports online file system resizing, is still not merged, but that's not on us; I think this was approved by SIG Apps and it's just waiting on SIG Storage approval. I reached out to Saad and Michelle.
A
Custom controllers that are workload-specific, for databases, a lot of them are still using StatefulSet under the hood as a primitive; they just write custom orchestration on top of it. So being able to resize disks is a super important feature for those people, I feel, and I'd like to get this in sooner or later. I'll see if the author wants to take a go at the implementation, because what they've proposed looks mostly sane so far.
A
So the current implementation only works with the built-in workload types, basically. If we like the API and we're okay GA'ing it, I mean, I could be convinced that it would be okay to GA the API as is and then add support for the scale subresource later, maybe. But I think what we'd like it to do is act more like the HPA controller and be able to act on anything that has a scale subresource. The scale subresource has been available on CRDs, I think, since 1.13.
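A sketch of what exposing the scale subresource on a CRD looks like (the group and kind here are hypothetical); this is the hook that would let a generic controller, like the disruption controller being discussed, treat custom workloads the same as built-ins:

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com        # hypothetical CRD
spec:
  group: example.com
  version: v1
  scope: Namespaced
  names:
    kind: Widget
    plural: widgets
  subresources:
    scale:
      specReplicasPath: .spec.replicas
      statusReplicasPath: .status.replicas
      labelSelectorPath: .status.labelSelector  # lets PDBs count matching pods
```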
A
So that's not a crippling dependency on the API machinery before we say the whole thing is generally available. Part of it is saying the API is stable, but the other thing with GA is that we're usually saying we think the implementation is stable enough that it won't break you and that we're not going to have to change it in backward-incompatible ways. If we supported the scale subresource and then decided later that, okay, we can't support it, or it's broken, or, yeah, this is horrible...
A
There'd
be
no
way
to
back
that
out,
like
once,
we
decide
we're
going
to
support
the
scale
sub
resource
from
an
implementation
perspective,
removing
that
would
be
backward
and
compatible
for
sure
all
of
a
sudden
you'd
turn
up
a
cluster,
and
previously
you
had
PD
B's
associated
with
the
CR
D,
and
they
just
worked
now.
This
new
version
doesn't
work
or
behaves
differently
with
respect
to
CRTs,
and
you
know
the
availability
guarantees
that
you're
trying
to
provide
via
the
disruption
budget.
A
Just
aren't
there
anymore,
so
I
think
we
probably
want
to
at
least
see
the
scale
of
resource
support
it.
The
other
feedback
I
was
wondering,
is
any
one,
so
we
added
maxint
available
instead
of
been
available
because
it
allowed
you
to
use
a
PI
DB
and
not
have
to
mutate
it
frequently.
It
was
just
a
cleaner
expression
of
disruptions
in
terms
of
the
way
people
generally
think
of
them.
The
other
company
I
was
wondering
is
do
we
want
to
keep
men
available,
because
when
an
ER
did
the
initial.
A
Edition
of
the
maxint
available,
he
didn't
add
it
to.
He
wanted
to
just
replace
it
altogether,
but
we
couldn't
do
that
in
a
backward
compatible
way.
So
GA
is
a
chance
to
break
backward
compatibility
I'm
not
to
having
both
I,
don't
see.
It
is
hurting
anybody
so
I'm
not
to
like
gung-ho
that
we
need
to
do
it
one
way,
the
other,
but
if
anyone
else's
feedback
it
would
be
good
time
to
give
it
now.
I.
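For reference, the two ways of expressing the same budget; the maxUnavailable form stays valid as the workload scales, which is the "not having to mutate it frequently" point (the names here are hypothetical):

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: web-pdb                    # hypothetical
spec:
  maxUnavailable: 1                # tolerate one voluntary disruption at a time
  # minAvailable: 5                # the older form: must be updated whenever
  #                                # you scale the workload and want the same slack
  selector:
    matchLabels:
      app: web
```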
A
I'm saying I don't either, and I do remember doing the v1beta2 for the workloads APIs, but the primary motivation there was to just put something out into production, let people use it, and make sure we were confident we had the right thing to promote to GA. I don't think the deprecation and support policy was why we did that. But I'm also not...
A
He's just going to follow a regular KEP, but the graduation criteria will be in there; he's going to describe what is going to be done, but not necessarily the details. I think PRs are probably a better way to talk about the actual code, unless you want to see something separate, in which case, you know...
B
So it should be fine. Yeah, I do want to make sure... there was some discussion that we need to graduate CronJob as soon as possible because we graduated everything else, and I'm like, yeah, but everything else is different from CronJob, and we've been dealing with lots and lots of stuff with cron jobs, whether migrating or promoting from one version to the other, or whatever else we've been handling for the CronJob consolidation. Yeah, and in the worst case, well.
A
Around
open
kind
of
CR
DS
for
for
batch
API
is
like
yeah
could
be
flow
as
a
tensorflow
job
for
machine
learning.
I
could
definitely
see
if
you
wanted
to
do
something
a
little
bit
more
high
throughput
how
the
job
API
might
not
be
the
best
thing
for
you,
I
think
cron
job
is
one
of
the
ones
where
it's
like
I,
don't
I,
don't
know
if
there's
anything
to
improve
on.
To
be
honest,
I
mean
most
of
the
people
who
want,
like
I,
want
to
run
this
job.
A
Yeah, it starts reaping the pods before the job is done, and you get jobs that run forever, which is a real thing. You could maybe use finalizers to prevent the GC from kicking in. I was looking at this particular issue not too long ago with a bunch of other people from API Machinery and Node, just trying to figure out why the pod GC settings are what they are today.
A
Why
did
we
add
the
controller
and
do
we
still
need
it
at
this
point,
because
it
I
thought
it
was
probably
a
storage
pressure
issue
on
node,
but
it
actually
isn't.
It
was
a
CPU
pressure
on
a
component
that
wasn't
using
pagination,
which
doesn't
actually
do
anymore
so
like
there.
There
are
things
we
can
do
here
to
make
that
particular
problem
better,
but
is
this
so
job
is
v1
as
it
is?
We
had
a
B
to
alpha.
B
I'm not sure we need a new API to solve this particular problem. I was rather envisioning that we could improve the controller to be more aware of the pods, so that they could be removed, maybe something along the lines of, I don't know, calculating a sha or whatever.
A
But you can also add an intermediate object that basically acts as a counter whose spec can be updated, so that the job can determine: this is how many pods I've actually launched. You can have an intermediate, kind of implementation-detail object that you use as a counter to track the status of the current job as it's running, so you don't care about the pods getting garbage collected. Someone has actually implemented that; it was an interesting way to do it.
B
Yeah, you're not changing the backwards compatibility in any way. You're expanding the current information about the status object, and the current status will still be updated as is; the fact that you're using additional helper objects to hold intermediate data, I don't see that as breaking backwards compatibility in any way.
B
Exactly, okay, yeah, it is a problem. We've known about the problem since literally day one, and that was a conscious decision when we made it back then: yes, we will have to fix it at some point in time. The fact that we never fixed it is, well, not a feature; it's a bug.
A
The other way we could do it is, in theory, to just add the finalizer, assuming we've cleaned up the implementation of the other components, because the thing that was breaking was listers in other system components. If we added finalizers to make sure the pods weren't garbage collected, and we can actually stand to have that many pods lying around to be listed, that fixes the problem too, yeah.
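A minimal sketch of that idea, with a job-tracking finalizer name that is purely hypothetical here: the job's pods carry a finalizer that the job controller would remove only after accounting for the pod, so the pod GC can't delete them out from under the controller:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker-7              # hypothetical pod created by a Job
  labels:
    job-name: big-batch
  finalizers:
    # hypothetical finalizer: the job controller removes it once it has
    # recorded the pod's terminal state, unblocking deletion by pod GC
    - example.k8s.io/job-tracking
spec:
  restartPolicy: Never
  containers:
    - name: worker
      image: busybox
      command: ["sh", "-c", "do-work"]
```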
A
There are definitely implications, though. So you're saying, basically: I have a node with all these pods, and my eviction doesn't work if I'm doing something like a rolling node upgrade, because I have all these pods stuck around in the terminating state, because I have this batch job that's continuously running but not finished. Yeah.
B
Well, it won't be a problem in the average case. From what I've seen, the average case is a couple of pods, that's all. Even if the case is, yes, my job is running for, I don't know, 50 or 60 hours, that's still not a problem if it's creating ten pods; that's not a problem at all. Even if it were fifty, that's okay. But when you're talking about a hundred plus, that kind of sucks, and a hundred plus with a duration of a week or so, that's becoming problematic. So, yeah.
A
Actually, I'm talking about the average case for high-throughput batch, so I'm thinking more along the lines of thousands of pods. And for people who are doing the things you're talking about, they tend not to be troubled, at least from what I've seen. Yeah, that's true, it's like...
A
I have a node pool just for my batch, then a separate node pool for my serving and stateful workloads, with different machine shapes for each of them. If I'm looking to get better utilization, it's pretty easy on most clouds now for me to choose arbitrary machine shapes; I can make my own SKUs that fit whatever I want. I'm more concerned for people on-prem who, you know, buy a SKU at a fixed size and are
A
Trying
to
run
this
on
bare
metal
and
what
I
really
need
to
do
to
get
better
utilization
is
fill
holes
in
the
shape
of
the
SKU,
with
patchwork
right
so
like
for
them.
The
cost
savings
and
the
cost
optimization
is
predicated
on
the
ability
to
run
batch
workloads
well,
along
with
serving
workloads
and
stateful
workloads,
and
that's
a
use
case.
I
haven't
really
seen
people
exploring
very
in
a
great
deal.
A
I don't think I heard about that one. Okay, it's a Red Hat thing, but it's relatively new; they just announced it recently. It seems like that's going to be a thing, so we should make sure that the use case of colocating batch and stateful or stateless workloads is doable, I would think.
C
So one issue that we recently saw with terminated pods was not related to jobs. Basically, we saw that the default threshold for when that GC kicks in is 12,500, and due to some recent changes, when kubelets restart they reset the extended resource sizes, and that leads to pods getting evicted and left in the terminated state. So we end up with a lot of these pods in the terminated state, and the GC never happens until it reaches that threshold.
A
The
actual
problem
that
was
control
playing
him
that
broke
the
control
plane
was
Lister's
and
there
the
number
of
times
that
were
set
up
the
GC
had
to
be
put
in,
because
when
I
forget
what
component
it
was
when
the
pod
listing
was
happening,
it
was
causing
cpu
burn,
like
basically
the
duty
cycle.
The
CPU
got,
eaten,
live
and
was
control,
causing
control,
plane
instability,
so
they
set
the
GC
threshold
and
added
the
garbage
collector
to
prevent
Lister's
across
the
board
from
like
just
breaking
everything,
but
with
pagination
the
the
problem.