From YouTube: Kubernetes WG Batch Weekly Meeting 20221208
A
Okay, hello everyone. Today is December 8th, 2022. Welcome to today's edition of the Kubernetes Batch Working Group. I'm going to be moderating the meeting today. I'm Swati Sehgal, I work for Red Hat. I would like to let everyone know that this meeting is being recorded, and please keep the CNCF code of conduct in mind when you engage in this meeting. We have two items today. The first one is from Wilfred, about the YuniKorn scheduler. Wilfred, feel free to start off.
B
Okay, good morning. I'm Wilfred, I work for Cloudera and I'm the tech lead on the Apache YuniKorn project. I was asked by some of the working group people to present on what we do around gang scheduling, the all-or-nothing setup that we have. So I've created a short presentation, if I'm allowed to share my screen.
B
I've prepared a short presentation around what we do, how we've implemented gang scheduling, and some of the reasons behind what we did. I'll also talk about and compare that, in a number of ways, to what's currently available in Kubernetes. So, gang scheduling. If we look at what gang scheduling does: I've spoken about this before, in May at KubeCon in Europe.
B
We looked at gang scheduling, and what we said is: we need some way to schedule an application based on a larger request, either a one-pod or a multi-pod request. And we look at it specifically from a YuniKorn perspective.
B
We said we want to introduce that kind of scheduling based on an application object, an application-aware kind of scheduling. An application within YuniKorn is a loose definition of a set of pods, based purely on the fact that there is an annotation on the pod: an application ID. So it doesn't have to be a set of five of the same pods, and the pods don't even have to be submitted all together.
B
It could be one pod of one type and ten pods of another, but it could also be ten different types of pods. So that's the basis of what we did. The other thing that we wanted to do is be quota aware within YuniKorn.
B
We don't use the Kubernetes namespace quotas; we have quotas set up in a queue system, and that is purely hierarchical. We've got multiple queues within the YuniKorn scheduler, and within that hierarchy we set up a quota, and that quota applies to that specific queue. At the root we can have children of that root, and children of the children, etc. So the gang scheduling that we would set up had to fit in with that quota system.
B
We schedule a lot of Spark workloads as batch workloads, but we've also got workloads or services that we want to schedule, or people that start up their own jobs, and it needs to fit in with all these different kinds of workloads. So we don't want to prescribe and say you need to use a Job, or it needs to be a DaemonSet or whatever; any type of workload that people could create needs to be supported.
B
We also had the view that we want to have a minimal impact on the submission side. That means whatever we need to do has to fit in with existing applications like Spark, and with other objects and other ways of submitting things.
B
The third point was the cluster autoscaler: whatever we do has to fit in with the cluster autoscaler, without making any changes to the autoscaler itself.
B
YuniKorn runs as a scheduler against a wide spread of Kubernetes versions. We said we don't want to introduce new Kubernetes objects or an API, because that limits when people can start using it and which Kubernetes releases the end users can run.
B
Based on all of that, we started working on our gang setup and we wrote a design doc; it's available on the YuniKorn website. We have been running with this gang scheduling for, I think, about a year and a half now. We're running it in production and it's being used by a number of large Spark users. It's mainly Spark users that are using this.
B
Let's first take a step back, and then we'll take Spark as an example. What happens when we start scheduling a Spark application, and how does that work with either YuniKorn or the default scheduler? From a Spark application point of view, when we submit the application we create the driver pod. The driver pod gets scheduled, and after the driver pod gets scheduled, the driver pod itself will create a number of executors.
B
It also means that the driver does not define, at that point in time, how many executors it will use over the lifetime of the Spark application's run. Every driver creates a number of executors, every executor is a pod, and they just get scheduled one by one as needed, and when it finishes, that's the end of the Spark application. The number of executors and the configuration of the executors is included within the driver.
B
So what have we done? Let's first have a look at the gang specification. When we use gang scheduling, we specify the gang, the all-or-nothing setup, on the first pod that gets submitted. In the Spark case that would be the driver; we've also done a little bit with plain Kubernetes Jobs.
B
You specify the gang information on the first pod that gets submitted, and that's where we read it from. It's purely a simple annotation under the YuniKorn namespace, and the annotation consists of a number of elements that we use. Like I said before, we can define multiple sets of pods: that could be just one type for everything that runs within the application, but we can also have four or five or six different groups in there, distinguishing them by name.
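The annotation setup Wilfred describes looks roughly like the sketch below, based on the gang scheduling documentation on the YuniKorn website; the group names, sizes and resources here are illustrative, not taken from the talk:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: spark-driver
  labels:
    applicationId: "spark-app-0001"   # ties all pods into one YuniKorn application
  annotations:
    # Which task group this pod itself belongs to.
    yunikorn.apache.org/task-group-name: "spark-driver"
    # The full gang definition, carried on the first pod that gets submitted.
    yunikorn.apache.org/task-groups: |-
      [{
        "name": "spark-driver",
        "minMember": 1,
        "minResource": {"cpu": "1", "memory": "2Gi"}
      }, {
        "name": "spark-executor",
        "minMember": 3,
        "minResource": {"cpu": "1", "memory": "2Gi"}
      }]
spec:
  schedulerName: yunikorn
  containers:
  - name: driver
    image: my-spark-image   # illustrative
```

Task group entries can also carry `nodeSelector`, `tolerations` and `affinity` fields, which is the later addition Wilfred mentions next.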
B
In the first design, the first setup that we had, we did not have the node selectors, tolerations and affinities included. Later on, when we started looking at the more advanced use cases, we saw that we had to get those in there to allow proper placement to just work. Based on the number of members that we require, and the number of groups that we've got, we create placeholders on Kubernetes.
B
If we go back to the Spark kind of setup: the only thing that gets submitted is the driver, so we don't know the number of executors, or what will be requested. And if we don't know that, we can't reserve the space, so we can't schedule this as a gang. But what we do know is: if this annotation is there and we've got the gang definition, we can create whatever we want based on these gang definitions.
B
All these placeholder pods will get scheduled: we put all these placeholder pods on the system and we start scheduling them, while we hold back the other pods. We'll dive into that in a little bit. Beside the pod specifications for the placeholders, we've also got a policy specification, again on the first pod of the application.
B
We have not just these task groups that we specified, but also a policy that we allow you to specify, because what we want to be able to say is: do not reserve these placeholder pods forever; time them out. If we haven't used them, or there's something else going on, we don't want them to sit there forever. We just want to make sure that we can time them out, and that timeout is something the customer knows: they know the behavior of the application.
B
We also have a gang scheduling style, because what we noticed was customers asking us, saying: okay, if you can't give me all the placeholder pods and it doesn't fit within the system, then I'm happy for the application to still run, but just as a normal scheduling cycle, so no placeholders.
B
We just schedule pod by pod and we'll see what we can do. So we've got two mechanisms: a hard and a soft one. The hard one is: if I can't get all the pods defined in my gang, I fail the application. That's the basis that we started with: give me ten pods of this size and then five pods of the other size.
B
If I don't get all these fifteen pods within a certain time, I fail the application. That was the starting point for what we did. But people then said: no, if we only get some of these pods, I still want to have the application run, but without any of the gang style involved. So we also do a soft style.
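The timeout and the hard/soft style described here are, per the YuniKorn gang scheduling docs, carried in one more annotation on the same first pod; the values below are illustrative:

```yaml
metadata:
  annotations:
    yunikorn.apache.org/schedulingPolicyParameters: "placeholderTimeoutInSeconds=60 gangSchedulingStyle=Soft"
```

With `gangSchedulingStyle=Hard` the application fails when the placeholders cannot all be scheduled before the timeout; with `Soft` the remaining pods fall back to normal pod-by-pod scheduling.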
B
Then the second part is using the gang specification. So we've set it up, we've got it on the first pod, we've defined all our task groups, but now we need to be able to say: use these placeholders that we've created, and use them during the scheduling cycle. And since we don't have the real pods at startup yet, we don't have to have them: we might have them, we might not.
B
How long have I got?
A
About three to four minutes left.
B
Okay, I'm as good as done here, so thank you. The next step is: we want to use this gang specification. On every single pod that we create we put another annotation, which defines for us which member, which placeholder, we use. And we've got some checks and balances on that.
B
We've got an opt-in kind of scheduling setup. If you don't use all the placeholders, we clean up; if you use more, we treat them as normal pods and we go on. What we also saw is that people sometimes have different requests in the gang setup than they have on the real pods.
B
So that difference is being accounted for during the scheduling cycle. The scheduling cycle looks a bit like: we hold the pod, we create placeholders, we replace the placeholders, or we release them in the end if they're not used. A real quick overview of what that looks like for the Spark setup: we've got a driver, and we specify the gang as one driver and three executors. That means that we create these four placeholder pods on the system.
B
These four placeholder pods get scheduled by the scheduler, and at that point we release the original driver pod that was created. That original driver pod will start creating the executors. So we've replaced the driver placeholder, and we schedule the executors that now come into play one by one, replacing the placeholders that are there, and in the end we've got the whole application up and running.
B
From the moment that we create the placeholders, we are using the quotas in the queue and in the system. And if that would not completely fit, so if you ask for, let's say, one driver and three executors, but it's more than the quota that's available, then we reject the application for you right away. So again, we've got some checks and balances around all of this, but this is a flexible way that we use to build and schedule gangs around Spark, or any other application that you want.
B
That's a quick overview. There are multiple examples, multiple ways of doing things, and all the documentation, like I said on the first slide, is on the YuniKorn website. Thank you.
A
Thanks, Wilfred. There's a question from Abhishek, and there's a question from Abdullah as well, but I'll start with the question from Abhishek, who's asking: do we lose the guarantees of running the app when quota is evaluated later, and is there a scenario for a partial job or a partial start?
B
Yes, so there are two things. If the whole gang doesn't fit within the quota, we reject the request and we say: sorry, you can't do this, this is out of quota, or too large for the quota that you've got. If the request fits within your quota, but at the point that you want to run it there's not enough quota available, then it depends on the hard or soft setup that we've got. If it's a hard setup for the gang, we fail the application.
B
If it's soft, then we try to schedule within the quota that's available. So yes, we've got those two different ways of scheduling things. And especially that last bit, where there's not enough quota available and we still go on and schedule, came from requests of the users that we had around this option. They really said: I do not want to fail the application, I just want to try to schedule within what is available at that point in time.
A
Perfect, thank you. Abdullah, feel free to ask your question.
C
Yeah, sorry, can you please go back to the gang definition on the first pod? So this works well for Spark, where you have the driver created first, so you can put that on the driver. But for a batch/v1 Job, you have to put it on all pods of that job, right?
B
Correct, yes. In that case, when we do create Jobs, we just put it on every single pod that is there, and it depends on which pod we see first, because you can't even rely on ordering: when you create a job spec, things flow through the job controller. We even see that when we create placeholders based on what we do here: we create pods in a certain order, but they go through all the event processing and everything that happens in the background.
B
We get them back in a completely different order, so we can't rely on any kind of ordering anywhere in that system. So yes, if you do a Job, you specify this on every single pod, because we don't know which one we're going to see first.
C
And the placeholder pods are actual pods? You create them after you check the quota for the gang, or before?
B
Yes.
C
They need to be scheduled in their place; you somehow match them, using, I guess, the second slide, correct?
B
Correct, because we could have five or six different kinds of task groups within that gang. So even if you submit an application and you later on submit a pod under that application, it could be part of the example group, but it could also be part of another group; in Spark we do that often with drivers and executors.
B
It could be part of the driver group or part of the executor group, or you could even say: no, this is something that I want to run outside of this guarantee, and you create a pod without any annotation, and then it just gets scheduled as one outside of that guarantee that you've created with the gang.
B
No, we allow you to go over it. In the spec here we say we've got a minimum member, and we treat it as a minimum member. So if you create pods that name the example group as their task group, but I've already scheduled two and there's no placeholder left, then I just treat them as normal pods. So we're flexible; you can go over that.
B
It's a minimum member, it's not a maximum member, so we are flexible, because we need to be able to do that. If you look at dynamic executors in Spark: you give it a minimum number of executors, and that doesn't mean that it uses that number; it could use more, but it could also use less. So we allow you to do both. That's the checks and balances with that opt-in. If you use fewer pods than the task group specifies, we have placeholders left over.
C
The problem, one problem, because this is something similar to what we've been discussing with the general idea of reservations: one thing that is going to be complicated is that you're basically trying to recreate the pod spec in the annotation. And I mean, you already have this problem.
C
Oh,
you
had
to
add,
not
affinity,
and
then
you
had
to
add
tense
and
Toleration
so
and
then
validation,
Etc,
so
I
would
caution
that
this
I
don't
know
how
this
will
like
evolve
in
the
future
in
your
in
your
case.
But
you
can't
always
scale
with
like
running
away
with
keep
pushing
things
into
annotations.
Annotations
are
not
really
an
API,
but
the
bigger
problem.
I
see
is
that
you
have
cases
where,
for
example,
Dynamic
resources
being
allocated
like
PVCs
on
the
Fly
using
stateful
sets
right.
C
The StatefulSet controller creates the PVC on the fly when it creates the pod, which you wouldn't know when you create the placeholder for that pod. So you wouldn't create a PVC for that pod, so you wouldn't be able to schedule them at all, so you wouldn't be able to provision storage for the future pods of the application. Correct?
B
Correct, so we leave that up to the real pod that gets created.
B
We handle that with our scheduling internally. If it turns out that it doesn't fit, then we use different nodes and we move on. So yes, we already do that in our current setup, because we've already noticed that sometimes, when people specify all of this information in the task group, they don't set up any node selectors or tolerations or affinities, and then later, when the real pod comes in, they do have them.
B
So we already see that; it's not just for PVCs, it already happens, and we handle it during our scheduling cycle. A placeholder might run on one node, and then the real pod gets scheduled on a different node later on. So we do that, that's handled; we saw that happen before. And it's not just for PVCs, it's with everything.
B
People are not always consistent. They say: oh, I just want to do the minimum, I want to make sure that it runs within the queue, so I only specify the minimum resources and do not specify any of the other things; and then later on that changes. We even saw people that said:
B
Oh, my placeholders are one CPU and one gig of memory; and then, when the real pod comes along, it asks for one CPU but one and a half gigs of memory, and in certain circumstances that doesn't fit on the node at that point in time. So again, you need to have fallback mechanisms during the scheduling, and that is all built into the scheduler. It handles all these edge cases with differences between the real pod and the placeholder that's been created; all that stuff is handled.
A
All right, thank you, Abdullah, and thanks, Wilfred. The presentation was really interesting, and the slides as well. I would really like you to continue talking on this, but we have another agenda item, so I'd like to move to that. Thanks again. So we have Kevin here, and he wants to discuss pending pods. Kevin, over to you.
E
Yeah, thank you. I'll go ahead and... oh, can I get permission to share?
A
Yeah, you should have it now.
E
Thank you. So this kind of came from a discussion I had on Slack. I was looking at the retriable job idea, trying to use it in Armada, and I realized that a lot of the cases we found in the Armada project were actually around handling pending pod issues, like invalid image names or a missing secret.
E
That is, mounting a secret to a volume, and a lot of other ones. So I wanted to see if there was a way to have that retriable job idea for pending pods. My end goal would ideally be that the Job API, a batch Job, can detect whether or not a job is stuck in pending due to a configuration error.
E
Can it transition to failed? A main use case for this is people scheduling large amounts of jobs. Batch users are usually not Kubernetes experts, so they might have configurations, like image pull secrets, that are invalid. A lot of the controllers do handle this; they might retry a job. And in Armada we actually have a controller that reads the container status, or the container reason, and we also read the events to know whether or not jobs or pods are failing.
E
So I have a potential idea that I want to run by this group. I'll skip to the bottom. I went through the common examples that I found for configuration errors, and these are mostly just contrived examples. I would say there are three groupings of cases that I've seen for pods going into pending. One is configuration errors, and those are usually well represented by a container status of waiting with a valid reason.
E
For example, if you have all capitals in your image name, you'll get this kind of state from your container status. Notice the conditions: the pod is scheduled, it's just not ready; Ready is false and ContainersReady is false, which isn't really a valid condition to code against, because there are business-as-usual cases where all these conditions are set.
E
ImagePullBackOff is one that I don't think we'll be able to target, because it's kind of a business-as-usual case. There's stuff like image pull secrets: if you have an invalid image pull secret, it will still get stuck in ImagePullBackOff, and the same if you have an image that doesn't exist. But I think it might be too difficult to try to predict all those cases.
E
So I don't know if we can handle that one with this approach. Some others: if your image pull policy is Never and your container image doesn't exist on the node, you'll get ErrImageNeverPull, but the container still sticks in pending.
E
If you have a missing config map, you'll get a reason that says CreateContainerConfigError. So these are all pretty well represented. The interesting one is actually if you're mounting a volume from a secret, or the other way around, I may have it mixed up. I noticed that the status in that case just says the reason is ContainerCreating.
E
That
knows
that
it
actually
failed
is
in
an
event,
so
that
is
kind
of
tricky
to
I,
mean
I,
know
I,
don't
think
we're
supposed
to
be
relying
on
events
to
understand
whether
or
not
a
pod
failed,
but
in
this
case
you
kind
of
have
to
and
the
other
one
that
or
so
the
there's
also
there's
also
the
case.
I
didn't
really
cover
here,
which
is,
if
you're,
your
pod
can't
get
scheduled.
I
think
that
is
covered
by
this
condition.
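The grouping Kevin walks through can be sketched as a small classifier over a pod's status; this is only a sketch, and the set of "configuration error" reasons below is illustrative rather than an official list:

```python
# Classify a pending pod as a configuration error, unschedulable, or
# plain waiting, using the container waiting reasons and pod conditions.

# Waiting reasons that usually indicate a user configuration error.
# Illustrative list, not an official Kubernetes taxonomy.
CONFIG_ERROR_REASONS = {
    "InvalidImageName",           # e.g. capital letters in the image name
    "ErrImageNeverPull",          # imagePullPolicy: Never and image absent
    "CreateContainerConfigError", # e.g. missing ConfigMap or Secret key
}

def classify_pending(pod: dict) -> str:
    """Return 'config-error', 'unschedulable', or 'waiting'."""
    status = pod.get("status", {})
    conditions = {c["type"]: c["status"] for c in status.get("conditions", [])}
    if conditions.get("PodScheduled") == "False":
        return "unschedulable"
    for cs in status.get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        if waiting and waiting.get("reason") in CONFIG_ERROR_REASONS:
            return "config-error"
    # ImagePullBackOff and ContainerCreating land here on purpose: they can
    # be transient, or (as noted above) only explained by events.
    return "waiting"
```

As Kevin notes, the secret-volume case defeats this kind of check: the status only ever shows `ContainerCreating`, and the failure is visible only in events.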
E
PodScheduled is false. And this is an interesting case: if you have a missing volume for your container, your pod will be stuck in pending because it won't be able to get scheduled. So this is what my idea was.
E
A lot of these cases can be represented by a potential condition to add to the pod API, one that reflects things like an invalid image name. And then my hope would be that eventually we could use it with the retriable job KEP; I know they have a retry policy based off conditions. I would hope that maybe, if we have a pod condition that flags configuration errors, we could force jobs to fail.
E
But
this
was
at
least
kind
of
my
proposal
and
I
was
hope.
I
was
wondering
what
the
this
group
feels
about
going
forward
with
this,
obviously,
and
not
want
to
get
into
design
now,
but
I
know
there's
a
cap
in
other
stages,
but
I
want
to
know.
If
the
idea
is
good
and
I'll
open
the
floor
for
questions.
D
We'd have to add a condition, and I think in general, the fewer different conditions we need to add the better, and you're just proposing one, so that's good. But I think the other thing we need to think about is when, or how, pods would transition to failed, because they are currently pending, right? And I'm not sure where that responsibility should fall. I guess one option is that it falls within the job controller.
D
Another
option
is
the
fails.
It
fits
Within
the
cubelet,
to
fail
eventually
the
pod,
so
that
one
I'm
not
sure
where
is
the
best
location
or
what
what's
the
best
component
to
to
solve.
As
for
the
condition
I
think
cubelet
is
probably
the
best,
the
one
that
has
the
best
knowledge
to
to
do
that.
E
Yeah, I think it would be very clear that the condition will have to be added as part of the kubelet code. I've done some exploring of where that would be, and I think I found the place in the kubelet code where that condition would be added. So at least the condition, and the transitioning to failed... but I don't know how to start that conversation.
C
I think this is a kubelet thing; I would expect that the kubelet would be doing that, because you're talking about failing to start the pod, right? Think of the pod as a state machine: when it gets created, it's the API server that starts looking at it; after it was created and persisted, it's the scheduler that picks it up and moves it from unschedulable to schedulable; and then it's assigned to a node.
C
I'm just trying to describe how to think about this, and where the responsibility lies: which component is responsible for transitioning that pod from pending to failed. So I would expect that the kubelet should be doing that, and I would present this topic to SIG Node.
A
I have one question here, kind of on Abdullah's comment about who's responsible for identifying and maybe changing the state. I was thinking that why the pod has gone into a pending state comes into play here as well.
A
So
if
resources
that
are
there
aren't
enough
resources
say
on
the
cluster
and
the
Pod
is
pending,
it
is
kind
of
because
it
hasn't
found
a
node
suitable
to
be
placed
I
understand
that
volume,
provisioning
and
and
cases
kind
of
are
after
the
Pod
has
been
scheduled
on
a
node
and
node
has
taken
ownership
of
that
pod.
But
before
that
has
happened,
a
pod
could
stay
pending
because
there
aren't
enough
resources,
and
in
that
case,
cubelet
probably
wouldn't
be
the
right
place
to
transition
that
state
from
pending
to
failure.
C
For
that
case,
we
already
have
the
condition
right,
like
the
unscalable
condition.
The
schedule
already
does
that
now,
I
guess,
the
question
is
who
should
delete
it
is.
Is
that
the
idea
like
you
want
it
to
be
deleted,
or
is
it
fine
to
continue
to
exist?
There.
A
Yeah
I
think
what
what
is
being
said
or
I've
looked
at
a
very
high
level,
but
the
proposal
says
that
we
should
transition
for
the
pot
from
pending
to
failure.
So
it
becomes
evident
that
there's
some
error
and
we
need
to
either
take
some
action.
Maybe
provision
more
resources
make
sure
that
the
volumes
are
available
and
that
changes
from
case
to
case.
E
Yeah, I think there are a lot of cases; that's what I found when I was going through the examples. I didn't really have many examples of this unschedulable one, but I know that condition is at least well represented. But there are also the configuration cases that I've seen happen a lot, and those are ones where the pod is scheduled; it just gets stuck in pending, and the only way to take it out is to have an out-of-tree controller delete the pod.
E
So
at
least
that
is
for
this
for
I'd
understand.
There's
a
lot
of
there's
a
lot
more
cases
than
I've
covered
here.
D
I
think
that
that's
a
good
point
that
you
know
adding
the
condition
is
something
that
needs
to
be
like
you
lit,
but
this
this
cutter
already
does
so
that's
not
a
problem,
but
deleting
the
Pod
or
you
know,
failing
the
pot
could
be
done
by
an
external
component.
D
Maybe
that's
a
good
place
to
start
just
you
know
prototypes
to
have
an
external
component
to
the
deletion
and
then
have
it
as
a
proof
of
concept,
because
he
unpair
it
with
the
job,
the
job
failure
policy
API
to
see
how
how
useful
it
is.
I
think
that
the
major
question
is
how
we
Define
yeah,
how
we
translate,
or
when
do
we
transition
to
fail?
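Pairing a pending-pod condition with the existing Job pod failure policy API, as suggested here, could look something like this sketch; the condition type `StuckInPending` is invented for illustration and is not an existing Kubernetes condition:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  backoffLimit: 3
  podFailurePolicy:
    rules:
    - action: FailJob
      onPodConditions:
      - type: StuckInPending   # hypothetical condition set by some component
        status: "True"
  template:
    spec:
      restartPolicy: Never    # required when podFailurePolicy is used
      containers:
      - name: main
        image: registry.example.com/batch-task:latest   # illustrative
```

This only works once something actually fails the pod with that condition set, which is exactly the open question being discussed.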
D
We
need
we
need
a
timeout
at
the
very
least,
but
maybe
timeout
is
not
enough
and
if
we
think
of,
for
example,
unicorn
that
just
presented
the
pot
can
be
pending
and
in
I
don't
know
if
it
it
would
introduce
an
unescalable
a
condition,
but
if
it
does,
it
could
be
because
there
is
no
quota
and
you
know
it
doesn't
mean
that
the
pot
failed,
but
it
will
fail
later
so
I,
don't
know,
I,
don't
know
about
that
scenario.
For
example,.
E
Yeah, I guess I'll think about it a little bit more, and then I'll try to reach out to SIG Node. I do like the idea of the job controller picking this up through the pod failure policy, just as a proof of concept, because I think the problem we'll run into with the kubelet is that we'd need another state.
E
We'd need another state in order to say there's a configuration error and it failed. I think for containers there's waiting, running and terminated, and it might be a pretty large change to add another state transition here. At least that's where I was coming from. And then there's the issue I posted that started this conversation.
E
This
conversation
there
was
a
lot
of
I
think
there
was
more
of
a
request
to
have
this
as
an
out
of
tree
controller
rather
than
in
Cuba.
So
that
was
why
I
kind
of
went
with
at
least
the
condition.
That's
a
starting
point
and
then
starting
a
conversation
about
how
we
want
this
further
down
the
road
but
I
think
we're
over
by
a
few
minutes.
So
I
want
to.
A
Yeah,
just
one
last
comment
on
this:
if
you
want
to
bring
it
to
signor
and
you
plan
to
get
this
in
for
127
signal
is
going
to
have
their
planning
session
for
127
next
week.
So
I
think
it
would
probably
be
the
right
time
and
and
kind
of
you
bring
it
to
signal
and
that
gets
incorporated
or
at
least
discussed
in
the
next
meeting,
because
if
we
don't
do
that,
then
it
will
move
on
to
the
following
cycle.
A
Perfect. Thank you. Thanks, everyone. Sorry it went over time. I'll see you in two weeks. Bye.