From YouTube: Kubernetes SIG Apps 20220919
A
Good morning, good evening, good afternoon, depending on where you are. Today is September 19th, and this is another of our bi-weekly SIG Apps calls. My name is Maciej and I'll be your host. Before we jump into the 1.26 items that we're going to cover, a quick reminder: the enhancements freeze for 1.26 is roughly two weeks away. It's on October 6th. I didn't even properly calculate it, but I think that's about two, two and a half weeks; it's the Thursday two weeks from this Thursday.
That is when the freeze hits; I'll link the 1.26 schedule. Actually, try to get your reviews for enhancements in sooner than that, because the Production Readiness Review, which is a required step for all enhancements, has a soft freeze that is actually a week before the actual cut-off.
Because from what I remember, there were only two persons assigned as PRR reviewers for this cycle, so try to get through most of the reviews sooner rather than later. With that, I think we can jump over to the main topics, unless Ken wants to add something; not sure if I missed anything with regards to announcements. Ken, I think.
C
Okay, let me speak. I hope you can hear me all right, loud and clear.
So what we are working on for beta is to extend the feature to handle pod failures initiated by kubelet. So maybe a couple of words about this feature in general. In this feature we tackle the problem of handling pod failures for Jobs, with a little bit more flexibility than what the standard pod retry policy of backoffLimit gives.
However, what we discovered in the experiments is that the pod end state is not standardized: pods can fail in many different ways, and you can't just look at the reason or status to tell why the pod failed, or which Kubernetes component initiated the failure. So that is a big part, or at least a part, of the feature.
So in an ideal world, whenever we kill a pod, or a pod is terminated, we add some pod condition that is then easy to interpret in terms of: do I want to retry it or not? So for alpha, what we did: we introduced a new pod condition type called DisruptionTarget, and we add this DisruptionTarget condition whenever the pod fails due to a disruption. So, for example, we add it when we evict the pod due to a NoExecute taint, or due to preemption, or due to API-initiated eviction.
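As a sketch of what that looks like on a failed pod's status (the condition type is DisruptionTarget as described; the particular reason and message strings here are illustrative assumptions, since they depend on which component triggered the disruption):

```yaml
# Illustrative status of a pod evicted via the Eviction API.
status:
  phase: Failed
  conditions:
  - type: DisruptionTarget
    status: "True"
    reason: EvictionByEvictionAPI   # assumed example; a taint- or preemption-driven
                                    # disruption would carry a different reason
    message: "Eviction API: evicting"
```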
So in all these cases it's not the pod's fault. It's just a disruption, and that is the reason for the pod failure, so we don't really want to even count this failure against the backoffLimit. So this, yeah, this shows an example of a failure policy that uses DisruptionTarget, but also, what you can see it using here is something like resource limits exceeded. So this is already hinting at what we want to do in beta.
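A failure policy of the kind being shown might be sketched as the Job spec below (based on the alpha podFailurePolicy API behind the JobPodFailurePolicy feature gate; the Job name, container name, and exit codes are placeholders, not taken from the slide):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: job-with-pod-failure-policy   # placeholder name
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
    # Don't count disruption-caused failures against backoffLimit.
    - action: Ignore
      onPodConditions:
      - type: DisruptionTarget
    # Fail the Job outright on a known non-retriable exit code.
    - action: FailJob
      onExitCodes:
        containerName: main           # placeholder
        operator: In
        values: [42]                  # placeholder
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: busybox                # placeholder image
        command: ["sh", "-c", "exit 0"]
```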
So in alpha we added DisruptionTarget, but we only annotated the failures that were initiated by either the scheduler or the controller manager. With beta we want to cover kubelet, and kubelet is responsible for initiating eviction in a couple of scenarios. I think the most problematic one, or at least the one that took us the most discussion time, is when a pod is evicted due to exceeding the limits for a resource like memory or disk.
However, as a result of the discussion, we intend to change this name, because how do we tell if the limits are exceeded? Well, from our investigation: when kubelet initiates eviction due to a limit being exceeded, then it's easy, because it's kubelet that checks in its code whether the limit is exceeded, and it evicts the pod if it is, so we can just easily inject the adding of the new condition there.
But it's the out-of-memory killer that kills the container, and then the container runtime, containerd, sets the reason OOMKilled, and this is what we observe in kubelet. The issue is that in some situations the out-of-memory killer can actually kill a container even if it hasn't exceeded its limits but was close; like, the node itself was under pressure, and then the out-of-memory killer has some algorithm to determine which pod, or which container or process, to kill. But it does not necessarily mean that the limits were exceeded.
So that's why the name may not be accurate, and as a result of the discussion, I think the currently leading approach is to name it something like ResourceExhausted. But in general I would welcome some input on how, and whether, it's even possible to determine if the limits are exceeded, or whether what we intend to do is the best we can have.
There are also a couple of scenarios where kubelet initiates changes which we can interpret as just disruptions. So, for example, if there is node pressure but no indication that the limits were actually exceeded, then we just add the DisruptionTarget condition, as in the other cases. There are also admission errors, but I think those are easier in general. So I think that will be it for my brief introduction of the feature and what we want to do in beta.
If you have some questions, I think we can then expand more, but in general, you know, any input, either now or on the KEP, will be welcome. So that's me.
A
In that case, of course, everyone is more than welcome to have a look at the KEP, read carefully through it, and leave comments or suggestions on it and the PRs. Next on the list is Filip.
D
So I posted a KEP which introduces a new condition called Operational, which is still up for discussion; it's mainly to start a discussion on how this condition should work and how it should operate, so we could reuse it across all the workloads and have well-defined behavior for consumers to use. So please take a look at this KEP if you want, and yeah, thanks.
A
Okay, hearing none, let's jump over to the next on the list. Ravi, I see you have two of them, but I didn't see an issue linked to the first one. Do you have that link handy? I have the pod healthy policy one open, but I don't have the other one that you started.
E
The main thing that I wanted to get some feedback on is: is this a problem the rest of the folks are facing too, especially in a multi-cluster world where we have workloads that span across clusters? Since a PDB is actually specific to a cluster, it is causing a problem there. We wanted to count PDBs across clusters to ensure that a disruption is allowed or not, so we have a couple of ideas on how to implement it. Deep is also here from Apple, who works with me in those areas, but we are wondering if anyone else has similar issues and how they have dealt with them.
B
I mean, the short answer is no right now, right? PDBs are really, when we designed them, meant for controlling disruptions for things like automated node repairs. Right, like, you're upgrading all the nodes in your cluster, and you want to find a way to go through and knock them over one by one, maybe move faster than one node at a time, and you want to try to not be disruptive to workloads that can't tolerate the disruption from the underlying infrastructure.
Modifications, right. There's no notion of that being multi-cluster aware, and there's no link between the API machinery across multiple clusters, so there's no real way to communicate an ongoing infrastructure disruption that potentially spans multiple clusters within the same region, or even within the same zone, right. So typically what I've seen people do for this is: you use the PDB to ensure that you aren't disrupted within a particular cluster, and then, for whatever higher-level mechanism you're using to orchestrate infrastructure across multiple clusters...
...you have some other mechanism to manage disruption, like for rolling cluster upgrades or for tolerating regional outages, right. So PDB is not the mechanism that I've seen people use, and not the mechanism that I've used either.
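For context, a minimal PodDisruptionBudget, scoped as always to a single cluster, looks like this (the name and labels are placeholders):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb        # placeholder
spec:
  minAvailable: 2        # block voluntary evictions that would leave fewer than 2 ready pods
  selector:
    matchLabels:
      app: myapp         # placeholder label
```

The eviction API consults this budget only within the cluster it lives in, which is why it can't express the cross-cluster constraint being discussed.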
E
I see. But say you have a multi-cluster controller which is aware of the multiple clusters involved and can actually interact with PDBs. Can we actually say that, for this particular workload, I do not want the existing PDB controller, which is in my cluster, to take care of the disruptions, but I would like to delegate it to another, higher-level controller which is multi-cluster aware? Can we have some mechanism or mechanics in place which tells it that I do not want the PDB controller to manage the disruptions for this particular workload?
B
I don't think there's a mechanism by which the eviction controller is able to call out to a third-party mechanism in order to assess the decision of an eviction, right. Because the eviction controller is ultimately, like, if you're using eviction in conjunction with a PDB, that's the way you would do it. One thing that I've seen people be successful with is the creation and removal of pod disruption budgets across clusters by a higher-level mechanism during infrastructure orchestration.
You know, let's say you start conservatively: you don't want to lose anything. Pod disruption budgets aren't really mutable in a very good way, but what you can do is create a new disruption budget that targets the same pods, a more conservative disruption budget, and then just delete the older one, and that would raise it. And then, after the disruption has passed, assuming it's not extremely long-running, and you recover the capacity and your application is there again...
...you can go ahead and, you know, be less conservative with the disruption budget in the cluster. That's the primary mechanism that I've seen used. And granted, the orchestration to do that is complicated; I'm not saying it's trivial, but it's the way I've seen it done. I'd be super open to trying to do something easier. It's just, again...
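The create-then-delete trick just described might look like the sketch below (names are placeholders; the sequencing itself lives in whatever orchestrates your clusters):

```yaml
# Step 1: create a stricter budget selecting the same pods.
# maxUnavailable: 0 effectively freezes voluntary evictions.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: myapp-pdb-freeze   # placeholder
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: myapp           # same selector as the existing, looser PDB
# Step 2: delete the original, looser PDB.
# Step 3: once the disruption has passed and capacity is recovered,
#         recreate the original PDB and delete this one.
```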
...there's no notion of multi-cluster API machinery inside of core, right. I don't have a mechanism for the API machinery to hook up to the eviction controller, which I'm actually not sure SIG Apps owns. Like, if you wanted to try to do something where we modified the eviction controller to be able to call out to a third-party mechanism via some object...
...I wouldn't do it with PDB. I would have it, or maybe with PDB, but have some type of mechanism that triggers the eviction controller to call out to another piece of machinery that helps in the eviction decision. That might be something. But I'm not even sure about SIG Apps: do we actually own the eviction controller?
A
Don't think so; maybe we did touch on it. But given that this is touching the API surface, because that lives in the API server, even if we were involved I would probably still reach out to API Machinery and ask them for feedback, because this will affect the API throughput in the long run: an eviction will basically be extended with, I don't know, some kind of third-party call. And I think we're not doing this even though the scheduler has that capability, where it will reach out to external plugins for a decision. I'm not saying it can't be done; I'm just saying that we should sync with API Machinery about, yeah, performance.
A
Yeah, I remember when we were talking about the PDB healthy policy, they were the ones that were talking and having some ideas around how this could potentially work, because they were also reusing, or, like I said, overusing, PDBs for unmanaged pods, where they're actually managing or ensuring a particular number. But yeah, I would probably agree with what Ken was explaining with regards to having a third-party tool manage the PDB side, especially since, I assume, as you're talking about multi-cluster solutions...
...you probably already have some kind of controller written that is responsible for creating those workloads in all the clusters, according to whatever rules you have codified, I guess.
E
Yeah, to be clear, what we are thinking of doing is what Ken was suggesting: have a higher-level controller which is multi-cluster aware handle the PDBs for the individual clusters. I mean, not modifying the PDBs, but creating those PDBs that would allow the disruption, or not allow the disruption, to happen.
But we are wondering about the second possibility that Ken mentioned, which is to have the eviction logic be hooked up to another external third-party controller, or some other entity, which can actually make that decision for us.
B
I mean, it's possible. My advice, trying to put myself in your shoes: it depends on what your time horizon is, really. Right, like, if you want to modify the eviction controller and get it to a beta level where it's available and enabled on most cloud providers or most distributions, that's going to take a while, right. So if this is a problem that you have right now, that might be the longer way to go about it, but it is feasible.
B
The other thing I would say, and it's a question: are you dealing primarily, like, is this a problem you have with stateful workloads primarily, or are you doing this with stateless serving workloads?
And the other thing: are they heterogeneous, or is it one particular application that is kind of core to your organization that you're worried about?
So what I've done in the past there, and I've seen other people do successfully, is, for the stateful applications, you can write a custom orchestrator using a CRD that manages the cluster itself. So get outside of the StatefulSet world and orchestrate it directly. That way you're able to manage disruption in whatever way you want, and it's easier, a lower barrier to entry, to get a working POC out, right.
For stateless serving workloads, generally, managing capacity across multiple clusters is usually a function of managing traffic ingress. That would be the way I would handle it: just, you know, whatever you're using for load balancing across the regional clusters.
Well, there are other mechanisms Kubernetes has to get around it, like writing operators or custom controllers for the workload that are capable of handling the intricacies. Because with stateful applications, a lot of times the disruption budget is sufficient only for availability, but the intricacies of the storage topology aren't captured by the disruption budget anyway.
So with Cassandra, for instance, a lot of times you can say, don't take down more than X of these, but it doesn't necessarily provide availability for any particular partition of your key families, right. And same with Kafka: you can say, don't take down more than X of these, but if you're looking to have a stronger semantic around, like, you know, this is how my partitions are replicated across the topology and I need to make sure that these two don't go down, right, there are other kinds of gotchas there.
E
Got it, yeah. So I think we have some customizations in place; by that, what I mean is we have our own CRD and there is a controller. But at a high level we have been using PDBs as the disruption building blocks through which we can say, I would like to get this workload scaled down to this many pods, or scaled up to more than this number of pods. But at this point in time, what we are wondering is: can we make it community-compatible, where we can say that, hey...
...this is something that Kubernetes is providing out of the box, or there is an extension mechanism in Kubernetes that we can go ahead and use, instead of having something that we have to maintain on our own.
B
Yeah, no one's saying you can't walk down that path, or that it would be generally unsupported. It just might be hard as well, because, like Maciej said, it definitely affects the core API and the eviction API, and then, on top of that, if you modify the behavior of the eviction controller there are conformance implications as well. So it's just a larger conversation, but it's something that can be done.
It's just, typically, if I deploy an operator that's managing Cassandra or Kafka or other stateful workloads in the Apache ecosystem, I do deploy PDBs; I'm not saying PDBs aren't part of that solution. It's just that it provides an entry point inside the in-cluster controller that operates that specific workload, where I can already modify those PDBs in an intelligent way as necessary, right. So usually that's where I've captured the logic in the past, and where I've seen other people do that kind of successfully.
E
And you would put those guardrails in the operators or the workload controller, instead of...
B
Yeah, that's what I've done in the past, right, because the controller is aware of the specific considerations for the workload. And also, from their perspective: if I'm using this because I have teams that are turning up Cassandra rings on a regular basis, a lot of times they don't want to be concerned with configuring the PDB and the API machinery. They want some API where it's like...
...you know, Cassandra ring or Cassandra cluster. They put that in the YAML, and then the controller looks at that and decides what it needs to do in terms of creating PDBs, creating Services, rolling out pods, whether they're using StatefulSets or something else under the hood, in order to make sure that works; setting the naming conventions for the Cassandra nodes, all that stuff; maybe taints and tolerations to try to keep away things...
...that might not be good tenants to run alongside Cassandra, right. Because if you get a lot of things that hit the page cache at the same time and create memory pressure, Cassandra's not as performant as you'd like. So all of those considerations, usually. And the thing about it is, if you have a framework like that and then you open source the whole thing, that's another way to contribute it to the community in a big way.
That would help other people who have the same workloads that you're running, because it sounds like a lot of what you're running are primarily internal or external versions of open source workloads that are in common use; you're not running, like, this custom thing that, yep, we built, right? Yeah.
E
Yeah, most of them can be open sourced.
All right, let me do one thing: Deep and I will go back, and then we will see if we can make those changes within the operators. I mean, we already have those operators open sourced, but we wanted to make sure that we are not doing something that is Apple-specific, or, if there is a way to do it in a community way, we would like to do that.
So we'll see if those things can be put in the open source, and we will get back to you.
A
Cool, thanks a lot, Ravi. Then on to the next topic: Filip's healthy policy for PDB.
E
So I think I should give an update on this. I have taken the PR from Morten and I've started working on the changes, but I think I made the mistake of making the API changes and the implementation together, and I think I'll not have enough time to work on the implementation side of things. So what I'm thinking of doing is making those API changes in a separate PR and then opening that PR for reviews, and Filip is going to work on the implementation side of things.
A
Yeah, I can edit this description; not sure if, Morten, it's easier to ping me about updating this so that it matches the current state. The KEP will only require updating the version numbers so that it matches what we are actually doing, because I guess it is currently pinned to the versions described in this issue.
A
Okay, hearing none, the last topic for 1.26 is something that Matthew Cary has been working on for a little while. This basically adds a new field in a StatefulSet which allows you to express what should happen with the PVCs when the StatefulSet is either scaled down or removed: whether the PVCs can be safely removed, or they should stay.
If I remember correctly, there is an updated KEP. This feature is currently in alpha, that's probably the most important bit, and Matthew was pinging me about pushing this over to beta, so I think he has a PR open to address that. If you have a little bit of time and interest in StatefulSets, have a look at the issue and the attached PRs. I'll sync with him about updating the description as well.
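The field being discussed is the StatefulSet persistentVolumeClaimRetentionPolicy, alpha at the time of this meeting behind the StatefulSetAutoDeletePVC feature gate. A sketch, with placeholder names:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: example            # placeholder
spec:
  serviceName: example
  replicas: 3
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete    # remove PVCs when the StatefulSet is deleted
    whenScaled: Retain     # keep PVCs for pods removed by a scale-down
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: app
        image: busybox     # placeholder image
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```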
A
Hearing none, so with that, I'm going to give you back 22 minutes of your time. Thank you very much, all folks, for a fruitful discussion as usual, and see you next time. Bye, all right. Thanks.