From YouTube: 20200708 Cluster API Office Hours
A
Okay, welcome everyone. Today is, actually I don't know, July 8th, 2020, and this is the Cluster API office hours. Cluster API is a subproject of SIG Cluster Lifecycle.

Please adhere to the CNCF code of conduct, use the raised-hand feature on Zoom if you'd like to speak, and be kind to each other. If you are attending, please add your name to the agenda. In order to get edit access, you need to join the cluster-lifecycle mailing group; the instructions are at the top of this document.
A
So, if you're new to this meeting or to this group and you'd like to introduce yourself and tell us a bit more about why you're here, now is a great time to do that. I'll go ahead and mute; feel free to unmute and speak up if you'd like to introduce yourself.
B
I'll say a quick hello, as I've not joined this meeting before. I'm James Munnelly; some of you may know me from cert-manager and a few other meetings. I'm just dropping in to chat about the cert-manager upgrade.
C
I'll jump in there as well. I'm Chris Hein; I work at Apple. I'm kind of sitting in on this just to feel out what's going on and see if there's anything we can help with or potentially look into leveraging.
A
Great, welcome! Okay, so we'll move on to PSAs. May, do you want to take the first one?
D
Yeah, sure. So Cecile and I are finalizing slides for KubeCon this week.
D
If there are things you've been working on, either in Cluster API or an infrastructure provider, that you want to highlight at the virtual KubeCon, which is on August 20th, I believe, maybe August 19th, then contact us in the community Slack. I suppose I'll drop my email in there as well, if you want to reach out that way.
A
Yep, let us know. Vince, the RC release, exciting!
E
Yeah, so we're almost there. Yesterday we cut rc0, and thank you to the Azure folks for quickly testing this out. I think maintainers and users put a PR up at the end as well, and it seems like they have been successful, which is great to hear.
E
There's a lot in this release, so feel free to take a look at the roadmap; it has a rough list of the high-level features we added, and the community and the team did great work in this release. The release notes will actually be compiled at release time. I'm hoping to do an rc1 tomorrow, which will add clusterctl move support for ClusterResourceSet, and then the actual release maybe either Friday or, probably, early next week, so that we don't rush it and we can test it a little more.
E
So that's exciting; congrats everyone on this huge milestone. After this we'll probably start talking about alpha 4 in the next few meetings. Any questions on that?
A
I don't see any hands raised, so I'm going to assume that's a no. That's awesome, great! So, moving on: I don't see any demos or POCs signed up for today, so we'll jump to general questions. By the way, if you're new-ish, this general questions section is great if you have questions throughout the meeting, so feel free to add them even after we pass this section; I'll make sure to check back in at the end. So we have one from Lauren about the node draining timeouts.
G
Yeah, so there was an issue about adding a node draining timeout to the machine controller, I believe, if you want to open that up. Basically, I did some diving in, and I found that the machine controller, by default, when it drains a node, does it by evicting the pods and then waiting for them to be deleted.
G
However, I noticed, amongst the many timeouts that are there, namely the timeout, the global timeout, and one called SkipWaitForDeleteTimeoutSeconds, that in the eviction path, after evicting, we wait for the pods to be deleted essentially forever; whereas if we disable eviction, we instead just go down the path of deleting the pods.
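[Editor's note: for context, a minimal sketch of the kind of knob being discussed here. The nodeDrainTimeout field below is hypothetical at the time of this meeting; the idea is simply that the drain timeout would be surfaced on the Machine spec rather than hard-coded in the controller, with "unset" keeping the current wait-forever default.]

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Machine
metadata:
  name: example-machine          # illustrative name
spec:
  clusterName: example
  # Hypothetical: upper bound on how long the machine controller waits
  # for the node to drain before proceeding with deletion anyway.
  nodeDrainTimeout: 10m
  bootstrap:
    dataSecretName: example-bootstrap-data
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: DockerMachine
    name: example-machine
```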
G
Another thing I wanted to bring up was whether there are any tests specific to this draining logic. If so, could you point me to them? I did a quick check and didn't really run into any. So if you know of any tests around these things, they might also be useful in understanding the behavior. Again, I just want to understand this, because if I want to make changes, I don't want to screw everyone over with some weird timeout.
H
Yeah, so I think the biggest challenge we'll have here is that we recommend that users running critical workloads use PDBs to ensure those workloads keep running as intended.
H
We need to make sure that we're honoring those PDBs as part of eviction, and even include that in our scaling workflows and all of that stuff. So I think it's probably good not to wait until max int, but at the same time we need to make sure we give a reasonable amount of time for those workloads to evict before moving on.
H
You
know
some
of
that
information,
but
right
now
we're
just
kind
of
blind
right
now,
and
I
think
the
the
draining
process
is
the
only
interaction
point
we
have
right
now.
That
would
have
any
chance
of
interacting
with
the
pdbs.
I
Yeah, as of today you could build some kind of external controller, if you wanted, and apply the skip-node-draining annotation after some timeout that you've determined.
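[Editor's note: a sketch of the escape hatch being described, assuming the v1alpha3 annotation name machine.cluster.x-k8s.io/exclude-node-draining; an external controller could stamp this onto a Machine once its own timeout expires. Names are illustrative.]

```yaml
apiVersion: cluster.x-k8s.io/v1alpha3
kind: Machine
metadata:
  name: example-machine          # illustrative name
  annotations:
    # Its presence tells the machine controller to skip node draining
    # for this machine entirely.
    machine.cluster.x-k8s.io/exclude-node-draining: "true"
spec:
  clusterName: example
  bootstrap:
    dataSecretName: example-bootstrap-data
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
    kind: DockerMachine
    name: example-machine
```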
G
Yeah, I think the origin of this issue was to make these timeouts during draining configurable, so that they could adhere to the PDBs that have been configured as well. I guess I can reach out to Michael later about the separate controller to do the annotating; that's what I was hoping for.
J
I think having the ability to have an external controller with the annotation as an escape hatch is useful, but I also think it would be nice not to require that if you just want to say: okay, I've been waiting 10 minutes for the eviction to happen, it hasn't happened, so let me just force it through.
A
Yeah, I see a few hands raised. I'm sorry, I don't know your first name: D... Thorsten? Thorson?
K
Sorry, I guess I didn't have my name set; no, it's Dane. I think he actually just answered my question. It was just going to be that we have a use case for very long drain times. So as long as that's configurable to a very large number, we would be fine; but if the default changed, we'd just need to keep track of when that default is going to change, so that we don't break things when we deploy it.
K
Relatedly, I'm also troubleshooting what appears to be a bug in the drain logic. I'm not sure if it's in here or somewhere else, but I think I'm getting closer to what's happening there. It seems like our evictions aren't actually making it to even that 20-second timeout before our pods are going down.
I
Yeah, I mean, we can make it configurable, as long as the default is to wait forever.
A
Okay, Lauren?
G
Yeah, I just wanted to clarify: I wasn't intending to change the defaults. I just wanted to expose it and make it configurable, and that's what led me down this path of figuring out which timeout to make configurable, and that's when I noticed this.
A
Sounds good. Yeah, it sounds like making it configurable and keeping the default is a reasonable change for now, and then we can pick a saner default later if we feel the current default doesn't make sense. I see some hands still raised; are those still raised, or did you forget to lower your hands?
A
Dane, okay, good. All right, let's move on for now, but yeah, thanks for writing the notes, ninja. Okay, so, discussion topics: I actually had the first one. I just wanted to ask about...
A
I know there was some discussion about the bootstrap reporting a while back. Basically, the idea is that right now it's pretty difficult for the infrastructure providers, or for Cluster API, to know what the status of bootstrapping is when we run, for example, cloud-init on a machine. If cloud-init fails because, for example, the VM size is too small and there are not enough CPUs, kubeadm will fail and we won't get that reported back, and so that's not great from an observability perspective.
D
Yeah, I paused on it for this release; we ran out of runway for 0.3.7, anyway. In any case, I think we got to the conclusion that it's quite difficult to do in an infrastructure-agnostic way, but it's definitely something to consider for v1alpha4, so I'd probably be happy to start revisiting it pretty shortly, actually, if anyone's interested.
A
Sounds good. I'm very interested, so please let me know if you have more discussions. And if we think that infrastructure-agnostic is going to be too difficult in the short term, I think that's okay too; I was holding off on doing anything infrastructure-specific because I wanted to see if we came up with a solution in CAPI first, but it is quite a blocker right now, so I would like to proceed with an Azure-specific solution if we're not going to get that done right away. Cool. Anyone else have any questions on this, or comments?
E
Yeah, I just wanted to call out, and also shout out to James for jumping on this: we are using a very old version of cert-manager, 0.11.0, and we have been talking for quite a while about upgrading the cert-manager version. We've been stuck with this mostly because we inherited the dependency from kubebuilder and controller-tools, so we're trying to move away from 0.11 to one of the latest versions.
E
If you're interested, there has been a lot of discussion on #2240, so definitely jump in; I think we have been making some progress. I'm not sure if we will be able to do any upgrades in the alpha 3 releases, but we'll definitely prioritize this for alpha 4.
A
Thanks. Do we have any questions for James or Vince about cert-manager and the upgrade?
A
All right, well, yeah, thanks James for jumping in. Cool. So I had one more question, actually. I was wondering, I don't know if Sedef is on the call or not, but if we could talk a bit more about ClusterResourceSets for everyone here, because I know that's a big new thing in the new release, and specifically how infra providers should go about adopting it, in terms of timeline.
A
I know there are a few issues open right now in the queue. Are there any that are blocking? Should we be aware of any bugs?
L
Yes, it is ready, ready to be used. Yeah.
A
Okay, great. And yeah, as Vince pointed out in the chat, would you mind just giving a brief overview, for everyone who's not familiar, of what CRS is and what it does?
L
Sure. So a ClusterResourceSet is useful to kick-start a cluster by adding initial add-ons, such as CNI and CSI, and make it operational, for example by installing add-on managers. Initially, when a cluster comes up, it is not in an operational state. Basically, you create a ClusterResourceSet and add a couple of resources, which can be Secrets or ConfigMaps, and by doing that we can install plugins into the newly created clusters by matching their labels, basically.
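[Editor's note: a minimal sketch, using the v1alpha3 addons API, of what this looks like in practice; the names and the cni label are illustrative. The ConfigMap wraps the manifests to apply, and any Cluster whose labels match the selector gets them applied.]

```yaml
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: calico-crs               # illustrative name
  namespace: default
spec:
  # Applied to every Cluster in this namespace carrying this label.
  clusterSelector:
    matchLabels:
      cni: calico
  resources:
    - name: calico-addon
      kind: ConfigMap             # Secrets are also supported
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: calico-addon
  namespace: default
data:
  calico.yaml: |
    # ...full CNI manifest goes here...
```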
A
Okay, sounds good, thanks. And one last question: are we planning on removing that CNI step from the quick start as soon as we're able to, once providers adopt this and there's no more need to install the CNI manually?
L
We haven't discussed it, but it could be possible, I think.
E
So
we
might
need
to
unleash
her
because,
like
we,
it's
an
experiment
and
it's
disabled
by
default.
So
you
have
to
change
the
flag
unless
we
default
it
to
true
which,
given
it
is
the
first
first
integration
which
probably
shouldn't
but
I'll
I'll
leave
it
to
the
group
to
the
side
like
I
just
wanted
to.
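[Editor's note: a sketch of what enabling the experiment involves. In the v0.3 manifests, ClusterResourceSet sits behind a feature gate on the core controller manager (clusterctl also exposes it through the EXP_CLUSTER_RESOURCE_SET variable); the fragment below shows only the relevant container args, not a complete Deployment.]

```yaml
# Fragment of the capi-controller-manager Deployment in capi-system.
spec:
  template:
    spec:
      containers:
        - name: manager
          args:
            # Experiments are off by default; this flag opts in.
            - --feature-gates=ClusterResourceSet=true
```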
F
I wanted to give a quick answer on the end-to-end question. What we have now is a specific end-to-end test that basically exercises ClusterResourceSet. I think the question here was about whether we're already using ClusterResourceSet for spinning up all the other tests, the quick start, self-hosting and all that; that is not yet there, but we are planning it. There is an issue to do this.
E
I'll open an issue so that, let's say, 0.3.7 ships with ClusterResourceSet defaulting to false, and then, as we evaluate it, maybe in 0.3.8 or 0.3.9 or another patch release we can default it to true, because it doesn't impact the other controllers, which is nice; it's kind of separate, so it can be done, and then we can discuss on the issue.
A
Okay, all right. I think that's the end of our discussion topics, but feel free to let me know if I missed anything. I'm just going to move on to issue triage for now.
A
So those are the issues that are without a milestone right now, and I'm sure there are many other new ones that Vince or others have already triaged, but we'll look at those few for now. I think this is actually the bug that Dane was referring to earlier. Dane, do you want to talk a bit more about this?
K
Certainly. I'm in the process of reproducing the issue, which I've done multiple times in a few different ways. The simplest way, most recently, was just to create a PDB for a small deployment of three pods and delete one of the pods. I made sure it had an initial delay of, like, two minutes, to make sure it had a good long window.
K
While we were at the point where the PDB should have restricted further evictions, I deleted one of the other machines, the one that housed one of the other pods, and it just drains really quickly; it takes about 20 to 60 seconds and it's gone, and then we end up with two unavailable pods. This originated from an actual outage, where it impacted a stateful workload in one of our live environments.
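[Editor's note: a minimal sketch of the reproduction described above, with illustrative names: a three-replica workload guarded by a PDB that tolerates only one unavailable pod. Delete one pod (the readiness delay keeps its replacement unready for about two minutes), then delete a Machine hosting another pod; the drain's evictions should be blocked by the PDB.]

```yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: example-pdb
spec:
  # With 3 replicas and one already-deleted pod still unready,
  # any further eviction would violate this budget.
  minAvailable: 2
  selector:
    matchLabels:
      app: example
```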
K
So, the most recent thing I found was an event, for the pod that shouldn't have been evicted, from the taint-manager eviction.
K
So I'm trying to figure out what would have triggered that. There are a few other moving parts in our clusters and some other controllers, so I also need to rule out that it's not one of our other controllers doing this.
A
Thanks. So it sounds like you're still investigating. Do you think this is a regression from the current release, or is this potentially something that's always been there? Do you have any ideas around that?
K
I haven't looked at the history of the drain code path, but I suspect it's been there for a while. I guess I'm not really sure; I only recently started working on this code base, so maybe someone with more history on the drain code could speak to that.
I
Yeah, I've been working on this area extensively. Each machine is drained on a per-machine basis, and the PDBs, through the eviction API, are quite atomic, so it's unlikely to be the eviction API. One possibility is that one of these kubelets went unreachable during this time period: the first node was allowed to drain like normal, and then one of the other ones went unreachable.
I
For whatever reason, what will happen is the node lifecycle manager will taint the nodes, the pods will all get marked as deleted, and with the default options the Cluster API drain ignores pods whose deletion timestamp is more than five minutes old when the kubelet is unreachable.
I
So
that's
the
only
scenario
that
I
can
think
you
might
have
bumped
into
here,
but
I'm
definitely
interested.
We
posted
on
this.
If
you
find
anything.
K
Yeah, actually, that sounds very likely. Looking at the event, the "stopping container nginx" is coming from source component kubelet, host: the node's name. So I think you're right on with that.
A
Okay. So for now, I've seen Michael on here; it sounds like you should follow up with him. And in terms of milestones, Vince, where are we? Because we're about to release 0.3.7; do you want to start putting things in 0.3.8? I don't think this is necessarily a blocker for the release, but it should be addressed soon.
E
Yeah, I'm gonna do 0.3.x, because we don't have... okay, that got a milestone.
A
Okay, future. Add support for multiple tilt providers and tilt-provider.json.
J
I would say this is really kind of an anytime sort of thing, but I do know that there are folks who are working on trying to add additional providers to CAPA, to support things like EKS. So from their perspective it's probably important soon.
A
Got it, yeah; it sounds like that's the use case it's being asked for. And should we mark this as help wanted, or do you think this is a good candidate? Yeah? Okay, all right.
A
Kubeadm control plane resilience to machine disk space issues; sounds interesting. Ben?
M
Yeah, so I've been working on the MachineHealthCheck implementation for KCP, and I was trying to see what would happen in this scenario.
M
There's this whole thing in etcd of disk space alarms that we could look at to see if it's under disk pressure. What I saw, and I was just filling up the etcd with random keys, which is not a very realistic scenario, was that etcd didn't raise any alarms, for one, and other pods started failing; the API server pod started failing, and yeah, it just got into a bad state.
M
So it seems like something where we've probably just never investigated this failure mode before, but we should probably see what would happen and try to make sure it happens in the way that we want it to. It's definitely more of a long-term feature.
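[Editor's note: some context on the etcd alarm Ben mentions: etcd raises its NOSPACE alarm when the backend database hits its quota, not when the underlying disk fills up, so filling the disk directly can bypass the alarm entirely. A hedged sketch of bounding etcd's backend via kubeadm, with an illustrative quota value:]

```yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
etcd:
  local:
    extraArgs:
      # Cap the backend DB so etcd alarms (NOSPACE) before the
      # disk itself is exhausted. 2 GiB here is illustrative.
      quota-backend-bytes: "2147483648"
```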
N
Hello, thank you. Yeah, I noticed kubeadm does not request any disk space for etcd, which probably makes this a tiny bit worse, and I have an issue filed somewhere for that, which I'll post in the chat in a second.
M
Yeah, yeah. I imagine there's a category of tests that we don't have today, which kind of exercise failures, you know, kind of chaos testing or something; I was thinking that would be a really interesting area to explore.
A
Yeah, definitely, and I have run into this disk space thing before, so it's definitely important that we test it. Great. So, Brian, if you want to paste the link to that other issue in Kubernetes that you opened, that'd be great, so we can link them. And then I think we can leave this in the Next milestone for now, since this is more of a long-term thing. Any objections, or any other questions or comments?
A
Nope? Okay! Well, I still can't do milestones, so I'll let Vince do my writing for me. Okay, and then I think that's it. Are there any other recent issues that weren't discussed here that anyone would like to bring to this group's attention, or any milestone advancement requests?
A
Yeah, Brian, can you please link the related issue in the other issue? That'd be great, cool. Okay, I think that's it. Anything else anyone wanted to cover?
A
All right, if not, I think that's it for today. Thanks everyone, and congrats on the new release; really great work from a lot of people here, so I'm super excited to see this coming along. Have a good end of your Wednesday. Bye, thanks Cecile.