From YouTube: 20190826 - Cluster API Provider AWS Office Hours
A
Hello and welcome to the August 26th edition of the Cluster API Provider AWS office hours, a sub-project of both Cluster API and SIG Cluster Lifecycle. We have a relatively short agenda today, so please go ahead and add anything if you have it. I'm putting the link in the chat right now. For the first item on the agenda, I wanted to give a PSA that going forward.
A
Yeah, well, actually, this is related to these labels that we apply. Right now we assign the label with control-plane as the key and controller-manager as the value. By making it unique, it means that if, for some reason, we do deploy multiple components in the same namespace, you can actually run them alongside each other without issues with the deployments and replica sets.
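The selector-uniqueness point can be sketched with the subset rule Kubernetes label selectors use when a Deployment adopts pods; everything here (label keys, values, function names) is illustrative rather than the provider's actual code:

```go
package main

import "fmt"

// selectorMatches reports whether every key/value pair in the selector
// is present in the pod labels -- the same subset rule a Deployment's
// label selector applies when adopting ReplicaSets and Pods.
func selectorMatches(selector, podLabels map[string]string) bool {
	for k, v := range selector {
		if podLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical labels for two controllers deployed in one namespace.
	capiPods := map[string]string{"control-plane": "controller-manager"}
	capaSelector := map[string]string{"control-plane": "capa-controller-manager"}

	// Because the control-plane value differs, the selectors do not
	// overlap, so the two deployments can coexist without fighting
	// over each other's pods.
	fmt.Println(selectorMatches(capaSelector, capiPods)) // false
}
```

If both components used the same control-plane value, each deployment's selector would match the other's pods, which is exactly the collision the unique value avoids.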
B
All righty: surface all permission errors as events. This one, I imagine, would require a lot of surgery, right? Anywhere that we talk to AWS and get some error back, we'd have to parse the error and potentially generate an event or events from it. We could have a helper for it, but it means touching every single line of code where we talk to AWS, right?
B
Let me do that; make it a little bit more important. Alright: machines with deleted instances don't join the cluster. This one was yours, Liz, and, Jason, you and I had discussed it back and forth on GitHub, but I think that we basically need to mark the machine as perma-failed in the event that we see that the EC2 instance has gone away.
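A minimal sketch of the proposed behavior; the `machine` struct and its fields are stand-ins for the real Machine status fields, and the instance lookup is reduced to a map:

```go
package main

import "fmt"

// machine is a tiny stand-in for a Machine object; failureReason and
// failureMessage mirror the terminal-failure fields on Machine status.
type machine struct {
	instanceID     string
	failureReason  string
	failureMessage string
}

// reconcileInstance marks the machine as permanently failed when the
// backing EC2 instance can no longer be found -- a sketch of the fix
// discussed above, with the describe-instances call represented by a
// simple set of running instance IDs.
func reconcileInstance(m *machine, running map[string]bool) {
	if !running[m.instanceID] {
		m.failureReason = "UpdateError"
		m.failureMessage = "EC2 instance " + m.instanceID +
			" not found; it may have been terminated out of band"
	}
}

func main() {
	running := map[string]bool{"i-0abc": true}
	m := &machine{instanceID: "i-0def"}
	reconcileInstance(m, running)
	fmt.Println(m.failureReason) // UpdateError
}
```

Setting a terminal failure reason lets higher-level tooling (or a human) replace the machine instead of the controller retrying forever against an instance that will never come back.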
B
Using an unmanaged VPC with subnets in different availability zones, the ELB for the API server is not configured correctly. So: a Terraform module for an unmanaged VPC, no AZ specified, subnets created in different AZs. The ELB was only attached to one availability zone, not the one the control plane was in, and it failed. Is this something that we can identify and fix, do you think, Jason?
A
And what it is is: one was public, one was private, but we don't necessarily know which is which, and we probably just picked one to determine what AZ to use. So we probably need to enumerate the provided subnets and just make sure that the AZ for all of them is included when we create the ELB.
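The enumerate-the-subnets fix could look roughly like this; the `subnet` type and its fields are invented for illustration, not the provider's actual types:

```go
package main

import (
	"fmt"
	"sort"
)

// subnet is a minimal stand-in for a described AWS subnet.
type subnet struct {
	id, az string
	public bool
}

// elbAvailabilityZones collects the distinct AZs across all provided
// subnets, instead of deriving the AZ from whichever subnet happened
// to be picked first -- the fix suggested above.
func elbAvailabilityZones(subnets []subnet) []string {
	seen := map[string]bool{}
	var azs []string
	for _, s := range subnets {
		if !seen[s.az] {
			seen[s.az] = true
			azs = append(azs, s.az)
		}
	}
	sort.Strings(azs) // deterministic order for the API call
	return azs
}

func main() {
	subnets := []subnet{
		{id: "subnet-1", az: "us-west-2a", public: true},
		{id: "subnet-2", az: "us-west-2b", public: false},
		{id: "subnet-3", az: "us-west-2b", public: true},
	}
	fmt.Println(elbAvailabilityZones(subnets)) // [us-west-2a us-west-2b]
}
```

Attaching the ELB to every AZ that appears in the subnet list is what guarantees it can reach a control plane instance regardless of which subnet that instance landed in.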
A
I didn't have a chance to debug it much further, other than to know that, for some reason, the DNS propagation was taking an exceedingly long time, and it shouldn't be related to anything that we're doing. It seems like it's some type of a bug on the AWS side, and what's even odder is that the resolution is not working from the EC2 instances that we're creating, too. So it's not like it's cached on some type of DNS server between AWS and us.
A
The resolution is not working using AWS's DNS servers, so I didn't file an issue, just because I didn't know what we could really do about it other than say: if you hit this issue and the DNS resolution doesn't work after a reasonable amount of time, delete the cluster and recreate it, and it should work, because I just deleted it, recreated another one, and didn't run into that issue again. So, yeah.
D
Interesting. So we actually hit this from the context of pivoting on Vince's clusterctl branch. What happens there is the process kind of stops, and it's waiting at kind of a weird spot that we weren't used to it waiting at. It's usually hung on something like waiting for control-plane-0 to be ready, or something like that, but this time it was waiting to apply YAML, and in the middle of all this, we'd got security groups and we'd got the load balancer, and it just kind of chilled there for a second.
A
This one's tricky, at least right now in the current state of v1alpha1 and v1alpha2, in that, in order to clean it up, you can't just replace the load balancer. You also have to replace the control plane instance, because when you generate the new load balancer, you now need a different SAN on the certificate that you're generating on the initial control plane instance as well.
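To illustrate why the load balancer's DNS name is baked into the API server certificate, here is a stdlib sketch that issues a self-signed cert with the LB hostname as a DNS SAN (the hostname and other parameters are made up; real clusters build this chain through kubeadm):

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newAPIServerCert issues a self-signed certificate whose DNS SANs
// include the load balancer's DNS name. Clients validate the name
// they dial against these SANs, which is why a replacement ELB (with
// a new DNS name) forces a new certificate on the control plane.
func newAPIServerCert(lbDNSName string) (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "kube-apiserver"},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(365 * 24 * time.Hour),
		DNSNames:     []string{"kubernetes", "kubernetes.default", lbDNSName},
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newAPIServerCert("my-cluster-apiserver-1234.us-west-2.elb.amazonaws.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(cert.DNSNames)
}
```

The old certificate simply has no SAN matching the new ELB's hostname, so TLS verification fails until the control plane instance is rebuilt with a cert listing the new name.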
B
Do you want to talk about this one that I filed? So, all of our retryable AWS calls use this function to create the back-off, and I think I did the math right: if you account for jitter, then the sum of the amount of time that we would be sleeping, waiting for AWS to return a positive indication of whatever you're waiting on, is ninety billion years, which is a long time.
B
So I think that we should probably either switch to using the apimachinery wait functionality, which I believe can give you some timeouts, like hard timeouts, or we can figure out some way to do it. But it probably would be nice to have a timeout of somewhere between 50 and 60 minutes when you're waiting for AWS to do anything, and, you know, the longer-lived operations take longer, like waiting for the NAT Gateway to be available; that can take a few minutes.
A
I would probably say that we want to go closer to the shorter end of that because, right now, all of the other operations would currently be blocked on it. Longer term, we probably want to have a better strategy around reentrancy, and then we could just basically return the item and requeue it, or handle it some other way. Yeah.
B
So, in both alpha 1 and alpha 2, the cluster and machine controllers now... they're not in any released versions of CAPI or CAPA yet, but at least in the release branches, you can set the flags that I just added to the manager to increase the concurrency for the cluster and machine controllers above the previously hard-coded value of one, so if you want to reconcile 10 machines simultaneously, you can.
B
And I need to do another PR to master to add the same support for the AWSMachine and AWSCluster controllers, so that would at least help with different clusters and different machines. But certainly, if it is stuck waiting on a NAT gateway and, for whatever reason, AWS just never comes back with it, then, yeah, we need some way to deal with that. Yeah.
B
I think so. Since we're here, and there's not that many of us, and there's actually not that many issues... oops, that's the wrong milestone. So we have 22 open issues for our 0.4.0 release, and I know we don't have a full house here today, but maybe, with those of us who are here, we could at least go through these 22 and decide if we want to still try to do them for this Friday, or defer anything else.
B
That's deferrable, so I'm just gonna start at the bottom and we'll go up. So, number one: ability to customize security group rules. This was requested, I'll say, a long time ago. So, given there's no way we could QA something like this in time... you know? Yeah. So I'm gonna bump this to the patch release milestone.
B
This was Naadir's, so I don't know if he was doing that on behalf of one of our customers or just something he was interested in seeing. But yes, I totally agree: it's been a long time and it's not implemented. So, next: alright, document what you get in a cluster. I still think this is useful. It is a documentation issue, so, given that it doesn't directly impact whether or not the code works, I think we can leave it in the milestone. Alright, this one's been around for a while, I know.
A
And this goes into: any additional changes would require much larger refactoring, and that's why we basically looked at just implementing the exponential back-off to begin with. So I think it's safe to bump this to the patch or the next release. I think we should at least attempt to address it in the patch, but okay, yeah.
A
The eventual consistency of the AWS API: so that was the initial one that we were hitting. The initial tagging would try to happen, and it would fail because the resource wasn't found yet, and then we'd basically orphan the resource. We're in a better state now with the back-off, but ideally we should be using AWS client tokens wherever we can, and that would ensure that, even if we fail tagging, we should get the same result back if we make that call a second time. I see the downside.
A
The downside there is that we need to ensure that we record that client token so that we can use it again, and we can't trust the UID on the object, because once you pivot, or if you restore from backup, that UID is going to be different. So, ideally, what we would have to do is generate the client token, save that back to the API server, ensure that save is successful, and then we can go ahead and proceed. Got it.
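The generate-save-then-call ordering described here can be sketched with a map standing in for the API server; the resource names and function names are made up:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"fmt"
)

// store stands in for persisting to the API server. The token must be
// saved, and the save confirmed, *before* the AWS call is made, so a
// retry after a crash or pivot reuses the same token instead of
// minting a new one (and creating a duplicate resource).
type store map[string]string

// clientTokenFor returns the saved client token for a resource,
// generating and persisting one on first use.
func clientTokenFor(s store, resource string) (string, error) {
	if tok, ok := s[resource]; ok {
		return tok, nil // retry path: reuse the recorded token
	}
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	tok := hex.EncodeToString(buf)
	s[resource] = tok // persist first; only then call AWS with the token
	if s[resource] != tok {
		return "", errors.New("failed to persist client token")
	}
	return tok, nil
}

func main() {
	s := store{}
	t1, _ := clientTokenFor(s, "natgateway/main")
	t2, _ := clientTokenFor(s, "natgateway/main") // retry: same token back
	fmt.Println(t1 == t2)                         // true
}
```

Because the token lives in the object's saved state rather than being derived from the UID, it survives pivots and backup restores, which is the concern raised above.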
A
So what we can probably do here is just go ahead and clean up any load balancers that match the tagging that's done for the integrated cloud provider stuff. Yeah, it's just a matter of adding a query for those and then deleting those resources, because if the cluster is gone, I mean, you're not using those load balancers anyway; there's no risk of actual data loss there.
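A sketch of that tag-based query; the working assumption (worth verifying against the cloud provider's actual behavior) is that Service load balancers carry a `kubernetes.io/cluster/<name>` ownership tag, and the types here are illustrative stand-ins for the ELB API:

```go
package main

import (
	"fmt"
	"strings"
)

// loadBalancer is a minimal stand-in for a described ELB and its tags.
type loadBalancer struct {
	name string
	tags map[string]string
}

// orphanedServiceLBs returns the load balancers that belong to the
// given cluster according to the kubernetes.io/cluster/<name>
// ownership tag -- the ones it would be safe to delete during cluster
// teardown, as discussed above.
func orphanedServiceLBs(lbs []loadBalancer, clusterName string) []string {
	key := "kubernetes.io/cluster/" + clusterName
	var names []string
	for _, lb := range lbs {
		for k, v := range lb.tags {
			if k == key && strings.EqualFold(v, "owned") {
				names = append(names, lb.name)
			}
		}
	}
	return names
}

func main() {
	lbs := []loadBalancer{
		{name: "a1b2", tags: map[string]string{"kubernetes.io/cluster/test-cluster": "owned"}},
		{name: "keep-me", tags: map[string]string{"team": "infra"}},
	}
	fmt.Println(orphanedServiceLBs(lbs, "test-cluster")) // [a1b2]
}
```

Filtering strictly on the cluster-scoped ownership tag is what keeps the cleanup from ever touching load balancers that belong to other clusters or to unrelated infrastructure.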
B
We can potentially just close this if we want to, given that, at some point, alpha 1 isn't going to be maintained anymore. But I'll put it in the 0.3.4 milestone right now, and maybe after I get back from San Francisco I can actually document this. Alright: add validating webhooks for alpha 2. I think this is still worth trying to do, and if it doesn't make it, it's okay; I'm good.
B
The problem... so I went through the kubebuilder v2 webhook configuration flow and coding flow, and I used cert-manager to generate certs, and it works there. At least... there was at least one gotcha that I ran into with some of the YAML that got generated, where I had to hack in a fix because of something that wasn't coded in controller-gen, and I don't know if it's been fixed yet. But yeah, if you don't use that approach, then I guess you're sort of on your own. Yeah.
B
Of them, yeah, I'd say there's probably three options. There's option one, which is: I don't want to deal with this as an end user, and so, you know, we just have a banner somewhere that says: warning, if you don't feel like dealing with certificates, then you're not going to have all of the functionality that we have coded for validation. And then the two options are the default being cert-manager, and then instructions for how to do a non-cert-manager approach.
B
Yeah, I think if there are things like a subnet, in this case, that need to get linked to other things like route tables... I'm using poor terminology, but I think you get the point... then you either need to let CAPA manage it fully, or you need to manage it fully, but it's not fifty-fifty. And that's different from saying something like, well, you know, if you could bring your own ELB, we would let you do that. But it's not like that.
C
The alternate thing to do here is a pre-admission.
C
I agreed that... I don't think that this will... I agreed that this should not block the release. I think we can move it into the point release, but I don't think there's anything that prevents us from doing this. Anyone else? There's no... there's no light fields or anything that I think we need to make this, yeah.
D
This is a tough one, only because, I guess, figuring out which is a unit test and which is an integration test is a little bit difficult in these kubebuilder repos, where there's the big setup block that basically stands up a whole cluster. And the reason why I think this is tough is because it takes almost a little bit of design to figure out what goes where. Like, for us, in our kubebuilder controllers...
D
We just put everything into that suite and just take the performance and timing hit. But it seems like in some of our repos we have some unit testing... I'm sorry, in some of the cluster API repos, the drivers specifically, there's a bunch of fake-client work that runs a bunch of tests. The issue with the fake client in a bunch of these is that it doesn't care about a lot of the deletion stuff; like, the fake client ignores finalizers. So there's a bunch of, I don't know... did I test it right? I don't know, yeah.
B
Why don't we just move it, 'cause it's not a blocker? Yeah, all right. We talked about this one already; that's a v1alpha1 one. And Jason's working on the certs; we talked about that, and we talked about that. So I think we are good, at least for having covered all of these. Anything else anybody wants to talk about?
D
Somewhat related to the Ubuntu image: I guess, how much do we care about, I don't know, security patches to those AMIs, or something like that? Like, for example, there was some portmapper stuff that's default-enabled on these Ubuntu machines totally unnecessarily, but I don't know if it's in scope or not for us to care about patching all those little things when making AMIs.