From YouTube: Kubernetes - AWS Provider - Meeting 20220708
Description
Recording of the AWS Provider subproject meeting held on 20220708
Agenda - https://docs.google.com/document/d/1-i0xQidlXnFEP9fXHWkBxqySkXwJnrGJP9OGyP2_P14/
A: We, I don't think... let's see, there were some new releases. Excuse me, there were some new releases recently; let's see, I think 1.20 through 1.24 all had a patch release. I think that's it. Yeah, go ahead and go to kOps.
C: Thank you. I don't think there are any AWS-related surprises. We are about to release a 1.24 GA, so that's good, and we have some patch releases, but yeah. Everything on the kOps side, I think, is fairly uneventful, as it were, which is good news.
A: Yeah, sounds good. Let's do Load Balancer Controller.
D: I don't have any significant update. I'm planning for the next patch release, 2.4.3, which would be the minor fixes that we have accumulated so far.
C: It's tangential, but I'll mention that we observed some e2e failures across a bunch of kOps tests, and we think it is actually because the aws-janitor is deleting resources over-eagerly. So if you see IAM roles in particular disappearing mid-test, hopefully that will get fixed, but yeah. It's not an AWS issue; it's just an issue that is happening with our test infrastructure on AWS.
A: Is this from the account that... I guess I know of one account where tests run, but I think there might be others, so do you know which account?
C: It certainly happens in the primary account, as it were, but I think the janitor runs against all the accounts. What happens is basically, once you're running enough tests, or running tests often enough, a resource exists on every run of the janitor.
C: It doesn't for some resources, which don't have a unique ID. Instances have a unique numeric or hex ID, maybe, I don't know, but they have a unique ID. IAM roles do not; they just have the name, and so the janitor can't tell them apart, and if we're unlucky and the janitor sees those every hour, it will treat it as a resource that's been there for a long time, and it will...
C: It will delete it, assuming that it's no longer... assuming that it's past the threshold. So it's unlucky, but it is very surprising when it happens. How?
C: Well, the janitor runs every hour, and it seems to clean up something every hour. If you look at the kOps test grid, it seems to vary based on luck of scheduling, as it were: I think it's now hitting the 1.23 tests pretty hard, but it was previously hitting the 1.22 tests pretty hard. It would happen to about half the tests. But the janitor is running every hour, I think currently at 45 minutes past the hour.
C: Yeah, sorry, oh sorry, it's not... it's called the aws-janitor, but it isn't that AWS wrote it. It used to be part of kubernetes/test-infra, and it has now moved to kubernetes-sigs/boskos.
C: Sorry, it runs every hour and looks at all the resources, and it's supposed to only delete them after a TTL, which I think is four hours or something of that nature. I don't actually remember the exact value, but it certainly isn't every hour. It's only that it got confused by us happening to recreate the same resources every hour, effectively.
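A minimal Go sketch of the failure mode being described, assuming a janitor that keys resources by name and tracks when it first saw each one (the resource name and TTL here are illustrative; the real aws-janitor in kubernetes-sigs/boskos persists state and handles many resource types, so details differ):

```go
package main

import (
	"fmt"
	"time"
)

// firstSeen records when a resource name was first observed by the sweeper.
// Hypothetical sketch only: names without unique IDs (like IAM roles) all
// collapse onto the same map key.
var firstSeen = map[string]time.Time{}

// sweep deletes any resource whose name has been observed for longer than ttl.
// If a test recreates a role with the same name before every run, the entry
// never leaves the map, so the brand-new role eventually looks older than the
// TTL and gets deleted mid-test.
func sweep(existing []string, ttl time.Duration, now time.Time) (deleted []string) {
	live := map[string]bool{}
	for _, name := range existing {
		live[name] = true
		if _, ok := firstSeen[name]; !ok {
			firstSeen[name] = now
		}
		if now.Sub(firstSeen[name]) > ttl {
			deleted = append(deleted, name)
			delete(firstSeen, name)
		}
	}
	// Forget resources that no longer exist.
	for name := range firstSeen {
		if !live[name] {
			delete(firstSeen, name)
		}
	}
	return deleted
}

func main() {
	start := time.Now()
	ttl := 4 * time.Hour
	// Simulate a role (hypothetical name) recreated before every hourly sweep.
	for hour := 0; hour <= 5; hour++ {
		now := start.Add(time.Duration(hour) * time.Hour)
		gone := sweep([]string{"e2e-node-role"}, ttl, now)
		fmt.Printf("hour %d: deleted %v\n", hour, gone)
	}
}
```

In this sketch the role survives the first four sweeps and is deleted on the fifth, even though each sweep actually saw a freshly recreated role.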
D: There are some instances where the cleanup doesn't happen properly. I do see that, maybe not for the kOps project, but for some other projects, and then I kind of periodically look at them, but I usually wait two hours, or at least four hours, before I delete the stack. But yeah, I do that sometimes, but not that aggressively.
C: I'll put a link in chat to the project itself. Yeah, so this is... well, I'll put it in chat and then I'll also paste it into the notes.
C: This is the project, and it is supposed to be the thing that automatically cleans up, because the challenge is that even if kOps, which we all know is absolutely perfect and never has a bug... even if there were no bugs in kOps, the e2e test runs can be interrupted, and cleanups aren't guaranteed to run. So even with a perfect kOps, resources are still going to leak.
D: Correct. Does it use CloudFormation templates or anything? Because it would be easier to clean up, right? You delete the CFN stack and the resources go away. Just thinking aloud here.
C: Right, kOps does not use CloudFormation, or doesn't by default use CloudFormation templates, and yeah, I just don't think we can guarantee for everything in the ecosystem that it does that. I didn't know whether it would clean up CloudFormation templates, but I was looking, and it looks like the janitor will clean up CloudFormation stacks. So I guess someone is using CloudFormation stacks and has added, effectively, garbage collection for them.
A: Yeah, thank you for adding that, and I've totally dropped this one. I've been meaning to review, but I've been on call, so...
B: Yeah, sure, give me one second real quick; it's been like a few days since I worked on this.
B: So yeah, the issue is that when the cloud provider makes an EC2 DescribeInstances call to check whether an instance exists, we pass in the credentials, which are assume-role credentials, and if we fail to assume the role, the cloud provider will keep trying, for every node, to make a call to assume those credentials. So we want to slow down the calls to STS when AssumeRole is failing or is not able to assume the role, whether the role is broken or there is some other issue. We don't want to slow down the DescribeInstances call itself, because we want to process the nodes as fast as possible, but when the DescribeInstances call happens it needs credentials, and for the credentials it needs to make an AssumeRole call. Once that call succeeds, the credentials are cached for 15 minutes, so that's the happy path; but if we fail to assume the role, then the loop just runs over and over again and calls STS multiple times.
A: And so why was it... why was there such a difference between the... like, why did it wait until... okay, I see. Yeah, so what do you cache on the failure, then?
B: Say it again, what do we cache on AssumeRole?
B: The AssumeRole call is what's failing, okay, but if that call succeeds and we get the credentials from STS, then in our credentials chain we already cache them, so that the next time DescribeInstances needs to make a call it will use those cached credentials for 15 minutes. So we don't see it that often, but when AssumeRole fails, DescribeInstances keeps calling AssumeRole, AssumeRole, AssumeRole, and we don't want to do that. So that's why we are trying to slow down the calls to STS.
A: Okay, and so if AssumeRole fails, what is the new behavior? Like, how do we get around it?
B: So if we have seen that the last call was made within one second, it will just return the last value, or the last error, that it has seen; and if you call it again after one second, then it will call the underlying provider that we pass to the credentials today, get the result back, and return that result, whatever it gets from there. So basically we're just adding a layer that caches for one second.
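A minimal Go sketch of that one-second result cache, assuming the change wraps the credential provider's retrieve path; the names and types here are illustrative, not the actual cloud-provider-aws code:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// provider mirrors the shape of an AWS credentials provider: it returns a
// credential value or an error. (Hypothetical local interface; the real
// change wraps the SDK's credentials provider, whose exact types differ.)
type provider interface {
	Retrieve() (string, error)
}

// cachingProvider wraps another provider and remembers the last result,
// success or failure, for a short window. Repeated callers inside that
// window get the cached result instead of triggering another STS AssumeRole call.
type cachingProvider struct {
	mu       sync.Mutex
	inner    provider
	window   time.Duration
	lastTime time.Time
	lastVal  string
	lastErr  error
}

func (c *cachingProvider) Retrieve() (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.lastTime.IsZero() && time.Since(c.lastTime) < c.window {
		// Serve the cached value or the cached error.
		return c.lastVal, c.lastErr
	}
	c.lastVal, c.lastErr = c.inner.Retrieve()
	c.lastTime = time.Now()
	return c.lastVal, c.lastErr
}

// failingAssumeRole simulates an AssumeRole call that always fails,
// e.g. because the role is broken.
type failingAssumeRole struct{ calls int }

func (f *failingAssumeRole) Retrieve() (string, error) {
	f.calls++
	return "", errors.New("AssumeRole failed")
}

func main() {
	inner := &failingAssumeRole{}
	p := &cachingProvider{inner: inner, window: time.Second}
	// Simulate DescribeInstances being retried for many nodes in a tight loop:
	// only the first lookup inside the one-second window reaches "STS".
	for i := 0; i < 100; i++ {
		_, _ = p.Retrieve()
	}
	fmt.Printf("100 credential lookups, %d calls to the underlying provider\n", inner.calls)
}
```

With a wrapper like this, a burst of per-node DescribeInstances retries inside the one-second window results in a single STS call instead of one per node, while the successful path still relies on the existing 15-minute credential cache.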
B: So, makes sense. I added a CloudWatch graph to show the difference between before this fix and after this fix.