From YouTube: Kubernetes - AWS Provider - Meeting 20210402
Description
Recording of the AWS Provider subproject meeting held on 20210402
Discussed cloud-provider controller race - service gets deleted and there is a node sync
A
Okay, hello, everybody. It is April 2nd. Welcome to the Provider AWS meeting. It looks like there is an agenda item today. Actually, Keisha, you put that in on March 19th; I'm just gonna copy it up.
B
Yeah, so what I observed was that in the service controller there are two independent threads. One is the node sync loop, and the other is the work queue, which runs whenever a service gets updated or deleted. What it turns out is that the load balancer functions are invoked in a certain sequence when a service is getting deleted and, at the same time, there is a node sync event.
B
We see that both of them end up calling some of the common functions in there, update instance security groups, for example, and that is causing a race condition, with some resource leakage and other undesired effects. We looked at the code.
B
We could put a lock in those functions, but a lock isn't gonna be very helpful because they would get invoked anyway; even if serialized, the update code could still potentially get invoked.
B
Okay, let me take that back. I mean, locking is not very feasible right now because we have delays and all sorts of things there. So what we're looking at is whether, from the service controller, we could serialize all of these operations by putting them in a single queue, rather than handling them independently. So that's what I wanted to discuss further.
A
Whoops. Got it. So you're talking about the work queues for node events and service events, basically.
B
Correct. What we do, even for the node events, is look up the services that are impacted and then reconcile them. So if we could combine them into a single queue, then we get the benefit of the synchronization that the work queue offers, and potentially minimize the race condition here.
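To make the single-queue idea concrete, here is a minimal sketch in Go. It is not the service controller's code: keyedQueue, runDemo, and the service keys are hypothetical names, and the queue only imitates the kind of per-key deduplication and exclusivity that a controller work queue is expected to provide. The assumption is that both service events and node-sync events are reduced to the affected service's key before being enqueued.

```go
package main

import (
	"fmt"
	"sync"
)

// keyedQueue is a hypothetical, simplified deduplicating queue: a key queued
// twice is handled once, and a key is never handed to two workers at once.
type keyedQueue struct {
	mu         sync.Mutex
	order      []string        // keys waiting to be processed, FIFO
	dirty      map[string]bool // keys that need (re)processing
	processing map[string]bool // keys currently held by a worker
}

func newKeyedQueue() *keyedQueue {
	return &keyedQueue{dirty: map[string]bool{}, processing: map[string]bool{}}
}

// Add enqueues a service key; duplicates collapse into one pending entry.
func (q *keyedQueue) Add(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.dirty[key] {
		return
	}
	q.dirty[key] = true
	if !q.processing[key] { // if the key is in flight, Done requeues it
		q.order = append(q.order, key)
	}
}

// Get returns the next key, or ok=false when nothing is waiting.
func (q *keyedQueue) Get() (key string, ok bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if len(q.order) == 0 {
		return "", false
	}
	key, q.order = q.order[0], q.order[1:]
	delete(q.dirty, key)
	q.processing[key] = true
	return key, true
}

// Done releases a key and requeues it if it was re-added while in flight.
func (q *keyedQueue) Done(key string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	delete(q.processing, key)
	if q.dirty[key] {
		q.order = append(q.order, key)
	}
}

// runDemo feeds service events and node-sync events for the same services
// through one queue and returns the most workers ever seen on a single key.
func runDemo() int {
	q := newKeyedQueue()
	q.Add("default/svc-a") // service delete event
	q.Add("default/svc-a") // node sync touching the same service: deduplicated
	q.Add("default/svc-b")
	var mu sync.Mutex
	active, maxPerKey, resyncs := map[string]int{}, 0, 0
	var wg sync.WaitGroup
	for w := 0; w < 4; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key, ok := q.Get(); ok; key, ok = q.Get() {
				mu.Lock()
				active[key]++
				if active[key] > maxPerKey {
					maxPerKey = active[key]
				}
				again := resyncs < 100
				resyncs++
				mu.Unlock()
				if again {
					q.Add(key) // a node sync arrives while the key is in flight
				}
				mu.Lock()
				active[key]--
				mu.Unlock()
				q.Done(key)
			}
		}()
	}
	wg.Wait()
	return maxPerKey
}

func main() {
	fmt.Println("max concurrent workers per service key:", runDemo())
}
```

Even with four workers and events arriving while a key is in flight, no service key is ever handled by two workers at the same time, which is the property the delete and update paths would need.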
A
Yeah, that makes sense. Is it worth taking a look at the code right now and just...
B
So I saw it on 1.15, but I looked at the recent code and those conditions haven't changed, so it would still be possible for it to get triggered. Yeah, I mean, it could be specific to AWS, but the way that things are, we should do something in the service controller, which would be beneficial for everybody.
B
But this definitely happens on a big cluster, say hundreds of nodes and some 200 services, when there are constant node change events, with scaling up and down happening. So...
A
Okay, controller, node...
B
Yes, so in this one, if you look towards the end, it will call the updateLoadBalancerHosts function.
B
So it will call, somewhere, the locked update...
B
Yes, lockedUpdateLoadBalancerHosts, and then this will call UpdateLoadBalancer, so that goes into the AWS code. For us, now, UpdateLoadBalancer will do its thing, right? It will go look up the service and the nodes, and then it will update the security group and all of that stuff in there, right? And it's not protected by anything right now, this update or any of the AWS functions.
B
So what I'm saying is, it depends on how they are invoked, but what I am seeing is that processLoadBalancerDelete, the EnsureLoadBalancerDeleted function, goes ahead and modifies the security group rules. It deletes some entries from the SG rules, and then the update one gets invoked a little bit later and goes and adds the rule back again, because they just run concurrently. So that is the reason why: they're not synchronized.
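The interleaving described here can be replayed deterministically. The following Go sketch is an illustration only: securityGroup, the rule name, and both functions are hypothetical, and the real provider code calls the EC2 API rather than editing a map. It assumes the update path decides which rules to write before the delete path runs, and applies them afterwards.

```go
package main

import "fmt"

// securityGroup is a hypothetical in-memory stand-in for a cloud security
// group; the real provider code talks to the EC2 API instead.
type securityGroup struct{ rules map[string]bool }

// racyInterleaving replays the sequence described above: the node-sync update
// decides what to write while the service still exists, the delete path then
// removes the rule, and the update finishes afterwards with stale input.
func racyInterleaving() (leaked bool) {
	sg := securityGroup{rules: map[string]bool{"lb-a->nodes": true}}
	serviceExists := true

	// Step 1: node sync starts and snapshots the rules it wants to ensure.
	var desired []string
	if serviceExists {
		desired = append(desired, "lb-a->nodes")
	}
	// Step 2: the service is deleted and the delete path removes its rule.
	serviceExists = false
	delete(sg.rules, "lb-a->nodes")
	// Step 3: the update completes a little later and re-adds the rule.
	for _, r := range desired {
		sg.rules[r] = true
	}
	return sg.rules["lb-a->nodes"]
}

// serializedInterleaving runs the same two operations one after the other,
// so the update sees that the service is already gone and writes nothing.
func serializedInterleaving() (leaked bool) {
	sg := securityGroup{rules: map[string]bool{"lb-a->nodes": true}}
	serviceExists := true

	// Delete runs to completion first.
	serviceExists = false
	delete(sg.rules, "lb-a->nodes")
	// The update then computes its desired rules from the current state.
	var desired []string
	if serviceExists {
		desired = append(desired, "lb-a->nodes")
	}
	for _, r := range desired {
		sg.rules[r] = true
	}
	return sg.rules["lb-a->nodes"]
}

func main() {
	fmt.Println("concurrent:", racyInterleaving(), "serialized:", serializedInterleaving())
}
```

In the concurrent ordering the rule survives the deletion of the service, which is the kind of leaked resource mentioned earlier; in the serialized ordering it does not.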
B
Not yet. I wanted to discuss it further before creating it. Just wanna...
A
My recommendation is definitely to move forward with making the issue and proposing the solution in the issue. Definitely do that before, you know, starting...
A
Because maybe we're not understanding something, but yeah, it seems pretty straightforward to me.
B
Okay, so the node sync is usually every 100 seconds or so, and the problem may not even be seen if the number of nodes is small or updates are not frequent. It just has to have the right set of trigger conditions. We didn't reproduce it, but we looked at the logs and then we analyzed the code, and then Young and I concluded that that's what would happen. So, okay, I will...
A
In SIG Network, yeah, that's a very good idea. Once you create the issue, since they are responsible for the service resource, you should definitely bring it up there. Okay.
B
About adding a lock to the AWS code: that's difficult, because if we look at the EnsureLoadBalancerDeleted function, that can take up to 10 minutes to complete; we have that timeout, right? So it would effectively slow down everything, and it may or may not work. There might still be some corner cases we haven't thought through that would not be handled there.
A
Yeah, I mean, there could be other reasons for having two distinct queues, you know, and not combining node and... and maybe it has to do with the fact that the node sync only happens every 100 seconds.
B
And how does it work for cloud provider v2? Is it still gonna be this loop, or is it gonna be slightly different? For the out-of-tree cloud provider that we're gonna have eventually, how are things gonna be different?
A
I think it was going to be this loop unless we needed to change it. So, you know, if for some reason...
A
So I was seeing this with the load balancer, like using custom load balancer names, the way...
A
Oh sorry, actually with node names. The way that node names work right now, it's just really difficult to use custom node names with the way that the cloud provider loops work. So I was considering the same thing in v2, replacing some of these loops, but yeah, I'm not sure yet. It really just depends on whether we need to or not.
A
So I think I'm gonna push the issue triage to the next meeting, when Justin's back. So I don't have anything else, if you guys don't.