From YouTube: SIG Cloud Provider 2022-08-31
Description
Agenda: https://docs.google.com/document/d/1OZE-ub-v6B8y-GuaWejL-vU_f9jsjBbrim4LtTfxssw/
[dmoiseev] Introduce a well-known tag to exclude subnets from the auto-discovery procedure for ELB-backed services
https://github.com/kubernetes/cloud-provider-aws/issues/442
[jyotimahapatra] As a cluster operator I don’t have a mechanism to shift leadership away from an impacted AZ. https://github.com/kubernetes/kubernetes/issues/111899
[bridgetkromhout] https://github.com/kubernetes/kubernetes/pull/108095 - looking for more feedback/replies so we can get this resolved
B
Okay, welcome everyone. Today is August 31st, 2022, and this is the SIG Cloud Provider community meeting. We follow the Kubernetes SIG community guidelines, which essentially means: raise your hand if you want to talk, and please treat each other as you would expect to be treated, or, explicitly, please be kind to each other.
B
I will share my screen to show the agenda here, and then we'll go through things. Okay.
B
All righty, so, let's see. Normally we go through triage at the beginning of these meetings, but as we are running in kind of a limited capacity today, I think we'll come back to triage at the end if we have time. We do have a couple of agenda items today, so I guess we'll go through the subproject updates to begin with, and it doesn't look like there are any recorded currently. Would anyone like to make a subproject update that isn't already on the agenda?
B
All right, I am not seeing any hands go up, so we will move on to the agenda items, then. The first agenda item is from Dennis Moiseev, who's a colleague of mine and was curious to talk about this topic here. So Dennis, why don't you take it away.
C
Yep. So here you can see the link to the GitHub issue, which I created a couple of weeks ago already, and that's a design discussion about possible labels to exclude load balancers from the auto-discovery logic on AWS. The issue we faced within Red Hat was basically about the new availability zones which AWS introduced, namely Wavelength Zones, far-edge zones, and so on. These zones have limited, not capacity, but limited capabilities in terms of which load balancers
C
we could use. And we faced quite a lot of issues, because the cloud controller manager tries to attach machines to load balancers which are not intended to be used in these zones. So here I just want to, I don't know, ask what people think about the design I proposed and how I can proceed with it further. Will I need to prepare some enhancement documents? Yeah, these kinds of things. So basically I want to have some feedback on the thing.
A
C
So that's about labeling subnets: not an entire zone, but the subnets which belong to some zones, Wavelength or far-edge or something like that, because we cannot attach them to the load balancers which we have, the Elastic Load Balancers, basically.
B
C
D
Is entirely in the Local and Wavelength Zones and not the regular zones, so.
C
That's for mixed clusters, specifically. So, for example, we have an existing cluster and we want to add subnets and virtual machines within, say, a Wavelength Zone. If we have the kubernetes.io/cluster/<cluster-id> tag on these entities, the subnets and the virtual machines, the CCM will try to attach them to the existing load balancers, which effectively breaks the cloud controller manager.
C
D
D
Think through it further to see how we can solve this use case. So the workaround is to use the subnet tag, where you specify the subnets that you want to attach the load balancer to, but I guess that's not feasible in this case, specifying subnets for each of the load balancers here.
D
Yeah, so let's take it further. You can assign it to myself and Nick, and then I will look into it further. If you have any proposal, feel free to bring it up and we can work on it. Yeah.
C
So basically I tried to describe a proposal which feels reasonable to me right now: if we could introduce a special tag for subnets to exclude these subnets from auto-discovery, that would work.
C
No, here I mean the auto-discovery logic which we have within the cloud controller manager, which looks at tags, so it's based on that. A subnet will be attached if it is marked with kubernetes.io/cluster/<cluster-id>. So if we could specify explicitly that we do not want to attach these concrete subnets to a load balancer, that would solve it, at least.
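The exclusion idea described in this part of the discussion could look roughly like the sketch below. It is not code from cloud-provider-aws: the `Subnet` type, the `discover_subnets` function, and the `example.com/exclude-from-lb-discovery` tag key are all hypothetical names chosen for illustration. Only the `kubernetes.io/cluster/<cluster-id>` tag comes from the discussion itself.

```python
from dataclasses import dataclass, field

# Hypothetical tag key, used only for illustration. The real name was
# still an open design question in the linked GitHub issue.
EXCLUDE_TAG = "example.com/exclude-from-lb-discovery"


@dataclass
class Subnet:
    subnet_id: str
    zone: str
    tags: dict = field(default_factory=dict)


def discover_subnets(subnets, cluster_id):
    """Return subnets eligible for load balancer attachment.

    A subnet is eligible when it carries the cluster tag and does not
    carry the (hypothetical) exclusion tag.
    """
    cluster_tag = f"kubernetes.io/cluster/{cluster_id}"
    selected = []
    for subnet in subnets:
        if cluster_tag not in subnet.tags:
            continue  # not part of this cluster
        if EXCLUDE_TAG in subnet.tags:
            continue  # operator opted this subnet out of auto-discovery
        selected.append(subnet)
    return selected
```

With this shape, an operator adding a Wavelength Zone subnet to a mixed cluster would tag it once, instead of pinning subnets on every load balancer individually.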
D
D
Okay, got it. Yeah, sure, assign it to us, and then, if you have any changes or any PR, feel free to bring it up. We can discuss further on a PR as well. Okay.
B
All right, awesome. Kishore, what are yours and Nick's GitHub IDs here, just so I don't mess this up?
D
Mine is Nick.
A
B
I can remember what his icon looks like, but I don't know.
D
Put that. So this is in cloud-provider-aws, like a separate one. Do we still go ahead with that?
B
You guys mark it up, then, sure. All right, awesome, thank you. All right, Dennis, did you get everything you needed there?
B
Yeah, and I'll sync up with Nick and Kishore in the issue and whatnot, then. Yeah.
D
B
B
All right, awesome. Thank you very much. Now, tell me if I say this right: Jyoti, I think you've got the next one. Yeah.
F
Hi guys, this is my first time in this meeting, so hi. My name is Jyoti, and I work with Nick and Kishore and all the other AWS guys there. As a cluster operator, I manage a lot of clusters and look at issues that arise from them and with cloud providers. One of the given things is that an AZ outage could happen, so I have thoughts about that. When these things happen, most times these are Byzantine problems.
F
We can't handle them well. One of the things that happened last time was that the CCM controller was able to establish the lease because etcd connectivity was not broken, but the leader which held the lease was not able to connect to the internet. So even though there was a leader for the CCM, it was not very useful. The CCM makes so many calls, and there are heuristics to say how to kill the CCM process
F
if I see these patterns of failure, because, let's say, STS or IAM fails, or ELB calls fail, or just DNS fails: anything could happen. My proposal here was that we make something where a cluster operator can create a CRD or something of that sort, and the leader election mechanism looks at that and moves away, kind of. But apart from the proposal, I wanted to just hear others out, because other platforms are here.
F
How do they think about this problem of zonal outages and partitioned nodes, where the lease is able to get established but the thing cannot really work? It's really not only for the CCM, certainly, but the CCM is one component where it shows up. When I went to API Machinery, they said that you could implement a livez, and the livez could fail. That's great, but for the CCM I don't know how to implement a good livez that could fail on zonal outages. So yeah, that's the context. I can answer more questions.
E
F
G
F
So the trouble is the bad instances are so network-partitioned that I cannot go into them to do anything. I know that you have to use SSH or some other mechanism to get into the instance to take some action, but I cannot do that. But I know that etcd lease connectivity is good, so I'm leaning on the fact that connectivity to etcd is established, so I can apply that from anywhere through a public endpoint of the cluster.
F
We have like two or three etcd, sorry, two or three master instances. I can apply the CRD reliably through any master, and because the bad master can still talk to etcd, it will know: well, I have to relinquish. This is only relevant when etcd connectivity is present. If etcd connectivity is not present, it's a moot point, because the lease is broken and things are good. This is only when a master can still talk to etcd but cannot connect to anyone else.
F
Yeah, because the master and etcd are in the same VPC. Based on the setup, they are not partitioned.
A
F
So I guess none of us has seen any of these problems.
B
If there's a zonal outage, and you have a control plane node that has etcd on it, obviously it can communicate with itself, but would it lose quorum with the other etcd members? Then, if you had a CRD that was created in one place, how could you ensure that it got propagated to the rest of the cluster, especially if your control plane is spread out across zones or something like that and you had a zonal outage?
G
We already know that the leader lease is held by the last remaining instance, so it kind of assumes that the etcd connectivity is still there, because if there were no etcd connectivity, the last leader would lose the lease: the lease on the instance that has it would be released, and leadership would jump to some other instance. So it kind of assumes that we have this etcd connectivity, yeah.
F
So etcd is not actually on the bad node. etcd actually has three different VMs set up, the master has two different VMs set up, and they're on the same VPC, so the partition has not affected them. But getting into the node, or any function on the node that connects out to the internet, is not working, let's say. So yeah, I faced probably five or six such customer issues where these things happen. If you operate some thousands of clusters, this is an edge case, and even 1.1 percent would be probably in hundreds.
F
So that's where I see this problem, and I'm just trying to socialize this idea: do you see it as a valid problem, and has anyone seen this? So that's the intent. I'm not looking for answers right away, though.
G
F
Yeah, that doesn't work. Yeah, we could probably call it lease stealing, but with the default lease parameters, every two seconds it tries to take the lease for the next 15 seconds. So it's not reliable to say that, yeah, I could steal the lease for a bit, because the bad node could still take the lease over again. It's not a reliable way.
F
So the lease is active for 15 seconds, so even if I steal it, it's like a surgery, it's not automated. Well, so yeah.
G
B
So, back to your previous question, Jyoti: you might actually be the world expert on this topic if no one else is hitting it. Or perhaps it might be worth reaching out on the mailing list also, to see if others have experienced this, just to cast a wider net. Yeah.
F
Okay, so in the short term, the way I'm thinking is I could still implement a livez which I control, and it can look at this object. So, instead of the leader election looking at the object and doing its stuff, I can still, in the short term, look at it from the livez. But I have to implement like 10 livez checks: CCM, cert controller, KCM, scheduler, and the list goes on, and it's not extensible to any component that's out there, right? So I could still go.
B
I see you have your hand up. Do you have a question for Jyoti, or just in general?
A
Yeah, I had a question for Jyoti about the leases. I probably should wait till he comes back, but my understanding is that each of these components is using the default kube lease algorithm, and so they need access to a working API server to be able to renew their lease. If there's a failure in their networking, then they won't be able to talk to the API, and so they won't be able to renew the lease, at which point they should give up and normally panic.
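The renewal behaviour described here can be sketched as a small loop. This is an illustrative model, not the actual Kubernetes leader election code: `run_leader`, `renew`, `now`, and `sleep` are hypothetical names, the 2-second retry period echoes the number mentioned earlier in the discussion, and the 10-second renew deadline is an assumed value for the example.

```python
def run_leader(renew, now, sleep, renew_deadline=10.0, retry_period=2.0):
    """Hold leadership until renewals have failed for renew_deadline seconds.

    renew() returns True when the lease was renewed through the API
    server, and False when the API server could not be reached.
    """
    last_success = now()
    while True:
        if renew():
            last_success = now()
        elif now() - last_success >= renew_deadline:
            # Lease could not be renewed in time: give up leadership.
            return "stepped down"
        sleep(retry_period)
```

Note that this loop only reacts to failed renewals. In the scenario raised earlier in the meeting, the API server and etcd stay reachable while cloud API calls fail, so `renew()` keeps succeeding and the leader never steps down on its own.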
G
B
And just to back up a little bit, Jyoti: I think before you dropped out, you were talking about the current workaround, and if I understand that clearly, you're creating liveness probes for all these things. And I think we missed the end of your statement there, so maybe you could just kind of finish your thought. Yeah.
F
So I'm going to implement livez checks for many components: CCM, cert controller, scheduler, and many other things that I am interested in as an operator. What I think is, it would be good if, as an operator, I could set a well-defined label or pattern, or create a CRD, and the lease object of all leader-election-based components could benefit from that. As a cloud operator, I know that for the next one hour this zone is bad.
F
So anybody, any team, who is using the cluster: I'm not in direct contact with them, I just created a cluster for them. They could know that there is a CRD; they don't have to do that themselves. The leader election controller knows about that and turns away, saying: I should not be the leader, I'm in the bad zone, and somebody else should take over. It relinquishes control and never takes it back. There could be a TTL as a safety, so that this doesn't happen for more than 15 minutes.
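A minimal sketch of the check this proposal implies is shown below. Nothing here is an existing Kubernetes API: the `ZoneEvacuation` record (standing in for the proposed CRD), its fields, and `may_hold_leadership` are hypothetical, and the 15-minute cap simply mirrors the TTL safety mentioned in the discussion.

```python
from dataclasses import dataclass

# Safety cap taken from the "15 minutes" mentioned in the discussion.
MAX_TTL_SECONDS = 15 * 60


@dataclass
class ZoneEvacuation:
    """Hypothetical operator-created record marking a zone as bad."""
    zone: str
    created_at: float   # seconds since epoch
    ttl_seconds: float  # requested duration, capped by MAX_TTL_SECONDS


def may_hold_leadership(my_zone, evacuations, now):
    """Return False while an unexpired evacuation covers my_zone."""
    for evacuation in evacuations:
        ttl = min(evacuation.ttl_seconds, MAX_TTL_SECONDS)
        if evacuation.zone == my_zone and now < evacuation.created_at + ttl:
            return False
    return True
```

A leader election loop would consult such a check before acquiring or renewing its lease. The cap means a forgotten record cannot keep a zone out of leadership indefinitely.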
B
Okay, so that sounds very actionable to me, and I'm guessing that might be in the discussion on this issue here. But, you know, the notion of being able to say: okay, I'm going to apply this label that describes that there's a zonal outage, and so this component should release its leadership and another one should take it. Am I hearing that correctly? Yeah.
F
Okay, kind of that. And a third-party controller, which is not an AWS vendor thing, could benefit from that and move away to a different component, right? Controllers in Azure, GCP: anybody writing a controller could use this mechanism to move leadership away. And zones are a thing in all cloud providers.
B
Yeah, and that makes sense to me. I'm not really sure what the next step here would be. Are you looking for kind of more discussion? Yeah.
F
I'm looking for people I could brainstorm with: people in SRE orgs in your organizations I could contact. I could reach out to you next, I mean people on this call, to see who I could talk to. People who are SREs might be dealing with customers, and they might be seeing one-off issues like that, and I'm
F
looking for contacts to see how much of an edge case this is. If this is really an edge case, I mean so nuanced that we don't have to take it on, I'm all okay with that. But I do want to talk to more people, so I could, after this call, look at the attendee list and reach out to people to see if they could put me in touch with someone who could be interested in talking about it.
B
Yeah, so it sounds like right now this is kind of information gathering, and you're trying to reach out and meet other people. Unfortunately, although this meeting is quite large today and we have a lot of people, usually this meeting is kind of small. I would definitely recommend reaching out on the Kubernetes developer list to see if others have run into this, and just start to expand the net of people you're looking to get in touch with, because, I mean, you're probably right.
B
F
B
Thank you, Jyoti. That was a great topic. All right, next we've got Bridget, who looks like she's looking for some feedback here. So why don't you
E
Take it away. Yeah, this one might be sort of short, because Nick had actually had a chance to look at this and put some comments on it. Then we had kind of a slow turnaround, and it's kind of a meta question of
E
Do we have, yeah, if you scroll down to the bottom, you'll see that a colleague of mine wrote back to Nick, and then it's been a couple of weeks. This is one we were trying to get in, and it's okay that it didn't make it in this time, but I'm sort of wondering, of course, when and if Nick has a chance, whether he can look at it again.
E
E
B
A
B
E
The bug is that if the node IP changes, routes won't be updated. It'll just kind of sit there being like, what?
E
B
E
I kind of just wanted to see if anyone had thoughts about it, and then also maybe bring up the topic of: I hate to have all of this be on a couple of people's plates. If we can get more people who want to review this kind of PR, I'm not sure exactly what the plan is for moving forward on that, but maybe we need to try to make that happen. Yeah.
B
B
Right, totally. My impression is that some of the difficulty here is just finding people who are working with the CCMs at a low enough level that they're comfortable reviewing these things. I'm not sure how we expand that net. That's kind of a bigger question for me.
B
F
I'm not a reviewer. I could take a look and help, but I'm not sure with this one.
E
I mean, this asks you to solve something. This is so much like, yes, exactly what Jay is saying in the chat. Please go ahead and unmute and mention your thoughts there, because I feel like this is where we need to give people a chance to start becoming reviewers, so it doesn't get stuck on just a couple of people. Yeah, when Nick and Walter are here.
B
Yeah, because I think, oh, go ahead. Sorry, sorry.
E
We need to make sure, and this is something I've seen in other SIGs too, that people who want to start being contributors have that chance to level up and become, you know, forces to be reckoned with in this community. Especially if you have deep understanding of one specific area and you want to start applying that to more, I'm just kind of looking at the folks who are very informed coming to this call and thinking you could be the reviewers.
B
Yeah, plus one to that idea. I think everyone who is here representing some cloud or some large provider or something like that: ask your colleagues, ask your friends internally who are also working on these projects. This is a great opportunity for junior developers also to get involved.
B
So perhaps you know some people who are looking to get more involved in open source and who work with you on related topics internally. Certainly, if you could direct them this way, we would be happy to reach out. And, as Jay is kind of suggesting here, maybe a shadow program, or maybe something where we pair up and work on some bugs together, or just reviews together, I shouldn't say bugs. That would be really nice, so yeah.
B
E
Awesome. And then, yeah, if anyone wants to take a look at that long-standing but hopefully almost resolved bug that I have a colleague working on there, we'd be thrilled to have even your feedback on the stuff that we are trying to fix here. Especially if you have fresh eyes and you look at it and you think, I don't know if this does fix it because of this.
B
And I'll just mention this because Jay, and please correct me if I'm saying your name wrongly, is it Jai or Yai, or Jay's fine? Jay, okay, I
G
B
was saying it totally wrong. So Jay is saying in chat here that the SIG Release team has a really good shadow program, so that might be another place for us to look at and maybe get some advice. Thank you, Jay.
B
Okay, any other comments on this topic, or things that we would like to bring up in general?
D
A quick question for Dennis: are you on Kubernetes Slack?
C
D
Okay, I was just trying to look up the ID that you use on GitHub. So, okay, let me look you up using this ID.
A
B
Right, cool. Anything else? I think we're gonna skip triage today, just because I don't necessarily feel comfortable triaging these things. I'd prefer if Nick or Walter were here.
B
E
Yeah, that's familiar. Does it still have needs-triage?
B
We were asking about whether there was a KEP for this, and there is a KEP. Okay, so I guess once it's reviewed, does that mean it should be accepted, then?
B
E
A
B
E
Yeah, there's a label that gets applied. You can see it in the list of commands. Sorry, if you click on that "I understand the commands that are listed here" link and search in there for the word triage, you'll get the correct one.
A
B
All right, okay, so I guess that's it. We'll leave the other one for when Nick and Walter are around. Okay, anything else? I'm gonna stop sharing here. Anything else, or should we take back some time here?