From YouTube: Kubernetes SIG network meeting 2020-01-09
Okay, great. So I just want to bring some attention to a KEP that we have. It's pretty simple: add an app protocol to Services and Endpoints. The idea here is just to make it a little bit easier to work with load balancers, and I'm sure there are other use cases as well. There have been a few issues related to this for years now, and the suggestion was: well, we could just add a protocol to Service and Endpoint ports. This has already gotten a decent amount of discussion, and I wrote the comments specifically saying you should, if you can, use IANA service names, which are called out in an RFC, but you can use a prefix to define your own. So you can have company.com/my-special-thing, or you can say HTTP, which is the IANA-specified protocol name; and actually, I think IANA specifies HTTPS, TLS, and maybe a couple of other names along the same lines.
This has been, shall we say, "solved" with annotations that are frustrating to work with and not handled consistently across clouds. I think all three of the major U.S. clouds have different annotations that mean the same thing. Yeah, yeah, so there is currently no standard way to do the same thing across all of this. Sorry, if somebody was just talking, I couldn't hear you; you're really quiet. Sorry.
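A minimal sketch of what that proposal describes, using the client-go types as the field eventually shipped (appProtocol on ServicePort); at the time of this meeting it was still a KEP, so treat the names here as the later API rather than anything final in the discussion:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// "https" is an IANA-registered service name; the prefixed form lets a
	// vendor define its own protocol name, as described in the discussion.
	https := "https"
	custom := "company.com/my-special-thing"

	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "demo"},
		Spec: corev1.ServiceSpec{
			Ports: []corev1.ServicePort{
				{Name: "web", Port: 443, Protocol: corev1.ProtocolTCP, AppProtocol: &https},
				{Name: "special", Port: 9443, Protocol: corev1.ProtocolTCP, AppProtocol: &custom},
			},
		},
	}
	fmt.Printf("%s exposes %d ports with app protocols\n", svc.Name, len(svc.Spec.Ports))
}
```

The point is to replace the per-cloud annotations mentioned above with one field that every load-balancer integration can read.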
Can you hear me? Yeah. Yeah, so I brought in this KEP about mixed-protocol support in the Service definition. Maybe you remember, there was an issue in December saying that it should be a KEP, and that was a kind of heads-up. But now there is a first draft, and I tried to investigate the different cloud providers and the different cloud load balancers from a pricing perspective.
What would it mean if an additional protocol is implemented, or added behind the same IP address, or behind the same load balancer instance? So that is now in the KEP. And also, if there is a need for some optional control, there are some proposals in the KEP for that: either as an annotation, or similar to the current MetalLB implementation, or a GCE workaround.
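A minimal sketch of the kind of Service the KEP would enable: TCP and UDP ports behind the same LoadBalancer IP. At the time of this meeting, API validation still rejected mixing protocols on a type=LoadBalancer Service, so this is illustrative only:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// DNS is the classic case: the same port served over both UDP and TCP,
	// ideally behind a single cloud load-balancer IP.
	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "dns"},
		Spec: corev1.ServiceSpec{
			Type: corev1.ServiceTypeLoadBalancer,
			Ports: []corev1.ServicePort{
				{Name: "dns-udp", Port: 53, Protocol: corev1.ProtocolUDP},
				{Name: "dns-tcp", Port: 53, Protocol: corev1.ProtocolTCP},
			},
		},
	}
	fmt.Println(svc.Name, "requests", len(svc.Spec.Ports), "ports on one LB IP")
}
```

The pricing question above matters because some clouds bill per forwarding rule or per listener, so a second protocol behind the same IP is not always free.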
We're back in the room? Okay, I've got a couple quick ones. The first one is with EndpointSlice: we realized that we need to split it into two different feature gates to do a seamless upgrade. The rationale right now is that you can't guarantee, during a cluster upgrade process, that all EndpointSlices are going to be created by the time they would be consumed by kube-proxy. I imagine in 99% of use cases...
Yeah, I mean, it's... unfortunately, the EndpointSlice controller is limited by whatever the API rate limit is, and we have no idea what else might be consuming that rate limit. So you can't assume that the EndpointSlice controller is going to get all 20 QPS; it could be any number of things. So in a really big cluster, with a lot of EndpointSlices, it's potentially a problem, and the way the kube-proxy support is written, it's one or the other. Just... yeah, yeah.
Moving on to the next one, then. This has been going on for a bit: it's clarifying Ingress v1, and specifically what we're doing for pathType there. So pathType is a new idea in Ingress. We have ImplementationSpecific, which is basically what we've already done, and that's backwards compatible; but we're also adding path types that are Prefix and Exact, so you can be a little bit more clear, for each path, how you want it to be matched, rather than deferring to whatever is implementing it. There's been a pretty long discussion around how we should default those, because this is adding a new attribute, but we also want it to be required at some point. So I just want to bring attention to that KEP: if you have any strong feelings on how we should approach pathType, definitely add a comment.
I think we've reached a consensus there before; he was actually involved in that KEP. But since this is a little larger audience, take a look. And if you're interested in Ingress v1 and the progress there, we're meeting every other week, just at the opposite time of this meeting; so a week from today it'll be Ingress v1, taking up the work on the Ingress.
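For reference, a minimal sketch of pathType as it eventually shipped in networking.k8s.io/v1; at the time of this meeting the KEP was still settling defaulting, so the spelling here follows the later API, and backendFor is a hypothetical helper:

```go
package main

import (
	"fmt"

	networkingv1 "k8s.io/api/networking/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// backendFor is a hypothetical helper that points a path at a Service port.
func backendFor(name string) networkingv1.IngressBackend {
	return networkingv1.IngressBackend{
		Service: &networkingv1.IngressServiceBackend{
			Name: name,
			Port: networkingv1.ServiceBackendPort{Number: 8080},
		},
	}
}

func main() {
	exact := networkingv1.PathTypeExact
	prefix := networkingv1.PathTypePrefix

	ing := networkingv1.Ingress{
		ObjectMeta: metav1.ObjectMeta{Name: "demo"},
		Spec: networkingv1.IngressSpec{
			Rules: []networkingv1.IngressRule{{
				Host: "example.com",
				IngressRuleValue: networkingv1.IngressRuleValue{
					HTTP: &networkingv1.HTTPIngressRuleValue{
						Paths: []networkingv1.HTTPIngressPath{
							// Exact: matches "/login" only, not "/login/reset".
							{Path: "/login", PathType: &exact, Backend: backendFor("auth")},
							// Prefix: matches "/api" and everything under it.
							{Path: "/api", PathType: &prefix, Backend: backendFor("api")},
						},
					},
				},
			}},
		},
	}
	fmt.Println(ing.Name)
}
```

ImplementationSpecific remains available for controllers that already defined their own matching, such as regex paths.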
So it's not me, but there's precedent for it, right? As soon as I say, like, if this is spec.backend, which isn't going to exist anymore, it says that it doesn't exist, which kind of makes sense, because we currently pass just the internal type into the Ingress spec validation. But we delete that, because we're... you're renaming it, right?
So you can write the code, in this case, that checks the defaultBackend field and then switches on the API version: for v1beta1 it spells it "backend", and for v1 it calls it "defaultBackend", and you fall back on that for the error message. It's really ugly that it has to be open-coded that way; there's an ongoing discussion about validating on real versioned types, but...
Right, so I'm saying you only switch in the error message that you produce. The logic is going to operate on the internal type, with the internal names, and then, at the last second, if you detected that it failed validation, you do an if clause to see which name to put in the error message. Oh, okay.
I understand. So I don't really need to validate the backend type; it's just, if this thing fires at all: if you're v1, say this; if you're v1beta1... yeah, kind of dirty, but that makes a lot more sense. Yeah, so I should be able to finish that up, you know, tomorrow morning. That's what I was working on today, but I got lost in the types thing. Okay.
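A minimal sketch of the pattern just described, with hypothetical helper names (the real code lives in the apiserver's Ingress validation; validation runs on the internal type, and only the error message switches on the version the client used):

```go
package validation

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

// defaultBackendFieldName is a hypothetical helper: v1beta1 spells the field
// "backend" while v1 renames it to "defaultBackend", so the error message has
// to be open-coded per version, as lamented above.
func defaultBackendFieldName(gv schema.GroupVersion) string {
	if gv.Version == "v1beta1" {
		return "backend"
	}
	return "defaultBackend"
}

// backendFieldError renders a validation failure using whichever field name
// the client actually sent.
func backendFieldError(gv schema.GroupVersion) error {
	return fmt.Errorf("spec.%s: Required value", defaultBackendFieldName(gv))
}
```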
Yeah, I put a comment there. I can submit it, and then people can basically reply, and if you're going to be there, we could coordinate. Yeah, let's get it in; no reason not to. Now, they said there may be a lack of rooms, so we should just submit it. Our sessions are always very well attended; let's put it in, and we'll figure out what to do with it. Okay.
Yeah, so maybe I missed something, but it seems to me that Kubernetes is pretty good; in fact, I really like the Kubernetes control plane. It's a distributed system built with a lot of attention to resilience to all sorts of crap that can go wrong. But there does seem to be one blind spot, or maybe I'm just missing something. You know, one of the failure modes that I see is if something goes wrong in the data plane that the CNI plugin establishes.
Nothing really detects that specifically, right? There are health checks on containers, that will restart containers; there are health checks on nodes. But nodes communicate over the host network, not the cluster network, right? So what I'm saying is, it seems like there's kind of an opportunity, or some kind of issue here, where nothing is detecting the fact that the cluster network specifically has problems, and reporting it, per se.
So there's two things that we're working on upstream in CNI on that. The first one was the CNI CHECK command, which got added to CNI sometime, I think, like mid last year, and I believe that there is now support in CRI-O for that command. That's still on a per-container basis, but it explicitly allows whatever CNI plugin is handling the networking there to check its control plane and return errors, at which point the pod will be terminated and then retried. So that's kind of the short-term fix.
That CHECK support needs to get into the dockershim part of Kubernetes, and that's kind of been on a to-do list for somebody on my team for a while, but we haven't quite gotten there yet for dockershim; we did add it to CRI-O. The second part is kind of a much longer-term CNI gRPC API, which would allow events from the plugin to go back to the kubelet asynchronously. But that's also kind of tied in with discussions, still at a pretty early stage, around what the overall networking API should look like for Kubernetes. So...
Part of that is that the CHECK command, for sandboxes specifically, is supposed to answer, essentially: does this container have network access? And if the control plane is not working for the plugin, then you could argue that, you know, that container may or may not actually have network access; it depends on the control plane itself. But it's also intended to do things like, you know: does this container still have an IP address?
I think that's a real challenge: depending on whatever plugin you are, how do I know that you have connectivity? My pod might not actually be doing anything with the network; it might not have any open sockets. It might be blocked by network policy on egress, so I can't go in and poke at it. You have to do it at the driver level, right? Or something on the host, to make sure you're getting something across that pipe.
It would be part of the pod sandbox status, and that would get called every time PodSandboxStatus gets called through CRI. The plugin can do as much or as little as it wants to do, and if it decides that, for some reason, the sandbox is unhealthy network-wise, then it would return an error, and then the expectation is that the kubelet would decide that that pod needs to be torn down and restarted.
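A hedged sketch of how a runtime might drive that CHECK through libcni (CNI spec 0.4.0 and later); the paths and IDs here are illustrative, and a real runtime such as CRI-O wires this into PodSandboxStatus as described:

```go
package main

import (
	"context"
	"log"

	"github.com/containernetworking/cni/libcni"
)

func main() {
	// Illustrative locations; the runtime supplies real ones.
	cni := libcni.NewCNIConfig([]string{"/opt/cni/bin"}, nil)

	list, err := libcni.LoadConfList("/etc/cni/net.d", "mynet")
	if err != nil {
		log.Fatal(err)
	}

	rt := &libcni.RuntimeConf{
		ContainerID: "example-sandbox-id",
		NetNS:       "/var/run/netns/example",
		IfName:      "eth0",
	}

	// CHECK asks the plugin to verify the sandbox still has the networking it
	// set up (interface, IP, routes, plugin control plane). An error signals
	// that the kubelet should tear the pod down and retry.
	if err := cni.CheckNetworkList(context.Background(), list, rt); err != nil {
		log.Printf("sandbox network unhealthy: %v", err)
	}
}
```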
Also, what do we want to get out of it? Like, if we do a service-level health check, what's the mitigation? The mitigations we have are restarting pods or, you know, draining nodes, I mean recreating them. Is there that much more we can check? What are the failures that we'd be checking with that service-level health check that we can't check in these other ways?
Right. I'm thinking of situations... I mean, I've seen two cases, right? One is, something decays on a node, and then all the pods on that node that are using cluster networking become unreachable. And then the other case is, something is wrong in the cluster, and everything is unreachable on cluster networking. Of course, in the latter one there's no recovery, but you might like some kind of report that says: hey, your cluster networking is totally hosed. And in the other case, you know, restarting, or really blacklisting, the pod...
We do have that check right now, and so you can sort of simulate this right now, as long as you use something like a DaemonSet, or something outside of the kubelet, to remove your CNI config file. And then, depending on what CNI driver you use, that may trigger the node to go unhealthy if the config file is gone. So if...
What if, hypothetically, we've lost a route in the cloud provider? So within the node, everything's happy: the gRPC CRI, or the CNI CHECK, whatever, is all running fine; even a local ARP or ping across a veth pair will work fine; all pod health checks will be successful. The kubelet, because it's using the host network, will be reporting to the API server that all the pods are happy and healthy, but the pods themselves actually can't get in or out of the node.
I think, in our style of doing things, right, we would want end-to-end behavior tests, not something that's tied to a particular implementation. I guess I also want to back up again to this interesting point of: what's the recovery, and what are we really looking for? I talked about problems in the node and in the cluster. So for the case of problems in the node, it suffices to actually have specific probe pods deployed for this, so that they're there.
Yes, so this is making me think of something that I always think of when I see the weird network policy rule that the node is always allowed to connect to every pod, so it can do health checks. That's the fact that we don't really specify anywhere whether health checks are supposed to be testing if the pod's server is working, or if the network is working to the pod. But maybe this just makes sense as another kind of health check, where we say explicitly that we want some component to periodically test.
I think, to integrate with a lot of higher-level load balancers, you need an externally reachable object, but we don't require it, so it's been challenging to try to figure out how we make that work without requiring a tie-in. I'm not against the idea of a service-level health check; like, we keep adding service stuff. But adding a service-level health check that defines, behaviorally, what it means for the pods behind the service... do I...
Yes, although I'm still thinking about the node case, you know. And maybe this is good, because it's more compositional, right? If we can test that the... well, yeah, I mean, you can imagine all sorts of things. But take the case of something decaying on a node: if we have something that can detect that, then of course that'll help anything that's running on that node, and also if you're testing the service.
Okay, yes, right, right. If it's not consistent throughout the node, then that's a little harder. That gets back to the problem of: how can we test every pod on the node, rather than a probe pod on the node? It would be every pod that's using cluster networking. And, I guess, back to ping: maybe we want to just be doing something with ping where network policy allows it, or maybe define network policy to allow it, or have some way of inspecting network policy to see where it's allowed.
I mean, maybe an incremental step towards proving this out would be to write that prober, but actually write it as an operator that creates a DaemonSet, runs a probe, tears down the DaemonSet, and then periodically recreates it, so that you're forcing it to allocate and reallocate IP addresses. That way you bypass some of these "it used to work, but now it doesn't" problems. It would have to be rate-limited to avoid thrashing... yes, yes.
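A minimal sketch of that operator idea, assuming client-go, in-cluster credentials, and a hypothetical probe image; a real version would also watch probe results and report node health rather than just cycling the DaemonSet:

```go
package main

import (
	"context"
	"log"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatal(err)
	}
	cs := kubernetes.NewForConfigOrDie(cfg)

	labels := map[string]string{"app": "net-prober"}
	ds := &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "net-prober", Namespace: "kube-system"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{Containers: []corev1.Container{{
					Name:  "probe",
					Image: "example.com/net-probe:latest", // hypothetical probe image
				}}},
			},
		},
	}

	for {
		// Recreating the DaemonSet each cycle forces fresh sandbox setup and
		// IP allocation on every node, catching "used to work" failures.
		if _, err := cs.AppsV1().DaemonSets(ds.Namespace).Create(context.TODO(), ds, metav1.CreateOptions{}); err != nil {
			log.Printf("create: %v", err)
		}
		time.Sleep(5 * time.Minute) // probe window
		if err := cs.AppsV1().DaemonSets(ds.Namespace).Delete(context.TODO(), ds.Name, metav1.DeleteOptions{}); err != nil {
			log.Printf("delete: %v", err)
		}
		time.Sleep(30 * time.Minute) // idle between cycles; rate-limited to avoid the thrashing concern above
	}
}
```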
...something at Lyft that we've been fighting with a little bit. Our current idea is maybe defining a custom threshold: a percentage of workloads that can go unhealthy on a node before we decide the node is unhealthy. Making, like, the DaemonSet-as-a-new-resource approach could work, but I worry that there are so many failure edge cases where a new resource might still succeed when there's a problem.
If you want the enterprise-grade version of this, you've got something that runs in your cluster, where each plugin reaches out to the cluster agent. You have two of them, so you're guaranteed one of them is not on a given node, and they both have to reach in. And because the plugin knows all the details, it can say: okay, ping will work fine, or ping won't work fine, but you can do this other thing.
This is an interesting topic, and I think it segues into the Gateway and service health-checking stuff nicely. It's hard to abstract it and say we're going to cover every possible use case. I mean, I don't know if there are some real root problems, Mike, or if maybe we could accumulate from people... have we got something around experience?
I mean, you know, I'm not running production stuff myself, but in my own experience, cluster networking is something that fails. So yeah, I think gathering experience from people who do have it would be a great place to start. So I'll send out an email, start an email thread on the mailing list; maybe we'll gather some experience and think about what sort of... I want to focus really on what really matters. Ultimately, the question is what the remedies are; well, no, remedies can be kicked upstairs, and all sorts of stuff.
Well, I hope that y'all stick with us. Something we should do, that I would love to do as a group, is a once-a-year backlog-burndown edition of one of these meetings: go through not just the triage, but, like, all the things that we have triaged into bugs, and see if there are any that are closeable, or whether we can get some of these new contributors to help out with some of them.
That'd be awesome. Just from my perspective: over the last couple months I've been trying to find issues, to kind of start dipping my toe in the water, and the minute I start trying to look at one, you know, someone will jump in and they've got a PR for it, and away we go, and so on to the next thing. So it'd be helpful for me, you know, to get a little direction from the team, like where some good places to jump in and help are. Sure.
Yeah, if you want to jump in, like, leave a comment on whatever issue or PR or whatever you're thinking about, because lots of people, when they're going through PRs, if they see that somebody else has already claimed responsibility, they'll be like: oh good, I don't need to touch this one. Then it'll be all yours. And the converse...
Okay, that's a great starting place. And people should not be shy about jumping on things; like, don't worry about stepping on toes or those sorts of concerns; there's plenty of work to go around, and not all of it is organized very well. And I'll close with this: if you get stuck, jump on Slack, and we do talk.