From YouTube: Application Networking Day Session #7 Data Plane Resilience, No Problem. What About Control Plane?
Description
Featuring Eitan Yarmush. Envoy is an incredibly performant cloud native proxy which is quickly becoming one of the most used pieces of software across our industry. Due to Envoy's popularity and configurability, quite a few control planes have been created to dynamically configure Envoy, including Gloo, Istio, and others. There are now many control planes, but what makes a great control plane? In this talk we'll examine one specific aspect of configuring Envoy: resilience, meaning its ability to tolerate failures.
Yeah, we're going to take a bit of a step back, or not back exactly, but we're going to talk about Envoy here.
So we've been talking a lot about Istio and ambient and eBPF, but we're going to take a step back and talk about the technology, the proxy, that has powered Istio up until this point: the Envoy proxy. The cool thing about Envoy and this talk is that these concepts really apply to Kubernetes software development in general. So I think sharing our experience here, and specifically how it relates to Envoy, is very interesting and hopefully very useful to all of you.
So with that, we'll get started. Who am I? My name is Eitan Yarmush. I am an architect at Solo, and I've been with Solo for about four years now. I've worked on pretty much all of our products, past and future things I can't talk about... no, I'm just kidding. So, what are we going to go over?
Well, first I'm going to give a brief introduction about Envoy, specifically xDS. For those of you, I guess, let me take a poll: who in this room knows what xDS is? Okay, so don't worry if you don't; I'm going to go over it, I was just curious. And then I'm going to describe the problem statement and potential solutions, so you're just going to have to hold your breath for that one.
All right, so Envoy. Who in this room has heard of Envoy before? Yeah, that's what I expected. Okay, so Envoy is, as the slide says, an open source edge and service proxy designed for cloud native applications. Now, why is that? We hear Envoy, Envoy, Envoy; it's this amazing proxy.
Now, in my opinion, there are actually two reasons for that. One is its performance and speed: it operates incredibly well at scale. But the second, and most important, one is its configuration. The way it's configured, specifically with xDS, is designed for cloud native environments. We're going to talk about that in a second, but specifically we're going to be talking about the different control planes that exist and how those are built.
So there are many control planes out there right now. Specifically, from the Solo side there's Gloo Edge; that's our edge proxy. And then there's Istio, of course, which configures all of the sidecars and now the waypoint proxy, as well as ztunnel. And then there are other projects that exist in the CNCF landscape, like Contour, and plenty of others that have been homebrewed by lots of different companies, but obviously I don't know what those are, so we're not going to go into those.
But there are many. So, why Envoy? To talk about why Envoy, I think we should talk about how the typical proxy is configured. When you think about configuring a proxy historically, and we're going to use NGINX as an example, there's some config file that you have to give to that proxy, and the proxy, or whatever piece of software, has to read that configuration in. Now, let's say you want to update that configuration. Either the proxy has to restart to pick it up or, assuming there's some hot restart, it has to read in the file, close down some of its connections, and restart them. So when you're in an environment where the endpoints are not changing very often, that's not a big deal. But that's not how Kubernetes works: pods are coming up, they're spinning down, they're changing all the time. Endpoints are all over the place, so we need a proxy that's able to handle that.
Okay, so how does Envoy work? Well, the way that Envoy gets its config is that there is a control plane, or a "management server" in Envoy language, running somewhere that the proxies can talk to. I'm not going to get too much into the technology there, but there's a bidirectional stream between Envoy and the control plane over which the config is fed: Envoy asks for the config, and the control plane sends it back. So this is a very in-depth look at what that looks like.
This is merely meant to show the complexity and ingenuity of the model. I'm not going to dive too deep, but I will say that the Envoy proxy docs have an entire page about the xDS model, and I could talk at length about why I think xDS is amazing. For now I'm not going to go too deep on this, but this is, at a high level, how xDS works.
So what does that look like, more simplified? Envoy is going to say, "Right, I want config v1." Cool. Now the control plane says back, "All right, here it is, I have that for you." Now, in the good case, if the config is good, then Envoy is going to say, "All right, that was great, I can serve that. Thank you." Now, what happens when it's bad? Okay, it's the same first steps: Envoy reaches out, the control plane says "here it is," but this time Envoy says "nope."
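That accept-or-reject exchange is the xDS ACK/NACK handshake. As a rough, language-agnostic sketch (real xDS runs over gRPC with protobuf messages; the class and field names here are hypothetical stand-ins, not the actual API), the versioning logic looks something like this:

```python
# Simplified sketch of the xDS ACK/NACK exchange. The proxy either
# applies a config version and ACKs it, or rejects it (NACK) and keeps
# serving the last version it successfully applied.

class ControlPlane:
    def __init__(self):
        self.version = "v1"
        self.config = {"listeners": ["0.0.0.0:8080"]}

    def discovery_response(self):
        return {"version_info": self.version, "resources": self.config}

class EnvoyClient:
    def __init__(self):
        self.acked_version = None   # last version successfully applied
        self.active_config = None   # config currently being served

    def handle(self, response, valid=True):
        if valid:
            # ACK: apply the config and echo the new version back.
            self.active_config = response["resources"]
            self.acked_version = response["version_info"]
            return {"version_info": self.acked_version}
        # NACK: keep serving the old config; report the last good version.
        return {"version_info": self.acked_version, "error": "rejected"}

cp = ControlPlane()
envoy = EnvoyClient()

ack = envoy.handle(cp.discovery_response(), valid=True)    # good config: ACK
nack = envoy.handle(cp.discovery_response(), valid=False)  # bad config: NACK
```

The key property, which the talk relies on later, is that a NACK never takes traffic down: the proxy simply keeps the last config it ACKed.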
Now, how can this go wrong? In our initial discussion about NGINX, and how a proxy has historically been configured through files, the file wasn't going to go away. The file is on the file system, and unless you have some really crazy situation, the file system is probably going to be there. However, that's not always true for a separate application.
There are two failure modes: hardware issues and software issues. The red X is a hardware issue. Let's say for some reason the control plane is scheduled to a node, or it's running somewhere, where the hardware goes down, or for whatever reason you have a regional outage or a zonal outage. In this case, Envoy can no longer get its config, so it's going to stop serving the correct thing, and that's a real problem.
So again, the new model, the Envoy model, allows for many dynamic config sources, especially routes and endpoints, which are constantly changing, with zero downtime. That means zero downtime when you change any of the config; Envoy just continues to serve. Another plus is that you can have multiple config sources for a single proxy, which is not possible if you have a file. And then there's also a well-known proto API.
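To make the dynamic-config idea concrete, here is a minimal sketch of an Envoy bootstrap that pulls its listeners and clusters from a management server over ADS rather than from a static file. The `xds_cluster` name, address, and port are placeholders for wherever your control plane actually runs:

```yaml
node:
  id: envoy-1
  cluster: example
dynamic_resources:
  ads_config:
    api_type: GRPC
    transport_api_version: V3
    grpc_services:
      - envoy_grpc:
          cluster_name: xds_cluster
  lds_config:
    resource_api_version: V3
    ads: {}
  cds_config:
    resource_api_version: V3
    ads: {}
static_resources:
  clusters:
    - name: xds_cluster            # the one static piece: where to find the control plane
      type: STRICT_DNS
      connect_timeout: 1s
      http2_protocol_options: {}   # xDS is gRPC, so the stream needs HTTP/2
      load_assignment:
        cluster_name: xds_cluster
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: control-plane.example.svc.cluster.local
                      port_value: 18000
```

Everything except the pointer to the control plane arrives over the stream, which is exactly why the control plane's availability becomes the interesting problem.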
So it's really easy to create these control planes. But there are pros to the traditional approach. For one, the config is saved to disk, so it's tolerant to a restart; if the Envoy proxy restarts and the control plane is not up, you might be subject to some downtime. It's also simpler when you're deploying a smaller number of proxies: if you only have a few proxies and they're running from your file config, you don't have to worry about potentially configuring so many proxies all at the same time. And it is also simpler if the config is static, if it doesn't change very often, which, as mentioned earlier, is probably not true in a cloud native environment where pods are spinning up and spinning down all of the time.
So let's briefly talk about the standard running environment. In the normal case, you're going to have Envoy running in your Kubernetes cluster and your control plane running in your Kubernetes cluster. Now hold on: as mentioned, what can go wrong? Well, the node can go down, and if your node goes down, your control plane is going to be down and Envoy is not going to get any more config. Hopefully it'll reschedule, but having any downtime where Envoy can't get those updates is obviously less than ideal.
So what can we do about that? Well, luckily, Kubernetes allows us to scale up pods using a Deployment, or whatever other scheduling mechanism you might use depending on your Kubernetes distro, but the Deployment is the one that comes with Kubernetes. So you can scale it up, and if one replica goes down, you're much safer. And if a specific node goes down, or, let's say you're running a multi-zone cluster and one zone goes down, you're safe. Awesome. But we are still liable to failure here.
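A minimal sketch of that scaling step, assuming a hypothetical control-plane image, might look like the following: multiple replicas, spread across zones so a single zonal outage doesn't take every copy down at once.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: control-plane
spec:
  replicas: 3
  selector:
    matchLabels:
      app: control-plane
  template:
    metadata:
      labels:
        app: control-plane
    spec:
      # Spread replicas across zones so a zonal outage leaves some running.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: control-plane
      containers:
        - name: control-plane
          image: example.com/control-plane:latest   # placeholder image
```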
So what failures are we still liable to? Well, what if there's a software issue in the control plane and we stop serving config? We've eliminated the problem of having a node go out, of having a hardware issue, but we still have potential software issues. We want to continue serving, but if for some reason we have a panic, or our control plane is not functioning properly, Envoy will stop getting its config. So how do we deal with that?
Well, to talk about that, let's talk about how these control planes usually work, and this is very similar to how many control planes in Kubernetes work, whether they're for Envoy or anything else. You have a set of CRDs which are configuring the control plane, and then the control plane configures whatever end product, which in this case is Envoy. However, the control plane is liable to failure just like any other component in Kubernetes, and that's what this orange X means.
A
It's
definitely
not
orange
on
the
screen
screen,
but
it
is
orange
for
those
of
you
in
the
back
and
some
weird
color
here.
So
how
do
we
ensure
that
our
control
plane
is
always
serving
config
right,
even
though
it
is
receiving
config
via
crds,
potentially
from
other
users,
potentially
from
a
whole
nother
control,
plane
right,
the
the
abstractions
continuously
layer
on
each
other
and
all
of
a
sudden?
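As one concrete instance of that layering: a control plane like Istio watches high-level CRDs such as this one (the hostnames and subset are illustrative) and translates them into low-level Envoy route configuration pushed over xDS.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-route
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v2
```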
So let's talk about two ways that we can do this, and these are both coming from lessons that we have learned at Solo building Gloo Edge. The first is: we can make sure that the core logic of the control plane doesn't error. Now, what do I mean by that? We can ensure that there is no situation, no code path, in which the control plane doesn't continue to serve at least some config. This comes down to a trade-off between two different situations.
Do you want to serve the minimal config, or do you want to stop serving altogether? In our experience, serving minimal config is always better than not serving at all. What that has meant in practice is making sure that the actual logic, which takes in the new configuration, or whatever it is listening to, and translates it to the Envoy config, can never fail and always serves something. You know, ensure it can't panic; but, you know, we've all been there. So how do we solve that?
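The "always serve something" idea can be sketched in a few lines. This is an illustrative toy, not Gloo Edge's actual code: the translation step may fail on bad input, but the serving layer catches the failure and keeps handing out the last good snapshot (starting from a minimal one) instead of crashing.

```python
# Hypothetical sketch: a translation loop that never stops serving.
# On any translation error it keeps the last good snapshot alive.

def translate(crds):
    """Turn user-facing config (e.g. CRDs) into proxy config. May raise."""
    if not isinstance(crds, list):
        raise ValueError("bad input")
    return {"routes": [c["route"] for c in crds]}

class Snapshotter:
    def __init__(self):
        self.last_good = {"routes": []}  # minimal config: always serve something

    def update(self, crds):
        try:
            self.last_good = translate(crds)
        except Exception:
            # Serving the previous (or minimal) config beats serving nothing.
            pass
        return self.last_good

s = Snapshotter()
good = s.update([{"route": "/a"}])   # valid input: new snapshot served
after_bad = s.update("not-a-list")   # invalid input: last good snapshot kept
```

The design choice is exactly the trade-off above: a bad update degrades to stale config rather than to no config.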
What's another thing that we can do? Well, we can add a cache layer. So what does that mean? Depending on the control plane implementation, there may be quite a bit of complex business logic in it, and this business logic is liable to failure. Again, we can integration test and unit test and end-to-end test to our hearts' content, but things happen. So it's liable to failure, for one, but it's also potentially computationally expensive.
So if we have a control plane, and all of a sudden we want high availability for all of our proxies and we start scaling it up, our CPU costs might start going through the roof, because all of a sudden we're computing the config for Envoy a million different times. So why do we need to compute it so many times? If the problem we were trying to solve was just to make sure it never went down, then to make sure it doesn't go down we probably only need one percent of that. Okay.
So what can we do about that? Well, luckily, there's an open source project from the Envoy community called xds-relay which can help with that, and we'll talk about that in a second. So what does xds-relay do? Well, it helps in two major ways. The first is caching, and this is actually something that we added on top of xds-relay; the open source implementation does not do this. Essentially, it just takes the last config, and not the last good config, just the last config, and saves it locally in memory in the relay. So again, all the logic in the relay is just: get the served config and hold it for Envoy. No complicated business logic, just take and hold. And the other is aggregation.
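The caching half of the relay idea is simple enough to sketch. Again this is an illustrative toy, not the xds-relay API: a component that sits between Envoy and the control plane, holds the last config it saw in memory, and keeps serving that copy when the control plane becomes unreachable.

```python
# Illustrative relay cache: no business logic, just "take and hold".

class Relay:
    def __init__(self, upstream):
        self.upstream = upstream   # callable returning the latest config
        self.cache = None          # last config seen, held in memory

    def fetch(self):
        try:
            self.cache = self.upstream()
        except ConnectionError:
            # Control plane is down: fall back to the cached config.
            pass
        return self.cache

def healthy():
    return {"version": "v2", "clusters": ["svc-a"]}

def down():
    raise ConnectionError("control plane unreachable")

relay = Relay(healthy)
first = relay.fetch()      # cached from the control plane
relay.upstream = down
cached = relay.fetch()     # control plane down: served from the cache
```

Because the relay does almost no work per request, it is cheap to run many replicas of it, which is exactly the compute saving described above.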
So, gathering up configuration from multiple sources, so that Envoy doesn't have to be the one to do that, and then the relay serving the config to Envoy: those are the buzzwords on their layer. But again, you're potentially saving a lot of compute, because the work that this component has to do is very, very little. And again, the important thing here is not just xds-relay; it's the concept.
I really believe that this model applies to so many development scenarios in Kubernetes. These are the lessons that we learned from Gloo Edge and Envoy. So now that we've gotten towards the end: we have scaled our control plane and it's running in multiple zones, but we were finding that expensive, so we wanted to keep that cost down, and we also wanted to make sure that if the control plane did go down for any logic reason, the config was cached in xds-relay. Okay.
And that's it. Again, in summary, these are the lessons that we have learned building out Gloo Edge, and hopefully you can learn something from them for your general application development, or for learning more about how Envoy, xDS, or anything in there works. Again, I was Eitan; hope you enjoyed.