From YouTube: Istio Feb Meetup/ Demo: Scaling Istio in Large Clusters by Auto-Generating Sidecars by Cathal Conroy
Description
Demo: Scaling Istio in Large Clusters by Auto-Generating Sidecars
Sidecar resources are key to scaling Istio in large clusters, but configuring Sidecars requires a detailed registry of every given workload’s peers. What do you do if your org maintains no such registry? What if service names are dynamic? This is the story of how we used NetworkPolicies to auto-generate Sidecars for hundreds of services, spanning many technical orgs and teams.
So, a little bit about myself: I'm Cathal. I've been with Workday for three and a half years, always in the public cloud team, where we manage all aspects of Workday's public cloud offering: the Kubernetes platform itself, the service mesh, domain-specific operators, and everything in between. I'll touch briefly on our scale, walk you through our journey with Istio, cover some of the problems we encountered as we rolled out to our larger clusters, and explain how we used Sidecars to overcome those problems.
We've been running Kubernetes since 1.2, currently on 1.21, and Istio since 1.1, currently on 1.11. We've had Istio running in production since October last year. Historically, we used a combination of Linkerd and stunnel to encrypt HTTP and TCP traffic. Our org itself is split between Dublin, Ireland and Pleasanton, California; I think we have about 50 engineers across six teams.
At any one time we can have up to 200 dev clusters, each with up to a thousand pods, 40 nodes and 75 namespaces. Namespaces are going to crop up a few times in this talk, and there is a reason for that. I know namespaces are not typically a scaling constraint for Istio, so I will explain what's going on there and dig into it.
By July we had a requirement to run with low-TTL certs on our workloads, with a cert rotation mechanism and zero tolerance for service disruption, and so we decided to split our delivery between Istio ingress and Istio mesh, and we set ingress as our first goal.
Up to this point we were using ingress LBs external to the cluster, and we decided we wanted to run with Istio at the edge. So by January, development was in full swing: we were building our images from scratch on the just-released proxyv2, and we were fitting that, you know, enormous Helm chart that Istio provides into our platform CI/CD system. There it is in all its glory; hopefully not too many of you have had to work with that.
Come April, we had our first service using ingress in dev, and we had the Istio control plane rolled out to all dev clusters with sidecar injection disabled. By September, the majority of ingress services had migrated over in dev, the Istio control plane was running live in production, and Istio was handling ingress for a single service there, which was a big milestone. And by March, I think, most services were running on Istio ingress.
Excuse me. Then by January we had pretty much everything: Istio ingress was running live in production and we had the majority of services onboarded to Istio mesh. There is a good ten-month gap there, and we took that time to onboard everything to Istio: teaching teams why we were bringing in the tech, what the problems were, et cetera, and actually onboarding them. That alone involved about 180 services from, I guess, 50 or 60 teams, and that was enough of a challenge to inspire a talk all of its own, maybe another day. But anyway, by January we had all services onboarded in dev and we began our rollout to production.
Around that time, in February, we were scaling up our perf clusters to match our largest production clusters, to make sure everything was performant and all was good in the world. Unfortunately, we found it wasn't. Istio was running fine in dev and our much smaller production environments, but it came crashing down rather spectacularly in our large perf clusters.
We have two large clusters in particular which are significantly larger than the others, and so we paused our production rollout at this point while we figured out what was going on. Fast forward to October: mesh was rolled out everywhere in dev and production, and that's where we are today. So this talk is going to focus in on the problems we had between April and October, getting those last, largest clusters enabled with mesh.
As we scaled our clusters up to around three and a half to four thousand pods, we started encountering problems with Istio. istiod's HPA would scale as far as it could, istiod pods would begin OOMing, and as a result every data plane proxy in the mesh would start timing out trying to pull configuration from the control plane, and those which did manage to come up went into crash loops all over the mesh. This sent wave after wave of events to the control plane to process, and the control plane wasn't processing them fast enough, causing data plane config pulls to time out and slow down, causing more crashing. We ended up with a positive feedback loop which was essentially killing the Istio control plane and, in fact, the Kubernetes control plane too.
The first port of call, and the easy one, was to scale istiod, so we scaled it massively, horizontally and vertically, just to see if we could get things under control. We increased data plane memory to three gigs; bear in mind, you've got up to four thousand pods running with three gigs each, at least. That wasn't enough, and data plane proxies still OOMed all over the place. We played around with the pilot debounce settings, PILOT_DEBOUNCE_AFTER and PILOT_DEBOUNCE_MAX, to try and reduce the amount and frequency of config pushes from the control plane.
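Those debounce knobs are plain environment variables on the istiod deployment. A fragment showing the kind of override being described; the values here are illustrative, not recommendations:

# Illustrative istiod Deployment fragment: batching config pushes harder.
# PILOT_DEBOUNCE_AFTER / PILOT_DEBOUNCE_MAX are real istiod settings;
# the values below are examples only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: istiod
  namespace: istio-system
spec:
  template:
    spec:
      containers:
        - name: discovery
          env:
            - name: PILOT_DEBOUNCE_AFTER   # wait this long after an event before pushing
              value: "500ms"
            - name: PILOT_DEBOUNCE_MAX     # never delay a push longer than this
              value: "30s"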
But ultimately, nothing worked. Here is a snapshot of our control plane: you can see the number of replicas we have, and these are peaking at around eight gigs of memory each. CPU, same again: we're running 12 CPUs. Essentially, no matter what we scaled to, the control plane would max out, and we were stuck with the issues we had.
This graph here shows the max proxy memory per namespace, and we're looking at this per namespace again for a reason which we'll circle back to. The worst offender, the one namespace there in purple, was maxing out at 8.5 gigs of memory for a single proxy, and the average in that namespace was 5.6 gigs. That namespace in particular was holding shared messaging systems, so Kafka and RabbitMQ, and we noticed that those high-traffic systems would have the highest memory footprint. We ultimately saw an enormous reduction in that memory as we introduced Sidecars. But it was pretty insane memory usage across the board, with the majority of namespaces maxing out around three gigs, which again was our limit at the time.
So, to Sidecars. Let's take an imaginary cluster, this miniature cluster, where blue boxes represent namespaces and green circles are pods, or workloads. By default, Istio provides every workload proxy with the configuration needed to communicate with every other workload in the mesh, which is fine for a cluster with three namespaces and seven or eight pods. But what happens when you have a lot of namespaces and a thousand pods, or five thousand, or ten thousand?
This requires a pretty enormous amount of configuration to manage: it takes a lot of memory to store and a lot of CPU to compute. In reality, in a typical microservices architecture, most services only ever speak to a very small subset of all their peers. They don't actually need to speak to everybody.
So, of course, Istio does have a mechanism to address this, and that is the Sidecar resource. It's a very simple Istio CRD: it has a workload selector to select who it applies to, and you get an ingress listener list, an egress listener list and an outbound traffic policy. We won't talk about the traffic policy today, it's not important here, but essentially, among other things, Sidecars allow us to limit the amount of configuration pushed to our proxies. The ingress configuration is actually derived from workload information, so your pod ports.
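As a rough sketch of the resource shape just described; all names and ports here are illustrative, not taken from the talk:

# Illustrative Istio Sidecar resource showing the fields described above.
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: example
  namespace: a
spec:
  workloadSelector:            # which workloads this Sidecar applies to
    labels:
      app: example
  ingress:                     # derived from workload info (pod ports)
    - port:
        number: 8080
        protocol: HTTP
        name: http
      defaultEndpoint: 127.0.0.1:8080
  egress:                      # which peers we want configuration for
    - hosts:
        - "b/*"
        - "istio-system/*"
  outboundTrafficPolicy:       # not covered in this talk
    mode: ALLOW_ANY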
On the egress side, we can configure exactly which upstream clusters we want to get configuration for. So we need a definitive list of all the peers that a workload might speak to, which leads us to one important question: who does our service talk to? Who does any given service talk to? And, more importantly, as a cluster administrator, how do you get that information for 200-odd unique services running in your cluster which you don't even own? And so, yeah, a quick note on Sidecar workload selectors themselves.

A default Sidecar is one with no workload selector, typically called default. It sits in the namespace and applies to all workloads in that namespace which aren't matched by any other Sidecar. Undefined behaviour occurs when a workload is matched by more than one Sidecar, whether that's from multiple default Sidecars or from multiple Sidecars with workload selectors. Avoid it, don't do it, stay away from it: one Sidecar config per workload.
So, back to that question: who does my service talk to, or who does any service talk to? We needed some definitive source which described the relationships between all services, and you would think only the service teams could know that. Why would we know it? But we were on a deadline, everyone's on a deadline, and we were not about to approach 60-odd service teams and ask them to write a Sidecar definition for every one of their services.
Oh, and by the way, if you get it slightly wrong, you risk breaking network connectivity. So that wasn't going to happen; we had to provide a solution ourselves. Internally, Workday has a tool called Workday Registry, which is like a who's who of services in Workday.
You've got service owners, deployment platforms, dependencies on peers, et cetera, et cetera. But that has its own problems. It's manually configured, so everything is from human input, and so if I build a new feature with a new dependency and don't bother putting it into the registry, well, that's not going to be accounted for: I'm going to have no network connectivity. It's not exhaustive. It's typically used to describe the application-level relationships your services need to drive application behaviour.
It doesn't typically include things like logging sinks or metrics servers, et cetera, and we needed something that was exhaustive. Also, the registry predates Workday's adventures into Kubernetes and public cloud, so it had no concept of namespaces or services.
We run network policies in all our clusters, of course, and we run a deny-ingress-by-default setup, which means any pod-to-pod traffic at all in a cluster has to be described by some network policy somewhere.
Here's a sample netpol which we apply to each namespace; this is our default deny. So we figured that if we could map out every possible allowed traffic flow from network policies, we could invert that into a map of egress. Every service's network policies describe which services are allowed ingress to it, and inverting those flows tells us who needs egress to whom.
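The slide itself isn't captured in the transcript, but a per-namespace default deny-ingress policy of the kind being described is normally shaped like this (namespace name illustrative):

# A typical default deny-ingress NetworkPolicy, applied per namespace.
# The empty podSelector selects every pod; listing "Ingress" in
# policyTypes with no ingress rules denies all inbound traffic.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: a
spec:
  podSelector: {}
  policyTypes:
    - Ingress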
So, as a start, we wanted to programmatically generate something like this. This is a Sidecar, a default Sidecar: there's no workload selector, so it applies to everything in namespace a, and what we're doing is specifying a list of hosts which wildcard to other namespaces. We're saying everything in namespace a is going to be able to speak to everything in b, c, default and f, and in doing this we're reducing our peer space from the entire cluster to just these namespaces.
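Reconstructed from that description; the namespace names are the ones read out, the rest is the standard Sidecar resource shape:

# Auto-generated default Sidecar for namespace "a": no workloadSelector,
# so it applies to every workload in the namespace. Egress hosts
# wildcard to the namespaces that netpols say "a" may reach.
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: a
spec:
  egress:
    - hosts:
        - "b/*"
        - "c/*"
        - "default/*"
        - "f/*"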
Yes, it can still talk to everything in those namespaces, but it's still a massive reduction, and the algorithm to do that is very simple: for every network policy in your cluster, note the destination, in blue, and note your source namespaces, in orange. Each of those is a flow from a source to a destination; add it to your map.
And here's the same thing again with a pretty picture. For every network policy ingress peer: if it doesn't have a namespace selector, then the source is the current namespace, and if the namespace selector is open, add the netpol's namespace as a destination for everything in the cluster.
There are a few gotchas here. If the ingress peer list is empty or missing, the rule actually allows all traffic. If an ingress peer has no namespace selector, it matches the current namespace. If the namespace selector exists but is empty, it matches all namespaces. It's really important to get this right; again, without it you risk breaking network connectivity in your cluster. A sketch of the inversion, with those gotchas handled, follows.
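Here is a minimal sketch of that inversion in Go, using the Kubernetes API types. It is paraphrased from the description in the talk, not the actual generator code, and resolution of concrete namespace selectors is stubbed out:

package sidecargen

import (
	netv1 "k8s.io/api/networking/v1"
)

// buildFlowMap inverts NetworkPolicies into a map of
// source namespace -> set of destination namespaces,
// applying the three gotchas listed above.
func buildFlowMap(policies []netv1.NetworkPolicy, allNamespaces []string) map[string]map[string]bool {
	flows := map[string]map[string]bool{}
	add := func(src, dst string) {
		if flows[src] == nil {
			flows[src] = map[string]bool{}
		}
		flows[src][dst] = true
	}
	for _, pol := range policies {
		dst := pol.Namespace // the netpol's namespace is the destination
		for _, rule := range pol.Spec.Ingress {
			if len(rule.From) == 0 {
				// Gotcha 1: an empty or missing peer list allows all sources.
				for _, src := range allNamespaces {
					add(src, dst)
				}
				continue
			}
			for _, peer := range rule.From {
				switch {
				case peer.NamespaceSelector == nil:
					// Gotcha 2: no namespace selector means the source
					// is the netpol's own namespace.
					add(dst, dst)
				case len(peer.NamespaceSelector.MatchLabels) == 0 &&
					len(peer.NamespaceSelector.MatchExpressions) == 0:
					// Gotcha 3: a selector that exists but is empty
					// matches every namespace in the cluster.
					for _, src := range allNamespaces {
						add(src, dst)
					}
				default:
					// Otherwise, resolve the label selector to the
					// namespaces it matches (omitted in this sketch).
				}
			}
		}
	}
	return flows
}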
Why do we have a setup like this? It's the result of decisions from very early in our Kubernetes journey. This is not good practice, it's not something you should try to replicate; you should provide good namespace isolation between all of your services, and certainly we don't allow this today. But this was the reality, this is what we had to build around. So per-namespace
Sidecars were not going to be enough for us. Looking back at those Sidecars, we wanted to go a step further and define exactly which hosts a given namespace might want to egress to. This means that a namespace which only wants to egress to one or two services in that shared default namespace is only going to get configuration for those two. So we've gone from wildcarding every namespace to specifying exact services in those namespaces.
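In Sidecar terms, the egress hosts move from namespace wildcards to fully qualified service hosts, something like this (the service names are invented for illustration):

# Narrower auto-generated Sidecar: exact services instead of namespace
# wildcards. "kafka" and "rmq" are illustrative names, not real ones.
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: a
spec:
  egress:
    - hosts:
        - "default/kafka.default.svc.cluster.local"
        - "default/rmq.default.svc.cluster.local"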
The biggest problem with this is that Sidecar egress listeners use FQDNs, while network policies use pod selectors, and so we needed some way to find which services sit in front of a given pod. Kubernetes provides no mechanism to do that; there is no native way to find which services may sit in front of a given pod.
Of course, Kubernetes services use pod selectors to select pods, and that's very straightforward: a one-way lookup. But network policies, our source of truth, can select pods via different pod selectors than the services use. And actually it's quite easy: a pod selector gives us a label selector, and all that's doing is listing pods in a namespace and reducing them to those whose labels are a superset of the label selector.
Similarly, we then take those pod labels and list all services in that namespace whose pod selector is a subset of our pod labels. So we needed to adjust our algorithm very slightly, as sketched below.
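A sketch of that two-step reverse lookup in Go. It handles matchLabels only (matchExpressions and deduplication omitted), and labelsSuperset is a hypothetical helper, not a client-go function:

package sidecargen

import (
	corev1 "k8s.io/api/core/v1"
)

// labelsSuperset reports whether labels contains every key/value
// pair in selector, i.e. labels is a superset of selector.
func labelsSuperset(labels, selector map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// servicesForPodSelector resolves a netpol podSelector to the services
// sitting in front of the pods it matches, within one namespace.
// Step 1: keep pods whose labels are a superset of the netpol selector.
// Step 2: keep services whose own selector is a subset of a matched
// pod's labels.
func servicesForPodSelector(selector map[string]string, pods []corev1.Pod, services []corev1.Service) []corev1.Service {
	var out []corev1.Service
	for _, pod := range pods {
		if !labelsSuperset(pod.Labels, selector) {
			continue // pod not matched by the netpol peer
		}
		for _, svc := range services {
			if len(svc.Spec.Selector) > 0 && labelsSuperset(pod.Labels, svc.Spec.Selector) {
				out = append(out, svc) // this service selects the pod
			}
		}
	}
	return out
}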
The problem with this now was that we were generating these as we bootstrapped our clusters, so they weren't dynamic. Of course, clusters are not static: namespaces change, and services, pods and network policies all change, and these Sidecar definitions need to respond and be regenerated accordingly. We needed something dynamic. So, introducing the sidecar generator, very appropriately named: it generates Istio Sidecars. It's an evolution of what we've just seen. Rather than running during cluster bootstrap, we run it in the cluster permanently as a deployment; it's long-lived. It watches namespaces, services, pods and network policies, responds to changes and regenerates Sidecars live in the cluster. We subscribe to the resource events we're interested in using a shared informer, and the shared informer triggers add, update and delete functions.
We do a bit of filtering there to ignore events that we don't care about, for example annotation changes, which are not going to impact our Sidecars. The event handlers decide whether or not they need to trigger a regeneration of Sidecars, and we essentially push these events into a channel of size one: if there's something in the channel, regenerate the Sidecars and take the item from the channel; otherwise, wait.
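A minimal sketch of that watch-and-debounce pattern with client-go shared informers; the filtering and regeneration logic are stand-ins for the real thing:

package sidecargen

import (
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

func run(client kubernetes.Interface, stop <-chan struct{}) {
	// Size-one channel: at most one pending regeneration, which
	// coalesces bursts of events into a single rebuild.
	dirty := make(chan struct{}, 1)
	markDirty := func() {
		select {
		case dirty <- struct{}{}: // queue a regen if none is pending
		default: // one already queued; drop the event
		}
	}

	factory := informers.NewSharedInformerFactory(client, 0)
	handler := cache.ResourceEventHandlerFuncs{
		// The real generator filters out irrelevant changes
		// (e.g. annotation-only updates) before marking dirty.
		AddFunc:    func(obj interface{}) { markDirty() },
		UpdateFunc: func(oldObj, newObj interface{}) { markDirty() },
		DeleteFunc: func(obj interface{}) { markDirty() },
	}
	// Watch the resources the Sidecars are derived from.
	factory.Core().V1().Pods().Informer().AddEventHandler(handler)
	factory.Core().V1().Services().Informer().AddEventHandler(handler)
	factory.Core().V1().Namespaces().Informer().AddEventHandler(handler)
	factory.Networking().V1().NetworkPolicies().Informer().AddEventHandler(handler)
	factory.Start(stop)
	factory.WaitForCacheSync(stop)

	for {
		select {
		case <-dirty:
			regenerateSidecars() // rebuild and apply all Sidecar resources
		case <-stop:
			return
		}
	}
}

func regenerateSidecars() { /* stand-in for the actual generation logic */ }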
And this is the output. It's a bit blurred, but imagine this with a couple of hundred Sidecars, creating this for every namespace in the cluster. This had a pretty enormous impact in our perf clusters: it completely stabilized the Kubernetes control plane and the Istio control plane. You can see the worst offending max proxy memory per namespace has dropped from 8.9 gigs to 1.3, which is an 85 percent decrease, and the worst offending average memory per namespace has decreased from 5.6 gigs to 650 megs, an 88 percent decrease. So, an absolutely enormous memory decrease in the data plane, and something similar in the control plane: the top graph here is CPU per istiod instance, where we saw a 75 percent reduction, and the bottom is the same for memory.
So, yeah, that's pretty much it. Moving forward, as we scale our clusters, we're probably going to have to optimize even further. Here, again, we're generating Sidecars per service; we could go even further and generate them per service per port, and the Sidecar resource allows us to do that. I don't know if we're going to see a huge optimization there; I don't think we have too many services which expose ports that aren't used by all of their clients.
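For reference, a per-service-per-port egress listener would look something like this; the service and port here are illustrative:

# Per-port egress listener: only port 9092 of the (illustrative) kafka
# service is pulled into this namespace's proxy configuration.
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: a
spec:
  egress:
    - port:
        number: 9092
        protocol: TCP
        name: kafka
      hosts:
        - "default/kafka.default.svc.cluster.local"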
Ideally, we would generate a single Sidecar for every workload, but that requires a unique label for every single workload. We saw a pattern in our clusters where teams might use a layer label which would select multiple unique services, and then they would also use an app or a name label to select individual ones, and that kind of screws up how you apply Sidecars to individual pods.
But, you know, it's something we could solve; we just haven't needed to yet. A big one for us, and something that we're absolutely working on since discovering this, is eliminating those shared namespaces completely. The default one is a pretty bad example for us, and we've been pushing back heavily on the service teams involved to do that.
Now, we did find one issue upstream on the Envoy proxy GitHub where people were talking about lazy-loading clusters, routes and endpoints as they are queried, which would be pretty awesome. The conversation seems to have died off around March, so I don't think there's any active work on that at the moment, but that would be something really awesome to see in Envoy, and then obviously we would get to reap the benefits from that in Istio, too.