Description
Running Istio across multiple clusters can bring a lot of value, but can be difficult. In this session, we review some of the benefits of a multi-cluster architecture, including single-pane-of-glass operations and global service routing, and what patterns and practices can be used to ease operations.
B: So let's start at the beginning, let's talk about the standard Istio deployment. As we can see here, it's pretty straightforward. We have a single cluster with a single Istio control plane and three different workloads: an ingress gateway that sends traffic to the account service, which sends traffic to the user service.
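A minimal sketch of that ingress routing using standard Istio APIs (the hostnames, ports, and names here are illustrative assumptions, not taken from the demo):

```yaml
# Istio Gateway exposing the mesh on port 80 (host is an assumption)
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: account-gateway
spec:
  selector:
    istio: ingressgateway        # bind to the default ingress gateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "shop.example.com"
---
# Route inbound traffic to the account service, which in turn calls user
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: account
spec:
  hosts:
  - "shop.example.com"
  gateways:
  - account-gateway
  http:
  - route:
    - destination:
        host: account            # in-cluster account service
        port:
          number: 8080           # assumed service port
```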
B: The last point we want to talk about is that, due to regulation, geography, or attempts to reduce latency, we may actually have to run clusters in different regions. And with that, I'll hand it over to talk a little bit about what we see as far as multi-cluster Istio.
A: So what does multi-cluster Istio offer us? Why is it so great? First of all, it offers us redundancy for the control plane and the data plane. So if the control plane were to go down, we have backups of the control plane, and similarly, if our workloads or members of the data plane were to go down, we have replicas of those as well.
A: Next, as we've already mentioned, we have flexibility in deployment, geography, and availability. We have customers running all over the world who are serving their own customers, potentially all over the world, and so they are running services in different localities and needing, again, to run their services in those different regions and zones.
A: So let's talk about a multi-cluster Istio deployment and what it really looks like. To do that, let's just quickly go back to the single-cluster deployment as a reference point: we have our gateway, our account service, and our user service, and they're all being managed by a single Istio control plane.
A: Now let's start to expand that picture. We again have our account service and our user service in cluster one, region one, but we're adding more clusters. We have cluster two, which has an order service and another user service, and we have a third cluster, and this one is in region two, so it has an order service, a user service, and an account service of its own.
A: So this way, again, we have split our services across multiple regions for redundancy and high availability, and as you can also see, we have Istio running in all of these clusters.
A: Now, what challenges does this bring up? Well, first of all, these clusters need to be able to communicate with each other, right? The whole reason for them to exist and to deploy in a multi-cluster setup is so that, potentially, they can communicate with each other.
A: How do we make that happen? Well, now we need to add gateways, right? We need to add gateways and/or ingresses managed by Istio, which allow our different services to communicate. This is often called federation, and now that we have added these gateways or ingresses, these services running in different clusters are able to communicate.
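With replicated control planes, this kind of federation is typically wired up with a ServiceEntry that points at the remote cluster's ingress gateway. A sketch, assuming a user service in cluster two reachable through that cluster's gateway (names and addresses are assumptions; 15443 is the port Istio has conventionally reserved for cross-cluster mTLS traffic):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: user-cluster2
spec:
  hosts:
  - user.default.global          # mesh-internal DNS name for the remote service
  location: MESH_INTERNAL
  ports:
  - number: 80
    name: http
    protocol: HTTP
  resolution: DNS
  endpoints:
  - address: gw.cluster2.example.com   # remote cluster's ingress gateway (assumption)
    ports:
      http: 15443                # gateway port used for cross-cluster traffic
```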
A: Well, we need to establish trust, and we do that not via shared certificates, but via a common root of trust, which then establishes trust between our clusters so that they can communicate with each other.
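In Istio, that shared root of trust is usually established by giving every cluster an intermediate CA signed by a common root, mounted as the well-known `cacerts` secret in `istio-system`. A sketch, with the certificate contents elided:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cacerts                  # istiod picks this up as its signing CA
  namespace: istio-system
type: Opaque
stringData:
  ca-cert.pem: |                 # this cluster's intermediate CA certificate
    # ...PEM contents...
  ca-key.pem: |                  # intermediate CA private key
    # ...PEM contents...
  root-cert.pem: |               # shared root certificate, common to all clusters
    # ...PEM contents...
  cert-chain.pem: |              # chain from the intermediate up to the root
    # ...PEM contents...
```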
So all this to say that when scaling up from one to n clusters, you are not simply adding cluster after cluster; you are also adding levels of complexity due to the connections that must be made and maintained between the different clusters.
A: So what is this added complexity? Well, as the name of our talk would suggest: one, configuration, managing sprawling Istio config for many clusters simultaneously; two, orchestration, ensuring that all of our control planes are running compatible versions, ensuring that all of our control planes are up, and being able to observe all of our control planes and know what's going on at any given time; and three, federation, exposing services running in one cluster to other clusters.
B: Thank you, Ethan. We'll now look at two deployment patterns for multi-cluster Istio deployments. Let's talk about the first one: a replicated control plane. In this method of deploying Istio, each cluster gets its own istiod control plane, right? Essentially, it means that each cluster has its own istiod that is only aware of the workloads running in that cluster. Communication across clusters is done using ingress gateways that the user needs to manually configure.
B: So let's talk a little bit about the pros and cons of this approach, where each cluster has its own istiod deployment. One pro is availability and, in addition to that, fault tolerance, and that's pretty simple to explain: because each cluster is completely isolated as far as istiod is concerned, a failure in one cluster will not impact other clusters at all. And the other pro is that each cluster is self-contained.
B: Now, let's talk about the other method for deploying Istio in a multi-cluster way, and that's a single control plane. As you can see in this example, we have an istiod deployed to cluster three, and it is configured to manage cluster one and cluster two in region one. And of course, you can also mix and match: in this example, we also have a replicated control plane for region two that manages the clusters in region two.
B: Right, so in this example, you can see that the istiod in region one, in cluster three, is directly managing cluster one and cluster two in region one, and there's no extra istiod for each cluster. So we have one istiod managing two clusters at the same time. Let's talk a little bit about the pros and cons of this approach. The pro is that, because we now give istiod access to the Kubernetes API in these two clusters, it can perform service discovery and endpoint discovery and help us with service federation.
B: So let's talk a little bit about the cons. In order for this to work, istiod needs access to the Kubernetes API server of each remote cluster, and we touched a little bit on this in the previous approach. Essentially, it opens up the API server a little bit, beyond the cluster itself, to an istiod that is running in a separate cluster, and you need to make sure that this is properly secured.
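That access is granted through a remote secret: a kubeconfig, stored in the management cluster, that istiod uses to watch the remote cluster's API server. Istio generates these with `istioctl create-remote-secret`; the resulting object is roughly shaped like this (names and contents are illustrative):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: istio-remote-secret-cluster1
  namespace: istio-system
  labels:
    istio/multiCluster: "true"   # tells istiod to treat this as a remote cluster
stringData:
  cluster1: |                    # kubeconfig with read access to cluster1's API server
    # ...kubeconfig contents...
```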
B: The other disadvantage in this approach is the config boundary: because each istiod is deployed independently in each cluster, essentially each cluster's configuration can only happen in that cluster. So, for example, imagine you have an authorization policy that allows the user workload to talk to the account workload. Now you have to replicate that configuration across every cluster, and if it ever changes, you need to make sure all of these configurations stay in sync.
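The user-to-account rule in that example is a standard Istio AuthorizationPolicy, and it is exactly this object that would have to be copied into every cluster (the namespace and service-account names here are assumptions):

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-user-to-account
  namespace: default
spec:
  selector:
    matchLabels:
      app: account               # applies to the account workload
  action: ALLOW
  rules:
  - from:
    - source:
        # only the user workload's identity may call account
        principals: ["cluster.local/ns/default/sa/user"]
```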
A: So what are the pros of running multi-cluster Istio? As we said earlier: redundancy for our control plane and our data plane, with no single point of failure (I'm going to keep saying this, it is so, so important; we hear our customers talking about this all the time), flexibility in deployment, geography, and availability, and flexibility in company policy and practice.
A: What are the cons? Because, let's be honest, there are some, and they're big. One, we have added network hops, since our traffic may be going through gateways when crossing cluster boundaries. Two, establishing trust; this is a non-trivial task. Three, observing and debugging all our services and our control planes across multiple clusters in a unified way. And four, something that we've touched on over and over again: configuration sprawl, keeping all of the sprawling configuration in check and in sync.
A: Now, what were the pros of the multi-cluster Istio deployment that we mentioned? Again: high availability, fault tolerance, and isolation, and clusters don't need access to each other's API servers. Now, what if we could wed the two of these into one thing? As you may have guessed, that's Gloo Mesh.
A: Well, that sounds great, but let's see an example. We're going to go with the example of implementing a reliable, highly available service, specifically with a replicated control plane. Remember, we discussed earlier that our customers tend to favor the replicated control plane. This config would be a little bit different with a single control plane, but for the purposes of this example, we're going to use replicated.
A: Now, in Gloo Mesh, you would need one virtual destination and one traffic policy. That's only two CRDs, a massive step down in terms of the number of config objects that you need to manage, not to mention that all of the Gloo Mesh objects would live in one cluster, as opposed to being spread across multiple clusters as in the other approach. So keeping it in sync with any kind of GitOps or CI/CD approach would fit with the current tools.
A: So with that in mind, let's do a quick demo. For those of you who have watched the keynote, you will have already seen this demo, but for those of you who have not, please enjoy it. It is tailored specifically to this situation and shows off the benefits and simplicity of the Gloo Mesh CRDs for this typically difficult scenario.
A: Now, before we get started with this, let's quickly go over the workloads and services that we have running in our clusters. In cluster one, we have the product page, ratings, reviews v1, and reviews v2, and those of you who are familiar with the Bookinfo app will notice that there is an instance of the reviews app that is missing. We can just go over to our other cluster to find it: in cluster two, we have reviews v3.
A: Now, the virtual destination has only a few parts to it. The first part is the hostname, and this is the DNS address at which this virtual destination will be made available to the rest of the mesh, or the rest of the virtual mesh, depending on how you decide to export it; we'll talk about that below. Next is the destination selector: this is how you select the services which will become a part of the virtual destination and which traffic will later be seamlessly failed over to. Then comes outlier detection.
A: This is how Envoy, or Istio, decides how or when a service becomes unhealthy, such that it will be removed from the pool when making routing decisions. Next, as I said earlier, is the export-to field. This is our mesh list; it can also optionally be a virtual mesh. This is how the user decides which parts of the system should have access, that is, be able to call this virtual destination. And then, lastly, there is the port.
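Putting those parts together, a VirtualDestination might look roughly like this. This is a sketch reconstructed from the description, not a verified manifest; the API group and exact field names may differ between Gloo Mesh versions:

```yaml
apiVersion: networking.enterprise.mesh.gloo.solo.io/v1beta1
kind: VirtualDestination
metadata:
  name: reviews-global
  namespace: gloo-mesh
spec:
  hostname: reviews.global       # DNS name exposed to the (virtual) mesh
  port:
    number: 9080
    protocol: http
  localized:
    destinationSelectors:        # services that join this virtual destination
    - kubeServiceMatcher:
        labels:
          app: reviews
    outlierDetection:            # when an unhealthy endpoint is ejected from the pool
      consecutiveErrors: 2
      interval: 5s
      baseEjectionTime: 120s
  virtualMesh:                   # export scope: who may call this destination
    name: my-virtual-mesh
    namespace: gloo-mesh
```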
A: We are selecting the reviews service as our destination and routing all traffic bound for the reviews service to the virtual destination which we created above. This will ensure that all traffic bound for the reviews service will, in fact, call our virtual destination.
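The accompanying TrafficPolicy, sketched under the same caveat (field names approximate the Gloo Mesh API and may differ by version), matches traffic addressed to the in-cluster reviews service and shifts it to the virtual destination:

```yaml
apiVersion: networking.mesh.gloo.solo.io/v1
kind: TrafficPolicy
metadata:
  name: reviews-to-virtual
  namespace: gloo-mesh
spec:
  destinationSelector:           # traffic bound for the reviews service...
  - kubeServiceRefs:
      services:
      - name: reviews
        namespace: bookinfo
        clusterName: cluster1
  policy:
    trafficShift:                # ...is redirected to the virtual destination
      destinations:
      - virtualDestination:
          name: reviews-global
          namespace: gloo-mesh
```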
A: And we can see here that we're getting our reviews. Now, something worth noting: the way that we are going to tell the difference between our reviews services is that reviews v1 will not return a color, reviews v2 will return the color black, as you can see here, and reviews v3 will return the color red. We can't see that yet, because our local services are still healthy.
A: So let's go ahead and just call that one more time, and we'll see that we have another call to the local service. So now, let's go ahead and make our local services unhealthy.
We're going to do that by injecting a sleep command into our local deployments. So first we'll start with v1, we'll just quickly wait for that to roll out, and then we're going to go ahead and do v2.
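One way to inject that failure (a hypothetical sketch, assuming the Bookinfo deployment and container names) is a strategic-merge patch that replaces the container's command with a sleep, so the pod stops serving and gets ejected by outlier detection:

```yaml
# e.g. kubectl -n bookinfo patch deployment reviews-v1 --patch-file sleep-patch.yaml
spec:
  template:
    spec:
      containers:
      - name: reviews             # container name assumed
        command: ["sleep", "20h"] # sleep instead of serving traffic
```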