From YouTube: 020 Istio at Scale
Description
In this talk, I'll share the lessons I learned while deploying Istio on multiple Kubernetes clusters with more than 1000 Pods.
I decided to run everything on KinD clusters and I had to tune many parameters at different levels (Linux kernel, Kubernetes, Istio, ...).
After that, I'll explain how multi cluster communication can be configured using either the native Istio discovery service (EDS) or Gloo Mesh. I'll compare both solutions and how they scale.
Finally, I'll demonstrate how Gloo Mesh can be used to provide high-availability of applications across clusters, zones, and regions.
My goal here is to speak about the lessons learned from the testing I did when I was trying to deploy multiple Istio clusters with a lot of pods. In fact, that means deploying many control planes and many pods, but without spending too much money.
If I want to summarize my target numbers: I wanted to achieve at least five Kubernetes clusters, so five Istio control planes, at least a thousand pods, and spend less than $100 a day. And I had different options.
There were GKE clusters on one side — and when I say GKE, that could be EKS clusters or whatever clusters provided by a cloud provider — and KinD clusters on the other, which means having a large VM and deploying multiple KinD clusters in this VM.
Both options are valid, I would say. I started with a GKE cluster; it was more expensive, but it was still attractive, so that was not the reason why I finally decided to go with KinD clusters. It's really more complex at the beginning with KinD clusters, but you get a lot of advantages, like being able to redeploy everything very quickly.
You can try out different network topologies, like having communication between pods through gateways or directly, and you can run it anywhere. So it was providing a lot of advantages, and I ended up being able to deploy eight Istio control planes. You can see here one Ubuntu VM, my eight KinD clusters, and MetalLB running on each of them so that I can create Services of type LoadBalancer and so on.
I became a little bit crazy during this testing, because every time I solved one issue, I found another one. The goal of this talk is that for you it shouldn't go like that: it should be very easy. You should be able to relax, just apply what I learned, and deploy everything quite easily.
The first issue I encountered was "too many open files", and it was quite straightforward to find when I was looking at the Istio logs. I got it at the very beginning, when I deployed the first 250 pods in the first cluster, because I had decided to go with 8 clusters and 250 pods per cluster.
That's what I ended up with, and I very quickly found documentation on the KinD website explaining which values to modify when you get this kind of issue. I put this shy guy here just to remind myself to tell you: don't be shy with the numbers you use here.
It's not a production environment, it's just for tests. And the reason I say "don't be shy" is that if you use, for example, the numbers I found in that documentation, then yes, you can deploy more, but you start to have the same issue again later, because you reach the limit again. So don't be shy: put high numbers so that this issue is fixed and you can move on to the next one.
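The limits in question are the inotify ones the KinD documentation points at for "too many open files". A sketch of what I mean by "not shy" — 524288/512 are the values the KinD docs suggest, and going higher on a test box is fine:

```shell
# Raise the inotify limits on the host running the KinD clusters.
sudo sysctl fs.inotify.max_user_watches=524288
sudo sysctl fs.inotify.max_user_instances=512
```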
Next one: again trying to deploy these 250 pods, and I saw 189 of them staying in the Pending state — only around 60 were actually running. I finally figured out that there are CPU and memory requests set by default on the istio-proxy sidecar when you deploy Istio.
So I changed the CPU request from the default value of 100 millicores to 10 millicores, and the memory request from 128 MB of RAM to 32 MB, just by changing that in the istio-sidecar-injector ConfigMap. And obviously, you do that, you modify this value, and then you figure out you need to restart again, because the pods you already deployed don't use the new value.
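One way to make that change, shown as a sketch — the exact layout of the ConfigMap varies between Istio versions, and the same values can also be set at install time through the IstioOperator `global.proxy.resources` settings:

```shell
# Edit the injector defaults; only newly injected pods pick up
# the lower requests, existing pods keep the old ones.
kubectl -n istio-system edit configmap istio-sidecar-injector
# In the embedded "values" section, set:
#   global.proxy.resources.requests.cpu:    "10m"
#   global.proxy.resources.requests.memory: "32Mi"
```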
So you either restart the current pods so they pick up the new value — but you know how it is in Kubernetes: it's very often faster to redeploy your cluster than to try to delete dozens of pods. So that's again one of the benefits of using KinD when you discover an issue.
The next issue is that you reach the maximum number of pods per node, which is 110 by default. You see, I deployed fewer than 100, but you already have the system pods — the pods for Calico, for MetalLB, and so on — so you reach this limit. You change that in the KinD config: there is a Cluster object that you can patch, and here I set it to 1,000 pods. But you guessed it: you need to restart again.
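A sketch of that KinD config patch, raising the kubelet's pods-per-node limit — the cluster name is my assumption, and the patch shape should be checked against your KinD version:

```shell
cat > kind-cluster1.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        max-pods: "1000"
EOF
# Recreating the cluster is required for the kubelet flag to apply.
kind create cluster --name cluster1 --config kind-cluster1.yaml
```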
But then what happened is that I started to get another issue. You deploy these pods — you have one YAML file that describes all of them and you submit it — and it processes the first 10 or 20 very quickly, then it starts to slow down and slow down, until it takes several seconds per pod at the end. And it's just because etcd is slowing down.
The approach I took here was: okay, let's just create a one-gigabyte tmpfs — a memory file system — and again update the configuration of the KinD cluster so that it uses this tmpfs as the backend for etcd.
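A sketch of that setup — mount a tmpfs on the host and map it over the node's etcd data directory; the paths, size, and cluster name here are illustrative:

```shell
# Back etcd with a tmpfs so its writes never hit disk.
sudo mkdir -p /tmp/kind-etcd
sudo mount -t tmpfs -o size=1g tmpfs /tmp/kind-etcd

cat > kind-cluster1.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /tmp/kind-etcd
    containerPath: /var/lib/etcd
EOF
kind create cluster --name cluster1 --config kind-cluster1.yaml
```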
The next step was probably one of the most difficult to solve. I had this fast etcd, I was able to deploy my first cluster with 250 pods, and then the second one with 250 pods, and then I believed: okay, that looks good. If I can do that, I should be able to deploy as many clusters as I want, as long as I still have memory available — because if it works with one, it works with two.
A
It
will
work
with
three
five,
whatever
and
and
now
in
fact,
at
some
point
after
a
few
clusters,
I
started
to
have
really
weird
dns
error.
Sometimes
it
was
the
nsl.
Sometimes
it
was
like
very
strange
issues
that
you
don't
really
understand
and
and
after
some
doing
some
research,
I
found
out
that
when
you
run
too
many
containers
in
the
same
operating
system,
you
get
this
issue
with
the
arc
cache.
There are different values there to manage how garbage collection of this cache is performed by the operating system. There is one that represents the minimum number of entries you keep in the cache. The most important one is probably the second, gc_thresh2, which represents the soft maximum of entries in the cache — after that, it starts to remove the old ones. And then you have the hard maximum, gc_thresh3.
I think the default for the first one is 128, which is very low — imagine, it cannot even keep cache entries for more than a hundred-something containers. It was still working for a few clusters, probably because even if the ARP cache was not big enough, there were not that many new entries to manage all the time. But with a thousand pods, that doesn't scale. So I used these values — you can use the same ones.
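The knobs in question are the kernel's neighbor-table (ARP cache) GC thresholds. A sketch with deliberately high values — the defaults are 128/512/1024, and the exact numbers below are my choice rather than canonical ones:

```shell
sudo sysctl net.ipv4.neigh.default.gc_thresh1=8192   # below this, entries are never GC'd
sudo sysctl net.ipv4.neigh.default.gc_thresh2=32768  # soft maximum
sudo sysctl net.ipv4.neigh.default.gc_thresh3=65536  # hard maximum
```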
I was not shy here, because the first time I just went from 128 to 256, but I was still having the problem — just a little bit later. Using these commands fixed that issue. After that, what I found out is that even though I have a huge machine with several hundred gigabytes of RAM, I would still not be able to deploy all my pods, because I wouldn't have enough memory.
If I do the calculation: I found out that each pod was using about 100 megabytes of RAM, and I wanted eight clusters with 250 pods each. At 100 megabytes per pod, that means about 200 gigabytes of RAM. That was a little bit too much — I was close to having enough, but it was still too much. So then, here is what I discovered.
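The back-of-the-envelope calculation above, written out:

```shell
# 8 clusters x 250 pods x ~100 MB per sidecar-injected pod.
CLUSTERS=8; PODS=250; MB_PER_POD=100
echo "$((CLUSTERS * PODS * MB_PER_POD)) MB"   # 200000 MB, i.e. ~200 GB
```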
You know that by default, Istio configures the sidecar proxies so that they have visibility of all the other pods in the cluster. That means there are what we call Envoy cluster entries in each sidecar for all the other pods of the cluster, and the more entries you have there, the more memory you use.
If you create an Istio Sidecar object, you can define which pods each sidecar can see. The special notation you see here, `./*`, basically means a sidecar will only see the other pods of its own namespace. And it's in the istio-system namespace, which means it applies to everything.
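A sketch of the mesh-wide Sidecar resource described here — the `kind`, the `./*` notation, and the root-namespace behavior are standard Istio; the metadata name and kube context are my assumptions:

```shell
kubectl apply --context cluster1 -f - <<EOF
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: istio-system   # root namespace, so it applies mesh-wide
spec:
  egress:
  - hosts:
    - "./*"              # only services in the workload's own namespace
    - "istio-system/*"   # plus the control plane's namespace
EOF
```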
So I went from about 100 megabytes to 70 megabytes per pod, which means I needed about 60 gigabytes of RAM less — good enough for my testing. And what's interesting is that all these different steps were just to prepare everything: deploying Istio on eight clusters with these 250 pods each. After that, I wanted to do the actual testing about multi-cluster communication and so on.
And it would have been a lot worse, because the way you do multi-cluster communication is, first of all, you need discovery in place so that one cluster knows about the pods of the other clusters. That means each Envoy sidecar would have had visibility of 2,000 pods. In terms of memory it would have been a nightmare, and I would have needed something like a terabyte of RAM.
I will show you that in the demo, but using this setup was really nice, because I was able to go from two and a half hours down to 45 minutes — exactly what I wanted. And the second really positive side effect is that because everything is in memory, I can reboot my machine and start from scratch; I don't need to wait for this very long cleanup.
You know, if you do a `kind delete cluster` and you use normal storage, it will take forever, because it has to go through the directory where all these containers have been stored, and it takes a lot of time to delete all these entries. With memory you don't care: you reboot, it's gone, and you start from scratch again.
So this is the achievement at the end: eight clusters, two thousand pods, in forty-five minutes. I didn't calculate the exact budget, but you can do the math: it's about three dollars an hour. The VM I use is a huge one, but still only three dollars an hour, so it's less than $100 per day. And if you use a preemptible version, it would be less than one dollar an hour. So it can be very, very cheap, and you can do very nice scale testing with it.
So the next step was: now I want to have this multi-cluster communication. And here is the way it works with the multi-primary design in Istio.
You enable something called the endpoint discovery service, and the way it works is that on each Istio control plane, you need to create one secret corresponding to the kube API server of each of the other clusters, so that one control plane will go and reach all the kube API servers of the other clusters to be able to discover the workloads. The second control plane does the same, and the third does the same, and so does the fourth, and so on.
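That wiring is done with Istio remote secrets — one per peer, on every control plane, so the count grows quadratically with the number of clusters. A sketch for a single pair, with assumed context names:

```shell
# Let cluster1's istiod discover workloads in cluster2.
istioctl create-remote-secret --context=cluster2 --name=cluster2 \
  | kubectl apply -f - --context=cluster1
# ...and the mirror image, plus every other ordered pair of clusters.
```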
It's not really nice in terms of the way it scales, but it also has other issues. If one of my API servers becomes unavailable and an Istio control plane restarts, it cannot start — you have to go and delete the secret before it can start. You also have a security concern: if one of these clusters is compromised, you have a secret for all the other ones.
With those secrets, you could really delete everything if you wanted — delete all the pods of all the clusters and create a big mess. So with Gloo Mesh, which is our management plane for managing multiple Istio control planes, we have a very nice design where, first of all, it's just one component — the Gloo Mesh management plane — that is responsible for discovering everything and making the other clusters aware of what it has discovered.
But the other thing that we have implemented recently, which is very nice, is that we now have an agent that runs on each cluster, and it is the responsibility of this local agent to watch the local API server and to pass the information to Gloo Mesh using a gRPC channel.
So Gloo Mesh gets all the info about the discovered workloads, and then it can use the same gRPC channel to tell all the different agents what it has discovered and how to apply all the policies and so on. So it's a lot more scalable, and also a lot more secure, because there is no exchange of the API-server secrets.
You can do much more with Gloo Mesh — obviously, it's not only about discovery. I will do a demo where I show you this environment that I built, but I will also focus in this demo on the global failover routing, which is really an amazing feature.
I could spend an hour, or two hours, or more just going through all the nice capabilities of Gloo Mesh, but I think you'll get a good first view of what it can do. So let's go for the live demo. I have here my environment — the VM I spoke about before — with Gloo Mesh running, and you can see I have my eight clusters.
I have my more than 2,000 pods deployed — everything I described before — and I will go through a lot more details quickly, but first let me just show you that on the CLI. I have different contexts.
I have one for the management cluster — I spoke about the fact that I have eight clusters at the end, but in fact I have nine: one for the management plane, for Gloo Mesh, and the eight others. I could have the management plane on one of these eight clusters, but it's kind of a best practice to have a dedicated one. And what you can see here is that I have just Gloo Mesh running; I don't have Istio at all.
We have the RBAC webhook, which is also very nice. I won't have time to go through the details here, but it can help you define who can do what: who can create what kind of policy on which cluster and which namespace, and which kind of capability — traffic shifts, header rewrites, all these different things.
So that's my management cluster. Now I have cluster one through cluster eight, where you can see that I have ten namespaces, and if I go to the first one here, in each namespace I have 25 pods — so 10 times 25, that's the 250 pods. That's what I have here. I also have something in the default namespace.
You can see that I have something called vd — for virtual destination. This is a small UI that we will use to demonstrate this global failover. Okay, the other thing I want to show you is: if I go to cluster1 and look at the nodes...
...the nodes carry a region and a zone — at least that's what I simulate. If I go to cluster 2, it's us-west-2: the same region, but a different zone. And then on number three, it's a different region and another zone. Like that, I have four regions with two zones per region — that's what I simulated here. And, as I was saying before, the idea is that this 172.18.x address here is the IP of the ingress gateway of cluster one.
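The simulated locality is just the standard Kubernetes topology labels on the KinD nodes, which Istio reads for locality-aware routing — a sketch, with assumed node names and label values:

```shell
kubectl --context cluster1 label node cluster1-control-plane \
  topology.kubernetes.io/region=us-west \
  topology.kubernetes.io/zone=us-west-1
```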
So if I do a GET on this app, it will basically send a request to this URL, and you see this URL is the first pod of the first namespace. What this application does is return information about the pod, like the name of the pod, but also information about the region and the zone. So this one says: okay, I am running in the us-west-1 zone.
So the idea is to see how we can use Gloo Mesh so that if this service becomes unavailable on the first cluster, I want the request to go directly to the next available zone, and if the next available zone is not available, I want it to go to the next region. As I've shown you, currently it just goes local, on this region, and the same for this one.
One thing I didn't mention, which is very nice with Gloo Mesh as well, is that we consolidate all the metrics. All the sidecar proxies send their metrics to the local agent, and the local agent passes these metrics to the Gloo Mesh management plane. On cluster one, I deployed Kiali, and I pointed Kiali...
...instead of pointing Kiali to a local Prometheus that scrapes the local metrics, I pointed it to a local Prometheus that scrapes the metrics from Gloo Mesh directly. And you see here...
...I see the communication in the last 10 minutes that happened between the ingress gateway of cluster one going to the echo service, and the ingress gateway of cluster eight going to the echo service as well — exactly what we've just demonstrated. And now, to have this high availability that I discussed, what we need is a few things. The first one is: if we look at our Gloo Mesh UI, we see that we have a virtual mesh; we have our eight clusters.
So what we did is say: we will create a new hostname for the echo service — something like that; let me take a look here and we will see it. But basically, we have made the 250 services highly available, so everything will automatically do this failover between regions and zones and so on. So here's the way it works.
The way I could use it is to just use this new hostname now, instead of the name I have used here, and I would have this high availability. But what we can do as well — and this is what we do in this demo, which is even more powerful — is that we created some policies. What these policies are doing is saying: when a request is sent to the local echo service, in the echo namespace...
...basically, I want it to go to my virtual destination, so that it's now transparent. That means that when I send a request here to the local service, it's behind the scenes a highly available service. To try it out, it's quite easy: we will fail the service here on the first cluster, and we'll see that it automatically goes to the next one. So let me do that here.
What I do is replace the container image used for the service — because it's a very minimal one — with a new one that just does a sleep for 20 hours, so that it cannot reply anymore and is considered a failing service by Envoy. And here, if I just go here, it will take probably 30 seconds for the new pod to start and the other one to terminate.
So you see here the new one is Pending, so I still have the old one running, and I just need to wait around 30 seconds or so. What I'm going to do in the meantime is go to my UI — the service perspective of my UI.
Here I want to do a curl to localhost:15000/clusters — these are the Envoy clusters, so all the entries that this sidecar knows about. It knows about the local services that are in the same namespace, because this is the way I configured my Sidecar object, but it also knows about these global services. So here I can see a global service, and the one I want is the one that's called "one".
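That query goes against the Envoy admin endpoint of the sidecar, which listens on port 15000. A sketch of doing the same from kubectl — the deployment name here is assumed:

```shell
# Dump the Envoy clusters known to one sidecar via its admin API.
kubectl --context cluster1 exec deploy/vd -c istio-proxy -- \
  pilot-agent request GET clusters
```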
So if I filter on that, I see all the entries that correspond to my eight services — the local name has been automatically replaced by that. And what's interesting is that if I look at the zones, for example, I can see that I have each endpoint in a different zone, because they are each on different clusters. And I can even see something else interesting, which is based on the zone and the region.
What Istio has done is set what we call a priority. You see here: us-west-1 is cluster one, so the priority for this entry is zero — I want to send all the requests locally first. us-west-2 is the same region but a different zone, so it's priority one: if I cannot go locally, I prefer to go there. And then all the other ones are in different regions, and they have priority two.
So that's basically why we will see that it goes to the next zone and then the next region. So if I go back — oops, sorry, that's not what I wanted to do — if I go back to my cluster here... let me look in the right cluster.
So again, if I look at my pods here, I see the pod has been replaced now, and if I go to my UI and try to access it, you see it now goes automatically to the next zone, because the local one is not available anymore. And I can do the same here and try to make the second one fail as well.
That's what I'm doing just now. It's still there because, as you have seen before, it takes some time before it's replaced by the new one, so we have to wait a bit. But what we can do as well is go to Kiali, and you see here it has automatically discovered that it now sends requests from one cluster to another. That's what's very nice with these global metrics gathered from everywhere: we know where the requests are going and so on.
So using Kiali is one example, but you can also just access all the metrics by yourself. We are also going to add some nice graphs based on these metrics in our UI in the near future.
Another thing I want to show you, while we wait for this failure to happen: I spoke about the metrics that we consolidate, but we also consolidate the access logs. And here it's not about shipping all the access logs — that could be a lot — it's really giving you the ability to specify what you want to gather. Let's say you have an issue at some point in time with an application.
So I said: okay, I want to gather all the access logs for my workloads that have this label — that corresponds to my first application in the first namespace — and I want these logs from all the clusters in one place. So I configured that, and now I can use this endpoint, which is on my management cluster, on Gloo Mesh, to gather these logs, and I can get a lot of very useful information here.
You see here, for example, I got this log from cluster 2, because this is where I sent my last request, and I can see a lot of very interesting information: the identity of the pod, information about my request, performance information, but also things like: it was a GET on this path, and this has been the response. And you see the same here on cluster 2 again, and you can continue like that.
At some point you will see on cluster one that we have a response like a 500 — you see a 503 — because that's where we failed the service ourselves. So now, if I go back and take a look at my pods — it's done — I can see that I'll be redirected now to any of the other clusters.
It doesn't matter which one now, because obviously there is no priority difference between the different regions. And if I go back to Kiali here very quickly, you see that it goes everywhere now — you see that I have requests going through all the different clusters.
So yeah, I think you have a good idea now of how it works. We've just shown a few capabilities of Gloo Mesh, but I think you can already see how it can simplify your life if you have multiple Istio clusters. I hope you enjoyed the talk, and we have some time for Q&A now. Thank you.