From YouTube: 010 Adopting Istio across 100 clusters at T-Mobile
Description
Learn about T-Mobile’s journey of adopting Istio across 100+ clusters to support microservices for fraud detection, billing, sales and APIs across many teams. The talk will cover things such as tenancy, install/upgrade, feature adoption, CI/CD integration, and architecture tradeoffs.
I'm also a maintainer of the MagTape project, which focuses on Kubernetes policy as code. But enough about me: at T-Mobile we have a team of rock stars that power our platforms and services, and I'm just the one here talking about it today. So a big shout-out to all the amazing folks that I get to work with to make all of what I'm about to cover possible.
We still had a pretty diverse environment, with a large Cloud Foundry footprint and a lot of applications that it didn't make sense to drive to Kubernetes from virtual machines and bare-metal systems. Security was a constant concern as our platform grew and new risk surfaces emerged. There was a definite learning curve in the container space for our developers, and we wanted to simplify that as much as possible. And just like anything your business comes to depend on, resiliency was a key focus in everything we did.
Now, as with any large project, we came together and worked on a list of goals to help us drive towards service mesh adoption. With only a small team to start with, we had to keep automation in mind from day one. We knew service mesh wouldn't be a good fit for all users, at least not at first, and we really weren't staffed to handle the support burden of our entire user base sort of being forced to adopt across the board.
We started our Istio journey before 1.0, and man, have things changed since then. As with any complex software, you need a good plan for lifecycle management; just getting it installed everywhere is not enough. The day-two operational burden can be huge, and this is the part of the process where we learned a ton, as it was sort of a new model with respect to how things could impact consumers of the platform, with the data plane essentially being a user-facing component. To help ease a lot of this,
we came up with a pretty formal process for promoting Istio changes and upgrades in our environment. Now, there's no magic here: we started by reading the release notes and changelogs, just like everybody else does, to see if there are any configuration changes or breaking changes that we need to be aware of and solve for in our own configuration. Next, we target installing the new release in a sandbox environment of sorts, and we run our suite of tests to validate things.
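There's no single right way to mechanize a promotion flow like that, but as a rough sketch with stock Istio tooling (the revision names, versions, and the `sandbox` namespace below are placeholders, not our actual pipeline), revision-based canary upgrades let a new control plane run side by side with the old one while the test suite runs against it:

```shell
# Sketch of a revision-based canary upgrade; versions/namespaces are placeholders.
istioctl x precheck                          # verify the cluster before upgrading
istioctl install --set revision=1-17-2 -y    # new control plane alongside the old one

# Point a test namespace at the new revision and re-inject its sidecars
kubectl label namespace sandbox istio.io/rev=1-17-2 istio-injection- --overwrite
kubectl rollout restart deployment -n sandbox
istioctl analyze -n sandbox                  # lint the resulting mesh config

# After validation everywhere, retire the old control plane
istioctl uninstall --revision=1-16-5 -y
```

The appeal of the revision approach is that rollback is just relabeling namespaces back to the old revision, rather than reinstalling a control plane.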
At a high level, we try and absorb as much of the mesh complexity into the platform and tooling as possible. We have common ingress and egress gateways established in a sort of centralized manner, and for this we have load balancers, DNS, and TLS pre-plumbed, so things just work without the mesh consumers having to worry about any of that.
This takes the concept that most folks are used to with an ingress controller for HTTP services and makes it possible for TCP-based services that support SNI. We get the same one-to-many name-based routing functionality as an ingress controller, instead of mapping node ports or burning through a bunch of load balancers.
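As an illustration of that pattern (the hostnames, namespaces, and ports below are made up for the example), a single TLS-passthrough gateway can fan many TCP backends out behind one load balancer, routing purely on the SNI name the client presents:

```yaml
# Hypothetical shared TLS-passthrough gateway; names and ports are placeholders.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: shared-tls-gateway
  namespace: istio-system
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 443
      name: tls
      protocol: TLS
    tls:
      mode: PASSTHROUGH      # TLS is not terminated at the gateway
    hosts:
    - "*.mesh.example.com"
---
# Each team binds its TCP service to the shared gateway by SNI hostname.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: billing-db
  namespace: billing
spec:
  hosts:
  - billing-db.mesh.example.com
  gateways:
  - istio-system/shared-tls-gateway
  tls:
  - match:
    - sniHosts:
      - billing-db.mesh.example.com
    route:
    - destination:
        host: billing-db.billing.svc.cluster.local
        port:
          number: 5432
```

Because the mode is PASSTHROUGH, the connection stays encrypted end to end; the gateway only inspects the SNI field to pick a backend.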
Our service mesh story is far from over. We still have a lot that we want to do, but overall we're down with the meshness. We've started to play around with integrating non-Kubernetes services into the mesh, and we're following along with the enhancements coming from the Istio project itself, but we aren't far enough along to have developed any real opinions so far in our own testing.
If you don't, it's likely a mistake. There's been a lot of work over the past few releases focusing on stability, but it has been a real problem in the past, so we keep a trust-but-verify attitude these days as we go into upgrades. Sort of tying into release stability, we've seen default values change, and in general the API changes came fast and furious for a while. Most of the APIs are mature at this point, but definitely pay attention to the release notes.
Now, we've been bitten more than once by things outside the mesh being invasive, some in slightly annoying ways and some in catastrophic ways. Feel free to follow up with me offline and I can share specifics. Releases come often, and it can be really hard to keep up. We have a running joke that any time we start talking about a specific Istio release, there's probably a new Istio release being made. So keep that in mind.
Many enterprises have a separation of duties between those who build and maintain the mesh layers and the developers that consume it. Here are a few things to keep in mind that we found helpful. One: do not let buzzwords or the fear of missing out drive your adoption of service mesh. Talk through your needs and lead based on features.
Next, let's talk a little about stability, and here I'm referring to maintaining a stable service mesh offering after you've got it deployed in your environment. The first thing: automation is your friend. The Istio project moves extremely fast, and keeping up with upgrades is hard; the more automated your lifecycle management, the easier it will be to keep up. Set a realistic pace for yourself that works within your company's strategy, and know that skipping releases that solely implement new features that may not be important to you is probably okay.
Now let's take a brief moment to talk about usability. We saw a huge advantage in embedding our platform team with our application development teams, and this is a boundary that's closing more and more every day. Decisions on how the mesh should be installed and configured need to be made with awareness of the applications in your environment, so meet with teams regularly to identify what does and does not work. This isn't a one-time thing.
That being said, as we're testing out how we want to handle our multi-cluster mesh strategy, we found that there are still a lot of edges that can come up during upgrades, config changes, and environment-specific oddities. While the Istio project supports multiple topologies for tackling multi-cluster meshes, and some even offer more operational savings than what we have today, for us, we're sticking with multiple control planes to reduce our overall risk of impact.
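For reference, the multiple-control-plane approach maps to Istio's documented multi-primary topology, where every cluster runs its own control plane but shares a mesh identity. A minimal sketch of the per-cluster install values (the mesh, cluster, and network names below are placeholders, not our actual environment):

```yaml
# Sketch of a multi-primary install: one IstioOperator like this per cluster,
# same meshID everywhere, unique clusterName per cluster.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  values:
    global:
      meshID: mesh1                   # shared across all member clusters
      multiCluster:
        clusterName: cluster-east     # unique per cluster
      network: network-east           # drives cross-network gateway routing
```

The trade-off named above is deliberate: each control plane is a smaller blast radius, at the cost of running and upgrading more of them.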
On more than one occasion we found that noisy neighbors can be a real problem within a mesh instance. We've had mesh users affect each other across namespace boundaries, as well as non-mesh users affecting mesh users. The isolation boundaries are not rigid at all, and this was, while probably naive, kind of a surprise to us in the beginning. I'm happy to chat afterwards and provide some additional detail around these sorts of problems.
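One knob worth knowing about here, as a sketch of a general mitigation rather than necessarily what we run, is Istio's Sidecar resource: it limits which services each namespace's proxies can see and reach, which caps both pushed config size and cross-namespace reach (the namespace name below is a placeholder):

```yaml
# Restrict proxies in this namespace to their own namespace plus istio-system.
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default
  namespace: billing        # placeholder tenant namespace
spec:
  egress:
  - hosts:
    - "./*"                 # services in the same namespace
    - "istio-system/*"      # control plane and shared gateways
```

This doesn't make the isolation boundary rigid, but it shrinks how much one tenant's config churn can ripple into another tenant's proxies.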