From YouTube: "The Salesforce Service Mesh: Our Istio Journey"
Description
#IstioCon2021
Presented at IstioCon 2021 by Pratima Nambiar.
Istio and Envoy are foundational building blocks of the Salesforce Service Mesh. This presentation walks you through our service mesh journey. I will briefly talk about why we chose the service mesh design pattern, how we initially built it using Envoy and our in-house control plane, and our subsequent pivot to Istio. I will discuss how we are currently leveraging Istio and our plan to increase adoption of Istio to further enhance our Service Mesh platform.
One of Salesforce's core values is trust: trust that our customers can have in us that their data is safe in the cloud. One of the core factors that plays into trust is security and compliance, and for network traffic that means mTLS with authorization everywhere, using a specific set of ciphers approved by our security team.
Our pre-Istio service mesh looked something like this. We focused on our data plane to begin with. In fact, we started with another open source data plane and then switched to Envoy, because Envoy was more performant and because it has a good control plane and data plane split, with a well-defined API between the two.
At that time, about four-plus years ago, there wasn't a good open source xDS implementation that we could leverage, so we built our own bare-bones xDS implementation to solve for the most common use cases. This control plane was backed by ZooKeeper as the service registry for announcements; the control plane interfaced with ZooKeeper and triggered xDS updates to the running Envoys.
Based on that, we built a resiliency test framework and ran through common scenarios like rolling upgrades, transient failures, and so on. We came up with good defaults for resilience policies to maximize the success of requests, and applied that configuration via our in-house control plane to all the Envoys in the mesh.
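Resilience defaults like these map naturally onto Istio's traffic-policy fields. Here is a minimal sketch in Python; the specific thresholds (ejection counts, intervals, pool sizes) are illustrative assumptions, not the actual defaults described in the talk:

```python
# Sketch: default resilience policies applied to every service in the mesh,
# expressed as an Istio DestinationRule. All numeric values are assumptions.

def resilience_defaults(service: str, namespace: str = "default") -> dict:
    """Return a DestinationRule-style dict carrying default
    circuit-breaking and outlier-detection policies for one service."""
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "DestinationRule",
        "metadata": {"name": service, "namespace": namespace},
        "spec": {
            "host": f"{service}.{namespace}.svc.cluster.local",
            "trafficPolicy": {
                # Eject endpoints that keep failing, which smooths over
                # transient failures and rolling upgrades.
                "outlierDetection": {
                    "consecutive5xxErrors": 5,
                    "interval": "10s",
                    "baseEjectionTime": "30s",
                },
                "connectionPool": {
                    "http": {"http1MaxPendingRequests": 1024},
                },
            },
        },
    }

rule = resilience_defaults("checkout")
```

In practice a control plane would emit one such resource per service, so every Envoy in the mesh starts from the same vetted defaults.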
Istio seemed to be solving the problems we were trying to solve, and therefore overall seemed like a good fit. Istio also had a strong community then, one that has grown significantly over the past few years and that we could rely on to evolve this control plane. In short, it felt like it could be the next Kubernetes for service mesh.
A good example of where this decision already paid off was when Envoy built the v3 xDS API and started to deprecate the v2 API. In order to continue to upgrade Envoy and pick up security fixes, we would otherwise have had to invest time in rebuilding our own control plane to support the v3 API.
A
One
was
mutual
tls
using
our
internal
ca.
Salesforce
requires
us
to
use
a
internal
ca
and
we
couldn't
use
citadel-based
certs.
We
were
already
running
with
a
heterogeneous
infrastructure.
Our
monolith
runs
on
bare
metal
and
we
have
quite
a
few
services
that
run
on
kubernetes,
dynamic
infrastructure.
We made some resiliency-related fixes to Pilot-to-Envoy communication, and we made the Envoy metrics service configurable so that we could ship metrics to our internal metrics system as we did the PoC. What we really liked was Istio's use of CRDs to configure the mesh. That made mesh configuration easier for us to read than raw Envoy configuration, and we could see that it would let us plug into our tooling and pipelines to generate this configuration and support higher-level use cases.
So this is the architecture we ended up with after the PoC. We ran Istio Pilot in our cluster and, as I mentioned, we intentionally chose to adopt Istio incrementally; therefore we did not bring up Galley, or Mixer for telemetry and authorization. At that time, we just stuck with the control plane.
We built a config webhook that would listen for service events and generate the mesh configuration, the Istio configuration, for all the services, including applying those resilience policies that I talked about.
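A config webhook like this essentially turns a service event into a set of Istio resources. The sketch below shows the shape of that generation step; the event format and the retry values are assumptions for illustration, not the talk's actual implementation:

```python
# Sketch: a service event comes in, and we emit the Istio VirtualService
# for it with default retry policies baked in. The event shape and the
# retry numbers are illustrative assumptions.

def on_service_event(event: dict) -> dict:
    """Generate a VirtualService for a newly announced service."""
    name = event["service"]
    namespace = event.get("namespace", "default")
    host = f"{name}.{namespace}.svc.cluster.local"
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [{"destination": {"host": host}}],
                # Default resilience policy applied mesh-wide.
                "retries": {
                    "attempts": 3,
                    "perTryTimeout": "2s",
                    "retryOn": "5xx,reset,connect-failure",
                },
            }],
        },
    }

vs = on_service_event({"service": "orders", "namespace": "prod"})
```

The payoff of the CRD approach is visible here: the webhook's output is plain, reviewable Kubernetes objects rather than low-level Envoy config.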
mTLS with authorization is a hard problem to solve, and we saw a lot of new types of use cases starting to leverage the mesh in order to meet this requirement, including quite a few off-the-shelf products. A few worth calling out are Qpid, our messaging platform; Solr, our search platform; and ZooKeeper and Redis for caching. Our monolith, which used to run on bare metal infrastructure, now runs on Kubernetes, and it uses a blue-green deployment strategy powered by Istio's traffic shifting rules.
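Blue-green cutover via Istio traffic shifting boils down to flipping route weights between two subsets of the same host. A minimal sketch, with the subset names ("blue"/"green") assumed for illustration:

```python
# Sketch: blue-green deployment expressed as an Istio VirtualService with
# weighted routes across two subsets. Subset names are illustrative.

def blue_green_routes(host: str, green_weight: int) -> dict:
    """Return a VirtualService sending `green_weight` percent of traffic
    to the green subset and the remainder to blue."""
    assert 0 <= green_weight <= 100
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "VirtualService",
        "metadata": {"name": "monolith"},
        "spec": {
            "hosts": [host],
            "http": [{
                "route": [
                    {"destination": {"host": host, "subset": "blue"},
                     "weight": 100 - green_weight},
                    {"destination": {"host": host, "subset": "green"},
                     "weight": green_weight},
                ],
            }],
        },
    }

# Full cutover: shift all traffic to green.
vs = blue_green_routes("monolith.prod.svc.cluster.local", 100)
```

Rolling back is the same operation with the weight flipped back to zero, which is what makes this pattern attractive for a monolith.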
We also define declarative authorization policies in a config repo that gets fed into our mesh architecture and then gets converted into Envoy RBAC filters, which get applied and enforced at the sidecar of a service.
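A declarative allow-list like this can be lowered into Istio's AuthorizationPolicy CRD, which Istio in turn compiles into Envoy RBAC filters at the sidecar. A minimal sketch; the shape of the repo's policy input is an assumption:

```python
# Sketch: converting a declarative caller allow-list from a config repo
# into an Istio AuthorizationPolicy. The input shape is an assumption.

def to_authorization_policy(service: str, allowed_callers: list) -> dict:
    """Allow only the listed caller identities (SPIFFE-style principals)
    to reach `service`; everything else is denied by the ALLOW policy."""
    principals = [
        f"cluster.local/ns/{c['namespace']}/sa/{c['service_account']}"
        for c in allowed_callers
    ]
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": f"{service}-allow"},
        "spec": {
            "selector": {"matchLabels": {"app": service}},
            "action": "ALLOW",
            "rules": [{"from": [{"source": {"principals": principals}}]}],
        },
    }

policy = to_authorization_policy(
    "orders", [{"namespace": "prod", "service_account": "checkout"}])
```

Because the principals come from mTLS workload identities, this is the "mTLS with authorization everywhere" requirement expressed as data rather than application code.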
We also generate out-of-the-box health signals, or golden signals as they are called in the Google SRE book, using Envoy telemetry for all mesh services. Our SREs can then view them in a single dashboard to understand the health of the system in general. And in order to deploy Istio to public cloud infrastructure, we had to integrate it with our Spinnaker pipelines and build a process for upgrades, which is crucial for us since we actually run the latest version of Istio and Envoy.
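The golden signals mentioned above (traffic, errors, latency, saturation) fall out directly from Envoy's per-request telemetry. A toy sketch of the aggregation, assuming a simplified access-record shape:

```python
# Toy sketch: computing three of the four golden signals (traffic, errors,
# latency) from Envoy-style access records. The record shape is an
# assumption; real pipelines would consume Envoy stats or access logs.

def golden_signals(records: list, window_seconds: float) -> dict:
    """records: dicts with 'status' (int) and 'duration_ms' (float)."""
    if not records:
        return {"rps": 0.0, "error_rate": 0.0, "p50_ms": None}
    errors = sum(1 for r in records if r["status"] >= 500)
    latencies = sorted(r["duration_ms"] for r in records)
    p50 = latencies[len(latencies) // 2]  # median latency
    return {
        "rps": len(records) / window_seconds,   # traffic
        "error_rate": errors / len(records),    # errors
        "p50_ms": p50,                          # latency
    }

sig = golden_signals(
    [{"status": 200, "duration_ms": 12.0},
     {"status": 500, "duration_ms": 90.0},
     {"status": 200, "duration_ms": 15.0}],
    window_seconds=1.0,
)
```

Because the sidecar emits these uniformly for every service, the dashboard needs no per-service instrumentation work.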
Apart from adopting Istio for mesh communication, we also adopted a software load balancer at the edge in our Hyperforce deployments. In our data centers today, most of the traffic flows via F5 load balancers, but in Hyperforce, when we shifted to using a software load balancer, the ingress gateway was the obvious choice. For the most part, Istio configures Envoy as an edge proxy pretty well, with good defaults.
So we are able to use it more or less as is. Istio's Gateway CRD for configuring SNI-based routing rules at the ingress simplifies that configuration. Salesforce, however, has complex DNS and certificate requirements.
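SNI-based routing at the ingress can be expressed with the Gateway CRD by declaring one server entry per TLS host. A minimal sketch; the host names and credential names are illustrative:

```python
# Sketch: an Istio Gateway doing SNI-based routing at the ingress, one
# HTTPS server entry per SNI host. Hosts and secret names are illustrative.

def sni_gateway(hosts_to_secrets: dict) -> dict:
    """Build a Gateway with one HTTPS server per SNI host, each
    terminating TLS with its own credential."""
    servers = [
        {
            "port": {"number": 443, "name": f"https-{i}", "protocol": "HTTPS"},
            "hosts": [host],  # SNI match for this server entry
            "tls": {"mode": "SIMPLE", "credentialName": secret},
        }
        for i, (host, secret) in enumerate(sorted(hosts_to_secrets.items()))
    ]
    return {
        "apiVersion": "networking.istio.io/v1beta1",
        "kind": "Gateway",
        "metadata": {"name": "public-ingress"},
        "spec": {
            "selector": {"istio": "ingressgateway"},
            "servers": servers,
        },
    }

gw = sni_gateway({"api.example.com": "api-cert", "app.example.com": "app-cert"})
```

The credential names are exactly the hook where the DNS and certificate provisioning workflow described next plugs in.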
So we had to build a workflow around DNS and certificate provisioning in order to allow us to publicly expose our services via the ingress gateway, and this is what that looks like. We have a config repo where you can define your DNS, certificate, and SAN template requirements for exposing a service to the public internet. That gets fed into Kubernetes as CRDs, and we run a Kubernetes operator called Ingress Assistant that listens for these events and then triggers a workflow to create or update those DNS entries and provision public certificates using the SAN templates that were requested, delivering them to Vault, from which the ingress is able to read the certificates. Ingress Assistant then watches for the completion of these events and creates the Gateway CRD to bring it all together, configuring the SNI-based routing rules. That then gets picked up by istiod and delivered to the ingress gateway, and the public endpoint is functional.
We are looking to use the new auto-registration feature for bare metal services so that we can get rid of that ZooKeeper deployment I talked about, which we run and manage today as a service registry. We are also in the process of leveraging the DNS proxy feature for TCP multi-cluster support, essentially TCP mesh-style communication between services running on two different Kubernetes clusters, and we are looking to use the traffic shifting capabilities for, say, path-based routing and cross-region routing.
We are also experimenting with standing up versions of our monolith that are optimized for certain types of requests, and then using Istio's traffic shifting rules to route traffic to those specific subsets that are optimized to receive it. Salesforce has complex authentication and authorization requirements.
Since we are a multi-tenant platform, we are looking for patterns of authentication and authorization rules that we can move out of our application code and make features of the mesh platform, with the sidecar as the policy enforcement point. We are actually using WebAssembly for JWT minting in this flow, for example, and we are also using OPA for enforcing these authorization rules.
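To make "sidecar as policy enforcement point" concrete, here is a toy illustration in plain Python (not OPA's Rego) of the kind of tenant-scoped rule such an engine might enforce; the claim names are assumptions:

```python
# Toy illustration of a tenant-scoped authorization rule enforced at the
# sidecar by a policy engine such as OPA. Plain Python, not Rego; the
# JWT claim names ('tenant', 'scopes') are assumptions.

def authorize(jwt_claims: dict, request: dict) -> bool:
    """Allow a request only if the caller's tenant matches the target
    tenant and the caller holds the required scope."""
    same_tenant = jwt_claims.get("tenant") == request.get("tenant")
    has_scope = request.get("scope") in jwt_claims.get("scopes", [])
    return same_tenant and has_scope

ok = authorize(
    {"tenant": "acme", "scopes": ["orders:read"]},
    {"tenant": "acme", "scope": "orders:read"},
)
```

The point of moving such checks into the mesh is that every service gets the same multi-tenant isolation guarantee without re-implementing it in application code.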
We are looking at integrations for service protection and rate limiting features in general. We expect to use Envoy filters for building integrations for things like centrally controlled fault injection.
As Hyperforce gains momentum, we expect to run a service mesh spanning multiple Kubernetes clusters, so among the features we are watching very closely is Istio's multi-cluster support.
We are excited about WebAssembly as a technology for proxy extensions to solve new types of business use cases. I talked about using it for JWT minting; we will look at using it for dynamic routing, header injection, and protection against OWASP security risks.
We are watching for improvements in the DNS proxy feature, especially around StatefulSet support, and in general we are watching for improvements in the Istio product for better support of larger meshes, specifically around reducing proxy initialization time, optimized config delivery, and improved Envoy-to-control-plane load balancing.
We also hope to leverage Istio's egress gateway as our egress solution at some point, and we are looking for improvements to the upgrade process. We upgrade pretty often: since we make changes to the open source product for our business use cases, we need to bring those changes back into our deployments in a timely manner, so we actually run the latest version of Istio, and anything that would make those upgrades easier for us would be awesome.
We have come a long way in our journey to adopt service mesh, and Istio and Envoy in particular, but we also have a long way to go. I would like to end with a shout out to the Istio community. Istio has a strong, active community, and we have seen this manifest in a variety of ways; the very relevant features that have been added over the past few releases are a good example.