From YouTube: Using SPIRE in Production at Uber - Andrew Moore
Description
Using SPIRE in Production at Uber - Andrew Moore
In this session we will provide an overview of how Uber uses SPIFFE and SPIRE for workload authentication and authorization in a diverse deployment environment. We will highlight the deployment architecture, operational practices, and benefits achieved.
So let's go ahead and get started. What I'm going to talk to you about at the start is just the scale that we deploy to at Uber, including SPIRE, the environments that we work with, and then how we integrate with SPIRE Server on various levels. After that, I'm going to go into how we actually onboarded our consumers.
[We wanted to] make that experience nice for them and make sure, as much as possible, that everyone's getting the identities that they need. So, for the scale: we have about a dozen independent data centers, with fewer than 10 SPIRE Servers per data center. Each one has tens of thousands of hosts in it, with a dedicated SPIRE Agent on each. Every data center, every day, is performing tens of millions of X.509 signings from SPIRE, and today we have over 500 services onboarded to SPIRE, which is less than 10 percent of the internal services at Uber.
For our environments, we have several in-house orchestrators; for better or worse, we build a lot of things in-house. Our hosts are both on-prem and provided by multiple cloud providers, and nearly all of our services are containerized and written in either Go, Java, or Python.
So, to go over our server deployment and integration. How SPIRE works in general has already been covered before, by Andrew Harding and Evan Gilman. We have the SPIRE Server cluster in the data center, and we have the SPIRE Agents deployed on every single host, attesting with SPIRE Server and syncing on which registrations they're responsible for. The SPIRE Agents are baked into an enforced image fleet-wide to ensure that they're present. Next to the SPIRE Server we have a low-level registrar that we own.
[Some things otherwise] would not be able to get identity. For instance, that would include orchestrators of workloads: somewhere there's that bottom-turtle orchestrator that's deploying the other orchestrators. That bottom-level thing needs to get an identity from SPIRE, so the low-level registrar will usually make the registrations for those. After that, the goal state is that these orchestrators manage the registrations for the workloads that they actually own.
We also have things like health checkers at the infra level. For instance, when a host comes up, they make sure that the SPIRE Agent on it has actually attested with SPIRE Server; when we're bringing a host down, for repair or just normal host lifecycle, they make sure that the agent gets evicted out of SPIRE Server. And then there are other special-interest services that just want to know various things about what's registered in SPIRE, for their own purposes.
[When owners do a] refactoring of their service, they shouldn't have to do a bunch of reading to figure out what it is they need to do. We shouldn't change their service behavior unless they want it changed. We have to allow a lot of customization, because every service has different needs, and we have to interoperate with the existing authentication and authorization at the company, keeping in mind that most services are containerized and we're limited to just a few languages.
Our solution for this was to create a library that wraps auth for services. So, of course, it wraps authentication with SPIRE Agent, or whatever authentication the service happens to have configured. It exposes functionality for using whatever identity has been assigned: for, say, signing various things, performing mTLS, or anything else where it's just useful for a service to have an X.509 with an associated private key.
This library also uses middleware to inject short-lived tokens into outbound requests. Those tokens have the SPIFFE ID of the calling service and are signed by that X.509-SVID. We also wrap all logic for authorization, and authorization decisions are made based on the calling SPIFFE ID and the called endpoint. Here's how this actually works.
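As a rough sketch of that outbound-middleware idea: all names below are hypothetical, and an HMAC stands in for the real signature (which, per the talk, is made with the X.509-SVID's private key) so the example is self-contained.

```python
import base64
import hashlib
import hmac
import json
import time

def mint_token(caller_spiffe_id: str, destination: str, signing_key: bytes,
               ttl_seconds: int = 60) -> str:
    """Build a short-lived token carrying the caller's SPIFFE ID."""
    claims = {
        "sub": caller_spiffe_id,              # who is calling
        "dst": destination,                   # who the token is intended for
        "exp": int(time.time()) + ttl_seconds # short lifetime, as described
    }
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode())
    sig = hmac.new(signing_key, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig

def inject_auth_header(headers: dict, caller_spiffe_id: str,
                       destination: str, signing_key: bytes) -> dict:
    """Middleware step: attach the token to an outbound request's headers."""
    headers = dict(headers)  # don't mutate the caller's dict
    headers["x-auth-token"] = mint_token(caller_spiffe_id, destination,
                                         signing_key)
    return headers
```

The point is that the business logic never sees any of this; the middleware decorates every outbound request on the workload's behalf.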
[...an SVID has been] fetched for it. Now, let's say that workload A wants to talk to workload B. For simplicity, we're going to say workload B is also onboarded with the security library; it doesn't have to be, it just makes things more convenient. So what workload A is going to do is send its request, as per its normal business logic, off to workload B, and what the security library is going to do on behalf of workload A [is inject] tokens, just based on our own needs. On the workload B side, we're not touching the workload B business logic at all. At this point, the security library on workload B's side is going to check: does the signature on the token match? Am I actually workload B, or did I somehow wind up getting a request or token intended for some other destination?
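Those receiver-side checks might look roughly like this. Again the names are hypothetical, and an HMAC key stands in for the SPIRE trust material the real library would verify the X.509-SVID signature against.

```python
import base64
import hashlib
import hmac
import json
import time

MY_SPIFFE_ID = "spiffe://prod.example/workload-b"  # this workload's identity

def check_token(token: str, trusted_key: bytes) -> dict:
    """Return the token's claims if it is valid for this workload, else raise."""
    payload, sig = token.rsplit(".", 1)
    # 1. Does the signature on the token match?
    expected = hmac.new(trusted_key, payload.encode(),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sig):
        raise ValueError("bad signature")
    claims = json.loads(base64.urlsafe_b64decode(payload))
    # 2. Am I actually the intended destination?
    if claims["dst"] != MY_SPIFFE_ID:
        raise ValueError("token intended for another destination")
    # 3. Is the short-lived token still fresh?
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```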
If workload B is onboarded to SPIRE, that can include just verifying against the distributed SPIRE trust bundle. Then: is this SPIFFE ID allowed to call me on this endpoint? That's just authorization policies that are part of the library. And finally, assuming all of that passes, we finally make it to workload B's actual business logic, the code that the owners of workload B are actually worried about, and it just processes the request as normal.
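The per-endpoint authorization decision described here can be pictured as a simple lookup keyed on the called endpoint and the calling SPIFFE ID. The policy table and IDs below are entirely made up for illustration.

```python
# endpoint -> set of SPIFFE IDs allowed to call it (hypothetical policy table)
POLICIES = {
    "/rides/create": {"spiffe://prod.example/workload-a"},
    "/rides/admin":  {"spiffe://prod.example/ops-tool"},
}

def is_authorized(caller_spiffe_id: str, endpoint: str) -> bool:
    """Allow the call only if the caller's SPIFFE ID is listed for the endpoint."""
    return caller_spiffe_id in POLICIES.get(endpoint, set())
```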
There might be a question of: well, we're injecting these tokens, why are we not just using JWT-SVIDs? For one thing, X.509s just work better with our existing services today; if everything's just using the same standard, it just makes it easier to create the...
There's also the issue of resiliency. If, for some reason, a SPIRE Agent were to become even temporarily unhealthy, then callers would not be able to get their tokens anymore (ours are much shorter-lived than the JWTs, generally), and receivers of requests would also not be able to verify: if their SPIRE Agent is unhealthy, they wouldn't be able to validate JWTs anymore.
Moderator: Yeah, Andrew, can we read the questions?
Andrew Moore: Yeah. So there's "Curious why you chose to use a library architecture versus a sidecar" — I just answered that. And then someone had a string of things asking why we're not using other projects that might be available for Kubernetes. We're not on Kubernetes at Uber today, so we're not using something like Envoy, and we're not using other solutions like that, so the library architecture just worked out better.
[What we're] using is an in-house solution today, but we might move on later. If, for instance, we do migrate over to Kubernetes, then we might use something that's more compatible there, rather than an in-house solution.
There's "Roughly how many different trust domains are you operating, and how much traffic do you have crossing trust domain boundaries?" So our team is responsible for just two trust domains, and only one of them is a real production trust domain; the other is just used for testing and development. And then we do have some cross traffic with the team that is in charge of the personnel identity trust domain.
So we do have to handle verification of personnel identities crossing over and contacting things onboarded to the SPIRE trust domain. I don't think we have any other major trust domains today; that doesn't mean we won't in the future. Uber is also very enthusiastic about partnering with and acquiring companies, so we might have federation in the future, things like that, who knows, but not today. "Does the receiving end validate that the SPIFFE ID in the JWT matches the SPIFFE ID in the signing X.509-SVID?" Yes.
"So these thousands of workloads are mostly managed in the single domain; has that caused any challenges?" No, that hasn't caused any issues at all. Really, the only thing that's come up was our initial work to split out a test domain versus the real production domain. Beyond that, no; having all...
[Asked whether the security library is open source:] No, it is not. There's a lot of custom stuff related to Uber in it, and I would theorize that if we were to strip out everything custom to Uber in there, it would be a pretty trivial library, because ultimately you are just contacting SPIRE, getting your SVID, creating tokens, and injecting them into outbound requests. So that'd be more an exercise for the audience, I think, not something that we're going to open source anytime soon. Yeah: "Is SPIFFE the only identity management framework you're using at Uber?"
"If not, is there a transition to SPIFFE?" Right. So we do have some services, planned for deprecation or decommissioning, which didn't start out with the SPIFFE standard because it didn't exist yet, and we [have] our security library, which handles calls related to services onboarded to it; that kind of takes them over to being on the SPIFFE ID standard.
"Do you use any service mesh? What kind of topology do you use for multi-cluster federation?" So right now we're using SPIRE as the basis to stand up a secure service mesh. We do have other things currently implemented that are entirely in-house, but we're [moving] over to more secure things based on the SPIFFE framework. And we don't use federation today, but we could in the future.
"What were some of the bigger challenges you had in uptake of SPIFFE/SPIRE?" I'm going to assume you mean uptake in the onboarding of customers, Don Ross. So, challenges were, of course: everyone that's currently using the legacy software for authentication wants to just continue using that forever, so [why] are we changing anything at all? Beyond that, we knew that there would be a lot of uphill battles if we made onboarding difficult. So that's why we've made a lot of the upfront effort to make this security library as easy to onboard onto and configure as possible, and to make sure all our internal guides related to it are as straightforward and easy to understand as possible.