From YouTube: Authorization with Envoy at Square - Jelle Vanhorenbeke
Description
Every organization has different authentication and authorization needs and it is not always clear how Envoy can help to abstract this from the application layer. In this talk we will show you how Square leverages Envoy's ext_authz filter and how our centralized authorization service has become the new source of truth for hundreds of services. We will cover how we migrated multiple authorization libraries to this centralized authorization service and how we rolled out these changes to production. This process has benefited other teams and allowed them to launch new features that were previously not possible.
A
Hi everyone, my name is Jelle, and today I'm going to talk about authorization at Square. I'm a software engineer on Square's Developers IAM team. I've been on this team for almost a year and a half now, and the entire time I've been working on authorization, which is what I would like to talk about today.
A
Here's a quick agenda of the topics I'm going to cover. First I'll talk a little bit about Envoy at Square, then give you a quick overview of our previous authorization architecture and some of the problems and challenges we faced with it, and then show how we're leveraging Envoy to do authorization. Before we do that, I want to talk about authentication and authorization.
A
Authentication is the process of verifying your identity. Are you who you say you are? Is this the authentic Sally? Authorization is the process of verifying that someone has the right permissions and is allowed to do what they want to do.
A
During this talk we're going to cover the second bullet point, which is authorization. I know that these two very often go together, and I would encourage people to try to think about them separately.
A
Next, I would like to introduce you to SAFE. SAFE is our Session Authorization Framework Enforcer. It's one of the authorization frameworks available for service owners at Square. Next is Envoy SAFE, our Envoy-based authorization solution and the main solution I'm going to talk about today. It takes a lot of the things that were available in our SAFE framework and moves them into an actual service that we leverage with our service mesh. Next, Envoy at Square.
A
There's a great talk that two of my colleagues gave at this exact same conference two years ago, and a lot of the things they talked about then are now a reality at Square. We're at the next level, where we can leverage our service mesh to build a lot of new, exciting features. The highlights to keep in mind for this talk are that Square has a centralized control plane, and that the control plane has a preconfigured cache of sidecar configurations, also known as snapshots.
A
And now a quick overview of how we used to do authorization at Square. We had multiple authorization strategies, so services could leverage different libraries to do authorization. Some of these used protos. Other libraries, such as SAFE, had ACL-like files where you could specify the different rules and authorization requirements. A third set of services used custom code with no additional library, all written in the actual application layer. On top of this, Square supports three major languages.
A
There are several minor ones too, and the same authorization solution is not available in all languages. What that means as a service owner is that if you have multiple microservices written in different languages, you may not be able to leverage the same authorization solution for all of them.
A
In reality it looks a little bit more like this. Even though we try to keep these libraries in sync and keep feature parity as much as possible, that is not always the case. Some features get implemented in one language and then get deprioritized; others still haven't been developed. So there's always a little bit of a difference, even within the same authorization library, between different languages.
A
As you can imagine, even though this works, it definitely presents multiple challenges. One is that it's really hard to know whether all our microservices are running the latest version of our authz framework. It's also hard to roll out new features, because you have to implement them in multiple languages.
A
On top of that, given the two different permission sets I just mentioned, it's complicated to use our public APIs internally. A lot of people would also reach out to us and ask: what is the right authorization strategy? What is the right framework to use? There was not always a clear answer to that question.
A
Besides all these problems, we had another challenge, this one for our infosec team. It was extremely hard for them to do audits, because they would have to look at these ACL files, proto files, or even custom code to answer questions like: is this endpoint exposing PII data? If it is, is it requiring the right permissions given the data it's exposing? That was a very hard question to answer.
A
Some of these are: we need a consistent authorization strategy, and we also talked about unifying both permission sets, so we could reuse our public APIs internally. Next, to solve the infosec problem, we thought about a centralized source of truth that they could use to look up resources and their requirements, and see whether these permissions match what is expected from a security perspective.
A
So these were some of the solutions and motivations, and then we started thinking about how we would actually implement them. As I mentioned, some of them we would be able to address with a centralized source of truth. Other issues we could fix by having a single authorization point. We also wanted a deny-by-default approach, although we were still not quite sure how we were going to achieve deny by default.
A
At that point, we had been looking at the ext_authz (external authorization) filter that's available in Envoy, because we had reached a point at Square where all these services had Envoy sidecars, so leveraging Envoy became a real option. For those of you who are not familiar with the ext_authz filter in Envoy, it's personally my favorite extension.
A
It has made my life so much easier. Basically, the way it works is that it's a filter that calls an external service and sends it the original request, and the external service can then decide whether that request is authorized or not. If it's authorized, the service returns a 200, and Envoy moves on to the next filter and eventually reaches the application layer upstream. The upstream application now knows that this request has been authorized.
A
This is what it looks like for a successful request. The client sends a request, which gets proxied by Envoy. Envoy calls the authorization service and receives a 200, then forwards the request to the application layer, which eventually returns a response to the client.
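The request flow described here can be sketched as a small simulation. This is only an illustration of the check-then-forward pattern, not Envoy's or Square's actual code: the `Request` type, the `x-session` header, and the three functions are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Request:
    method: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)


def authorization_service(request: Request) -> int:
    # Hypothetical stand-in for the external service: it sees the original
    # request and answers with an HTTP-style status code.
    return 200 if request.headers.get("x-session") == "valid" else 403


def upstream_app(request: Request) -> Tuple[int, str]:
    # By the time the application runs, the request has been authorized.
    return 200, f"hello from {request.path}"


def proxy(request: Request,
          check: Callable[[Request], int] = authorization_service) -> Tuple[int, str]:
    status = check(request)
    if status != 200:
        # Short-circuit: the request never reaches the application.
        return status, "denied"
    return upstream_app(request)
```

For example, `proxy(Request("GET", "/orders", {"x-session": "valid"}))` reaches the application, while the same request without the header is rejected at the proxy.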
A
As you saw earlier, we used a lot of library code at Square and we did not have an authorization service. So we had to build an authorization service to support this ext_authz filter.
A
For the authorization service, this is a quick preview of what the UI looks like. We did go back and forth on whether we should use a UI and a database, or ACL files that can be checked into source control. The main reason we decided to go with the UI and the database is that we're still making a lot of changes.
A
If we were using ACL files, by contrast, each change would require a redeploy of the authorization service. Since we have somewhere around 250 services that we were trying to migrate, every single time one of those services made a change, we would have to redeploy the authorization service for that change to show up in either staging or production.
A
Next, we had the solution in place, with Envoy calling the authorization service for every single request. That's when we introduced the concept of protected and unprotected routes. Unprotected routes are routes for static content, blog posts, and images that do not need authorization. For those routes we really don't want Envoy to call the authorization service, because that's a waste of resources for both the authorization service and the request itself.
A
To do that, we gave service owners the option of specifying whether their routes required authorization or authentication. Then we built an integration between our centralized control plane and our authorization service, which now sends the unprotected routes for each service over to the control plane.
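As a rough sketch of the idea, the proxy can consult a per-service list of unprotected route patterns before deciding whether to call the authorization service at all. The service name, the patterns, and the use of shell-style globs are assumptions made for this example; the talk does not say how the routes are matched.

```python
from fnmatch import fnmatchcase

# Hypothetical data: unprotected route patterns per service, as the control
# plane might receive them from the authorization service.
UNPROTECTED_ROUTES = {
    "blog-service": ["/static/*", "/images/*", "/blog/*"],
}


def needs_authorization(service: str, path: str) -> bool:
    """Return False only when the path matches an unprotected pattern."""
    patterns = UNPROTECTED_ROUTES.get(service, [])
    return not any(fnmatchcase(path, pattern) for pattern in patterns)
```

Anything that is not explicitly listed, including every route of an unknown service, stays protected in this sketch.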
A
Next, with this in place, which was great, we had a migration challenge. We had the solution, but we still had 250 services that we needed to migrate. How do we get all those rules and authorization requirements into this central storage? This is when the team decided to invest some time in building migration scripts.
A
If you remember from the previous overview of the different libraries, some use ACL files and some use proto files. We built different scripts that would extract the rules from these files and call temporary endpoints in the authorization service, so we could store that data in the authorization database.
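A migration script of this kind might look roughly like the sketch below. The rule file syntax (`METHOD /path: PERM_A, PERM_B`), the permission names, and the payload shape are all invented for illustration; the talk does not describe the real file formats or the temporary endpoints, so the example stops at building the payload.

```python
import json


def parse_rules(text: str) -> list:
    """Parse a hypothetical rule file with lines like 'GET /orders: ORDERS_READ'."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        route, _, perms = line.partition(":")
        method, path = route.split()
        rules.append({
            "method": method,
            "path": path,
            "permissions": [p.strip() for p in perms.split(",") if p.strip()],
        })
    return rules


def build_import_payload(service: str, text: str) -> str:
    # The JSON body a script could send to an import endpoint (assumed shape).
    return json.dumps({"service": service, "rules": parse_rules(text)}, sort_keys=True)
```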
A
This was very helpful as we were rolling out Envoy SAFE: we had the ability to disable it, knowing that there was still a backup strategy, and that this library would be up to date and have all the right requirements in place.
A
Next, we had a set of services that had their authorization requirements built into the application layer. Unfortunately, there was no easy way to extract that data and migrate it to the authorization service, so we had to work with service owners to have them manually migrate these routes. This is not ideal, mainly because teams have their own deadlines and their own schedules, so we had to work with that. And even though our teams were very supportive, there's still no automated way to keep both strategies in sync.
A
Next, I would like to talk a little bit more about our rollout strategy. At this point we had a solution in place, we had a lot of data, and we were ready to try this and roll it out for multiple services. First, we introduced a logging-only mode. In logging-only mode, our authorization service always returns a 200, no matter what the actual authorization decision was, so Envoy never short-circuits the request.
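A logging-only mode along these lines can be sketched as follows. The permission model (sets of permission names) and the function names are assumptions for the example; the point is only that the real decision is computed and logged, while the status returned to the proxy stays 200.

```python
import logging

logger = logging.getLogger("authz")


def decide(granted, required) -> bool:
    # Hypothetical decision: the caller must hold every required permission.
    return set(required) <= set(granted)


def check(path: str, granted, required, log_only: bool) -> int:
    if decide(granted, required):
        return 200
    if log_only:
        # Record what would have happened, but let the request through.
        missing = sorted(set(required) - set(granted))
        logger.warning("would deny %s: missing %s", path, missing)
        return 200
    return 403
```

Running with `log_only=True` first makes it possible to compare the logged decisions against expectations before any request is actually blocked.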
A
Our next rollout strategy was to use the runtime fraction configuration. This allowed us to split the traffic and roll out on a percentage basis. We would roll out Envoy SAFE for five percent of a given service's traffic. That allowed us to make sure that the authorization service was hitting the right SLAs, that we were able to handle that QPS, and that we were not blocking any traffic.
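The percentage-based rollout boils down to a per-request sampling decision. The sketch below illustrates that idea in plain Python under the assumption of independent random sampling per request; it is not Envoy's implementation, and `filter_enabled` is a name made up for the example.

```python
import random


def filter_enabled(numerator: int, denominator: int = 100, rng=random) -> bool:
    """Enable the authorization check for roughly numerator/denominator of requests."""
    return rng.randrange(denominator) < numerator


def rollout_hits(requests: int, numerator: int, seed: int = 0) -> int:
    # Count how many of `requests` would go through the filter.
    rng = random.Random(seed)
    return sum(filter_enabled(numerator, rng=rng) for _ in range(requests))
```

With `numerator=5`, about one request in twenty goes through the filter, and raising the value widens the rollout without a redeploy of the services themselves.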
A
To support this percentage rollout, we introduced it as part of an admin panel and passed it to the Envoy control plane through the same integration I mentioned earlier for unprotected routes. That did require some changes to our data model on the control plane side to support these different values, but it worked out. Next, I want to talk a little bit about some of the lessons learned.
A
As you can see, someone could mark a wildcard, or all traffic, as unprotected and later on add a more granular route with actual authorization permission requirements. So we had to build some logic to detect these cases and either notify the end user through the UI, saying, hey, you're introducing a conflict, you might want to consider specifying a more granular route, or be very clever about how we organize these routes.
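Detecting this kind of conflict can be sketched as a check of every protected route against the unprotected patterns. Shell-style glob matching and the function name are assumptions for the example; the talk does not describe the actual detection logic.

```python
from fnmatch import fnmatchcase


def find_conflicts(unprotected_patterns, protected_routes):
    """Return (pattern, route) pairs where an unprotected pattern shadows a protected route."""
    conflicts = []
    for route in protected_routes:
        for pattern in unprotected_patterns:
            if fnmatchcase(route, pattern):
                conflicts.append((pattern, route))
    return conflicts
```

A UI could run a check like this on save and warn the service owner when the result is non-empty, for example when `/*` is unprotected but `/api/orders` carries permission requirements.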
A
That ordering matters when we send them over as unprotected routes, and it's definitely something to keep in mind, because we missed it initially. Debugging also becomes a little more challenging, because service owners now have to rely on the logs we produce in the service mesh or in the authorization service. They cannot add any custom logging or custom metrics; they have to rely on more generic output. Next, some shortcomings.
A
One thing we noticed with the ext_authz extension is that, as I mentioned earlier, it's my favorite filter, and I'm not the only one who thinks that: a lot of teams at Square are trying to use this filter, not only for authorization but for other use cases as well. The filter is very versatile, so it can be used for multiple use cases and solve different problems.
A
At the same time, there's no way to enable or disable them individually. That caused some conflicts between different teams trying to use this filter: we cannot both use it for the same services, because if they disable it they're disabling our solution, and if we disable or enable it we're also disabling or enabling the filter for their solution. There's also no way to bypass the filter for a given header.
A
That would have been useful in some cases to make sure we do not authorize or authenticate a request twice. You also can't change how the gRPC call works, which limits you if you want a single microservice that implements two endpoints that could be called by the ext_authz filter.
A
So, in conclusion, we decided to move all our authorization from app and library code into the service mesh, with a centralized source of truth. Where are we currently? We've started rolling this out in production. We are targeting a rollout of Envoy SAFE for close to 250 services, and we're expecting to handle somewhere around 20k QPS.
A
Our next steps are going to focus a little more on decoupling some of our authentication and authorization strategies. That will also allow us to implement a new and more flexible permission system, which in turn will allow us to implement even more features on the application side.
A
That's all I had today. Thanks, everyone, for listening. This is my email if you want to reach out. I would love to hear how your team is solving authorization, and if you're interested in working on some of these problems, Square is hiring. Thanks, everyone.
A
Hey everyone, thanks for listening to the talk. Let me know if you have any questions.
A
How long are you projecting this rollout to take for, say, 90% of all services? It took us about three to four months to get most of these routes into our database, have service owners review and update them, and start rolling out in staging. We're currently working on our production rollout. We expect this to take somewhere between four and five months, probably a little bit longer.
A
What was the overhead like from running these authz calls not only out of process but over the network? That's a good question. Some of our requests do see additional latency from these calls. I think our average is still below 10 milliseconds, which is pretty good. For a while we were making two calls, one to the authentication service and one to the authorization service, and that was almost doubling that additional latency.
A
When DB updates are done through the UI, how do you push the changes to the authz service instances? The way that works is through the integration with the control plane I mentioned earlier. When somebody updates some of the routes or permission requirements through the UI, this gets synced to the database, and our authorization service and the control plane are constantly syncing.
A
So when an Envoy sidecar hits our centralized control plane, it gets an updated configuration. I think right now there's a max latency of five minutes for that.
A
How do you deal with the situation where SAFE and other teams use two separate ext_authz filters in the same pipeline? That is a problem we haven't solved yet. It's something we're working on. We are considering making changes, maybe to the ext_authz filter itself, so we can identify these filters and enable or disable each one depending on the configuration, or give it some sort of identifier. Right now, we have not solved that problem yet.
A
Are authz decisions always path-based, or is there any application business logic? So far, it's all path-based, and the main reason is that we try to keep as much application and business logic as possible out of the authorization service.
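A purely path-based decision can be pictured as a lookup from route to required permissions. The rule table, the permission names, and the choice to deny unknown routes are assumptions made for this sketch; the talk mentions wanting deny by default but does not show how decisions are computed.

```python
# Hypothetical rule table: (method, path) -> permissions the caller must hold.
RULES = {
    ("GET", "/orders"): {"ORDERS_READ"},
    ("POST", "/orders"): {"ORDERS_WRITE"},
}


def is_authorized(method: str, path: str, granted) -> bool:
    required = RULES.get((method, path))
    if required is None:
        # Assumption: routes without a rule are denied by default.
        return False
    return required <= set(granted)
```

Nothing in this lookup inspects the request body or the service's data model, which is why finer-grained checks would still live in the application.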
A
We are working on a different authorization model where you can enforce some more business logic or business decisions and specify more granular rules. That's still under development, but we're hoping that once we have it, we'll have a little more flexibility and you'll be able to enforce on certain parameters, because right now that still lives in the actual application layer.
A
So it's still possible that a service will get an authorized request and need to do some additional authorization logic, especially if it's very close to the data model in that service.
A
Did you look at OPA for authz? If so, why did you choose a DB model? We did not look into that too much. We chose the DB model mainly for the reason I mentioned during the talk: flexibility was the main reason we decided to move forward with a database model. We also wanted the UI integration, which is one of the features we were looking to implement.
A
Did you guys cache results for better throughput? We have not. It's something we've considered; it's an optimization we might implement later on, especially since, as I mentioned, there is no way to actually skip the ext_authz filter, which is possibly a good thing. But yeah, we might consider caching some of the authz results.
A
Is SAFE authz done by SAFE? I'm not sure what that question means. We had a SAFE framework that was doing authz, and basically we took a lot of the way authorization was being enforced by that framework and moved it to a service. It's essentially a reimplementation of some of those authorization rules.
A
SAFE and Envoy SAFE are different implementations, but they use the same concepts, the same permission sets, and some of the mapping between the permissions we had for private APIs and public APIs. So there are some shared concepts, but the authorization service has its own implementation and does not reuse the existing SAFE framework.
A
I think I answered all the questions. If I missed any of your questions, feel free to reach out or post it again and I'll try to answer it. I left my email in the slides, so feel free to reach out; I'm happy to chat a little bit.