From YouTube: Maturing with Open Source: Cultivating an API Ecosystem for Enterprise Success – UnitedHealth Group
Description
Learn more about Kong: https://bit.ly/2I2DypS
Building a thriving API ecosystem is more than offering a stable and performant gateway solution; it’s about the people, processes and practices which contribute to long-term success. Since Kong Summit 2019, our experiment in open source has grown to support over 90% of UnitedHealth Group’s RESTful traffic. The path to success within this Fortune 10 healthcare giant has been paved with intriguing challenges, unexpected hazards and valuable lessons. Watch this Kong Summit 2020 session to learn our roadmap for how to build a thriving API ecosystem, engineered for long term success, with Kong at its core.
Hi, I'm Ross Sabrisha. I was hired by Optum about four years ago, and I've been working in the gateway space ever since. I'm a Kong Champion, and I'm based out of Denver.
Let's kick this thing off by talking a little bit about Optum. Optum is a healthcare technology company. It's part of UnitedHealth Group, and one of its important missions is to provide the tech infrastructure for this Fortune 7 healthcare giant. As you can see on this screen, there are over 300,000 employees in this organization, thousands of APIs, and countless integrations, and since our presentation last year, Kong and Optum have seen some remarkable growth in adoption within the space. So let's start there, around this time.
Now, surely growth at this scale must involve an equally precipitous increase in the work required to add capacity for it all, right? I mean, this very scientific graph shows the breakdown of feedback and advice I received when the platform was really starting to ramp up: concerns about capacity were by far and away the most common topic. So for those folks, I'd like to share the complete record of everything we did to add this new capacity. Buckle up.
One of the first asks came from a large enterprise effort to add a new centralized log sink. The gateway seemed like a good fit as a central point to capture security-related metadata for API transactions, rather than asking each individual API provider, across their variety of application stacks, to implement the log sink themselves.
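The talk doesn't show exactly how that was wired up, but this is conceptually what Kong's bundled http-log plugin gives you, so here is a minimal sketch against the Admin API. The Admin and sink URLs are placeholders, and your centralized sink will obviously differ.

```python
import requests

# Hypothetical addresses; substitute your own Admin API and log sink URLs.
KONG_ADMIN = "http://localhost:8001"
LOG_SINK = "https://logsink.example.internal/api-transactions"

# Enable Kong's bundled http-log plugin globally, so metadata for every
# proxied request (service, route, consumer, latencies, status) is shipped
# to the central sink instead of each API team re-implementing logging.
resp = requests.post(
    f"{KONG_ADMIN}/plugins",
    json={
        "name": "http-log",
        "config": {
            "http_endpoint": LOG_SINK,
            "timeout": 10000,     # ms to wait for the sink
            "keepalive": 60000,   # ms to keep the sink connection alive
        },
    },
)
resp.raise_for_status()
print("http-log enabled, plugin id:", resp.json()["id"])
```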
This task required a fresh rework of our ingress to enable the mutual TLS handshake to happen on the gateway, as well as security payload scanning. Much like Kong, we wanted an open source solution to help with threat protection as well. We were able to leverage the ModSecurity v3 open source web application firewall, along with the OWASP Core Rule Set, to manage a set of well-defined attack vector checks directly from our Kong nodes.
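The WAF itself is compiled into the ingress layer rather than configured through Kong's Admin API, so rather than guess at that wiring, here is the kind of smoke test we find useful once the rule set is in place. The proxy URL is hypothetical, and the exact blocking status depends on your ModSecurity configuration.

```python
import requests

# Hypothetical proxy URL for an API fronted by Kong with ModSecurity + CRS.
PROXY = "https://gateway.example.internal/demo-api/items"

# Send a classic SQL-injection probe; with the Core Rule Set active at the
# ingress we expect the WAF to reject it before it ever reaches the upstream.
resp = requests.get(PROXY, params={"q": "1' OR '1'='1"}, timeout=10)
print("status:", resp.status_code)
assert resp.status_code == 403, "expected the WAF to block this request"
```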
This was a great experience because it opened our team up to more of the application security side of engineering, and there were some really great community members on both the rule set and WAF teams. The new ingress approach for mutual TLS also gave us direct visibility into the threats and the blocks we faced on API transactions. Now, lastly, a long-time ask from customers was how to support multiple auth patterns on a given proxy service.
Now, for a long time, Kong had offered an anonymous user pattern that enabled Kong to run multiple programmatic authentication patterns against a given proxy, but one consequence there was needing a Kong consumer resource in the context with the required ACL group set. This caused problems for a use case many customers wanted, where they desired the proxy to support both programmatic and user-based authentication. Because our user-based authentication came from third-party identity providers, it meant that there was no Kong consumer resource to be had in these types of interactions. But then came a really neat PR from Oppo of the Kong engineering team, which enabled programmatically setting the authenticated groups on the context of the transaction, bypassing the need for a Kong consumer resource at all. This elegant solution enabled great flexibility in the authentication plugins and patterns that Kong could support per proxy.
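The talk doesn't spell out the configuration, but the anonymous-consumer pattern it references looks roughly like the sketch below. The service, consumer, and group names are made up, and the ACL field is named "whitelist" rather than "allow" on older Kong versions.

```python
import requests

ADMIN = "http://localhost:8001"   # hypothetical Admin API address
SERVICE = "claims-api"            # hypothetical service name

def enable(plugin, config):
    # Attach a plugin to the service through Kong's Admin API.
    r = requests.post(f"{ADMIN}/services/{SERVICE}/plugins",
                      json={"name": plugin, "config": config})
    r.raise_for_status()
    return r.json()

# The "anonymous" consumer lets a request that fails one auth plugin fall
# through to the next plugin instead of being rejected outright.
anon = requests.post(f"{ADMIN}/consumers", json={"username": "anonymous"}).json()

# Programmatic callers authenticate with an API key...
enable("key-auth", {"anonymous": anon["id"]})
# ...while user traffic presents a JWT minted by a third-party IdP.
enable("jwt", {"anonymous": anon["id"]})

# The ACL plugin then decides who gets through. With the newer
# authenticated-groups support, group membership set by an auth plugin is
# honored even when no Kong consumer exists for the caller.
enable("acl", {"allow": ["claims-readers"]})
```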
Well, we're here to say that now, after a period of three years and 22-plus Kong upgrades across multiple environments, the process is really just not that scary. Almost all of the upgrades went smoothly, all except one, which we'll touch on shortly and which exemplifies that, even when things go a bit sideways, recovery with open source technology is not a process to be afraid of. Breaking down the Kong upgrade process, it's engineered to be simple and impact-free, first starting with the running version of Kong taking traffic.
Once satisfied with the behavior seen on the newer version of your Kong node, you can execute a kong migrations finish to wrap up a few final database adjustments needed for full support of the newer version of Kong. It is at this stage that you want to send traffic only to your newer versions of Kong and ensure that the older-version Kong nodes are taken out of rotation.
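Condensed into commands, the flow described here is roughly the following sketch. It assumes the post-1.0 migrations framework; the config path and the load-balancer cutover are placeholders.

```python
import subprocess

def kong(*args):
    # Run a Kong CLI command on this node and fail loudly on error.
    subprocess.run(["kong", *args], check=True)

# 1. From a node running the NEW Kong version, apply the pending,
#    non-breaking schema migrations; old-version nodes keep serving traffic.
kong("migrations", "up")

# 2. Start the new-version node and shift traffic onto it (the load-balancer
#    change itself isn't shown). Watch behavior and logs before moving on.
kong("start", "-c", "/etc/kong/kong.conf")   # config path is illustrative

# 3. Once satisfied, take the OLD nodes out of rotation, then finalize the
#    schema so the newer version is fully supported.
kong("migrations", "finish")
```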
We have three Kong environments: a dev sandbox environment, a stage environment which is customer facing, and then our production environment. The upgrade went off without a problem in dev, so we had confidence that stage would face no issues. Then, once we got around to upgrading stage and the migrations up had seemingly completed, we ran a migrations finish. That's when we got the error that no Lua Kong programmer likes to see: stack traces of the dreaded "attempt to index a nil value."
This was a harbinger of the error that started impacting around 10 percent of our stage traffic, and it would continue to do so almost until the next full day. Let's break down this real-world scenario, to gain more confidence in recovering from a situation like this in the open source space, and the mistakes we made, so you don't have to make the same ones. Well, step one for us was to capture and ship logs and screenshots of some of the errors we were seeing directly to the community in the Kong repo.
Now, this is where our biggest mistakes were made. In situations like this, we rely on our database backup and restore process to bail us out. This process was very underutilized and, as it turns out, fairly immature: the restoration process was incapable of restoring data to a cluster that had already undergone a schema change since the backup was taken, which is exactly what happens during a Kong migration.
A
While
we
initially
did
fix
this,
the
time
that
we
found
ourselves
manually,
editing,
database
records
and
key
space
schemas
to
complete
the
car
migrations
manually,
trust
me:
this
is
not
a
situation.
You
want
to
find
yourself
in
begging
a
kong
engineer
for
some
guidance
via
zoom,
while
leadership
is
heavily
engaged.
Let's
just
say
that
the
database,
backup
and
restoration
should
always
be
the
go-to
first
option
for
fast
remediation
of
a
problematic
upgrade.
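The talk doesn't prescribe a backup mechanism, but for a Cassandra-backed cluster like the one described, a pre-upgrade snapshot can be as simple as the sketch below. The keyspace name and tag are illustrative, and restoring also means restoring the matching pre-migration schema.

```python
import datetime
import subprocess

KEYSPACE = "kong"   # illustrative; match whatever your kong.conf points at
TAG = "pre-upgrade-" + datetime.date.today().isoformat()

# Take a Cassandra snapshot of the Kong keyspace on this node BEFORE running
# `kong migrations up`, so a failed upgrade can be rolled back to a backup
# that matches the pre-migration schema instead of hand-editing records.
subprocess.run(["nodetool", "snapshot", "-t", TAG, KEYSPACE], check=True)
print(f"snapshot {TAG} taken for keyspace {KEYSPACE}")
```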
Even after using the Kong application for over three years, we still get critical enhancements that meet our needs and use cases. One such example could be seen this year in Thibault's contribution that enables dynamic upstream keepalive pools. This is what helped us route to certain types of secure APIs that shared the same initial IP address and port for their ingress, with all of our growth and scale over the last year.
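The contribution itself lives inside Kong, but the knobs it exposes are plain configuration. Here is a hedged sketch of setting them through environment-variable overrides; the values are placeholders rather than tuned recommendations, and the property names assume a Kong 2.x series release.

```python
import os
import subprocess

# Kong reads any KONG_* environment variable as an override of kong.conf.
# These upstream keepalive settings arrived with the dynamic keepalive-pool
# work, which keys pooled connections by upstream address and TLS attributes,
# so secure upstreams sharing an ingress IP and port no longer collide.
os.environ.update({
    "KONG_UPSTREAM_KEEPALIVE_POOL_SIZE": "512",
    "KONG_UPSTREAM_KEEPALIVE_MAX_REQUESTS": "1000",
    "KONG_UPSTREAM_KEEPALIVE_IDLE_TIMEOUT": "60",
})
subprocess.run(["kong", "restart"], check=True)
```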
Another great example of how staying on the latest version has helped us overcome problems was a code contribution by a community member named Harish. Throughout 2019, on the Kong 1.x series, we were hit by an odd bug that constantly caused Cassandra to return a null pointer exception when Kong resources were being rebuilt frequently due to changes in configuration, and neither we nor the Kong team had the answer to what was causing the problem for close to eight months. Then, seemingly out of nowhere, the fix came in from the community.
So, any of you Kong Cassandra users out there who need to query enough resources to cause Cassandra paging: if you're on the elderly versions of the Kong 1.x series, or potentially on the ancient versions of the Kong 0.x series, then you're going to want this fix to be able to perform proper Cassandra paging while running your Kong instance.
Now, the other added benefit of staying up to date on the latest versions is that your Kong upgrades stay smaller in change scope. The bigger the jumps you must take to get to the latest version, the more potential problems you're going to run into as you jump from migration to migration to migration. It's much better to take things in small scope.
I'd also say that support for legacy versions of the open source copies of Kong is going to be minimal, given what it takes to run and maintain all the features Kong releases. If you go to people and say, "Hey, I'm running version 0.14.x and I'm facing these issues," they're going to want you to get back to the latest version, because it's very likely those issues have already been fixed in the latest releases.
Okay, another great way to contribute to customer confidence, promote adoption, and create advocates for your platform is to empower customers through effective operational support. Let's justify that statement for a quick moment and examine why we need operational support. I mean, why do we really need it? We have a GitOps-based self-service model. We have detailed event logging and metrics. We have mature documentation which covers everything from those topics to the security patterns we support and troubleshooting tips. So why do we need ops support?
These types of discussions. Secondly, troubleshooting, because no matter how many FAQs you write, tips you provide, or how complete your logging is, sometimes the customer is just going to need help to figure out what's wrong. And lastly, and least frequently needed, non-standard proxy management. These are effectively manual work orders to provision rare changes we don't support through self-service. Now, we've kind of touched on the why; let's dive into the how by looking at the support flows we manage to deliver support for all of these needs. We'll start over here with non-standard proxy management.
B
We
have
a
fairly
dedicated
intake
request
for
these
in
the
form
of
a
specific
git
issue
with
a
template.
Clients
will
submit
this
issue
to
make
their
modification
request.
An
engineer
from
our
team
will
be
assigned,
and
that
engineer
is
responsible
for
fulfilling
the
request
and
closing
the
issue,
pretty
straightforward,
skipping
troubleshooting.
For
a
moment,
we
have
three
support
flows
for
our
integration,
consultation
intake.
We
support
these
consultations
through
email,
our
weekly
gateway
office
hours
call
and
because
it's
sometimes
just
unavoidable
dedicated
meetings.
Now, finally, we come to troubleshooting. This is by far and away the most common reason for engaging support from the gateway team. We handle these requests differently based on priority. In the low priority group we have non-production issues and production build-out issues. We will typically respond to requests for troubleshooting assistance through email, office hours, and most commonly through our internal chat app, Flowdock; if you're not familiar with Flowdock, it's very similar to Slack. And then in the high priority category we have live production issues.
B
These
are
treated
very
seriously
and
triaged
on
a
moderated
war
room
and
we
have
24
7
on-call
paging
support
for
this
purpose.
So
this
is
the
full
picture
of
our
operational
requirements
and
the
support
flows.
We
have
for
them.
Let's
quickly
review,
how
we
manage
these
support
flows
because,
as
you
can
see,
this
has
the
potential
to
be
a
little
bit
disorganized.
B
We'll
start
with
the
24
7
paging
groups.
Most
of
you
are
going
to
be
familiar
with
a
system
like
this.
We
nominate
two
engineers
from
our
teams
in
weekly
rotations.
One
of
them
is
the
primary
on-call.
One
of
them
is
the
backup
on-call
seems
pretty
straightforward
so
far,
but
remember
that
primary
on-call
they'll
have
some
additional
responsibilities
during
their
week,
which
are
critical
to
the
success
of
the
system.
Emails are similarly unremarkable, but it's worth noting that our primary on-call is responsible for our inbox during their on-call week. Now, our office hours is a weekly meeting open for 90 minutes where all questions, comments, and feedback are welcome. It's a great way to cut down on those dedicated meetings we want to avoid and to take the pulse of our community. The primary on-call is also responsible for running the office hours call on their week.
B
This
is
our
most
active
support
channel
by
far,
and
it
really
represents
some
of
the
best
and
worst
that
our
model
has
to
offer
on
one
hand,
customers
can
get
quick
feedback
on
their
questions.
The
community
has
the
opportunity
to
participate
and
it
enables
a
dialogue
without
being
as
interruptive
as
a
meeting
would
be.
On
the
other
hand,
short
inquiries
don't
always
stay
short.
Topics
can
often
stray
from
gateway
integrations
to
more
general
engineering,
support
topics
like
certificate
management
and
the
line
between
valuable
service
and
cumbersome
time.
B
Sync
is
not
always
clear
either
way,
it's
the
primary
on-call's
responsibility
to
sort
that
out
during
their
week,
as
is
it,
is
their
responsibility
to
handle
those
manual
work
order,
get
issues
I
talked
about
now,
I'm
at
risk
of
losing
credibility
with
some
of
you.
If
I
fail
to
mention
direct
one-to-one
pings,
I
could
make
a
full
documentary
about
how
to
deal
with
these
things,
but
for
the
sake
of
time
I
will
just
say
that
we
try
to
send
all
direct
one-to-one
pings
to
our
official
support
flows
and
leave
it
at
that.
B
So,
let's
see
how
our
system
has
coped
with
the
scale.
You
recall
this
past
year
saw
a
10x
or
1
000
increase
in
the
proxies
on
our
platform.
Here
you
can
see
the
size
of
the
team
rep
responsible
for
providing
operational
support.
In
addition
to
gateway
engineering
responsibilities,
we've
gone
from
a
total
of
seven
to
ten
or
an
increase
of
about
40
percent,
so
for
an
increase
in
1
000
in
traffic,
a
scale
of
40
percent
in
operations
seems
fairly
reasonable.
The
system
is
very
much
a
work
in
process
process.
Remember this graph from the start, showing our increase in traffic volume over the past year. This is expressed in transactions per month, but it's a little bit more meaningful if we use a different metric. Expressed as a proportion of the company's RESTful traffic, we can see that Kong now handles over 90 percent of all REST API traffic in the company. It's largely the only component common to it all, and this inherently gives us the reach to affect almost the whole API space. With the support of leadership, it gives us more.
B
It
gives
us
the
reach
to
govern
the
api
space
and
to
understand
that
value.
Let
me
talk
you
through
a
quick
example.
Let
me
introduce
you
to
company
x.
Company
x
is
a
large
organization
with
multiple
api
development
teams.
The
apis
produced
by
these
teams
share
no
consistent
design
frameworks.
Have
no
common
quality
standards,
often
employ
unorthodox
or
insufficient
security
models
and
have
no
common
means
of
discovery
or
documentation
for
the
unfortunate
clients
of
company
x.
So let's talk about how to do it right. Good governance starts with good guidelines, so we'll begin by discussing how to think about our guidelines, our rules, because, folks, the quality of your rules is what's going to make or break your attempted API governance. We want to offer governance on both the technical product and the design process. Our rules should be kept current and useful, and a great way to do that is to document them visibly.
Next, we have a rule called the "provide your specification using OpenAPI v3" rule. This just says that the design work the API providers have to do must be done with an OpenAPI v3 spec. Seems straightforward enough. Let's move to something a little bit more RESTful and talk about some of our resource structure rules. We have rules about keeping our URLs verb-free, so using /messages instead of something like /retrieve-messages.
Nobody wants to call a URL with 45 sub-resources in the chain. Sticking with the RESTful theme, let's examine a couple of rules around the use of HTTP methods and status codes. We require that our developers use HTTP methods correctly. This just means GETs for reads, POSTs for creates, PUT and PATCH for updates, DELETEs for deletes. You get the idea.
B
We
require
that
our
developers
use
standard
http
codes,
no
defining
your
own
custom
status
codes
for
the
application.
We
also
say
that,
in
addition
to
using
standard
http
codes,
we
must
use
the
most
specific
status
code.
Not
every
success
is
a
200..
Sometimes
we
have
201.
Sometimes
we
have
204
204s,
you
get
the
idea,
and
finally,
we
shouldn't
mix
our
success
and
error
components.
These
need
to
be
separate
structures.
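As a toy illustration of those two rules (most-specific status codes, and separate success and error structures), here is a hypothetical Flask resource; it is not one of our actual APIs, just a sketch of the shape we ask providers to follow.

```python
from flask import Flask, jsonify

app = Flask(__name__)
MESSAGES = {"123": {"id": "123", "body": "hello"}}   # toy in-memory store

@app.post("/messages")
def create_message():
    # A successful create is a 201, not a generic 200.
    msg = {"id": "124", "body": "created"}
    return jsonify(msg), 201

@app.delete("/messages/<msg_id>")
def delete_message(msg_id):
    MESSAGES.pop(msg_id, None)
    # A successful delete with nothing to return is a 204.
    return "", 204

@app.get("/messages/<msg_id>")
def get_message(msg_id):
    if msg_id not in MESSAGES:
        # Errors use their own structure, kept separate from success payloads.
        return jsonify({"error": {"code": "NOT_FOUND",
                                  "message": f"message {msg_id} does not exist"}}), 404
    return jsonify(MESSAGES[msg_id]), 200
```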
We have rules that govern not just the design of our APIs at design time, but also the performance of our APIs at runtime. We say that all APIs must have a 95th percentile latency less than some number of milliseconds; unfortunately, that number is secret information. We actually enforce this with the gateway timeout at the Kong layer. And finally, we have limits on the payload sizes which all of our APIs will accept; 50 megabytes is somewhat arbitrary.
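Both of those runtime rules map onto standard Kong features, so a rough sketch of enforcing them per service through the Admin API might look like this. The addresses, service name, and timeout values are placeholders, not our real internal numbers.

```python
import requests

ADMIN = "http://localhost:8001"      # hypothetical Admin API address
SERVICE = "claims-api"               # hypothetical service name

# Enforce the latency rule: if an upstream doesn't answer within the budget,
# Kong times the request out at the gateway.
requests.patch(f"{ADMIN}/services/{SERVICE}",
               json={"connect_timeout": 5000,
                     "write_timeout": 5000,
                     "read_timeout": 5000}).raise_for_status()

# Enforce the payload rule with the bundled request-size-limiting plugin,
# capping request bodies at 50 MB for this service.
requests.post(f"{ADMIN}/services/{SERVICE}/plugins",
              json={"name": "request-size-limiting",
                    "config": {"allowed_payload_size": 50}}).raise_for_status()
```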
In the interest of time, I'm going to skip over the nomenclature and taxonomy examples; these basically have to do with how to construct the API path on the APIs we expose. I'm going to skip right into how to enforce governance effectively. We need to make this process easy to follow and mandatory for it to have effect, for it to give us the ecosystem benefits we want.
B
We
started
out
with
a
manual
governance
process.
It
wasn't
much,
but
it
was
a
starting
point
and
it
was
a
way
to
introduce
the
process
at
optum.
It
was
driven
through
an
archaic
system.
We
had
manual
reviews
for
api.
Taxonomy
developers
would
be
required
to
schedule
separate
security
scans,
which
sometimes
would
take
weeks
to
run.
Needless
to
say,
this
led
to
inconsistent
enforcement.
Long
delivery
raise
lots
of
frustration,
lots
of
exceptions,
talk
about
exceptions
more
in
a
minute,
but
nowadays
we
have
an
automated
governance
process.
B
This
is
a
get
ops
based
spec
driven
model
where
the
open
api
spec
is
effectively
the
key
to
the
key
to
start
governance
on
a
particular
api.
This
is
directly
linked
with
our
provisioning
for
the
purposes
of
enforcement.
If
your
api
has
not
gone
through
governance,
we
say
if
your
api
has
not
been
certified,
you
will
not
be
allowed
to
create
a
production
proxy
on
kong
and
optum.
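We can't show the internal pipeline here, but the shape of a spec-driven check is easy to sketch: read the OpenAPI v3 document and fail certification if rules are violated. This sketch assumes PyYAML and uses a deliberately tiny, illustrative rule set and status-code allow-list, not our real ones.

```python
import re
import sys
import yaml   # pip install pyyaml

# A toy spec-driven governance check: flag verbs in URLs and unexpected
# response codes declared in an OpenAPI v3 document.
VERB_PATTERN = re.compile(r"/(get|retrieve|create|update|delete|fetch)", re.I)
ALLOWED_CODES = {"default", "200", "201", "202", "204", "301", "304", "400",
                 "401", "403", "404", "409", "415", "422", "429",
                 "500", "502", "503", "504"}   # illustrative allow-list

def lint(spec_path):
    spec = yaml.safe_load(open(spec_path))
    problems = []
    for path, item in spec.get("paths", {}).items():
        if VERB_PATTERN.search(path):
            problems.append(f"{path}: URLs should be verb-free")
        for method, op in item.items():
            if not isinstance(op, dict):
                continue   # skip path-level keys like "parameters"
            for code in op.get("responses", {}):
                if str(code) not in ALLOWED_CODES:
                    problems.append(f"{method.upper()} {path}: unexpected status code {code}")
    return problems

if __name__ == "__main__":
    issues = lint(sys.argv[1])
    print("\n".join(issues) or "certified: no rule violations found")
    sys.exit(1 if issues else 0)
```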
B
This
automated
governance
process
also
links
directly
with
our
documentation
and
discovery
hubs.
This
makes
it
easy
for
apis
to
see
greater
levels
of
reuse
than
they
ever
would
have,
and
discovery
has
never
been
less
of
a
problem.
Apis
can
be
certified
start
to
finish
in
just
minutes
with
this
system.
In order for exceptions to be valuable and to work long term, we need to have accountability baked into this process. All of our exceptions are associated with a specific person by name, and we have a specific remediation plan and date: how are we going to fix the problem, and when are we going to fix it by? And to give that little extra bit of organizational enforcement, we also loop in the VP of the API development team requesting the exception for their acknowledgement and approval.
Finally, we do have exceptions to the exceptions. This is effectively just security: we will not compromise here for any reason. All APIs exposed through the company will have world-class, industry-level security, and there are really no bones to be made about that. Any other deadline can take a back seat in exchange for this benefit. Alrighty, so we're about done here. Let's wrap up with a couple of highlights. First, growing a platform in a large enterprise requires flexibility, so be flexible and deliver useful features that will delight your customers and create advocates for you,
even if you hadn't planned to implement those features beforehand. We should be fearlessly pursuing updates and upgrades to keep the platform stable and competitive and to increase user confidence; this is doubly important for open source based applications. And we need to empower our customers with effective operational support. If we do all of this, our platform will have undeniable value.