From YouTube: Democratic Deploys at Airbnb - GitHub Universe 2015
Description
As teams grow, there is often a temptation to add more process around shipping code in an effort to make it safer. Topher Lin and Igor Serebryany describe an alternate approach — using flexible tools to enable engineers who write the code to also ship their code quickly and safely. Airbnb's tooling deeply integrates SCM, builds, and deploys to guide engineers through the deploy process.
About GitHub Universe:
Great software is more than code. GitHub Universe serves as a showcase for how people work together to solve the hard problems of developing software.
For more information on GitHub Universe, check the website:
http://githubuniverse.com
A: So it's good to see you all here, thanks for coming to our talk. My name is Igor, and my name's Topher. We are, oh, is this clicker working? We are engineers on Airbnb's developer happiness team, which has actually been rebranded to developer infrastructure, but at heart we still consider ourselves developer happiness engineers.
A: So we're going to be talking about kind of the antidote to the process that you see pictured on the screen here. This is the traditional release management process, which involves taking code through many discrete cycles, where different people take it through different stages and do different things with it, and those things might be full of mysteries. We found that this process doesn't actually make engineers particularly happy.
A: The thing that we do to overcome that kind of process is this: this is Deploy Board. We're going to be talking a lot more about it.
A: This is how many deploys we do through Deploy Board, starting in 2012, and you can see it's an exponentially trending-up line. This is the number of engineers using Deploy Board; it's a very similar graph, basically the same. The reason both of these grow in sync is that at Airbnb every engineer deploys her own code. There's no handing it off over the wall to some release managers or anything like that, and we'll talk about how that works.
A: So the agenda for this talk is: why do we think engineers should ship their own code in the first place, and what kind of tooling have we built to enable them to do that?
A: So the fundamental problem, and I think the reason why many companies resort to an extremely process-heavy deploy process, is that deploys are pretty dangerous. Deploys are oftentimes the thing that breaks your infrastructure; if something goes wrong, you should look for the deploy that caused it. And so, in order to overcome that problem, people often use these guys.
A: These magical wizards are called release engineers, and they're supposed to ensure that whatever changes are coming down the pipeline into your application don't break the application. We at Airbnb don't actually think it works this way; we have several objections to this common practice at many other software companies. Some of these objections are practical.
A: When release managers are deploying someone else's code, they're not as familiar with it as the people who originally wrote it, so they don't have as much context on what they should watch for: which metrics they should be observing, which parts of the site might break, what might actually be affected by the deploy. They also become a bottleneck for getting code out the door, which means the overall process of deploying slows down. Oftentimes people have a traditional release process.
A: Then they release at some fixed cadence, maybe once a week or something, and as a result the deploys they're doing become much larger. More code goes out in each deploy, and that is actually more dangerous: the more code you have going out, the greater the likelihood that something in that giant pile of code is broken.
A
Oh
and
then,
if
something
is
broken,
it
becomes
harder
to
tell
what
exactly
is
broken
because
there's
so
many
different
changes
going
out
at
the
same
time,
and
you
still
like,
maybe
you
could
be
like
okay,
maybe
we
could
just
prevent
code
from
breaking
in
the
first
place
by
using
this
release
management
process,
but
in
reality
no
one
can
do
that
that
you
really
don't
know
until
you
deploy
to
production
and
like
observe
metrics,
whether
the
code
actually
works,
like
it's
really
difficult
to
do
that
and
British
managers
don't
have
any
more
power
to
do
that
than
any
other
engineer,
and
another
part
of
our
objection
to
this
kind
of
process
is
also
like
a
cultural
objection.
A: We think that using this process, where you write code and then hand it off to someone else to take care of from there, causes bad cultural effects. It means you have less autonomy: you don't have as much control over when your projects actually make it out the door. You lose some responsibility, and you lose some knowledge of what it's like to actually operate your code in production. We take these things seriously at Airbnb; we're a very core-values-driven company. Another objection is that the job of engineers is not actually to write code. The job of engineers is to make product that they deliver to users, and when you reduce their job to "you write some code, and then someone else will deliver the product", that's not really what engineers are supposed to be doing.
A: So this is an old picture of the Airbnb engineering team; we're in lederhosen. All of these people are release managers: we make it so that any person on the engineering team is actually a release manager. And people might say, well, you know, maybe that's not a good idea. Are these people even qualified to be release managers? Isn't there a ton of stuff they have to know to be able to do that? They have to know how to deploy.
A: They have to know how to roll back when something goes wrong. They have to know how to even tell that something went wrong in the first place. And we think that even if you're a formally trained release engineer, you still have trouble doing all these things. The way that you manage to do this is you have really great tools; like this big wrench here, that's what release managers use to do their jobs. And so actually the problem becomes really a tooling problem.
B: Okay, great. So far, we've figured out that traditional release management is not magic; it's not going to fix all your problems. There are some practical issues with having a bottleneck that slows things down, and culturally we want everyone to be able to ship their own code, and the way we achieve that is with tooling. So in this next section we're going to go through some of the tooling that Airbnb has built and show what the workflow looks like. So, great: Airbnb releases.
B: How do they work? It actually starts when you push your code upstream for the first time and open a pull request. We do everything we can to help you at that point, even before you merge the code into master and put it on a machine. We want to make sure that you're in touch with the right people and are getting a lot of feedback on whether your change is going to be safe to release or not.
B: People would ask for review from whoever their friends were, whoever referred them to the company, or the new-hire buddy that they had lunch with during their first week, even if those people were not the ones best equipped to review that piece of code. As a result, there were things that passed review where the reviewers weren't experts on certain domains, like our messaging code; they didn't understand the edge cases, and so they couldn't give proper feedback.
B: Another use case we ran into was areas of code that we want everyone to contribute to, but which have certain guidelines that people need to be aware of. An example of this is our API code: we've written an API framework in house, and we want everyone to be able to write their own API endpoints and to modify them as they need to for their product needs.
B: But we also want the people who wrote the framework and maintain it to review all the code that affects the API, or is possibly related to it: to make sure everything is following the guidelines, to make sure people are following best practices, and to be aware of needs that the product teams have that shape the API framework. So to help with this, we created a review routing bot. Here you can see at the top we have a pull request: John is modifying some messaging-related code. This is traditionally a pretty complex part of Airbnb.
B: It's not just messaging; it's also sending special offers and opening reservations. In this case, there was an edge case where we weren't showing the dates of a reservation after it had been altered at the request of a guest or host, so John's updating that to make it happen. Because this code is pretty complex and pretty old, it also has some technical debt that only a few people really understand well, so we need a bot to tell John who actually knows about this stuff. These are the poor unfortunate souls who are responsible: Spike, Ben, and Gordon.
B: The way this works is that the bot looks for a reviewers comment within the file, in this case the messaging controller, and automatically modifies the PR to @-mention the relevant people. You can do this for directories as well: you can put a REVIEWERS file in the root of a directory, and it applies to all files within that directory and any subdirectories. Anyone can set reviewers at any time. We've kind of organically grown the number of things that have reviewers over time, as we've seen the need to do so.
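The mechanics of that lookup can be sketched roughly like this. This is a hypothetical reconstruction: the comment format, function names, and mention text here are our inventions, not Airbnb's actual bot.

```ruby
# Scan a changed file for a reviewers comment of the form:
#   # REVIEWERS: @spike @ben @gordon
def reviewers_for(file_contents)
  line = file_contents.lines.find { |l| l =~ /REVIEWERS:/i }
  return [] unless line
  line.scan(/@[\w-]+/)
end

# Build the @-mention the bot would post on the pull request.
def mention_comment(changed_files)
  handles = changed_files.flat_map { |contents| reviewers_for(contents) }.uniq
  return nil if handles.empty?
  "cc #{handles.join(' ')} -- this change touches code you maintain"
end

controller = <<~RUBY
  # REVIEWERS: @spike @ben @gordon
  class MessagingController < ApplicationController
  end
RUBY

puts mention_comment([controller])
# prints: cc @spike @ben @gordon -- this change touches code you maintain
```

In a real bot this string would be posted back through the pull request comments API when the webhook for the PR arrives.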
B: The other piece of what we do is something that a lot of people do: continuous integration, and here we use the GitHub statuses API to great effect. This was pretty much game-changing when we started using it, because it puts CI checks right in developers' faces. They're really used to the pull request UI, so putting continuous integration results right there is really important, and it also affects how the merge button is rendered. Here, for instance, a continuous integration job has failed, and we therefore say "merge with caution".
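Rendering the merge button this way amounts to folding a commit's combined statuses (the states the statuses API reports: `pending`, `success`, `failure`, `error`) into a single verdict. A minimal sketch of that decision, with invented labels rather than Deploy Board's actual ones:

```ruby
# Collapse a commit's statuses into one merge-button label.
# `statuses` mimics what the statuses API returns: one entry per
# context (CI job, deploy lock, ...) with its current state.
def merge_button_label(statuses)
  states = statuses.map { |s| s[:state] }
  return 'Merge with caution' if states.any? { |st| %w[failure error].include?(st) }
  return 'Waiting for checks' if states.include?('pending')
  'Good to merge'
end

statuses = [
  { context: 'ci/unit-tests', state: 'failure' },
  { context: 'deploy/lock',   state: 'success' },
]
puts merge_button_label(statuses)  # prints: Merge with caution
```

Any single failure wins over everything else, which is the conservative choice the talk describes: the button still works, but the wording warns you.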
B: Another thing we do, you'll notice there are a couple of deploy lock statuses; those are used to communicate the status of the service at the time. If we have some kind of incident going on, we'll set a lock, and that'll set the status to red, so that developers know that even if their change passes all the tests, it might not be a safe time to merge right now. Later on, when we unlock, things will go green and they can merge. If something goes wrong, they can click through to the details.
B: We've invested a lot of effort into creating a single UI in Deploy Board for any sort of continuous integration system, so that developers don't have to flip between a lot of different CI UIs. For some reason, CI systems like to own everything, including the UI, and sometimes the UI doesn't really fit our needs, so we put some time into making sure that everything can fit in. Is my mic on? Cool, okay, awesome. Thank you.
B: Okay, great. So after everything passes and you're ready to merge, there's that big juicy green merge button; you click it, you feel really good, and after that you go to Deploy Board. This is the builds page in Deploy Board: every PR that's merged, you can see listed there. You can use the compare buttons in the fourth column to see what the change set is between any two given builds; that just links to the GitHub UI's compare view between change sets.
B
You
can
see
what
build
is
currently
deployed
to
each
development,
it's
at
each
target
environment.
So
if
you
want
deploy
just
click
that
deploy
button
and
you
get
this
little
modal,
you
are
watched
over
by
our
vp
of
engineering
mike
who
took
an
absurd
photo
with
some
magazine
and
now
we've
created
stickers
and
put
them
all
over
the
office.
It's
pretty
great
I,
don't
know
if
he
appreciates
it.
We
have
two
different
environments
you
can
deploy
to.
B
One
is
called
next,
so
the
way
it
works
deploys
work
at
Airbnb
is
in
addition
to
all
the
testing
we
have
an
environment
called
next,
which
is
similar
to
staging.
It
doesn't
receive
production
traffic,
but
it
does
otherwise
behave
like
a
production
machine.
So
what
we'll
do
is
we
put
changes
on
next
and
then
we'll
ask
engineers
to
manually
check
their
changes.
B: This is another way of saying that before the rubber really hits the road, it's kind of like scraping the road first: you can check and understand if anything's going wrong. So you deploy to next, and this is the point where Yoda steps in. We created a little bot named Yoda, who hangs out in Slack and tells you how the status of the changes is going.
B: The first thing Yoda will do is tell you that the changes are on next. It'll list all the PRs that are on next and @-mention each user to inform them that it's time to check their changes. Then each of them will come in and tell Yoda the status of their change: they'll say they're checking, or they'll say it's good; occasionally they'll say it's bad, in which case we'll have to deal with that. So this is kind of the usual flow of how things go.
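The check-in announcement can be pictured as something like the following. The message format and data shape here are illustrative inventions; the real Yoda's wording differs.

```ruby
# Compose Yoda's "changes are on next" announcement from the merged PRs.
def next_announcement(prs)
  lines = ['The following changes are on next:']
  prs.each do |pr|
    lines << "  ##{pr[:number]} #{pr[:title]} (@#{pr[:author]}, please check your change)"
  end
  lines.join("\n")
end

prs = [
  { number: 421, title: 'Show altered reservation dates', author: 'john' },
  { number: 425, title: 'Fix search ranking typo',        author: 'freddie' },
]
puts next_announcement(prs)
```

A bot would then post this string into the team's Slack channel, and each later "good" / "bad" reply updates the per-PR state.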
B
You
can
see
that
in
this
case
we
have
a
couple
users
who
haven't
checked
in
yet,
if
they're
not
available,
you
can
poke
the
reviewers
as
well
will
like
parse
through
the
PR
comments
to
see
what
users
were
involved,
but
users
signed
off
and
will
ping
them
as
well
cool
after
everything
is
good.
Well,
everything
is
good,
Yoda
will
say:
hey
Freddie
you
can
deploy
to
production
and
Freddie
can't
deploy
to
production
using
deploy
board,
deploy
board
updates
in
real
time.
We
have
this
kind
of
deploy
counter.
B
It
shows
you
all
the
machines,
how
many
are
in
a
waiting
state,
how
many
are
ready
to
deploy
and
wish
machines
are
deploying
at
the
time.
If
you
need
to
drill
down,
we
give
you
the
ability
to
list
all
the
machine
role
and
click
through
to
see
the
log
of
the
deploy.
So
it's
all
accessible
right
there
in
the
UI
really
easy
to
use.
B
You
don't
have
to
SSH
into
any
machines,
you
don't
have
to
do
any
like
Grippin
or
anything,
it's
all
right
there,
all
right
and
then
after
it's
done,
deploying
you'll
see
that
you
know
we
mentioned
the
users
again
just
to
let
them
know
what's
going
on,
but
we
also
linked
to
some
key
stuff.
We
link
to
our
new
relic
dashboard.
We
link
to
a
bunch
of
dashboards
with
business
metrics
so
that
people
can
monitor
and
quickly
see
what's
going
on.
So
what
we're
doing
here
is
we're
automating,
a
lot
of
the
communication
and
knowledge
sharing.
B
That's
really
important
when
you're
releasing
software,
we're
giving
you
like
we're
putting
stuff
right
in
your
face,
so
you
can
understand
whether
your
deploy
succeeded
and
how
it's
affecting
users,
but
then
we
have
to
consider
the
question
what
if
something
goes
wrong?
This
is
something
released.
Manager
usually
takes
care
of,
and
you
want
to
be
able
to
do
something
in
that
situation
we
don't
have
a
central
release
manager,
so
we
need
someone
to
help
out
when
something
goes
wrong
like
this
poor
girl.
So
the
first
thing
we
do
is
we
have
automated
alert
notifications.
B: If the error rate increases by too much (here it's gone up more than three hundred percent), we @-mention everyone in the channel and say: hey, this is a problem, we need to roll back, or at least investigate what's going on. We make rolling back easy. So here you can see that the @-mention has brought a lot of people into the room, and we have a picture, thanks to Ben, of the error rate rising.
B: So how do we roll back? We make rolling back really, really easy; it's right there in the UI once again. Remember this photo of Deploy Board: these are the buttons that are right there in the deploy progress bar. If you click roll back, you'll get a modal that explains what's going to happen.
B: It explains what you're going to roll back to: you can see what the change set is, and you can see what the SHA is. That'll just abort the current deploy and immediately begin a new deploy of what was previously on production. If it's a more tricky situation, you may just want to stop the deploy midway and leave some of the boxes in a kind of half-done state; it depends on the situation. We give developers the power to use their judgment and make decisions. So here's the abort modal; it's pretty similar.
B: It gives you a little bit of information explaining when you want to abort versus when you want to roll back. Then, afterward, what we want to do is lock deploys if master is in a dirty state. So here we have the ability to set a lock on deploys for just that application, and we say: hey, master contains a change that breaks search; if you need to talk to us about it, come into Slack, we'll explain what's going on, and we can figure the situation out together.
B: That just pings all the open pull requests and sets their status to red, so everyone knows you can't deploy right now. So again, we're automating communication about what the status of the service is, what's going on; we're taking a lot of operational knowledge and just broadcasting it, so that everyone has this knowledge. You can also set deploy locks for other reasons. One thing that we do often is lock if we're going to be on the news; for instance, when we were on Ellen DeGeneres, which was probably the biggest event we had recently.
A: We talked about how the process of release management is really a process of communication: communicating with all the people who are involved in making the code and shipping the code, and we've kind of automated that away using the tools we've written. We also talked about how to recover from mistakes, which is another thing that release managers often do. So you might be wondering: how does all this work under the hood? This is kind of the flow of the system.
A: People want to use webhooks to make automated systems and integrate with GitHub Enterprise, but it can be kind of a pain to have to set up individual webhooks for all the different things you want to write. So we have a single webhook that is installed automatically on all of our repositories, and that webhook sends all events from GHE to a single publisher, and that publisher turns those webhooks into RabbitMQ events. Then Deploy Board and other systems subscribe to those RabbitMQ events.
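One way to picture the fan-out is deriving a routing key from each webhook delivery and publishing the payload on that key, so listeners can subscribe with patterns like `github.push.#`. The key scheme below is our invention (the event names follow GitHub's `X-GitHub-Event` header); the real publisher would hand the JSON to a RabbitMQ client such as Bunny.

```ruby
# Turn a GitHub Enterprise webhook delivery into a RabbitMQ routing key.
# A listener interested in pushes to any repo could then bind 'github.push.#'.
def routing_key(event_name, payload)
  repo = payload.dig('repository', 'full_name') || 'unknown'
  "github.#{event_name}.#{repo.tr('/', '.')}"
end

payload = { 'repository' => { 'full_name' => 'airbnb/deployboard' } }
puts routing_key('push', payload)  # prints: github.push.airbnb.deployboard

# The publisher side would then do something along the lines of:
#   exchange.publish(payload.to_json, routing_key: routing_key('push', payload))
```

Topic-exchange routing keys like this let each listener (builds, deploys, review bot) pick out just the event stream it cares about.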
A
That's
basically
the
stream
of
all
things
happening
in
get
land,
and
this
is
really
useful
because
you
can
see
the
stream
of
all
different
kinds
of
events
like
everything,
from
new
code
being
pushed
and
new
benches
being
created
to
comments
being
lost
on
PO
requests,
and
some
of
automations
rely
on
that
stuff
as
well.
So
under
the
hood,
deploy
board
itself
is
actually
composed
of
several
different
RabbitMQ
listeners,
each
of
which
performs
different
functions.
So
some
of
these
dispatch
our
build
system,
events
like
our
CI
events.
A: So this is, I think I skipped a slide. Okay, so under the hood, all of this work is implemented for us as Resque jobs; we're a big Rails shop, and we use Resque. This is also the system that GitHub itself uses for its delayed jobs. This is an example of our basic build job. It's kind of the thing that we use to build many of our projects that don't require anything particularly special. We find that a lot of times people write their build systems in shell scripts.
A: You use some kind of job dispatcher, and the job dispatcher is often Jenkins or something, and then in order to drive Jenkins you write a bunch of shell scripts, which nobody can read or maintain, because they're shell scripts. So we find that Ruby is a lot easier to use for this, and Resque is just as good a dispatching system as Jenkins is, if not better in many ways. So this is a basic build job: it will clone the repo and then install some dependencies.
A: It'll do an npm install and a bundle install, and then, if it's a master build, it will create a build artifact including all those installed dependencies and ship it, so that it's available for the deploy system to actually deploy. The build jobs kind of stack on top of each other. So this is our Rails build job: it's basically a subclass of the basic build job, but in addition (thanks, Topher) it also does a couple of things that are Rails-specific.
A: In particular, it compiles assets for Rails, packages them up, and uploads them to S3, if we use an asset host for the project. And I'm just going to try to keep clicking and hopefully it'll just work, okay.
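Put together, the layering they describe might look like this. This is a hypothetical reconstruction: the class names and exact commands are ours, and the real jobs shell out to run each step rather than returning the list.

```ruby
# Resque job for a plain build: clone, install dependencies, ship artifact.
class BasicBuildJob
  @queue = :builds  # Resque dispatches on this queue name

  # Returns the shell steps the job would run; a real perform would
  # execute each with `system(cmd) or raise`.
  def self.steps(repo_url)
    [
      "git clone #{repo_url} build",
      'npm install',
      'bundle install --deployment',
    ]
  end
end

# Rails builds stack on top of the basic job and add Rails-specific steps:
# compiling assets and uploading them to the asset host on S3.
class RailsBuildJob < BasicBuildJob
  def self.steps(repo_url)
    super + [
      'bundle exec rake assets:precompile',
      'aws s3 sync public/assets s3://example-assets-bucket/',
    ]
  end
end

puts RailsBuildJob.steps('git@ghe.example.com:org/app.git')
```

Because the jobs are plain Ruby classes, the Rails-specific behavior is just a subclass calling `super`, which is the stacking the talk describes.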
So how does Deploy Board know what all the different tasks are that it should do when some event comes in from RabbitMQ? Suppose you get an event that new code has been pushed to a repository: what happens next?
A: Okay, so this is a screenshot of our deploy board apps repo. This is the JSON that engineers write to configure their repositories. Suppose you're a new engineer, or, let's say, an experienced engineer with a new project, some new service you want to stand up, and you want that service to get the regular CI flow and the regular deploy flow: you write one of these. It's just a JSON config. This one is the JSON config for Deploy Board itself.
A: We use Deploy Board to deploy itself, so it's just like any other project. You give us the title of the project, and you list its repository URL, which is how we know that this is the JSON config corresponding to incoming events: the repository URL matches. And you specify some additional things, like posting notifications about this app in this channel. Then we have a couple of CI events listed here, like the Rails build job that you saw earlier.
A: We use Solano to run the unit tests for this app, so it also dispatches those from that same place. And then, finally, at the bottom, you see this collection of targets. These are the things you actually deploy to, and you can specify what the different deploy targets are and what the specific roles of the machines involved in that deploy are. For Deploy Board that would be some web workers, some cron workers, and some RabbitMQ workers, which do the stuff that we're looking at.
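A config along those lines might look like this; the field names here are illustrative, not the actual Deploy Board schema.

```json
{
  "title": "deployboard",
  "repository": "git@ghe.example.com:airbnb/deployboard.git",
  "notifications": { "slack_channel": "#deployboard" },
  "ci": [
    { "event": "push", "job": "RailsBuildJob" },
    { "event": "push", "job": "SolanoUnitTests" }
  ],
  "targets": [
    {
      "name": "production",
      "roles": ["web-worker", "cron-worker", "rabbitmq-worker"]
    }
  ]
}
```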
So people are pretty pleased with this. We would like to encourage people to write more services over time, and people have been writing more Deploy Board apps over time: we now have about 170 different Deploy Board apps, and they all have the same kind of workflow, not necessarily always as complicated. Obviously, if you're working on a smaller app, it's a little bit easier to deploy, and you have fewer people contributing, so there's less coordination; but for something like our monolithic Rails app, which still has a lot of code, this kind of coordination stuff is useful. Cool.
B: So one thing we want to do is start doing canary deploys: the ability to take a branch and put it on a subset of the production machines, so they can receive production traffic and developers can see what the diff in error rate is, or play around with actual production data to see how their change is working out. We have a lot of the mechanics for this already on the back end.
B
One
thing
we
want
thought
into
what
the
UI
looks
like
developers
to
see
their
favorite
branches,
understand:
they're,
like
work-in-progress
branches
versus
the
branches
that
are
ready
to
ship
and
to
create
a
UI
that
easily
lets
them
see
what
the
difference
is
between
the
error
rates
of
production
and
the
canary.
Another
thing
we're
doing
is
core
is
expanding
the
alert
automation
for
metrics,
so
the
alert
that
we
show
prior
was
the
new
relic
error
rate
and
notify
everyone
to
channel.
B
You
know:
that's
okay,
we'd
like
to
be
able
to
alert
on
more
specific
metrics,
including
business
metrics,
so
conversion
metrics
for
the
core
booking
flow
Facebook,
sharing
payments
by
country
and
under
or
like
trust
and
safety,
related
metrics,
and
be
able
to
alert
specific
teams
or
even
be
able
to
highlight
certain
individuals
if
we
know
that
they
recently
altered
code
that
is
related
to
this
metric.
So
you
know
right
now
it's
kind
of
a
blast
everyone,
but
we
want
to
make
sure
everything
everyone.
B: Cool. So I talked a little bit before about making Deploy Board more personalized, with seeing your favorite branches. Earlier we showed that there were just builds listed and deploys listed, and actually you don't need to see all that information; that was kind of a naive approach to the UI that we took just to make things simple in the beginning. But now that we're building every single pull request, you're not going to deploy every single one of those, so it might not be useful to see them all.
B
We
want
to
give
you
a
more
particular
view
of
stuff,
that's
relevant
to
your
team
stuff,
that's
relevant
to
you
at
the
very
moment
in
regards
to
understanding
the
recent
history
of
deploys
recent
walks
and
so
on,
maybe
even
upcoming
locks
that
are
going
to
affect
your
workflow.
Finally,
we
want
to
do
better
data
gathering.
We
want
to
prompt
people
during
rollbacks
to
do
the
right
thing.
We
want
to
give
them
information
about.
You
know
what
apps
are
historically
risky.
A: I want to talk about this. One thing that we found is that a lot of companies probably build something kind of similar to this. I know that a lot of big companies build tools that ship their code out; we talk to people at other companies, and they've all built some kind of homegrown deploy flow. And this is really not the GitHub way, right? It's silly that we're all building the same tools over and over again.
A
So
we
would
really
love
to
open
source
deploy
board
and
make
it
available
for,
like
other
people
to
use,
know
the
companies
and
also
to
like
improve
the
project
arm
and
right
now.
The
project
is
like
a
little
bit,
you're
being
be
specific,
it
kind
of
it
it
like
consumes
a
lot
of
things
that
depend
on
the
Airbnb
environment,
and
so
it's
like
not
really,
it's
not
necessarily
a
trivial
project
to
factor
it
out
and
make
it
a
standalone
project.
A: But one thing that would be helpful is if we received a lot of feedback from people who hear talks like this that they want to use a tool like this, that it would be helpful for them. So if that's something that's interesting for you, and especially if you want to try something like this at your company, maybe even before it's completely open sourced and put on public GitHub, maybe we can try to figure out how to install it in your environment. Okay, so Topher is going to finish things up.
B: To review what we've gone over: we don't want release managers, we want everyone to release their own code. We think there are practical benefits to this, and primarily we think there are huge cultural benefits: making developers happy and giving them a lot of control over their own workflow. If we look at the list of things release managers do, there was all this arcane knowledge: how to deploy, how to deal with bad changes, how to understand if something's bad. We replaced a lot of that with tooling, so you just click a button to do most things. We have some automation around communication and around alerting, and when things go really south, we have locks to help communicate service status and prevent people from making the situation worse.
So one thing is, you know, we're doubling every year, and as the team scales, it's very tempting to try to put a lid on the chaos by centralizing controls: everyone has a certain probability of breaking things, so we need to funnel everything through a knowledgeable expert, or a team of experts, in order to prevent problems from happening. But you don't have to do that. If you think carefully about what you're trying to do and invest in the right tooling for you and your team, you can allow people to run free, given the right tools, helping them be prepared for the challenges they're going to face. So that's the message we want to send out today. Thanks for listening. Does anyone have questions?
A: Yeah, that's right. So the question is: how do you deal with auditing requirements, like what happens when the code is particularly sensitive or something like that? I think the controls for that can be enforced at the repository level. In general, if you can push to the repository, then the game is over, right? It's not about deploys, it's about the code itself. In general, we keep all repositories internally public.
A
Like
generally,
anyone
can
contribute
to
any
repository
for
some
of
the
more
sensitive
stuff
that
we
do
on,
for
instance,
our
payments
infrastructure.
Those
teams
are
starting
to
lock
down
some
repositories
that
contain
the
core
code,
and
only
a
few
people
can
push
those
repositories
and
by
enforcing
the
controls
and
pay
her
like
you
know
like
they
have
access
to
the
same
tools,
the
same
workflows.
If,
like
it
like
a
you
know,
the
build
will
be
generated
for
the
payments
repository
if
a
code
event
happens
not
repository
and
then
those
builds
are
like
signed.
A
And
you
know
those
are
the
only
things
you
can
you
can
deploy,
but
yeah
I
think
the
controls
are
around
the
code
itself.
Does
that
does
that
answer
your
question.
A
Oh,
we
don't
that's
right,
we
don't
have
that.
No
I
mean
we
assume
that,
like
there
are
places
to
hook
that
into
right,
like
at
the
point
where
you
are
allowed
to
click
the
merge
button.
That's
the
point
where
we
would
say
something
like
you
can't
hit
the
merge
button
unless,
like
the
appropriate
people,
have
reviewed
your
code
or
like
only
some
people.
This
is
why
we're
really
excited
about
in
protected
branches
and
mandatory
statuses.
This
is
like
a
huge
feature
for
us
super
excited
about
that.
B: Sure. So this talk is about making changes to code, and the question is about making changes to data: for instance, altering the schema of a database, or making an alteration to a data set that might break halfway through. So schema migrations are kind of a sensitive issue; they can threaten stability quite a bit, so right now we have more of a cultural system around making sure that schema migrations are safe. When people have a migration they want, they'll write a proposal for it.
B
They
like
actually
write
out
what
the
alter
table
statement
is
going
to
look
like,
and
they
send
it
to
a
team
of
people
on
our
production
infrastructure
team
who
have
expertise
in
my
sequel
and
they
get
feedback
on
that
oftentimes.
In
that
case,
for
really
sensitive
schema,
migrations,
encore
tables.
B
what we'll often do is wait a little bit and collect the several proposals people have, and then we'll batch them all together into one ALTER TABLE statement, so that we don't have crazy stuff going on all the time. But in general we have lots of little schema migrations going on all the time; the process is fairly lightweight, and most things that aren't threatening will pass review very quickly.
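The batching step just described, several proposals folded into one statement, might look like this in miniature. The table and clauses are illustrative only, and a real tool would also validate each clause before combining:

```ruby
# Combine several proposed clauses for the same table into a single
# ALTER TABLE, so a large MySQL table is rebuilt once instead of N times.
def combine_alters(table, clauses)
  "ALTER TABLE #{table} #{clauses.join(', ')};"
end
```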
In regards to changes to services that serve data:
B
well, the deploy there varies a little bit. Not all the services that do that kind of thing use deploy board for that kind of data release process. Those that do generally have the ability to just start from scratch; I guess you could call it idempotent. If you're going to release a data set, you release it.
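One common way to get the "start from scratch" property mentioned here is to write each release to a fresh versioned location and then flip a pointer. This is a sketch of that general pattern under assumptions, not a description of Airbnb's tooling; the in-memory hash stands in for real storage:

```ruby
class DataReleaser
  attr_reader :current

  def initialize
    @store   = {}   # versioned location => data
    @current = nil  # pointer to the live version
  end

  # Re-running a release for the same version just overwrites the same
  # location and flips the pointer again, so a half-finished attempt can
  # be retried safely from scratch.
  def release(version, data)
    location = "datasets/#{version}"
    @store[location] = data
    @current = location
  end

  def live_data
    @store[@current]
  end
end
```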
A
Right. If I'm deploying code and there are, like, three other people with code in my deploy, then all those people are jointly responsible for that deploy going out successfully. And if the deploy doesn't go successfully, they start getting pinged by the automated systems, and the automated systems say the error rate has increased, or the business metrics are dropping, or something like that. Then those people will roll back the code.
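A minimal sketch of such an automated nudge, comparing post-deploy measurements against a pre-deploy baseline. The thresholds and the bookings metric are invented for illustration:

```ruby
# Decide whether the deployers should be prompted to roll back, based on
# how the current measurements compare with the pre-deploy baseline.
def should_rollback?(baseline, current,
                     max_error_increase: 0.5, max_metric_drop: 0.1)
  errors_up   = current[:error_rate] > baseline[:error_rate] * (1 + max_error_increase)
  metric_down = current[:bookings]   < baseline[:bookings]   * (1 - max_metric_drop)
  errors_up || metric_down
end
```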
A
They'll be prodded by the system to roll back, and then they have time to investigate, so they'll be able to figure out: oh, what exactly broke, how do I revert that change? That's the job of the people who happen to be managing that particular release. So then the question is maybe beyond that: suppose we're always breaking code for the same reason, every single day; people cause the same basic incident over and over, or something.
A
What's the longer-term solution beyond just reverting the code and fixing it? Well, this is not really part of the deploy process, so it's not really handled by the same tool. But we do have a post-mortem tool: if people break production in a severe enough way, they'll often write a post-mortem that explains, this is what happened, and this is why it was difficult to catch.
A
It's a tricky issue, and here are some remediation steps we can take to make sure this doesn't happen over and over again. And then either they, or the teams that are interested in that stuff, will take it on. So, for instance, if we see that one reason we're breaking the site over and over again is that some front-end pieces are not tested or something, then...
A
The first step is that we need to produce builds for branches as well as master, and that's the easy step. Then we have to build kind of a UI that lets you discover the different builds for different branches; right now our UI is specifically around master. And then, once you do have a build, you have a deploy button.
A
Where do you actually deploy that? Our plan right now is to spin up several canary environments. So, for instance, for monorail, which has lots of deploys, there might be a backlog, a traffic jam, to use those environments, so we'll spin up five or six of them or something like that, and then you'll be able to deploy to one of those. The deploy system will make sure that you can't deploy a build that's too old:
A
a build that's behind master; it has to be ahead of master. And then that environment will be locked to you automatically for some period of time, and after that period of time is over, we'll just turn the environment off. In the meantime, during that period, you get to see how the error rates or whatever differ between the environments, because that environment will receive production traffic as well, and you'll also be able to go there in your browser under a separate name.
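The canary rules described, the build must be ahead of master and the environment is locked to the deployer for a window before being torn down, could be sketched like this. The lock window and the commit-list ancestor check are simplifications of what a real git-aware deploy system would do:

```ruby
CANARY_LOCK_SECONDS = 30 * 60 # assumed window, not Airbnb's actual value

class CanaryEnv
  def initialize
    @lock_owner = nil
    @lock_until = nil
  end

  # build_commits: commits reachable from the build (a stand-in for a
  # real git ancestry check); master_head: current tip of master.
  def deploy(build_commits, master_head, user, now: Time.now)
    unless build_commits.include?(master_head)
      raise ArgumentError, "build is behind master; rebase and rebuild"
    end
    if @lock_owner && @lock_owner != user && now < @lock_until
      raise ArgumentError, "environment locked by #{@lock_owner}"
    end
    @lock_owner = user
    @lock_until = now + CANARY_LOCK_SECONDS
    :deployed
  end
end
```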
A
This is kind of a deep infrastructural question. The real reason is that, for us, it takes probably around 10 minutes or so to spin up a new instance, because we run Chef from scratch on a blank-slate instance, and that takes some time. So we just want to make that process fast; we want to have the boxes ready to go. That's kind of why we do the deploy process
A
the way we do, anyway. A lot of teams, you know, companies, might spin up a whole new set of workers for a deploy, but that would take too long for us; we just want to reuse the same instances. This might be optimized at some point in the future, but for now the system is working okay. All right.
B
How are we... yeah, so, okay, wait: the question is how we use New Relic to measure error rates.
B
Yeah, maybe I misunderstood the question. So the way we use New Relic is: they have this gem that works really well for Rails applications. If you just put it in your Gemfile and, you know, add some configuration (they have a nice default configuration file), put in your license key, maybe tune things a little bit, it'll just automatically report errors to New Relic.
B
Oh, I see, just exceptions. So any request that five-hundreds to a user because of some exception the code raises goes to New Relic, so those are actual errors that the user encounters. We also have a separate exception tracker, with more detailed backtraces, that lets you filter by controller and action and all that stuff.
B
We also include errors in the exception tracker that the code rescues: you can manually send an error to the exception tracker just so you have more information to debug with, without impacting the user experience. But if the New Relic error rate itself goes up, that means we're seeing a significant change in the user experience.
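The rescue-and-report pattern just described looks roughly like this. The tracker object is a stand-in for illustration (with New Relic's Ruby agent, the equivalent manual call is NewRelic::Agent.notice_error), and the failing service is simulated:

```ruby
# A minimal stand-in for an exception tracker client.
class Tracker
  attr_reader :reported

  def initialize
    @reported = []
  end

  def notice_error(error)
    @reported << error
  end
end

# Rescue a failure from a non-critical dependency, report it so there is
# context to debug with later, and degrade gracefully instead of 500ing.
def fetch_recommendations(tracker)
  raise "recommendation service timed out" # simulated flaky dependency
rescue => e
  tracker.notice_error(e)
  [] # the user just sees no recommendations
end
```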
So it's well worth alerting on. Cool, thanks a lot. We'll be here afterwards if you want to come and talk to us some more. Thank you.