Description
John Jarvis talks about deployer and patcher for GitLab.com
Before we begin, I just wanted to say: if you could, ask questions in the meeting agenda. There are also some great questions from the last presentation we did earlier, so maybe read those as well, so we don't ask the same things.

As an introduction: I'm John, a senior SRE on the delivery team, working on deployment and patching. This presentation serves as an overview of what we've been doing in the last month to improve deployment and patching.
We are pretty selective about the changes that we make, and this means there's a bit of manual work right now for the release team, the release managers, and the delivery team to choose which changes go into a release. We then move these release candidates through our pipeline, which right now is: staging, then QA, then canary and canary QA, and then production. I also added a bullet here that mentions that not all of our release candidates make it to production.
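As a rough illustration, that stage ordering could be expressed in a GitLab CI pipeline definition like the following; the job names, stage names, and scripts here are made up for illustration, not taken from the actual deployer project:

```yaml
# Hypothetical sketch of the deployment stage ordering; the real
# deployer's .gitlab-ci.yml will differ.
stages:
  - staging
  - staging_qa
  - canary
  - canary_qa
  - production

deploy_staging:
  stage: staging
  script: ["./deploy.sh staging"]

qa_staging:
  stage: staging_qa
  script: ["./run-qa.sh staging"]

deploy_canary:
  stage: canary
  script: ["./deploy.sh canary"]

qa_canary:
  stage: canary_qa
  script: ["./run-qa.sh canary"]   # QA run against canary only

deploy_production:
  stage: production
  when: manual                     # promotion to production is a deliberate step
  script: ["./deploy.sh production"]
```

Because each stage must pass before the next starts, a failing QA job stops a release candidate from progressing, which matches the point above that not every candidate reaches production.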
What we actually release is the Omnibus package, and this is the same package that we give to the self-managed community. It's a six-hundred-megabyte Debian file for Ubuntu. Of course, we build this package for all different types of platforms, and it contains everything you need in one package to run GitLab: the Rails code for GitLab, plus Workhorse, Gitaly, GitLab Shell, Pages, the container registry, Sidekiq, and GitLab Monitor.
It also contains other binaries that you might need to run GitLab, like PostgreSQL, Redis, Prometheus, and Alertmanager. For GitLab.com, what we do is install this Omnibus across the entire fleet and then selectively enable or disable the services that each node needs. One caveat: while we still use the Omnibus Redis right now, since we moved our HA database solution to Patroni we stopped using the Omnibus Postgres, and we have our own deployments of Prometheus and Alertmanager.
In addition to these monthly releases, we also have security releases, and those have to be backported to the last three releases; that's something that happens every month. We'll also be talking a bit in this presentation about what we do for patches. When I talk about release patches, what I mean are the post-deployment patches that we create when the code has already been deployed to production and we find an incident or a security issue.
So here is the deployment pipeline. You can see that it's broken into environments: we have staging, we have canary, and we have production. What we've done in the last month is transition off the previous deployment tool, called takeoff, to a new tool called deployer. It leverages GitLab CI/CD and allows us to create a pretty flexible pipeline for deploying the Omnibus. One nice thing about the new deployer is that it's flexible enough.
It goes all the way from staging to production. Each environment has pre-checks that look for things like critical alerts, health checks on the load-balancer tier, version mismatches, and so on. Also, at each environment we've added QA smoke tests that run a set of tests using GitLab QA at the end of each stage. For the canary stage, what we do is run GitLab QA with the canary cookie set, so that it exercises just the canary infrastructure.
So, basically, when we deploy to a fleet of machines, there is a sequence of operations that happens. To start off, a pipeline is initiated with two variables, the deploy version and the deploy environment, and then we deploy to a lot of different fleets, sometimes in parallel, and we do that in batches of 10%.
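As a sketch, the two inputs that kick off such a pipeline might look like this; the variable names and values here are assumptions for illustration, not the deployer's actual ones:

```yaml
# Hypothetical pipeline-trigger variables for the deployer:
# one selects what to deploy, the other selects where.
variables:
  DEPLOY_VERSION: "12.9.202003092150-abc123.def456"  # made-up Omnibus version string
  DEPLOY_ENVIRONMENT: "gstg"                          # e.g. staging
```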
We can also, when we initiate this pipeline (I mentioned this earlier), pass in multiple environments, which would create a longer pipeline that goes, say, from staging through canary, but we don't currently do that. Now, every Omnibus installation goes through the same sequence, which I've illustrated in the diagram. This is the deploy playbook, the definition for deploying to a single fleet: it works in ten-percent batches, and we first drain connections from HAProxy.
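A minimal Ansible sketch of that per-fleet sequence might look like the following; the role variable, helper scripts, and group names are hypothetical, and the real playbooks are more involved:

```yaml
# deploy.yml (illustrative): roll through one fleet in 10% batches,
# draining each node from HAProxy before upgrading it.
- hosts: "{{ deploy_role }}"
  serial: "10%"                    # upgrade at most 10% of the fleet at a time
  tasks:
    - name: Drain connections from HAProxy
      command: /usr/local/bin/haproxy-drain {{ inventory_hostname }}  # hypothetical helper
      delegate_to: "{{ item }}"
      loop: "{{ groups['haproxy'] }}"

    - name: Install the new Omnibus package
      apt:
        name: "gitlab-ee={{ deploy_version }}"
        state: present

    - name: Re-enable the node in HAProxy
      command: /usr/local/bin/haproxy-enable {{ inventory_hostname }}  # hypothetical helper
      delegate_to: "{{ item }}"
      loop: "{{ groups['haproxy'] }}"
```

The `serial: "10%"` keyword is what gives the batching: Ansible completes the whole task list for one batch of hosts before starting the next.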
A
It's
gotten
a
bit
better
in
the
last
month,
because
we
have
new
tooling.
That
makes
it
one
it's
a
little
bit
more
self-service
for
developers
and
too
it's
automated.
You
see,
I,
see
I,
see
I,
see
D,
which
means
that
we
don't
have
to
manually
patch
the
fleet
anymore.
If
you
do
need
to
create
a
post
deployment
batch-
and
this
is
usually
for
an
s1
or
s2
incident.
What you do is submit an MR to the patches repo. It's versioned: there are directories in there for each release version. You can test the patch on staging from your branch, and then, once you merge it, it gets deployed in a pipeline all the way to production. I put some bullets here that explain the benefits of this new patching tooling; for example, applying patches used to be entirely manual.
Now patches are only applied through CI/CD. Also, and this is fairly recent, patches can be reapplied automatically on new deployments. So if we're deploying a new version of the Omnibus and we need to keep a patch in place (a fairly unusual situation for us), the new deployer tool handles it. Another nice benefit is that the patching tooling shares the same configuration with the deployment tooling, so it's all using the same configuration code, which makes things a bit cleaner. I put a link here.
This slide describes a little bit about the deploy tool's development. We chose Ansible because it suits this idea of orchestrating SSH across many different fleets of servers very well, and it's batteries-included, meaning it ships with a lot of the things that we typically do, like installing packages and dealing with HAProxy.
This gives you a brief summary of the different projects, the repositories that we use for the deployer and patcher. There are essentially three projects right now that have pipelines: the deployer, the patcher, and the registry. We started these projects just to contain a .gitlab-ci.yml, and that's pretty much it; they have Git submodules pointing at the actual tooling repository that contains the Ansible code. And then there's also the patcher project, which contains not only the .gitlab-ci.yml but also the post-deployment patches themselves.
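A thin wrapper project like that might contain little more than a .gitlab-ci.yml along these lines; the submodule path and job definition are assumptions, though `GIT_SUBMODULE_STRATEGY` is a real GitLab CI variable:

```yaml
# Hypothetical .gitlab-ci.yml for one of the wrapper projects: pull in
# the shared Ansible tooling as a Git submodule and run it.
variables:
  GIT_SUBMODULE_STRATEGY: recursive   # fetch the tooling submodule in CI

deploy:
  stage: deploy
  script:
    - cd ansible-tooling               # assumed submodule directory
    - ansible-playbook deploy.yml
```

Keeping the Ansible code in one submodule shared by all three wrapper projects is what lets the deployer and patcher stay in sync on configuration.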
I wanted to add some links here for future improvements. One thing that we're working on now is directing all requests made to the gitlab-com and gitlab-org groups to canary by default, with the ability to opt out. The advantage of this is that it allows us to test more internal traffic on canary. It also allows us to test traffic to other services, like Git over HTTPS and the registry, that use the gitlab-com or gitlab-org paths. This does not mean that all GitLab traffic is going to go to canary; it just means that we'll have the ability to send more internal traffic there.
We're also looking at daily deployments to staging, and we have a proposal in flight for adding a one-box environment: essentially, before we deploy to all of production, we can deploy to a single node in each cluster, which gives us a little more data and metrics before we decide to promote the release to the entire fleet. And I've linked here to all the things that we're doing to reduce the number of manual steps during releases.
So we essentially deploy to fleets by their Chef roles, and in the Ansible config you specify a role. This is actually done in the .gitlab-ci.yml: for each of the jobs we set a role, that role is passed to Ansible, and it ends up in the hosts list for the deploy. It begins with a deploy.yml file, and then that maps to a bunch of hosts.
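That role plumbing could be sketched like this, with assumed job, variable, and role names on both sides:

```yaml
# In the wrapper project's .gitlab-ci.yml (illustrative): each job
# pins the role it deploys to and hands it to Ansible.
deploy_web:
  stage: deploy
  variables:
    DEPLOY_ROLE: gitlab-web          # hypothetical Chef role name
  script:
    - ansible-playbook deploy.yml -e "deploy_role=$DEPLOY_ROLE"

# In deploy.yml, the role becomes the hosts pattern, which Ansible's
# inventory resolves to the machines carrying that Chef role:
# - hosts: "{{ deploy_role }}"
#   ...
```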
Are there any other questions? Otherwise we'll just end early. I can take the next one. So I have two questions. The first is about rollback: I'm glad to see that we have a naming convention for rollback now. Are we still doing a roll-forward mechanism, or is it a true rollback? What does that look like?
There's actually an open issue for handling rollbacks a bit better than we do right now. For now, there are two kinds of rollback. One is for post-deployment patches: to roll back a post-deployment patch, you just change the file extension to .rollback, and then the pipeline will roll back the patch. The other type of rollback is when we've actually deployed an Omnibus and we need to roll back the Omnibus itself.
For that, what we want to do is deploy the previous version gradually across the fleet to ensure that it's done safely. If we ever get all the way to production, past the post-deploy migrations, and we have to roll back, usually we're trying to roll back as fast as possible. But we are looking into safer ways to roll back.
Thank you. My last one is around post-deployment patches that involve a database migration. For context, we cleaned up some database discrepancies last quarter. Are there any processes that make sure that if a post-deployment patch has a database migration, that migration makes its way into master, so that we don't have database schema discrepancies between production and what's in code?
So if there's an S1 or S2 performance issue we need to address in production, we submit a post-deployment patch that involves a migration or a schema change in the database, and it goes into production. Are there any guardrails to make sure that the change makes its way back into master? We had to clean up some of this last quarter, and I think it's just a matter of making sure the process is there so that the change makes its way back into master, specifically for all schema changes. Yeah.
You're right, John, I think your question makes sense, because let's say we had to submit a merge request that had a database migration that actually changed the schema; we haven't actually run into that case yet. I think there was one case this last release where the approach wasn't right, and some of those columns were reset manually rather than in a migration. So it's a good question; I think we should open up an issue to figure it out.
Sounds good, and it kind of reminds me of the current issue with post-deployment migrations, because oftentimes we consider them the point of no return, yet they happen during our deployment pipeline for production. Post-deploy migrations may delete data from the database, like evicting a column in a table or something like that. So these are open discussions as well, in the context of rollback: when should those post-deployment migrations be run?