Description
John Jarvis presents the steps taken while building the pre-production environment
Slides: https://docs.google.com/presentation/d/15nWPLNRYvSjIdLr4NKI1OLHJedYAKDhbeV7tpnSWJ6w/edit
A
So this presentation is about creating the pre-prod environment. It also covers a little bit of material about what an environment is versus what an internal deployment is, and it goes into a little bit of why we need this environment to begin with. I'm starting off with this summary of what pre-production is, just to kind of give you a general overview. We need to build this environment because this year we're planning on trying to improve deployments in general, and these are the deployments that go to GitLab.com.
A
We are planning to make them more frequent and more reliable, and in order to do this we discovered that we really need a new environment other than staging, just because there's so much contention for staging when we're validating fixes. So initially this new environment, this new deployment of GitLab, is going to be used for testing fixes that are made after we deploy to production: perhaps security fixes, maybe fixes for regressions. But it kind of remains to be seen exactly how we're going to use it.
A
But in a way this is just going to be something that runs alongside staging that we can use for validation before we go to production with code. There are a few differences from both staging and production. Initially this is just going to have a fresh database; it doesn't have any customer data on it. It's completely separate from production and doesn't share any infrastructure. This means it's in its own GCP project, and we're managing access with Google OAuth. Also, initially, for this first iteration,
A
it doesn't have any HA, but we are building it with HA in mind, to be added later. To start off, I'm going to go through the different environments, starting with just internal deployments, and I wanted to begin this slide by talking about single installations, because these are quite easy. One of the things I really like about GitLab is that we make it extremely easy to install on a single server, and that's thanks to the Omnibus package.
A
Despite the high number of moving parts and the complexity, really all you need to do is an apt install or an RPM install, and you have a running instance. This, of course, has some shortcomings. One is that it's a single instance, so it doesn't have any high availability.
A
Of course there are other things, like the logs are on the instance itself, so you need SSH access. There is monitoring, but it's just for the web application; it doesn't monitor everything that you might need to monitor. All of the assets are stored locally on disk, so if you want to use object storage, there's some additional configuration that you'll need for that. So this is where we introduce environments. The first tier of environments introduces these sorts of additions, and these are services provided by the infrastructure team.
A
You have, you know, Chef-managed servers, you have centralized logging and monitoring, runners that are connected and managed by the infrastructure team, object storage, and central authentication. I put these two as the first examples because they're the simplest: dev and ops, which I'm sure everyone, or at least most people, have used.
A
In the next tier I'm putting this new environment, pre-prod, pre.gitlab.com, and it introduces a bit more: it has a bastion, it has some HA, and you can also do ChatOps deployments. And then the last tier of environments is production and staging. Of course, production and staging are extremely similar in topology, although they do have some differences, like the number of instances is much smaller for staging. In some cases the instance sizes are different, but we typically keep them the same. They have separate databases.
A
Staging, of course, has production data in it, so that we can test against it before we go to production. And production has this DR environment attached to it, which is a Geo deployment that is currently being built out, and both of them have Canary. I'm not going to go into more detail here on what the Canary deployment is, or what the Canary stage or the Geo deployment are, but this is sort of the layout of environments that we have internal to GitLab.
A
So let's just summarize what an environment is. These are all services that the infrastructure team provides: you have Chef and Terraform, centralized logging, monitoring and alerting, authentication, access through a bastion. We have to set up network peering to the operations infrastructure so that we can do deployments. There's Rails and DB console access for developers, which we control. (Is this someone asking a question? No, okay.) And also, we can configure these environments with a topology very similar to GitLab.com.
A
So I'm going to talk here about the minimal viable product for pre-prod, and this was purposely done so that we can create this thing quickly and deploy to it as soon as possible. So it's very simple, or as simple as it gets when it comes to environments: we have a GitLab server, we have a runner, we have a bastion, and we have monitoring infrastructure, which includes Prometheus, the Alertmanager, and some exporters.
A
What this doesn't include are some of the things that we may or may not add later, and we probably will, but those are in the gray box: load balancing, having a dedicated HA Patroni database and Redis clusters, and separate Gitaly servers. I guess the thing to keep in mind here is that we've built out kind of the basic environment, and we can add to this later as we need it.
A
That's pretty much it. So now I'm going to talk about what we need to do to create this very simple environment, broken down into a few bullets. We're going to start off with Terraform: it's about 600 lines of Terraform config. That sounds like a lot, but it actually isn't that much, because we use shared modules, and this is just a copy and paste from other environments.
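As a rough illustration of what "shared modules plus copy and paste" means in practice, here is a minimal sketch of instantiating a shared fleet module for the new environment. The module path, variable names, and values are hypothetical, not the actual GitLab config:

```hcl
# Hypothetical sketch: a new environment reuses a shared module and only
# overrides the handful of values that are unique to it.
module "web" {
  source = "../modules/gitlab-fleet"   # shared module, usage copied from staging

  environment    = "preprod"
  project        = "gitlab-preprod"    # the environment's own GCP project
  machine_type   = "n1-standard-4"     # instance size, unique per environment
  instance_count = 1                   # no HA in the first iteration
  subnetwork     = google_compute_subnetwork.web.self_link
  chef_role      = "preprod-base-web"  # OS-level config is handled by Chef
}
```

Most of the 600 lines would be blocks like this, one per fleet, which is why copying from another environment works.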
A
There are some things that are unique to the environment, obviously, like instance sizes, instance counts, subnet allocation, and which fleets we're deploying to, but more or less it's just a subset of staging or a subset of production. Another thing is that we have to do some Chef configuration, and this usually involves creating new Chef roles for the OS-level configuration that is specific to that environment. This includes things like endpoints.
A
It includes any kind of labels and things that we need for Prometheus, and I broke these down into the monitoring infrastructure, the bastion, and the GitLab configuration, which fills out the gitlab.rb file. And then I just put a note here that we don't use Chef for the runner, because it's deployed in GKE, which makes it a bit easier.
A
The other things we need to do: we need to create a new ChatOps command for deploying to it. This could be a little bit more generic, but right now we have a separate option for each environment, so we needed to do that. We needed to create a deployment pipeline, and this involves just editing the .gitlab-ci.yml for the deployer. I'm not going to go into too much detail there.
A
But basically you make the changes to the .gitlab-ci.yml, and that allows us to have these stages for deploying to the environment. And then you have some other things like documentation and monitoring. Of course, we need runbooks, we need a dashboard, which is a manual thing for right now, and notes on where to find logs and things like that.
A
So what I did here is I broke this down into days, to kind of give you an idea of how long it took and what needed to be done day by day. It took six days to do, and my manager asked me what year it is. I said: oh, it's 2019, because he thought it was like 2015 or 2005 or something. But it doesn't matter which year it is, because it's just ridiculous that in 2019 it takes so long to build this environment.
A
I'm going to try to explain here why it took this long, and then at the very end I'm going to go through the improvements I think we could make in the short to medium term to make this a little bit easier for the next person. Day one: of course, you create the project in GCP. You have to enable all of your APIs and quotas. This is all very manual, and it's not easy to automate. And then you create your initial MRs for the Terraform config.
A
This just creates all of the base infrastructure. Day two: I created all the Chef roles, and then the first big milestone is provisioning one server in Terraform and ensuring that it can boot and register properly with Chef. There was an issue here that we had to work through, because there was a problem with td-agent that took a little bit of time to debug.
A
Day three is where you have most of the infrastructure up: ensure that the application is configured, make sure you can log in, and then finally you have a full run of Terraform without any errors. Everything is up and running, and now you're just down to the last bits of configuration. On day four I decided just to destroy everything and reprovision it, because I wanted to make sure that it could come up from scratch.
A
And then there are some manual things, like I had to create OAuth credentials for logging into pre.gitlab.com, basically the OAuth credentials that allow anyone with a given ID, like an email address, to log in, and for the Prometheus servers as well, since we have OAuth in front of those. Day five: ensure that the runners are configured and connected to the GitLab instance. This is pretty easy because it's all in GKE. I committed a runbook that shows you how to do it.
A
This basically follows our instructions, but makes it even a little bit easier for our specific use case. I do this a lot, and it's very simple to just create a new runner cluster on GKE. We had to configure the peering between the ops environment and pre-prod, which we talked about earlier. Of course, I realized after this that there were overlapping subnets, and I had to resolve that.
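The peering itself is only a couple of Terraform resources; a hedged sketch (the network names are invented), with the caveat just mentioned that it only works when the private ranges on the two sides don't overlap:

```hcl
# Illustrative VPC peering between the ops network and pre-prod. Peering is
# symmetric, so it is declared once from each side, and it will not come up
# if the peered networks have overlapping subnets.
resource "google_compute_network_peering" "ops_to_preprod" {
  name         = "ops-to-preprod"
  network      = google_compute_network.ops.self_link
  peer_network = google_compute_network.preprod.self_link
}

resource "google_compute_network_peering" "preprod_to_ops" {
  name         = "preprod-to-ops"
  network      = google_compute_network.preprod.self_link
  peer_network = google_compute_network.ops.self_link
}
```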
A
We also have to generate SSL certificates, because we're connecting to pre.gitlab.com and, of course, we don't have a wildcard certificate, so we have to generate a certificate, update it in the secrets, and reconfigure. The registry also needs its own SSL cert, and then you just need to go through everything and confirm that all the configuration is correct. So, you know, we have a lot of files that are stored in object storage, and they each have different buckets.
A
You have to make sure that they're all working properly. Day six: this is granting access so that anyone can log in, creating the deployer pipeline, creating the ChatOps command, creating a new Slack channel for that ChatOps command, creating a dashboard for the pre-prod environment just to have some overview, and then testing the deployment, which involves using GitLab ChatOps, so you can see the command there.
A
We run the deployment, it creates this nice little CI pipeline, and then, if everything works, everything is green and you can move on. And then on the seventh day, I guess, you can rest, or you can think about how you can make this whole process better, or maybe question your life choices, because it is a lot of work to get it working.
A
I did include a detailed log of everything I did, including all of the issues I ran up against, for the next person. It's way more detailed than most people are probably interested in, but for the SREs, if you want to see it, feel free to access it at that link. So I came up with a list of things I would like to automate to make this completely self-serve, and it's quite a list.
A
I mean, there are a lot of things here, and I think we can maybe rethink how we do these environments in general to make this list shorter. But there are a few things that I think I would address right away, and I put them here as the top three. In GCP, having a GCP project that is dedicated to an environment is not always the right choice.
A
I mean, GCP comes with a lot of overhead for creating projects: it requires you to enable APIs, it requires you to adjust quotas, and projects can't be disposable. So we would probably just want one project for all of our environments if we were going to create lots of them.
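Part of that per-project toil could at least be captured in code; a sketch of pre-enabling a project's APIs with Terraform (the service list and project ID are illustrative), although quota increases would still have to be requested by hand:

```hcl
# Illustrative: enable the APIs a new environment project needs, so the
# click-through step is codified. Quota adjustments remain a manual request.
variable "enabled_services" {
  default = [
    "compute.googleapis.com",
    "container.googleapis.com",
    "storage-api.googleapis.com",
  ]
}

resource "google_project_service" "enabled" {
  for_each = toset(var.enabled_services)
  project  = "gitlab-preprod"   # hypothetical project ID
  service  = each.value
}
```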
A
The second bullet: using Chef to maintain the hosts inventory just doesn't scale, and I think most of the SREs have realized this; it's something that we've been talking about a lot. If we switched to using a central Consul cluster for managing our hosts inventory, with some reasonable, or pretty strict, conventions on how we name things, then I think a lot of the configuration where we either have to use raw IP addresses (because we don't have internal name resolution across projects) or where we have to use hostnames
A
is just going to be easier, because we can do DNS lookups against Consul. Having a more consistent bootstrap definitely would help, too. We do a lot when we bring up a new server, which includes starting from the base OS, so there's a lot of opportunity for things to go wrong during provisioning. Using images would just be a massive improvement, and this is something that is currently being looked at in that epic.
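A sketch of what image-based provisioning could look like (the image name, and a pre-baked build pipeline producing it, are assumptions, not an existing setup):

```hcl
# Illustrative: boot from a pre-baked image instead of configuring a stock
# OS from scratch, so far fewer steps can fail during provisioning.
resource "google_compute_instance" "web" {
  name         = "web-01-preprod"
  machine_type = "n1-standard-4"
  zone         = "us-east1-c"

  boot_disk {
    initialize_params {
      # Hypothetical image produced ahead of time by an image build pipeline.
      image = "gitlab-base-ubuntu-1804-v20190601"
    }
  }

  network_interface {
    subnetwork = google_compute_subnetwork.web.self_link
  }
}
```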
A
I also came up with some other improvements. These are lower-priority things, but: we do manual private subnet allocation, because we have to ensure that subnets don't overlap for peering, and this is something that we could probably manage a bit better. For logging, everything that is not staging or production shares the staging indices in Elasticsearch, so maybe we could come up with something better there. And the ChatOps deploy could be a bit more automated and generic.
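One way the subnet allocation could be made less manual, sketched here with made-up ranges, is to carve every environment's range deterministically out of a single supernet, so two peered environments can never collide:

```hcl
# Illustrative: derive one /16 per environment from a shared 10.0.0.0/8,
# so ranges are allocated by index and can never be hand-picked to overlap.
locals {
  environments = ["gprd", "gstg", "ops", "pre"]

  subnet_by_env = {
    for i, env in local.environments :
    env => cidrsubnet("10.0.0.0/8", 8, i)   # 10.0.0.0/16, 10.1.0.0/16, ...
  }
}
```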
A
And then for the database, I would say, maybe if we had a lot of environments and we were bringing up a lot of them, having a shared database cluster and Redis clusters with multiple databases would help, because for the HA configuration, I really just don't see us automating setting up a Patroni cluster and tearing it down quickly. It's just something that's a little too high-touch. So I would say having HA infrastructure that we can share across multiple environments would make that a lot easier.
B
Just... I was...
A
So currently for pre-prod the registry is running on the instance locally, just like it does on dev and ops, which means your registry images will go to object storage, but the registry service itself is on the same instance that the Rails app is running on. To connect to the registry for pre-prod, you just use registry.pre.gitlab.com. Pages is actually not enabled for pre-prod, but it's something that we can add; we don't have a separate fleet for the Pages service.
C
John, it's Mal from compliance. I had a question, maybe about slide 4. It looked like there would be some potential connections between these pre-prod instances and the ops network or the logging infrastructure. Are those, quote, production instances, those environments? (Yes.) Perfect, that's the slide. Yes.
A
Yeah, so we allow connections for that. It's fairly selective; we have specific subnets. The way it works is that you first create a peering, and then you use firewall rules to restrict which subnets can access which boxes on either side. So we allow incoming connections from the ops infrastructure for monitoring, because dashboards.gitlab.net needs to use the Prometheus server there as a data source.
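To make the shape of those controls concrete, here is a hedged sketch of that kind of ingress rule (the names, tags, and ranges are invented): the peering opens the route, and rules like this pin down exactly who can reach what:

```hcl
# Illustrative ingress rule: only the ops-side monitoring subnet may scrape
# the pre-prod Prometheus servers; nothing else is admitted by this rule.
resource "google_compute_firewall" "ops_dashboards_to_prometheus" {
  name    = "allow-ops-dashboards-to-prometheus"
  network = google_compute_network.preprod.self_link

  direction     = "INGRESS"
  source_ranges = ["10.250.0.0/24"]   # hypothetical ops monitoring range
  target_tags   = ["prometheus"]      # tag carried by the monitoring fleet

  allow {
    protocol = "tcp"
    ports    = ["9090"]               # Prometheus HTTP port
  }
}
```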
A
So on the Prometheus subnet we allow connections from dashboards.gitlab.net. For the release runner, and for operations like maintenance, we have SSH from the release runner, so we allow the release runner to have pretty much all access to all of the internal IPs on all the subnets, but it's limited to the release runner. I'm not sure if I'm answering your question, though.
C
So my initial concern was that generally, when you have these types of pre-production environments, seeing those connections with production infrastructure, there are potentially some issues. If you're saying that it's based on firewall rules, then there are certainly some kinds of controls that we could potentially validate (and Jeff, let me know if I'm capturing this correctly) to make sure that there is no bleed or leakage. Yeah, I think the directionality, the direction of those connections, yeah.
C
Yeah, okay, perfect. I just wanted to make sure that I was seeing that correctly. That's just kind of a to-do for you and your group and the compliance team, to actually fully vet that out so that we have a full understanding, so that we can either suggest additional controls or say: nope, this is the way it is, and then speak to it in terms of an audit. So thanks.

A
Cool, I appreciate it. Any other questions?