From YouTube: Delivery: Canary promotion and deploy rollback
Description
J. Jarvis, J. Skarbek, and M. Jankovski discuss deploy promotion between non-production and the production canary, and the rollback process.
A: All right, I wrote up some notes in the doc in 161, and obviously one of the big items that we need to finish first, before we get to that issue, is the scheduled deploys: automating the scheduled deploys to staging. But that can't really move until we decide how we are going to promote things to canary, I would say, because us just automatically deploying to staging will not bring any real benefit.
A: I think there is already a situation where staging is not sufficient for us; I think we will probably have to have another non-production environment. But to keep it simple, let's try and think about how things would look when we are happy with what staging is actually producing when we deploy the newest nightly package.
B: So personally, I don't think I have any less confidence in a nightly build off master than I do in the RCs. To be honest, I feel like it's probably the same amount of risk as far as deploying to staging goes. I'm not saying we shouldn't have another environment, but I feel pretty comfortable deploying nightlies to staging, at least as comfortable as I do deploying RCs to staging.
B: Well, I guess it's a bit early to consider it; we need to get the staging deploys going first, but I would say it comes shortly after we do these staging deploys. I would say we just do canary, you know, and I think once you roll back canary, you wait until the fixes are on staging and then you promote again.
A: That's right; what we are doing there is, we have a stable branch and we create an RC out of only that. So this is where my other proposal comes in, where we would have a simpler way for people to know what version, or what SHA, all of the components are running from. So even if we go and roll back canary — say we found an item and we want to roll back canary —
A: — we don't bring in more items accumulating, right? You know how many commits are going in daily, so it would not be unexpected for us to not stop the train, but at least say: well, we have evidence about these 15 items, and that one item was broken. So how about we create a version of GitLab that will deploy with just that fix?
B: I think the two questions then are: where in the pipeline do we create the stable branch — or a branch for which we say we're only taking small incremental changes to fix regressions — and how long that time window is. Those are the two questions. Right now the answers to those two questions are: when we deploy to staging is when we do it, and the time window is from the 7th to the 22nd.
A: The window is one day. We say that we want to deploy to staging earlier, right, and we kind of agree — well, I don't know whether we agree, but the discussion is that everything that is on staging today will be on canary tomorrow, right? At the same time, we want to promote the previous day's work. So say that the item we are deploying to canary tomorrow has, for some reason, a broken item. We are already on day 2 on staging, right?
A: You need to backport this into this temporary new stable branch — whatever you want to call it, a temporary branch containing only this fix — so that we can build off of the head of that branch, right? It contains everything that we already deployed plus the fix, and we can do a fast promotion to canary. Or maybe we have another environment in between that will only be used for these types of things. But the idea being: we already know that the package worked.
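
A minimal sketch of that backport flow, assuming the fix already landed on master as its own commit; the branch name and the plain git calls are illustrative, not the team's actual tooling:

    import subprocess

    def run(*cmd):
        subprocess.run(cmd, check=True)

    def cut_hotfix_branch(deployed_sha, fix_sha, branch="temp-canary-fix"):
        # Branch from exactly what canary was running, not from master's head.
        run("git", "fetch", "origin")
        run("git", "checkout", "-b", branch, deployed_sha)
        # Bring in only the regression fix; it still lands on master separately.
        run("git", "cherry-pick", fix_sha)
        # Push so CI can build a package off the head of this short-lived branch.
        run("git", "push", "origin", branch)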
A: Please, let's make sure that we don't use the same terminology as right now, because I think that's possibly going to confuse us. Right now, "stable branch" is for us a synonym for a long-running frozen branch, and I don't want to do that. I want to be always branching off master and having those short-lived branches, right?
B: I guess we may differ a little bit on the design. I was actually thinking we would be a bit more aggressive and do this on the production stage, not the canary stage. So canary would continue to get daily snapshots of master, and we would actually do post-deployment patches on prod until we are comfortable with canary, and then we would just promote it again, which would bring with it a whole bunch of other changes, right?
A: I think that would be a big jump. I think it might make sense to do that at some point, but I think we can't be that aggressive at the moment, given — I wouldn't say all the unknowns, but we've seen quite a lot of different things, and maybe because we've seen quite a lot of those different things we need to... I wouldn't say this is playing it safe; this is definitely not playing it safe. This is more being cautious in how we promote things, I would say.
A: Now, about the set of boxes: what I found out is that we need to ensure that whatever is done in staging — or rather, whatever is inside of the package, inside of the release artifact — is not altered. So this is also the case for patches, right? Patches also need to be approved, because as soon as something changes the source code of our environment, it needs to go through additional approval.
B: So we'd branch at the commit, and then any critical bug or regression fix would not only go into master but also go onto this branch, and we would cut a new package. What I'm missing is what happens next. Do we promote that package? Does that package go directly to canary, or does it go to staging? Because now we're going to be — and this is why you said that we need another environment?

A: Yes.
B: Well, that's shared, okay. So we could have maybe one staging database, one staging Redis cluster, and then where we run into problems is Sidekiq, because there's really no way to isolate Sidekiq without namespaces, and there's no open issue for this, and I don't think it's going to be resolved soon.
B: So the current proposal is that we deploy to canary — and this is where we create a branch — and then we take small incremental changes for critical bugs and regressions, limiting the amount of commits that go to prod. What I'm suggesting is that we do patches, like post-deployment patches, on production until we're ready to promote canary, which would have those fixes plus everything else. It just seems to me that if we're trying to get to full CD, we should be deploying from master.
B: I still don't think I understand how long this window would be. You said a day, but how does that work exactly? We would freeze — we would create the branch as soon as we deployed to canary, we would let it go to production and soak for a day, and then do incremental changes for a day, and then unfreeze? Okay.
A: So day one, purple is on staging; day two, purple is on canary; day three, purple is on production. Basically, from the moment we have this on staging to the moment we have it on production, it would be two extra days, allowing us one day on staging to actually find anything and stop the move, and one day on canary to do the same, right — because there's going to be way more traffic there and so on — and ideally, by the time this arrives on production...
A: You can expect that this is going to increase in volume as soon as we speed this up. The alternative is to do a slower rollout so that we can get a feel for what's going to happen — say, instead of deploying every day to staging, deploy every two days, then promote once a week to canary, and then promote onwards. But that's way slower, and I don't want to do that. I want to do faster deploys.
B: ...an environment for this gap where we want to take small incremental changes — basically the window where we want to examine incremental changes. What if, instead of going directly to canary, we go to a single-instance server — call it pre-prod, whatever — very low maintenance, like we don't have a full HA topology, and then we go to canary? Do you think we could sell that?
B: I think what makes a new environment expensive is the HA topology of Patroni plus Redis plus everything else. Make all of those a single server — that's cake. I mean, we could even do an HA database and a single server for everything else, and it would still be a lot less overhead. So yeah, I think that sounds like something we can sell.
A: To repeat, basically, best-case scenario: we have this graph that says on day one we deploy something on staging; on day two that same purple circle is deployed on canary and we verify nothing is broken on canary; on day three we have the same thing in production. So this is the best-case scenario, and that goes for every deploy — everything lands in production. Now, where things start to get complicated is if we didn't find something on staging: we are on canary and we found a regression that we think is going to cause a lot of pain for our users.
B: So, I mean — not to say we couldn't; it's better than nothing, and I think we can probably sell it. I don't think people understand this architecture well enough to complain; at least the people who are complaining don't understand it well enough. But I think, yeah, I think we should probably...
A: We do care about what data is there, because of the items. Theoretically, if all developers did their work correctly, they would have already noticed this breakage in their local data. The problem is staging brings in another set of data — a larger data set — that all of a sudden their item is breaking down on.
A: That means, if we don't do the security release quickly enough, we'll have a large backlog of things to deploy later, so we are back in the situation we have now. The alternative is: as soon as we have a security patch, we don't pause anything when it comes to deploys to staging, canary, and production, but we have security patches continuously applied until we have a release that we can publicize, and then master would receive the fix from the security merge requests.
A: But I'm afraid that if we pause the pipeline, we run into a situation where — say, for example, we had seven days of a security release, right? That means we have seven days of not deploying. This brings us back to our current situation of having things build up and then deploying a week or two weeks later, right? Once you actually unpause the pipeline, you get this torrent of things into your environments again. So pausing is something that we really, really should not be doing.
A: Honestly, I think a single codebase for GitLab would help immensely, like lower the difficulty level of this, because now you need to think about four repositories versus, I guess, not even two — just the one that you would be needing to use — because we have one on dev and one on .com.
B: Yeah. So this file-based metric that we use for the version dashboard is the parsed output of dpkg. It was kind of a joke when I did it, and now it's something we depend on. It's funny — I think it's a total hack: it's a cron job that runs dpkg, parses its output, and gives you the Omnibus version that's installed. It's a file-based metric.
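
A rough sketch of what that cron job amounts to; the package name and output path here are assumptions, purely illustrative:

    import pathlib
    import subprocess

    def record_omnibus_version(package="gitlab-ee",
                               out=pathlib.Path("/var/opt/omnibus_version")):
        # Ask dpkg for the installed version of the omnibus package...
        version = subprocess.check_output(
            ["dpkg-query", "-W", "-f=${Version}", package], text=True).strip()
        # ...and drop it in a file for the metrics exporter to pick up.
        out.write_text(version + "\n")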
A: I mean, you know that the whole world runs on hacks and jokes, right? If you take a look at /opt/gitlab, you have a version manifest — a JSON file — in the root of that directory. That one is obviously a JSON file, and it has all the versions that we'd ever want, with SHAs, with URLs. Okay?
A: So if we do that — Skarbek, I pasted this in the doc, so below the picture you can actually see how it looks — it has the locked version, which in our case is the version of the nightly deployed. So this is the version of GitLab Rails Community Edition, the code SHA that is deployed on GitLab.com at that point in time, and we have a similar thing for Gitaly and all the others.
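
A minimal sketch of reading that manifest; the key names follow the "locked version" wording above but should be treated as assumptions:

    import json

    def component_versions(path="/opt/gitlab/version-manifest.json"):
        # Map each bundled component to the exact version/SHA it was built from.
        with open(path) as f:
            manifest = json.load(f)
        return {name: info.get("locked_version")
                for name, info in manifest.get("software", {}).items()}

    # e.g. component_versions().get("gitlab-rails") -> the deployed SHA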
A: ...creating a ChatOps command — if we want to use ChatOps for this; if you want to use something else, I'm fine with that, but I'm spitballing here — a ChatOps command that will just tell us, you know: chatops run canary version, or chatops run staging version, right, to print out all of the SHAs that we have currently deployed.
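
A sketch of what such a command handler could look like; the per-environment manifest endpoints are invented for illustration, since in practice each node has the manifest on disk:

    import json
    import urllib.request

    # Assumed endpoints, purely illustrative.
    MANIFEST_URLS = {
        "staging": "https://staging.example.com/version-manifest.json",
        "canary": "https://canary.example.com/version-manifest.json",
    }

    def version_command(env: str) -> str:
        # Fetch the manifest for the environment, one line per component.
        with urllib.request.urlopen(MANIFEST_URLS[env]) as resp:
            manifest = json.load(resp)
        lines = [f"{name}: {info.get('locked_version')}"
                 for name, info in sorted(manifest["software"].items())]
        return f"Versions on {env}:\n" + "\n".join(lines)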
A: Once we have those two things, I think it's a matter of adding a schedule in a repository that will continuously trigger a build and deploy to staging automatically. This last one is two minutes of work; the previous one is a bit more. And you're already saying that we have things in place for pushing things to Prometheus — we just need to change from parsing apt to parsing a JSON file, yeah.
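
For the schedule itself, a pipeline schedule — or even a plain cron job hitting GitLab's pipeline trigger API — would do; the project ID, token, and ref below are placeholders:

    import json
    import urllib.parse
    import urllib.request

    def trigger_staging_deploy(project_id="123", token="TRIGGER_TOKEN",
                               ref="master"):
        # Kick off the nightly build-and-deploy pipeline for the given ref.
        data = urllib.parse.urlencode({"token": token, "ref": ref}).encode()
        url = f"https://gitlab.com/api/v4/projects/{project_id}/trigger/pipeline"
        with urllib.request.urlopen(urllib.request.Request(url, data=data)) as r:
            return json.load(r)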
A: Okay, and then we can start enabling this and start to see how things break. We can then pause it and continue with the current workflow that we have, right, with tagging and so on. But then, as we enable it again and figure out new breakages, we can try and introduce this. It's not going to be smooth, I expect.
B: I was sort of joking about this basic auth thing, but now that I think about it, you could also envision, for the staging environment, a ChatOps command that will allow us to turn it off and on and rotate the password. And, you know, when we have a security release, we then protect staging with a basic auth layer, and the password is different every time; you can ask ChatOps what the password is. That could work in place of the VPN — I think it probably would give us enough.
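
A toy sketch of the rotate half of that command; the secret store and how the proxy layer consumes the password are assumptions:

    import secrets

    def rotate_staging_password(store) -> str:
        # Fresh password on every rotation; never echo it into the channel.
        password = secrets.token_urlsafe(16)
        # `store` stands in for wherever the frontend proxies read credentials.
        store.set("staging/basic_auth_password", password)
        return "staging basic auth password rotated; use the reveal command to read it"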
A: Okay, so we basically say migrations are taken out of this requirement for now: we want to roll back, but without touching migrations. Okay, that makes sense. So basically, our rollback should just be simply rolling the nodes and then rolling back, literally — just making sure that the order is correct so that we don't cause more damage.
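
A sketch of that ordering; the drain/install/verify helpers are stand-ins for the real tooling, and migrations are deliberately absent:

    def drain(node): print(f"draining {node}")
    def install_package(node, version): print(f"installing {version} on {node}")
    def verify(node): print(f"health-checking {node}")
    def undrain(node): print(f"re-enabling {node}")

    def rollback(nodes, previous_version):
        for node in nodes:            # order matters, e.g. canary fleet first
            drain(node)               # stop taking traffic before downgrading
            install_package(node, previous_version)
            verify(node)              # health check before re-adding to the pool
            undrain(node)
        # No migration step on purpose: the schema stays at the newer version.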
B: I know you don't like it as much as I do, but I do kind of think we would be better off to delay the post-deploy migrations until the next deploy. I think the main issue with this is that you kind of don't wrap up a release until you're done with everything, right? And maybe I don't fully understand all your criticisms of this approach. Yeah.
A: They were known to cause a lot of load on the database, because they have been known to be unoptimized. So if we delay this — it depends on the delay, but let's say we delay it until the next release, whatever the next release is — we run into a situation where people have already moved on. That brings the context switch of understanding what's happening right now and how to fix it, and then this becomes another level of items that needs to be tracked.
A: What we're doing right now is: we do the deploy, then the post-deployment migrations. If something goes wrong, we need to handle it right then and there, and we need to fix the problem within whatever amount of time it takes to develop the fix. But it really depends a lot on what you consider the next release on production, because if we are talking about this continuous pipeline, that will mean it takes 2-3 days for things to arrive on production.
B: We'd change that so the first thing we would do is run the post-deploy migrations at the current version, then we would upgrade the deploy box, and then we would do the regular migrations. So basically, the first step would be to do post-deploy migrations at whatever version is currently running. How...
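
A sketch of that ordering as steps; the command strings are illustrative, and SKIP_POST_DEPLOYMENT_MIGRATIONS is assumed to be the switch gitlab-rails honors for deferring post-deploy migrations:

    import os
    import subprocess

    def deploy_with_reordered_migrations(new_version):
        # 1. Finish the outstanding post-deploy migrations at the version
        #    that is currently installed.
        subprocess.run(["gitlab-rake", "db:migrate"], check=True)
        # 2. Upgrade the deploy box to the new package (illustrative command).
        subprocess.run(["apt-get", "install", "-y", f"gitlab-ee={new_version}"],
                       check=True)
        # 3. Run only the new regular migrations; defer its post-deploy ones.
        env = dict(os.environ, SKIP_POST_DEPLOYMENT_MIGRATIONS="true")
        subprocess.run(["gitlab-rake", "db:migrate"], check=True, env=env)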