From YouTube: Discuss HAProxy downtime requirement
Description
A short discussion about the alternatives we have to taking site downtime for the HAProxy upgrade.
Notes (internal): https://docs.google.com/document/d/1v6kheVYq1ap4Xq6Kmnj0uKRelGTq5s-mRHZHGos7KLQ/edit#heading=h.dhp4sslmd35r
A
I have the first item, which is just to try to get a sense of why we need a downtime event for the HAProxy upgrade. I read through a little bit of the background here, and it seems to me that we need to change the node pool for the new nodes, and doing that is a destructive event. Does that sound right? Anything I'm missing there?
B
Yeah. So in the epic update that you linked, the assumption was that we just needed to replace the node pools. However, in the MR that you linked, I came to the realization that that also forces the replacement of the internal LB, the general internal LB, which means we cannot guarantee that the IPs stay the same. So we're not talking about something that might be disrupted for a fraction of a second while the node pool is switched over, but something potentially longer-lived, with DNS caching playing into the whole situation, which is why it's probably a more delicate change than I first anticipated.
A
Why... I don't recall, but is it that we don't set a static IP for the internal LBs?
B
It would be possible to rework it so that you can specify an IP object. However, because we're not already using a reserved IP, we would need to do the IP switch regardless.
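(For reference: a minimal sketch of what specifying a reserved internal IP could look like in GCP Terraform, assuming the module fronts an internal forwarding rule. All resource names and the subnetwork/backend references are hypothetical, not the actual module.)

```
# Hypothetical sketch only: reserve a static internal address so the ILB's
# IP survives a replacement, then hand it to the forwarding rule.
resource "google_compute_address" "haproxy_ilb" {
  name         = "haproxy-internal-lb"               # placeholder name
  address_type = "INTERNAL"
  subnetwork   = google_compute_subnetwork.main.id   # placeholder reference
  region       = "us-east1"
}

resource "google_compute_forwarding_rule" "haproxy_ilb" {
  name                  = "haproxy-internal-lb"
  region                = "us-east1"
  load_balancing_scheme = "INTERNAL"
  # Pinning ip_address means a recreated rule reuses the same address.
  ip_address            = google_compute_address.haproxy_ilb.address
  backend_service       = google_compute_region_backend_service.haproxy.id  # placeholder
  all_ports             = true
  subnetwork            = google_compute_subnetwork.main.id
}
```

As noted above, since the current setup has no reserved IP, adopting something like this would still force one last IP switch.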
A
Okay,
so
we're
not
using
a
reserved
IP
right
now,
no
I
see
that's
what
I
can
tell
so
we're
not
using
a
static,
IP,
I
wasn't
I,
wasn't
sure
and
honestly
like
this
was
Point
number
three
like
I
I,
don't
think,
there's
anything
that
depends
on
the
IP
for
the
internal
bouncer.
Are
you
aware
of
anything
yeah.
B
It's
just
the
DNS,
so
the
DNS
record
will
be
automatically
updated
by
by
terraform,
however
stuff
that
uses
the
internal
Loop,
also
such
as
Pages
Gallup,
shell.
You
know
Etc
if
that
caches
DNS
that
can
be
problematic,
yeah
yeah.
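(A sketch of the kind of Cloud DNS record Terraform would keep in sync; the zone and record names are placeholders. Lowering the TTL ahead of the cutover bounds how long well-behaved resolvers can serve stale answers, though, as discussed next, the GCP metadata resolver has been seen caching more aggressively.)

```
# Hypothetical sketch: the internal record that Terraform updates when the
# ILB's IP changes. A short TTL narrows the stale-cache window for resolvers
# that honor it.
resource "google_dns_record_set" "haproxy_internal" {
  name         = "internal-lb.example.internal."         # placeholder name
  managed_zone = google_dns_managed_zone.internal.name   # placeholder zone
  type         = "A"
  ttl          = 60
  rrdatas      = [google_compute_forwarding_rule.haproxy_ilb.ip_address]
}
```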
B
We've seen that the Google metadata service, which does the DNS in GCP, also really likes to cache DNS records aggressively. So the blast radius of how this is going to blow up is basically unknown.
A
And bouncing back up to point number two: are we only talking about the internal LBs as the issue? If we took away the internal LBs, we wouldn't have any downtime, right?
B
Yeah. So right now the change is split into two merge requests: the one that you've linked is 4951, and 4950 is the one for the other ones. Because those aren't using the internal load balancer module, we can just replace the node pool there.
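(For the fleets outside the internal LB module, a rough sketch of what a create-before-destroy node pool swap might look like; the resource choice and naming scheme are assumptions, since the actual module isn't shown.)

```
variable "node_pool_version" {
  type    = string
  default = "v2"
}

# Hypothetical sketch: a new name forces a replacement pool, and
# create_before_destroy brings it up before the old one is torn down.
resource "google_compute_instance_group_manager" "haproxy" {
  name               = "haproxy-${var.node_pool_version}"
  base_instance_name = "haproxy"
  zone               = "us-east1-b"
  target_size        = 3

  version {
    instance_template = google_compute_instance_template.haproxy.id  # placeholder
  }

  lifecycle {
    create_before_destroy = true
  }
}
```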
A
Okay, so moving on to point number four. Naively, based on my very surface-level understanding: can we just create a new DNS entry and then move everything over?
B
That part is probably easy. We'd then also need to change the configuration everywhere it's being used, and it's been kind of a pain to update as it is right now, because there are a few consumers.
A
I'm,
just
wondering
like
I'm,
just
wondering
whether
There's
an
opportunity
here
to
one
is
like.
If
we
don't
set
a
static
IP
for
the
internal
IPS
we
could
we
could
do
that.
We
could
fix
the
module
or
even
like
I
I,
don't
know
how
much
Croft
is
in
that
module
that
we
use
for
the
internal
LPS,
but
maybe
there's
an
opportunity.
We
can
clean
that
up
a
bit
just
provision
new
internal
lbs
with
a
new
DNS
and
then
and
now
we
have
them
both
running
in
parallel
right.
A
Internal
lbs,
new
internal
lbs
are
pointing
to
the
new
AJ
proxy
Fleet
old
internal
LPS
are
put
into
the
old
age
of
proxy
Fleet
and
then
and
then
we
just
move
the
DNS
over
I
mean
you're
right.
We
need
to
identify
everywhere.
We
connect
to
the
internal
lbs
and
change
that
configuration,
but
I
think
it's
not
going
to
be
too
too
bad,
like
I
I.
Think
it's
only
in
three
or
four
places
for
the
CI
stuff.
There
is
the
option.
The
CI
stuff
was
only
done
to
save
money.
A
We
could
even
disable
that
temporarily,
but
I
would
say.
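(A sketch of the parallel arrangement being proposed, assuming the internal LB module can be instantiated twice; the module path, its inputs, and the DNS names are all hypothetical.)

```
# Hypothetical sketch: old and new ILBs side by side, each with its own DNS
# name, so consumers can be cut over one at a time and rolled back cheaply.
module "haproxy_ilb_old" {
  source         = "./modules/internal-lb"                       # placeholder path
  name           = "haproxy-internal"
  instance_group = google_compute_instance_group.haproxy_old.id  # old fleet
  dns_name       = "internal-lb.example.internal."
}

module "haproxy_ilb_new" {
  source         = "./modules/internal-lb"
  name           = "haproxy-internal-v2"
  instance_group = google_compute_instance_group.haproxy_new.id  # new fleet
  dns_name       = "internal-lb-v2.example.internal."
}
```

Once every consumer points at the new name (or the old name is re-pointed at the new fleet), the old module instance can be removed.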
B
Like, after we've moved the CNAME over to the new one, and be like: okay, if something breaks, we...
B
I think, because when I looked into the Terraform stuff, there were a lot of, you know, policies, and also configuration for the CI environments and that kind of thing. So if we can just turn off CI, you know, the CI proxy usage, while we're doing the migration, and then turn it back on again afterwards, that's probably the easiest way to deal with CI.
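(One hedged way to express the temporary CI switch-off, assuming the CI-facing resources can be gated behind a boolean; the variable and record are illustrative.)

```
variable "enable_ci_proxy" {
  description = "Set to false during the HAProxy/ILB migration window."
  type        = bool
  default     = true
}

# Hypothetical sketch: CI-facing resources carry a count gate, so flipping
# the flag removes them for the migration and restores them afterwards.
resource "google_dns_record_set" "ci_proxy" {
  count        = var.enable_ci_proxy ? 1 : 0
  name         = "ci-proxy.example.internal."           # placeholder
  managed_zone = google_dns_managed_zone.internal.name  # placeholder zone
  type         = "CNAME"
  ttl          = 60
  rrdatas      = ["internal-lb.example.internal."]
}
```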
A
Obviously he knows everything, but yeah. If you don't get far with that, just let me know; I was also involved in it, not heavily involved, but involved enough that I can probably help. It would also be better... I mean, more people should know about the CI stuff.
A
This configuration is pretty opaque to SREs, I think, in terms of how it's set up. I'm not sure I even remember all the details, so maybe we could check the runbooks and get that stuff updated. And I was kind of serious, half serious at least, about this: if this internal LB Terraform module is really bad, we could maybe fix it in a way that's nicer, I don't know.
B
...there's a lifecycle prevent-destroy rule in place, which also makes the CI job fail for the merge request, because you need to manually hack in local copies of the Terraform files to make that work.
A
Cool.
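(The guard being referred to is presumably Terraform's prevent_destroy lifecycle rule; a minimal illustration, reusing the hypothetical forwarding rule from earlier, of why any replacing plan fails in CI.)

```
resource "google_compute_forwarding_rule" "haproxy_ilb" {
  name                  = "haproxy-internal-lb"   # placeholder, as above
  region                = "us-east1"
  load_balancing_scheme = "INTERNAL"
  backend_service       = google_compute_region_backend_service.haproxy.id
  all_ports             = true
  subnetwork            = google_compute_subnetwork.main.id

  lifecycle {
    # Any plan that would destroy (and therefore replace) this resource
    # errors out, which is why the MR's CI job fails until the files are
    # edited locally to drop the rule.
    prevent_destroy = true
  }
}
```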
A
Okay,
that's
pretty
much
all
I
had
so
yeah.
Let's,
let's
try
to
sketch
this
out,
see
if
it'll
work
and
if
it
does,
then
we
can
avoid
the
downtime
yeah.
B
I mean, we can probably sketch this out and also do it on staging first, because that's the proper way to do it. And if that works, then we can do it in production.
A
Oh man, that's all I have.