From YouTube: Episode 12: Hitless Deploys
Description
Join Yuval Kohavi as he dives into how to perform rolling deployment upgrades without any downtime using Envoy and Kubernetes.
About us https://www.solo.io
Questions? https://slack.solo.io
Code Samples: https://github.com/solo-io/hoot
Suggest a topic to cover here: https://github.com/solo-io/hoot/issues/new?title=episode+suggestion:
All right, so welcome everybody to another episode of Hoot, where we are going to discuss hitless deploys using Envoy and Kubernetes. We're going to talk about why we want to do it and why it's problematic; we're going to have a little slideshow and then a little demo. And once I do the demo, I actually have some questions for our viewers about how you want to see demos going forward, so please stay for the demo so we can get that answered for you.
We do these episodes every two weeks, talking about cloud stuff, and without further ado, let's get started. So I prepared a little presentation here talking about hitless deploys, and again, if you have any questions, as per usual on Hoot, feel free to ask them in the chat and I will answer them live.
So let's start with our goal. What are we trying to accomplish, right? Our goal is to keep the user happy, and specifically, keep them happy when things are changing. So if we have a cloud setup with a load balancer sending traffic to an edge gateway (in our case, that's Envoy) that sends traffic to services, then every time we roll out a new service, or even when we upgrade the edge gateway Envoy itself, we want the user to keep getting 200 OK for their requests. We don't want the user to ever be sad; we never want a request to turn into a 500 instead of a 200, right?
And in our case, that's going to be an Envoy proxy that then sends requests to a Kubernetes service. So when we talk about making sure there are no hits — that everything keeps flowing — we can ask: what can each layer do? The user, or the app, can perform several retries
if a request fails, before showing an error to the user. The cloud load balancer can perform health checks on the edge gateway, to know which instances of the edge gateway it can use; and the edge gateway, in turn, can perform health checks and retries on the services, to know which instances of the service it can use. I'm going to go into a bit more detail, show a mistake — a common anti-pattern — and then what we can do to make it right. So let's follow a request.
A request comes in from the user: it reaches the cloud load balancer, and the cloud load balancer sends it to our edge gateway in our Kubernetes cluster. In our case we're talking about Envoy, so the cloud load balancer needs to know the health state of Envoy. The problem is that cloud load balancers are usually not Kubernetes-aware; they're usually designed to work with virtual machines or auto-scaling groups.
We can expose our edge gateway as a NodePort service and configure our cloud load balancer to send traffic to the Kubernetes NodePort. What's the problem with this approach? The problem is that the cloud load balancer, when it health-checks, does not see Envoy. It sees all these Kubernetes nodes and their NodePort, and each health request that gets to a node in Kubernetes can reach any of the Envoy instances, because they're distributed kind of randomly.
That's also not going to work, right — let me go back and remove that X — because when the load balancer sends a health request to node 1, it will get the health status of, randomly, either the Envoy
on node 1 or the Envoy on node 2. So it makes the health information useless, and again, that's because the cloud load balancer is not Kubernetes-aware. So essentially, if you're using a NodePort to expose your edge gateway to your cloud load balancer, you cannot just naively use health checks, because they will mix and match across the different pods and essentially become meaningless. So what do we do, then, if we're not going to do that? You have several options in GKE.
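As one illustration — a minimal sketch based on general Kubernetes behavior, not necessarily the exact GKE option used in the episode — setting externalTrafficPolicy: Local ties each node's health status to the Envoy pods actually running on it:

    # Sketch: expose the edge gateway so the cloud load balancer's health
    # checks correlate with the Envoy pods on each node. With
    # externalTrafficPolicy: Local, kube-proxy only routes to pods on the
    # local node, and Kubernetes allocates a healthCheckNodePort that
    # reports whether this node has a ready Envoy pod.
    apiVersion: v1
    kind: Service
    metadata:
      name: envoy-edge        # hypothetical name
    spec:
      type: LoadBalancer      # or NodePort, depending on your cloud setup
      externalTrafficPolicy: Local
      selector:
        app: envoy            # hypothetical label
      ports:
      - name: http
        port: 80
        targetPort: 8080      # hypothetical Envoy listener port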
So that solves the health-correlation problem with the cloud load balancer. Another thing you need to do is configure the health check filter in Envoy — and we'll show how to do this in the demo — so that Envoy replies to health check requests.
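For reference, a minimal sketch of that filter in Envoy's v3 API (the /health path matches the demo narration; the surrounding listener configuration is omitted):

    # Sketch: Envoy's HTTP health check filter. With pass_through_mode:
    # false, Envoy answers /health itself instead of forwarding it to an
    # upstream service.
    http_filters:
    - name: envoy.filters.http.health_check
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.health_check.v3.HealthCheck
        pass_through_mode: false
        headers:
        - name: ":path"
          string_match:
            exact: "/health"
    - name: envoy.filters.http.router
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router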
All right, so up until now — if there are any questions, feel free to ask. Is everybody seeing the presentation okay? Feel free to drop a line in the chat, say hi,
tell me if that makes sense, or if you want me to go a bit deeper or anything. All right, so that's kind of the short version of the hitless-deploy health checks and retries between the load balancer and the edge gateway. If we mess up these health checks, the load balancer might mistakenly think that one of the nodes is not healthy and will route less traffic to it.
Or the opposite: if the load balancer in this setup thinks that node 1 is not healthy and sends requests to node 2, they can still make it to the Envoy on node 1 that's coming down, and the user will still get 500s. That's why we need this strong correlation between the health checks performed by the load balancer and the Envoy pods. That's essentially the goal we're trying to achieve here. All right.
Something has to make Envoy aware of that, and Envoy will not be aware of it by default, because Envoy is kind of infrastructure-agnostic. It just does the data path, and it lets the control plane give it the details it needs from the infrastructure.
In addition, you've got to remember that in a distributed system, each component's view of the system state is eventually consistent. So if a pod tears down, it takes time for this information to propagate to Envoy. Even if you have a control plane that takes the Kubernetes information and sends it to Envoy, you still cannot rely on this happening instantaneously. And let me — I have this animation.
I gained some animation skills here to kind of help relate this. All right, so let's talk about a failed attempt at how we might do a rollout of another pod in this deployment. Let's say we have Envoy, and Envoy's view of the state of the cluster is that it has two endpoints, say 10.0.0.1 and 10.0.0.2. Now,
what we want to do is a rolling deploy: deploy another pod and remove one of these pods in our deployment. So now we deploy this v2 pod with IP 10.0.0.3, and we delete the previous pod as part of the rolling deployment. Now, because the system is eventually consistent, there's going to be a little bit of time until this information propagates to Envoy. It might be half a millisecond, but it might be more.
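For context, the rollout itself is just a standard Kubernetes rolling update; a minimal sketch (names, image, and probe values are hypothetical, not taken from the episode):

    # Sketch: Kubernetes starts the v2 pod, waits for it to become ready,
    # then terminates a v1 pod -- but, as described above, Envoy learns
    # about the removed endpoint only eventually.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: service-1              # hypothetical name
    spec:
      replicas: 2
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxSurge: 1              # start one new pod before removing an old one
          maxUnavailable: 0        # keep the full replica count serving
      selector:
        matchLabels:
          app: service-1
      template:
        metadata:
          labels:
            app: service-1
        spec:
          containers:
          - name: app
            image: example/service:v2   # hypothetical image
            readinessProbe:
              httpGet:
                path: /health
                port: 8080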
The idea is kind of like when you're building two parts and you want to put them together: you don't want it to be too tight of a fit, because then they might not work; you want a little bit of wiggle room. And that drain period is the wiggle room. So the pod, instead of terminating immediately, hangs around for a bit, fails health checks to let dependent components know that it's about to go down, and after that drain period is over, then it exits.
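A minimal sketch of giving a pod that kind of drain period in Kubernetes (the sleep duration and grace period are assumptions, not values from the episode):

    # Sketch: pod-level drain period. On termination, Kubernetes runs the
    # preStop hook first; the pod keeps running (and failing health checks,
    # if the app flips them on SIGTERM or on preStop) so that load
    # balancers and Envoy stop sending it traffic before the process exits.
    spec:
      terminationGracePeriodSeconds: 30   # must exceed the drain period
      containers:
      - name: app
        image: example/service:v2         # hypothetical image
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "15"]    # crude drain: hang around for a bit
        readinessProbe:
          httpGet:
            path: /health
            port: 8080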
That's the drain period from the pod's side. Now, a service mesh can automate some of this — it can actually add health checks that fail automatically — so that's also something to explore: if you don't want to rewrite all your application code everywhere, maybe a service mesh can help you, but, you know, do your own research on that part. All right, so let's talk about what we're going to see in this short demo.
I'm basically going to demonstrate Envoy talking to two services and then a deployment rolling out, and after that we'll review the Envoy configuration and see how we made it work with hitless deploys.
And so first, let me review this kind of service environment that I've created.
So what we're going to do here: first, we're going to start service 1, and service 1 returns 200 — just so we can tell the difference between it and the other services — and we're going to let
it run in a second terminal here. Someone asked to make the demos more realistic with Kubernetes — thank you for that — and I'll try to make the next ones more cluster-based, with YAMLs and Kubernetes and Kind.
So now what I want to show is: we're going to run hey and issue some requests. You can see service 1 is responding with 200 — everything is good, nice. All right, so now I'm going to just hit enter here, and we're going to see what happens when service 2 is deployed and service 1 is being torn down without any draining.
Then we do the same thing to service 2, but this time the new service has draining. You can see it's deploying, publishing the endpoints, and service 3 is now deployed. Let's revisit that metric — and you can see the metric did not increase. It's still the original hundred 500s; we did not get additional 500s when we deployed the new service. Now, all this code is in our GitHub repo — there's a link in the description, github.com/solo-io/hoot — so you can look and play around with it.
Before we look at the Envoy config, one more thing: when we talked about the load balancer part, I said Envoy can expose health checks on its own. So let's look at these health checks. It's configured to use /health, and you can see that it returns HTTP 200 OK. Now what we can do is fail it: I'm going to fail its health checks, and the way we do it is we go to Envoy's admin endpoint and, with a POST request (to /healthcheck/fail), flip the health status of Envoy itself.

All right — if there are any questions on what you just saw, please feel free to ask now, and in the meantime I will review the Envoy configuration that I used in this demo. It starts out pretty normal, and things get interesting right here. The first thing you can see that's different from a normal, vanilla config
is a retry policy defined on the virtual host, and what I'm telling it is to retry on all of these network errors. The common thing with these errors is that they're usually not application-specific; they usually come from some timeout or some network error — you know, connection refused, connection reset — and not from a bad request. So these are network errors, and I grabbed that list from the Envoy documentation; in the repo's README there's a link to the documentation where it appears. And essentially it will do five retries, and make up to five attempts to select a healthy host to retry to.
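A minimal sketch of that virtual-host-level retry policy (the exact retry_on list in the repo may differ; cluster and route names are hypothetical):

    # Sketch: retry on network-level errors, up to 5 times, and make up to
    # 5 attempts to pick a host that hasn't already failed this request.
    virtual_hosts:
    - name: service
      domains: ["*"]
      retry_policy:
        retry_on: "reset,connect-failure,refused-stream"  # network errors, not bad requests
        num_retries: 5
        retry_host_predicate:
        - name: envoy.retry_host_predicates.previous_hosts
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.retry.host.previous_hosts.v3.PreviousHostsPredicate
        host_selection_retry_max_attempts: 5
      routes:
      - match: { prefix: "/" }
        route: { cluster: some_service }                  # hypothetical cluster name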
And I see we have a question in the chat: how does pool ejection work? I'm going to get to that. Circuit breaking is something a little bit different, and we're not going to cover circuit breaking here; we are going to cover, in just one second, health checks and outlier detection, which are in the cluster definition down below.
Okay, so this is the first part: the retry on network errors, and that's defined in the virtual host, or at the route level, inside the HTTP connection manager configuration. That's one part. Then, in the cluster definition: because it's a small number of endpoints, I disabled panic mode — otherwise the demo wouldn't really work correctly — but we'll ignore that for now; if you want to learn more about panic mode, feel free to read about it in the Envoy docs. There are two sections — two
parts of the configuration that are interesting in a cluster: there's passive health checks and active health checks. Outlier detection is the passive health checks, and what that means is that Envoy observes requests going to a certain host and observes the responses, and if Envoy sees three consecutive 500 response codes, it considers that host not healthy. It's called passive because for this to work there needs to be traffic — there needs to be traffic from the user going to that upstream.
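A minimal sketch of that passive health checking on the cluster (the "three consecutive 500s" value follows the narration; the ejection timings and the panic-mode tweak mentioned earlier are illustrative assumptions):

    # Sketch: outlier detection (passive health checks) plus disabled
    # panic mode, as discussed in the demo.
    clusters:
    - name: some_service                  # hypothetical cluster name
      outlier_detection:
        consecutive_5xx: 3                # eject after 3 consecutive 5xx responses
        interval: 1s                      # how often ejection sweeps run
        base_ejection_time: 10s           # how long an ejected host stays out
        max_ejection_percent: 100         # allow ejecting every host if needed
      common_lb_config:
        healthy_panic_threshold:
          value: 0                        # disable panic mode for this small demo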
Right — and first I want to say that the first part is a generic part that all health checks have in common: unhealthy threshold and healthy threshold. Those are thresholds for how many failed or successful health check requests it takes before
we change the health state of an upstream IP address — of a pod. Then there's the interval at which to do health checks, and the interval at which to do health checks if there's no traffic — you might want a longer interval there, to reduce the load on the cluster; I set it low, again, for this demo to work more reliably. And timeout is basically the timeout for the health request.
For the HTTP health check itself, I only defined the path. So with this configuration, approximately every second — there's a little bit of jitter involved, but approximately every second — Envoy will send an HTTP request to the upstream at /health; if it gets a 200 it considers it a success, and if it gets a 500 it considers it a failure. And these are all tunable — the status codes, everything — you can review that in the Envoy documentation.
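A minimal sketch of that active health check (the one-second interval and /health path follow the narration; the thresholds and no-traffic interval are assumptions chosen for a demo):

    # Sketch: active health checks on the cluster.
    clusters:
    - name: some_service                  # hypothetical cluster name
      health_checks:
      - timeout: 1s                       # timeout for each health request
        interval: 1s                      # ~every second, with some jitter
        no_traffic_interval: 60s          # check less often when there's no user traffic
        unhealthy_threshold: 1            # failures before marking a host unhealthy
        healthy_threshold: 1              # successes before marking it healthy again
        http_health_check:
          path: /health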
So to summarize: there are two means of health checking.
All right, I know that's a lot of input to take in. I'll do one final review of this config and then we'll wrap it up. In this Envoy configuration we implemented two mechanisms — two or three — to increase reliability and enable hitless deploys. One is a retry policy on network errors, and the other is health checks. Here I defined both passive health checks, also known as outlier detection, and active health checks. You may not want both of them; you may want just one or the other.
It really depends on your use case. What this allows you to do, essentially — assuming your downstream pod... sorry, assuming your upstream pod knows to properly signal and propagate health information as you tear it down — is it allows Envoy to process this information and stop sending traffic to the pod before it's completely gone. Now, in the case that something happened — the pod crashed, or maybe you cannot change the drain period because you don't control the application pod...