From YouTube: Contour demo - Envoy Shutdown Manager
Hi, I'm Steve Sloka. I work on Contour, and today I want to talk a little bit about a new feature we have in Contour called the shutdown manager. The shutdown manager is a new subcommand of Contour that lets you manage the lifecycle of Envoy.
If you're not familiar, let's go ahead and review the architecture for Contour. Here's a quick overview. Contour is an ingress controller for Kubernetes: it takes traffic from outside the cluster, and its job is to route that traffic into the cluster. The routing component that makes that happen is Envoy. Envoy is our data path component, so it handles all the actual routing of traffic on the wire. Contour's job is to be a controller, or a configuration server, for Envoy: it builds configuration based on what it sees in the cluster and passes that configuration off to Envoy.
We can manage the deployment of Contour and change its version independently of the Envoy version, but at some point you're going to have to change Envoy: roll out a new version, do an upgrade, or make some other change to it. The problem we want to solve here today is to be able to roll out that set of Envoys in a way that the users out there sending requests don't even notice you're changing it. We don't want them to see downtime, or any kind of error that's due to us rolling out an infrastructure change that shouldn't be apparent to them. The shutdown manager is going to help us do that work.
So I've hopped back here to the projectcontour.io website, and here's the documentation on how to redeploy Envoy. Here's the general overview of what we want to accomplish when we want to roll an Envoy and change it.
First, we want to stop accepting new connections to that Envoy, so we need a way to stop directing new traffic to it. Once we do that, we want to start draining connections, and we can do that by telling Envoy to fail its health check: we send a POST request to Envoy's admin interface, to the /healthcheck/fail endpoint. Once we do that, Envoy will start to drain connections, and after that we basically just have to wait.
The first thing the shutdown manager does is serve a liveness probe, which is just there to verify that the shutdown manager itself is healthy. Kubernetes' liveness probe will call this endpoint; in our example we're using /healthz. It basically asks, "hey, are you healthy?", and the shutdown manager replies, "yes, I'm healthy." If it ever doesn't, Kubernetes will go ahead and restart that container for us.
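As a rough sketch, here's what that sidecar can look like in the Envoy pod spec. This is modeled on Contour's example daemonset; the image tag and the 8090 listen port are assumptions, so check the deployment YAML that ships with your Contour version.

```yaml
# Sketch of the shutdown-manager sidecar in the Envoy pod spec.
# Image tag and port are assumptions, not canonical values.
containers:
- name: shutdown-manager
  image: docker.io/projectcontour/contour:v1.5.0   # hypothetical tag
  command: ["/bin/contour"]
  args: ["envoy", "shutdown-manager"]
  livenessProbe:
    httpGet:
      path: /healthz      # the liveness endpoint described above
      port: 8090          # assumed shutdown-manager listen port
    initialDelaySeconds: 3
    periodSeconds: 10
```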
The second thing we want to do is implement a preStop hook.
In Kubernetes, the preStop hook lets you intercept the shutdown before the container gets the SIGTERM. The classic example for this is a database: if I'm running a database and I get a signal saying "hey, shut down," then before I do that shutdown I want to make sure I clean up my connections, commit any transactions I have in memory, and get all of that cleaned up before I actually shut down. So the preStop hook lets the container decide when it's actually ready to go shut down. There are two gates around this. The first is that the container decides when it replies to the hook's HTTP GET request. The second gate is terminationGracePeriodSeconds, which is roughly the maximum time a pod will sit in Terminating before Kubernetes forcibly kills it. So regardless of how healthy we are or how many connections we've drained, if we hit that terminationGracePeriodSeconds limit, Kubernetes will just forcibly get rid of the pod.
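Here's a sketch of how those two gates show up on the Envoy container. Again this is modeled on Contour's example daemonset; the /shutdown path, port 8090, and the 300-second grace period are illustrative assumptions.

```yaml
# Sketch of the two shutdown gates on the Envoy pod.
# Path, port, and grace period are illustrative assumptions.
spec:
  terminationGracePeriodSeconds: 300   # gate 2: hard ceiling before a forced kill
  containers:
  - name: envoy
    image: docker.io/envoyproxy/envoy:v1.14.1   # hypothetical tag
    lifecycle:
      preStop:
        httpGet:               # gate 1: the kubelet blocks on this request
          path: /shutdown      # assumed to be served by the shutdown-manager sidecar
          port: 8090
```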
So the first thing that happens when we go through the shutdown procedure is we get a signal from Kubernetes that it wants to terminate the pod. The kubelet says, "hey Envoy, you're going to shut down," and it calls the preStop hook for the Envoy container.
When that happens, the shutdown manager gets that same hook, and it sends a POST request to Envoy's admin interface saying, "hey Envoy, go unhealthy and begin draining connections." As soon as this happens, the readiness probe on Envoy, which is the way Kubernetes determines whether a pod is ready or not, will start to fail, and that stops new connections coming in from outside the cluster. Once that happens, Envoy starts draining.
What the shutdown manager does then is poll for open connections on some defined interval. It looks at Envoy's Prometheus metrics endpoint, at the TLS and non-TLS listeners, and checks how many connections are open. Once that count meets a configured minimum, the shutdown manager knows Envoy has drained enough connections, so it replies to the preStop hook the kubelet sent initially and tells Kubernetes, "hey, we've drained the connections; we're good to shut down."
You can tweak these settings to figure out how long you want to wait. For you it might take ten minutes for connections to drain; it might take an hour. So we'll look at how you can turn these settings up and down to really match your scenario. All right.
So let's go ahead and do this for real. I've got a cluster running, and in my cluster we actually ship some sample dashboards for this.
This is the generic dashboard you get from Contour that looks at all the Envoy metrics, and what you see here is the Envoy open connections panel. In my cluster I've got three different Envoys on three different nodes, and right now I have no traffic going to them. So what I'm going to do is generate some traffic.
Once we have traffic running to all the nodes, we're going to go ahead and change the Envoy daemonset. We'll just change something in it to have it roll out, and then we can watch a pod go unhealthy, wait for its connections to drain, and then proceed on. I have a generic load test set up here; let me pull it up and show you.
This is a tool called bombardier, and what it's going to do is send some connections: a thousand connections for five minutes, just hammering our service with requests. So let's pop that over here and go ahead and create our job. Okay.
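For reference, a load-test Job along these lines might look like the sketch below. The Job name, image, and target URL are all assumptions; bombardier's -c and -d flags set the connection count and run duration.

```yaml
# Hypothetical Job wrapping bombardier: 1000 connections for 5 minutes.
apiVersion: batch/v1
kind: Job
metadata:
  name: bombardier-load-test               # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: bombardier
        image: example.com/bombardier:latest   # placeholder; any image with the bombardier binary
        args: ["-c", "1000", "-d", "5m", "http://my-app.example.com/"]  # hypothetical target
```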
Right, and we'll do a watch on getting our pods. Here you can see I've got my three Envoy pods, which are matched up here in the dashboard, and what we can see now is there's traffic going to each one: I have a thousand connections total, and each Envoy is getting 300-some connections. So what we'll do is go ahead and edit the daemonset.
So let's edit our Envoy daemonset; we'll change something in it to cause the rollout. This could be a new image or something like that, but I'm just going to change one of the probe settings to make it go. Let's change this periodSeconds here from 3 to 4. Again, this is a silly change, but it's enough to demonstrate the rollout. So I'll write that out.
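Concretely, the edit is a one-field change in the daemonset's pod template, along these lines. The probe path and port are assumptions taken from Contour's example manifests.

```yaml
# The trivial daemonset edit that triggers the rollout.
# Probe path/port are assumed from Contour's example manifests.
readinessProbe:
  httpGet:
    path: /ready
    port: 8002
  periodSeconds: 4    # was 3; any pod-template change rolls the daemonset
```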
So what should happen now is one of the pods gets that shutdown signal: the preStop hook gets called, the shutdown manager goes and sends the unhealthy state to Envoy, and here you can see it went unhealthy. One of its two containers is ready, so the Envoy pod is unhealthy. Now no new traffic is going to target that Envoy, and what we should see is its connections draining.
You can make these thresholds larger or smaller, and there's a delay before it starts polling while it waits for things to settle. And there you go: it must have found zero open connections, so it responded to the preStop hook, and now the pod got terminated. That one went down, and now we can see the new pod spinning up. It has to pass its readiness probe; once Envoy goes healthy, it'll get its configuration from Contour, and then things will chug along.
So we should see this pod here, the jjptq one, go away, and then we should see the new one, kkrsd, come up. And there's kkrsd, and you can see it's getting traffic now. The next one to go down is cn8zq, and it's the same story: we should see all traffic go to zero on this one, and once it goes to zero, we'll be able to go roll the next one.
Right now it's waiting for the connections to drain; we're polling through that. Again, there are different parameters down here that we can tweak. We can tweak the check interval, which by default is every five seconds. The check delay, which I talked about, is the time delay before it starts to poll; that's at a minute by default. And right now my minimum open connections is zero, so it's going to wait until all the connections drain to zero. In your environment, that may not be zero.
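Those knobs are flags on the shutdown-manager subcommand. A sketch with the defaults from the talk spelled out is below; the exact flag spellings are assumptions, so verify them against the subcommand's --help output for your Contour version.

```yaml
# Hypothetical tuning of the shutdown-manager sidecar.
# Flag spellings are assumptions based on the parameters named in the talk.
args:
- envoy
- shutdown-manager
- --check-interval=5s          # how often to poll Envoy's open-connection count
- --check-delay=60s            # delay before polling starts
- --min-open-connections=0     # drain target; may be >0 in your environment
```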
And that's this kkrsd pod. Cool, that one's going; this one just spun up thirty-three seconds ago, so that was the second one. Now here's the third one that's going to terminate; once this one goes to zero, it'll finish. And we have this log here: we did a five-minute run with a thousand connections, so we're probably getting close to that five minutes, and then we'll be able to check back and see how many requests we sent and what the error rates were.
The tricky part with doing this is that it's difficult to really get 100% of a rollout clean; there's always some chance you might still get errors from your users. But again, the job of this is to help minimize those rollout problems. All right, cool, so that one went to zero; it's getting terminated, and our new pod will spin up here in a second.
Okay, there comes the new one, t4hf2; it pops in there, and again traffic gets spread out across all three of those. Now once that gets balanced out, our job will finish here. Okay, cool, it did finish. So now you can see that for that five-minute run we sent 5.4 million requests, they were all 200-level responses, and we had zero 400s or 500s in that whole scenario. So again, that was a simple test: it's a small app, and a very stateless app.