Description
In this tutorial, we'll set up a demo application and have it undergo some chaos in combination with load testing. We will then use Keptn quality gates to evaluate the resilience of the application based on SLO-driven quality gates.
Follow the tutorial: https://tutorials.keptn.sh/tutorials/keptn-litmus-08/index.html
Learn more: https://keptn.sh
Get started with tutorials: https://tutorials.keptn.sh
Join us in Slack: https://slack.keptn.sh
Star us on GitHub: https://github.com/keptn/keptn
Follow us on Twitter: https://twitter.com/keptnProject
Hi everyone, in this video I want to show you how to evaluate the resilience of your applications with LitmusChaos, Prometheus, and Keptn. For this, I'm following a tutorial from tutorials.keptn.sh. I will also link the tutorial in the video description, so make sure to check it out.
For this tutorial, all you need to do to get started is bring your own Kubernetes cluster. I brought my cluster from GKE. It has enough nodes and vCPUs: I came with a cluster with eight vCPUs and around 30 GB of memory, so I can install everything that is needed in this tutorial.
The first part is that we are going to install Istio; we will need it later on for traffic routing. I've already downloaded the Istio CLI here, so I can go ahead and install it on my cluster.
Here in my terminal I'm already connected to my cluster, so I just execute the Istio installer and let it finish the installation for a second.
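The Istio installation step boils down to a single command; a minimal sketch, assuming `istioctl` is on the PATH and `kubectl` points at the right cluster (exact flags can differ between Istio versions):

```shell
# Install Istio onto the currently connected cluster with the default
# profile; -y skips the interactive confirmation prompt.
istioctl install -y

# Verify the Istio control-plane pods came up.
kubectl get pods -n istio-system
```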
In the next part, we will download the Keptn CLI and then also install Keptn on our Kubernetes cluster. There are different ways to download the Keptn CLI.
I have chosen the first way, simply using curl to download the most recent Keptn version onto my local machine; then I'm going to install Keptn from my local machine into the Kubernetes cluster that I'm connected to.
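The curl-based download looks roughly like this; a sketch of the pattern used around Keptn 0.8 (check the tutorial for the exact command and version pin for your platform):

```shell
# Fetch and run the Keptn CLI installer script onto the local machine.
curl -sL https://get.keptn.sh | bash

# Confirm the CLI is installed and on the PATH.
keptn version
```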
Let's check back whether Istio is already installed. Not yet; we're still waiting for the installation to finish, but it should be done in a second. So I can already go ahead and copy this command, which will download the Keptn CLI onto my local machine, and then we're going to install Keptn into the cluster.
So let's download the Keptn CLI, which should just take a second, and in the next part we're going to install Keptn into our Kubernetes cluster. We will use the ClusterIP endpoint service type for Keptn; we won't expose it via a NodePort, because we will do that part with Istio. The exposure of the Keptn API and of the services that Keptn manages will be done with Istio, which will be our ingress in this example. The use case we are targeting with this installation is the continuous delivery use case.
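The installation described above maps to one CLI call; a sketch based on the Keptn 0.8-era CLI (flag names may differ in other versions):

```shell
# Install Keptn into the connected cluster, including the
# continuous-delivery execution-plane services; the API stays on a
# ClusterIP service, since Istio will expose it later.
keptn install --use-case=continuous-delivery
```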
Now let's install Keptn into the cluster. Let me just move this a little bit; we're going to do this by copying this command into our terminal. It will ask us whether we actually want to install it on the cluster (mine is called litmus-tutorial); I confirm with yes and it installs into the cluster, which will take about three minutes. Let's take a look at what is actually being installed here: we're choosing the continuous delivery use case.
That means we are not only installing the control plane of Keptn, which you can use, for example, for the quality gates use case, but we're also installing some services from the execution plane, like the helm-service and the jmeter-service. The first is responsible for deploying our application; the second one, the jmeter-service, is able to run load tests (JMeter tests) against the services that we manage with Keptn.
You can also install just the Keptn control plane and add the execution plane services yourself later on, or you can simply pass this use-case flag to the Keptn installation procedure. In my case I want to install Keptn including the execution plane services, the helm-service and the jmeter-service, as we're going to need them in this tutorial.
As I said, it will take a couple of minutes to finish. Let's take a look, and it's already done: "Keptn has been successfully set up on your cluster." Perfect. Let's move on and now configure Istio to be the ingress for our Keptn installation. For this we have provided a sample script; we just download it, make it executable, and then execute it.
In the first step I just downloaded the script; now I add the executable flag to it and run it. It will basically set up an ingress and a gateway for Keptn, and it will also restart the helm-service pod so that it picks up the new configuration we've just changed with the script.
The tutorial itself also describes what is really going on here, but we want to focus on the use case later on; I just want to walk you through this so you can see how long it takes to actually work through the tutorial.
So let's now connect our Keptn CLI to the Keptn installation that we just put on our Kubernetes cluster. For this, I'm copying these two variables, the Keptn endpoint and the Keptn API token; they will be fetched by these kubectl commands, so there is no need for me to know the Keptn endpoint and the API token myself.
We have provided some utility functions here, and in the next part I can now authenticate the Keptn CLI against the Keptn cluster. And here we go: I am now successfully connected to my cluster. Let's just open this, and here we are: I have access to the Swagger UI, which is our API documentation, and I can see it is the correct version of Keptn that has just been installed.
That's great! Let's move on. The tutorial also provides a couple of demo resources for you to use; we just download them from GitHub so that we have everything locally available on our machine. I think I already did this earlier, so let me remove the folder where I downloaded it and download it again.
The first one is our prometheus-service, which will be responsible for configuring and creating a Prometheus instance, and the other one is our Prometheus SLI integration, which is responsible for retrieving data from Prometheus. I'm installing both into the Kubernetes cluster.
This service.yaml has actually been downloaded from the repository earlier: in step number eight we downloaded the demo resources. We're now just going to apply the litmus-service, whose definition is stored in this folder. Let's run this, and since it is now created, let's take a look at all the pods that are already running in our Keptn installation.
The Keptn installation lives in the keptn namespace, and we can see a couple of pods running, including the litmus-service that is now part of our Keptn installation. Perfect. Now let's go ahead and create a project. We want to create a project based on this shipyard definition. This definition has one stage, the chaos stage, and one delivery sequence, in which we first want to do the deployment, then a test, and then an evaluation.
A
So,
let's
sorry
one
copy,
let's
first
copy
this
line,
and
now
let
us
create
a
project
here
we
go
project
successfully
created.
I
have
not
yet
defined
a
git
upstream
for
this
repository.
So
right
now
everything
is
stored
in
the
git
repository
managed
by
captain
locally
inside
this
kubernetes
cluster,
which
is
fine
for
the
moment.
Let's now onboard our service; we call it the helloservice, and we have the Helm charts already defined in our demo resources repository. So we're just going to create this service here in our demo Kubernetes cluster, and in the next step we are going to add some JMeter tests: first a load test, and then some configuration for how we want to execute this load test. I'm copying both lines and adding this to the Keptn installation as well.
Now let's take a look at the SLO file, helloservice-slo.yaml. What we can see here is that we have defined two SLOs. What's really important are the criteria in the warning and, in particular, the pass properties of our SLIs, and here specifically for the probe duration: if the probe is faster than 200 milliseconds, we get the full score for this criterion, and if not, we get zero points in the quality evaluation.
Okay, so now let's add our chaos experiment to the Git repository that is managed by Keptn. Here, too, the experiment is already defined for you to use; we just copy this and execute it, so we are adding the experiment YAML from our local machine onto our Kubernetes cluster.
One final step for the setup: we want to configure Prometheus for our litmus project and our helloservice application, and we also want to add the Prometheus blackbox exporter. We do this because our application is actually going to be removed: the pod in which our application is running will be deleted by the chaos experiment, and we need some way to check the availability of the service from the outside, because the service won't be available for a couple of seconds. We're going to do this with the blackbox exporter.
So first we run keptn configure monitoring prometheus, which will set up Prometheus and also alerting rules for Prometheus. We won't use the alerting rules in this example, but usually that is what this command does. In the next part, we're going to install our blackbox exporter.
The hello from the demo web application already shows up here, and we can see that currently the litmus-service and the jmeter-service are running. The litmus-service is sending the instructions to LitmusChaos to start the chaos experiment, and the chaos experiment that we added to our Keptn-managed repository is the pod-delete experiment, so LitmusChaos will go ahead and delete the application pod.
With this we can make sure that the application we are testing is actually under some load. That is important because we don't want to evaluate the resilience of our application when there is nothing going on: if it is not under load, we would measure the impact of chaos in an idle situation. We want to evaluate the impact of chaos in a real-world setting, where there is actually some traffic fired against our application.
While I was talking, both the jmeter-service and the litmus-service finished their jobs, and we can already see that the evaluation scored zero points. We can even look at why it scored zero points: clicking on this little icon here, we see the probe success percentage was only 65, so the service was reachable for only 65 percent of the evaluation period, which is around two minutes.
So what does that mean? The application could not respond to all the requests that were sent by the Prometheus blackbox exporter, and it would likewise not be able to respond to all the requests sent by real users.
So here is what we want to do now: let's say we want to increase the replica count of this application. Instead of running only one instance, let's run three instances of this application. That means that if our chaos experiment deletes one of those instances, two instances should still be available and should be able to serve the traffic that JMeter fires against our application.
Everything that we just saw has also been recorded here in the run-experiment step, where we can see that the experiment failed. So let's now increase the resilience of this application, and let's take a look at the deployment event that we want to send to our Keptn installation.
The important part I want to focus on here is the replica count. By sending these instructions, a CloudEvent holding all the information Keptn needs to trigger a new delivery sequence, we change exactly that: we are using the same image as before and just changing the replica count to three.
That means we will now run three instances of our application. Before we do this, let me open another tab here so we can watch the actual application, how it will be scaled up and how LitmusChaos will inject the chaos experiment.
So here: kubectl get pods, running in the litmus-chaos namespace. Right now we have one instance of our application running. Sorry, here we go. Now let's do a keptn send event for our helloservice with the deployment event I just showed you. Keptn will pick up this event and trigger the Helm integration to start the deployment, and we can already see it is now scaling up to three instances.
Two are not ready yet; they are not yet available because there is a readiness probe defined as part of this application, and it will take 30 seconds for each pod to initialize before it can serve traffic. So after around 30 seconds, we should have two more ready instances available. Here we go: one is already available, and the other one should come up quite soon.
Once these are available, Keptn will be informed that the deployment is finished, and it will go ahead and trigger the next phase, which is now the actual chaos experiment. We can already see the pod-delete experiment has started; a chaos runner managed by LitmusChaos has started as well, and one of those three application pods is then deleted by the chaos experiment. We can see exactly this one is now terminating and another one is coming up.
This pod here will still take around 30 seconds to be fully ready again, since the readiness probe is running again, but the other two instances were able to serve the traffic all along. So in this run of the experiment we have not deleted all the instances of the application: we now have three instances of it running and we only deleted one of them.
The other two should be able to respond to all the requests and serve the traffic, and we will see in the evaluation how this actually impacts the resilience of the application.
We just saw that the LitmusChaos-managed resources, the chaos runner and the experiment itself, are now finished; they have been removed from the namespace again, and everything has been reported back to Keptn. So let's take a look at the Keptn Bridge.
Here we go: the delivery succeeded in this case, and we can see (let me just remove this) that the deployment finished successfully, both tests have actually finished, and the evaluation has now scored 100 percent of our quality score.
So let's take a look: as we had hoped, the success percentage is now 100. Instead of killing all the pods, we just killed one pod. We also did that earlier: we killed one pod, but back then it was the only pod.
Only one instance of our application was available then. We did the same thing now: we deleted one instance of our application, but the other two were still able to serve the traffic. So the success percentage of all the probes we sent to our application was 100, and due to the high availability, with at least two replicas available to respond to the requests, we could also satisfy the probe duration criterion of being faster than 200 milliseconds in response time.
So with this, we have found a way to increase the resilience of this one-microservice application. There might be other ways, but for our tests we did it by scaling up the application, making it, let's say, highly available, with three instances of this application running instead of one. Every time one instance crashes, there are still two other instances that can serve the traffic.
We can also see this here. You might get some other numbers; it really depends on your setup and on the sizing of your cluster which numbers you end up with. But nevertheless, we can see that we did one run that actually failed the quality gates: we identified that our application is not resilient and should not be moved to production.
We fixed this issue by scaling the application up to three instances of the same application and having the traffic shifted between these three instances, and we did another, successful run of our delivery sequence with a deployment, a test, and an evaluation. In this case, everything went fine.
So thanks for watching this short video. Again, the tutorial is linked below this video, and you will find all the tutorials at tutorials.keptn.sh.