From YouTube: Chaos Engineering WG - 2018-07-24
A
B
I'm here, this is Tammy from Gremlin. This is the first time I've been able to join. I've been traveling a lot, so the times didn't work, but yeah, I'm in San Francisco right now and it's great to be here. I do chaos engineering at Gremlin, and I previously did chaos engineering at Dropbox, where I got a 10x reduction in incidents and no high-severity incidents for 12 months. Before that, I did it at the National Australia Bank for many years, really since 2009. So yeah, great to be here.
A
Right, so I went out and did a pull request to the cloud native landscape. I started with four projects for which I could essentially find a high-quality SVG logo, a, you know, [inaudible] base, and also information on Crunchbase and so on, the different requirements that we have in the CNCF landscape. So I issued a PR. If there are other projects that you want to add, please let me know; I'm more than happy to add them.

I think we'll start with a flat structure first, and then later on we can go fight about how we want to subcategorize things, because we definitely had a bit of difficulty trying to figure out how to do that. So hopefully people are okay with this approach of starting simple and iterating, but I'd like to open it up to feedback from the group before moving on.
A
I guess silence means there's no disagreement. Have a look after the meeting. Okay, yeah. It's pretty straightforward: I just started with four, and we'll go from there. Eventually, once those are in the interactive landscape, we have a design team that could take that and break it apart into categories, but first I just want to collect the information out there.
A
The example is a good place to start. Oh yeah, I was getting a little frustrated after just working on things; I just needed to get something out there so we could get iterating. Cool. So yeah, I'll continue to do that and give you an update in a couple of weeks, but hopefully that should get merged in soon. In terms of community presentations...
A
C
I think Karthik and I are going to talk a little bit, quickly, about what Litmus is. We're still in the early stages. With Litmus, we've been toiling for some time on, you know, what to call it. You can call it a tool, but our vision really says: go and get all the tools, the best open source tools, and then use them together. So we actually call it a framework, and it's a framework for chaos engineering, right now on Kubernetes. The tagline is concentrating on chaos for stateful workloads on Kubernetes.
C
That's kind of, you know, the tough area right now. All the problems will come the moment you bring in stateful applications, and the underlying storage and networking will play a major role in the stability of stateful workloads. At the moment it provides a set of Ansible playbooks, and each Litmus test is a playbook that runs inside a container.

In today's demo, we're going to show what a Litmus test looks like on GitHub, and also introduce chaos into a MySQL app and see what Litmus does: how Litmus helps in introducing chaos and in saying whether the application works as expected or not. Primarily, Litmus is meant to be used by developers and DevOps groups in their Kubernetes clusters and CI/CD pipelines, and sometimes before you put things into production. For example, when a Kubernetes cluster is being upgraded, how do you make sure this [inaudible]?
C
It's just a matter of cloning the project. You get all the tests, then modify the test according to your need, and then really just run the test using the kubectl command. Every test will have a file called [inaudible] litmus test, and that really kicks off the actual Litmus test. Let me go and show you, for example, the MySQL application, or the Percona application.
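The clone, edit, and run flow described above could be scripted roughly as follows. This is a minimal sketch, not Litmus's own tooling: the function name, the manifest path `tests/mysql/run_test.yaml`, and the `litmus` namespace default are hypothetical (the talk does not give the file name), and the only real CLI form assumed is `kubectl create -f <file> --namespace <ns>`.

```python
import shlex
from typing import List


def build_kubectl_command(manifest_path: str, namespace: str = "litmus") -> List[str]:
    """Build the kubectl invocation that would submit a test job manifest.

    The manifest path is supplied by the caller. The default namespace is
    an assumption based on the demo, not a documented Litmus default.
    """
    if not manifest_path.endswith((".yaml", ".yml", ".json")):
        raise ValueError(f"expected a Kubernetes manifest, got {manifest_path!r}")
    return ["kubectl", "create", "-f", manifest_path, "--namespace", namespace]


if __name__ == "__main__":
    # Hypothetical path: the real file name is not stated in the talk.
    cmd = build_kubectl_command("tests/mysql/run_test.yaml")
    # Print instead of executing, so the sketch runs without a cluster.
    print(shlex.join(cmd))
```

Building the argument list separately from running it keeps the sketch testable without a cluster; the list could be handed to `subprocess.run` once the path points at a real manifest.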
C
If you see here, we are right now using OpenEBS as the underlying storage. You could perhaps use Rook or Portworx, which also follow the Kubernetes way of attaching persistent volumes to the pod. For this particular test, we're currently supporting three types of chaos. One is through Pumba. I think we heard from Alexei in the last session that Pumba is a chaos tool that can introduce two types of chaos: one is to introduce network latencies, and the other is a Docker stop/start type of thing.
C
So, in this test, we're going to kill an application pod using docker stop through the Pumba APIs. Litmus can also be used to introduce different types of chaos. You can use eviction, where things in Kubernetes are set up by Litmus so that the pod gets evicted, and then see what happens to the underlying application. Similarly, there is the node drain, which really means that the Litmus job will go and kill one of the nodes.
C
That's the kind of configuration parameters that we provide. Before we do a quick demo, I want to take you through the setup that we have and also the demo flow. We have a set of nodes: this is a Kubernetes cluster on Google Cloud, GKE, and we have a couple of nodes where GPD is configured as the data source for these nodes, and we expect to run an application that uses this data.
C
So what we do is run a Litmus job, which does the following. It launches a Litmus pod, which does the real test. It launches the MySQL pod and makes sure that the underlying data connection is made through OpenEBS, or whatever is configured as part of the test. Then it launches the chaos framework, Pumba, and introduces chaos. The Litmus pod will be watching the test to see whether it's running fine or not. The moment you introduce chaos, the pod gets killed and Kubernetes relaunches it.

The same process can happen again and again: you keep introducing chaos, and you can configure how many attempts you want. At the end of the chaos introduction, you go and really verify the data. Litmus checks: okay, the chaos has been introduced; now, was the pod rescheduled somewhere? If it's not scheduled at all, then that's a failure. Even if it's scheduled, is it really connected to the data, I mean, seeing the right tables underneath? That's really what it is.

Then it cleans up one by one, including the Litmus pod, and gives you back the node, the cluster, in the same state. So the idea here is that, with Litmus, DevOps teams can really take a given test end to end, in an easy manner, into their pipelines.
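The flow just described (inject chaos a configurable number of times, fail if the pod never comes back, then verify the data) can be sketched as a small loop. This is an illustrative sketch only and not Litmus code: `run_chaos_test`, `ChaosResult`, and the three callables are hypothetical names, and the fakes in the demo stand in for real cluster calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ChaosResult:
    attempts: int
    passed: bool
    reason: str


def run_chaos_test(
    inject_chaos: Callable[[], None],
    wait_for_reschedule: Callable[[], Optional[str]],
    verify_data: Callable[[], bool],
    attempts: int = 3,
) -> ChaosResult:
    """Inject chaos `attempts` times, then check the data once at the end.

    wait_for_reschedule returns the node name the pod landed on, or None
    if the pod never came back (treated as a failure).
    """
    for attempt in range(1, attempts + 1):
        inject_chaos()
        if wait_for_reschedule() is None:
            return ChaosResult(attempt, False, "pod was not rescheduled")
    # Being scheduled is not enough: the data must still be reachable.
    if not verify_data():
        return ChaosResult(attempts, False, "data check failed after chaos")
    return ChaosResult(attempts, True, "pod recovered and data intact")


if __name__ == "__main__":
    # Fake callables so the sketch runs without a cluster.
    kills = []
    result = run_chaos_test(
        inject_chaos=lambda: kills.append("kill"),
        wait_for_reschedule=lambda: "node-1",
        verify_data=lambda: True,
        attempts=3,
    )
    print(result)
```

Passing the three steps in as callables keeps the loop independent of any particular chaos tool or storage backend, which mirrors the configurable test parameters mentioned in the talk.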
C
I've got three windows here to show. In one I'll be watching the pods in the litmus namespace, in another I'll be observing the logs coming out of the Litmus pod, and in this window I'm going to kick-start the test. Let me just show the test again: I've taken OpenEBS as the storage, Ansible just puts all the logs onto stdout, and for the chaos type I'm taking Pumba here.
C
I'm running this test and watching the litmus namespace. It has already started, and this is the container that it's creating, and I'm observing the logs in this window. As you can see, it has already deployed the application and it's coming up. You can see the OpenEBS volume controller and three replicas are already deployed. Once the MySQL Percona application comes up, you will see Pumba also getting launched and chaos being introduced, and then you can keep watching how Percona behaves in the meantime.
C
As you can see, the application is running right now. We configured the entire test to finish in about two to three minutes, and you can see that some test data is being written. Pumba is also being launched; that's the chaos tool being used for this particular test. The moment Pumba comes up, it starts introducing the chaos, which is nothing but killing this pod, the application pod, and then we expect that pod to come back up.
C
So this pod has gone into an error state, and we're waiting for Kubernetes to reschedule it. It's rescheduled, and again it's in that state, getting killed and coming back. Just for the benefit of keeping the demo short, we used a shorter chaos duration than you would want in a real test.
C
Sometimes Kubernetes puts the pod back onto the same node. So the best practice in this case is to introduce the same chaos ten times, observe the pod going across multiple nodes, and finally see whether the data is persisted or not. It's coming back, and here we'll be able to...
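The point about repeating the chaos so that the pod lands on more than one node can be checked with a small helper. This is a hedged sketch: `summarize_node_spread` and the node names are hypothetical, and the input is assumed to be the node recorded after each chaos round by whatever tooling runs the test.

```python
from collections import Counter
from typing import Dict, Iterable, Union


def summarize_node_spread(nodes_seen: Iterable[str]) -> Dict[str, Union[int, bool]]:
    """Summarize where the pod landed after each chaos round.

    A run in which the pod always returns to the same node says little
    about whether the volume can follow the pod to another node.
    """
    counts = Counter(nodes_seen)
    return {
        "rounds": sum(counts.values()),
        "distinct_nodes": len(counts),
        "moved_across_nodes": len(counts) > 1,
    }


if __name__ == "__main__":
    # Ten hypothetical rounds; the node names are made up for illustration.
    observed = ["node-a"] * 6 + ["node-b"] * 3 + ["node-c"]
    print(summarize_node_spread(observed))
```

If `moved_across_nodes` is false after all rounds, the data check has only exercised the same-node restart path, so more rounds (or a different chaos type) may be needed.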
C
That's the kind of introduction of chaos and verifying that the data is still there. Now, one typical way to do this is what we did; we could also have put this into a pipeline and then repeated it as many times as you want, and the Ansible jobs can be configured automatically to alter the configuration parameters of the playbooks. So these tests are intended to be used in a friendly way by DevOps teams. That's the quick demo. Hopefully it made sense to you. Any questions?
D
C
Yeah, so it's a community project. Again, as I was saying, in the next few weeks to months more tests will get moved into the Litmus framework, and we would like to see various application developers or users bringing their expertise into these tests and then using these tests for their own needs or requirements. Yeah.
A
Thank you, Karthik. So, not too many things before we close out the meeting. If you go to slide 14, I'm just calling out mainly the white paper. There's been some discussion and iteration there, so I encourage the group to continue to do that. And, of course, the landscape: I have the initial PR out, so if you have another project you want to add there, please do. Once we build that up, we could have more discussions about breaking those apart into subcategories.

To wrap things up, a gentle reminder on slide 16: the first chaos conference is happening in San Francisco, hosted by our friends at Gremlin, so we'll be there, and it would be good to have some folks also show up there. We're also going to be doing a chaos engineering track at KubeCon + CloudNativeCon in Seattle in December.
A
The Chaos Engineering working group will be entitled to essentially two talks at KubeCon, so I'm going to try to figure out how best to divvy that up with the group. Essentially, I'm looking out for introductory content and maybe an overview of the different tools out there, with demos. We don't have to figure that out right now, but it's something to keep in mind. Other than that, any other questions? I'm always seeking volunteers for community demos.
A
Pressure makes diamonds; that will be good. Deadlines are good, exactly. And then, I don't know if there was someone from Lyft too. I'd be kind of curious to see if Lyft would be willing to talk about some of the stuff they do, especially since Envoy has some baked-in ability to do chaos testing. So yeah.
E
That's me, Zach here. There are a couple of things that I could demo. I'm not sure what I need to do; I need to talk to you before doing these things. But, okay, there's a redline test that we run across all of our services: we adjust the load-balancing weights through the Envoy discovery service. Yep.