From YouTube: Chaos Engineering WG Meeting - 2018-09-25
A
I hope everyone has had a couple of good weeks since we last chatted. Really brief agenda this week: I know many of you will hopefully be at the chaos conference coming up on Friday, so I hope to catch up with some folks face to face in San Francisco. Today's meeting will start with introductions for any new people on the call. Then we'll do a community presentation from Julian, who's going to do a little bit of a demo around Istio and chaos engineering, which I'm personally excited about, since Istio is kind of a growing project and there are a lot of, let's say, knobs and switches you can play with in it, so I'm kind of curious what Julian has to share. And then I'll just do a brief update on where we are with the landscape and the white paper.
A
Good, good, good to hear from you all. So let's go on. Oh my gosh, my computer is going crazy. So to start things off, we'll hear from Julian. He linked his slides. For people that are not familiar with Kubernetes, he'll give a brief intro, and then talk a little about service meshes and chaos engineering. So I'm happy to keep sharing my screen, or happy to give up the sharing if you want to steer things, Julian. So let me go.
D
Right, cool. So welcome, everybody. Can I zoom? All right, that is better. So we're going to talk about chaos engineering with a service mesh, and even though many people are more interested in either service meshes or chaos engineering on its own, I hope you take away a lot of knowledge for implementing chaos engineering in your organization. So, who am I? I'm a software engineer turned DevOps, and I used to work for Unity.
D
The game engine company. I am now in Stockholm, Sweden, working for Discovery, and you can contact me, or feel free to drop me a message, if you want to know more about these topics. So, as I said, in this presentation we're going to cover a little bit of the background of why a service mesh and how it came to be. We'll also follow with a few demos.
D
I will demo Envoy separately from Istio, because I've done this talk a few times now and people have kind of a hard time grasping the power of Envoy and how it fits with Istio. And of course, since this is basically intended to introduce people to chaos engineering, I will introduce the concept of chaos engineering and demo some fault injection. Feel free to stop me at any time if you have a question. I don't think I can see everybody, but that's all right.
D
So at the beginning there was an app, and the app was code, and that needed to scale. Most companies have this big monolith, and inside that monolith they have a few components that are pretty independent from each other. Instead of scaling vertically, meaning buying a bigger box, they try to break it down into what we call microservices, into separate services.
D
The problem is that, instead of just calling a function that is nearby in the code, they have to go through the network, and that brings a whole new set of problems. From there, the deployment is no longer one thing that's done atomically. Now we have tons of little microservices that each do one thing, and to solve that, Docker and containers came to be, along with the changes that brings to the code.
D
How do you schedule them? How do you make them talk to each other? How do you make sure that they are healthy? And the secrets and the configuration are also kind of hard. But this is where, for all those points, Kubernetes can help. Kubernetes is a scheduler. Basically, Kubernetes fixes deployments and the rolling out of new versions. It is very good for increasing a team's development speed, but it doesn't solve everything.
D
So, as I said previously, the network is still a problem. There are the eight fallacies of distributed computing that people might be familiar with, and you see the first one is "the network is reliable." That's the biggest lie people tell you: just send it and you will receive something. It's never true. You can have hardware failures, you can have misconfigurations, and packets get lost all the time. To cope with that, there is also this RFC that I highly recommend; it's three pages long.
D
It was written 22 years ago, it's still relevant, and it's actually quite funny; it's totally worth reading. Of course, once you have Kubernetes, you still have all those problems to figure out, and it becomes tremendously overwhelming for a team to really handle each of these building blocks separately.
D
If you want to do A/B testing and just route a small part of your traffic to a new release to test it, or handle failover, all those things are really hard to get right. I don't know if anybody here has implemented retries, for instance, if you had to retry a service; I did those myself, so I know it's quite hard to get right without breaking something. So the way developers solved that is they used more code.
D
There's a talk on abstraction, I think by Zach Tellman, that describes what a good abstraction in code is, if you are interested. But basically, all those problems come with deployment: if you do rolling updates so you don't take downtime, that means that in your code you have to handle two cases if you are, for instance, migrating from one database schema to another. And debugging becomes tremendously complicated.
D
If you have all those microservices talking to each other, you want to know basically what happened. And so there is this trend now where developers take that code and make it a separate part of the infrastructure. We see that trend very much with the service mesh trying to fix ("fix" is not the right word), to basically abstract away, the network. The genesis of the service mesh, and of Istio in particular, is...
D
...I think a talk from Google, from touring in Europe and in the U.S. and asking the main Google customers what their main problems are. For instance, banks have, I think, truly billions of dollars invested in hardware to encrypt all traffic. They don't want canary releases, because nobody wants their bank to try out how they handle money today and see how it works, but they very much want the encryption. Especially if you are in Europe, GDPR actually forces you to encrypt your traffic.
D
So that's one thing to think about. Also connecting traffic: even if Kubernetes provides some service discovery, managing canary releases in Kubernetes is quite hard. And the last piece is observability, because it is really hard for a developer to know how a service behaves once it's in production. Here's the link to the video if you are interested in going further. You don't need the whole thing; you can just pick one block and go with it.
D
It will make your evolution towards a complete service mesh easier. When talking about what a service mesh is, I like to explain what problem it solves, and the only problem it solves is communication between services. It's no longer just one function call away; you're not on the same box. You have to go through the OS, through the network, to the other OS, and then the application receives it.
D
So there are a lot of components that can go wrong, but the idea is that a service mesh can be summarized as a network for services. You don't want to describe all the IP tables and every route; especially with all those moving parts, it becomes quite hard, even with automation, to keep up. And how does a service mesh handle the entire service-to-service communication? Basically with the sidecar pattern. Here you can see that service B is in what we would call a Kubernetes pod.
D
A pod is the smallest unit of deployment in Kubernetes; a pod can contain one or more containers. The goal here is to inject a proxy inside that pod that listens to every network packet service B is sending or receiving. So if service A wants to talk to service B, it has to go through the proxy. The interesting part is that you also have the control plane; the proxy-to-proxy layer is called the data plane.
D
The control plane is where all the overall governance decisions get made. For instance, you have three parts in Istio specifically. There is Pilot, which is in charge of making sure the routes are spread consistently to the proxies. If you create a new service, you want to let the other proxies know about that new service; you just publish an update and Pilot will be in charge of updating their route tables.
D
Then you have Citadel, which is in charge of encryption and of rotating the proxies' certificates, to make sure that service A is authenticated with service B. Actually, it's not services A and B that communicate; it's the proxies that encrypt. So the encryption is abstracted away; it's taken out of the code and put inside the proxy, so you don't have to worry about "is my call encrypted?" You don't care. And for the data plane...
D
...the proxy is actually Envoy, made by Lyft. If anybody hasn't heard of Envoy, it's basically a single binary that takes ten megabytes in memory. It can handle two million requests per second, so before you have a scaling problem with that... well, if you do have a scaling problem, just let me know; I would be very curious to see what you're doing. So here I want to demo Envoy. I have httpbin, which is just what I'll use to demo Envoy. Is the font size okay? All right.
D
You see here that Envoy added some headers, so you have the request ID, which allows for tracing. All your requests are marked that way, and it's the responsibility of the application to take that ID and pass it to the next service it's going to call, so that you have all the tracing necessary.
D
So that's a very quick demo, but we can actually create errors and see. So there is a 500, and you can see that the telemetry is quite interesting. You see that I actually defined some retry logic, so you know exactly what happened in the service and how many times it retried; it returned the 500 after the third retry. So that was the demo of Envoy. Is everything okay? I cannot see whether it is easy for people to follow; that's good.
D
Where was I? Envoy, that's done. So, as you can see, usually you would have to configure this in the OS network stack. If you have to configure an overlay, that is quite low level; you have to deal with IP tables, and it becomes tremendously complicated. If you have 20 services that talk to five databases, that's hundreds of rules that you have to implement. But with a service mesh, which sits on top of TCP/IP...
D
...you can actually just name this service "web" and that database "database," and you have one rule that scales, instead of having to be specific about every address. It's a little bit like authentication: saying "I want to talk to that person" rather than "I have this phone number" is a completely different thing. So the control plane here is Pilot, which does service discovery. And I was wondering what was in the code...
D
So how do you have to code your application in order to gain from the service mesh? And it's so smart, because you just give it a name. It doesn't matter what you use; you just give a name to the service and you stick with it, and Envoy will make sure that the name is resolved to the service that you defined in a rule. Here, I cannot understand for the life of me why they use this port inside their URL, but I...
D
...guess it's just the demo app that the Istio team built, so they might have some reason; I have no clue. Maybe it doesn't matter; maybe it's just "we use a port and we stick to it," so it doesn't default to 80. And if you want to see a manifest of, for instance, what Istio calls a VirtualService...
D
It's quite easy. You can define the hosts that it is going to apply to, and you can define rules to match how the traffic is going to get routed. So here you see that I have an HTTP clause that will match a header containing an end-user that is exactly "jason" (in YAML these parts go together), so if a header in the request contains end-user and it is "jason," the request will be routed to the service called reviews v2.
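The rule being described matches the standard Istio Bookinfo routing example; a sketch of what that VirtualService likely looked like, assuming the `networking.istio.io/v1alpha3` API that was current at the time and the Bookinfo sample's names:

```yaml
# Route requests whose "end-user" header is exactly "jason" to reviews v2;
# everything else falls through to the default route, reviews v1.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
  - reviews
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    route:
    - destination:
        host: reviews
        subset: v2
  - route:
    - destination:
        host: reviews
        subset: v1
```

The subsets `v1` and `v2` are defined separately in a DestinationRule that maps them onto the Kubernetes version labels of the deployments.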
D
So v2 here is like the tag name in Kubernetes, but you can implement it a different way; otherwise the request gets routed to v1. I will do a demo to clarify what all this means. Another thing is that resiliency is basically out of the box. If you want to implement retries, you just have those three lines to add at the end of your YAML. I mean, I've implemented retries a few times, and this gets so much easier. You just know how much you can allow with your timeout.
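The "three lines" for retries would look roughly like this; a sketch assuming the same `v1alpha3` API, with the host name and the timing values illustrative rather than taken from the demo:

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - route:
    - destination:
        host: ratings
        subset: v1
    # Resiliency out of the box: retry up to three times,
    # bounding each attempt and the overall request.
    retries:
      attempts: 3
      perTryTimeout: 2s
    timeout: 10s
```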
D
You can understand the overall behavior of the whole request. About authentication and security: you can, of course, implement mutual TLS between those proxies, you can have namespace- and service-level policies, and of course it integrates very well with Kubernetes RBAC. Observability is actually quite interesting. I have this dashboard here, and you see I deployed an application; it's a bookstore, the tutorial one, and all the traffic reaches this product page. The product page talks to this reviews service.
D
Of that reviews service, only versions two and three talk to the ratings service. And you see that this is the application that gets called, and sometimes you see the color of the reviews changing; those are the different versions. So you see that v2 is black, v3 is red, and v1 doesn't have stars. So you get load balancing for free, and that's how you can visualize what happened.
D
So you see that the details page is called before the reviews page, but now that you see that, maybe you can say: can't we make those calls in parallel and save a lot of time? You can improve your service that way. So yeah, let's do a little demo. That's the application that I showed you, and the best thing is that it's code independent. It doesn't matter which language is used, because it's separated from the code.
D
So all the requests go to v1; I don't have to worry about anything, and the other versions are not used anymore. What we want to do is specify that one user, let's call him jason, is going to be routed only to v2. He's the only one to have access to v2, and the way we can do that is by creating a new rule. This page is a "super secure" app, and there I'm logging in as jason.
D
So you see that, without impacting the users, I can have everybody on the main version and have a specific version just for my user, so I can test whatever is happening. And I remember, at the beginning of the working group, I think it was Michael from LinkedIn who gave a demo about how they do chaos engineering, and it was exactly the same: they choose, for one user, which problem they want to generate, so that they can see. And from there we can come back to the demo.
D
I put up a list of the various service meshes that have more or less the same features; some have more, some have less. I would say that Linkerd was made in Scala, but they reimplemented it in Go and Rust, so they provide simpler adoption; you don't need to map the whole cluster, you can just introduce a proxy gradually. Consul: HashiCorp made Consul Connect and made it very easy to implement security, meaning encryption.
D
So we want things to go well even when they are mistreated. It's a little bit like a vaccine shot, where you inject a little bit of harm in order to create a reaction from the body, so that the body is able to defend itself. And the thing is, it's not so much about causing problems; it's very much about revealing problems.
D
We want to know how the system behaves under certain circumstances, because we can have a lot of ideas about how things work, but in reality it's completely different, and we all end up surprised: oops, and there it goes. It's like the Nintendo Switch snafu that happened at Christmas; that impacted the business a lot. And here I really like this explanation: chaos engineering is the exploratory testing of non-functional requirements, where non-functional requirements are the requirements that, if not met, render the service non-functional.
D
So it's quite hard to define what would make the service withstand turbulent conditions, but that's why we need to explore and test. That doesn't mean we have to blow up half the cluster to find out, because that way you will know exactly what happens, and you probably won't like it.
D
Yeah, I love this one: having a child is chaos engineering for everything in your life. And what chaos engineering is not is having the belief that if you do things without paying attention, problems will go away, because hope is not a strategy. It's really doing things in order to find out, because there are known unknowns and there are unknown unknowns.
D
There are variables that you don't know yet, and we need to find them out. Most of the things that usually go untested are, for instance, draining the requests. Sometimes during a deployment there is a shift of traffic, and those requests get sent back as 500s and nobody notices, because it's during the deployment. So sometimes things go wrong, but the health checks and all those timeouts are super hard to detect, and there are different types of errors you might want to test. For instance, what happens...
D
...what is the difference between a service that is late and a service that is unreachable? Those might be difficult things to detect. Or what happens when some service replies with a 504, taking too long, but the others are fine? You might not want to start with breaking everything. And I have this story about doing a database migration on a huge MongoDB set, where the migration was running before the program started. The thing is that on Kubernetes there are these liveness and readiness probes, and nobody ever thought the migration would take longer, but it did.
D
So basically the SLA is money: it's a contract you made with the customer saying, OK, we can provide this level of service. And with a service mesh, you can see that you have all your data right there. You know exactly how many 500s you sent back over the last month, because everything is stored in Prometheus, and you can query everything for as long as you want, because all your data are stored.
D
For doing chaos engineering, it's a good idea to do what they call a game day: fill in the blank when you want to answer "what happens when...?" There's this good article about breaking DynamoDB. One thing to note is that DynamoDB doesn't scale down, so after a certain load DynamoDB scales up and you cannot reduce the size of the box after that; you are stuck with a big bill. If you want to scale down, you have to migrate the data.
D
How to do that migration: that is a very good game day. Say we're going to practice and recover a data migration from one instance to another. That is a nice example. But nothing can get done if the organization is not behind it. The mentality of the organization should be to expect failure and learn from it. We should not fear and cover up failure; it's something to be dealt with. And a very good idea is to have a high-severity incident management program, to know what to do.
D
Who to update, who should communicate that, and how we can resolve it in a useful manner. So it's very much a cultural approach. And never underestimate the power of root cause analysis. It's really nice to have a document with proof saying: we did this, this happened because of that, and here are the results, everything documented. Because if you don't learn from a mistake, it's bound to be repeated.
D
I found this recently about the Toyota assembly line. They have this motto called kaizen, which, if you take the characters separately, means "change" and "good"; it basically means continuous improvement. And they have this andon cord on the car assembly line, and as soon as an employee detects a problem, they pull that cord. The manager comes to the station to check, and if the problem is severe, it can actually stop the whole line. At the scale of Toyota, where they produce thousands of cars per day...
D
...that's a lot of money they might lose, so problems get fixed really fast, because the detection of problems is really fast. They catch the problem early and they fix it early. So they have a procedure for how to fix problems, basically. And for people who would like to start chaos engineering, I really recommend, from experience, being careful with the word "chaos," because it means different things to different people.
D
People in IT are really excited about it, but to a manager, saying "oh, we're gonna blow up half the cluster to see how it reacts" might not sound like a good idea at the time. So I would recommend that you use the word "resiliency" at first. Once you have the results, it's easier to explain that what you were doing was chaos engineering.
D
So that's the little setup, because I think there are a few steps that might be important. Instead of mentioning chaos, mention the results, mention the goals, like: we want to improve the resiliency of the database. How do we do that? If we don't have monitoring, it doesn't exist; something that doesn't get measured is just going to be forgotten. The good thing is that, since we now have all those nice graphs about how the system reacts, you can get a feel, almost like...
D
"Oh, this is not normal; it was not like that last week." You have some kind of good feeling for how things should look, and that could be called your steady state. Once you understand the steady state, it's easy to make hypotheses, like "what happens if...?", because you're used to how things should look, and you can challenge that by forming an input and a hypothesis. And with an input, once you have...
D
...your hypothesis, you can set in place real-world events, like, I don't know, shutting off one instance to see how the service reacts, or killing a pod in Kubernetes to see how it gets recreated, and so on and so forth. The goal, once you've done that, is to write a report so that others may benefit from it; you don't need to reinvent the wheel over and over again. Another good thing is that, once the report is written, you can talk about the chaos experiment, and then it becomes...
D
...a positive thing instead of something scary, because the only reason chaos is scary is that people don't know about it. It's nothing different from testing in engineering: you want to know about something, so you experiment with it. And the last thing is to keep on doing it; doing it only once won't make things improve, so it's better to set a practice in place and do it often. So yeah, let's see a little bit of the demo for chaos engineering.
D
I have this manifest here that will introduce a delay for our friend jason, who is connected, and I set a timeout; maybe we can also look at how easy it is to set a timeout. So basically, what I did here is: if it's jason, for a hundred percent of the requests, just add a seven-second delay.
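This matches the fault-injection task from the Istio Bookinfo tutorial; a sketch of the manifest being applied, assuming that is what the demo used:

```yaml
# Inject a fixed 7s delay into 100% of requests to ratings,
# but only for the user "jason"; everyone else gets the normal route.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: ratings
spec:
  hosts:
  - ratings
  http:
  - match:
    - headers:
        end-user:
          exact: jason
    fault:
      delay:
        percent: 100
        fixedDelay: 7s
    route:
    - destination:
        host: ratings
        subset: v1
  - route:
    - destination:
        host: ratings
        subset: v1
```

Because the delay exceeds the caller's timeout, the reviews service surfaces an error for jason only, which is the controlled blast radius described below.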
D
Otherwise, for all the others, just send the normal response, so we don't impact anybody else at all. So here I'm not impacted, but if I look at jason, you can see that the requests are stuck, and we should see an error soon. Oh, there you go: the error fetching product reviews. So you have a very small blast radius, a controlled blast radius, to do your testing and to see what the service will look like if this other service doesn't answer.
D
So it's a very good way to become comfortable with errors; it allows for a very easy way of handling them and of showing that we can recreate problems easily: what happens when... So, where was I? I want to clean up, because I might have another demo later. Okay, and that was easy too, just cleaning everything up. And, let's see, everything is back to normal now.
F
That was awesome, Julian. I really appreciated that. I also especially agree with the whole idea that we should try to change the culture around reporting outages and being very upfront about what failures happened, so we can all kind of learn and grow together. I did have a quick question about that little YAML file at the end there. That's the Kubernetes YAML, right? So how does that actually interact with Istio? Is that the delay...?
D
The delay gets sent to Kubernetes through the custom resource definition, so Kubernetes sends it to Istio, to Pilot. Pilot takes that info and spreads it to all the proxies, to all the Envoys. The thing is, I can show you what the Envoy version of it looks like, and it's a little bit more verbose. So Istio comes in as a manager of the fleet of Envoys, and it allows for... here you can see that we retry on 503; those are the Istio retries.
G
I have a question on the same point that Matty was talking about. I tried injecting the delay, and I realized that if I try to access the endpoint using curl, the delay is not enforced. However, if I go through the web browser, which then goes through the ingress gateway, then it is observed. And I guess that can explain the point you were making earlier: if you go through the proxy, then the delay is observable, but not with curl.
D
Yeah, exactly. The thing is, the main question I also get from that is that people who already have a service, for instance with an SDK inside that connects with TLS, might run into some issues. If you try to go egress, you have to configure that, to allow a service to go outside the cluster, and you have to configure traffic to come into the cluster. So the north-south traffic is also part of the YAML definition that you need to think about.
D
The basic gateway, the Gateway, is interesting to see. If you look at this, okay, so that's where you define the virtual service, which is the product page; this is the web page, and you can define routes, and for those routes you send the traffic to that service. So that's the ingress gateway. It's a little bit like you allow this traffic to reach the cluster, because by default nothing is reachable; it's an opt-in mechanism. Does that answer the question? Exactly.
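A sketch of that opt-in pairing, based on the Bookinfo sample of the era (port and host values are the sample's, not necessarily the demo's): a Gateway admits traffic at the edge, and a VirtualService bound to it routes the admitted traffic.

```yaml
# The Gateway opts external HTTP traffic into the mesh...
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: bookinfo-gateway
spec:
  selector:
    istio: ingressgateway
  servers:
  - port:
      number: 80
      name: http
      protocol: HTTP
    hosts:
    - "*"
---
# ...and a VirtualService bound to that Gateway routes it.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: bookinfo
spec:
  hosts:
  - "*"
  gateways:
  - bookinfo-gateway
  http:
  - match:
    - uri:
        exact: /productpage
    route:
    - destination:
        host: productpage
        port:
          number: 9080
```

Without a Gateway, nothing outside the cluster reaches the mesh, which is why a browser going through the ingress gateway sees the injected fault while a direct curl to the pod does not.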
G
It does, and that's exactly what I was doing as well, because when I was starting to play with this, I realized that it cannot be done using curl; it has to go through a gateway, and that's when it works. As a matter of fact, funny enough, I'm actually giving a talk at O'Reilly Velocity next Monday on this exact same topic, so this is a very relevant thing.
G
I have a lot of content prepared already. I wish I could have gone through the Envoy concepts, but I only have a twenty-five-minute slot, so I'm jumping straight into the code and saying: all right, here is the issue, here's the rule that applies, and here is how you can inject faults. And I like how you pitched it to your manager: do not even mention the word chaos, you know they're gonna freak out.
D
Thank you, and I'm very happy to hear that I was not the only one who got that kind of reaction from, you know, non-technical people. If you didn't know, I heard that even Netflix renamed the chaos engineering team the resiliency team, or something like that; they want to state the goal, not what they're going to do. You know, resiliency.
D
Good. I have more: one thing I'm really excited about is dark launches. You can actually mirror your traffic from the ingress, so you could have a secret cluster clone with a new product, and you can test it live with real traffic going there, because all the answers back to the proxy are discarded automatically. So you can do super powerful things, and just with configuration.
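Traffic mirroring is likewise a one-field change in a VirtualService; a sketch following the Istio mirroring task (the httpbin names come from that task, not from this demo):

```yaml
# Serve all live traffic from v1, while shadowing a copy of each
# request to v2; responses from the mirrored v2 are discarded,
# so users never see the dark-launched version's behavior.
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: httpbin
spec:
  hosts:
  - httpbin
  http:
  - route:
    - destination:
        host: httpbin
        subset: v1
      weight: 100
    mirror:
      host: httpbin
      subset: v2
```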
A
All right, so we only have probably about five minutes left, so I'm not going to take too much time. As I mentioned last time we met, we have a chaos engineering landscape now for CNCF. Please, you know, send pull requests; there are obvious things missing, like Netflix's Chaos Monkey and so on, but it hasn't been easy for me to edit it myself, so I'm hoping someone else...
A
...does it. More importantly, and more timely: there's a chaos conference this Friday, where I know some of you will be, that I think Matthew and some other folks from his organization are putting on, so I hope to meet some of you face to face there, and if you have any topics you want to discuss, let me know. And then finally, I had some volunteers for the intro-to-chaos-engineering topic for KubeCon + CloudNativeCon in December.
A
So thank you to everyone volunteering; we'll have a session there hosted by a few folks. And finally, there's some work for us to iterate on the white paper. I've just been super busy and haven't had time to go look at it myself; I just wrapped up traveling to China. So I appreciate it; I think Sylvain's been driving it.
A
That's mostly on me, but I'd appreciate it if folks have time to contribute and take a look at it. We'll meet again in a couple of weeks. If there's anyone in this group that would like to volunteer to present on a topic, let me know. I found Julian's topic today awesome and very interesting, so if there's anyone else that wants to talk about something, let me know. I know Mikhail from Bloomberg is up to present, but I don't know if he can make it in two weeks.
A
No, you know, there's no official drop date, but I would say it would be advantageous for us if we got it ready for early December, for KubeCon, mostly because we'll have a lot of PR and analyst people there at the conference, and it's just a way to kind of drum up interest. And we actually could have folks, if folks are actually planning to be there, set up some meetings.
A
Cool, yeah, I think it's totally doable. A lot of us have been busy with conferences and so on, but I think now we've got a good group of folks that can iterate on it and get there. Lane's been doing a good job of pushing it along as far as possible, I think.
A
I mean, that's a challenge of open sourcing in general, right? The idea would be, you know, it doesn't have to be this chaos engineering tome slash Bible. We're basically positioning it to introduce the topic to the wider cloud native community and basically offering an initial landscape, right? So, you know, I would love more additions to that landscape.
A
So when we officially launch the working group and announce the white paper, we'd have the updated landscape to go with it, right? That's what we're really trying to do here: educate the wider CNCF and cloud native community on chaos and resilience engineering. Cool. Any other thoughts, concerns, or questions before we cut out?
A
Yeah, most likely; I'm planning to make it work, so maybe we could do something. Let me think about that. Okay, all right, thanks everyone for your time, and thank you again, Julian, for that amazing demo. We'll get this published on YouTube so people can watch it. I'm going to go update the readme to link to previous videos; I've been a poor steward of that readme, so I'll get it done by the end of today. All right, take care.