From YouTube: CNCF SIG App Delivery 2020-11-04
C
Yeah, let's actually put in brackets here. I'm going to just say: let me space my data, all right. I can rename... I'm just renaming.
D
Honestly, I'm afraid that not necessarily a lot of people are going to show up today. Elections, yep.
D
Yes, but let's still give people a chance to... and I know that you wanted to present something, so you might not have the biggest of audiences, but anyway we have the recording and we can share it later on. That's still something we can do. All right, I'm posting, as always, the link to the doc in here, and waiting a couple more minutes for people to join.

Then, the Litmus integration with Keptn. Again, as not too many people are going to join today, we'll just record it and then post the link on the mailing list later on. The recording should then be available tomorrow. And then I briefly also wanted to give you, maybe as one of the first items, a brief update on the podtato-head project.
E
So, a short update from the operator working group. We are meeting once a week as a small, very small group, three more people and myself, and in that meeting we are talking about the operator definition, trying to get it into something we can share. We are sharing the results week by week; in the operator definition working document (I will edit in the links) you can see.

Progress is being made and we're open to comments there. And if you want to join the small group, I will also share how, or you can just ping me. Last time we talked about the capabilities of an operator, and we are trying to define phases in the life cycle of an operator and how an operator can say "I'm doing that" or "I'm not doing that."
D
Sorry,
it's
been
a
long
day
thanks.
When
do
you
think
would
you
would
want
to
share
it
with
a
a
broader
audience
that
just
started?
There
is
already
like
just
quickly
sharing
it
here.
E
I think this is something we already discussed. The yellow is something we discussed, but it's not in final words, so we're hoping that week by week we will refine it. So we talked about it; this is the output of what we talked about in the meeting, and now every one of us has taken a small chunk and we are writing it in better and more explained words.
D
Yeah, I think that's going to be great. This has been in the works for quite some while, then we kind of lost track of it, and now we are back on it. So I think that's great, and ideally we can also share it with a bigger audience later this year, but it looks very, very promising already. Thanks for taking on this work, and if you want to discuss specific topics, also feel free to use the mailing list.

Next up: CNF conformance and a new CNCF working group. Who added this one? "That was me." Taylor, okay. So I'm just looking at also having the demo, and I assume you're here to really give the team here a small update. I was wondering whether we should do this before the demo. How much time would you need for that topic, roughly?
D
Yeah, then I would propose you go first and then we dive into the demo, if the demoing team is fine, because that gives you more or less time for the demo until the end of the meeting, and makes sure that Taylor gets some time to present on this one.
A
All
right,
I
can,
I
guess,
share
my
screen
if
that's
something
that
y'all
like
to
should.
A
All right, so I don't know who's aware of the CNF Conformance in general. It's been going for about a year as far as the program start, but it's been focused on a test suite project up until recently. What's just recently started is a new working group.

That's the new thing here, but to give you a little bit of background, and by the way this may be a repeat for some people if you were on the TOC meeting yesterday, because this was shown there, so I'm not going to go through all of it; you can go review that. These slides are from the TOC meeting.

But the main thing here is that the CNF Conformance is looking at how to help telcos, the telco industry, be more cloud native with their applications. We're looking at the service providers themselves, so AT&T, Vodafone, Orange, whoever, and the creators of the telco applications, so vendors and everyone else, and at how to help them move. I'm just going to kind of go through very quickly.
A
So this is just an overview of what's been happening. The telcos are more and more interested; you're starting to see more adoption of Kubernetes, and on the platform side it seems like everyone wants to do Certified Kubernetes. But one of the big things, and this is from recent surveys, is moving the actual networking and telco-type workloads, which they usually refer to as network functions, onto Kubernetes. There are issues with the philosophy and viewpoint on how a lot of those have been built that are quite a bit different from normal applications. The conformance goals as a program are similar to Kubernetes conformance for interoperability at a platform level.

But the new thing here is this working group going around defining a certification and definitions. You can see an open pull request if you go to the CNF Conformance repo, where the charter and all the other things for this new working group, and what's in scope, are being added.
A
I don't want to take up a lot of time in this call to go over it, but the main point here is that there is going to be overlap with many communities, CNCF and Kubernetes, and one of the main ones that we've talked about is SIG App Delivery. Another one I would just point out would be SIG Network, because there's an overlap with both of those.

When you look at telco and the type of applications that are run, you have a lot that deal with things like user data planes, so not using the standard networking. What do you do for those types of applications? And what do you do for other applications that may go beyond what the standard, common applications are? So the focus of this group right now is to work with the telco and cloud native community to try to look at what the scope is.
A
That
platform
is
likely
to
be
part
of
it
in
some
ways
when
you
talk
about
stuff
like
a
user
data
plane,
and
you
look
at
other
mechanisms
within
the
the
platform
or
you
could
say
layers
and
add-ons
of
kubernetes,
and
that's
where
it
ties
in
with
the
network,
plumbing
group
and
sig
networking
from
kubernetes
and
as
well
as
the
cncf
sig
networking,
which
looks
at
service
meshes
and
stuff.
A
So
this
is
more
of
this
working
group
is
forming
right.
Now,
we're
going
to
be
having
community
meetings
and
to
talk
more
at
kubecon
that
we'll
start
having
regular
working
group
meetings
and
we
would
we
expect
to
protect
collaborate
more
directly
with
sig
app
delivery
than
most
most
of
them,
and
probably
sig
networking
as
well.
A
I'm
sure
there's
going
to
be
a
lot
of
overlapping
different
ones,
but
if,
if
we'd
love
to
have
your
participation
in
that
group
to
help
telcos,
I
think
that's
probably
it
that
the
test
suite
itself,
maybe
just
throw
out,
is
this.
This
is
going
to
be
put
forward
as
a
a
a
sandbox
project
in
the
next
three
to
six
months.
A
On the test suite, maybe a quick overview: you could think of it as a combination of what the e2e tests are in Kubernetes. When you think Kubernetes conformance, you have the e2e tests, which are managed by SIG Testing, and then you have Sonobuoy, which is what people usually use to make it easier. You don't have to use Sonobuoy, but many people do. So what's currently called the CNF Conformance Test Suite, which may be renamed, is kind of that.
D
Thanks for sharing. I definitely think that there are points of overlap here going forward, so I think recognizing that makes sense. Interestingly, the first time we had a touch point regarding telcos was really on the air-gapped work that we kind of... well, we didn't really stop it, but it didn't get as much momentum as we thought it would.
A
Absolutely, and I'd say it's more of a complement than anything else. Potentially it's going to be specific to the application side, versus when we get down lower, which may be more like the platform or SIG Network. I would say it's a specific case, and if it wasn't for the platform items, where it starts getting a little bit gray, this probably would be a working group under SIG App Delivery. There's a little bit of discussion on that:

does it fit more under SIG Network or SIG App Delivery? And I see Matt Farina; I think you said in the TOC yesterday that it kind of bridges both, and I agree with that. But yes, we're up for a quarterly update, we'll continue to collaborate, and we think of SIG App Delivery kind of as a base for a lot of the cloud native application pieces.
D
Okay, then the next item on the agenda. I see you wanted 35 minutes; we won't make it for the full 35 minutes, so you might have to speed up your demo, but it's great to see updates from two projects, and especially two CNCF projects, working together on the chaos engineering and delivery integration between Keptn and Litmus. So I would ask you to try to stay in time. Can you do it in 24 minutes? Because that will keep us on schedule.
F
Yeah, I think that should be possible for us. I'm going ahead and just sharing my screen.

Yeah, thanks. As you already mentioned, this was really a joint effort between the Litmus and the Keptn teams, and together with Karthik from the Litmus team we are going to present our joint effort here: how to integrate these two projects, and actually why we're doing this and which problems we want to solve. We have a little bit of a presentation prepared and then the demo. Karthik, maybe you just let me know when I should move on with the slides, and I will do that. I will let you start and then I will hand over.
B
Thanks. Hi everyone, great to see Taylor, Watson and Luna here; we've met recently and should get around soon to contributing to the CNF Conformance this week. Okay, so we have some items on the agenda: a couple of slides explaining what we're trying to do, and then we'll go to the actual demo. We'll try to keep it short on the slides and move to the demo quicker.

We could probably go to the next slide here. Again, thanks. Okay, so we are all aware that resiliency of microservices is hard, and whenever we are deploying applications on Kubernetes, we know that there are a lot of services running there which we have not picked ourselves, and 90 percent of our resilience depends upon all the infrastructure components that we are running on top of. So it's really difficult to predict what kind of outage can happen, and it's very important to have a system where we go ahead and inject these failures ourselves.
B
So with Kubernetes, all aspects of managing applications, resources and policies are done in a declarative way via YAMLs, and we wanted to adopt the same UX when people are attempting resiliency tests or chaos on their clusters. That's where the custom resources came in: we were able to provide the same user experience, describe the chaos intent in custom resources, and have an operator understand that and implement the right functions.
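For illustration, a chaos intent expressed as a custom resource might look roughly like this (a minimal sketch only; the names, labels, namespaces and service account are hypothetical, and the field layout follows the Litmus 1.x ChaosEngine CRD as best it can be reconstructed here):

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: helloservice-chaos          # hypothetical engine name
  namespace: litmus
spec:
  appinfo:
    appns: default                  # namespace of the application under test
    applabel: "app=helloservice"    # hypothetical label selector
    appkind: deployment
  engineState: active
  chaosServiceAccount: pod-delete-sa
  experiments:
    - name: pod-delete              # the experiment used later in the demo
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "30"
            - name: CHAOS_INTERVAL
              value: "10"
```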
B
We have a hypothesis describing some steady state for the application or infrastructure, which is validated first, followed by a fault injection, and then there is a steady-state verification that's done at the end of it, or even during the fault as the chaos occurs, in parallel. If our hypothesis is right and the steady-state conditions are met, or they are regained within the toleration seconds that we have specified, we are going to qualify that infrastructure or application as being resilient.
B
So
when
the
chaos
engine
is
created
because
operator
launches
a
set
of
chaos,
parts
which
actually
implement
the
experiment,
the
clears
injection
logic
is
implemented
by
them
and
the
results
are
captured
in
another
cr.
So
they
are
different
crs,
as
each
of
these
has
a
scope
for
a
lot
of
information
to
be
placed
inside
it
and
acted
upon.
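A hedged sketch of what such a result CR could look like once an experiment has finished (names mirror the hypothetical engine above; the status fields follow the Litmus ChaosResult shape as recalled here, not the talk itself):

```yaml
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosResult
metadata:
  name: helloservice-chaos-pod-delete   # typically <engine-name>-<experiment-name>
  namespace: litmus
status:
  experimentStatus:
    phase: Completed
    verdict: Pass                        # Pass or Fail; what a pipeline stage can key off
    probeSuccessPercentage: "100"
```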
B
So this is a declarative way of doing chaos. That has been the approach for Litmus, it's something that has been adopted and taken up quite well, and it lends itself to a lot of paradigms in the cloud native space, like GitOps, which we will talk about in a short while when Keptn is being discussed.

Initially, chaos engineering was viewed very much as an ops thing, very much in the domain of the SRE, but there has been a shift left, and people are using it as part of the release, that is, as part of the delivery pipelines.
B
So chaos experiments with frameworks like Litmus lend themselves to being used in this model, so you could actually run a chaos experiment as part of a CI pipeline with the different popular CI frameworks, GitLab, GitHub Actions, etcetera, where you can store the experiment configurations beforehand, run them as part of the pipeline, and use the result to determine the success of the CI stage or CI job, and which application artifacts pass.
B
Artifacts that pass this stage can be placed into other clusters which are doing some kind of scheduled chaos, typically for longer durations, and then we can interface with CD mechanisms, with CD solutions which can actually take this and put it into a staging namespace, run their own set of validations, and promote it to other stages depending upon the success.
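As a hedged sketch of what such a CI gate could look like (a hypothetical GitHub Actions workflow; file and resource names are illustrative, it assumes kubectl access to the test cluster, and it keys off the ChaosResult verdict described above):

```yaml
# Hypothetical workflow: run the pod-delete experiment and gate the stage on its verdict
name: chaos-gate
on: [push]
jobs:
  chaos-gate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Inject chaos
        run: kubectl apply -f chaos/helloservice-chaosengine.yaml
      - name: Wait for the experiment to complete
        run: |
          # poll the ChaosResult until the experiment reports completion
          for i in $(seq 1 30); do
            phase=$(kubectl get chaosresult helloservice-chaos-pod-delete -n litmus \
              -o jsonpath='{.status.experimentStatus.phase}' 2>/dev/null || true)
            [ "$phase" = "Completed" ] && break
            sleep 10
          done
      - name: Fail the stage unless the verdict is Pass
        run: |
          verdict=$(kubectl get chaosresult helloservice-chaos-pod-delete -n litmus \
            -o jsonpath='{.status.experimentStatus.verdict}')
          [ "$verdict" = "Pass" ]
```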
B
So this is something that we are seeing; in fact, this is one of the dominant use cases for Litmus in recent times that we are seeing in the community, and it has sparked some very interesting integrations, which we will talk about. That's where I would like to hand it off to you again.
F
Thank you. Oh, there was a little animation, but here we go.
F
Yeah, thanks. So before we jump into the demo and show you how Keptn and Litmus Chaos work together, I just want to give you a brief heads-up on what Keptn is. Maybe you have already heard about it. It's also a CNCF sandbox project, and it's really a control plane for cloud native delivery and operations; in this sense it can orchestrate the whole application life cycle.

For today we will really focus on SLO-driven delivery. That means that Keptn has, as an essential piece, quality gates based on SRE principles like SLIs and SLOs. It has this as a central piece already baked into the Keptn platform. So we have this part, and we can trigger different tools, different integrations like Litmus, and then, after they do their job,

Keptn can take its next actions, for example evaluating how these tools have been executed and how the quality of a microservice that should be delivered to production is actually affected. Same as in Litmus, everything in Keptn is declarative, and Keptn itself comes with its own GitOps approach, so everything that's stored inside Keptn and managed by Keptn is also versioned and stored in that GitOps approach, and that is true also for everything that we are doing now with the Litmus experiments.
F
So
everything
really
works
or
is
moved
into
the
github
or
git
repository
and
can
be
linked
to
github,
gitlab
or
whatever.
So
how
captain
works
in
the
sense?
How
can
you
really
connect
other
tools?
The
two
main
definition
files
for
captain
are
a
shipyard
file
that
describes
your
environment
and
a
uniform
file
that
this
or
a
the
concept
of
a
uniform
that
describes
which
tools
you
want
to
connect
to
captain
and
captain
is
this
control
plane
then
then
connects
all
the
different
tools
together.
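For illustration, a single-stage shipyard for the chaos environment used later in the demo might look roughly like this (syntax sketched from the Keptn 0.7-era format; the stage and strategy names are assumptions, not taken from the talk):

```yaml
# Hypothetical shipyard.yaml describing one "chaos" stage
stages:
  - name: "chaos"
    deployment_strategy: "direct"     # deploy straight into the stage via Helm
    test_strategy: "performance"      # run the connected test integrations after deployment
```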
F
So, for example, Keptn starts by deploying a new version of an artifact, your container image. After the deployment, Keptn will automatically trigger the testing tool; you define which testing tool you want to use, and in the demo we will be using two testing tools at the same time that are both connected to Keptn. After the tests are finished, Keptn will execute the evaluation of the tests, but other tools can also be linked and connected to Keptn, like a chaos tool.

You can control Keptn by a chat bot, or you can have everything that's going on inside Keptn sent as a notification to Slack, and of course you can link tools for observability, like Prometheus and dashboarding with Grafana. You can link these to Keptn, and when you're using Keptn, it can distribute the events from the control plane to these tools, and they can then take action. So with this a lot of automation is possible; a lot of automation is baked into Keptn for this use case.
F
Today
we
will
only
focus
on
the
deployment
testing
and
evaluating.
We
won't
take
a
look
at
the
automatic
configuration
of
dashboards
or
promotion
to
production.
We
will
only
take
a
look
at
one
one
piece
of
captain,
let's
say
so.
What
we
have
prepared
for
today
is
really:
how
can
we
include
chaos?
Engineering
into
a
pipeline-
and
we
start
with
deploying
the
potato
head
application-
I
I
just
saw
it's
also
an
item
or
for
today's
agenda
of
this
seek
meeting.
So
we
will
hear
maybe
more
about
this.
F
It's
a
small
demo
application
that
will
that
we
are
going
to
use
for
for
this
demo.
We
are
going
to
deploy
this
into
a
chaos
stage,
so
it's
a
single
stage
environment
in
this
case,
because
we're
just
interested
in
evaluating
the
resiliency
of
our
microservice
we're
using
help
for
deployment
and
captain
is
executing
the
deployment
with
kevin
with
helm
for
us
after
the
deployment
is
finished,
captain
will
trigger
chain
meter
tests,
but
not
only
meter
tests
but
actually,
at
the
same
time,
also
trigger
litmus
chaos.
F
The Keptn quality gate is then triggered, and this will automatically reach out to Prometheus and gather the data that is defined for this quality gate. It will gather this data for the exact testing time frame and for the service under test, in this case our podtato-head application, and then Keptn will go ahead and evaluate it based on the quality criteria of the quality gate, which we have defined in the SLO YAML file. We will see this also in the demo.
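A hedged sketch of what such an SLO file could look like for this chaos quality gate (the SLI names are assumptions that mirror the blackbox-exporter metrics discussed below; the syntax is sketched from the Keptn quality-gate format of that era):

```yaml
# Hypothetical slo.yaml for the chaos stage
spec_version: "1.0"
comparison:
  compare_with: "single_result"
  aggregate_function: "avg"
objectives:
  - sli: probe_success_percentage      # availability while chaos is injected
    pass:
      - criteria:
          - ">=100"
    weight: 1
  - sli: probe_duration_ms             # response time of the health probe
    pass:
      - criteria:
          - "<100"
    weight: 1
total_score:
  pass: "100%"
  warning: "75%"
```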
F
Capture
will
go
ahead
and
evaluate
this
quality
gate
and
we'll
come
up
with
a
total
score
and
the
score
can
be
either.
Let's
say
a
thumbs
up.
That
means
that
the
resiliency
is
satisfying
based
on
our
service
level
objectives
or
if
we
cannot
reach
the
score.
Our
as
resiliency
is
not
satisfying
and
we
can
just
rerun
the
whole
workflow
again,
for
example.
Usually
what
captain
can
also
do
is
we
can
just
promote
it
to
the
next
stage
like
from
pre-production
to
production?
B
Cool. So the demo environment is quite simple: we have a GKE cluster on which we have the podtato-head app with the hello service running. It's basically giving you a hello page at its service endpoint, and it's configured with a readiness probe. In this demo we will basically look at the difference between a single-replica and a multi-replica deployment and see how the quality gate actually fails in the first case and goes through in the second case. So it's highlighting a deployment issue.
B
So
there's
going
to
be
a
deployment,
that's
going
to
be
managed
by
captain
and
once
the
deployment
of
the
hello
service
is
complete,
the
litmus
experiments
are
going
to
be
triggered
on
that
and
in
that
process
the
we
also
have
a
black
box
exporter
and
amity's
instance
running
managed
by
captain,
and
we
have
also
a
black
box
exporter.
B
We
can
see
that
when
we
do
the
demo,
what
are
the
rules
that
we
have
set
as
part
of
this
evaluation,
and
if,
like
you
can
said,
if
we
are
successful
in
meeting
the
criteria
that
we've
set
against
the
matrix,
then
we
go
ahead
and
pass
the
stage.
Otherwise
we
are
going
to
say
the
quantity
it
is
failed.
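As a hedged sketch of how those rules tie the SLOs above to the blackbox exporter, an SLI file mapping each SLI name to a Prometheus query might look roughly like this (the queries, labels and placeholder variables are illustrative only, not taken from the talk):

```yaml
# Hypothetical sli.yaml for the Prometheus integration
spec_version: "1.0"
indicators:
  probe_success_percentage: "avg_over_time(probe_success{job='blackbox',instance='helloservice'}[$DURATION_SECONDS]) * 100"
  probe_duration_ms: "avg_over_time(probe_duration_seconds{job='blackbox',instance='helloservice'}[$DURATION_SECONDS]) * 1000"
```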
F
Sure, okay. So what I have here is the user interface of Keptn, what we call the Keptn Bridge; I'll just open this. It's actually from earlier today when we did our experiments, but what we are going to do is send a new deployment event to Keptn, and I will do this with the Keptn CLI, but with the option that I just send the cloud event to Keptn. The reason I do this via the cloud event and not via the built-in capabilities of the Keptn CLI is that we can take a little look at the details of the cloud event.

So I will go ahead and deploy my podtato-head application, it's the hello-server microservice, and I will just do this with one replica. I will instruct Keptn here to start the deployment, and what I've already told you on the slides will happen: Keptn will start to actually deploy the service, and we will see this in the Keptn Bridge.
F
The
captain
will
deploy
the
service
and
what
we
can
already
see
here
that
the
chaos
runner
and
the
pot
delete
will
start
it
so
after
deployment.
It
was
really
fast
because
actually
it
was
already
running
in
this
exact
version
on
my
cluster,
so
the
deployment
was
finished
and
captain
is
triggering
two
different
kind
of
tests.
The
first
one
is
the
chain
meter
tests.
They
are
running
in
a
different
name
space,
so
we
are
not
seeing
the
part
here.
F
We
are
only
seeing
the
part
in
the
litmus
chaos
namespace,
but
what
we
can
see
is
that
captain
also
triggered
the
chaos
tests.
We
can
already
see
it's
killing
a
part
here,
so
our
chaos
tests
are
the
pod
delete
chaos,
experiment.
That
means
a
part
that,
from
our
halo
service
replica
set,
a
random
pod
will
be
killed
by
the
litmus
chaos
experiment
and
it
takes
a
couple
of
seconds
for
the
next
part
to
come
up.
F
We
have
added
a
readiness
probe
with
a
little
bit
of
delay,
so
we
can
make
sure
that
once
the
pot
will
receive
some
traffic,
it's
already
up
and
running
and
now
after
a
couple
of
seconds,
is
up
and
running.
So
let
us
now
take
a
look
on
the
captain
user
interface.
What
we
can
see
here
so
we
just
triggered
the
chaos
and
we
can
see
the
configuration
change
was
received
and
the
deployment
is
finished
after
the
deployment
is
finished.
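The readiness probe with a small delay mentioned above could look roughly like this excerpt of the hello-service deployment (a sketch; the image name, port and timings are illustrative, not from the demo):

```yaml
# Hypothetical deployment excerpt for the hello service
spec:
  replicas: 1                  # increased to 3 in the second run of the demo
  template:
    spec:
      containers:
        - name: helloservice
          image: helloservice:v1          # illustrative image reference
          readinessProbe:
            httpGet:
              path: /
              port: 8080
            initialDelaySeconds: 10       # the small delay before the pod is marked ready
            periodSeconds: 5
```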
F
Keptn will start the tests, and we have already seen the test execution of the... sorry, of the Litmus chaos experiment, but we will actually wait here for the full execution of the JMeter tests. They usually take about two minutes to finish, so we should receive the event in just a second.

So we can already see the tests are finished. Our JMeter test with a couple of thousand requests has been executed against the service, triggered by Keptn. Keptn is now doing the SLI retrieval, that's some Keptn internals, let's say, to reach out to Prometheus and query all the data, and Keptn is doing an evaluation. I'll make this a little bit bigger so we can see the evaluation is done. We can already see the total score here: it's red, and red of course means it's not good. We got a result of zero.
F
The reason for this is that there was a time when the pod was actually not available for a couple of seconds: since the chaos experiment killed the pod, there was no other pod able to receive the traffic, and it took a couple of seconds for the pod to come up.

So with this experiment we could already evaluate the resiliency of our application, and what we can now do is play a little bit with our replica set or with other settings in our experiment. We will just increase the replica count, let's say to three. I just save this file and send it again, so in this case I'm sending the same instructions: I want to deploy the same version, but I want to have a replica count of three this time. So, taking a look, we can already see:
F
Keptn was triggering Helm to deploy two other replicas, two other instances of this pod. Once they are finished, Keptn will go ahead and trigger the tests again. It's always the same workflow or pipeline, whatever you want to call it; it's always the same: first the deployment, then the tests, the evaluation, and then, based on the result of the evaluation, the promotion or rollback of the artifact, for example.

So Keptn is starting the chaos runner, the chaos runner is then starting the pod-delete pod, and this one will kill a random instance of those three instances of our hello service. I'm not sure which one will be killed, but one of them will be removed and another one of course will come up in just a second. Here we go, a new one is created, because this one was killed by the experiment.
F
Everything is good. It will take about 30 seconds for it to come up and be ready, but in this case we would assume that the two other instances are ready and are there to receive all the traffic that should also have gone to the killed instance. So we would assume that our quality evaluation regarding our resiliency would be quite good this time, because we have a higher replica count, so we have some backup pods that will receive the traffic if one of them goes down.

So let us take a look at the Keptn Bridge. Again, it's a new instance of our configuration change; it's the same version, but this time with a higher replica count. The deployment is already finished, so right now the tests are being executed in the background, and we are not only waiting for the Litmus experiment to finish, but also for the JMeter tests to finish.
F
We can see the JMeter service has finished and we can see the evaluation, and this time we got a score of 100, since both of our SLOs are evaluated as satisfying. So our success percentage is 100: there was no downtime this time, although one of the pods got killed, and also the probe duration was quite fast, faster than the 100 milliseconds that we had set as a limit.

So what we can see here is that basically we included chaos testing in our pipeline, in our CD, next to performance tests, and we called it the chaos stage. That's the reason why you see "chaos" here; it's indicated right here because it was failing. But with this you can easily integrate chaos tests into your CD and make sure to also evaluate the resiliency of your microservices.
F
That's
the
main
idea
that
was,
it
was
a
short
and
easy
demo,
but
if
there
are
any
questions,
we're
happy
to
answer
the
questions.
Otherwise,
we
just
have
two
more
slides
as
an
outlook.
What
will
come
next
with
this
integration?
But
we
are
happy
to
answer
questions
here
as
a
we're
just
inside
of
the
demo.
F
Sorry, yeah, yeah. I just want to share this, but please, please go ahead.
C
After this slide, I wanted to ask if I can take two to three more minutes to give quick updates and run through three to four slides.
F
Okay, okay. So what will be next for the Litmus and Keptn integration? We will also take a look at the Litmus chaos results. Right now we're using Prometheus as the data provider for this evaluation, but Litmus is also exposing a lot of data for the experiments, and we're also taking a look at the Litmus data. We also want to improve this work by not only including it in the CD part, but actually testing self-healing methods and testing auto-remediation, by having Litmus experiments as the tests, and then auto-remediation scripts and auto-remediation instructions orchestrated by Keptn as the solution for the problems or chaos introduced by Litmus.

And if you want to learn more about this in a little bit more detail, we have a joint webinar next week on November 11th. We will share the slides, and if you want to give it a try yourself, you will find all the resources here. Thanks so much, and please, please share your thoughts.
C
Yeah, sure. No, that was a great integration, and I wanted to thank both project teams; they were running working sessions for three to four weeks, so that was great. Can I share my screen as well again? Yeah? Okay.

This is just an extension of the presentation. We were chatting with Harry about what the next steps for the project are, and he suggested that before going to the TOC, it would be good if we could present it to the SIG chairs and the App Delivery SIG. Primarily, we have been seeing great momentum in the last four to five months, just before and right through after the project became a sandbox project.
C
And
the
maintainers
feel
that
you
know
we
already
put
incubation,
so
we
wanted
to
state
that
intent
and
provide
a
quick
snapshot
of
why
we
think
so.
So
the
real
reason
where
why
we
feel
that
we
are
good
for
applying
for
or
just
showing,
the
intent
to
move
to
incubation
is
adoption
and
which
indicates
the
project
maturity.
C
We
are
1.10
release
and
also
there
are
some
huge
deployments
in
production
for
some
time
now
and
they're
being
become
the
contributors
as
well.
So
we
feel
that
we
will
definitely
pass
their
due
diligence
very
easily
and
we
also
are
happy
to
state
that
this
project
now
has
been.
You
know
managed
not
just
by
my
data
but
great
contributions
coming
from
a
lot
of
others,
but
we
have
foreign
maintainers.
Also
in
terms
of
indeed
and
amazon.
C
You
know
red
hat
and
autopilot
container
solutions
have
been
contributing
recently
to
the
project
and
we
have
established
the
open
conference
model
where
we
have
a
lot
of
six
inside
litmus
and
there
are
about
16
meetings
that
have
happened
and
telecom
or
some
of
some
of
the
contributors
or
sig
chairs
coaches
in
inside
litmus,
and
we
actually
integrated
with
the
captain.
Octato
spinnaker
argo,
that's
one
of
the
big
ones,
all
right.
So
for
all
these
reasons
I
know.
C
Team
maintenance
team
things
that
we
will.
We
need
to
move
to
the
next
step
as
a
quick
snapshot.
These
are
our
users.
Some
of
them
are
vendors,
vendor
users
and
some
of
them
are
they
qualify
the
end
user
category
and
definitely
intuit
is
one
of
litmus
biggest
users
and
our
cncf
end
user
and
then
iag
autopilot
and
fi
networks
they've
been
using
litmus
in
production.
C
So in the last two months, since we presented the project here to the SIG chairs, we have added six experiments and we ran about eight community meetups, and I'm very happy to say that there are 39 contributors and 16 members, and our Slack has been busy. You can see that in the last two months we again grew by about 40 percent in terms of experiment runs. We do a release every month, so the community can expect that.

Definitely there is a release coming out on the 15th. In the last six months a lot of development happened and we released the Litmus Portal, which is a full UI-based, GitOps-friendly platform for running chaos engineering on Kubernetes. As part of that effort, we integrated with Argo Workflows.
C
Now you can run super-complex chaos workflows that are closer to your real-life scenarios, in parallel or in sequence, so it is possible to take chaos engineering to a production level. As part of that, we also introduced probes, where we don't dictate what the end result or verdict should be; you can define what you think should be a pass or fail for a given test in the hypothesis. And as part of the requirements for incubation, we have updated all our CI/CD inside the Litmus project's own development; it's all open.
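A hedged sketch of how such a probe can be attached to an experiment, so the hypothesis is declared alongside the fault (the names, URL and timings are illustrative, and the field names follow the Litmus probe schema as recalled here, not the talk):

```yaml
# Hypothetical probe inside a ChaosEngine experiment spec
experiments:
  - name: pod-delete
    spec:
      probe:
        - name: hello-service-available
          type: httpProbe
          mode: Continuous                 # evaluate throughout the chaos injection
          httpProbe/inputs:
            url: http://helloservice.default.svc.cluster.local:8080
            method:
              get:
                criteria: ==
                responseCode: "200"
          runProperties:
            probeTimeout: 5
            interval: 2
            retry: 1
```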
C
So we have a full-fledged setup for running that CI/CD, and monitoring is also a very important aspect. We started on the architecture and completed it for showing chaos-interleaved dashboards: for example, you might have a Sock Shop application dashboard or any other dashboard in Grafana, and now we have chaos interleaving as well, so when you run some chaos test there's a red mark that comes right on top of it.

Also, chaos can now be limited to a namespace scope and you can have multiple operators within the cluster, so developers can run it as if it's their own small application. So that's a very quick update and, as you can see, this is from DevStats; it shows contributions from MayaData, Intuit, HSBC, Microsoft, Autopilot and others.
C
These are the top 10 contributors since sandbox, so I'm very, very happy and thrilled to say that this project has received great adoption, and users are actually contributing back; it's working the way it's supposed to for a CNCF project. This is another chart that shows how the forks are increasing, which shows that people are taking it, changing it and upstreaming it back.

This is just a quick update on who's actually contributing. Also, one of the primary contributors, a team from Israel, is adding infrastructure changes to Litmus so that you can run chaos from Kubernetes on a resource that's outside Kubernetes, for example a VM that is orchestrated by oVirt, and Container Solutions, which runs Litmus on some of the huge production deployments.
C
They've been contributing back with network chaos tests and OpenShift support, and there are others: a Microsoft team from Japan has completely tested Litmus on AKS, certifying all 33 experiments at that time, and the Autopilot team has been using Litmus but is also contributing back on the Helm charts. So these are some of the... in the interest of time, I just wanted to skip this. I've shared the slides, and thank you guys for giving a few more minutes in the agenda, but this is really about

how the Litmus project is running things: an open governance model, meetings, integrating with the ecosystem projects, and actually gaining adoption in production, all that stuff. So with that I really want to stop this presentation, and I wanted to say thanks.
D
Yeah, yeah, thanks for the heads-up, and let's work on this. It's great to see the project moving forward, and also great to see you collaborating closely with other CNCF projects as well. I think with this we conclude for today. I wish everybody a nice rest of the day or a nice evening, depending on your time zone, and we'll talk again in four weeks, not in two weeks: in two weeks we have KubeCon and we decided not to have a meeting during KubeCon. Thanks.
C
Thank you. All right, great job again. See you at KubeCon, bye, thanks.