From YouTube: Keptn Auto-remediation Working Group - May 19th, 2021
Description
Meeting notes: https://docs.google.com/document/d/1y7a6uaN8fwFJ7IRnvtxSfgz-OGFq6u7bKN6F7NDxKPg/edit
Learn more: https://keptn.sh
Get started with tutorials: https://tutorials.keptn.sh
Join us in Slack: https://slack.keptn.sh
Star us on Github: https://github.com/keptn/keptn
Follow us on Twitter: https://twitter.com/keptnProject
Sign up to our community newsletter: https://keptn.sh/community/newsletter/
A: This time, I think, Jürgen and I think it would make sense that we take maybe one step back and think about what we should achieve, or what we want to achieve. Let's do a very short recap at the beginning: in the last meetings we had really good ideas and thoughts, and we were making up this mind map that helped us a lot in ordering our thoughts around auto-remediation.
A: This was in meetings one and two, and we also did a little bit of work in the third meeting, and then we decided that we want to provide a template for a use case. The use case in our situation is a JVM exhaustion, and based on this problem we want to derive the remediation process that is necessary to fix this particular problem. In the last two meetings we were discussing this template, and in the last meeting we then also did a little bit of it.
A: Great, and yeah, this is what I have summarized now, meaning what we have worked on in the last couple of meetings and what we have achieved so far: the mind map. Then we also defined the user story and the charter.
A: And last but not least, we talked about the template for a JVM exhaustion example. This is where we stand right now, and I want to be honest with this working group now.

A: I would like to ask all of you: what should actually be the outcome that we want to achieve next? Because I think we got a little bit lost with the template and where we are heading, so we're not focused on what we want to achieve. I think this should now be an open discussion on what our desired outcome should be in, for example, the next three meetings.
D: It's an interesting dilemma to have JVM exhaustion, because it can have multiple different ways of being remediated, and it can have multiple different sources. Of all the things we've done, I think we kind of got our mission and our charter, and that stuff is fairly well nailed down.

D: Anything could go on there, but the thing that seemed to be one of the more exciting conversations we had, at least in this group, was starting to talk about the JVM exhaustion as an example to explore the model or framework for what an auto-remediation process looks like, and what all the steps and decisions along the way in such a process are. Because, remember, in the mind map we talked about permissions and governance at different levels, pre-approved remediations versus remediations that need approval.
D: If we walk through that example of a JVM exhaustion, that will help us decide where those decisions take place and how an organization creates those policies. But it's also technically exciting, I mean, to everyone.

D: It was like: all right, now we're getting our hands dirty in an actual problem to remediate, so it makes us feel kind of excited, because that's kind of the fun: I didn't have to go through those 45 steps to diagnose, determine, apply, and then evaluate whether it worked or not. You know, that's a huge process to evaluate and do all that, and Keptn auto-remediation just did it for me automatically.
D: So to me, that's the light bulb moment that excited me about it. At least it triggered for me the idea of different levels of remediation, meaning levels of intrusiveness or disruption. How disruptive is a remediation? At the top, I have to do a code change and a push; maybe a little less than that would be an application configuration change and a push; a little bit less than that would be...

D: Maybe it's a machine-level configuration change, something in the operating system or in the container framework, and a push. Or I don't have to touch any of that: I just need more instances, or I need more of a resource, memory, a faster network, something like that. That's at a very, very low level in the infrastructure.
D: Maybe it's even DNS changes, maybe it's something outside the application stack that would help remediate something. Caching is a good example of that: something completely separate, the application doesn't even know it's happening, but we're starting to cache things externally. So getting that model, to walk through that example, I think, is good. For me, that's an outcome that focuses the other things we've talked about on what that "aha, that's cool" moment really becomes, and then it's also something that I believe is deliverable.

D: Like: here is the auto-remediation package, if you will, a collection or assembly of things, the auto-remediation assembly for JVM exhaustion. We could do one for, you know, controlling threads. We could do one for general network latency, at a very general level. They can also get very specific: here's caching in .NET, you could do that. So you'd have very narrow, scoped packages or assemblies of policies for auto-remediation, and very general ones. But to me, walking through that example gets us to those kinds of outcomes.
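The idea of escalating disruption levels and a narrow-scoped remediation package can be sketched roughly as follows. This is a hypothetical illustration of the concept discussed here; the names, the ordering, and the `JVM_EXHAUSTION_PACKAGE` structure are invented and are not part of Keptn.

```python
from enum import IntEnum

class Disruption(IntEnum):
    """How intrusive a remediation action is, least disruptive first."""
    INFRASTRUCTURE = 1   # more memory, more instances, faster network
    MACHINE_CONFIG = 2   # OS or container-framework setting plus push
    APP_CONFIG = 3       # application configuration change plus push
    CODE_CHANGE = 4      # code change plus push through the pipeline

# A narrow-scoped "remediation package" for JVM exhaustion:
# candidate actions ranked by how disruptive they are.
JVM_EXHAUSTION_PACKAGE = [
    ("add heap memory", Disruption.INFRASTRUCTURE),
    ("tune container memory limits", Disruption.MACHINE_CONFIG),
    ("push new JVM heap settings", Disruption.APP_CONFIG),
    ("fix leaking static structure", Disruption.CODE_CHANGE),
]

def allowed_actions(package, budget):
    """Actions permitted under a disruption budget, least disruptive first."""
    return [name for name, level in sorted(package, key=lambda a: a[1])
            if level <= budget]
```

The point of the ordering is exactly the escalation described above: an operator (or a tool) tries infrastructure-level fixes before anything that requires a config or code push.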
B: Cool, yeah. Maybe I'll just add my thoughts here. For me, the most interesting discussions, and what I really liked around the mind map, were the brainstorming: opening up all the different possibilities, all the things that are involved in auto-remediation, to find out: okay, what are, let's say, the organizational boundaries? What are the technical issues which are more related to the infrastructure side? What are the parts that an application actually has to provide to be able to be auto-remediated?
B: Maybe not each application can be auto-remediated. What are the approval steps? So kind of laying out, let's say, a framework, but not a framework in the technical or implementation sense, more a framework where you can see: okay, these are all the different parts, and if you want to build auto-remediation, then you have to think about all of these things, and if you want to implement it, you have to think about this.

B: Those different parts are actually the leaves in our mind map; these are the different aspects, so let's call them aspects. And I also think that applying this to an example really gives kind of the confirmation that it makes sense, that it's generally applicable. That does not mean it's fully worked out in all detail; maybe it's not fully matured in each detail, but it should give a very good understanding of all the different parts that are involved.
B: Regarding the example itself, the JVM exhaustion: I think it's a very good practical example to do. For me personally, it's great to see how it works for the JVM exhaustion.

B: Just me personally, I would not go into other examples and then kind of have a look at how, I don't know, a process crash would look, what the different options for crashes of processes are, and dig deeper there. Maybe this is something where we can have more folks from the working group, or just interested ones, joining and saying: okay, now there is this framework, how can we use this framework for my example?

B: So we do the exercise where we provide the framework, or the template, and others can then do the exercise themselves, and then maybe, as a larger group, we can come out again with a catalogue of different things. But for this group I would see the desired outcome more as something written down:

B: How one can apply the framework that we have kind of developed, yeah.
D: Not to totally interrupt you, but the word "model" is coming into my brain. When we wrote a book at Microsoft on ASP.NET, one of the old original ASP.NET performance and scalability books, we, as part of the SQL Server team, were cooperating with the IIS team and the ASP.NET team. I have PDFs of them somewhere; I'm sure we can find them on the internet.

D: Very, very elaborate flow charts, logic flow charts, and of course they were fairly linear in terms of how you branch your thinking, and there are multiple branches of thinking to remediate.

D: In a very simple example, for SQL Server performance: you look at locking, and then you look at blocking, and then you look at resources. If I'm in a SQL performance problem, I would go through these steps, and there was a logical sequence, meaning it doesn't make any sense to look at CPU until I resolve these two other issues and bring the data from those two steps into the third step. And so, at a very, very low level, you might ask: all right.

D: I looked at locking; at the same time, according to the model in the mind map that we went through: all right, I found something that I can fix. Do I have permissions to fix it? How long will it take to fix it? Does that fit within an SLO, or a fixed budget, or an error budget? So there are all kinds of things from our brainstorming model that would come into each step of the process. So I'm kind of connecting the two here.
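Those per-step questions (permissions, time to fix, remaining error budget, prior approval) could be sketched as a simple gate on each step of such a process. Everything below, names and parameters alike, is an invented illustration of the checklist, not an existing Keptn API:

```python
def may_auto_apply(has_permission: bool,
                   estimated_minutes: float,
                   error_budget_minutes: float,
                   pre_approved: bool) -> bool:
    """Gate one remediation step: run it automatically only if we hold the
    permission, the fix fits within the remaining error budget, and the
    change was pre-approved (e.g. it previously passed a quality gate)."""
    if not has_permission:
        return False                     # escalate to a human instead
    if estimated_minutes > error_budget_minutes:
        return False                     # too slow for the remaining budget
    return pre_approved
```

A flow-chart step from the book analogy would then be: gather data, propose a fix, run it through this gate, and either apply automatically or escalate.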
D: Hearing you talk, there may be another outcome that is sort of: hey, here's some elaborate logic. And making it look huge was great for us, because I could increase my bill rate: people would go, "holy crap, that's what your brain has to go through to figure out a SQL Server performance problem?" And that's not a deficiency of SQL Server, that's just any database!

D: You have to go through all these steps to figure out how to fix it. So it was an interesting way to display the complexity of performance, interdependent resources, and performance problem identification and remediation. I think we end up talking about it casually, at least I do, because it's just baked into my brain, but if you really spell it out like we did, and like we would do here, it gets really elaborate. But it could map back to the mind map, and you'd say: oh yeah.
D: Yeah, and it's like: all right, in this example of JVM exhaustion there are some parts of the model that we didn't have to visit. Meaning, let's say we're remediating something where I don't have to escalate, because it's a fairly non-disruptive step. Okay, first step: let's say you're going to hot swap memory, but it's not available to the operating system until you restart that system. So you could do step one without additional permission; then in step two you're going to restart that node.

D: So now there's another level of notification in the model: who do I talk to, do I have permissions, do we have, you know, that kind of stuff? Is this a highly sensitive application, versus pretty much a throwaway part? Can we just throw that container away and fire up another container with more memory?
D: That kind of remediation. We'd be able to go through each step and say: okay, for JVM exhaustion we only used four of the areas of the model. Let's come up with another example that now goes into, say, database indexing: a missing index, very common in almost every RDBMS, and even in NoSQL you'll find other issues where indexes and keys are wrong.

D: You would just pick another example to illustrate other parts of our model, like a high-risk part of an application, where there may be other controls and permissions, so we have to add steps. We could find maybe two or three different examples down the road that would flesh out sort of the whole model, because I think, for all of our thoughts, finding one problem to solve in production that hits all parts of that model might get a little bit crazy, a little bit overwhelming to somebody.
D: I just want the concept. So here's a fairly simple remediation: I need to expand the cluster, add more nodes. Okay, fairly disruptive; we don't have code changes, we don't have config changes, we just need more nodes in a non-elastic, non-auto-scaling type of situation. And maybe the last step of the remediation is then a recommendation for the future. We talked about that: hey, did we learn something, and can we recommend what we think should be fixed down the road? Hey, get your application to enable auto-scaling.

D: You know, "with XYZ feature" kind of thing. JVM exhaustion is somewhere in the middle, depending on the source of the GC pressure. Do you just not have enough memory, and need more memory so that GC is working efficiently? That could be it, just a limited heap. Or do you have a memory leak somewhere in the application, where it's growing and growing, maybe a static structure that's not coded properly, just growing and growing over time, depending on throughput?
B: Do we have in the mind map some kind of requirements for the application itself? You just said: if there are some issues with JVM memory, then just widen the boundaries so that GC can come up with a nice way to do the garbage collection. But are there maybe some other prerequisites that applications have to meet? I don't know; maybe one thing is they have to be stateless.

B: They have to be configurable from the outside, or something like these twelve-factor applications, or maybe it's just three of the factors, or maybe it's something else; I'm not even sure what it is. Or, for infrastructure: it has to be accessible via some kind of infrastructure as code.

B: You have to have infrastructure as code. If there is only a manual way to do some things, and there's no way to do an API call to approve something... If you do an apt-get install and you cannot pass -y to automatically accept, the prompt sits there waiting over the weekend for someone to type in "yes". So maybe there are some kinds of prerequisites for applications that we can also put into this framework.
D: Sure. Even in a microservices versus monolith type of situation: in microservices you could tee something up to push, but certain organizations don't allow it to go all the way through, because of either a compliance issue or whatever. There is somebody sitting there going: well, it's a manual package drop; CD only took us so far, and the zip file is sitting right there, all you have to do is double-click on it and it'll go. But then no one's available to do that.
D: So, remediation. To your original question: I think with the mind map we got to the precipice, we got right to the edge of opening that discussion. And to me... I'm going to try a little wizardry here for you, if you guys can see this. Do you see this? Yeah? To me, it's those levels, where there are things at a very low level, which is just adding memory, right at the very bottom.

D: I can take an existing machine, or I can just reconfigure, and the app doesn't even know; it just has more memory available, but the container framework would know, or whatever. Even if it's a physical redeployment, or no redeployment, hot swap the memory in: I didn't have to call a developer. I can throw memory at a problem, and maybe it buys me something; you just need more memory. And maybe Keptn, or the auto-remediation, would basically do a little diagnosis, saying: we can see your GC pattern changing over time, and we think, if you just give us a little more overhead... We're having forced GCs because of a heap limitation, but if we get more heap space, without changing the JVM configuration or anything, just more available memory, we can hit that peak and have GC be more intelligent. Especially G1 GC in Java 11 and later; G1 really is smart about knowing how to stay ahead, but if you've crunched it, it's like:
D: "I really look at these structures; I would really like to not have forced GC because of heap exhaustion." To me, in the model from the mind map, that would be level one. Then let's say you actually had to do a JVM config push. If, Jürgen, to your point, config is code, that JVM config has to be pushed back through CD, and maybe I do that change. And because that's more elaborate and a longer process, with more people involved, then our mind map, where we talk about governance, asks: is this a pre-approved remediation?
D: Has it been tested in pre-prod, where everything's a lie? Have we previously passed a quality gate in Keptn to say that this is an approved remediation? You can just push the button, because we tested it, which means there is no "-y" prompt problem; it'll just go tweak the JVM automatically and push different heap settings.

D: If you went further than that, you could get into even more detail, but now you've got app config, things within the application layer that sit on top of the JVM, and that would be like a level-three remediation, with again even more permissions. And if you took this entire stack and said, I'm going to apply this to a high-risk app, now I've got additional steps at each of those levels to get permissions and notifications: oh, this is a high-risk app.
D: Even if we think it's benign and it was a pre-approved remediation, we're going to give a notification to the app stakeholders to say: oh hey, one of our high-risk apps, high revenue generating, with many dependencies, went through a change that is supposedly benign, and here is extra information about what to do if something doesn't seem right, who to call, etc. So there are other steps in our mind-map thinking that would play in here. Whereas for a low-risk app, let's say this one is medium and this one is low risk:

D: We could do all three levels of remediation over here in low risk without any extra notification; we didn't have to call anyone, nothing. We just take care of that app and keep it running. On the medium side, maybe when it gets to app config, the technical product owner gets a notification, and maybe the lead technical dev gets invited to a 15-minute "hey, can you be on call while we push this app config change", etc.

D: So that's the idea. For me, walking through an example with different levels allows you to say: here's where I would not have to use a lot of our thinking from a very elaborate mind map, whereas high risk, or compliance, or high throughput (high risk could be high throughput as well: it's under tremendous load, extra steps for load balancers)...
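The governance being described, app risk class crossed with remediation level, amounts to a small lookup table. The sketch below is a hypothetical illustration of that idea only; the role names and table entries are invented, not an existing policy format:

```python
# (app_risk, remediation_level) -> (who to notify, manual approval needed?)
POLICY = {
    ("low",    "infrastructure"): ([],                            False),
    ("low",    "machine_config"): ([],                            False),
    ("low",    "app_config"):     ([],                            False),
    ("medium", "infrastructure"): ([],                            False),
    ("medium", "machine_config"): (["product_owner"],             False),
    ("medium", "app_config"):     (["product_owner", "lead_dev"], False),
    ("high",   "infrastructure"): (["stakeholders"],              False),
    ("high",   "machine_config"): (["stakeholders"],              True),
    ("high",   "app_config"):     (["stakeholders", "lead_dev"],  True),
}

def governance(app_risk: str, level: str):
    """Who must hear about this remediation, and does a human sign off?"""
    return POLICY[(app_risk, level)]
```

The low-risk column is the "no notification, just take care of it" case; the high-risk column adds stakeholder notification even for supposedly benign, pre-approved changes.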
B: Yeah, I really think that these concepts can really help us understand the whole complexity of auto-remediation. And from the very beginning, it was never a goal to do the whole implementation.

B: The goal was always to showcase what has to be part of a modern auto-remediation tool, so to say, something that does not exist yet. So we don't have to focus on Keptn as it is right now; we should focus on how we would need to build it, and all these examples help with that.
B: I think it really helps if we can think about: okay, where does this classification of low, medium, and high risk live, for example? Is it only in the remediation part? Maybe it's already in the deployment part; maybe it's in the whole application life cycle. Maybe high-risk applications need to be treated differently; let's say for a high-risk application we enforce security scans during deployment, so this could be one of the things where Keptn already takes care of it.

B: I know, now I'm again talking about Keptn, but the point is that it's not only for the remediation part, yeah. And then it gives a justification for why it's so important for remediation, because it's not only important for remediation. So we really have to put this into a concept that is broader than remediation, because it actually concerns the whole application life cycle.

B: If we think of a rollback, then maybe a high-risk application has to be treated completely differently in a rollback, because manual approvals are needed, and they are needed because it's high risk, yeah. And this also involves the deployment already in the first place.
D: Yeah, I obviously lean heavily into Keptn's definition, I would say definition and implementation, of the quality gate concept, because that's near and dear to my heart; it's in my experience and in my mind more than anything. And if you're not using Keptn to do quality gates, you might be doing that in very old-school ways; you might have a fairly elaborate quality gate concept on your pipeline.

D: The other question that expands on that for me, and the reason I see Keptn quality gates and the automation of your pipeline working in tandem, is that you have multiple pipelines flowing into an entire app landscape, so you're doing auto-remediation not just on a single pipeline. You've got to go back and say: oh well, wait a minute, I can make this change, but that also affects this dependency. Database indexing is a good example of such a dependency.
D: If I just go: hey, this application uses this table with this index, but there are 40 other applications that hit that same table, because we still have a monolithic database or a centralized database concept. Okay, I can't just tweak this index for my case; now I'm hitting some other roadblocks in the idea of auto-remediating a missing index. There's an index there already that covers five of the six attributes that I need, but I can't just add the sixth in there, because it may affect that other application's query.

D: This is really what we pay the human brain to do all day long as a DBA, designer, developer, modeler. It's like: all right, I'm going to figure out how many different stored procedures talk to account data, whether I can really support them under different workloads, and whether I need to split them out. I have three different account tables now, in the NoSQL world.

D: They have some similar data, and when I need to coalesce all those tables and go do reporting somewhere, that's a whole separate process. So there are different ways to solve that problem of layered interdependency, and to me, that's where remediation generally gets really complex. We picked a fairly simple one, even JVM exhaustion, but when you think about it in that context, it quickly gets more complex, even when you just start applying our thinking to it. Anyway.
D
High
permission,
high
permission
versus
low
permissions,
meaning
manual
intervention
for
permissions
high
pre-validated
remediation
steps
versus
low
pre-validated,
meaning
they
have
has
the
have
these
changes
ever
gone
through
the
quality
steps
or
quality
gates
to
to
say,
hey,
we're
pretty
we're
pretty
comfortable
in
our
experience
about
changing
jbm
settings,
we've
done
a
lot
of
them
over
the
years.
We're
just
going
to
tweak
this
one,
like
all
the
other
applications,
so
that
that
you
know
previous
experience
prior
knowledge.
D
Permissions
things
like
that
could
go
into
the
mind
map,
and
maybe
that
does
become
a
model
and
I
think
they're
there
to
me
there's
still
a
flowchart
in
there
there's
a
you
know
what
are
all
the
things
I
have
to
go
through
as
almost
a
checklist
before
I
take
any
action
or
even
try
an
action.
The
other
thing
you
think
about
the
akamas
guys
I
mean
they're,
they're
sort
of
they'll
put
in
a
setting.
They'll
run
something
and
they'll
get
the
data
back.
D
You
could
even
do
in
production,
say
well
we're
gonna
of
the
16
different
containers,
we're
going
to
tweak
it
and
move
one
of
them
put
a
little
traffic
onto
there,
monitor
that
and
then
see.
If
that
change
is
good.
That's
like
when
you're
developing
the
remediation
there's,
even
people
that
under
pressure
are
like
well,
let's
try
it
try
one
see
how
it
goes.
If
that
looks
good,
we'll
push
it
across
all
the
rest
of
the
containers
and
restart
them.
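That "tweak one of the 16, watch it, then roll out" approach is essentially a canary step. A minimal sketch of the loop, with `apply_change` and `evaluate` as stand-ins for the real configuration push and monitoring (both hypothetical callbacks, not a real API):

```python
def canary_rollout(containers, apply_change, evaluate):
    """Apply a change to one container first; only if monitoring judges the
    canary healthy is the change pushed to the rest. Returns the containers
    that were changed and whether the full rollout happened."""
    canary = containers[0]
    apply_change(canary)
    if not evaluate(canary):
        return [canary], False            # stop: only the canary was touched
    for container in containers[1:]:
        apply_change(container)
    return list(containers), True
```

The evaluation step is where a quality-gate-style check (SLO evaluation on the canary's metrics) would plug in before the change spreads.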
B: I don't know what I want to buy, maybe more coffee beans... But we could also think about auto-remediation as: we're not meeting SLOs, and the SLOs are defined as running very performant. Like the Akamas guys, who are always optimizing the JVM settings. So maybe, if we're not hitting our SLOs, we treat it as an auto-remediation problem, and we could go back and forth with it to do auto-remediation steps.
B
It
would
be
more
on
the
low
impact
part
and
would
would
hopefully
go
without
approvals,
but
these
kind
of
doing
one
thing
giving
it
a
try,
testing.
It
then
evaluating
it
doing
another
thing,
giving
it
a
try,
then
evaluating
it.
It's
it's.
Basically,
it's
the
same
workflow
or
it's
the
same
flow
chart
for
auto
remediation
as
well.
B
Just
that
the
trigger
is
not
an
incident
or
an
issue
or
a
problem,
but
the
trigger
is
because
we
failed
our
slos
that
are
specifically
only
made
for
optimizing
the
application,
so
it
could
also
be
like
on
the
it's.
It's
not
the
core
problem,
but
it's
the
same
idea,
optimization
as
a
remediation
problem.
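So the same try-and-evaluate workflow would fire from two kinds of triggers. A tiny sketch of that classification (the event names are invented for illustration, not Keptn event types):

```python
from typing import Optional

def remediation_trigger(event_type: str, slo_met: bool = True) -> Optional[str]:
    """An incident triggers remediation, and so does failing an SLO that
    exists purely to optimize the application; only the trigger differs,
    the downstream try/evaluate workflow is the same."""
    if event_type == "problem":
        return "start_remediation"
    if event_type == "slo_evaluation" and not slo_met:
        return "start_remediation"       # optimization as remediation
    return None
```
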
D
What
about
something
famous
from
about
10
years
ago
was
this
idea
of
cost
optimization,
which
was,
if
you're
looking
at
something
in
like
cloud
found
or
abstraction
layer
in
the
cloud?
Hey,
it's
cheaper
to
run
my
containers
over
on
those
guys
than
it
is
on
these
guys.
They
announce
a
pricing
change
and
like,
if
you
put
the
cost
of
the
platform
cost
of
the
infrastructure
into
the
model.
D
Could
I
just
don't
change
a
line
of
code?
No
one
even
knows
hey
we're.
Now
we're
running
on
these
containers
over
here
we're
running
on
that
with
that
provider
over
there.
So
it
there's
not
even
the
the
upside
is
not
great
performance.
It's
just
same
performance
costs
you
a
thousand
dollars
less
a
month.
B: That's what the folks at Iter8 are doing, yeah. You've heard of them? So Iter8 is, I think of it as a little bit of a Keptn quality gate, but they don't call it a quality gate; they more want to do A/B testing and canary releases. They say: Iter8 makes it easy to optimize business metrics and validate SLOs when you release new versions of Kubernetes apps. Yeah.
D
I
think
you're
right
there,
there
is
a
that
should
be
in
the
model.
You
know
what
is
the
impetus?
What
is
the
motivation?
Is
it
and
then
it
could
be?
You
know,
refinement
or,
like
you
say,
optimization
as
as
iterating
on
the
the
continued
we've
been
remediating,
this
same
way
the
entire
time,
but
is
there
a
feedback
mechanism
that
says
you
know
every
time
we
do
that
we
should
have.
D
We
should
wait
longer
between
changing
the
load
balancer
and
then
give
it
more
time
or
there's
very
technical
ways
of
refining
that,
but
then
to
these
other
goals
yeah
could
I
is,
does
it
change
conversion
rate,
the
a
b
testing,
honest
to
god
that
kind
of
testing?
I
always
think
you're
right
just
it
should
be
one
of
many
quality
gates
it.
Just
it's
not
coming
from
a
hard
requirement.
It's
coming
from
our
friends.
Call
it
a
desirement
I
desire
to
make
more
money.
I
desire
to
have
a
better
conversion
rate.
D
I
desire
really
nice
colors
or
you
know,
on
my
website.
I
want
to
change
my
colors,
but
you
know
what
suddenly
some
graphics
engine
just
goes
right.
It
looks
terrible.
So
optimizing
other
telemetry
yeah
yeah.
B: Yeah, you could optimize for availability, or for revenue, sure. Of course you need some kind of availability for revenue, but yeah.

D: I mean, it can also be sentiment, right? We know that's a big thing, you know, if you've got feedback mechanisms on your website: we made some changes, and, unrelated to the change, you had a 10% uptick in people that said your website is great. Awesome, okay, cool.
A: As we were working on the mind map... All right, gentlemen, what should be the next step?
B: Sorry, I just had to put down my phone. I would suggest taking the mind map and putting it into a written document, because right now all the leaves are just bullet points. We know what we mean by them, but I think adding a sentence or two for each of them would be good.

B: We'd then come up with kind of a large document with just the description of the mind map, and once we have this, we can go on to the template. Johannes and I already did a couple of things; we used this template today and put in, for example, the tools which are used, because this was also one of the action items from the last meeting.
B
It's
I
think
today,
we've
discussed
it's
not
about
the
tools,
it
should
be
nevertheless,
tool
agnostic
and
that
can
later
on
it
can
be
a
nice
framework
for
us
to
find
in
the
in
the
ecosystem.
What
are
the
tools?
What
are
they
doing
now
nowadays
and
then
reach
out
to
them
and
invite
them
also
back
to
to
this
working
group,
or
maybe
we
initiate
a
new
working
group
under
a
new
umbrella
or
whatever.
D
Or
a
working
session
specific
to
what
they
do,
like
it
iterates
a
great
example
akamas
a
great
example
they're
already
a
partner
to
like
hey.
You
know
we
want
to.
We
want
to
take
what
you
do
and
try
it
out
in
this
new
model
and
get
some
feedback
from
them.
Yeah.
B
Yeah
we
were
discussing
today
that
also
that
there
are
these
special
interest
groups
that
are
part
of
the
cncf,
and
maybe
so.
This
could
be
something
where
we
then
open
up
a
little
bit
more
and
ask
others.
Okay,
can
you
please
present
how
you
are
doing
this?
Does
it
fit
to
the
model
that
we've
already
developed?
Does
it
or
do
we
need
to
extend
the
model?
Maybe
we've
only
seen
this
from
one
angle
and
one
perspective.
B
I
think
we
had
already
great
discussions
here,
so
I
I
would
pretty
much
be
be
surprised
if
we
missed
one
part
if
we
totally
missed
one
part
but
having
this
and
then
giving
this
as
a
kind
of
as
an
input
for
for
further
discussions
could
could
really
help,
but.
B
To
have
a
written
document,
do
the
the
mind,
mapping
describe
all
the
the
beliefs
and
then
all
the
all,
the
other
parts
of
the
mind
map
also
describe
the
template.
I
think
kevin
has
also
done
a
great
job
here
in
describing
everything
that
was
going
on
in
their
organization,
very
detailed,
marcus.
All
you
have
also
done
this
great
job
on
on
the
user
story
that,
where
that
can
fill
in
here,
so
I
think
we
already
have
something
me
personally.
B: Use the model for the template; we have the example for it. And this could be something... I think it might be way too long for a blog article, but, like a lot of working groups, we could then come up with a white paper at the end. It can also be a blog article, it can be whatever, but something to share with others that have not yet been part of this working group, so that we can also get their feedback and use it.

B: This eventually also becomes part of a new implementation, or a further implementation improvement, in Keptn, yeah. But it's also food for thought for the whole community.
D: Yeah. So, things that would go in that document, Jürgen: I see maybe a flow chart, but also, beyond the process flow, there could be a block model. What are the building blocks that you have to think about? Like, when I go in to remediate problems, it's like I could go anywhere I need to within these different considerations: well, what application does this touch?

D: Another thing we could do in the paper: here's the list, a checklist of questions that would even be inputs to an automated auto-remediation product or process. You're always ingesting information that tells you: okay, here are the blocks of the model that are going to be most important for what you're dealing with. And the third thing, next to the block model and the flow chart: I think there's a list of questions for that ingest. What are the questions?
D
It always starts, in the first n number of hours, with gathering data, gathering information, pulling it into my mental model, and then figuring out exactly, okay: what then? Then I loop back on questions: all right, I see what the thing could be. Is this something we try? Is this something that I know will fix it? How confident am I? That makes me think of the vectors in our model.
D
Here again, the vector was low, medium, high, and I just said risk. So risk would be a vector. Permission, or experience, would be another: we're very experienced in tweaking JVMs, we've been doing it for 20 years, we know a lot about it, so we can automate that knowledge. On another vector, disruption would be one; dependencies would be another. These different vectors, I think we described them as: when I'm looking at the block model, are some of those blocks more complex or less complex, more sensitive or less sensitive?
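The low/medium/high vectors described here (risk, experience, disruption, dependencies) could be sketched as a small decision helper. This is a hypothetical illustration, not part of Keptn; the vector names come from the discussion, but the function, gating rules, and thresholds are invented.

```python
# Hypothetical sketch: rate each candidate remediation action along the
# vectors discussed (risk, experience, disruption, dependencies), then let
# the ratings gate how autonomously the action may run. All rules invented.

RATING = {"low": 0, "medium": 1, "high": 2}

def decide_execution(vectors):
    """vectors: dict like {"risk": "low", "experience": "high", ...};
    missing vectors default to the most conservative value."""
    risk = RATING[vectors.get("risk", "high")]
    experience = RATING[vectors.get("experience", "low")]
    dependencies = RATING[vectors.get("dependencies", "high")]
    disruption = RATING[vectors.get("disruption", "high")]

    # Well-understood, low-blast-radius fixes can be fully automated.
    if experience == 2 and risk == 0 and dependencies == 0 and disruption == 0:
        return "auto-apply"
    # Anything risky or heavily entangled needs a human in the loop.
    if risk == 2 or dependencies == 2:
        return "require-approval"
    return "auto-apply-with-extra-verification"

# The "20 years of JVM tuning" example from the discussion:
jvm_tuning = {"risk": "low", "experience": "high",
              "dependencies": "low", "disruption": "low"}
print(decide_execution(jvm_tuning))  # -> auto-apply
```

The point of the sketch is only the shape: the same action can land in a different bucket depending on the environment's ratings, which is exactly the microservices-versus-monolith contrast raised next.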
D
Given the environment, because you could walk into another system and say: well, there are very low dependencies, because we're in microservices and we can make changes without really putting the majority of the system at risk. But if I walk into a more monolithic application, I'm like: well, you know what, this is the Jenga tower. I can't just go pulling pieces out and start trying to fix things.
D
There has to be extra testing, maybe extra retries, before simple rollout stuff. So I think those could be some of the things we write up in terms of the model: how it would visually appear, a flow chart for how to apply it. Because once you take an abstract model and apply it step by step, now you're going through the logic flow, the vectors,
what data points need to be ingested. And blog articles could come out of each of those. You could just say: hey, here are the top 10 questions that we ask when we're going to start to build an auto-remediation engine; what kinds of data inputs do we need? We start with some questions.
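The "checklist of questions as data inputs" idea could look something like the sketch below. Every question and data source here is illustrative, invented only to show the shape of such a checklist, not to prescribe its contents.

```python
# Illustrative sketch: each question an auto-remediation engine would ask,
# mapped to the data source that could answer it automatically. Both the
# questions and the source names are hypothetical examples.

INGEST_CHECKLIST = [
    ("What application/service is affected?", "topology / service catalog"),
    ("What changed recently (deploys, config)?", "CI/CD events"),
    ("How severe is the user impact?", "SLO / error-budget metrics"),
    ("Have we seen this problem signature before?", "incident history"),
    ("How confident are we in the known fix?", "past remediation outcomes"),
]

def open_questions(available_sources):
    """Return the questions we cannot yet answer from the available data."""
    return [q for q, src in INGEST_CHECKLIST if src not in available_sources]

# With only two sources wired up, three questions remain unanswered:
print(open_questions({"CI/CD events", "SLO / error-budget metrics"}))
```

Framed this way, the checklist doubles as a gap analysis: whatever questions stay open tell you which data integrations the remediation process still needs.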
A
Listening now, and ordering my thoughts: I always have this chart in my mind that has four quadrants, two axes, two directions. We can put different vectors on it; maybe one is for risk and the other one for dependencies. Then, depending on which quadrant you are in, there are other recommendations and aspects you need to consider. We could provide this.
A
This quadrant chart could tell the reader, or the person that is interested, which aspect is relevant and which kind of recommendation applies when you are in which quadrant.
D
You can have a really secure system, a really performant system, or make your boss really angry because you took a bunch of risk without their permission. But I think the point is to keep it simple, to communicate something simple, and then ease people into the idea: now that you've gone through this first quadrant map to figure out where you are, let's double-click on that and drill down. Now you've got these other things at the next level of complexity, which is, I think, what makes this area...
D
First of all, the exciting thing to me is just how your brain works to go through this. If you've done it a couple thousand times in your career, you don't often stop to even examine your own brain. The most interesting things are the stuff that I apply apathy to: what are the things that I feel safe putting out of scope? Okay, I know I'm not going to touch X, because that has nothing to do with my next hypothesis for going down and fixing something.
D
So there's even a whole other article there; maybe I'll give a talk about this: "I don't care", apathy as my guiding light. I go into a situation where I'm nervous and I care about everything, and then, by process of elimination, here are all the things I don't care about, and then I get to the root of something. So maybe there's a whole other inverted thing: instead of "hey, we need to expand everything we're aware of", you also need to know how to safely ignore stuff.
B
Yeah, we just had a talk recording today with our friends from Litmus Chaos, and they also had this: the known knowns, the known unknowns, the unknown knowns, and the unknown unknowns, and why it is so important to also test your applications. Actually, it was the other way around: we have a lot of assumptions when we develop applications. We assume we always have network; we assume we always have power.
B
We assume we always have unlimited CPU. A developer would not assume his application would just be moved from one node to the other, and that during this process it will not be available. This is not what the developer thinks. The developer assumes that if I build my Docker container, whatever, and it runs, then it will always run, because there will always be a safe place for it. But that's not the case.
D
The epistemological model for chaos: I gave a talk once about that. It was about precognition in your experience of interacting with an application while you're doing exploratory testing. It was like: there's stuff I'm not even aware that I'm thinking as I do this, and how do I tap into that stuff?
D
Does that help us, Johannes, get to narrowing down what we would like to produce?
A
I think what we discussed right now is two aspects. First of all, we derive a model or framework from the mind map.
A
We just took a couple of notes on what the model should contain, and then, based on that, we do an example based on the JVM exhaustion scenario, to validate how the model fits and how it fulfills the requirements. I think this makes total sense, because it summarizes the work we did at the beginning and what we did in the last two meetings, so that we then have a written document, something that we can collaborate on and share with others.
A
Yeah, I have to leave the building in two minutes, but I really would like to define what we should work on in the next two weeks. I mean, I can, or I will.
D
I can definitely. I would like to do a few more of these, and then go back and find the stuff that's coming out of my brain and flesh it out: actually take what's on the mind map and work backwards from the example, which is, I think, the reason we wanted to do an example anyway.
D
So let's go through what we think that process is and see if we can find our way back home to what the block model is. And then let's throw out a kind of generic outline for what we think the white paper, or the report, would look like, and the different sections that we want to dig into. That will lay out, generically, an outline where we can start writing, then editing and tying it back together. And I think our objective, the way we look at the charter, is that that kind of deliverable fits with what we want to contribute to the industry: some pretty in-depth thinking on how you would model this out.
D
So I'm good with that, and for the next two weeks I'm pretty good. I don't have any... yeah, hopefully I don't have any escalations, but I don't have any key customers on my plate right now. Cool.
A
And
you
may
ask
you
to
help
out
a
little
bit
on
writing
the
the
framework.
The
model.
B
No worries. Cool, yeah. I also need to jump here, but I think it was a very, very open discussion today, a lot of new thoughts, and I'm really looking forward to bringing this all together.
D
And before I let you guys go: given that we only had a handful of us here today, and we started with more folks, maybe we find a way to give an update of what we're working on, once we have something that's a little more visual or written up, to bring it back out to the major working group or the advisory board and say: hey, just share an update of what we're doing.
B
Iter8: it's not a company, it's... I think it's also an open-source project.
C
They're kind of like StormForge, if you've seen those; it's a different performance optimization company.
C
StormForge, I just saw; we had a discussion internally, and they're pretty similar to Akamas. They try out different settings.
C
Yeah, basically it's AI/ML-based optimization, based on Prometheus.
D
Yeah, yeah, no, that's right. I remember I saw a white paper from them, very cool. It's the other one, Jürgen; I think they call it loop testing, where they're trying different stuff, like you say, for A/B and that kind of stuff. But Johannes has to drop; he's gonna get out of here.