From YouTube: Keptn Auto-remediation Working Group - April 7, 2021
Description
Meeting notes: https://docs.google.com/document/d/1_WlLP6oLcHe0yyC7kXH2hB3i9bOPvIArp83NohE78FU/edit#
Learn more: https://keptn.sh
Get started with tutorials: https://tutorials.keptn.sh
Join us in Slack: https://slack.keptn.sh
Star us on Github: https://github.com/keptn/keptn
Follow us on Twitter: https://twitter.com/keptnProject
A
All right, hi, and welcome everyone to the next meeting of this working group. Let me just share my screen first to go over the agenda items for today.
A
Actually, first we should take a look at what we defined as action items for today. It was about adding some thoughts and then restructuring or polishing the charter, so that we have one document in focus that we are working towards. That was the action item, and we should discuss this charter to come to a final conclusion. Then, from the last meeting, we still had an open action item, which was the success story that Mark and Jürgen were working on.
A
We also want to go over this story today, and after the story we think we should discuss here in the group what the next milestone is that we want to achieve. Jürgen and I talked a little bit about that today, and we think it would make sense to write a white paper. But that's just our opinion, and we should discuss this in the group to decide what we want to achieve next.
A
All right, then let's go directly to the first item on the agenda. It's the charter.
A
What I did is some rework. I did not change the wording, I just removed things. For example, in the first sentence there was Dynatrace in there, but we agreed on not focusing on a vendor, not on Dynatrace or any other monitoring solution. The outcome of this working group should not be vendor-specific at all, and the mission is that the Keptn ...
A
A
This is kind of the mission we have in focus: based on input from the working group, this team seeks to establish a living set of automated remediation requirements and deliver these requirements on a periodic basis to the community. What I also did is take the ideas from Mark. I really liked them, because last time he said that you, as a member of this working group, innovate best practices for auto-remediation.
A
B
To be honest, maybe it's just me, but I don't really understand the mission, or the first part in the vision: the remediation requirements.
B
C
B
I understand this language, but the other language is very hard for me to understand. I'm not sure what the remediation requirements are.
C
Yeah, no, I'll second that. I think, remember, in our last meeting our discussion was around something that was really specific to Dynatrace product development or product management. So maybe some of that, like specifying the requirements, was coming from that perspective. So on the word "requirements", I'm with you, Jürgen. It just doesn't sit quite right if we're expanding the idea: these are not really requirements of the community but proposed automated remediation practices. Maybe "requirements" we could ...
C
A
Let me remove that one. The goal is to establish an approach that defines the future of auto-remediation for both cloud-native environments and traditional implementations. This will help SREs and DevOps engineers to define auto-remediation processes, and allow testing these processes and instructions to validate their effectiveness.
A
True, I'm also, yeah.
D
A
And best practices, that's true. Then some sub-goals: identify suitable remediation actions, then prevent issues, and then also have this concept of closed-loop remediation, which means that we want to have an approach or a process that provides visibility and answers that are valuable for the remediation process itself.
D
Yeah, and it's just a personal preference, I think, but when I see "prevent issues", a lot of times I think that's like marketing material. You know, "Hey, let's catch issues before they happen." Well, it's not possible, right? In realistic terms we're trying to minimize impact or mitigate. Timely mitigation, more than actually preventing issues. I mean, I understand the concept, but when you read things like that, it's great for C-level executives.
C
I wonder if there isn't one more thing. When I think about the vision, as a member of the working group, one of those lines above is about contributing these best practices back to the industry, to the community. And maybe as a goal there is something aside from technically inventing auto-remediation actions, processes, and practices and applying those in an approach or a context for mitigating issues. I like the idea of working on closed loop.
C
I wonder if there isn't something also about, and I don't think adoption is the right word, the barriers to it. Let's say everyone was inventing their own ways of doing trace IDs, all the way back to the old Tealeaf systems back in the day, and now we have observability frameworks that get built in, we're tracing stuff through OpenTelemetry, and away we go.
C
There's something to the maturity of the industry being able to adopt or integrate these more comprehensive ideas, versus an off-the-shelf product that you plug in and that's just for you. So I'm wondering if there's something like the OpenTelemetry part, where there's an auto-remediation body, so that one of our goals could be to accelerate the adoption or the acceptance of both. You know, "Here are some ways of doing standard practices, or good practices, for auto-remediation." But there are also people averse to the AI and machine learning stuff around this.
C
You know, "Hey, can I just let a few of these things go, so the system knows how to keep itself running in certain cases?" So there might be some pushback or some resistance to that. I wonder what you guys think about one of the goals being to take everything that we're working on and also ease or accelerate its adoption into the industry.
D
Yeah, I like that concept, because I think everybody's on board. Everybody likes all the buzzwords of AI. In my own organization we're very anxious, and we're actually doing a demo of an auto-remediation that we put together for our CTO this afternoon. Everybody loves to see it, and then the first thing that pops up is, "Well, that would be risky if we did it in production, right? Let's put the brakes on."
D
So to bridge that cultural adaptation, to get people past that risk acceptance, I think requires some pushing, some good community acceptance, and documentation of the guardrails around this type of thing that says, "Okay, this is truly safe, and here's why." So I think we need to publish some of those guardrails too. I don't know where that would fit, but ...
C
I like how you said that. You can build some amazing technical things, and then you're like, "Well, how does this fit? How do I encourage people to start using it, giving us feedback, engaging?" Okay, there is risk, but we also thought about the risks in the working group. We thought about those things, and here's some way of saying how we can ease or improve the adoption of these practices. Yeah, cool.
A
Totally valid point: we should not forget about the acceptance and the adoption at the end. We can come up with a cool solution, but in the end it must also be usable, and someone has to trust it as well.
D
Yes, I think keeping that goal in mind obviously drives some of your architectural design behind all this, right? Keep it in mind. You know, like OpenTelemetry: if they made it too difficult to implement, well, nobody's going to adopt it. Same with this. We have to architect something that can easily be adopted. So I think that's a key goal to keep in mind when we come up with our solution.
A
A
Okay, any additional thoughts on the goals for now? I mean, we can always come back to rethink these goals and maybe refine them as we make our way through the topics that we want to tackle.
C
Johannes, did we just want to replace the word "prevent" with "mitigate"? You know what I mean? It's not a separate goal, but it'll make us think about it differently.
A
All right, cool. Then this is what I wrote down as I started working on this charter: a non-goal should be that we define a workflow engine or incident management. I think there are great solutions out there that do this job better than we can think of, so I just defined it as a non-goal. If you think there are other non-goals, please feel free to add them.
A
In scope, then: maybe we can define some standards when it comes to how I can define or declare my remediation action, then define best practices to validate these actions. The last in-scope item would be to identify a real-world use case, or the story that Mark and Jürgen were working on, so that we have something more tangible. We can then discuss certain use cases around this story, jointly work on it, and drive it forward.
B
I think, actually, the real-world environment here is maybe a Keptn user or maybe a Dynatrace customer. What we've worked on was more a fictional scenario of how it could help a user, but it was not a really reasonable scenario yet. It was more like just defining something that could happen.
B
This bullet point is more about really having some Keptn user, or maybe a Dynatrace customer, or someone who really has the pain, which we can solve with auto-remediation, with the processes and concepts that we are developing here.
A
B
A
A
All right. I think, as already said, we can always come back to this document and reconsider it when required. But from my point of view, I think we are good enough to go on.
A
True. As they are very important stakeholders when it comes to adoption and acceptance, we should not forget about these as well.
A
D
C
B
Johannes, sorry, but can we go back for just one more second? Since we have now discussed this, I thought maybe we can remove "draft" here and just put a date on this charter, let's say "as of April 7th", so we know that the working group agreed on this today.
B
B
It was a draft previously, and the comment that Mark had was, "Is it ready? Can we just merge it?" Then we also have one version that we agreed on for today, and I think it's very good.
A
C
From my exception handling, that's one idea around a developer view. An SRE or a DevOps engineer might say, "All right, these are things that I need to deliver with this version of the code. These are the appropriate auto-remediations." Plus: which ones are validated, which ones are approved, which ones can I do fully automated, which ones need oversight?
C
How do I, as a leader or a manager, start talking about that across the organization? You might get into ROI statements and things like that. So "stakeholders", maybe we drop "stakeholders"; I'm just thinking of that persona of management and leadership at a company being able to say, "Okay, I see what you guys are doing. This will be great. It will allow us to take care of all of the things that are fully approved to be auto-remediated."
D
So maybe it's because I work for a bank, and my last job was at a bank as well, so I've been in the financial industry for a while, but auditing is obviously huge for us. That's one of the first questions that comes up. If you try to automate something, you have to prove that it's being done, who has the access to do it, and all this.
D
I don't know if every company is that way, but with anything that we develop, and we have this in our mind map because that came up, we have to be able to show that this stuff is happening behind the scenes somewhere, so that people can audit it for compliance. I don't know if that's worthy enough to mention specifically here as a view, but maybe it is.
D
Because if it's not a view tailored to them, they're going to go to the developer or to the SRE and say, "Gather this information for me," and then they're attempting to do it from their view, which may not be an operational view. So a reporting view versus an operational view, I think, are two different but necessary things.
A
A
Then I think now I'm good to move on to the success story. Jürgen or Mark, does one of you want to share?
A
B
B
If I find it, I can also share, but you already shared it. So the main idea: maybe I can start, and then I will hand over to Mark, because I know he for sure has the nice words for everything that we have put together in here and can explain it in better words than I could. But I will just set the stage for what the idea was here.
B
We already had very great discussions in the last working group meetings, and we had our mind map, and the mind map was sometimes already very technical.
B
It covered how we want to do this and which parts are actually involved in remediation scenarios, like business remediation and also technical remediation such as infrastructure or application remediation. Then we have a couple of other parts like testing the remediation, the process, and a lot of different aspects. What we tried to do here, and Mark, feel free to correct me if I'm wrong, is mainly to use these aspects that we already identified and add some value to them, so that once it's finished, and polished with some marketing wording, it can be a very nice success story.
B
The story is how Keptn auto-remediation, all the concepts, everything that we have developed, is actually helping a fictional customer save a lot of money, since they do not experience any downtime anymore, or drastically reduced downtime, let's say. We tried to attach some value to all the aspects. They are not yet prioritized in any way.
B
It's just a couple of phrases and ideas that could be added to this kind of press release or success story. But Mark, I will let you do the phrasing and the presentation of this. I can also go ahead, but I think you can do it in a better way than I could.
C
No, no, you do a fine job, but I'm happy to. I kind of like this exercise because, at least personality-wise, I can relate to an interview. I do all the podcasting and all sorts of stuff, and I'm always interviewing people.
C
So I was imagining a fictional interview that you'd turn into a column story or a press release with future Keptn customer X from Acme Corporation. I also like the idea just as a mental exercise, because then you work backwards into what we're going to spend our time doing over the next number of months.
C
If we really imagine, just in our wildest dreams, that somebody would say these kinds of things, these kinds of statements about our work, then, okay, what are we really going for? Are we doing the things that are going to keep us heading towards this? That's just one of the ideas.
C
So I kind of changed it. Jürgen had some great ideas out there, and I just reworded them as if it were a narrative interview of a customer giving you quotes for a press release. The one I like is, "After the validation, the days of guesswork and crossing fingers are over for us." It doesn't mention anything about a product; it's just about pure benefit. Like, "Oh, I used to cross my fingers." I'm emotionally nervous. To Kevin's point about risk, it's like, "Wow, I've got some risk."
C
Can I hot swap this memory? Can I hot swap some CPUs? Back in the old physical world we used to do that, and it was nerve-racking. And now I'm going to let an AI bot go ahead and do this for me? But there's also an emotional quality to having somebody say, "Hey, we used to have guesswork and crossing our fingers. We don't have that anymore, because the remediations have been validated. We actually tested them."
C
We actually know that they work with this particular code base, which is cool. So that was one I just really liked; it really put my head in there. I don't know, Jürgen, if you had another favorite one in there.
B
Let me just go through this. Actually, the one right on top of this one: "We already know it will work in production because we tested it." It relates to the one that you just mentioned. We already tested our remediation instructions. We finally had a way to do this, because we are simulating outages.
B
We were already able to plug in our remediation instructions, and one big part is that we were confident that our monitoring or observability tool would actually alert us if something goes wrong. I was just talking to Johannes earlier today about a conversation I had with Julius Volz. He's one of the co-founders of Prometheus, and he has now founded his own company; it's called PromLens, and he does a lot of consulting.
B
With PromLens, his software can analyze all the PromQL queries or statements that you write, and you can analyze them on a tree basis. What he often finds is that a lot of the alerting rules are actually broken. They will never give you a result. The result set is always empty, because there is some kind of issue in the query itself.
B
It's not dividing by zero, but there is some statement that will always break the whole expression, and it will never alert. With Dynatrace, for example, that's not the case, because you would not override the AI to completely break it. But as an outcome of our solution, what we are proposing here is that you can validate your alerting, you can validate your auto-remediation instructions, and you will validate this not in production. That is basically those three lines.
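The point about alerting rules that can never fire can be illustrated with a small sketch. It is only an illustration under stated assumptions: the rules are modeled as plain Python predicates rather than PromQL, and the metric name `heap_used_ratio`, the thresholds, and the simulated samples are all hypothetical. The idea is to replay metrics recorded during a simulated outage, outside production, and check that each rule fires at least once.

```python
from typing import Callable, Iterable, Mapping

# A sample is one set of metric readings; a rule decides whether to alert on it.
Sample = Mapping[str, float]
AlertRule = Callable[[Sample], bool]


def fires_on(rule: AlertRule, samples: Iterable[Sample]) -> bool:
    """Return True if the rule fires for at least one of the samples."""
    return any(rule(sample) for sample in samples)


# Hypothetical metrics captured while an outage was simulated in a test environment.
simulated_outage = [
    {"heap_used_ratio": 0.62, "error_rate": 0.01},
    {"heap_used_ratio": 0.97, "error_rate": 0.35},
]


def working_rule(sample: Sample) -> bool:
    # Fires when more than 90% of the heap is in use.
    return sample["heap_used_ratio"] > 0.9


def broken_rule(sample: Sample) -> bool:
    # Broken on purpose: the metric is a ratio between 0 and 1,
    # but the threshold was written as a percentage, so this never fires.
    return sample["heap_used_ratio"] > 90


for name, rule in [("working_rule", working_rule), ("broken_rule", broken_rule)]:
    status = "fires" if fires_on(rule, simulated_outage) else "NEVER fires"
    print(f"{name}: {status} during the simulated outage")
```

A rule that stays silent during a simulated outage is a candidate for the "broken alerting rule" case described above and can be fixed before it matters in production.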
C
Yeah, and then further down, Jürgen had written something really describing the automated process that happens, and one of the auto-remediation actions, about, you know, "restored our initial landing page and kept the Instagram income stream high." And then I threw in a fake quote: "I like to go fishing on the weekend, so that's pretty awesome." I thought that was fun. That one's my favorite; you make it a little relatable, right?
C
I mean, what are we really putting in all this whiz-bang fantastic technology for? If it's keeping the income up, that's great, because I'm going fishing on the weekend. The last one goes to this: there are a lot of things in automated testing, automated everything, where we say we're going to try to automate the most complex parts of our jobs so that we can dig into more complex stuff, and it's kind of the opposite approach.
C
Really, as humans, we would probably benefit more if we spent our minds the other way around, right? We should be digging into the more complex stuff, and validated auto-remediation would be the same. So the last sentence there, the paragraph, gets to this same old adage.
C
If you get automated testers, that means I can get rid of a bunch of the manual testers; and if you buy something that is like a framework for development, then I don't need as many developers, because they don't have to write all the code from scratch. There's something in the balance of how companies run that says, "If I can automate this, I don't need people." Truth be told, if you automate this, you still need some of those people, and they're going to do slightly deeper digging.
C
So there's some history, at least for me in the automated testing world, where this is a good thing to state. When anyone says "auto" anything, "I hit the autopilot," well, then we've got the Boeing 737s, unfortunately. Hitting the automate button doesn't mean you don't have a pilot, and even if you do have a pilot, you could still have things whacked out. So there are some interesting things to think about in that last paragraph too.
C
Otherwise, I loved spending some time writing on this. I don't know, it's cool. I don't know what you guys think.
D
Yeah, I've got one that I'd like to add to this that could add value, I think. When we talk about SLOs, we're talking about error budgets, right? And if you hit your error budget, you have to stop, if you follow the true mantra of it. So by instituting auto-remediation, you get to increase your features, which brings value to your company. You are automatically increasing the amount of work you can spend improving your product.
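The error-budget argument above can be made concrete with a little arithmetic. This is a minimal sketch with assumed numbers: a hypothetical 99.9% availability SLO over a 30-day window, and three incidents that take 30 minutes each to resolve manually versus 2 minutes each with a validated automated remediation. None of these figures come from the meeting.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget


# Hypothetical scenario: the same three incidents, handled two different ways.
manual = budget_remaining(0.999, downtime_minutes=3 * 30)
automated = budget_remaining(0.999, downtime_minutes=3 * 2)

print(f"budget: {error_budget_minutes(0.999):.1f} minutes per 30 days")
print(f"remaining after manual fixes:    {manual:+.0%}")
print(f"remaining after automated fixes: {automated:+.0%}")
```

Under these assumptions the manual path overspends the budget, which by the strict rule would stop feature work, while the automated path leaves most of the budget available for shipping changes.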
C
A
B
B
Cool, yeah, very good idea. Anyone else with some input that we should add here? I really like Mark's idea of having these kinds of answers to the questions, so we can just throw in some answers or quotes that we want to collect here. It can even be something like "I like to go fishing on the weekend."
A
I mean, we're at the top two. We have a couple of items left, but we don't have to go through all of them. I think we picked the highlights, and it's really cool what you came up with. It would be awesome if we can achieve that: if at the end we show our outcome to someone who then implements it in their company, and finally we get quotes like that. That would be really awesome.
C
Johannes, just one more thing: if you scroll down in that, we put some other ideas for what these things could be. Other stories wouldn't necessarily have to be press releases; it's more like ideas for other types of this fictional writing that might be helpful. One of them was "The guns, they've stopped," which is a line from Star Wars. It's actually Episode IV, I think, right?
C
Sorry, that would be A New Hope, right? It's when they're in ... I've got to check that, but that's Episode IV. It's the classic infrastructure person who's constantly getting crazy alerts, and suddenly I'm just getting information that things are being auto-remediated. It's not really an alert, but it's like, "The alerts, they've stopped." I thought that could be the one. And Jürgen, remember, we talked a little bit about this.
C
We could do a film or a short movie or something, which could be fun: visiting somebody who hasn't turned on the button for auto-remediation, and so everything is just chaos. That could be fictionally really funny. Then suddenly you hit the button and everything goes quiet, and all of a sudden all the lights start going green, green, green, green.
D
It's funny: on that topic, at my previous employer we would often have people contact us and say, "Is monitoring down? Because the alerts have stopped."
D
Yeah, it's the FOMO, the fear of missing out. So many people feel like that. They need that information. It's a hard cultural shift to tell people, "You need to change, and you don't really need that information."
D
C
I'd say the other thing that we've seen recently, with all of the remote working from home through Slack or Teams or whatever collaboration chat, is that all of the alerting started out being on a single channel, and then you quickly blow out the capacity for a human to keep track of the barrage of an automated channel. So there's the escalation and communication of what's happening with just alerts, and then you've got PagerDuty flooding something, and you've got other tools flooding.
C
It became this: different sections of the architecture have different channels dedicated to them. So it's almost like when you see the unread messages show up in the Slack channels. You'll see, let's say, front-end server, services layer, database layer. If all of a sudden the unread messages go up, you're like, okay, that's just throwing alerts into Slack, but I'm seeing it as a flow of where things blew up. That might be something that we consider in the other ...
C
A
All right. I think this was now a good handover to the last bullet point on today's agenda, which is the topic: what is the next milestone that we want to achieve? I think we should discuss this here in an open discussion; it should not be predefined by someone. We should think about what we want to achieve next. Let me just give you a quick recap.
A
In the first meeting we started with the mind map, and we put quite a lot of ideas into it to structure our thoughts and to start discussing the topic. We continued on that in the second meeting. In the last meeting I showed you how Keptn is doing remediation, and we also started talking about the charter, which we continued today. And we also have the success story now.
A
We took a look at it in order to have this company in mind that we want to help and want to make successful, which is also helpful for working backwards. But now comes the question: how should we go on? How should we proceed here in this group? Jürgen and I talked a little bit about this topic today, and we think it would make sense to start thinking about writing a white paper.
A
We have to define the outline first. Then, when we have the outline and structure of what this paper could look like, we can distribute the chapters so that people can do some research on the topic they are responsible for and then contribute the knowledge they have collected back into the paper.
C
C
Let's say: how would you auto-remediate by workload? Standard web services, different kinds of servers or workloads, data workloads, data warehouse workloads being different from online ones. So maybe by focus area. It could be chairs; it doesn't have to be that formal, but it's starting to get our own mind maps one level deeper.
C
My thought would be that we do some work there to get those things brewing, and then think about a white paper based on those ideas and some of those remediation actions, and what's possible. With certain technologies it's easy to do some kind of automated change to remediate; with other technologies it's not. Say I need to partition a database table in a NoSQL world.
C
That might be very different from doing that with a big old Oracle database or something. So I'd rather take some time to dig into those focus areas and start getting our hands a little dirty: what are all the different things we would potentially automate for remediation in each of those different areas? Then bringing that brainstorming back into some kind of paper would be cool. That's my thought.
D
I would personally like to maybe walk through a specific focus area. I think as you branch out into your different focus areas, you're going to have many things in common. So maybe we could start with something more like a PoC. Say, the auto-remediation we're showing today is for JVM exhausted memory, and we're just going to detect it and restart it automatically through some Ansible.
D
That would kind of be a base template that we would work from, and then you would hit your focus areas: "Okay, this one's a little different, and this one's a little different." Even with the JVM restart, we started talking about how WebSphere is going to be different from Tomcat, which is going to be different from just a standalone JVM.
D
So as you start digging through the weeds, those differences start to creep out, but there are a lot of base similarities: how did you even know this was a problem? Where did you get that data from?
D
C
B
C
A
Also a very valid point, and a very good idea to have a PoC kind of setup where we as a team define what it should look like for this particular use case, for this particular scenario, like a JVM restart. Then, when we are confident and think that this is the way it should be, we can start to add more content and more topics on top of that. I like that one as well.
C
This is what we think a template for an auto-remediation looks like at multiple levels or multiple tiers: level one, level two, level three, what's validated, what's not validated. Am I approved to do a restart? Yes, but you're not approved to change the memory configuration. So we're really walking through what that model looks like. Just pick one as a team, as a group, and go through it, like a JVM memory issue.
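The template idea above (levels, validated or not, approved or not) could be modeled roughly as follows. This is a hypothetical sketch, not a Keptn data model: the class names, the three example actions, and the decision rule are assumptions made only to illustrate the discussion.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    RUN_AUTOMATICALLY = "run automatically"
    NEEDS_OVERSIGHT = "needs human oversight"
    BLOCKED = "blocked until validated"


@dataclass(frozen=True)
class RemediationAction:
    name: str
    level: int       # escalation order: level 1 is tried first
    validated: bool  # has the action been tested outside production?
    approved: bool   # is unattended execution approved?


def decide(action: RemediationAction) -> Decision:
    """Assumed rule: unvalidated actions never run; unapproved ones need a human."""
    if not action.validated:
        return Decision.BLOCKED
    if action.approved:
        return Decision.RUN_AUTOMATICALLY
    return Decision.NEEDS_OVERSIGHT


# Hypothetical template for the JVM memory exhaustion use case.
JVM_MEMORY_EXHAUSTION = [
    RemediationAction("restart-jvm", level=1, validated=True, approved=True),
    RemediationAction("change-memory-configuration", level=2,
                      validated=True, approved=False),
    RemediationAction("experimental-heap-tuning", level=3,
                      validated=False, approved=False),
]

for action in sorted(JVM_MEMORY_EXHAUSTION, key=lambda a: a.level):
    print(f"level {action.level}: {action.name} -> {decide(action).value}")
```

Keeping validation and approval as separate flags mirrors the distinction drawn in the discussion: a restart can be both tested and approved, while a memory configuration change may be tested yet still require a person to sign off.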
B
A
Okay, then. We have just a couple of minutes left, but how can we prepare the next meeting for getting started on this PoC?
C
Well, I think we could create a doc on JVM memory exhaustion and then just start dumping ideas. What are the different ways we detect it? What are the different ways we would validate it? What should we validate for the process of conducting the remediation?
C
What are the steps that you would look for in a healthy JVM restart? Start laying that out as a process. It could be a process flow, but also just different ideas. We could relate it back to an SLO to validate whether the remediation succeeded or not, the gotchas and stuff. But I think we just need to start collaborating on a doc and sharing some ideas.
C
And also the indicators, right? When can I see virtual memory exceeding physical memory, allocation above and beyond the current level? There are things you can see pre-issue to forecast something. There are all sorts of interesting things that we as professionals know how to look for, and we're going to try to brain dump them into this proof of concept. For sure, I like that. Thank you, Kevin. That's good, that's cool.
A
All right, there we have it: let's go with this use case.
B
C
Just as a side note: obviously we have different performance engineers, different engineers that we all know, who have specialties. I know who to go to if I've got Oracle performance issues. I can go to the academic book guys, I can go to consultants. There are people that we may want to pull in as we develop these things and make them sort of anointed reviewers. When you write a book, you have reviewers, so these would be sort of "reviewed by".
C
Maybe some industry names that people might recognize, depending on the technology that we're working with. It could be entire companies, as we build a framework that they can adopt and say, "We do auto-remediation, but now we can talk to Keptn and other tools for open auto-remediation."
C
As we get into this template and thinking about it, it's like, okay, now I'm at a point where it's really super detailed memory configuration within the JVM; maybe we can get a second opinion from an external source. Say, the person that invented the G1 garbage collector: "Hey, would you like to review this idea?" That's just a tangential idea as we move forward, in terms of being focused on the industry and adoption.
A
A
Then let's define this as an action item: we add our thoughts and ideas for this use case into this document. It's kind of a brain dump again, but it should still be focused on the problem, which is JVM memory exhaustion, and we want to get that one remediated. Maybe we can all also think about how this could fit into our success story with the Acme Corporation.
D
A
We can add an additional quote here that says this company can save X millions of dollars because we can fix their JVM problem. Think about that too. But let's continue working on the use case.
A
Down here, in the notes from the second meeting, I have a link to the mind map, and there you find all the initial thoughts that we have.
A
C
Yeah, just one of the things I think we did brainstorm for our scenario, our first template or walkthrough of the proof of concept: one of the things was being able to publish auto-remediation rules, actions, etc., like a plug-in. So if I'm vendor X with some cool new thing, I could say, "Hey, we can talk OpenTelemetry. If you install our stuff, we'll cooperate with trace IDs, just like anything else in your entire ecosystem."
C
Same thing for auto-remediation. If I get a new vendor X component and I put it in, it comes with, "Here are the auto-remediation things that can talk to whatever auto-remediation framework you're using." For us, it would be Keptn. The idea is that a plug-in ecosystem is something we might also talk about as we work through the proof of concept, get out the other end, and say, "All right, what does the process really look like?"
C
What if, let's say, Tomcat got on board and said, "Hey, we're going to build auto-remediation"? How would you publish this as a plug-in? Of course, then you're building a marketplace, and there's a whole other thing: do I charge an extra ten dollars so you can buy the plug-ins for auto-remediation, or something? But that's a whole other ball game.
A
All right, cool. Okay, great, thanks. Like always, a cool meeting, great talking to you. Let's jointly work on this use case, on this PoC, and next time we'll see what we came up with.