From YouTube: Learning from Incidents John Allspaw (Adaptive Capacity Labs) 2020 07 10 OpenShift Commons Briefing
Description
Learning from Incidents
John Allspaw (Adaptive Capacity Labs)
Andrew Clay Shafer and Diane Mueller (Red Hat)
OpenShift Commons Briefing
July 10 2020
A
All right, everybody, welcome back to another OpenShift Commons Briefing with the good folks from the GTO office. Andrew Clay Shafer is here with us, and John Allspaw from Adaptive Capacity Labs and other incarnations of himself, and today we're going to talk about learning from incidents. Okay.
A
B
So I've talked to you before, and I'm not sure I want to talk too much about myself, but I will talk about myself a little bit to introduce John. The thoughts that I have around some of the things regarding DevOps and operations were definitely influenced by this man, John Allspaw, and the way that you...
C
B
Got to be part of some very, we'll call them, generative projects, and gave a talk that I would say essentially gave DevOps, the movement, its name. There's this famous talk from Velocity conference where John Allspaw and Paul Hammond talk about dev and ops cooperation at Flickr, and that chained into a bunch of other things and led to a bunch of conversation about DevOps. He is a big part of Velocity conference.
B
C
Yeah, that's a great intro. I would just say that I've learned as much from you, Andrew, as you might have learned from me. Yeah, that's about right. And really, at the highest level, what's on my mind and my colleagues' minds is introducing new ways of looking at how work gets done, and one of the most effective ways of looking at how work gets done is by looking closely at incidents.
B
C
C
C
C
A
Weeks ago, I was talking to Cat Swetel on another one of these sessions, and she brought up a slide that had a picture of, like, a factory floor sign: zero incidents in the past 365 days, or any event. Basically, the anecdote she was telling was that whenever she saw something like that, it panicked her a bit, because that meant they weren't watching for something, or they were missing...
A
B
Indeed. I don't know where you want to go with this, John, but keying off that, this notion of what's considered an incident is also, in some cases, a question of blame, right? Or attribution, causation. I know you have lots of thoughts on this, so maybe you could give us a little monologue about some of these things. Yeah.
C
Yeah, yeah. Well, first, I actually like what Diane had brought up, and so I'll riff from that vantage point. A lot of what we do is bringing new perspectives to understanding what makes work hard, what makes people good at it, and what could potentially either support or hinder their ability to do work.
C
The majority of the techniques and perspectives come from what are sometimes called safety-critical domains, like power plants and medicine and military and transportation, all that sort. So we have to remember that categorizing, even declaring a thing that happened, an event, as an incident, labeling it as an incident, is itself a categorization.
C
The notion that there's really only two sounds cartoonish, or at least I hope it sounds cartoonish to a lot of the viewers here. But quite often, in the wake of an incident, you'll hear: okay, well, is this a result of human error or technical failure? For whatever reason, the journalist wants just one of those two categories. That frame, that what makes an incident...
C
What stems from this is exactly what you just mentioned, Andrew, which is blame. Blame certainly gets a lot of attention because it's sort of palpable. It's telling the story of human error, or making it about the individual attributions of a particular person. You know, "the root cause was Steven," or something along those lines, is really just a special version...
C
Otherwise it wouldn't be a surprise; in some way it came out of nowhere, right? But to admit that those things are possible in the future, and the sort of ever-present dread that they can't all be anticipated, means that we have to put this sort of fear somewhere, this general "oh my god, how do I feel good about the future?" Even if it's a lie, I'd rather feel good, right? And so, where can I place this?
C
C
It is the way. So we need to put it in a box. Notice I didn't say container. Put it in a box, and sometimes that box is embodied in a person. Sometimes it's in a really big, vague "it's in the system, man," or sometimes it's in our cloud vendor. As long as there's a place to put the uncertainty. The underpinning of this is developing an understanding of the incident, so you...
A
B
So, where you stop looking. I want to make one quick comment that I think might help the listeners, which is that John and I have spent hours and hours talking about some of these things over the last 10 years, and I think it's in our best interest to articulate that these quote-unquote systems are neither human nor technical. It's socio-technical, both of those things together. And then I'd also add, and I think this is relevant to the OpenShift community...
C
B
C
C
One of the things that I have come to understand in a really deep way is something that's quite unintuitive, which is that success, you know, understanding how people are just plain doing their work, can also be a significant source of innovation. Right? You can think of many products in the world, very successful businesses, that turned what was otherwise a workaround in a previous product into a significant and really groundbreaking service. I think of CDNs.
C
It's a great example of that. But the difficulty, and this is the difficulty with the field of resilience engineering, is that I can't just say, "All right, everybody, at the end of the day, let's get together and talk about all of the ways that the site could have gone down but didn't," right? Because there's not enough time; we'd be there until the next day. And this reflects the same thing in safety, right? The denominator in Cat's slide...
C
It's a great example in the world of safety, where you see those signs. Those signs you're talking about are in a number of places. Notice the denominator's missing, for one. It takes for granted that all incidents are the same. It also doesn't count how many incidents were prevented.
C
It only shows the ones that were there. And Erik Hollnagel, Hollnagel being sort of a pioneer of resilience engineering, has said that when you start measuring things by what is not there, you run into some difficulties. You can certainly prevent a lot of scores on goal, but if you're not scoring, then it might not end as well as you think. But again...
C
Okay, yeah. Improvised, I would say that resilience engineering is currently the study of it, that is to say, adaptive capacity: investments in adaptive capacity playing out in real-world situations, standing on grounded and concrete empirical evidence. Resilience engineering, the engineering of resilience, stands on understanding what resilience looks like to begin with.
C
Absolutely right, and that is the thing that was fascinating. When I first became interested in this, did my master's degree, and started and continued reading, when I contacted the heavies in that field, you know, it was Richard Cook and Dave Woods and Sidney Dekker and Steven Shorrock and others. So...
B
C
Sure, sure. So I worked at a photo sharing website called Flickr. As you mentioned, we were acquired by Yahoo, but for the most part we were sort of our own standalone entity, and we grew in ridiculous ways, I mean in cartoonishly, like, stratosphere ways. We went from being, like, the 25th most trafficked property at Yahoo to the fifth most trafficked, behind, like, the front page and Mail, right? And in, like, 18 months, the complexity of the back end of the website...
C
All of the things that made things work kind of exploded. I had a team of six infrastructure engineers, and at some point we had some big outages, some pretty significant outages, but I couldn't get over the fact that on paper we should have had way more, and I couldn't understand what that's about. And actually, having been responding to some of the incidents, after you work out the incident, and all incidents can be really harrowing...
C
Okay, what's going on here? Either I'm incredibly good at hiring, and being able to do this work is sort of innate, you're born with it or something, and I just happened to strike gold with the people on my team; or they were certainly pretty good and I'm an amazing manager, and that's why. Both of those are completely unbelievable.
C
Certainly the latter one would have been existentially difficult to accept, because I had no idea what I did to make that so. So I started to dig into what underpins people's ability to solve a problem, and not just solve a problem, but solve a problem under time pressure, where any of the actions you're taking could very well make things worse, right? And represent, in some...
C
In definitely cautionary tales, an existential business situation. And that's what led me to human factors. What I understood about human factors is this: most of us understand ergonomics. Ergonomics is quite often seen to be a sort of specialized subset of the field, or, if you're in Britain, you'd say that was the field and human factors is a subset. But the fact of the matter is, where technology, work, and people happen is this field. What I realized was...
C
Something happened in about the 70s, and a part of human factors, traditional human factors, started to undergo a sort of, again, existential "wait a minute, maybe we actually don't understand this stuff." I think Three Mile Island was the point where the whole planet that was doing human factors work was like, "Holy crap. No, actually, you can't design an operations room without taking into account the cognitive work, not just plain old...
C
"Can you see the dials?" and all that sort of stuff. And cognitive systems engineering sort of was born. I wouldn't call it a splinter, but certainly it's a field in and of itself. Don Norman, Dave Woods: these are folks that almost entirely came from research in nuclear power plants, but then went on. And to this day, resilience engineering as a field is pretty broad. There are sociologists.
C
There are operations researchers, there are decision scientists, there's lots of people. A core part at this juncture is cognitive systems engineering. It's not all of what represents resilience engineering, but certainly a core part of it, much like statistics is a bit of a part of computer science or mathematics. So these things sort of interrelate. That's a little bit of my background of how I got there, and I'm still learning. So that's the gist. What I...
C
The final thing I'll say is that it is much more rewarding, and the thing that I am excited about is that, much like continuous delivery, continuous deployment, and all of the things that we associate with that, the things that enable it, the rationale for even thinking about it, in a sense there was nothing special about that 2008-2009 timeframe. All of those ingredients had been set up.
C
It's like one of those things where you look at it and go, "Oh yeah, it seems so obvious in hindsight," and it was pretty straightforward: small and frequent changes, for this reasoning, and you need these to do this sort of work. But it is a perspective shift. My guess is that both of you were there to sort of see this perspective shift, light bulbs go on.
A
So when you talk about resilience engineering and cognitive systems engineering, can we talk a little bit about how you applied that, maybe not at Yahoo but afterwards, and tease that out a little bit? Because the thing that actually sprung into my mind was how we tried almost to automate that in software with things like chaos engineering, Chaos Monkey, and things like that, which doesn't take into consideration the human factor at all. Well, it tries to simulate it, but there's no human...
A
C
C
C
Well, as I said, the first real sort of application was in my master's thesis, which was to understand what rules of thumb or heuristics engineers use when trying to understand, respond to, and resolve outages, especially when signals, as we know, can be disparate, sometimes contradicting, sometimes not making much sense, when faced with an entire sea, or almost infinite number, of places to look. You have to start looking somewhere. What leads you to look in some places rather than other places? And so this is a...
C
C
Of things that make up what's known as cognitive task analysis. Cognitive task analysis is more or less the formalized method, with the related cognitive work analysis, CTA and CWA, and all of the tips and tricks that go into that are the application of cognitive systems engineering. You can think of those as the tools to understand how people understand, and how people wrestle, both cooperatively in teams and also individually, with problems that they're facing, problems that they're anticipating, and what those problems, in anticipation or in responding, mean to them.
C
B
I'd add there's a tendency in all of these practices, especially when you're kind of outside of the core conversations, to focus on the tools, because you see them as a concrete representation of what's happening. But in my mental model, and in the conversations I've had with some of the people you just mentioned, I feel like the core chaos engineering community and the stuff we're talking about with cognitive engineering and resilience engineering are essentially inseparable in my head. Yeah.
C
Absolutely. And what's exciting about chaos engineering is that a lot of the proponents, even the earliest proponents of chaos engineering, are seeing this connection, and they're seeing this connection in ways that are, for me, really satisfying. They're making new connections between resilience engineering and chaos engineering that I wouldn't have even seen, and so that's really satisfying. Super happy about that.
B
Someone just dropped something in the chat that reminded me of some of the stuff I've seen you talk about before that might be fun to articulate here, which is this notion of the lines of our models, and how the process of incidents and analyzing them helps us build clearer models. Yeah, yeah, yeah.
C
Yeah, this notion of this line of representation. It's a bit of a mind-blower, right? This is entirely from the work done in the SNAFUcatchers consortium and is described there in a lot more detail; I'm not going to be as eloquent here. But the STELLA report describes this sort of frame, and the frame goes like this. We have all of the stuff, the technical: we've got the databases, we've got the thing that we build, here's the thing that generates...
C
That's that stuff. We manipulate that stuff, we do things with that stuff, via a representation of that stuff. It's not with the stuff itself, right? When you go to make a schema change, you don't go to the data center and do a thing physically to the database, right? And so everything we know about that world is via these representations. They're not the things; they're representations of those things, right? A distributed tracing app is a representation. To the extent that it's useful, it is a representation. It's not the thing.
C
B
C
C
It's not the stuff below the line. There's no intelligence that goes down there other than what has come from us, and it's not the below-the-line stuff that's doing it, right? It's our ability to make sense of what's happened in the past, what's happening right now, what makes that matter, and what might matter in the future that's important to pay attention to. And so that's the notion of above the line, below the line. So...
A
That, and to go back to something very early in this conversation, you know, the blame game: I come from a perspective of open source community development and trying to, you know, shed sunlight. And so when there is an incident, one team has their mental model of how things are working. One of the things that I try really hard to do, and it is very hard to get people to do, is to share their model.
A
It's almost a cultural shift, because often it's inside; it's something that went wrong with a product or a service or something like that. Flickr went down, or, you know, somebody went down, and they're very reluctant to have an open dialogue with the user community about what went wrong, because then maybe they'll shift to another service provider or something like that. So I'm just wondering, maybe from both of your perspectives, how you help...
A
B
B
C
C
Well, I mean, if they believed that they would get something from it, they would do it. And this is internal and, just like Andrew said, external. There are some peculiarities about write-ups about incidents to the public, but remember, the purpose of those, the audience for those, is very different than an internal one, right? And it's a mistake to see the two as being similar. It's different. The point that you've brought up, which is reluctance...
C
There's a reason why people are reluctant, right? If they think that they can get something, if they think there's something positive and they feel supported in giving a story, then great. If there's something that is potentially threatening for them or others, then they won't, right? And a somewhat, potentially, nitpicky point on mental models is that I can't ask you for your mental model.
C
C
You have to build a constellation of data that supports this mental model calibration and recalibration, and that's about a mixture of records of what people do, what people say, and what people do and say about what they did and said, including others. This is called process tracing, and it's the way that you can make inferences about cognitive processes. Sorry to get really nerdy there for a second, but this is what makes doing this work...
C
Difficult. People won't share a thing that they think everybody knows, or that they aren't even aware of themselves. A famous researcher in the late 60s said it best about tacit knowledge: we can know more than we can tell. And a significant part of studying cognitive work is exploring tacit knowledge, and there are some ways that you simply cannot do it, and you have to learn and practice how to do that.
C
Otherwise, the results aren't valid, and there's only one thing worse than a really poorly captured incident write-up, and that is an incident write-up that everyone, despite its contents, finds to be non-credible because the authors and the methods by which it was formed are seen to have an agenda. Effective incident analysis requires the analyst to be a non-stakeholder. Full stop, period. There is no other alternative.
C
B
C
C
The reason why a lot of this comes out of research in the military, DoD- and DOE-funded projects in the US and in other parts of the world, is because of consequences and time pressure. It's jokingly said that you're doing this work either because somebody who was supposed to get killed didn't, or somebody got killed who shouldn't have. And consequence and time pressure wipe away anything else that is immaterial. That's what makes incidents the Trojan horse.
C
We think, you know, that it's a myth to say that. Using these techniques, looking into incident analysis, the focus isn't necessarily to find what broke. It's not some sort of socialized debugging. It's to find out how stuff works at all. The incident is just a director of attention. The incident is just the filter. You can think of an incident as your system saying, "Hey, everybody..."
C
A
C
Engineers, and this is something we know about software engineers, don't read anything they don't think they need to read. And when they think they need to read something, when they have an expectation they're going to get something out of it, you're damn right they're going to read it. And so doing that, capturing what makes incidents hard, capturing...
C
What makes red herrings and wild goose chases happen is that following those has worked in the past, right? But you very rarely see the details of red herrings, and what made red herrings so attractive to follow, in incident write-ups. Very, very, very rarely do you see that. That's an example of the messy details that are really important. The...
A
The other outcome of doing postmortems and incident reports is also building trust. When you share that information, you're building trust with the other folks across silos internally, or with your end user community, that you're sharing this information as opposed to withholding it and not exposing the things that might have led up to it. And I think the hardest thing is to do it well. Yeah.
C
You're right about trust, but that trust is proportional to the quality and to what others find of interest in the report, right? Which is why I'd say a very strong signal, not the signal, but a very strong signal, is how many people read it. I'm going to go out on a limb: I know that counting and doing statistics on how often somebody has visited a web page...
B
C
That's great. For the record, if anyone is interested in having me on retainer, certainly please reach out. So that's a great question. There are two things that I would suggest. The first is to understand that there's a growing community; it's not just about Adaptive Capacity Labs, right?
C
There's a website called Learning from Incidents. You will see it reflected in a lot of blog posts, more and more people talking about these topics. Happy to tweet much more. I would say that the Learning from Incidents page and, in particular, Lorin Hochstein on GitHub has written an absolutely stunning set of resources about resilience engineering and understanding cognitive work that you can look at. Pragmatically, practically, a couple of suggestions. The first is to make an effort to capture from as many people as you can...
C
What was difficult? Ask them, and put a new section in your postmortem template, or wherever you want, and get people to write what was hard, what was surprising, what was difficult. Not what they thought the team thought was difficult, not in an abstract way: individual perspectives, individual perceptions. What was hard, what was difficult. Lots of things are difficult. Sometimes even understanding...
C
That the thing you're seeing is bad can be difficult. So, gathering those sorts of reflections, right? Every engineer has this feeling. When we've talked with organizations we've asked: have you ever been about to run a command? You're responding to an incident, about to run a command, and everybody thinks you should do this, right? Your colleague is like, "You should do this. This looks like it's our best shot." "Okay, all right, I'm going to go do it."
A
C
A palpable, extremely important experience that almost never finds its way into these narratives. Capturing what makes work hard, what makes work harrowing, you know, absolutely astonishing. There are surprises that are absolutely fundamental, right? There's this notion of a situational surprise: that's when you buy a lottery ticket and you win the lottery. And then there's fundamental surprise, and that's when you don't buy a lottery ticket and you win the lottery. Okay? Fundamental surprises are what make Chernobyl...
C
A
So it's great, it's been wonderful. But then, when we take it and we have to do it in the open, and when I talk about sharing that, you know, how we do this in an open, positive way, and learning the practices in open source communities. Now that I've read the books, and I've heard you speak and I've heard Andrew speak, everybody is trying to figure out how to take this to the open source community work that we're doing...
A
B
A
A
Yeah, and I'll try and find many of the references that both of you spoke of and add them to a resources page for this conversation when we post it up. We'll definitely have you back again. Boy, lots of things to think about now, over the weekend and ongoing. So thank you very much for joining us today.