From YouTube: Understanding Observability (The Podlets, Ep 4)
Description
Observability - what the term means, how it relates to the process of software development, and the importance of investing in a culture of observability.
For the show notes and transcript: https://thepodlets.io/episodes/004-observability/
Feedback and episode suggestions:
https://twitter.com/thepodlets
https://github.com/vmware-tanzu/thepodlets/issues
info@thepodlets.io
Hosts
https://twitter.com/carlisia
https://twitter.com/kris-nova
https://twitter.com/mauilion
B
Yeah, I agree with that, but it seems like it's one of those very hot topics. I mean, it feels like people often conflate the idea of monitoring and logging of an application with the idea of observability and what that means. So I'm looking forward to digging into the details of that.
What.
B
My take: observability is a set of tools that can be applied to describe the ways that data moves through a distributed system, whether that data is a particular request or a particular transaction. In this way you can actually understand the way all of these distributed parts of this system that we're building are actually interacting. And, as you can imagine, things like monitoring and metrics are a part of it, right? Like being able to actually understand how the code is operating.
C
We have an hour today. Listen to me. I mean, basically, okay, so I'm an infrastructure engineer. I wrote this book, Cloud Native Infrastructure. Everything to me is some layer of software running on top of infrastructure, and observability, to me, solves this problem of: how do I gain visibility into something that I want to learn more about? I think my favorite analogy for observability is this: have you all ever been to a gas station or a convenience store, and on the front door...
C
There's, like, a height scale chart. It'll say, like, 4 feet, 5 feet, 6 feet, 7 feet. I always wondered what that was for, and I remember I went home one day and I googled it. It turns out that's actually for if the place ever gets robbed: as the person runs out the front door, you get a free height measurement of how tall they are, so you can help identify them later. To me, that's the perfect description of observability.
A
So observability is sort of a new term, because it's not necessarily something that I, as a developer, would jump in and say, "Oh gee, my project doesn't do observability, I need it." I understand metrics, and I understand logging and monitoring, and so now I hear observability. Of course, I read about it to talk about it on the show, and I have been running into this word everywhere, but I feel like: why? Why are people talking about observability? That's my question. Yeah.
C
Well, I think this kind of goes back to the gas station analogy again, right? What do you do when your metaphorical application gets robbed? What happens in the case of a catastrophic problem, and how do you go about preparing yourself the best way possible to have an upper hand at solving that problem? Right, like, some guy robbed a store and then ran out the front door, and then we realized: oh, we have no idea how tall he is. He could be 4 feet tall.
C
He could be 6 feet tall. And then we learned the hard way that maybe we should start putting markers on the door. I feel like observability is the same thing, but I feel like people just kind of wake up and say, "I need observability, I need all of these bells and whistles, because my application, of course, is gonna break." And I feel like, in a weird way...
B
I'd argue that the term observability is coming up more frequently, and it's certainly a hot topic today, because of, effectively, context. It still comes back down to context. When you're in a situation where you have built a cloud native architecture for your application, you've got a bunch of different services that are intercommunicating, or maybe all communicating with some particular shared resource, and things are misbehaving,
B
you're gonna need to have the context to be able to understand how it's breaking, or at what point it's breaking, or where in the tangled web that we weave the problem is actually occurring, and can we measure that at that point, right? And so traditionally, in a monolithic architecture, you're not really looking at that. Maybe you break up the monolith, you...
B
You set up a couple of set points, you're looking at the way particular code paths work. Or, if you're on top of the game, you might instrument your code in such a way that it will emit events when particular transactions happen, or particular things happen, and you're going to be looking at those events in, really, logs, and looking at metrics to figure out how this one application is performing or behaving. With observability, we have to solve that problem across many systems.
A
That is why I put on the show notes that it has something to do with the idea of cattle vs. pets. I'm saying this because Duffy was asking me, before we started recording, why that was on the show notes. And, correct me if I'm wrong, I think you were going in the direction of saying you don't see the relation. But the relation that I was thinking about was exactly what you just said.
A
If I have a monolith, I'm looking at one thing, I'm looking at one log, I can treat that as my little pet, as opposed to when I have many microservices interacting. I can't even troubleshoot anything if I treat them as pets, right? Because I can't; this is too much. So the reason why observability is necessary sounds to me like that: it's a problem of scale and complexity. Yeah.
C
And I think that explains why we're just now hearing it, too, right? I'm trying to think of another metaphor here. I guess today it's going to be a metaphor day for me. Oh, got it. Okay, so I just got back from London last week. I had gotten off the Tube, and I remember I came up to the surface, and the blinding light is in my eyes, and all of a sudden I saw a sign for Scotland Yard, and I was like, whoa...
A
How do people handle... and I'm asking this question because, truly, I have yet to have this problem for my project, that I need to do observability in my project, that I need to make sure my project is observable. I mean, other than the bread-and-butter metrics and logging. That's what we do. We don't do anything further than that, and I don't know if those things constitute observability. But what Nova just said...
C
We get into this exciting world of: how long do we persist our data, and which data do we track? There are a lot of schools of thought and a lot of different opinions around what the right solution here is, but I think it just boils down to this: every application's set of concerns is gonna be unique, and you're just gonna have to give it some thought.
B
There's this idea in a book called Distributed Systems Observability by Cindy Sridharan (I'm probably butchering her name). She wrote that there are, like, these three pillars. The three pillars are events, metrics, and traceability, or tracing. Events, metrics, and tracing: these are the three pillars of observability. So if we were going to lay out the way that those things might apply to just any old application, like a monolith, then we might look at how can...
B
The things you want to instrument in your application are any calls that your application is going to make that might represent a period of time, right? Like, it's going to make a call to an external system. That's something that you would definitely want to emit an event for if you're trying to understand where things are going sideways. How long it took to actually make a query against the database in the back end of a WordPress blog is a great example, right?
C
My understanding of instrumentation is that there's a bit of an art to it. You're actually going in and adding lines of code to your application, so that on line 13 we say "starting transaction", on line 14 we make an HTTP transaction, and on the next line we have "the event is now over". And we can sort of see that, and discover that we made this HTTP transaction, and see where it broke, if it broke at all. Is that it? Am I thinking about that right?
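A minimal sketch of the start / call / end pattern described in this turn, in Python. Everything here is an assumption made for illustration: `emit`, `instrumented`, and `fake_http_call` are hypothetical names, the events go to an in-memory list rather than a real backend, and the "HTTP transaction" is simulated with a short sleep.

```python
import time
from contextlib import contextmanager

EVENTS = []  # hypothetical in-memory event sink; a real system would ship these elsewhere


def emit(name, **fields):
    """Record one event with a wall-clock timestamp."""
    EVENTS.append({"name": name, "ts": time.time(), **fields})


@contextmanager
def instrumented(operation):
    """Emit a start event, run the wrapped block, then emit an end event."""
    emit("start", operation=operation)
    started = time.perf_counter()
    ok = True
    try:
        yield
    except Exception:
        ok = False  # remember the failure so the end event can report it
        raise
    finally:
        emit("end", operation=operation, ok=ok,
             duration_s=time.perf_counter() - started)


def fake_http_call():
    """Stand-in for the HTTP transaction; sleeps instead of touching the network."""
    time.sleep(0.01)
    return 200


with instrumented("http_request"):
    status = fake_http_call()
```

The end event carries both the outcome and the elapsed time, so the same record can answer "did it break?" and "how long did it take?".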
B
I think you are. But what's interesting about that is the reporting on line 14, right, where you're actually saying the event is over. I think we end up actually measuring this in both an event stream and also in a metric, right? So that we can understand, over the last hundred transactions to the database, are we seeing any increase in the amount of time the process takes? Are we actually...
B
Is this something we can measure with metrics and understand: is this value changing over time? And then, from the event perspective, that's where we start tying in things like: contextually, in this transaction, what happened? So in this particular event, is there some way that we can correlate the event with perhaps a trace (and we'll talk a little bit more about tracing, too), so that we can understand: okay, well, at two o'clock...
B
We see that there is an incredible amount of latency being introduced when my WordPress blog tries to write to the database, and it happens every day at two o'clock. I need to figure out what's happening there. And to even get to the point where I understand it's every day at two o'clock, I need things like metrics, and things like events specifically give me that time correlation to understand: oh, it's at two, and...
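The "last hundred transactions" question raised here can be sketched as a rolling window over recorded durations. This is an illustrative assumption, not something from the episode: `RollingLatency`, the window size, and the 1.5x threshold are all made up, and a real metrics system would compute this on the server side.

```python
from collections import deque
from statistics import mean


class RollingLatency:
    """Keep the most recent `size` durations and compare halves of the window."""

    def __init__(self, size=100):
        self.samples = deque(maxlen=size)  # older samples fall off automatically

    def observe(self, duration_s):
        self.samples.append(duration_s)

    def is_increasing(self, factor=1.5):
        """True when the newer half averages more than `factor` times the older half."""
        if len(self.samples) < 2:
            return False
        items = list(self.samples)
        half = len(items) // 2
        return mean(items[half:]) > factor * mean(items[:half])


window = RollingLatency(size=100)
for _ in range(50):
    window.observe(0.020)  # steady 20 ms database writes
for _ in range(50):
    window.observe(0.200)  # writes slow down to 200 ms
```

Calling `window.is_increasing()` after the loop reports whether the recent writes are noticeably slower than the earlier ones in the same window.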
C
This is where we get into what Carlisia just asked about, which was: how do we solve this problem of what we do when it goes away? In the case of our 2:00 p.m. database latency (for lack of a better term, let's just call it the heartbeat, the 2:00 p.m. heartbeat), what happens when the server that was experiencing that latency mysteriously goes away? Where does that data go? And then you look at tools like, I know Prometheus does this, and Elasticsearch
C
has its capability to do this. But you look at: how do we start managing time series data, and how do we start tracking that and recording it? It's a fascinating problem, because you don't actually record "2:00 p.m., to this second and this fraction of a second, this thing happened." You record how long has passed since the previous event, so you're just constantly measuring deltas. It's the same way that Git works: every time you do a git commit, you don't actually write all 1,000 lines of software.
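The delta idea from this turn can be shown with a few lines of Python. This is only a sketch of the basic technique under made-up sample data; it is not a description of how any particular time series database stores its samples, and real stores use more elaborate encodings.

```python
def delta_encode(timestamps):
    """Store the first timestamp, then only the gap to the previous sample."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for previous, current in zip(timestamps, timestamps[1:]):
        deltas.append(current - previous)
    return deltas


def delta_decode(deltas):
    """Rebuild absolute timestamps by summing the gaps back up."""
    timestamps = []
    total = 0
    for delta in deltas:
        total += delta
        timestamps.append(total)
    return timestamps


# Hypothetical sample times in whole seconds since the epoch, 15 s apart.
scrapes = [1_700_000_000, 1_700_000_015, 1_700_000_030, 1_700_000_045]
encoded = delta_encode(scrapes)
```

After the first entry, the encoded list holds only small gaps, which is why this kind of representation compresses well for regularly spaced samples.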
B
I mean, both of you highlight a really good point around this whole cattle versus pets thing. And this is actually something that I spent a little time with in a previous life. The challenge is that, especially in systems like Kubernetes and other systems where perhaps your application is being scaled out or scaled down dynamically based on load, you have all of these ephemeral events.
B
You have all these events that are from pods, or from particular instances of your application, that are ephemeral. They're not going to be long-lived. And so this highlights kind of a new problem that we have to solve, I think, when we start thinking about cloud native architectures, in that we have to be able to correlate that particular application with information that gives us the context to understand, like...
B
Perhaps this was this version of this application, and these events are related to that particular version of the app, and when we made a change, we saw a great reduction in the amount of time it takes to make that database call, and we can correlate those new metrics based on the new version of the app. Because we don't have this as a long-term entity that we can measure. This isn't like a single IP and a single piece of software that is not changing.
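One way to picture the correlation described here: tag every event with labels that outlive the pod, such as the application version, and aggregate on the label rather than on the instance. The event shape, pod names, version labels, and timings below are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical events from short-lived pods; names, versions, and timings are made up.
events = [
    {"pod": "blog-7d9f-abcde", "version": "v1", "db_call_s": 0.180},
    {"pod": "blog-7d9f-fghij", "version": "v1", "db_call_s": 0.220},
    {"pod": "blog-5c4b-klmno", "version": "v2", "db_call_s": 0.040},
    {"pod": "blog-5c4b-pqrst", "version": "v2", "db_call_s": 0.060},
]


def mean_latency_by(events, label):
    """Average the database call time for each distinct value of `label`."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event[label]].append(event["db_call_s"])
    return {value: mean(durations) for value, durations in grouped.items()}


by_version = mean_latency_by(events, "version")
```

Grouping by `"pod"` instead would give one entry per ephemeral instance, which says little once those pods are gone; grouping by `"version"` keeps the comparison meaningful across restarts and rescheduling.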
C
Okay, I have a question, an open question for the group: what is the scope here? And I guess, to build on our WordPress analogy, let's say that every day at 2:00 p.m. we notice there's just latency, and we spent the last two weeks endlessly digging through our logs and trying to come up with some sort of hypothesis of what's going on here, and we just can't find anything.
C
Everything we've talked about so far has been at the application layer of the stack: instrumenting our application, debugging our application, making HTTP requests. What happens, or what should we do, or does observability even care, if one of our hard drives is failing every day at 2:00 p.m. when, like, the cleaning service comes by and accidentally bumps into it or something?
A
Isn't that what monitoring is? Like some sort of testing from the outside, an external testing that, of course, only gives us the information after the fact, right? The server, whatever, it died. My application is already not available, so now I know. Yeah, but isn't monitoring what would address a problem like that?
B
Globally, at 2 o'clock, what's going on in my world, right? I know that these are the two entities that are responsible. I know that I have a bunch of pods that are running on this cluster. I know that I have a database that may be external to my cluster, or maybe on the cluster. I need to really understand what's happening in the world around those two entities as it correlates to that period of time.
A
How do you do it, though? Because, I'm just gonna go back to the monolith: I'm using an external service to ping my service, and my service is down. Yeah, I'm going to get the timing, right? I can go back and look at the information, the log stream. Would I know that was because of the server? No. But should I be pinging the server too? Should I be pinging every layer of the infrastructure? How do people do that? Yeah.
B
So if what I was trying to do was actually, you know, submit a comment on a WordPress blog, and I had a way of implementing tracing through that WordPress blog, I might be able to leave myself little breadcrumbs throughout the entire set of systems and understand: okay, at what point, I mean where in this particular web transaction, am I spending time? So I might see that, from the load balancer, I begin my trace ID, and in that load...
B
I committed to the database, and identifying what that database is, is an important part of that trace, like understanding where that traffic is gonna go next and how much time I spent in that transaction. So again, this is down to that code layer. We should have some way of actually producing an event that may be related to a particular trace ID, so that we can correlate the entire lifecycle of that transaction, that unique trace ID, across the entire process.
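The breadcrumb idea in this turn can be sketched as follows: the entry point mints a trace ID, every later hop tags its event with that same ID, and the events are then grouped by ID to see where the time went. The component functions and the fixed durations are hypothetical stand-ins; real tracing libraries handle the propagation for you, typically by carrying the ID along with each request.

```python
import uuid

EVENTS = []  # hypothetical shared event store


def record(trace_id, component, duration_ms):
    """Every component tags its event with the trace ID it was handed."""
    EVENTS.append({"trace_id": trace_id, "component": component,
                   "duration_ms": duration_ms})


def load_balancer(request):
    """Entry point: mint the trace ID, then pass it along with the request."""
    trace_id = uuid.uuid4().hex
    record(trace_id, "load_balancer", 2)  # durations are made-up constants
    web_app(request, trace_id)
    return trace_id


def web_app(request, trace_id):
    record(trace_id, "wordpress", 15)
    database(request, trace_id)


def database(request, trace_id):
    record(trace_id, "database", 480)


def slowest_hop(trace_id):
    """Collect one transaction's events and report where the most time went."""
    hops = [e for e in EVENTS if e["trace_id"] == trace_id]
    return max(hops, key=lambda e: e["duration_ms"])["component"]


first = load_balancer({"comment": "hello"})
second = load_balancer({"comment": "world"})
```

Because both requests write into the same event store, the trace ID is the only thing that keeps their breadcrumbs apart; `slowest_hop(first)` looks only at the three events that belong to the first comment.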
C
One of the things that I've learned about Kubernetes, as I've been working with Kubernetes and explaining it to people and going out on the road and doing public speaking, is that it's very easy for users to understand Kubernetes if you break it down into three things: compute, network, and storage. What I'm getting at here is that the application layer is probably going to be more relevant to the compute layer. Storage is going to be where, which is, that's...
C
Observability. Storage is going to be more monitoring, and that's gonna be: what is my system doing? Where am I storing my data? And then network is kind of related to tracing, which we're looking at here. These aren't necessarily one-to-one, but it's just kind of a distribution of concerns here. Am I thinking about that kind of the same way you are, Duffy?
B
You are, I think. I think what I'm trying to get to is, I'm trying to identify the tools that I need to be able to understand what's happening at two o'clock and all of the players involved in that, right? And so for that I'm actually relying on tools that are pretty normal, like the ability to actually monitor all the systems and have real timestamped stuff that describes, you know, I got a Nagios alert or what-have-you.
C
And I don't want to take away from this lovely definition you just dropped on us, but I'm gonna take a stab at trying to summarize this. So observability spans the whole stack. If you look at the OSI reference model, it's gonna cover every one of those layers, and all it really is is just a fancy word for all the tools to help us solve a problem. Yeah.
C
I definitely think... somebody once asked, "What's the difference between an SRE and a senior SRE?" and they were like, "Patience." And you can always tell folks who've been burned, because they take this stuff extremely seriously. And I think that culture, there's a commodity there; people are willing to pay for it, if you can actually do a good job at going from a chaotic problem, "I have no idea"...
B
I was recently discussing observability in another medium. We were having a conversation around doing chaos testing, and I think that this relates. The interesting thing that came out of that for me was the idea that, you know, I spent a pretty good portion of my career teaching people to troubleshoot, which is kind of weird.
B
Whether they be the people who are operating the code, or the people who are just trying to keep the whole system up or provide you feedback: to experiment, and to develop hypotheses around how the system might break at a particular scale, and to test that, right? And giving them the tools with which to actually observe this is critical. You know, it's amazing, but yes.
C
In my mind again, I'm on my metaphor kick again, I think of the bank robber movies where they take dust and blow it, and then, all of a sudden, you can see the lasers. I'm kind of feeling like that's what's happening here. Chaos testing would just be the practice of intentionally breaking the lasers to make sure our security system works, and observability is the practice of actually doing something to make those lasers visible.
A
Because the two of you spend time with customers, maybe Duffy more so than Nova, but definitely... I spend zero time. I'm curious to know: if someone, let's say an SRE, wants to implement a set of practices that comprise what we are talking about, saying it's observability, okay, but they need to get buy-in from other people.
B
You know, if we're not working with our developers, who are more focused on understanding, you know, "does this function do what it says on the box?" rather than "is this function implemented in a way that might emit events or metrics?", right? This is a completely different set of problems from the developer's perspective.
B
Here are the things that this application must have, and be able to wire into, to enable us to operate this app, right? So that we can understand it, we can observe it and monitor it and do all the things that we need to do. And the great part about that is that it means that you're teaming with the developer teams. You have some engineering piece that is teaming with the developer team and enabling them to understand.
A
Getting to that place, it's an interesting proposition, isn't it? Because even as a developer, I see the world moving more and more towards the developer taking ownership of the apps and knowing more layers of the stack. And if I am a developer and I want to incorporate these practices, then I need to convince someone, either DevOps or whoever is in charge of monitoring and making sure the system is up and running, right? Yeah, so, I don't wanna lose my train of thought.
A
So one way to go about quantifying the need for that is to say, well, over the last month we spent X amount of hours trying to find a bug in production, and that X is a huge number. So you can bring that number and say this is how much it cost in engineering hours. But on the other hand, you don't want to be the one to say that it takes you a hundred hours to find one little bug in production.
C
Anyway, I was just gonna say, I feel like this is why agile teams are so successful: baked into how you do your work is this implicit way of tracking your time and your progress. So, at the end of the day, if you do spend a hundred hours trying to find a bug, it's sort of like, that's the team's hours. It's not your hours. And you sort of get this data for free at the end of every sprint. Yeah.
B
I think that frequently we assume... let me put this differently. I've seen companies where the culture is somewhat damning for people who spend a lot of time trying to troubleshoot something that they wrote, and that is a terrible pattern, because it means that the people who are out there writing the code, who are just trying to get across the finish line with the thing that needs to be in production, now have this incredible pressure on them to not make a mistake. That is not okay.
B
Yeah, it makes me nuts that there are organizations that are like that. I feel like we really just... What's awesome, though, is I see that narrative rising up within the ecosystem around cloud native architectures and other things like that. It's like, you know, you were hired to do a hard job, and if we come down on you for thinking that that's a hard job, then we're messing up. You're not messing up.
A
Building software is very hard and complex, so if you're not making mistakes, you're either not human or you're not making enough changes. And in today's world we still have humans making software. Robots, we're not there yet. And it's a very risky proposition not to be making continuous changes, because you will be left behind. Yeah.
C
I feel like there's definitely something to be said about empathy for software engineers. It's very easy to be like, "Oh my gosh, you spent a hundred hours looking at this one bug to save $20. How dare you?" But it's a lot harder to be like, "Oh, you poor thing. You had to dig through a hundred million lines of somebody else's code in order to find this bug, and it took you a hundred hours, and you did all of that just to fix this one little bug."
C
"How awesome are you?" And I feel like that's where we get into the team dynamic of: are we a blame-centric team? Do we try to assign blame to a certain person, or do we look at this as a team's responsibility? Like, this is our code, and poor Carlisia over here had to go dig through this code that hasn't been touched in ten years or whatever.
A
Another, sorry, so, another layer to that is that, in my experience, I have never done anything in software, or looked at any code, or brought up any system, however trivial the end result was, especially in relation to the time spent, where... it has never happened that it wasn't a huge amount of education that I got to reuse, you know, in future work. So...
B
You will automatically, I mean, by your nature, build a better intuition yourself around how all of these systems operate. It doesn't matter whether it's the application that you're working on or some other application: you're going to be able to build a better intuition for how to understand and characterize systems in general. You'll be a better person. You'll be a better engineer for distributed systems, if you are in a culture that is blameless.
C
I mean, I think they certainly can be automated. I just don't think there's a hard bit of criteria that says every one needs to be automated. Like, there ain't nothing wrong with SSH-ing into a server and running a debug script or something if you're having a really bad day. Okay.
A
Right. So, let's say, not to exclude the option to do it manually too, if you want, but let's say we have these wonderful tools that can automate a bunch of this work for us, and we get to look at it at a high level. So what I'm thinking is: well, before, if we didn't use those tools, or if we are not using those tools, we have to do a lot of that work manually.
A
We have to look at a lot more different places. I would challenge you on that, but let me finish the whole spiel: I will challenge you that we develop even more intuition that way. So we are decreasing the level of intuition that we develop, potentially, by using the tools. Now I'm going to agree with you.
A
One thing here: it's big already in my head, and then, for me, when I switch context and go look at something else, I forget what I looked at over there. It's really hard to keep track, and really wasteful, and it's impossible to keep all of it in our minds, right? And let's say I have to go through the whole debugging process all over again.
A
If I don't have notes, it will be just like the first time, because I can't possibly remember. I mean, I've been in situations of having to debug different systems, and I'm like, okay, now the third time around, I'm taking notes, because the fourth time is just going to be so painful. So having tools that let us look at these at a higher level, I think, has the additional benefit of helping us understand the system and hold it together in our heads.
C
Following up on Carlisia's, how she challenged you and then agreed with you: I really want to ask this question, because I think Carlisia's answer is going to be different than Duffy's, and I think that's gonna say a lot about the different ways that we're thinking about observability here. And it's really fascinating, if you think about it. So have either of you worked in a shop before where you had, like, the guy?
C
You know, that one person who just knew the code base inside and out. He had been around forever. He was a dinosaur, and whenever something went wrong, you're like, "We gotta get this guy on the phone," and he would come in like, "Oh, it's this one line and this one thing that it would take you six months to figure out, but let me just fix this really quick," bam, bam, and production's back online.
B
I think it normalizes it, to your point. I think you're on to it. I think I agree with you, but I think that, fundamentally, what happens is, through tooling like chaos engineering, through tooling like observability, you are normalizing what it looks like to teach anybody to be that person, right? And that's the key takeaway. You know, to Carlisia's point, she might...
B
And I think that the benefit of having common tooling with which to experiment, to understand and observe the behavior of these distributed systems, means that we can normalize what it looks like to be a developer and have a theory about how the system is breaking, or would break, and have some way of actually validating that through the use of observability and perhaps chaos engineering. And that means that we're turning the keys over to the crew.
A
A way to observe where things went wrong. And again, going back to what I said: more and more developers are... I mean, some developers are actively taking on the ownership, and in other cases they've been asked to take more ownership of the whole stack, I'm saying, from the application level down the stack. But if you give me tools to observe where things went wrong beyond my code, as a developer, I'm not gonna call the guy. Yeah.