From YouTube: 2022-08-16 CNCF TAG Observability Meeting
Description
Liz Fong-Jones joined us for a discussion / talk on "Evolving and hybridizing signal types - my journey from metrics/logs to traces to profiles"
* OpenTelemetry Profiling updates (#otel-profiles)
* potential for collab w/ other tags (STAG)
* updates from https://github.com/cncf/landscape-graph
* we have a logo!
Meeting notes and more: https://github.com/cncf/tag-observability
A: You might want to tilt your camera, yeah.
B: In Indiana. I'm in Oakland, but I went to IU.
E: Hi folks, good morning — welcome, and thanks for joining in. We're super excited to have you today, and I think we're gonna wait a few minutes for Matt to join in, as well as some of the others joining in. How's everyone? Good, good. Ryan, how are you?
E: Good, good — super excited. Again, we kind of spread the word across different channels, but it takes a while for folks to join in. Hi Kevin, thanks for joining. Steve.
A: Exactly what I figured — you know, a small audience. I know you all, and we're all people who work in the space, as opposed to audiences who are like, "what is this observability thing anyway?"
E: I know, exactly. But I think it's great that, you know, Liz, you and Charity and the team did the book, because I think the book is actually very helpful for engineers who are getting involved in actually building observability into their applications, and just understanding the basic concepts and the advanced concepts. So good stuff — and really, thank you for making the book available online, because I think that's always a great game changer for folks to pick up the thing.
A: What's really surprising to me is that, despite making the book — or maybe because of making the book — available for free online, O'Reilly has said that it's one of their better-selling books.
F: Yeah, I think the formalization of structured events, as a thing that's actually defined, is another great addition of the book — especially that it's complemented with more pragmatic, practical advice about how to actually employ this with humans, right.
A: Yeah, that turns out to be one of the things I wanted to talk about today — kind of how I came around to structured events, because I was a very hardcore metrics person at Google.
A: I am going to break my rule about saying, you know, "when I was at Google, we…" I try not to do it — I have a swear jar for when I do it at Honeycomb — but here it's for setting historical context, not a "you should do this because we did it at Google." So I think it's okay, but give me a little bit of rope there.
F: Did you already do the disclaimer, Alolita? I joined a minute late — the TOC meeting ran two minutes late.
E: I just said hi and welcomed everyone, Matthew. Happy times.
F: Welcome, everyone. This is a CNCF-sponsored event; as such, the CNCF's code of conduct applies. Please don't do anything in chat or in the meeting that would be in violation of that.
F: I have a few things to briefly cover that are more administrivia, which I hope to kind of blow through — there are links, and follow-up can happen after. Liz, I'm not sure how much time you want to fill, but I did want to leave the bulk of the time for your talk. Again, I love that it's for practitioners, so impromptu and not slideware-heavy is actually an advantage here.
F: So thank you. Okay, I can share my screen briefly, just so anybody following along later can see.
F: Cool. So today, in the TOC meeting that's just one hour prior to this meeting — is my mic level okay? This is a brand-new —
F: Totally. Deadly, that's —
F: So the TOC has asked us — the TAG, you know, the TAG chairs and whoever else is interested, I suppose — to reach out and kind of assess the health of the Cortex project and make some concrete recommendations about what it might need. So there's a link there. And at long last — I think it's a year and running — we now have a logo, and we're getting the actual high-res SVGs from the CNCF creative folks.
F: I want to highlight something. I went and visited TAG Security last week to talk about the landscape graph, which I don't want to get into the details of here, for time, but there are links. They have a working group on secure supply chain — and why does this matter for observability? Well, when we build things, they bring in all kinds of dependencies.
F: Not only are there impacts to performance and runtime and all that, but from a security perspective, part of our charter is to help with the comprehension and observation of cloud native workloads, including what CVEs might be there as of this morning or last week. And the part of that graph project that is defining the data model for packages — RPM through deb, et cetera, all of the packages — they have a similar effort going on, in some collaborations outside the CNCF with other open source and Linux Foundation affiliated groups.
F: In addition, they've kind of proposed a similar graph project that's more in the requirements phase, so there's some overlap, and some collaboration that could happen between TAG Observability and TAG Security. They actually call themselves STAG, and there's some debate as to whether we should call ourselves OTAG, but that's probably something for Slack and a poll. And then it looks like that's all I had in the way of administrative stuff. I'm assuming, Ryan, do you want to talk to the next item — the OTEP for profiling?
B: Yeah, I can just briefly give an update on what's going on there. Basically, a couple weeks ago OpenTelemetry merged its project management guidelines.
B: There are a bunch of different efforts around various kinds of specifications and that kind of thing, so they wanted to create a more standardized process for the different check boxes you need to check and the order you need to check them in. The main ones they name as the minimal set of criteria are: one, a group of designers or subject matter experts — or just, you know, people who are somewhat interested, qualified, and ready to dedicate some time to working on the project. We've already established that we have a lot of people from a lot of different places: some open source projects, some vendors, some end users, so we have a good mix of people there. The second one is the TC needing to be aware — as of a week or two ago, we got our second TC member to sponsor the working group and the efforts there, so obviously they're aware. And the third one is that the spec approvers in the broader community need to be aware of progress being made, and that's the one we're currently working on.
B: I actually also just got out of the specification SIG — SIG, yeah, they still call them SIGs, I think, or TAG, whatever it is. The specification SIG. It's —
A: OTel has SIGs, but the CNCF has TAGs. Hope that helps.
B: Yeah, yeah. So I just got out of that meeting, and so we're now — and I put a link into the doc, which we're still in the process of finalizing — but basically we're in the final stages of getting that first set of check boxes done, where this becomes an official thing. After that, the next steps are creating a project tracking issue and a project board, where we actually got a little bit of a head start, as we've already started talking about writing actual code for this instead of just talking about it. We've talked a lot qualitatively about all the different ways you can collect profiling data and the different formats that different companies and projects are using, and we've already started to figure out what the qualitative — sorry, quantitative — metrics we'd like to have are, in terms of benchmarking and stuff like that. So basically I think we should be able to move pretty steadily, hopefully quickly, as that happens. In the meantime, we're finishing up this official spec and presenting it to the specification group, and then we'll go from there. So that's the update on that. The last section that I guess needs the most filling in — and it's not something that's meant to be done by any means; we can always continue to add to it — is the different use cases for profiling. And just FYI, that group meets every other Thursday, so not this Thursday but the one following is the next meeting. If anybody would like to join, feel free; otherwise check out the doc and let me know if you have any feedback, thoughts, or questions, or feel free to just comment in the doc.
B: Yep, yep — and oh yeah, I guess I didn't mention the structure that we went with there. Obviously a lot of the signals are similar, so we kind of used the structure from the logs proposal from a while back, merged that with the various vision, mission, and ambition statements from OTel itself, and combined those to create the outline for this, and then filled it in that way. So we really tried to be intentional about having it align with the overall goals of OTel as well — just FYI on how we came up with the points we came up with. The docs following this one will get a little bit more into the weeds: you know, what specific fields do we want to have in the profiling format?
F: Yeah, I've put a link in — at the last TOC meeting we talked about this specifically, and in the slides there, if you're curious about the actual meeting notes and whatever, there are jumping-off links as well. So we've been socializing all of this and really trying to get consensus prior to a formal thing, and it seems to be working.
F: Similarly, I think the rest of the time is yours, Liz. You said 20 or 30 minutes, and I think that'll take us to time. So thank you again for joining us here. If folks aren't familiar — I'll hold it up once more; everyone probably is, here — there's a new O'Reilly book out by Liz and Charity and George. With that I'll say: stay tuned for the PR to actually formalize this, but we're hoping to have a speaker series through the fall to see what happens, and we'll have a process to propose speakers. We're starting with authors, since there are a couple of books that have come out — this is one of them — and we hope to make this, you know, feedback from domain experts and practitioners, which is obviously in our charter and relevant. So Liz is the first.
A: Basically, what I wanted to talk about today is how I came to where I am on this journey of thinking about signal types — kind of what our objectives are in observability, how we can best realize them, and what I think is up ahead. I've been working as a site reliability engineer of some kind for the past, let's say, 17 years now — 17, 18 years sounds about right.
A: I started doing this when I was very, very young — when I was 17 years old — and I cut my teeth really trying to solve problems that we wouldn't necessarily think of as systems engineering or systems administration problems. I actually started off as what you could, I guess, describe as an abuse analyst at a game studio.
A: Right — it is a user pattern that we're trying to detect using the signals that we have, but unfortunately the signals that we had were not particularly good. What's really interesting is that you can start to see the similarities with what I do today, and the differences. The similarities are that I'm trying to identify unknown patterns caused by my users in my code, without necessarily having foreknowledge of what those users are trying to do.
A: What's the duration per user of those gameplay sessions? With clever use of sed and awk and grep, we were able to produce these interesting command-line text histograms that showed some users were completing hundreds or thousands of play sessions in the amount of time it would take a normal user to complete, like, ten play sessions. So that was my first exposure to what I didn't realize at the time was a high-cardinality problem: there are many, many thousands of users who may be logged into the game at any time, doing tens of thousands of different play sessions — how do we sift that signal from the noise? Fast forward about another five years, and I had joined Google at this point, and this was kind of my first exposure to what best practices look like — what doing things in anger, at scale, looks like.
A: What's interesting is that the tools were very different from what I used in my game studio job — I was not necessarily going to be grepping through logs. That was a thing that did not work at Google scale.
A: What Google had really embraced at the time, even as of when I joined in 2008, was this idea that everything should be recorded as a time series metric, because there's just going to be too much data to record and centrally index. You would use your metrics — which were potentially broken down by machine, then by job, then by data center and availability zone and all these varying things — essentially to do what I eventually formalized as the idea of binary searching the potential problem space. If we were seeing a high rate of HTTP 500s or a high rate of latency, what we would wind up doing is finding a number of different dimensions to break down by: maybe it's availability zone, maybe it's kernel version. I keep plugging in these values that I want to group by, to see whether all the lines move simultaneously, or where one line spikes and the other lines don't. That's kind of how you bisect where the problem is coming from inside your systems.
A: Sometimes it was sufficient to bisect it down to — okay, what you would think of today as a particular Kubernetes pod template spec. Once you know that it's a particular deployment ID that is all exhibiting this problem, or a particular availability zone that's all exhibiting this problem, that would enable us to say: okay, let's cordon that off — let's drain the bad availability zone that's clearly having some issues, or let's revert that bad release that's clearly having some issues. What's interesting is that at Google there was not necessarily a notion of caring about high cardinality in users. Basically the idea was: oh, that's an impossible problem — you would never want to, for both privacy and technical reasons, group by individual Google Search user; that's just impossible. That was the thinking at the time. But sometimes the metrics were not sufficient — in particular, the metrics were not sufficient in two different ways.
A: When you had issues of noisy neighbors, when you had issues of crashes on single machines — where you could tell that an abnormally high error rate was coming from a specific set of machines, but not necessarily why — your metrics were not sufficient. And what I found interesting was that, yes, you would fall back onto logs, but the logs were not centrally indexed. We would go — not SSH to the machine, but pull up a log viewer that would scrape the files off of the individual machine so we could look at them. That gave us this rolling circular buffer of logs that we could go to if we desperately needed it, but it was not a signal that we relied upon for our bread-and-butter work — it was only if everything else had failed.
A: But there's another interesting problem, which is: what about the problems that do not appear as single point sources of failure, or the problems where you don't know what to filter or group by, because there were millions of metrics at Google? I think one of the fascinating things I uncovered there — and kind of what pointed me down this path of tracing — was when I saw, for the first time, that we had a black-box probing service that would basically repeatedly hit the service.
A: It ran against a special table inserted into this tenant in order for us to be able to perform read and write tests against that one table, and that was set to always trace, and we were getting very high quality data out of that. It would tell us, no matter what, for these black-box probes they were issuing multiple times per second: where did the request flow get stuck? Did it get stuck in the underlying file system?
A: So that was really neat, but I think one of the challenges was: what happens if a request comes in from a user saying, "hey, Bigtable's slow," but it's not something that was necessarily forced to trace? How are you going to find that request? How are you going to find the needle in the haystack of a request that looks like that one — and that is traced? Because if it was not a black-box probe, it's not something you're manually forcing tracing on. And again, problems of Google scale: we're sampling one in a hundred thousand, we're sampling one in a million. So it's like, okay, you're looking for a p99 latency event that is also at a one-in-a-million sample rate.
A: Okay, so this is where we finally get to the idea of trace exemplars. By this point, the folks at Google had designed a second-generation metric storage system. We originally had this thing called Borgmon, which was very similar to — and kind of inspired — Prometheus. It's this idea of a pull-based protocol that goes ahead and scrapes a bunch of key-value pairs out of hosts; we're all very familiar with this format.
A: What was interesting and different about Monarch, the next-generation system, was that it was designed to be able to propagate additional information besides the key-value pairs. For instance, it had a native histogram type — and not only did it have a native histogram type with custom bucket widths and various other things to improve resolution, but the folks who designed that system had added the idea that you could attach exemplars to your histogram buckets, to pass on — even if you had aggregated away detail — some of the pre-aggregation detail.
A: So, for example, if I was aggregating a metric on request latency, and I was aggregating it at the data center level, I might choose, for any specific bucket — sorry if this is a recap for people who already know what exemplars are — but for a given bucket, if I was aggregating away the machine ID field, because that's no longer relevant and I'm just combining all these various machines together into one aggregate, composite latency histogram…
A: Besides leaving on, and not filing off, the cardinality of hostname, or even of user ID or other things like that, we had the idea of attaching trace IDs. You would separately make the decision about whether or not to sample, but if you did choose to sample — and you were also in the context of a metric — we would tie one trace ID that exemplified that histogram bucket of the metric and send it along; and when it got post-aggregated, you would pick at random one of the trace IDs that might have gotten kept and propagate that along. So the net result was that when you looked at a histogram, you could see trace IDs — you could see, for the first time, that higher-cardinality detail that you had previously had to file away because it was too noisy to create a time series for each individual tag.
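For readers following along, here is a minimal sketch of what that looks like with today's OpenTelemetry Go SDK rather than the internal Google system described above: recording a latency histogram while a sampled span is active lets the SDK keep that trace ID as an exemplar for the bucket the value falls into. Exemplar support varies by SDK version and configuration, and the instrument and attribute names below are purely illustrative.

```go
// Sketch only: exemplars in the OpenTelemetry Go SDK. Recording a
// histogram value with a context that carries a sampled span lets the SDK
// attach that span's trace ID as an exemplar on the matching bucket
// (subject to SDK version/configuration). Names are illustrative.
package example

import (
	"context"
	"time"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

func handleRequest(ctx context.Context) error {
	tracer := otel.Tracer("example")
	meter := otel.Meter("example")

	// Histogram of request latency; bucket boundaries come from the SDK/view config.
	latency, _ := meter.Float64Histogram("request.duration", metric.WithUnit("ms"))

	ctx, span := tracer.Start(ctx, "HandleRequest") // sampling decision happens here
	defer span.End()

	start := time.Now()
	// ... do the actual work ...
	elapsed := float64(time.Since(start).Milliseconds())

	// Because ctx carries the (possibly sampled) span, the SDK can keep its
	// trace ID as an exemplar for whichever bucket this value lands in.
	latency.Record(ctx, elapsed,
		metric.WithAttributes(attribute.String("tenant", "probe")))
	return nil
}
```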
A: And this, I think, was for me the moment where I realized that we don't have to manually correlate all of these things. I don't have to keep a correlation ID in my head and then go to the machine and grep for that particular correlation ID. I don't have to manually look for trace IDs and then see whether the metric spikes at the same time — I can visualize these two things together at the same time.
A: So while I continued to rely pretty heavily upon metrics, the thing that really opened my mind in 2017 was this idea that we can better utilize signal types if they share the same vocabulary — if they share the same verbs and nouns — and we are able to jump fluidly between those signals, so that if there is something where we don't have sufficient resolution just from a metric or just from a log, we can jump to the relevant piece of context that will help us understand it.
A: So when we turned this on for Bigtable, which was the service that I was running at the time, what happened was kind of magical, in that we were suddenly able to diagnose, for the first time, "internal customer says there's a latency problem in this particular partition."
A: Maybe I should group by worker to find out whether that's a fluke — whereas previously you would have to magically know that the worker ID was the relevant field and dimension to be aware of. Or if you clicked into a couple of example traces and they all said that the underlying storage system was slow — great, now we know to flip to the storage system dashboard. So that really, really accelerated time to debug for my team — the Bigtable team, sorry — at Google, and for the team that was working on tracing.
A: So I think here's where a couple of things combined. First off, I'd had a number of vigorous Twitter arguments with Charity Majors over the years, and we developed a sense of respect for each other rather than hatred for each other out of that — it's kind of cool to find people that you can disagree with and not get upset at. But the other thing was that I'd seen how useful tracing was, based off of my experience with exemplars, but I was still not necessarily thinking of tracing as a primary data source.
A: Tracing can have variable sample rates. That was the thing that I had completely missed: at Google, the tracing systems were actually fairly inflexible. You would set a sample rate across the board — everything had to be sampled, say, one in a hundred thousand — unless you were manually specifying a specific request to trace through because, for instance, it was a black-box probe. But it turns out — think about this.
A: It just so happens that, instead of sometimes having no exemplars in a bucket, we might have one exemplar in a bucket for sure, or there might be two or three exemplars. And we might just say: okay, there's a sample rate of 50 on this one, there's a sample rate of 20 on this one, so we're just going to add those two numbers together and say that there are approximately 70 total events that meet those criteria.
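A small sketch of the arithmetic being described: each kept event carries the rate at which it was sampled, and summing those weights gives an approximate count of the underlying events. The type and field names are made up for illustration.

```go
// Sketch of the weighting arithmetic described above: each kept event
// remembers the rate it was sampled at (1-in-N), and the estimated true
// count of matching events is the sum of those weights.
package main

import "fmt"

type SampledEvent struct {
	TraceID    string
	SampleRate int // this event stands in for approximately SampleRate real events
}

func estimateTotal(events []SampledEvent) int {
	total := 0
	for _, e := range events {
		total += e.SampleRate
	}
	return total
}

func main() {
	kept := []SampledEvent{
		{TraceID: "a1", SampleRate: 50}, // kept at 1-in-50
		{TraceID: "b2", SampleRate: 20}, // kept at 1-in-20
	}
	// Matches the example in the talk: 50 + 20 ≈ 70 underlying events.
	fmt.Println("estimated events:", estimateTotal(kept))
}
```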
A: That really blew my mind, but it suddenly made sense too. This thing that I had been conceptualizing as "metrics are the primary source of truth; we sometimes use traces; traces can exemplify particular sets of behavior" — instead, it's "traces exemplify all sets of behavior." We make it cheap enough by sampling, and for a majority of people, sampling one for one or even one for ten is sufficient.
A: You don't have to go all the way down to one in a hundred thousand, which means you can get resolution to the nearest 10, plus or minus 5 — rather than saying either I get one event which represents a hundred thousand, or I get zero. For most people at most scales, it's actually possible to get higher fidelity than that and still be able to assemble histograms.
A: What can we do with this? The answer is that by keeping traces, and by aggregating them at read time, it opens up a few possibilities that we couldn't get with metrics, because metrics pre-aggregate. They say: I only want to break down by tenant ID; I only want to break down by hostname. And it solves a lot of the problems I'd had before of having to correlate — to see, did these two lines wiggle at the same time? Did the error rate by tenant spike at the same time as the error rate by hostname? If the two curves exactly match, then I know those two things are probably at least correlated, if not causally related. Whereas if I have the raw data that includes the tenant ID and the hostname, I can perform these operations to filter or group by both of those things simultaneously, rather than only one at a time and squinting.
A: At the end of the day, events are happening as our systems process requests, and we can choose to pre-aggregate the events and generate metrics, or we can choose to pass along the detail about each event and each request flowing through our system and post-aggregate the metrics. But they're still representations of that underlying data we're trying to express about what's happening inside our systems, and you can do it at varying levels of granularity. You can do it at the whole-system level.
A: You can do it at the individual service or individual request level, or you can even think about it at the line-of-code level — even though we know we can't keep a record of every request down to every single assembly instruction; that's way too much data if you're keeping 100% of it.
A: So I think the trade-off is that the more granular you need to get, the more you need to sample. But no matter how heavily it's sampled, if you have enough samples you can reconstruct a composite of what happened. And all of this is in service of answering the questions: who, what, when, where, how, why? Those are the questions we want to answer about our systems, and it does turn out that some debugging techniques are better suited to answering some of those questions.
A: For instance, metrics are really great at answering the "when" question — telling you, should I be getting out of bed for this? If you're interested in "where," that is a really great place for tracing — for tracing to tell you which services to look at, or where that unexplained gap in time is coming from. And if you're interested in "who," that's not a thing that's specific to a signal type.
A: But I think one thing that's really eluded me over the years has been the "why" and the "how." Tracing does give us some degree of why and how if the problem is a call to, for instance, an external resource like a database. In OpenTelemetry we generate trace spans for when you call a database, and with sqlcommenter we even have the idea of propagating through to the database, to tell it who called you, so you can trace back.
A: So one of the realizations I have come to over the past year is that we often need resolution beyond the request level, and beyond just the tags that we attach to the request. No amount of tagging a request — if the only level of granularity you can get to is the request — is sufficient for understanding how and why that request spun for 2.3 seconds before it called out to the database. What happens if the problem is not that the database was slow, but instead that you sat there thinking, or maybe blocking on something, or waiting on a lock — we don't know, but for some reason this request stalls for 2.3 seconds, and then it talks to the database at 2.1 seconds and immediately returns?
A: What was it doing? Yes, we can wrap this in additional trace spans, but that breaks the fundamental promise of observability. The fundamental promise of observability is that, without pushing new code, we should be able to understand any behavior of our system. So the problems that we've solved to date with observability have been chipping away, from various angles, at these questions — who, what, when, where, why, how — we've chipped away at needing to push new code to deal with them.
A: If you have adequate ability to debug and diagnose cardinality on the fly, you no longer have to push new code to learn "who" — great. If you have tracing, you no longer need to push new code to learn "where." If you have adequate time series to understand when behavior is happening, you don't need to say, "oh crap, there's a system outage — well, I'd better turn on the metrics."
A: So "when" is no longer a problem. But the "why" and the "how," I think, either need very high-granularity trace spans wrapped around every function and turned on on demand with feature flags — that's one valid way of doing it — or it turns out there's another answer. The other answer is continuous profiling.
A: This is why a lot of my attention over the past six to twelve months has been spent on profiling, because I think it answers that holy-grail question of which specific line of code is causing problems — which previously, yes, I trawled through logs for, if I happened to have a log statement that matched what I was looking for, but otherwise I would have to push new code with a log saying "I got here." But I think there have been two challenges with regard to profiling adoption.
A: Challenge the first is that it's in a similar maturity stage as tracing was three to four years ago — it was a disconnected signal that people were treating as this completely separate thing that had no relation to what they were working on before. And secondly, it required a lot of advance setup to collect and to analyze, and it felt to people that you had to have the entire system profiled or traced really well in order to get any value. But I don't necessarily think that's how we should be thinking about things. I think a lot of the previous attempts at tracing failed because we tried to say, "oh, you must Jaeger or Zipkin your entire system in order to get any value out," and it turns out that there is value in examining individual services, and in being able to generate non-distributed traces, to understand what's happening inside of your systems.
A: I think, if anything, that's true for profiling too. I think the other thing is reframing what the value is that we're getting out of this. When we framed tracing as being only for problems that involve many different microservices, I think we kind of lost the plot a little bit.
A: I don't think the answer is "you must have logs, and you must have traces, and you must have…" At this point we've hit a saturation point where people shouldn't have to collect them all like Pokémon; people shouldn't have to pay multiple times to store what's fundamentally the same kind of data. The right approach, to me, is that — similar to how there were separate use cases for tracing versus metrics five years ago — yes, there is some value in using profiling to identify cost improvements, but I don't think that's the whole story. A majority of software developers are not thinking about how much this is going to cost in production.
A: What they're trying to do is understand: is this going to deliver a good user experience in production? Something that spins for an extra 500 milliseconds or an extra two seconds is probably not going to break your bank, but it is going to result in blown service level objectives and unhappy users, and that's something we ought to be able to fix — and we cannot expect people to wrap everything in a trace span.
A: I don't think that's a reasonable presupposition. I think it's an extension of the OpenTelemetry auto-instrumentation work to say we should be able to auto-instrument your code to the function level without you having to lift a finger. The promise was that OTel is going to give you request-level tracing for free.
A: Why shouldn't OTel give you function-level tracing for free, where that function-level tracing is profiles that are highly sampled — per one-millisecond increment, say? Sure, you may or may not get a sample for a request that runs less than 10 milliseconds — that's fine — but for a request that sits there spinning for two seconds, yeah, you'll statistically get at least 20 profiles out of that, or, depending on how you're sampling everything else, 2,000 profiles. They'll tell you which line of code, which function, is slow. And I think that's how we connect the value of profiling to the average developer — the average developer needs this. We live in a world of "you build it, you run it," so developers should have service level objectives.
A: Developers should be able to debug their service level objectives to understand where things are going south — who, where, what, when, why, how — and part of that why and how is tracing and profiling. This is a new set of behaviors people are going to have to learn, but hopefully not a giant step, if we can make the user experience smooth — as seamless as it was for me at Google going from a metric heat map to a trace. If we can make going from a trace to a profile like that, I think that is the vision I have for the future of observability: truly being able to debug any problem in production, anywhere, down to the line of code, down to the user, and being able to fix it.
E: That's pretty awesome — thanks, Liz. All right, I think Liz was right on time, so let's open it up. We have a few minutes, if folks can just run over a bit: questions?
C: I recently went through the process of enabling profiling, and I wonder what your stance is on — not the runtime expense, the time expense of profiling, but also — it depends on the language and the profiler, but I've seen a quite high memory cost of profiling, for instance using pprof. So I wonder, as with tracing and OpenTelemetry, whether we should be working on something that is a bit more lightweight, or whether we can even work with something that's more lightweight?
A: So I'm not necessarily the best person to speak to that, because — full disclosure — when we ran pprof, funnily enough, on our main ingest service, we discovered we were spending ten percent of the process's time creating and sending trace spans. And we view that, at least at Honeycomb, as an acceptable expense: to spend ten percent of our time generating traces, because it turns out it enables us to debug high-cardinality issues that we otherwise wouldn't be able to.
A: So that is a choice we have willingly made, to sacrifice a little bit of performance for better visibility. That being said, if we really cared, we would head sample rather than tail sample the data — we would not bother generating the trace events in the first place. Instead we choose to generate all the data at the source and then tail sample it later. That is a choice that we have made. And I think, yes, you're right.
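As a point of comparison, this is roughly what choosing head sampling looks like with the OpenTelemetry Go SDK: the keep/drop decision is made when the root span starts, so dropped requests never pay the cost of span generation. The 1% ratio is illustrative, not a recommendation.

```go
// Minimal sketch of head sampling with the OpenTelemetry Go SDK: the
// sampling decision is made when the root span starts, so unsampled
// requests never pay the cost of generating span data. The 1% ratio is
// illustrative only.
package main

import (
	"context"

	sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func newTracerProvider() *sdktrace.TracerProvider {
	return sdktrace.NewTracerProvider(
		// Respect the parent's decision; otherwise keep ~1 in 100 traces.
		sdktrace.WithSampler(
			sdktrace.ParentBased(sdktrace.TraceIDRatioBased(0.01)),
		),
		// Exporters/span processors (e.g. an OTLP exporter) would be added here.
	)
}

func main() {
	tp := newTracerProvider()
	defer func() { _ = tp.Shutdown(context.Background()) }()
	// ... application work ...
}
```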
A: Some organizations may not view investment in observability that way — being willing to tolerate some runtime slowness in exchange for being able to see what's going on. I think the way we solve that is, number one, as was the case with tracing: adjusting the head sample rate can really toggle that overhead versus granularity and fidelity, and I think the same is true for profiling.
A: Our analysis says that tracing is actually far more expensive than profiling in terms of percentage CPU hit. Even for the services where it's not as high a volume of traces — and therefore we're not spending ten percent of our time mangling traces — we'll see maybe a two percent hit from tracing, and maybe less than a 0.5 percent hit from profiling. So that's our experience, and we continuously profile everything — asterisk: stupid Go runtime bugs.
A: Unfortunately, we stumbled into a number of Go runtime bugs, because to our knowledge we are some of the first people to be exercising a lot of these code paths in anger across, like, 100% of production. But basically, I think that's a question of maturity, a question of effort — if enough people are invested in investigating this, if enough people are feeling this pain because they're using this in anger, we'll get those bugs squashed pretty quick. But yeah, in my view it is worth it, and even if it is not worth it, you can always put it behind a feature flag.
A: You can always turn on profiling temporarily, at varying rates. You can increase the profiling rate, or turn it from zero to, you know, one sample every ten milliseconds, even. It's just that limit of resolution: if you're profiling every ten milliseconds, you're never going to catch something that hangs for one millisecond, but you will get enough samples for something that blocks for two seconds.
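A minimal sketch of the turn-it-on-temporarily idea in Go: gate CPU profiling behind a flag and accept the resolution trade-off described here (Go's default rate is roughly one sample every 10 ms, so a 1 ms hang is invisible while a 2 s stall yields on the order of 200 samples). The environment variable is hypothetical; changing the rate itself would involve runtime.SetCPUProfileRate or an external continuous profiler.

```go
// Sketch of turning CPU profiling on temporarily behind a (hypothetical)
// flag. Go's default CPU profile rate is 100 Hz — one sample every 10ms —
// which is exactly the resolution trade-off described above.
package main

import (
	"os"
	"runtime/pprof"
)

func maybeStartProfile() (stop func(), err error) {
	if os.Getenv("ENABLE_CPU_PROFILE") == "" { // hypothetical feature flag
		return func() {}, nil
	}
	f, err := os.Create("cpu.pprof")
	if err != nil {
		return func() {}, err
	}
	if err := pprof.StartCPUProfile(f); err != nil { // samples at ~100 Hz by default
		f.Close()
		return func() {}, err
	}
	return func() {
		pprof.StopCPUProfile()
		f.Close()
	}, nil
}

func main() {
	stop, err := maybeStartProfile()
	if err == nil {
		defer stop()
	}
	// ... application work ...
}
```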
C: And I think that's an interesting point also, with profiling at the OpenTelemetry level — that developers could provide some information to the trace instrumentation at the call level, to say "this function should never take more than X, so only create a span when it does" — stuff like that.
A: Yeah, that one's a fun one, because it's kind of a post-facto-knowledge thing, right. But you can say: okay, if I've had at least one call to this function take more than X seconds in the past five minutes, then I'm going to turn on an increased sample rate for that. So you can dynamically adjust — but you can't catch it post facto if you never traced it in the first place.
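An illustrative sketch of that dynamic idea: remember when a function last exceeded a latency threshold, and trace (or profile) it more heavily for a window afterwards. This is not an existing OpenTelemetry feature; all names and numbers below are invented.

```go
// Illustrative sketch of the dynamic idea described above: remember when a
// function last exceeded its latency threshold, and trace it more heavily
// for a window afterwards. Not an existing OpenTelemetry feature.
package main

import (
	"sync"
	"time"
)

type dynamicSampler struct {
	mu        sync.Mutex
	lastSlow  map[string]time.Time // function name -> last time it was over threshold
	threshold time.Duration
	window    time.Duration
}

func newDynamicSampler() *dynamicSampler {
	return &dynamicSampler{
		lastSlow:  map[string]time.Time{},
		threshold: 500 * time.Millisecond,
		window:    5 * time.Minute,
	}
}

// Observe records the duration of a finished call.
func (s *dynamicSampler) Observe(fn string, d time.Duration) {
	if d < s.threshold {
		return
	}
	s.mu.Lock()
	s.lastSlow[fn] = time.Now()
	s.mu.Unlock()
}

// ShouldTrace returns true when the function was recently slow, so the
// caller can create spans (or profile) at a higher rate for it.
func (s *dynamicSampler) ShouldTrace(fn string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	last, ok := s.lastSlow[fn]
	return ok && time.Since(last) < s.window
}

func main() {
	s := newDynamicSampler()
	s.Observe("renderReport", 800*time.Millisecond) // one slow call...
	_ = s.ShouldTrace("renderReport")               // ...so trace it heavily for the next 5 minutes
}
```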
C: Well, it's just unexpected — in containerized environments, in companies where containers have memory limits, for example, and you want to dynamically enable "okay, let's trace this path," and your container keeps going out of memory because, well, you switched the flag that uses more resources. So some of these feature changes, sadly, always involve restarts, which then trigger the "okay, the problem doesn't show up after you restarted" anymore.
A: Yeah, that's why I am cautiously looking at eBPF approaches that run in a sidecar rather than directly touching the process, because it does allow that separation of instrumentation from the code that's under test. But I think that is a maturity question — for all that I complain about pprof and instability in the Go runtime, it is at least a standard thing that is produced by the Go authors and that is well supported. Ish.
D: Yeah, thanks — Liz, great talk, really appreciate your time. I have a question. You mentioned profiling is kind of where traces were on the maturity curve a few years ago, and I'm wondering — I guess what I'm wondering is, are there adoption strategies that you can drive inside your organization? Do you see any reflection there? The reason I ask is, I tried to introduce profiling — and I think this is a bigger problem in interpreted languages, where the usability is harder for the profiling tools. I tried to introduce it a few years ago.
D: Locally it didn't play nice with our test stack, so I gave up for a little bit. Then our vendor came out with a continuous profiling tool for production; I enabled it, and immediately, even with the sample rate, it drove our resource usage out of control — and also, because it was a beta tool, it poisoned all of our metrics for the month. And that was kind of it — after shot two, the organization was like, "we're not using profiling anymore." So where do you see this going?
A: It's 100% a usability question. With much love and respect to Jaeger for kind of paving the way of looking at a single trace — the value comes from being able to examine multiple traces in context. That's true for tracing, and so for profiling too: profiling is never going to succeed if the only people who use it are the Brendan Greggs of the world, the performance engineers of the world. Yes, you'll have organizations that think it's worth it for a select team of people to understand profiling and be able to drive cost reductions. But for a majority of organizations, the problem is not the cost of the AWS or Azure or GCP bill — the problem is that they're wasting developer time chasing down bugs. Why should we not be able to fix that? So I think it's about articulating the value — what problem people solve with it — and making the usability such that the average dev can use the tools.
A: I'm not aware of organizations where ordinary devs routinely look at Jaeger. You kind of have to have more layers of abstraction over the individual trace to be able to get value out of tracing, and similarly, you have to have more levels of abstraction over the profiles to be able to — or to give people the carrot to.
D: Just to clarify: without some sort of scale on the profiles that you can collect — like, for example, local profiling — local profiling usability is not necessarily a goal, in your mind, of any new profiling initiative?
A: I'll kick off, you know, go test -bench and go look at the profiles in go tool pprof, absolutely — I'm working on a benchmark right now. But to me, the way I should be approaching this is not in terms of signals; the way I should be approaching this is, what's the problem I'm trying to solve? And to me the biggest problem is: I have a trace — it used to be, I have a request taking five seconds.
A: I'm frustrated that I have a request taking five seconds that's blocked, within this individual request, for two seconds — and why is it taking two seconds? That's the motivating fire for me. And then the profiling is the "how": it's either profiling, or dynamically enabling trace spans down progressive levels of the function stack. But similar to how exemplars, taken to an extreme, approach sampled traces — when you turn on trace spans at finer and finer levels of granularity, that starts to approach profiling, and it turns out to be way more efficient to profile instead of creating trace spans for every function call.
D: Awesome, thank you.
A: One of my hopes is that the OTel profiling effort is going to standardize the agents, lower the overhead, make it tunable, and make it correlatable to the OpenTelemetry span IDs. That's a lot of the mission statement — when I think about "should this project be in OTel," it's: does this relate to signal collection and correlation under the common set of principles? That's why, for instance, we accepted sqlcommenter — it's a trace propagation issue. And I think with profiling, it's a trace correlation issue, for diagnosing things that go even further beyond traces. If there hadn't been that connection to tracing, it would be like, "okay, great, this is a performance tool — why doesn't this belong with eBPF?"
E: Yep, absolutely — and that's a very good point to call out, because I think that's not clearly understood, given the overuse of the word "profiling" itself.
A: Right, right — it turns out that profiling and eBPF are correlated, but you can use them to solve slightly different problems. For instance, eBPF is more than just profiling: you can use eBPF to live-debug things and look at variables. Conversely, there are other ways besides eBPF to accomplish profiling, like runtime support for pprof. So it's kind of overlapping Venn diagram circles, totally.
E: Totally — and that's a very good point. I think we are a bit over time; it's two minutes to ten. So if folks have one more question, maybe we can address it; otherwise we can give a —
C: I would have one last one: what's your thought on anomaly detection at collection time, or at span and profile creation time? Like, let's say you don't profile all the time, but if your metric exporter notices, hey, this deviates from the norm — let's collect profiles.
E: We're at time — and Kevin, thank you again. Liz, again, deeply grateful that you could join today, and really appreciate everybody joining in. A really awesome talk today — thank you. We'll be posting the recording right after it's available. Take care, everyone. Thanks. Thank you.