From YouTube: 2022-10-18 CNCF TAG Observability Meeting
Description
Yuri Shkuro - Talk on the 6 pillars of observability data
* https://medium.com/@YuriShkuro/temple-six-pillars-of-observability-4ac3e3deb402
* https://research.facebook.com/publications/positional-paper-schema-first-application-telemetry
Meeting notes and more: https://github.com/cncf/tag-observability
Alolita Sharma: Let's wait for some folks to join. Yuri did confirm that he was going to join in to talk about some of his thoughts on observability and OTel.
Alolita Sharma: ...areas where the end-user engineers end up building out the usability of many of these core tools, right? So it's not only building out the core features, but it's also looking at the overall...
Alolita Sharma: ...doc. And as I was mentioning to...
Alolita Sharma: Daniel, we have...
Alolita Sharma: I think I just pinged Yuri, so hopefully he'll join shortly. But the shared link of the doc... let me just...
Alolita Sharma: For those of you who are in the area, it's in and around Detroit again.
Yuri Shkuro: Oh...
Matt Young: Hey! Welcome! Oh, on the previous topic, really fast: wasn't there a poll put out? I've been a little out of touch. I've started a new role, so I've been a little dark for a month while I'm doing some onboarding. But I do remember there was a dinner.
Henrik Rexed: As of now, if you want to come, we have booked a table for thirteen people. So if we have to adjust the booking, let us know so we can adjust it. But the dinner will happen on Wednesday.
Alolita Sharma: Awesome. Thank you, Henrik. I just noted it in our agenda doc, but Henrik, feel free to post any details.
Alolita Sharma: Would you like to get started? Again, with great pleasure. Let me just turn off our sharing.
Alolita Sharma: Yeah, sure. But again, for those of you who don't know Yuri, he is one of the core maintainers of Jaeger. He's a subject matter expert in open source observability and has been involved in observability for a long time.
Yuri Shkuro: No, thank you. So yeah, my name is Yuri Shkuro. I'm an engineer at Meta. My primary focus is observability platforms and products for internal consumption. And, as already mentioned, I work with the Jaeger and OpenTelemetry projects in open source.
Yuri Shkuro: So today, or maybe a couple of months ago, before the profilers came into OpenTelemetry, it was essentially three pillars: metrics, logs, and traces. There was some noise about other stuff, like events, but there wasn't really anything official. Then there's also another term in the industry that you may have seen, MELT, which adds events to the picture.
Yuri Shkuro: Events is an exceptionally bad name, but I'll go over that. And then, I think, tomorrow, with some of the things that are starting in OpenTelemetry, we're adding profiles. Events are being discussed too, as in what exactly it means to support events: we have span events, and we have some other events, potentially, in the logging space. So when I was discussing this internally at Meta, one of my colleagues came up with this "temple" acronym, and then I said, well, what if we do the full word, TEMPLE, because the one signal that was missing in all the discussions is exceptions. So I wrote a blog post about it, linked here, going through all of them, and that's what I will go through quickly here. The ordering of the letters in the word doesn't mean anything. It's really because the word is nice, it's "temple", but it doesn't imply any sort of priority or anything.
Yuri Shkuro: So I will actually go through the signals in a more traditional order, starting with metrics. Metrics are, as many of you know, numerical observations that are highly aggregatable. We do support dimensions on them, but we often drop those dimensions from the raw events, and aggregation allows us to drastically reduce the amount of data that we have to store and, at the same time, provide much longer retention.
Yuri Shkuro: Mostly, though, when we talk about metrics in the OpenTelemetry space, we're talking about operational metrics and not so much about business metrics. In fact, business metrics are more usually collected in the form of structured logs rather than the traditional time series.
Yuri Shkuro: One thing that metrics are great for is monitoring, because they're highly accurate. They don't lose precision with aggregation, well, when done correctly; you can, of course, average an average, but if you don't do that, then you get very good numbers. But they are generally considered to be fairly bad for troubleshooting because, with aggregations, you lose dimensions. And so you know there is a problem, but you don't know where and why.
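As a rough illustration of the point about metrics, here is a minimal Python sketch of aggregating raw observations while dropping a dimension, and of why averaging averages loses accuracy. The event tuples, field names, and numbers are hypothetical and are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical raw latency observations: (region, customer_id, latency_ms).
raw_events = [
    ("us-east", "c1", 100.0),
    ("us-east", "c2", 300.0),
    ("us-east", "c1", 200.0),
    ("eu-west", "c3", 900.0),
]


def aggregate_by_region(events):
    """Drop the customer_id dimension and keep (count, sum) per region."""
    agg = defaultdict(lambda: [0, 0.0])
    for region, _customer_id, latency_ms in events:
        agg[region][0] += 1
        agg[region][1] += latency_ms
    return {region: (count, total) for region, (count, total) in agg.items()}


def overall_mean(agg):
    """Exact mean, recovered from the per-region counts and sums."""
    total_count = sum(count for count, _ in agg.values())
    total_sum = sum(total for _, total in agg.values())
    return total_sum / total_count


def mean_of_means(agg):
    """The 'average of averages' pitfall: ignores how many events each region had."""
    means = [total / count for count, total in agg.values()]
    return sum(means) / len(means)


by_region = aggregate_by_region(raw_events)
print(by_region)                 # four raw events stored as two aggregate rows
print(overall_mean(by_region))   # 375.0
print(mean_of_means(by_region))  # 550.0
```

Keeping counts and sums makes the aggregate exact, but once `customer_id` is dropped there is no way to ask which customer was slow, which is the troubleshooting limitation described above.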
Yuri Shkuro: And so logs are the very classic way of troubleshooting systems, and there are several categories of logs. There are unstructured logs, the classic printf-style, free-form text logs. There are semi-structured logs, where something like the timestamp or the log severity can be isolated into a separate field of the log, or where you even give an API to the users and say, just log random events in a structured form where you give names to the dimensions, right? So you can have region as a dimension, or you can have customer ID as a dimension. That makes it semi-structured, but I still separate it from fully schematized, fully structured logs, where you actually go with the schema-first approach. That is much more prevalent in business analytics, where you define the schema upfront, because it's really not just about yourself as the producer of the log, but much more about the consumers of those logs and how it affects the consumers.
Yuri Shkuro: One super nice feature of logs is that they are very local to the specific resource emitting those logs, and so they are very easy to charge back from a capacity management point of view. If you have a service, you can go to that service and say, this is how much log volume you're consuming, right?
Yuri Shkuro: But at the same time logs are generally very expensive, and so you give these knobs to people: you can do various severities, various retentions, and all that. One other thing is that, because they're localized, they're hard to correlate across the architecture, or even within the node, right? When you have, I don't know, a thousand QPS on a service, all your logs come in a single pile, like a stream of consciousness, right, and it's very difficult to make sense of them without introducing some other techniques, which really come from tracing.
Yuri Shkuro: And so tracing is also something you can think of as a special form of logs: structured and request-scoped logs, right? Or, more generically, I prefer to call a trace a workflow-centric log, because "request" narrows you more to the RPC space, whereas "workflow" opens up other avenues, where you have, I don't know, a CI/CD pipeline, which isn't really working on RPC, I think, or messaging and data pipelines where the workflows may be well defined but they're not RPC based.
Yuri Shkuro: In contrast with logs, traces are distributed, and as a result they are actually pretty difficult to apportion to specific services in terms of capacity usage, because the value of tracing comes from the fact that they span a lot of different components. And so who do you bill for that? Is it the first service, the one that started the trace and made the sampling decision? Or do you bill the service which says, oh, I'm going to add a hundred spans to this trace, it's big, all my internal spans, and that affects everyone else, right? I have not seen a good solution yet in the industry to this problem of how you properly do internal billing. I'm not talking about vendor billing.
Yuri Shkuro: One unique feature that traces provide is in terms of monitoring. People don't often think of traces as monitoring, but they do give you end-to-end monitoring capabilities which are just impossible with other telemetry types. A simple example is an end-to-end message delivery timeline, right? Because you have to collect those markers at different points in the architecture. All other telemetry types are very localized, and traces are unique in that sense. But there are a lot of other use cases for traces that are not fully explored in the industry today. Root cause isolation is probably the one people think of most frequently, but some of the big companies that I know very effectively utilize traces for resource attribution, like product line attribution, something that's again very difficult with other telemetry types.
Yuri Shkuro: Now, I mentioned that events is a super bad name, because technically all telemetry starts with events. So when we talk about events as a distinct telemetry type, we're really talking about change events primarily, although you could extend that notion to some other stuff, like, I don't know, weather events, or a big football game in town that might cause a traffic spike.
Yuri Shkuro: Change events very often are responsible for over fifty percent of outages in many organizations. By change events I mean code deployments, configuration changes, maybe routing configuration changes; even something like an autoscaling event could be considered a change event as well, because they're not that frequent, although that starts to blur the boundary. In terms of shape, events are nothing but structured logs, right? So a reasonable question is, why do we consider them a separate telemetry type? My reason for considering them separately is that they actually have very different requirements. One of them is [inaudible].
Yuri Shkuro: Say, I don't know, some specific error message, right? It's probably going to be logged thousands of times. So if you lose five of those, no one cares. You still get a strong signal that there is a problem of this type, and then you can go and localize it. Whereas with events, if you did a deployment of a specific code commit and you lose that on the way to the telemetry platform, then you're in a bad situation. You may not be able to troubleshoot your outage for much longer than if you didn't lose that event, right? So events have a much higher reliability requirement.
Yuri Shkuro: And on the other side, with logs we're usually not looking for specific log instances; we're looking at a more aggregate view of them, whereas with events we very often look into a very specific instance of an event as part of the troubleshooting. As a result, they also tend to be much lower volume than logs. That's not always the case, though; it depends, again, on how you characterize what an event is.
Yuri Shkuro: Now, profiles are coming into OpenTelemetry; that's great to see. They had a bit of a hard time even just describing what a profile is in OTel. I think the current definition is, as I'd call it, "you know it when you see it", really.
Yuri Shkuro: From personal experience, I've noticed that profiles usually get much lower usage than other telemetry, because they're kind of a power-user tool. Most engineers do come across profiling tools at some point, when you have a performance issue and want to look at that, but it's not something that you do every day, unless you are a dedicated performance engineer whose job is to go across multiple systems and do this type of investigation.
Yuri Shkuro: As far as the way we think about OpenTelemetry instrumentation, whether automatic or manual, with profiles the collection framework is usually integrated with the runtime itself, and so you get it out of the box. They also tend to generate much larger volumes, because even one profile can be pretty large if you capture a bunch of stuff.
Yuri Shkuro: The one thing that I think people don't often realize is that profiles are actually very aggregatable, and that is a huge power. When you do consistent, always-on profiling in production, so it's not every second, but you are consistently taking profiles from production, those things can be aggregated and give you a lot of useful information about the overall impact of different things. I've worked with organizations where they are using this to attribute these things even to a pull request: when you have a pull request, they say, oh, you're changing this function, and this changes your... But I think this is not a prevalent experience, at least from what I've seen.
Yuri Shkuro: And finally, exceptions. That's the one that I think is completely missing from the OpenTelemetry discussions today. On one hand, again, the boundaries between the telemetry types are kind of blurred. You can always think of exceptions as a form of super-structured logs. They technically are defined in the OpenTelemetry protobuf format today, but we just didn't pay much attention to the processing, and specifically to the SDK impact and collection.
Yuri Shkuro: The first time I ran across Sentry in production (Sentry is an open source exception-capturing product), I was just blown away by how much information it gave me. I got a ticket from some other team saying, we're seeing this problem from a recent release of the Jaeger SDK in Python, and instead of a stack trace they gave me a link to Sentry. It took me about one minute to identify the root cause, because I was able to see, for every frame of the stack in the exception, what the local variables were. So I could reason very well about what was going on in the application, of course overlaying it with the source code. This is something that, as a debugging experience...
Yuri Shkuro: And so that's why I think that exceptions deserve a special letter in the acronym. They're also aggregatable, because when you do have a well-established exception processing pipeline, it's very common to look at aggregates of those, saying, oh, I'm seeing a new type of exception suddenly popping up as a time series, right? And the way those pipelines work is very special. They do clever things about fingerprinting them, maybe collapsing some of the stack frames that are not interesting, so that you can identify unique, but also common, patterns in those stack frames, and then do group-by analysis and show them in aggregate. As a result, they also tend to have a custom UI, and I should have added custom SDKs, because the way that Sentry and its Raven SDKs are able to collect this information about exceptions requires a very special SDK to be integrated into the application.
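To make the fingerprinting idea concrete, here is a small Python sketch that hashes the exception type together with the function names of the "interesting" stack frames and then counts exceptions per fingerprint. This is only an illustration under stated assumptions: the noisy-path list and the helper functions are hypothetical, and this is not how Sentry or any specific product computes its grouping.

```python
import hashlib
import traceback
from collections import Counter

# Assumption: frames whose file path contains one of these (hypothetical)
# fragments are treated as "not interesting" and collapsed out of the key.
NOISY_PATH_FRAGMENTS = ("site-packages", "/lib/python")


def fingerprint(exc: BaseException) -> str:
    """Hash the exception type plus the function names of the kept frames."""
    frames = traceback.extract_tb(exc.__traceback__)
    kept = [
        frame.name
        for frame in frames
        if not any(fragment in frame.filename for fragment in NOISY_PATH_FRAGMENTS)
    ]
    key = type(exc).__name__ + "|" + ">".join(kept)
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def parse_port(value):
    return int(value)


def load_config(raw):
    return parse_port(raw["port"])


# Group exceptions by fingerprint, the way an aggregate view would.
counts = Counter()
for raw in ({"port": "80a"}, {"port": "x"}, {}):
    try:
        load_config(raw)
    except Exception as exc:
        counts[fingerprint(exc)] += 1

print(counts)
```

The two `ValueError`s have different messages but the same type and call path, so they collapse into one group, while the `KeyError` from the missing key forms its own group. The message is deliberately left out of the key here; whether to include it is a design choice.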
Yuri Shkuro: That's basically the talk about TEMPLE. Maybe it will get adopted as a term, because I think it's awesome. The boundaries between things, as I mentioned, are pretty diffuse; you've seen that a lot of stuff can be classified as an event or as a log, but with all kinds of caveats. And, of course, these are just the data types that we're talking about, right? We're not talking about the actual observability solution that still has to come afterwards to aggregate all this stuff. One other thing I want to mention is that I have a talk next week about another thing: we published a paper on schema-first application telemetry. I mentioned highly schematized and structured logs.
Matt Young: A humble request, if there's time, or if you're able to today: could you provide the next layer of detail on the paper you mentioned around schema-first stuff? As part of the TAG we have a Linux Foundation internship going on right now, about a month in, to generate ontologies for Kubernetes and a couple of other ancillary workloads on top of Kubernetes in the service mesh space. It's a collaboration between the networking TAG and the observability TAG. So I'm curious if you could give us a little overview, if there's time and if this is the right space.
Yuri Shkuro: You mean now?
Matt Young: Yeah, but I don't want to put you on the spot.
Yuri Shkuro: I can. I'm actually preparing for this as well, so I do have that. I think, at a high level, the schema-first approach is the opposite of a code-first approach. Most telemetry today is produced code first, where I just write some attributes to an SDK, and that's my source of truth about the shape of the telemetry that I'm emitting, right? And that provides absolutely no metadata about what that telemetry means to the consumers. It has no safety in terms of: if you change it, are you going to break your consumers? It doesn't give you any information about, well, I'm writing a number as a latency; what are the units of that number? Those are very common problems that stem from the lack of metadata about the telemetry. And schema first is not the only approach to metadata.
Yuri Shkuro: We contrast various other approaches in the industry by how well they fit our goals for knowing the metadata about telemetry. Some of them are semantic conventions in OpenTelemetry, or OpenTelemetry schemas. There is also the vendor approach, where they just automatically enrich the telemetry that you collect with infrastructure dimensions, which, actually, you can see is pretty green across the board, except that it just doesn't support certain things at all, like any custom dimensions; you can't do custom metadata.
Yuri Shkuro: This is basically the gist of the paper: we go through what our approach to schema first is. At Meta there is a very important aspect, the cultural change that occurred already several years ago, where, I think with all the privacy and other big data requirements, we said, as we produce all of the warehouse data, we really have to start with schema first, right? So it's very established already. It just did not apply to application telemetry, and that's what we're introducing. We're saying, yeah, the same schema-first approach works for application telemetry as well, and in general it works much better; you can see here it's mostly green across our evaluation criteria, at the slight expense of the developer experience. But we already have a bunch of tools in that area.
Yuri Shkuro: So the fact is that, yeah, you have to stop and think about the schema that you're producing, rather than just freely writing what you want, because, again, data is about consumption, not so much about production. A very typical view of telemetry is, oh, I'm just going to throw stuff in, and then some cool telemetry platform is supposed to make sense of it and give me great solutions to investigate outages. And it doesn't work this way.
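The contrast with code-first attributes can be sketched in a few lines of Python. This is a hedged illustration only: the talk describes schemas written in protobuf or Thrift with auto-generated structs, whereas the dataclass below is a stand-in, and the record name, field names, and metadata keys (`unit`, `join_key`) are all hypothetical.

```python
from dataclasses import asdict, dataclass, field, fields


@dataclass(frozen=True)
class RequestCompleted:
    """Hypothetical schema for one telemetry record; all names are illustrative."""

    region: str = field(metadata={"description": "Serving region"})
    customer_id: str = field(
        metadata={"description": "Customer identifier", "join_key": True}
    )
    latency: float = field(metadata={"description": "Request latency", "unit": "ms"})


def describe(schema) -> dict:
    """Return the metadata a consumer can read without opening producer code."""
    return {f.name: dict(f.metadata) for f in fields(schema)}


def emit(record) -> dict:
    """Serialize a record; a real SDK would hand this to a telemetry backend."""
    return {"schema": type(record).__name__, "fields": asdict(record)}


event = RequestCompleted(region="us-east", customer_id="c1", latency=12.5)
print(emit(event))
print(describe(RequestCompleted)["latency"])  # includes the unit of the number
```

Because the shape is declared once, a misspelled or renamed field fails at construction time instead of silently producing a new attribute, and the unit of `latency` travels with the schema rather than living in the producer's head.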
Alolita Sharma: Yeah, well said, Yuri.
Yuri Shkuro: So you can just go to the website and drop a link. I can look for it too. Let me look it up. I mean, the DOI link should work as well, because I think you can just...
Alolita Sharma: Very cool. Yuri, again, thank you so much. It's really nice to have you joining in the TAG meetings. And again, I hope that, with all the activity that is ongoing in OTel...
Alolita Sharma: ...also up to the tooling that a service may provide. But, on the other hand, what are your thoughts about pre-aggregation and correlating some of the data before it even hits...
Yuri Shkuro: Well, I mean, this is kind of the whole point of our paper: it depends on how you produce telemetry. You can do this with semantic conventions, which is a way, but a weaker way than we would like, because it doesn't meet some of the other requirements that we have. But yeah, if all of your telemetry is reliably compliant with the semantic conventions, then that gives you the power to correlate it.
Yuri Shkuro: I don't know, I mean, if I'm inserting, let's say, some sort of customer ID in one particular telemetry data set, and I want to say, yeah, that's the same field as the one in my other data set, could you do it with semantic conventions? You can, but it becomes less clear who is responsible for data governance of that, right? Because it's not going to be OpenTelemetry, because it's completely custom. So you need to stand up your own organization saying, this is my data governance for telemetry. That problem is unavoidable, because we also have that same problem with schema first as well.
Yuri Shkuro: It doesn't have to be one single data governance organization, and it's automatically recognized: once you put it as metadata in the schema, then both our back-end platforms and our front-end tools automatically recognize it. So you don't have to do the correlation at the ingestion level, because you've essentially already labeled your data: these are the columns that you can correlate on.
Daniel Golant: Yeah, I just want to clarify, because I think I'm getting what schema-first application telemetry means. But are you saying an approach, basically, where the interface through which you log, at a high level, what we'd call "everything is an event", is unified, and then the event that you're logging, that you're sending out, itself maps to a schema which describes the way you store and process that data? Is that what you're getting at here, basically?
Yuri Shkuro: So we do provide an incremental change to the APIs for telemetry, but we do not try to consolidate them. A metric API will remain a metric API, right, and tracing is different. But we build common building blocks into those APIs, such that when you define a schema for your telemetry, like in protobuf, or in our case in Thrift, then you get an auto-generated struct that you populate, which gives you all kinds of nice things. That's what you populate, but then it's still up to the specific telemetry SDK what to do with that struct.
Yuri Shkuro: With metrics, our metric system is, interestingly, different from the way most metrics systems exist today, in that it is more like a table than a time series. So... three independent time series coming out of the application, even though they have the exact same shape of dimensions attached to them, right, because they essentially describe the same business process, just different measurements of that business process. So what we're trying to do with our metric back end is to say, well, let's just model this business process as a table, essentially, so that you don't have one single numeric value in the metric.
Daniel Golant: I'm curious, because what I've run into repeatedly is two questions. One is, I have a situation, and I deal with the app level mostly, right? Say I'm writing a mortgage system, and engineers are saying, I need to set an alert, I want a log line for context, potentially, and then also the data team wants a piece of data out of this. And I have an entire function of just data I'm sending out. Why can't I send one line that produces a metric, and then from the same data produces a log line, and also sends something to the data warehouse? And from there, teams have tried to implement their own unified logger that you can configure to send to different destinations.
Daniel Golant: It sounds like you're saying this is something that you, and most people in this room, agree is a bad idea. Maybe at a high level, could you say why?
Yuri Shkuro: I don't think it's a bad idea. It's situational. I know that some companies have publicly talked about those kinds of SDKs that they've developed. I remember the name being "near" from one of the companies, which is this: a single unified event API that you emit to, and then behind the scenes it decides what goes into metrics, logs, traces, et cetera, right? So that makes sense. But that's why I mentioned boiling the ocean. We actually looked at a project like that once, and we decided not to proceed, because when you already have thousands of services, pushing that kind of unified API is almost a non-starter. It's such a huge migration that you need to force on people, and the benefits are not there for that migration.
Yuri Shkuro: Whereas what we can do instead is, with the existing APIs, extend them with schema-first capabilities, which already allow you to capture that. But yeah, the independent SDKs still mean that people are making this upfront decision: am I emitting a metric, which will be aggregated and lose dimensions, or am I emitting a very rich log statement with all the fields, right? That's an unfortunate side effect, which potentially you can get away from.
Ryan Perry: Yeah, I wanted to ask about something you were saying earlier when you were going through the different signals. I've been working with the profiling group on the OTEP and that kind of stuff, and I would say that a common, maybe not necessarily concern or criticism, but a common thought or response that people have is that profiling does tend to be for power users. And I'm curious, because I know you were instrumental in the beginning of tracing, which, from my perspective, also seemed like it started out as more of a power-user tool. So I'm curious, do you see similarities with the way tracing was, maybe three, four, however many years ago? Because, I guess, I'd used profiling before I'd used tracing, but I also came later to the game. And tracing tends to be more of a, I guess, yeah, kind of a...
Yuri Shkuro: Yeah, it's a good question. One thing: I think profiling actually may have an easier path than tracing into day-to-day life, because the concept of profiling is still easier for people to understand than distributed tracing. Trying to explain context propagation to someone who has never heard of it is just a pain. I've been through this a thousand times; it's very difficult, and implementing it is also very difficult. Whereas with profilers you already have a much easier path on the implementation side, because you don't need users to do anything. You just integrate with the runtime and boom, they've got a profile, right? With context propagation, there's almost always application-level instrumentation required, and that's a very big roadblock to adoption of tracing.
Yuri Shkuro: And so tracing, I personally think, still struggles to gain mindshare as a valuable tool for various things, and ROI is really, I think, maybe one of the big problems for tracing. I don't think ROI is a problem for profilers. That's actually a way easier story for profilers. But the similarities, though, are that...
Yuri Shkuro: Oh, and I think another thing is, profilers are just much easier to make work, right? So I mentioned that there's a value in aggregating profiles, and...
Yuri Shkuro: ...query language in existence where, let's say, I can ask a graph-based question of the traces, actually using the fact that they are graphs, right? Tempo, I think, came up recently with some form of it. It's not really mature yet, and it's not clear how efficient it's going to be. Whereas that's not the problem at all with profilers; aggregation is very straightforward there, and you can get immediate benefits from doing that.
Yuri Shkuro: With tracing there weren't that many; I've literally encountered only two data models that tracing uses: event based and span based. And the industry has completely converged on the span-based one. Some systems, like Facebook's Canopy, are event based, for example, right, and there are some others like that. With profiles, from what I've seen in the OTEP activity, there are literally, like, almost hundreds of different formats. And that, I think, is probably the biggest challenge in the profiling space: doing the gap analysis and saying what is and is not possible in all the different formats, and whether it's possible to come up with a common one. I mean, Linux did something like a unified trace profile, or some trace format, I forgot what its name is, but I don't know how well it represents all of the use cases.
Alolita Sharma: No, very, very interesting points, and especially Linux; I mean, Linux profiling is the most well-known use case, right?
Alolita Sharma: All right. I think this has been a pretty awesome discussion. And again, Yuri, we're very, very grateful that you could join us today. Looking forward to your talks, and again, at SREcon, as you present this paper, it'll be quite a good discussion to have, because it does change the traditional way of thinking, but it's actually very applicable when you're looking at more and more applications that are enabled with tracing and observability. So thank you again.
Alolita Sharma: We have a whole set of observability events going on, one-day events, focused on metrics, on tracing, on Prometheus, as well as OpenTelemetry.
Alolita Sharma: But I'm expecting updates from you on the landscape graph at a later point.
Matt Young: Yes, that's been a little bit paused while I onboard in my new role, but I do intend to return to that. And if anyone is interested in jumping in, I know some people have expressed interest, there's a whole bunch of stuff there that's just waiting for new people. There's a bunch of issues marked as "good first issue" and "help wanted", and there's stuff with a roadmap. Check it out if you like.
Alolita Sharma: All right. Thanks, everyone. I want to give a couple of minutes back. But thank you again for joining, and thank you so much.