From YouTube: Live Panel: OpenTelemetry Metrics Deep Dive
Description
Don't miss out! Join us at our upcoming event: KubeCon + CloudNativeCon Europe 2021 Virtual from May 4–7, 2021. Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy, and all of the other CNCF-hosted projects.
Live Panel: OpenTelemetry Metrics Deep Dive
This panel will be a deep dive into the OpenTelemetry Metrics specification, a discussion of why it differs from past metrics APIs, and where we expect it to go in the future. Bring your questions for Q&A!
A: All right, well, I would like to begin this session. The goal here is to do a deep dive into the OpenTelemetry metrics system. I was asked to give this talk by the organizers, and they suggested that it would be a good idea to have a conversation like the one we had a year ago.

A: We did this earlier on in the OpenTelemetry project and had a similar presentation, where Liz from Honeycomb spoke with me for an hour about the metrics API, and it was really helpful to have outsiders and insiders talking with me about it as we went. So we're going to do that again, and I have with me Shelby and Justin. Shelby is at Honeycomb and has agreed to be an eager participant in this session, and Justin is at New Relic and has been one of the contributors, especially working on our semantic conventions for the OpenTelemetry system. Okay, I probably should introduce myself: I'm an engineer at Lightstep. I've been working in observability for, I think, about 15 years. Before Lightstep I was at Google, and I have been involved in tracing, metrics, and logging for a long time. Okay, so here's the outline.
A: There are three parts here. I'm going to start with what we were trying to achieve and what we're after in this project, because we're not done — it's underway — and we'll talk about the timeline that we think we have for the bulk of this work.

A: Okay, I apologize to anybody visually impaired: I'm better at drawing slides than I am at making them on the computer, so I've scanned a bunch of hand drawings for this deck, and hopefully you can follow along and read my writing.

A: I'm trying to start by saying who this is for; I want to attract an audience to talk about metrics. I think we can basically agree that there are many different participants in this type of community. We have people who are trying to configure telemetry and diagnostics for their platforms, for their companies, or wherever they work.

A: There are engineers who are actually writing instrumentation in software at those companies, trying to create more observability and better diagnostics for themselves. And then there are the users who are trying to understand: why is my system broken? Why can't I, you know, log into my machine, or whatever? These three groups have somewhat different perspectives on the problem space, and we end up thinking about what each of them wants as we talk through it.
A: Okay, ultimately the goal is to get some dashboards, or some other form of alerting or monitoring, on your data, and that looks something like this drawing I made here on the right. This next diagram comes straight out of the OpenTelemetry library guidelines.

A: This was put together in the very early days of the project, basically laying out what we're after in the project as a whole. One of the things that we wanted to do is build a neutral system, so that you could, as a developer, decide to use OpenTelemetry without locking yourself into somebody's SDK or some vendor's system. What that means is that we've created an API that is separate from the SDK.

A: It means our interfaces are decoupled from our implementations, and this means that you can swap in another SDK or some alternate implementation later. So the diagram has your application code running, and then it goes into this green box, which combines both the API — that's the spec we've put together for how you interact with metrics — as well as the SDK, which is the default implementation that all the OpenTelemetry libraries are going to include, and together they make up that green box.
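To make that API/SDK separation concrete, here is a minimal Go sketch of the decoupling being described: an instrument interface that application code depends on, with a no-op and a concrete implementation behind it. The types and names are illustrative, not the actual OpenTelemetry API.

```go
package main

import "fmt"

// Counter is a minimal stand-in for an API-side instrument: application
// code depends only on this interface, never on a concrete SDK type.
type Counter interface {
	Add(value float64, labels map[string]string)
}

// noopCounter is what the API provides when no SDK is installed:
// instrumentation stays in place but costs almost nothing.
type noopCounter struct{}

func (noopCounter) Add(float64, map[string]string) {}

// sdkCounter is a stand-in for a default SDK implementation, which a
// vendor or user could swap out without touching application code.
type sdkCounter struct{ name string }

func (c *sdkCounter) Add(v float64, labels map[string]string) {
	fmt.Printf("record %s += %v %v\n", c.name, v, labels)
}

func main() {
	var requests Counter = noopCounter{} // API only, no SDK installed
	requests.Add(1, nil)

	requests = &sdkCounter{name: "http.requests"} // SDK swapped in later
	requests.Add(1, map[string]string{"method": "GET"})
}
```

The point of the design is in the last four lines: the instrumentation call sites never change when the implementation behind the API is swapped.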
C: So, Josh, how much more effort is it to use the OpenTelemetry setup versus some of the vendor-neutral or vendor-specific integrations that people may be used to?

A: I'm not exactly sure which vendor integrations you might be thinking of. Some SDKs come with a bunch of built-in metrics for, say, your platform: you start using the library and you get host metrics out of the box, you get Kubernetes metrics out of the box. This is something that the OpenTelemetry system will include, so automatic metrics, as much as possible, are going to be included for you.

A: And I've really structured this talk to leave out the details that you as a programmer might want to know to write your own metrics. That's really not the most important part of this talk, because what we're really trying to do is help you set up an ecosystem — help you set up a collector and an export pipeline. Very few engineers actually write custom metrics, and so that's the least important part of the talk here.

A: All right, so what's next? One of the major requirements is that we are an open source project, and — at least from my perspective — one of the biggest challenges of joining an open source project is that you don't work for a company anymore. We have the community to think about first, and so this has been a project that moves at the pace of the community, because we're building what the community wants.

A: Okay, one of the sources of our most sophisticated requirements came from OpenCensus. The OpenCensus system, combined with OpenTracing, is really what gave us the OpenTelemetry project. The OpenTracing side of that combination gave us the requirement about SDK separation from the API.

A: What OpenCensus gave us was the requirement for a very high performance SDK, and the requirement to have that SDK be configurable. In metrics, what that meant was the ability to choose which metrics are going to be exported and to choose which dimensions are going to be exported — they called that the views API. So all those requirements were given to us at the starting point: we need a better way to configure what happens when you use a metrics API.
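A rough sketch of what a views-style configuration accomplishes, with hypothetical types (the actual OpenCensus and OpenTelemetry view specifications differ): the operator, not the instrumentation author, decides which instruments get exported and which label dimensions survive.

```go
package main

import "fmt"

// View is a hypothetical stand-in for the views requirement: per
// instrument, choose whether to export and which label keys to keep.
type View struct {
	Instrument string   // which instrument this view applies to
	Keys       []string // which label keys survive aggregation
	Drop       bool     // true: do not export this instrument at all
}

func export(instrument string, labels map[string]string, views []View) {
	for _, v := range views {
		if v.Instrument != instrument {
			continue
		}
		if v.Drop {
			return // operator chose not to export this metric
		}
		kept := map[string]string{}
		for _, k := range v.Keys {
			if val, ok := labels[k]; ok {
				kept[k] = val
			}
		}
		fmt.Println(instrument, kept) // aggregate under the reduced key set
		return
	}
}

func main() {
	views := []View{
		{Instrument: "http.requests", Keys: []string{"method"}},
		{Instrument: "debug.allocs", Drop: true},
	}
	export("http.requests", map[string]string{"method": "GET", "user": "42"}, views)
	export("debug.allocs", map[string]string{"site": "a"}, views) // exports nothing
}
```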
A: Next, it's not just about how you record numbers; we need to make sure that the entire problem is being solved. So, in addition to creating your own application metrics, you have plugins for your host metrics, and you have plugins for your Kubernetes metrics.

A: You have instrumentation for tracing that shares resource attributes with your metrics, and so we want to have specifications that tell us exactly how you should label data in a standard way, so that users will be able to find it and understand what they're looking at. This is really a task that bridges technical matters with questions of language — how we talk and what names we use. So this is an interesting corner of the specification, which is really about language, terminology, and understanding.
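For example, resource attributes and metric labels drawn from the OpenTelemetry semantic conventions look roughly like the following sketch. The attribute names shown (service.name, http.method, and so on) come from the conventions; the surrounding code is illustrative only.

```go
package main

import "fmt"

func main() {
	// Resource attributes describe the entity producing telemetry and are
	// shared by traces and metrics; standard names let backends and users
	// find the data and understand what they are looking at.
	resource := map[string]string{
		"service.name": "checkout",
		"host.name":    "node-7",
		"k8s.pod.name": "checkout-5d9f7c",
	}
	// Metric labels use standard names too, e.g. for HTTP instrumentation.
	labels := map[string]string{
		"http.method":      "GET",
		"http.status_code": "200",
	}
	fmt.Println(resource, labels)
}
```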
A: The OpenTelemetry project is trying to finish the tracing spec and freeze it, to make sure that we have a stable tracing environment soon. Because of that attention being given to tracing, the metrics project has been slowed down a little bit — we're putting our attention into tracing while we wait to finish metrics. But there is also just a large number of moving parts in this problem space. We have specifications which, for the most part, we've definitely finished.

A: We finished our API spec, but the SDK specification is probably going to be the last thing to finish, because we have to actually implement this SDK in a bunch of different languages and figure out what else needs to be answered. As far as collector support goes, that's been somewhat ahead of schedule, because we have the OpenCensus collector already, and a number of the integrations and the receivers and exporters that you've heard about today were already in place for metrics.

A: So that's a little bit ahead, and then there are a few lingering questions about the data model and the protocol that we may want to solve in the distant future.

A: We may not need to at all; those are open questions about the obscure corners of the world where you might want to use the protocol for something different. But for the most part we're getting close, and we think that by the first half of next year you'll be able to use this for real. Awesome. Okay, I'm going to start talking about the data model now.

C: Before you do, we have a couple of questions in the chat that I wanted to share. Raj has a question: does the SDK capture raw data at a timestamp, or is there data aggregation processing in the SDK?

A: Right, so I do plan to talk a little bit about that later. There definitely is — that's one of those performance requirements that we got from OpenCensus. We need to have a high performance metrics library, and we know that the gold standard at the point when we started was the Prometheus libraries. Prometheus libraries are organized in such a way that your updates are very fast, mainly because you're actually pinning memory to keep updates cheap.
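As a sketch of that "pinned memory" idea (not the actual Prometheus or OpenTelemetry client code): a bound instrument holds one dedicated cell for a fixed label set, so the hot path is a lock-free atomic update with no map lookup or allocation.

```go
package main

import (
	"fmt"
	"math"
	"sync/atomic"
)

// boundCounter pins one memory cell for a fixed label set; the hot path
// is a compare-and-swap loop on a float64 stored as raw bits.
type boundCounter struct{ bits uint64 }

func (c *boundCounter) Add(v float64) {
	for {
		old := atomic.LoadUint64(&c.bits)
		upd := math.Float64bits(math.Float64frombits(old) + v)
		if atomic.CompareAndSwapUint64(&c.bits, old, upd) {
			return
		}
	}
}

// Value is what the exporter reads at collection time.
func (c *boundCounter) Value() float64 {
	return math.Float64frombits(atomic.LoadUint64(&c.bits))
}

func main() {
	c := &boundCounter{}
	c.Add(1)
	c.Add(2.5)
	fmt.Println(c.Value()) // 3.5
}
```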
C: Cool, and yeah, we'll go into that later in your presentation. And then the other question that's been coming up a lot today — I'm not sure if you're planning to talk about this a little bit more — is the relationship between OpenMetrics and OTel, or OTLP.

A: Cool, all right. So I just want to start with the data model — like, what are we doing? This is about metrics, so I want to bring us all back to the top level, which is: we're trying to look at some numbers, probably in a visual way. We may also be alerting and monitoring on them, but the classic application for metrics is to create some charts and put them on a dashboard.

A: So I want to talk through the different kinds of charts and dashboarding facilities that are commonly available through metrics, because there are different data types here. The next few slides are going to cover the typical charts that you might get out of a metrics system.

A: That's what we're after. So first, here is a count time series. What I'm calling a count time series is just that there's some counter, and it's cumulative in the sense that I begin this counter at the start of my process, and I may add to it, I may subtract from it — we haven't talked about monotonicity yet — but over time that total is kept as a sum, and whatever value I'm looking at as a function of time is the total cumulative count for that metric.

A: Now, this is one of the big deals in metrics: you can often represent count or sum data as either a rate or a total. This is one of the complications that we're going to have to deal with, because the two have different properties, but they're, roughly speaking, equivalent. When we talk about showing a rate, it is going to be associated with what we call deltas: you're reporting the change in some quantity, and then you're going to plot the change.
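A small worked example of that equivalence: the same underlying counter data rendered as a delta (rate-style) series and as a cumulative total, and the differencing that converts one back into the other.

```go
package main

import "fmt"

// The same stream of counter updates can be shown either as deltas (a
// rate-style chart) or as a running total (a cumulative chart).
func main() {
	deltas := []float64{5, 3, 0, 7} // change observed in each interval

	cumulative := make([]float64, len(deltas))
	total := 0.0
	for i, d := range deltas {
		total += d
		cumulative[i] = total
	}
	fmt.Println("delta series:     ", deltas)     // [5 3 0 7]
	fmt.Println("cumulative series:", cumulative) // [5 8 8 15]

	// Going the other way: differencing a cumulative series recovers deltas.
	back := make([]float64, len(cumulative))
	prev := 0.0
	for i, c := range cumulative {
		back[i] = c - prev
		prev = c
	}
	fmt.Println("recovered deltas: ", back) // [5 3 0 7]
}
```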
A: So this change is a number that can rise and fall, and if the count is not monotonic, this change could actually be negative.

A: We also have this notion of a gauge in the traditional metrics interfaces, and the data model here is a little different. What I'm trying to show with this visualization is that you may set a gauge many times during an interval, but what we commonly report when we're talking about gauges is the last value that was set. So although you have many points visualized in this graph, only the blue-colored dots are the ones that are actually going to be reported.

A: This is called last-value reporting. We call this a gauge, and it's interesting, because we don't actually keep information about every point. In some sense, the number of points is irrelevant here. All we want to know is that there's a signal, we can evaluate it at a point in time, and we often record just one value. This is a relatively inexpensive type of aggregation.
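A minimal sketch of last-value aggregation as just described: however many times the gauge is set during the interval, only the final value is collected.

```go
package main

import "fmt"

// lastValue keeps only the most recent measurement per interval, which is
// why gauge aggregation is cheap: the number of Set calls is irrelevant.
type lastValue struct {
	value float64
	set   bool
}

func (g *lastValue) Set(v float64) { g.value, g.set = v, true }

// Collect reports the last value and resets for the next interval.
func (g *lastValue) Collect() (float64, bool) {
	v, ok := g.value, g.set
	g.set = false
	return v, ok
}

func main() {
	var queueDepth lastValue
	for _, v := range []float64{12, 9, 14, 11} {
		queueDepth.Set(v) // many sets during the interval...
	}
	v, _ := queueDepth.Collect()
	fmt.Println(v) // ...but only 11, the last value, is reported
}
```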
A: The idea with a histogram is that you have these individual measurements, and instead of just recording, say, the last value of one of those measurements, we're going to somehow capture more: we're going to capture both a count and the values, so that we can speak independently about the count and about the values. So what I've drawn here now is a distribution. On the top I've got red, purple, and blue showing you quantiles — this might be p99, p50, p10 — telling you where the distribution of latency, or some other value in your system, is; and then on the bottom…
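A sketch of the kind of aggregation behind those charts: a fixed-boundary histogram that keeps a count, a sum, and per-bucket counts, from which quantile estimates such as p99 can later be derived. The boundaries and field names are illustrative.

```go
package main

import "fmt"

// histogram records individual measurements as a count, a sum, and bucket
// counts, so the count and the values can be discussed independently and
// approximate quantiles (p50, p99, ...) can be derived downstream.
type histogram struct {
	bounds []float64 // upper bounds of each bucket
	counts []int64   // one extra bucket for the overflow
	sum    float64
	total  int64
}

func newHistogram(bounds []float64) *histogram {
	return &histogram{bounds: bounds, counts: make([]int64, len(bounds)+1)}
}

func (h *histogram) Record(v float64) {
	i := 0
	for i < len(h.bounds) && v > h.bounds[i] {
		i++ // find the first bucket whose upper bound holds v
	}
	h.counts[i]++
	h.sum += v
	h.total++
}

func main() {
	latency := newHistogram([]float64{5, 10, 25, 50}) // boundaries in ms
	for _, v := range []float64{3, 7, 7, 12, 40, 90} {
		latency.Record(v)
	}
	fmt.Println(latency.total, latency.sum, latency.counts) // 6 159 [1 2 1 1 1]
}
```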
A: There are two pieces of the data model that are not present in an OpenMetrics system, and I just want to say it's not because they're missing; it's because when you're pulling data in a metrics system, you don't need these features. The first thing is resources.

A: I use the term attributes for this key-value association, and the difference between what we've got in OpenTelemetry and what you have in, say, OpenMetrics, is that this concept of resource is in the protocol. When you report a batch of metrics data, you're going to have one section for resources, and then you're going to have a lot of metrics data inside that encapsulation. This allows us to begin creating common coordinates in telemetry between traces and metrics, because we have standard resources, and we can configure these export pipelines to attach standard resources to both our traces and our metrics as they pass through the export pipeline.
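A simplified picture of that encapsulation — one resource section wrapping many metric streams in a batch. The struct shapes here are illustrative, not the actual OTLP protobuf schema.

```go
package main

import "fmt"

// ResourceMetrics mimics the shape of an OTLP-style export batch: the
// resource is written once per batch, and many metric streams sit inside
// that envelope, rather than repeating the resource on every time series.
type ResourceMetrics struct {
	Resource map[string]string
	Metrics  []Metric
}

type Metric struct {
	Name   string
	Labels map[string]string
	Value  float64
}

func main() {
	batch := ResourceMetrics{
		Resource: map[string]string{"service.name": "checkout", "host.name": "node-7"},
		Metrics: []Metric{
			{Name: "http.requests", Labels: map[string]string{"method": "GET"}, Value: 42},
			{Name: "queue.depth", Value: 11},
		},
	}
	fmt.Printf("%+v\n", batch)
}
```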
A: We have a terminology problem there, and I'll try to avoid it for this talk. So the first thing we're adding that's different from OpenMetrics is the concept of resources, and you don't need these in a pull-based system, because if you think about how Prometheus works: you scrape the target, and the target gives you all of its own metric data, including its own metric labels — but it doesn't know its own resources.

A: Now I want to make a distinction, because we use this word in two senses here, and it's pretty important to the design of the whole system, so bear with me. The idea of something called temporality is trying to describe a relationship with time: when we report metrics data, it may be cumulative or it may be delta.

A: Generally speaking, delta means that you're reporting differences since the last report; every report is independent and does not build on the previous report — we're reporting changes one after the other. In cumulative reporting, we're going to report some total since the beginning of time, and we're going to keep re-reporting that total since the beginning of time. Okay, why are we talking about this?

A: Okay, this is confusing; I have another slide on it. So why do we care? Well, this choice is entirely about keeping state, and stateful interactions, out of the SDK and the API for OpenTelemetry. If you think about reporting, say, a total — your current memory usage — you don't want to report your current memory usage as a delta, because that would require you to remember the last value that you reported for your current memory usage.

A: So from the start we have this built into our instruments, because we think it is more convenient to talk about deltas in some contexts and more convenient to talk about totals in other contexts — it keeps state out. Now, the point of creating this concept of instrument temporality is that we can adjust aggregation temporality: what goes in is not necessarily what comes out. And it turns out that if we have deltas coming in and we want cumulatives coming out, we're going to have to put memory somewhere.
A: There's going to be more on this topic; we will keep talking about this. So we're still talking about the data model, but we're trying now to integrate the concepts from Prometheus and the concepts from statsd, because one of those systems uses cumulatives and one of those systems uses deltas.

A: However, in the API for both of those systems, counters use deltas: in both Prometheus and statsd you expect, when you're using a counter, that you will tell it a delta. However, when Prometheus writes data to its write-ahead log, or exports data through its remote write, it's sending you cumulatives, and this is why Prometheus has trouble with cardinality: it requires you to keep memory inside the client library for every counter you've ever used, because it has to track the cumulative value.

A: In OpenTelemetry, we are giving you these instruments that have delta instrument temporality — the counter, and this instrument that we call the up-down counter — for inputting changes, or deltas, to a sum. So these are just like the Prometheus concepts, and just like the statsd concepts, except that we've given you two: one for monotonic counters and one for non-monotonic counters.
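A conceptual sketch (not the actual OpenTelemetry API) of the two delta-temporality instruments just described: both accept delta inputs to a sum, and only the monotonicity rule differs.

```go
package main

import (
	"errors"
	"fmt"
)

// Counter accepts only non-negative increments: a monotonic sum.
type Counter struct{ sum float64 }

func (c *Counter) Add(v float64) error {
	if v < 0 {
		return errors.New("counter is monotonic: negative increments rejected")
	}
	c.sum += v
	return nil
}

// UpDownCounter accepts increments of either sign: a non-monotonic sum.
type UpDownCounter struct{ sum float64 }

func (c *UpDownCounter) Add(v float64) { c.sum += v } // may rise and fall

func main() {
	var requests Counter
	_ = requests.Add(1)
	fmt.Println(requests.Add(-1)) // error: monotonic

	var activeConns UpDownCounter
	activeConns.Add(1)  // connection opened
	activeConns.Add(-1) // connection closed
	fmt.Println(activeConns.sum) // 0
}
```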
A: This is actually the simple case — these are deltas. Now, when we talk about gauges: I mentioned earlier, in some of the breakout sessions, that there's a little bit of a terminology problem. I don't want to go too deep into the details here, because I will lose you and you'll get bored, but basically Prometheus and statsd use the term gauge in slightly different ways, and I've copied a definition of the word gauge out of Wikipedia. It turns out that the actual word has two meanings in our actual engineering practice.

A: The thing that you're using a gauge for, however, is a real application. In Prometheus and statsd, gauge got used whenever we wanted an individual measurement but not the cost of a histogram. So OpenTelemetry created a new distinction to cover this gauge use case: it's for when you're recording an individual measurement. I'm going to talk more about what an individual measurement means.

A: But the point is this: suppose you measure a latency. You would never add that measurement of latency to another measurement of latency just for the sake of adding the two numbers together. That is an individual measurement, and you're interested in knowing individual latencies. So when you use this value recorder instrument, you're going to be computing a distribution, as opposed to computing a sum, which is a much simpler and much cheaper operation.

A: I'm trying to explain why gauge doesn't exist, and I think I'm getting a little confused — or I'm not sure that people are following me at this point — because gauge is a very confusing concept. But what I'm trying to say is that we've replaced the concept of a gauge instrument with two new instruments: one is called value recorder and one is called value observer. We use value recorder to record individual measurements.

A: The problem I'm trying to explain with gauge is that sometimes gauges got used to record sums that were cumulative, or sums that were not monotonic, and sometimes they were used to record other measurements, such as a temperature or a latency, which is not something you ordinarily add together.

A: So the point is, gauges got used for several different things in each of these systems, and we are trying to provide you instruments that get used for exactly one thing. If you're observing a sum, or you're observing a non-monotonic sum, instead of using a gauge you're going to use one of these new instruments, called sum observer or up-down sum observer, and these are instruments that have so-called cumulative instrument temporality. We're going to keep talking about that.
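For reference, here are the six instruments named in this talk, as they stood in the metrics specification of that era, summarized as a comment table. The "semantic kind" column uses the adding/grouping terminology discussed later in this session; treat the exact column values as a best-effort summary of the spec at the time.

```go
// The six OpenTelemetry metric instruments discussed in this talk
// (2020-2021 era specification names):
//
//   Instrument          Sync/Async    Semantic kind            Temporality
//   ------------------  ------------  -----------------------  -----------
//   Counter             synchronous   adding, monotonic        delta
//   UpDownCounter       synchronous   adding, non-monotonic    delta
//   ValueRecorder       synchronous   grouping (measurements)  delta
//   SumObserver         asynchronous  adding, monotonic        cumulative
//   UpDownSumObserver   asynchronous  adding, non-monotonic    cumulative
//   ValueObserver       asynchronous  grouping (last value)    cumulative
```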
C: Back one slide — when you say monotonic, just because I might have missed that meaning at one point: that's, like, something that's unrelated to the previous value, right?

A: The distinction is between sums that rise and fall versus sums that only rise. When we have a sum that only rises, more often we're interested in showing it as a rate; but when we have a sum that rises and falls, more often we are interested in showing it as a count. You're probably familiar with metrics interfaces having the ability to choose whether you're talking about a rate or a count; this is something that we're actually encoding in these instruments for OpenTelemetry.

C: So I can imagine, for a very lay example: age is always in relation to the previous year. You can always just say age plus one, right, and that's my birthday — it's related to the previous year. But temperature is completely unrelated to the past, to yesterday's temperature. Your Google alert might say, oh, it's going to be seven degrees colder today, but usually you just want the individual value, like today's temperature, yeah.

A: That's a good observation. You don't ordinarily monitor changes in temperature; you monitor absolute temperatures. But you might monitor something else that is being counted as a rate, and then you might monitor that as a change. So, yeah.

C: Okay, I like using the very human examples, but it's also helpful to have the others — examples of things like CPU usage, right, where you always want the value at the particular moment; you don't usually care how much it relates to, like, five minutes ago.

A: Yeah, I like your example of using age, though — it's a great one. You could imagine two ways of monitoring the age of a person: if you use a counter, then every year you increment it by one; if you use a sum observer, then every year you output your current age, and that's always one more than the last year. We could talk more about the differences — we will talk more about the differences, as that's one of the key aspects of this design.
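The age example, written out both ways as a tiny sketch: with a counter you report the change on each birthday, while with a sum-observer style callback you report the current total each year.

```go
package main

import "fmt"

func main() {
	// Delta style (counter): each birthday reports the change, and the
	// pipeline must sum the deltas to recover the current age.
	birthdays := []float64{1, 1, 1}
	total := 0.0
	for _, d := range birthdays {
		total += d
	}
	fmt.Println("age via deltas:", total) // 3 years since tracking began

	// Cumulative style (sum observer): each observation reports the
	// current age itself, always one more than the previous report.
	observations := []float64{29, 30, 31}
	fmt.Println("age via cumulative observations:", observations[len(observations)-1])
}
```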
B: [Justin asks about the distinction between the value observer and the up-down sum observer instruments.]
A: Yes — and I promise I tried to put an aside about this; actually, it's the next slide. So there has been this debate, and I promise you we are going to spend more time updating our specs to try and clarify this issue. So I'm now regretting that I didn't already include more detail on the instruments that we've designed. We've got six instruments, and some of this is what we're talking about right now. But when we talk about semantic kind, there really are two, to my knowledge, and in the technical committee and in the SIG we've been talking about the words "adding" versus "grouping". What we're trying to distinguish is between things that you add and things that you average. Things that you add are interesting — let's suppose you're sampling.

A: If I have a bunch of observations of things that I add, well, the larger numbers are more interesting, because they contribute more to a sum. So if I'm, say, sampling numbers, and I know that I care about the sum, then I put weight on those: the higher numbers get more weight when I'm sampling, or downsampling, or doing any kind of aggregation where sum is the property I'm after — then I want to know that it's a sum. Whereas if I'm doing something like downsampling or reducing dimensionality for one of these other types — where traditionally we have used a gauge, or where you now use a value recorder or a value observer — then there's no difference in importance between a small number and a big number. If you have a zero latency, that's a significant measurement; it's no more or less significant than a 100-second latency.

A: This is straining a little into the theory of measurement, and I could have put some slides in here about that, but in statistics, or in math, we talk about scale for numbers — ratio scale, interval scale, logarithmic scale. You can answer the question in those terms as well, but I'd rather do it the way I just did.

A: So I just discussed roughly what was on this slide, which is to say that we are interested in keeping the semantic type when we have that information. And this choice — Justin's question, the choice between value observer and up-down sum observer — those are very close, but conceptually we want them to be different, and I promise you we'll keep writing to try and clarify this point.

A: Okay, well, I finished my section on the data model, so that's good — we made it through, and you asked the same question right as I was reaching my last slide. Okay, so for the rest of this talk we are going to go through some of the ways that you can configure export for your metrics data, especially talking about how we control costs.

A: Okay, so, in no particularly great order, we have a few features that are new and interesting for OpenTelemetry. One of the things that I think has been holding back the industry, generally speaking, is that we need variable-boundary histograms — or, we need histograms that can support high resolution and be relatively compressed; there are various ways of saying the same thing. Sometimes we call these sketches.

A: Histograms, however you formulate them, are approximate representations of a distribution, and we want to get better at this, so I've named some algorithms that are used for this. We are currently negotiating and working out some of the protocol decisions that we want here, and we are looking to standardize on one of these algorithms as a recommended approach.

A: So OpenTelemetry is going to offer you essentially better histograms than you've been getting out of the box from some of the metrics systems you've been using; DDSketch is the leading contender. A lot of the discussion that we've had about temporality in the past 15 minutes ultimately comes down to cost and cardinality.

A: We have a situation where we expect that users or developers are going to be writing instrumentation and putting metric labels together without actually knowing which labels are useful, or what the cost tolerances of the people running that code are going to be. So you may end up in a situation where you're generating more labels than you actually need.

A: So what do you do with that situation? There are two things that we know how to do, which we're including as part of the OpenTelemetry metrics system. One is that we have the ability to do in-process aggregation — these SDKs are relatively sophisticated; they include the ability to coalesce events that happen over a short interval of time. And then, if we want to do label reduction, we can actually erase those labels and just aggregate the values together, so that we get fewer time series with more aggregation happening.
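A sketch of label reduction using a simplified series encoding: erasing one label key merges the affected series, and their values are aggregated together.

```go
package main

import (
	"fmt"
	"strings"
)

// reduce erases a label key and merges the affected series by adding
// their values together, trading dimensionality for fewer time series.
// Series keys here use a simplified "method=GET,user=1" encoding.
func reduce(series map[string]float64, drop string) map[string]float64 {
	out := map[string]float64{}
	for key, v := range series {
		kept := ""
		for _, kv := range strings.Split(key, ",") {
			if !strings.HasPrefix(kv, drop+"=") {
				if kept != "" {
					kept += ","
				}
				kept += kv
			}
		}
		out[kept] += v // aggregate values that now share a key
	}
	return out
}

func main() {
	series := map[string]float64{
		"method=GET,user=1": 5,
		"method=GET,user=2": 3,
		"method=PUT,user=1": 2,
	}
	fmt.Println(reduce(series, "user")) // map[method=GET:8 method=PUT:2]
}
```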
A: In a statsd system, you may have been used to cardinality being okay, because at every flush interval you can completely forget what you've been reporting and just begin accumulating new state; whereas in a Prometheus system, if you use high cardinality, those labels are stuck in memory for a very long time. So a stateless export pipeline is one where you force the use of delta aggregation temporality in order to allow yourself to flush out memory.

A: So let's look at some diagrams. This here is a diagram of a standalone SDK running OpenTelemetry metrics. You've got your application code running. You have a runtime metrics instrumentation package running that's telling you garbage collection statistics. Maybe you have a host metrics instrumentation package running that may be telling you CPU usage and memory usage and so on. You have the OpenTelemetry API; beneath that you have the SDK. The SDK has a frontline component.

A: I have this orange box here with a diagram saying: here is where we do delta-to-cumulative conversion. This orange box represents a long-term memory commitment. In order to implement cumulative export for your metrics, you have to put memory somewhere, and in this standalone configuration you just put it directly in your SDK. So this is a configuration where you don't want to use a lot of cardinality, because it's going to sit in memory for a long time — but it is a configuration that's very compatible with downstream systems.

A: We have this other option: to configure an export pipeline that is stateless, and the way we do this is by making sure that the aggregation temporality matches the instrument temporality. If you're putting deltas in, you give deltas out; if you're putting cumulatives in, you put cumulatives out. This way you have no memory requirements. So this diagram is the same as the diagram before, but now there's no long-term memory commitment and there's no orange box — there's nothing happening in that box.

A: It's just passing straight through, so this is going to cost you less, and if you have an OTLP endpoint that supports deltas natively, this is actually a good configuration. And I want to say, if anyone's listening here: if you're the author of an open source backend for metrics data, this is an opportunity — a big opportunity — to accept native OTLP data, because it will allow the clients to begin configuring themselves statelessly, and this is going to open the door to high cardinality metrics.

A: So that was a somewhat complicated configuration, because it is standalone: you've got several things running inside your process, and all those things running inside your process add up to risk and cost. So there's another way you can configure this, and that is using the collector as an agent. I might want to export cumulative metrics because that makes the downstream system happy — so I can, for example, write to Prometheus — but somewhere in my pipeline I have to configure the point where I convert deltas to cumulatives, which leads to a long-term memory requirement.

A: So in this configuration I have an application running inside of a host, but it is a stateless exporter: it is outputting OTLP with the aggregation temporality chosen to require no memory. The OTel collector is running on the same host; it is able to implement that delta-to-cumulative conversion, and then you can output OTLP converted to cumulative from your host, while your application is still running statelessly. So this is a way that we can reduce risk in the application itself.

A: Actually, the collector is probably written to handle that better than your application is. So that's one configuration. Of course, there is still a long-term state requirement here; we've just moved it from the application to the collector. So there's another configuration: this is where we talk about just a standalone Prometheus exporter. Suppose you have an existing Prometheus setup and you just want to add one new target.

A: I've got my API and SDK; I'm still doing that delta-to-cumulative conversion, because I must for Prometheus, and then I can either pull that data using OpenMetrics, or I can push that data using a Prometheus remote write exporter. In both of these cases I'm writing to Prometheus, and in both of these cases I have to do a delta-to-cumulative conversion, because again, Prometheus requires it. But I could also do that statelessly — again, it's almost the same diagram as before.

A: I've got a stateless application, I'm sending through a stateless export pipeline, and I'm receiving that at the collector agent. I then can do delta-to-cumulative conversion, but now I'm going to use the Prometheus remote write, because there's some impedance mismatch in using OpenMetrics to pull from a collector — we're not going to do that right away.

A: So this is a way you can export to Prometheus through a stateless, in-process exporter and a local agent. That's not the only way you can configure export for OpenTelemetry metrics. I want to do a little review of some of the stuff that we just talked about. There are two ends of the spectrum along which you might configure your metrics export.

A: You can do either push or pull, and you can do stateless or cumulative. But we generally pair the pull model with cumulative, which is the Prometheus model, and that's the default for OpenTelemetry, and it has properties that are good for Prometheus: you're automatically going to work with Prometheus, even if you have a collector in the loop. You get easier reliability, because when you drop a record or a data point in Prometheus — when you drop a cumulative data point — it gets smoothed out by the next data point that arrives.

A: So loss of data is pretty okay in a Prometheus system, but high cardinality is pretty much not allowed, and it will blow you up if you do it by accident. On the other side here, we've got this push model and this stateless export model. The one thing that's going for it is that it's exactly the same as what we're doing already for traces: if you're going to set up an export pipeline for your span data, you might want to set up the same exact topology for your metrics data.

A: The idea that you're going to attach resources as you collect data through your infrastructure fits very well; it matches trace collection. However, it means you're using delta aggregation temporality. It means that you have to work a lot harder for reliability, so you have to avoid drops.

A: Dropped packets mean loss of data, and replayed packets mean double-counted data, so we have to be careful to avoid dropping data, but we also have to avoid replay. But the upside of this configuration is that high cardinality is no longer a cost inside the process, and high cardinality — if the downstream system supports it — is going to be okay, and actually could be good for the user.

A: I didn't cover this combination — the worst and best of both cases — which would be a push cumulative model. That is definitely a valid configuration; it's exactly what you want if you're going to send Prometheus data through a collector. So these are choices. Currently, the default for OpenTelemetry is cumulative, because it's the most compatible, but it does mean that basically we're giving you the Prometheus status quo: you're going to be keeping memory for your cardinality until we reconfigure your system to use this push, stateless export strategy.

A: Good question. So, we need some more collector development: there is not an actual delta-to-cumulative processing stage in the collector today — it's imaginary. The reference implementation for the OTel metrics SDK, which is the one that I've been developing in Go, does have the ability to select the stateless versus cumulative mode today. So the SDK support has been specced out, the reference implementation does exist, and it works; but as far as collector support goes, it's not there.

A: I hope that answers the question. I'm very excited about this because, as a user from a statsd background, I was used to having a little bit more cardinality than I'm able to get from a Prometheus configuration. So I would like to push more cardinality on the metrics world until it's a real problem, and I'm hoping — I think — there's a good opportunity for data scientists to come help us with actual cardinality control mechanisms; for example, we can downsample to control cardinality.

A: I think we're almost at the end of my slide deck. I have a few more diagrams just to give you a bigger picture of all the ways you might configure OpenTelemetry metrics. This is an example of a Kubernetes deployment where you've got a node. You've got a DaemonSet collector running on every node; you've configured your receivers for OpenMetrics, for statsd, for OTLP; you've got the Kubernetes receiver built in, so you're getting the Kubernetes state metrics.

A: You've got host metrics running on the collector, so you don't have to run those host metrics on every machine — they run on the pod or on the node, for example. These installations are now at a scale where we can talk about one collector per node, a bunch of different targets per node, and so on. And part of the plan for the OpenTelemetry collector — definitely part of that resource model —

A: — is that we're going to implement hierarchical collection, so that you have a node collector that's collecting all the resources and all the metrics locally, and it's going to pass that to a regional collector, which is going to attach all of your regional resource attributes, and it might pass to a global collector, or other levels of hierarchy that you can use to organize your metrics data.

A: This is something that you might be able to use to bypass certain Prometheus features. If you're using Prometheus, you might be using recording rules to get some more functionality, but we can just talk about an export pipeline that aggregates everything together into one place without that type of recording-rule functionality, which is like a write-time aggregation. All right, there's one more — I saved it for last. I've discovered that there are a lot of Prometheus users out there who have invested a tremendous amount in their configuration.

A: So we were looking for ways to help Prometheus users get on board with OpenTelemetry and begin using OTLP. And I found this thing that the Stackdriver group had done a couple of years ago: it is a Stackdriver Prometheus sidecar.

A: I haven't talked about this much before today. It's open source now — we opened up the repository publicly. This is a sidecar for Prometheus that basically lets you read Prometheus data and send OTLP. I see this as a short-term solution, because ultimately the Prometheus project will add metadata to its remote write protocol, and then that will be able to replace this sidecar. So today you cannot send Prometheus data directly out of Prometheus into OTLP without a tool like this, but hopefully in the future that restriction will go away.

A: I put in a few links to this code base, as well as a couple of the Prometheus issues that are currently trying to address this shortcoming, so that hopefully we can retire this code in the future. But meanwhile, Prometheus users should be able to try the OpenTelemetry metrics system, and this should also help us migrate from Prometheus onto OpenTelemetry in a gradual way.

D: I have a quick one, please. So this morning we were talking about traces and context and context propagation, and this afternoon we are talking about metrics and labels and tags. So do we lose the context of traces when we handle OpenTelemetry metrics, or is there a way we can link or correlate the trace context with the metrics labels?

A: Right. First, I always want to point out that there's been a bit of terminology debate, and we haven't actually settled it: the terms attribute, label, and tag are almost synonymous and used interchangeably here. I hope that's not the question you're actually asking; I believe you're asking about distributed context, and getting those attributes that come from, say, distributed actors in your system onto your metrics. This is something that I mentioned earlier.

A: OpenCensus did include a way to get your distributed context attributes into your metrics. It's something that you have to configure. We're still talking about the terminology for this; we've got something called an "enhancer" at this point, which is a hook that you can use to pull attributes out of your context and put them into your metrics system. I hope that answers the question. We're still talking terminology.

B: One of them is that we can pull key-values from your baggage — from that correlation context — and put them onto metrics. Another one, I think, is just built into the semantic conventions of our metrics.
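A sketch of the hook being described, using context.Value as a stand-in for real baggage propagation. The name "enhance" and its shape are hypothetical, since the terminology is, as noted, still being discussed.

```go
package main

import (
	"context"
	"fmt"
)

type baggageKey string

// enhance copies chosen keys out of the distributed context (baggage) and
// onto metric labels at record time, as the hypothetical hook would do.
func enhance(ctx context.Context, labels map[string]string, keys []string) {
	for _, k := range keys {
		if v, ok := ctx.Value(baggageKey(k)).(string); ok {
			labels[k] = v // propagate the distributed-context value
		}
	}
}

func main() {
	ctx := context.WithValue(context.Background(), baggageKey("tenant"), "acme")
	labels := map[string]string{"method": "GET"}
	enhance(ctx, labels, []string{"tenant"})
	fmt.Println(labels) // map[method:GET tenant:acme]
}
```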
C: So we were planning to take a five-minute break before the next session, but if there are any other questions, you can take them into the OTel community Slack channel. Thank you so much, Josh and Justin, for presenting and sharing your expertise. I shared my notes in the chat, and I'll share them in the Slack channel as well. I got a lot out of this.

A: Thank you, Shelby. I'd be happy to stay on for a little bit. I do see the questions now — when I was presenting I couldn't see them; I'm not sure how I could have fixed that. But there's one here, for example, that I can see, about the OTel policy on discovery when using the OpenMetrics approach. That is really one of the greatest questions, and it's not one that we're really answering here.

A: The Prometheus system has a great big piece of code that does service discovery, and right now there's an ongoing discussion about how to either emulate, simplify, or replace such functionality in the OTel collector world. The current state of affairs is actually not so good: we've linked in a huge dependency on the Prometheus system, and we are actually using that service discovery configuration, which has caused some friction and some dependency bloat. We are working on that. Bogdan, would you like to speak? Anybody want to speak on this?

A: I actually have a dream here: that somebody will take the Prometheus service discovery code, factor it out of Prometheus, and create a first-class service called the service discovery service. Then we could have an OTel collector plugin that reaches out to the service discovery service with some sort of sharding information and gets back a list of sharded targets for it to scrape — so that, rather than linking in a huge dependency on Prometheus, we can just call out to a service that does the same stuff.

E: Also, it was mentioned on the Slack that this is a five-minute break before the next session, so everyone who is willing to chat more, feel free to — but otherwise it's considered a break.