From YouTube: 2020-07-31 meeting
B
Hi, I'm Nara. I work for Netflix.
C
Hey Matt, this is Sandra from AWS X-Ray.
D
Oh, hey Matt. Who are you with? I'll introduce myself: I'm Will Tran, I'm with Autonomic AI. We're building a connected vehicle platform, and sampling is very important to us. So if you need a user to talk to, I can be one of those.
D
Aws,
I've
actually
been
playing
around
with
secondary
sampling,
trying
to
make
some
modifications
to
the
jager
java
client
to
because
the
way
that
works
is
it
does
a
sampling
decision
at
the
very
beginning
of
the
trace,
and
then
that
decision
holds
for
the
rest
of
the
trace
and-
and
I
read
through
that
secondary
sampling
link-
that
was
posted
a
few
weeks
back,
and
that
was
a
great
idea.
D
I think the use case there is that you want to turn up the sampling rate for certain sections of a trace. In my case I have traces where there's a lot of noise and uninteresting stuff, and then there's this one critical path that's really interesting to me. So I want to turn down that noise, and I can do that through secondary sampling; you just have to reevaluate the sampling decision at every span, and that can be a little heavy on performance.
B
Yeah, so the link that you got is basically the experiment that we did. We do have a prototype, but I haven't pushed anything to production yet; I'm still trying it out.
B
So that's just the idea. Another way of implementing it, which is much easier on the client library, is to collect a hundred percent but do the secondary sampling in the back end while the data is streamed. But then another challenge is that now we have to collect a hundred percent and publish a hundred percent.
D
Yeah,
that's-
and
that
was
my
that's
my
fear
in
that
I
mean-
and
maybe
it's
just
a
premature
assumption
of
mine,
where
I
can't
afford
to
collect
a
hundred
percent
of
everything.
Maybe
there
are
some
certain
use
cases
that
I
must
collect
a
hundred
percent
of,
but
in
general
I
cannot
collect
a
hundred
percent
of
of
everything
going
through
the
platform
and
then
post-process
it
it's
just
it.
D
I fear it would be too expensive, and so I'm trying to do as much work as possible up front to cut down how many spans my pipeline needs to process.
E
That makes sense. Yeah, at New Relic we have a sampling strategy that's reservoir sampling. I was trying out the new sampling API that we talked about last week; that was my first week, so I'm still somewhat new to all this. But New Relic does the sampling just before harvest, rather than when a span is created, so that it can determine how many spans have been created; and if we haven't created more than some maximum number, then we just process them all.
E
Reservoir sampling is what we call it internally. I don't know if there's a wider term for it, but yeah.
F
That's a standard term in the computer science literature; you can find several algorithms, and they're good things. I actually want to see us get some standard reservoir sampling, because it gives you the ability to do statistically accurate summarization of data that you can't collect all of. So I'm glad to hear that, and I think that's an answer to your question.
F
I
I
dialed
in
because
last
week
we
had
sort
of
some
action
items
about
getting
back
to
this
priorities.
Concept,
which
was
going
to
be
there's
gonna,
be
some
follow-ups
from
a
lolita.
I
don't
see
her
here,
and
so
I
don't
actually
have
an
agenda
or
I
was
just
gonna.
Listen
in.
F
But
as
long
as
nobody's
talking,
I
I
would
wanna,
I
do
have
an
idea
of
the
pitch.
I
can't
remember
which
issue
it's
been
discussed
in
the
most.
It
is
that
I
do
enjoy
reservoir
sampling.
I
think
it's
really
something
we
ought
to
be
doing,
and
so
there's
been
this
proposal
to
have
a
sampling
probability
somehow
encoded
as
data
on
the
spam.
F
So
this
in
the
exporter
now
you've
done
some
reservoir
sampling
and
the
thing
about
reservoir
sampling
is
it's
in
size.
That's
the
word
reservoirs
for
fixing
size
and,
if
you
end
up
seeing
more
data
that
can
fit
it's
going
to
selectively
drop
them
in
an
ideally
unbiased
way,
so
that
you
can
then
extrapolate
from
the
statistics
you
get
out
and
summarize
the
whole
population.
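[Editor's sketch: the fixed-size, unbiased reservoir described above, written as the classic "Algorithm R" in Python. This is a minimal illustration under stated assumptions, not an OpenTelemetry API; all names are made up.]

    import random

    def reservoir_sample(stream, k):
        """Keep a fixed-size, uniformly random sample of a stream.

        Every item seen has the same k/n chance of ending up in the
        reservoir, so sums over the reservoir scaled by n/k estimate
        sums over the whole population.
        """
        reservoir = []
        n = 0  # total items seen
        for item in stream:
            n += 1
            if len(reservoir) < k:
                reservoir.append(item)
            else:
                j = random.randrange(n)  # replace a slot w.p. k/n
                if j < k:
                    reservoir[j] = item
        return reservoir, n

    kept, total = reservoir_sample(range(10_000), k=1_000)
    sample_count = total / len(kept)  # each kept item "represents" ~10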
F
Now,
if
you're
doing
that
and
you're
doing
this
correctly,
it
means
that
you
can
see
one
span
in
your
exporter
and
and
say
this
represents
10
spans
in
the
output
and
there
are
sort
of
simple
uniform
strategies
for
for
this
type
of
sampling
and
there
are
more
complicated
weighted
strategies
for
this
sort
of
sampling,
but
both
in
both
cases.
What
you
get
out
at
the
end
of
your
analysis
is
an
estimate
of
either
probability
or,
if
you
invert,
that
you
get
an
estimate
of
count.
F
I
like
to
call
it
sample
count,
because
it's
in
the
natural
units
that
we
think
of
when
we
talk
about
statistics
of
these
things,
this,
if
you've
done
one
in
10
pro
sampling,
which
is
sort
of
a
probability
sampling,
I
would
call
fixed
probability
sampling.
Then
every
span
you
get
out
is
going
to
have
a
multiplier
of
10
on
it.
F
So
I
call
that
sample
count
if
you're
using
reservoir
sampling,
you
don't
know
what
that
counts,
going
to
be
until
you
actually
close
the
period
and
do
the
computation,
but
you
do
get
a
number
which
is
either
probability
or
count
and
that
can
be
used
to
generate
graphs
to
do
sort
of
approximate
analysis
in
your
downstream
system.
So
that's
why
I
was
here
to
advocate
that
we
actually
put
that
information
in
the
spam
and
I
prefer
to
see
it
as
a
span
data
field.
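[Editor's sketch of the proposed "sample count" field: the inverse of the span's effective sampling probability. The field name and the dict-as-span representation are illustrative only.]

    def record_sampled_span(span, probability):
        # "This span represents 1/p spans": inverse probability.
        span["sample_count"] = 1.0 / probability
        return span

    # Fixed 1-in-10 probability sampling: every surviving span carries 10.0.
    kept = [record_sampled_span({"name": "op"}, 0.1) for _ in range(37)]
    # Summing sample_count over any set of kept spans estimates how many
    # such spans existed before sampling (~370 here).
    estimated_total = sum(s["sample_count"] for s in kept)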
F
That's
my
position
on
sampling.
I
don't
particularly
have
a
strong
feeling
about
this.
Other
concept.
That's
been
discussed
about
sampling
priority.
It
has
to
do
more
with
what
you
do
in
band
when
you're
calling
your
peer
to
tell
them
how
much
sample
you
think
you
have,
and
it
does
get
more
complicated
when
you're
doing
scale,
sampling
or
as
reservoirs
something
because
in
the
moment,
in
band
when
you're
doing
reservoir,
stamping
in
the
tail,
you
don't
actually
know
your
effective
probability.
F
D
Hi
josh.
Yes,
I
remember
talking
about
this.
This
sample
counts
issue
on
the
github
issue.
That's
that's
tracking,
this
stuff
and
I'd
love
to
get
some
some
more
momentum
in
in
defining
this,
and-
and
maybe
I
can
provide
some
more
examples
of
how
sample
count
could
be
useful.
D
You
may
want
to
have
like
multiple
samplers
running
through
your
data
and
if,
if
the
subsequent
sampler
can
pick
up
the
sample
account
of
input
data,
it
can
continue
to
use
that
in
its
output,
and
so
it
will
output
the
further
sampled
data
that
can
still
be
reinflated
to
represent,
I
guess
some
semblance
of
the
original
population.
D
So
yes,
that
would
that
would
support
subsequent
sampling
like
another
kind
of
sampler,
it's
a
little
more
simple,
I
guess
would
just
be
like
a
leaky
bucket
rate
limit
kind
of
sampler
and-
and
that
could
I
mean
it's,
not
probabilistic,
but
it
can
keep
track
of
what
it
throws
away
and
then,
when
it
outputs
something,
then
it
just
outputs
the
the
count
of
all
the
things
that
it
threw
away
along
with
its
output
span.
D
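[Editor's sketch of the rate-limiting sampler just described: non-probabilistic, but it counts what it drops and attaches that count to the next span it passes, so downstream consumers can still reinflate. A simple fixed-window limiter stands in for the leaky bucket; all names are illustrative.]

    import time

    class CountingRateLimiter:
        """Pass at most `limit` spans per second; remember the drops."""

        def __init__(self, limit):
            self.limit = limit
            self.window = int(time.time())
            self.passed = 0
            self.dropped = 0

        def offer(self, span):
            now = int(time.time())
            if now != self.window:
                self.window, self.passed = now, 0
            if self.passed >= self.limit:
                self.dropped += 1
                return None  # span is thrown away, but counted
            self.passed += 1
            # This output span stands for itself plus everything dropped
            # since the previous span that got through.
            span["sample_count"] = 1 + self.dropped
            self.dropped = 0
            return span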
F
Yeah, you just gave me an idea I'd never considered, about how to estimate the probability from a leaky-bucket type sampling scheme. You could sample the things you drop and then add them to the things you keep. I haven't really ever looked at that, but I am familiar with at least two good reservoir sampling algorithms.
F
Maybe
three
actually
and
I've
experimented
with
them.
They
work
I
really
like
them.
So
I
want
to
make
sure
that
we
we
all
kind
of
if
you,
if
you're,
not
familiar
with
how
to
get
a
estimated
sample
count
from
probabilities.
F
E
We
do
resource,
oh
sorry,
go
ahead,
please
I'm
just
going
to
say
the
way
that
we
do
reservoir
sampling
at
new
relic.
We
also
doing
incorporate
a
priority
so
that
it's
not
completely
random.
So
if,
for
example,
a
span
has
an
error
on
it,
we
can
increase
the
priority
and
thus
the
chance
that
it
will,
you
know,
get
processed,
and
so
you
know
we,
you
know
we
definitely
want
to
keep
track
of.
You
know
how
many
are
getting
dropped.
E
You
know
how
many
and
all
of
that,
but
it's
it'll
the
way
that
we
do
it
it's
a
little
harder
to
statistically
tell
you,
know
things
about
them
other
than
the
numbers,
because
you
know,
maybe
you
know
50
come
in
with
errors,
and
so
we
increase
the
the
priority
on
those.
So
all
of
them
get
sampled,
but
that
does
not
necessarily
represent
the
the
percentage
of
them
that
had
errors.
F
Yeah
I
see
I
I'm
I've.
At
least
I
think
I
I'm
trying.
Let
me
try
to
say
that
sounds
good.
I
think
there
may
be
ways
to
it.
Doesn't
I'm
not?
I'm
not
sure,
I'm
familiar
exactly
with
what
you're
doing,
obviously,
but
I
I
am
confident
there
are
ways
and
probably
ways
that
you're
actually
doing
to
get
fairly
good
signals.
Out
of
that
which
is
what's
what's
really
important
here.
F
I
think
the
the
debate
that
I
remember
in
one
of
the
issues
was
whether
there
is
a
firm
sort
of
mathematical
concept
here,
which
is
that
there
is
a
way
to
put
a
single
number
on
a
spam
and
extrapolate
information
from
it,
which
is,
is
the
claim
and
all
there's
so
many
ways
to
compute
that
single
number
and
they
all
have
different
characteristics.
F
You
know
you
can
move
variants
around
like
you
can
improve
accuracy
in
one
place
of
the
data
space
and,
like
you
know,
you're
sacrificing
variability,
you
know
variance
for
bias
and
so
on,
and
ideally
we're
unbiased.
But
there
is
a
concept
of
I.
I
don't
like
the
phrase
biased
sampling
that
has
been
used
in
the
observability
space.
In
recent
years,
I
think
from
looking
through
statistics
textbooks
you
might
actually
prefer
to
call
that
unequal
probability
sampling
and
there
are
papers
which
I've
posted
links
to
in
some
of
these
issues.
F
Talking
about
how
to
implement
weighted
reservoir
sampling.
When
you
have
weighted
reservoir
sampling,
you
can
implement
this
unequal
probability
sample
strategy.
So
I
I
I
will
look
up
this
issue.
I
gave
a
brief
summary.
I
wish
I
had
code.
I
could
just
open
source
and
I've
actually
asked
my
company
to.
Let
me
do
this,
but
so,
let's
suppose
that
you
have
a
a
reservoir
which,
in
which
you've
captured
for
temporary
purposes,
all
of
your
data
right
excuse
me
for
temporary
search
purposes.
F
You've
aggregate
you've
got
all
of
your
data
sitting
in
a
buffer
and
you
and
it's
like
10
000
spans,
but
you
only
want
to
send
out
1000
spans
so
you're
in
the
tail
of
this.
At
this
moment,
you're
exporting
from
span,
you
need
to
reduce
the
data
by
a
factor
of
10..
If
you
have
a,
you
can
take
two
passes
over
your
data.
Take
the
first
pass
over
your
data
you're
just
going
to
count
how
many
errors
are
there
and
it
looks
like.
F
F
What
you
then
can
do
is
construct
this
arbitrary
weight
factor
and
you
want
the
the
weight
factor
on
error
spans
to
be
33
times
as
much
as
the
weight
factor
on
non-error
spans,
because
then,
when
you
multiply
them
out
the
span
times
the
weight
you
get
an
equal
sum.
So
the
sum
of
weight
times
span
count
for
errors
is
the
same
as
some
subtle
span
time
weight
count
for
non-errors.
Now
you
run
your
weighted
probability
sampler
using
a
reservoir.
F
So
what
what
that
means
is,
you
know,
like
every
span
of
an
error,
had
an
equal
probability
of
getting
into
the
output,
and
every
non-error
span
had
an
equal
probability
of
it
within
its
group
of
getting
into
the
output
and
this
reservoir
sampling
algorithm
or
any
any
weighted
reservoir
sampling
algorithm
will
then
give
you
a
select
for
you,
a
thousand
spans
out
and
each
one
of
those
gets
a
weight
factor
applied
to
it,
which
is
going
to
be
approximately
10.
F
Well
sorry,
because
the
output
now
of
this
algorithm
is,
we
expect
500
error,
examples
and
500
non-error
examples.
That's
what
this
map
does
here,
because
the
goal
of
a
weighted
probability,
sampling
algorithm
is
to
give
you
estimated
sums,
and
so
the
estimated
sum
of
all
error
spans
is
going
to
equal
the
estimated
sum
of
all
non-errors
fans.
F
The
multiplier
on
the
error
spans
in
that
case
is
exactly
one
and
the
multiplier
on
the
non-error
spans
is
going
to
be
add
up
so
that
so
that
they
each
represent
seven
hundred
out
of
ten
thousand
seven
nine
thousand
seven
nine
thousand
ninety
three
thousand
out
of
ten
thousand
or
something
like
that.
Ninety
three
hundred
out
of
ten
thousand
that
ratio
inverted,
is
the
weight
or
something
like
that.
I
don't
do
math
very
well
in
my
head,
but
I
don't
know.
Hopefully
this
didn't
sound
extremely
complicated.
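[Editor's sketch of the two-pass scheme worked through above: count errors in a 10,000-span buffer, give the two groups equal total weight, select 1,000 spans with a weighted reservoir, and attach sample counts. The weighted selection uses the Efraimidis-Spirakis "A-Res" keying, one published choice among the algorithms mentioned; all names are illustrative.]

    import heapq
    import math
    import random

    def two_pass_weighted_sample(buffer, k, is_error):
        # Pass 1: count the two groups.
        n_err = sum(1 for s in buffer if is_error(s))
        n_ok = len(buffer) - n_err
        # Equal total weight per group: with 3% errors, an error span
        # weighs ~33x a non-error span, so selection favors errors.
        def weight(s):
            return 1.0 / max(n_err, 1) if is_error(s) else 1.0 / max(n_ok, 1)
        # Pass 2: weighted reservoir selection (A-Res, in its numerically
        # stable form): keep the k items with the smallest -log(u)/weight.
        keyed = ((-math.log(random.random() or 1e-300) / weight(s), s)
                 for s in buffer)
        kept = [s for _, s in heapq.nsmallest(k, keyed, key=lambda t: t[0])]
        # Each kept span represents (group total / group kept) spans.
        k_err = sum(1 for s in kept if is_error(s))
        for s in kept:
            s["sample_count"] = (n_err / max(k_err, 1) if is_error(s)
                                 else n_ok / max(k - k_err, 1))
        return kept

    buffer = [{"error": random.random() < 0.03} for _ in range(10_000)]
    out = two_pass_weighted_sample(buffer, 1_000, lambda s: s["error"])

With 3% errors there are only ~300 error spans, so essentially all of them survive with a sample count near one (matching the "multiplier of exactly one" remark), and the ~700 surviving non-error spans each carry a sample count near 14, so both groups reinflate to roughly their true sizes.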
F
This
is
the
basic
approach
when
you
use
weighted
sampling
and
it
gives
you
the
sample
count.
The
single
number,
which
is,
I
think
this
band-
represents
10
rising.
This
10
represents
20
and
you
can
use
those
numbers
to
generate
statistics
that
are
they're
accurate,
if
not
precise,.
F
A
Yeah
josh,
I
look
forward
to
to
reading
that
paper
right.
I
think
what
you
said
makes
sense
to
me
my.
Hopefully,
this
isn't
kind
of
a
stupid
question,
but
you're
trying
to
get
you
know
essentially
metrics
that
that
aren't
skewed
and
so
you're
trying
to
get
like
an
accurate
percentage
of
errors
and
non-errors
and
so
like.
Where
does
the
scope
of
you
know
this
sampling
metrics
accurate
metric,
stop
and
like
pre-aggregated
metrics
begin?
A
F
A
C
What
what
we
have
in
azure
monitor
that
similar
to
new
relic?
We
do
sampling
class
thing
after
we
collected
everything
and
we
use
the
tracing
data
to
build
metrics
like
number
of
http
requests
with
certain
cardinality.
So
I
think
what
matt
you're
asking
is,
whether
the
approach
described
by
josh
helps
to
solve
this
problem.
Isn't
the
question.
A
F
I
I'm
not
sure
if
I'm
fully
following,
but
I
think
I
can
answer
affirmatively
that
that
the
goal
here
is
to
say
we're
only
looking
at
spans,
we're
not
concretely
computing
metrics
after
seeing
10,
000
spans
and
computing
unbiased,
that's
really,
the
key,
unbiased,
unbiased,
unbiased
is
that
every
span
had
an
equal
chance
of
getting
into
the
output
according
to
its
weight
and
therefore
those
you
I
mean
this
is
just
like
this
is
like
basic
statistics
and
I'm
not
really
an
expert,
I'm
like
not
a
mathematician,
but
because
of
the
unbiased
property
that
you
get
this
estimated
count
or
inverse
probability
whatever
you
want
to
call
it,
which
is
accurately.
F
That
is
accurate
in
the
sense
that
I
can
sum
any
subset
of
my
data
sum
those
counts
and
estimate
that
that
is
the
number
in
the
whole
population.
From
my
sample.
I
just
put
a
link
in
the
chat
where
the
comment
I
was
referring
to.
There
are
two
papers
that
are
linked
in
there.
This
the
authors
overlap,
there's
sort
of
a
sequence
of
papers,
the
one
the
first
one,
the
sort
of
simplest
one
is
called
priority.
C
F
F
The
whole
point
of
that
paper
is
is
what
they
call
subset
sum
so
you're
going
to
compute
a
weighted
sample
and
then
using
the
items
in
your
weighted
sample.
You
can
estimate
the
weight
of
a
subset,
an
arbitrary
subset.
So
that
means
we
can.
We
can
filter
our
sample
count.
The
sum
sum
the
counts
and
say
that's:
the
proportion
in
the
whole
population,
and
this
this
meant.
This
thing
I
mentioned
earlier
with
a
two
pass
algorithm
where
I
I
compute
the
probability
of
an
error
and
then
use
that
to
generate
a
weight.
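[Editor's illustration of the subset-sum property, continuing the two-pass sketch above and reusing its hypothetical `out` list: filter the sample however you like, sum the sample counts, and the total estimates that subset's size in the original buffer.]

    # Using `out` from the earlier sketch: estimate how many spans in
    # the whole 10,000-span buffer were errors, from the sample alone.
    estimated_errors = sum(s["sample_count"] for s in out if s["error"])
    # estimated_errors comes out near the true count (~300).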
F
I've
been
calling
that
to
myself,
I
call
that
inverse
probability
sampling
and
there
are
some
papers
out
there
that
roughly
describe
the
same
thing
and
put
the
math
behind
it.
That
explains
what's
what's
going
on,
but
by
the
same
authors.
Actually,
if
you
just
go
digging
through
the
sort
of
citations
to
these
papers.
So
if
you
read
through
this
first
paper
from
2005
called
priority
sampling,
it
gives
you
a
technique
for
what
I
call
natural
weight
sampling.
F
So
if
you're
counting
network
bytes,
let's
say
you've
got
a
packet
with
those
500
bytes
you've
got
a
packet
that
was
a
thousand
bytes.
You've
got
a
million
packets
and
they
each
have
a
size,
and
you
can
only
store
a
sample
of
a
thousand
packets,
and
so
you
do
that
using
the
actual
size
of
the
packet,
and
the
reason
is
that
your
goal
in
this
case
is
to
estimate
something
about
the
total
network
traffic
from
your
thousand
packets
and
your
thousand
packets
can
be
categorized
and
summed
in
many
different
ways.
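[Editor's sketch of "natural weight" sampling on packets: each item's own byte size is its weight, the reservoir keeps items with probability roughly proportional to size, and the attached counts let any subset's traffic be estimated. Inclusion probabilities are approximated; all names are illustrative.]

    import heapq
    import math
    import random

    def natural_weight_sample(packets, k):
        """Keep ~k packets, chosen roughly in proportion to their size."""
        total_bytes = sum(p["bytes"] for p in packets)
        keyed = ((-math.log(random.random() or 1e-300) / p["bytes"], p)
                 for p in packets)
        kept = [p for _, p in heapq.nsmallest(k, keyed, key=lambda t: t[0])]
        for p in kept:
            # Approximate inclusion probability under size-weighted
            # sampling; its inverse is the packet's sample count.
            incl = min(1.0, k * p["bytes"] / total_bytes)
            p["sample_count"] = 1.0 / incl
        return kept

    packets = [{"bytes": random.choice([64, 576, 1500]), "dst": i % 7}
               for i in range(100_000)]
    sample = natural_weight_sample(packets, 1_000)
    # Subset sum: estimated total bytes sent to one destination.
    est = sum(p["bytes"] * p["sample_count"] for p in sample
              if p["dst"] == 3)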
F
That's the subset thing: I can say, give me all packets with a particular endpoint, or for a particular IP address; now sum all those counts, and that's approximately the number you expect for that one endpoint, from your sample data. So that's what I've been calling natural weights: you actually have a weight on the piece of data itself. The case I described earlier for tail sampling, where you take two passes, is one where you're just making up weights to give you the probabilities that give you the output you want.
F
You
first
just
understand
how
natural
related
sampling
works
with
a
reservoir
and
a
weighted
sampling
algorithm,
and
then
there's
this
next
step,
where
you
can
take
two
passes
or
there's
again
a
million
ways
to
do
this,
so
that
does
give
you,
I
think,
just
trying
to
tie
this
back.
F
The
original
question,
metrics
from
spans
metrics
from
a
sample
of
spans,
is
what
we're
actually
saying
so
that
the
you
can
basically
compute
a
pre-aggregated
and
sorry-
and
I
don't
say,
pre-aggregate,
you
can
compute
an
aggregated
metric,
approximately
from
a
sample
of
spans
if
it
was
done
in
an
unbiased
way,
and
that
that
is
something
I
feel
is
not
quite
well
enough
understood
in
the
group
of
open
summit
tree
to
push
forward
on
these
issues
like
I
just
want
to
have
a
sample
count.
It's
a
double.
F
It's
a
floating
point
number,
because
it's
going
to
be
approximate
and
it's
on
every
span
and
if
it's
not
set,
it
means
I
wasn't
using
probability
sampling
and
that's
okay.
You
can
still
keep
your
token
bucket
or
whatever
it
is,
and
but
I
think
what
I'm
sensing
is
that
in
the
open
source
community,
there
wasn't
an
unknown
solution
for
this
problem,
and
so
they
adopted
token
bucket
sampling,
and
that
means
that
tracing
has
been
seen.
F
That
gives
us
not
just
an
example,
but
an
estimated
weight
or
a
count
on
that
thing
and
getting
back
to
elizabeth's
original
statement
like
reservoir
sampling
is
exactly
the
way
you
can
do
both
a
rate,
limited
sample
and
get
probabilities
at
the
same
time,
and
and
just
having
this
one
field,
there's
so
much
flexibility
in
how
a
vendor
goes
about
this.
Like
I
mentioned
this
two-pass
algorithm,
it
was
very
arbitrary.
The
way
I
described
that
you
can
just
do
lots
of
different
things.
A
Josh, what... what's going...
A
Oh no, it's good, I mean it makes sense. You said initially that the algorithm would have to know what the total sample size is to know how to properly assign the weights, right? So there's some work up front that needs to be done. Is that possibly why this hasn't been done before, that it's kind of more expensive?
F
I
yeah,
I
definitely
gave
a
kind
of
boot
for
a
solution
here
where
I
was
buffering,
10
000
spans,
which
is
expensive
and
that's
sort
of
the
drawback.
As
I
took
two
passes
over
my
data
and
you
don't
always
have
that
luxury,
and
I
keep
I'm
also
kind
of
waving
my
hands
here,
because
there
are,
I
mean,
like
it's,
not
difficult,
I
think,
to
come
up
with
a
strategy.
F
That's
adaptive
is
like
I
can
say,
I'm
gonna,
like
kind
of
make
a
guess
like
in
the
last
hour,
I
had
three
percent
errors,
so
I'm
just
going
to
guess
that
I
should
wait
errors
33
times
more
than
non-errors.
That's
not
necessarily
going
to
get
you
the
output
you
want,
but
it's
also
still
an
unbiased
thing
so
like
if
you
also
all
of
a
sudden,
have
no
errors.
Your
your
output's
still
going
to
have
statistical
validity.
It
just
won't
have
any
errors
in
it.
F
If
you
all
of
a
sudden
have
100
errors,
the
output
will
have
100
errors
and
everywhere
in
between
there's
there's
these
trade-offs
between
bias
and
various
that
are
just
kind
of
you're.
Shifting
around
your
uncertainty,
like
you,
must
have
new
uncertainty
because
you
reduce
the
size
of
the
data,
so
the
various
approaches
basically
have
different
properties
and
I'm
really
not
a
mathematician.
F
So
I'm
not
so
good
at
explaining
that
these
trade-offs
sort
of
technically
speaking-
and
I
mentioned-
I
don't
know-
let
me
pause
again-
I'm
definitely
advocating
for
this
field
and
I'm
happy
to
back
it
up
with
any
talk
about
sampling,
algorithms
that
we
can.
F
Have
if
you
wouldn't
mind,
I
want
to
indulge
another
idea
which
I
mentioned
a
week
ago.
The
reason
I've
come
to
this
meeting
is
not
just
to
like
to
sort
of
promote
the
idea
of
weighted
reservoir
samples
and
so
on,
but
in
the
metric
space,
where
I've
been
we've
been
sort
of
finishing
up
the
specifications.
F
I
have
this
it's
more
of
a
personal
like
mission.
I
think,
and
then
that's
why
I
did
back
off
a
little
bit,
but
the
metrics
world
has
been
divided
between
prometheus
and
staff,
see
for
10
years
or
so,
and
the
prometheus
world
has
a
very
sort
of
strict
notion
of
what's
allowed
as
far
as
cardinality
this
fc
world.
Just
does
not
so
you
you
have
these
sort
of
different
users
user
pools,
one
of
which
is
accustomed
to.
I
guess,
high
performance
and
low
cardinality
and
the
other
is
accustomed
to
less
performance.
F
But
high
cardinality
is
okay
and
getting
to
a
point
where
open
telemetry
had
a
viable
metrics
library
that
any
user,
including
assatsy
or
prometheus
user,
might
accept,
meant
sort
of
like
at
least
opening
the
door
to
high
cardinality,
but
also
not
requiring
some
sort
of
huge
memory
consumption,
which
is
what
prometheus
has.
If
you
use
high
cardinality
and
the
way
we've
I
think
approach
this
is
rests
on
a
belief
that
I
can
that
I
have,
which
is
that
we
can
use
the
same
stamping
ideas.
F
I
just
described
to
reduce
the
dimensionality
of
a
metric
so
that
you
may
have
metrics
coming
in
with
three
dimensions
or
four
dimensions
and
the
exploded
cardinality
of
those
three
or
four
dimensions
might
be
very
high,
so
you
might
say
in
your.
I
only
intend
to
monitor
precisely
in
terms
of
exact
counts.
I
might
prefer
to
only
monitor
two
of
those
dimensions,
so
I'm
going
to
drop
two
dimensions.
F
I
can.
I
can
do
this
in
actually
two
ways
I
can
compute
one
sample
for
every
combination
of
the
first
two
dimensions.
I
could
compute
like
100
points
for
every
exact
combination
of
the
first
two
dimensions,
or
I
could
compute
a
thousand
points
for
all
of
them
and
each
in
each
case,
I'm
basically
going
to
do
the
same
type
of
thing.
F
I'm
going
to
say
I
would
like
to
get
you
don't
actually
have
to
do
any
of
this
weight
stuff,
for
example,
I'm
just
gonna
say
uniformly
sample
all
the
events
that
match
some
combination
of
my
first
two
dimensions
and
get
a
hundred
points
out.
Those
hundred
points
have
sample
counts
in
them,
just
like
I
described
for
spans
and,
if
I
add
those
up,
those
will
estimate
approximately
the
missing
dimensions
in
my
exact
calculation.
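[Editor's sketch of the metrics idea described here: aggregate exactly on the dimensions you keep, but also retain a small uniform sample of raw events, each carrying a sample count, so the dropped dimensions can still be estimated. This is illustrative only, not a real OpenTelemetry metrics API.]

    import collections
    import random

    def aggregate_with_exemplars(events, keep_dims, n_exemplars=100):
        """Exact counts on keep_dims; sampled exemplars keep all dims."""
        counts = collections.Counter(
            tuple(e[d] for d in keep_dims) for e in events)
        # Uniform reservoir of raw events (Algorithm R, as sketched
        # earlier in this discussion).
        exemplars = []
        for i, e in enumerate(events):
            if len(exemplars) < n_exemplars:
                exemplars.append(dict(e))
            else:
                j = random.randrange(i + 1)
                if j < n_exemplars:
                    exemplars[j] = dict(e)
        for e in exemplars:
            e["sample_count"] = len(events) / min(n_exemplars, len(events))
        return counts, exemplars

    # Summing sample_count over exemplars grouped by a *dropped*
    # dimension approximates the counts that were never aggregated.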
F
It's
the
same
approach,
saying
that
you
know
we
aren't
going
to
collect
all
the
spans,
we're
not
going
to
collect
all
the
metric
dimensions,
but
here's
a
sample
that
lets
you
approximate
those
metric
dimensions
that
were
that
weren't
exactly
calculated
so
I've.
Actually,
we
have
a
proposal
that
puts
a
sample
count
field
into
the
raw
exemplar
for
a
metric
data
point,
and
I
intend
to
say
it's
exactly
the
same
thing
that
is
being
discussed
for
sample
count
in
a
span
data
and
I'm
now
going
to
stop
talking
like
somebody.
F
D
I
just
caught
up
so
I
posted
and
I'm
trying
to
keep
notes
here
as
our
facility.
A
regular
facilitator
is
not
with
us.
I
put
a
link
to
the
discussion
of
in
the
open,
telemetry
specification
that
request,
so
you
can
open
that
up.
I
just
caught
up
with
that.
I
had
no
idea
that
there
was
a
lot
all
this
movement
from
10
days
ago,
but
but
thanks
to
josh
for
moving
this
thing
along.
F
I'm
happy
to
help.
This
has
been
an
area
of
personal
interest
for
me
for
a
long
time
before,
dating
to
before
I
came
to
lightstep.
Even
so,
I've
been
trying
to
figure
out
how
to
solve
sampling
in
high
cardinality
as
a
user
practitioner
for
a
while,
and
I
ended
up
asking
around
at
google
back
when
I
worked
at
google
and
found
these
algorithms
through
sort
of
asking
smarter
people
than
myself.
D
So
I
think
I
brought
this
this
up
before
a
couple
of
weeks
ago,
or
maybe
it
was
three
weeks
ago
when
I
last
attended
this
meeting
and
ted
young.
He
was
talking
about.
Well,
I
guess
he
he
didn't.
D
He
wasn't
part
of
like
directly
part
of
this
conversation
around
sample
count,
but
he
kind
of
knew
of
it
and
he
knew
that
there
was
a
lot
of
waffling
and,
and
he
suggested
well
could
we
could
we
just
like
put
this
behavior
in
a
sampling,
plug-in
and-
and
I
I'm
not
familiar
with
the
sampler
plug-in
api,
but
but
I
feel
like
it's
a
strong
enough
concept
that,
like
that,
can
work
universally
well
and
put
it
in
the
spec
rather
than
just
these
sort
of
sampler
plug-ins
that
you
could
write
that
that
may
or
may
not
support
it.
D
Unless
we
need
some
more
like
empirical
evidence
of
it,
of
its
universal
usefulness,
and
that
may
be
what
we
kind
of
have
to
start
with,
yeah
there's.
F
This
trace
state
thing,
which
is
where
you
can
put
vendor-specific
stuff,
and
you
could
imagine
like
fitting
it
in
there.
I
guess
my
I
think
you're
right,
there's
a
question
and
there's
a
lot
of
uncertainty.
So
the
question
is:
is
there
a
broadly
useful
meaning
for
a
single
value
called
sample
count?
F
Somehow
and
that's
not
known
like
if
I
was
using
one
type
of
reservoir
sample
versus
another
there's
just
trade-offs
here,
so
I
think
it
is
a
middle
ground.
An
argument
is
that
the
middle
ground
is
that
there's
something
useful
in
the
middle
ground
and
if
there's
something
more
complicated,
you
need
like
it's,
probably
not.
It
doesn't
fit.
D
Now,
if
we,
if,
if
sampling
plug-ins,
only
give
us
access
to
trace
state,
whereas
like
a
sample
count
as
a
I
think,
you're
saying
a
span
attribute,
I
I
think
sample
like
within
a
single
trace
and
when
we're
talking
about
secondary
sampling,
then
then,
if
we're
going
to
use
sample
count
and
secondary
sampling,
then
it
actually
has
to
be
part
of
every
span.
D
F
There's
a
choice:
I'm
there's
like
an
area
where
I've
felt
vague
uncertainty
just
because
there
was
something
in
this
in
a
spec
that
didn't
have
it
wasn't
flushed
out
enough
to
know
exactly
what
it
was
there
for,
but
there
is
something
both
in
the
spam.
Struct
called
trace
state
as
well
as
I
think
in
the
link
structure,
which
that's
one
where
I
get
confused
about,
but
so
there's
like.
I
don't
understand
why
you
have
the
same
thing
in
both
your
link
and
your
node,
but.
F
So,
oh
thank
you.
I
remember
sergey
explaining
that
microsoft
had
a
position
here.
So
probably
you
could
explain
that.
C
Yeah,
so
the
idea
of
the
trace
state
is
that
it's
something
that
flows
with
your
trace,
but
you
can
mutate
it
on
every
boundary
and
it
is
like
from
my
best
understanding.
It
is
a
way
to
stitch
things
together.
C
Let's
say
you
have
legacy
correlation
protocol
or
legacy
ids
that
don't
fit
into
the
ws3c
trace
parent
or
if
you
need
to
propagate
some
control
information,
so
our
interest
from
microsoft
site
here
in
sense
of
sampling,
that
we
want
to
propagate
the
score
of
the
sampling,
the
value,
the
hash
value
you
calculate
out
of
the
trace
id
and
it's
somewhat
similar
to
what
your
josh
is
describing.
C
The
item
count,
and
I
would
be
curious
to
think
more
about
it
and
understand
if
you
can
do
this,
the
one
thing
that
would
work
in
both
cases.
So
it
seems
in
your
case,
you're
more
interested
in
in-process
value
and
we
are
interested
in
both
in
process
and
the
fact
that
this
value
can
flow
downstream.
F
Yeah,
I
am
familiar
with
the
complexity
that
arises.
I
like
I'm
thinking
backwards.
I
was
on.
I
was
in
the
dapper
team
back
at
google
and
there
was
this
mechanism
that
they
had
and
I
didn't
own
the
code.
Thankfully
it
was
pretty
complicated,
it
was
all
c
plus
plus,
and
it
was
basically
like
at
the
moment
when
you
start
a
spam.
F
You
have
a
probability
that
you're
that
you're
given
is
a
target
for
your
own
process.
If
you're
a
root,
then
just
flip
a
coin
or
whatever
with
that
probability,
but
if
you're
not
a
root.
This
is
when
it
got
complicated,
so
you're
expected
to
propagate
your
parents,
probability
in
so
that
you
know
at
the
moment,
when
you're
starting
a
child,
what
your
parents
probability
was,
and
then
you
have
your
own
probability
and
you
compare
those
two
if
your
own
probability
is
less
than
the
parents.
You
just
like.
F
Take your parent's decision; but if your own probability is greater, then you make your own new decision, create a new root, and so on. This is the point where you want this notion of an in-band probability. And this is where, for me, it gets really complicated, because tail sampling is different from head sampling, and the sampler API that we have is head sampling.
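[Editor's sketch of the Dapper-style rule exactly as recalled here: the parent's probability travels in band; a child whose own target probability is at most the parent's reuses the parent's decision, while a higher target forces a fresh coin flip, effectively a new root. Names are illustrative.]

    import random

    def child_decision(parent_sampled, parent_prob, my_prob):
        """Combine an inherited head-sampling decision with a local target."""
        if my_prob <= parent_prob:
            # Parent sampled at least as aggressively as I want:
            # take the parent's decision (and keep its probability).
            return parent_sampled, parent_prob
        # I want a higher rate than my parent used: flip a new coin,
        # starting a new "root" decision here.
        return random.random() < my_prob, my_prob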
F
So
if
you
will
indulge
me
for
the
moment,
try
and
explain
one
thinking
here
is,
like
you
start
a
span
knowing
its
probability,
let's
say:
you're
a
root,
I'm
I'm
head
sampling
at
that
point
and
let's
say
I've:
I've
decided
to
sample
only
10
at
the
head
because
it's
like
too
much
throughput
too
much
data.
So
so
now,
I'm
running
through
my
code
and
I've
done
10
samples.
So
I
have
an
effective
count
now
of
10.
F
And I start a child, meaning I'm going to make an RPC, and I'm going to send some information to that child through an in-band connection.
F
This
is
where
we've
there's
been
a
request
for
this
thing
called
sampling
priority
and
I
think
it
came
from
microsoft
and
it's
understandable.
But
when
you
think
about
tail
sampling,
it
starts
to
get
a
little
bit
trickier,
but
but
it's
still
legitimate,
it's
just
really
complicated.
So
then
you
make
your
call
to
the
to
the
child.
You
pass
some
numbers
saying
I'm
sampled
with
max
with
10.
F
Now
the
child
may
have
some
logic
that
it
wants
to
do
which
says
I
I
have
myself.
I
have
a
absolute
rate
limit
on
on
spam
output
like
it's
not
even
about
sampling
rate.
It's
about,
I
can
only
send
thousand
spans
per
second
like
and
I'm
getting
15
000
requests
per
second,
so
I
have
to
do
something
so
at
any
given
moment,
I'm
going
to
get
a
new
fan,
starting
in
the
child.
I
have
to
make
a
decision.
F
I
don't
know
a
way
to
do
this
other
than
to
be
speculative
and
to
say
I
I
I
have
an
estimated
target
and
I
have
an
incoming
probability
and,
like
I'm,
gonna,
have
to
flip
a
coin
and
just
like
in
order
to
get
my
output
rate,
I'm
going
to
do
something
based
on
the
input
rate,
but
the
input
is
still
speculative
side,
because
the
problem
is
there's
a
divergence.
F
F
And
yet
yeah
there's
no
good
solution
here.
All
I
can
say
is
that
in
band
I
think
we're
actually
propagating
is
a
head
sampling
position
or
else
like
a
upper
bound
or
a
lower
back.
It's
a
lower
bound
on
probability.
It's
an
upper
bound
count.
You
can
only
you
can
only
you
can
only
change
that
in
one
direction
you
can't
lower.
I'm
sorry,
I'm
getting.
F
You
can't
lower
a
raised
probability,
one
way
or
the
other.
It's
it's
one
of
those
directions.
You
can't
do.
You
can
only
increase
your
sample
count.
C
Yeah,
so
if
you
sampled
in
head-based
sampling
on
the
on
the
incoming
boundary
10,
you
still
can
have
lower
rate
somewhere
else,
and
I
guess
the
the
best
effort
that
we
can
get
here
is
not
like
consistent
sampling
everywhere
right.
We
cannot
do
this
ever,
but
what
we
want
is
to
assuming
you're
services
are
configured
in
like
any
compatible
way
like
you,
don't
do
completely
different
sampling,
algorithms
in
different
places.
C
What
we
want
to
achieve
is
that
no,
we
want
to
eliminate
this
algorithm,
how
you
flip
a
coin.
So
basically
you
flipped
it
once
and
you
have
this
double
or
flawed
value,
and
if
later
you
want
to
flip
a
coin,
you
you
don't
flip
it.
You
compare
it
with
your
probability.
C
So
this
way
we
just
don't
stick
with
a
particular
sampling
algorithm.
We
don't
even
care,
it
could
be
random.
F
Yeah,
well
that
that
that
actually
connects
with
this
paper,
I
I
linked
to
from
2005
called
priority
sampling.
The
way
that
one
works
is
you
just
literally
attach
a
random
number
to
every
piece
of
data
and
it's
got
to
be
safe.
So
at
the
moment
you
get
a
new
piece
of
data,
just
generate
a
random
number
between
zero
and
one
and
attach
it
to
the
data
and
make
sure
it
stays
with
the
data.
And
then
you
can
resample
it
downstream
and
you
can
continue
resampling
it
and
the
properties
of
the
algorithm
work.
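[Editor's sketch of priority sampling from the 2005 paper referenced here (Duffield, Lund and Thorup): attach one permanent random number to each item, rank by weight divided by that number, keep the top k, and use the (k+1)-st priority as the threshold in the weight estimates. Because the random number stays with the item, the same data can be resampled downstream and the estimates stay unbiased. Names are illustrative.]

    import random

    def priority_sample(items, k):
        """Duffield-Lund-Thorup priority sampling over weighted items."""
        for it in items:
            # Generated once and kept with the item forever.
            it.setdefault("u", random.random() or 1e-300)
            it["priority"] = it["weight"] / it["u"]
        ranked = sorted(items, key=lambda it: it["priority"], reverse=True)
        kept = ranked[:k]
        threshold = ranked[k]["priority"] if len(ranked) > k else 0.0
        for it in kept:
            # Subset sums of "estimate" over `kept` approximate the true
            # weight sums over the full population (subset sum).
            it["estimate"] = max(it["weight"], threshold)
        return kept

    # Because "u" stays attached, a downstream stage can rerun
    # priority_sample on the kept items with a smaller k.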
F
I
mentioned
that
it
was
a
little
harder
to
use
this
algorithm
and
it's
because
you
have
to
attach
a
random
number
to
every
piece
of
data,
but
and
that's
why
the
second
link
is
actually
easier
to
use
harder
to
implement,
but
that
was
that
gave
me
an
idea
when
we're
having
a
conversation
about
this
fixed
decision
quality.
Is
that
instead
of
having
this
spec
out
spec
for
a
sampling
priority,
which
is
like
a
number
between
zero
and
whatever
you
could
instead
create
a
pro?
F
B
So
these
are
like
smart
ways
to
we're
talking
about
smart
ways
to
sample,
but
I'm
curious
like
another
way
to
look
at
another
angle
to
look
at
this
is
what
is
the
use
of
the
trace
data
that
is
collected
this
way?
Like
you
know,
what
use
cases
can
this
enable
right?
So,
ultimately,
we
are
going
after
troubleshooting
use
cases
right,
so
data
collected
this
way.
Would
it
be
useful
because
in
in
our
experience,
what
we
have
found
is
like
anything
like
anything
less
than
100
like
for
what
I
want
to
look
at
is
useless.
B
Really
it
doesn't
help
other
than
just
understanding
the
service
topology
or
you
know
things
like
that,
but
not
really
for
troubleshooting.
B
So
when
I
say
hundred
percent,
not
hundred
percent
hundred
percent
sometimes
like,
if
I
get
hundred
percent
of
what
I
want
like,
for
example,
we
have
on
demand
tracing
where
we
sample
hundred
percent
for
certain
criteria,
and
it
said
sampling
again,
that's
one
possibility
and
the
other
one
is
the
secondary
sampling.
So
I
want
to
look
at
trace
for
few
services,
so
we
collect
hundred
percent
for
one
or
two
services.
B
F
F
I
think-
and
I
think
part
of
this
sort
of
appeal
of
open
tracing
or
open
telemetry
is
that
you
may
be
able
to
sample
spams
but
ensure
that
you
get
100
of
logs
in
each
span,
so
that
you're
at
least
looking
at
a
consistent
all
of
the
information
when
you're
looking
at
a
single
piece
of
it,
but
and
that's
that's-
maybe
sort
of
arbitrary
definition,
but
I
think
of
it
that
way.
F
So,
if
you're,
if
your
use
cases,
I
need
to
find
exactly
some
specific
information,
it's
rarely
the
sampling
is
going
to
get
in
your
way.
True
truly,
so
I
think
that
just
the
quick
answer
to
your
question
is
really
that
if
you
have
sampled
data,
you
can
generate
metrics
from
it.
That's
that's
the
application
and
there
are,
and-
and
hopefully
you
can
tolerate-
high
cardinality,
because
there's
much
more
of
a
convention
of
using
high
cardinality
tags
on
spams
than
it
is
on
metrics.
F
So
and
and
then
I
think,
some
of
the
applications
that
that
these
vendors
have
including
us,
especially
on
honeycomb,
I'm
thinking
right
now,
are
like
I'm
issue.
An
arbitrary
query
give
me
some
statistics
on
the
rates
of
things
that
match
my
query
and,
like
you
can
generate
graphs
from
an
arbitrary
query
using
if
the
samples
are
unbiased
and
so
on.
So
that's
the
use
case.
It's
it's.
Please
give
me
some
graphs
about
this
broad
data
that
I've
summarized
right.
B
I
guess
there
the
assumption
is:
the
metrics
are
created
from
the
spam
data,
because
the
thing
that
is
getting
counted
is
not
instrumented
otherwise,
but
in
practical
sense
like
we
have
ipc
counters,
like
you
know,
for
any
any
rpc
calls
made.
There
are
like
separate
counters,
at
least
like
for
us.
Metrics
is
first
when
it
comes
to
instrumentation
anything
to
do
with
accuracy
and
like
that's
the
primary
tool
for
troubleshooting.
F
Yeah, I guess the only leftover bit there is that when you have high cardinality, you'll have too many counters. And again, it's very speculative; as a vendor, I don't even know that this is going to catch on. So I'm just promoting an idea here.
B
Yeah,
so
I'm
I'm
not
an
expert
in
high
cardinality
metrics
data,
but
from
what
I
have
seen
the
other
parts
of
the
organization.
The
way
they
do,
that
is
high.
Cardinality
data
is
converted
to
a
matrix
to
a
metric
using
our
stream
processing
system
called
mantis
so
there,
depending
upon
what
to
do
with
the
metric.
Certain
tags
are
dropped
and
the
the
metrics
the
metric
that
is
created
represents
what
they're
trying
to
troubleshoot
so
yeah.
F
That's
that's
fair,
that's
fair!
That
is
probably
the
most
traditional
way
and
I'm
advocating
for
something
non-traditional
a
little
bit
so,
but
it's
just
really,
I
mean
advocating
for
something
that
that
can
be
useful
if
you
use
an
algorithm
that
generates
this
number
and
it
yeah-
and
I
guess
another
way
if
I
just
one
last
way
of
looking
at
this-
is
that
lifestep
started
as
a
tracing
company
and
and
doesn't
do
didn't,
do
metrics
we're
we're
starting
to
get
metrics
in
our
product
right
now,
and
so
so
that
change.
F
That's
changing,
but
and
and
but
if
you
look
at
sort
of
us
and
like
honeycomb
who's,
sort
of
a
peer
of
ours
in
the
in
the
space
right
now
like
the
whole
model
is
you've
got
events
and
you
sample
them
and
you
make
queries
and
you
get
graphs
and
the
graphs
are
the
time
series
of
the
sort
of
events
and
you
can't
collect
all
of
them.
So
it's
a
new
way
of
thinking
about,
I
guess,
a
sort
of
a
replacement
for
metrics,
not
all
it's
not
completely
replaceable
for
metrics.
F
It's
sort
of
a
complementary
tool
here
that
lets
you
dig
into
high
cardinality
data
and
and-
and
I
think
just
another
way
of
thinking
about
what
you
just
said
earlier-
is
when
you
sampled
your
data,
you're,
never
going
to
be
able
to
use
it
for
this
sort
of
like
finding
a
needle
in
a
haystack
there's
just
too
many
needles
and
there's
the
haystack's
only
so
large.
F
So
what
do
you
use
sample
data
for?
Well,
you
can
usually
pick
out
the
like
heavy
hitter
or
the
like
leading
cause
of
something
so
like.
I
may
have
described
earlier
a
case
where
I'm
going
to
drop
some
dimensions
and
it's
like
those
dimensions,
cost
me
too
much
because
of
cardinality,
but
I'm
going
to
keep
this
sort
of
like
modest
size
sample.
That
does
include
a
little
bit
of
information
about
the
dimensions
I
dropped.
F
It's
very
approximate,
but
and
and
and
some
some
of
those
pieces
of
data
are
going
to
be
so
tiny
that
they
don't
statistically
mean
anything.
I
don't
have
enough
examples.
F
But now I have a new clue. That's the sort of application that people are getting from sampled data, I think.
G
I don't have a great feeling about doing anything on...
B
The client side. When I say client side, what I'm referring to is the tracing library, or instrumentation. Rolling out changes takes months, so even if you fix a bug in that probability algorithm, say you've implemented something and you want to change it, it's going to take a while to roll out. That's one problem, and the second problem is the variety of runtimes, like Node.js, Python, Java, and so on. Getting this implemented across...
B
You
know
polyglot,
you
know
run
times,
that's
another
challenge,
so
this
this
things,
like
you
know,
make
make
it
more
complicated
to
like
compared
to
metrics
and
logs,
like
metrics
and
logs
they're,
like
you
think,
they're
like
very
stateless
right.
If
you
want
to
fix
something
and
fix
it
and
roll
it
out,
but
with
trace
it's
it's
like
you
know,
the
entire
fleet
yeah.
F
We
have
this
problem
in
tracing
in
like
more
than
one
way
like
you
want
to
assemble
a
whole
trace,
so
you
got
to
figure
out
how
to
gather
all
those
spams
and
you
need
a
buffer.
And
so
the
only
response
I
have
to
what
you
said
now
is
is
that
the
collector,
the
hotel
collector,
ought
to
be
able
to
do
sampling
so
that
you
can
say
cost
100
of
your
spans
into
your
collector
and
then
do
something
more
clever.
F
But you still have the problem that I need all my spans from the same trace to be in the same place, potentially, if I'm sampling or doing anything based on trace structure. And this is the architecture that Lightstep started with; we're changing that as well. But we collect 100% of the data into our satellites, and then we can do this type of summarization. But it's...
B
Yeah, the more we think about it here at Netflix: we try to keep the tracer library instrumentation very, very thin and move the complexity to the back end. That's the general approach, but then we'll have to take it case by case, whether it is feasible to do something in the client or in the back end. Wherever possible, we'll first try to do it in the back end; that way it's easier for us to operate.
F
Yeah,
I
think
we
should
we
should
all
hope
for
this
to
get
into
the
collector
and
that
one
of
the
reasons
that
I
was
originally
I
mean
this
is
another
connection
here.
Is
it?
Is
that
there's
no
such
thing
as
a
reservoir
sampling
algorithm
in
the
collector
right
now,
but
there
is
a
rate
limited
sampler
which
is
non-probabilistic,
and
I
think
that
we
could
have
a
probabilistic
rate,
limited
sampler
if
we
just
swapped
in
a
reservoir
algorithm.
So
that
was
my
goal.
F
I
mean
I
made
that
post
that
I've
linked
to
you
can
check
it
out.
My
goal
was
both
to
like
say:
I
want
this
for
metrics
because
of
high
cardinality.
I
think
that's
interesting,
but
I
also
want
this
because
right
now
the
the
collector
is
doing
rate
limited
collection,
but
not
giving
me
probabilities,
and
I
think
we
could
solve
that.
F
We've
reached
the
end
of
an
hour.
It's
been
lovely
talking,
I
don't
I,
I
don't
know
that
we've
made
any
decisions
or
if
we
I
think
we
were
missing
a
lawyta.
We
were
going
to
talk
about
this
sampling
priority
question,
which
lumiela
was
also
interested
in.
So
maybe
next
week
and
I'll
be
back
and
I
but
I
hope
I'm
not
derailing
this
conversation
by
ranting
about
weighted
sampling,
for
example,.
F
Great
so
I'll
be
back
next
week
and
we
can
try
and
push
forward.
I
hopefully
there'll
be
more
more
to
talk
about
and
maybe
we'll
be
able
to
come
to
a
decision
about
sampling
probability
as
a
as
a
field,
rather
than
an
attribute
see
you
next
time.