From YouTube: Tempo Community Call 2020-11-12
B
A
Yeah, I saw that. You know, so I hope everyone is living in a location where you can easily ship stuff to.
C
Cool, well, I guess we can go and get started here. It's a couple minutes after, so I think we can. I wanted to start by just kind of introducing ourselves on the Grafana side. This is the first Tempo community call, which is kind of cool, and I thought we'd introduce ourselves, and then anyone else who feels comfortable introducing themselves, feel free to jump in. But, you know, feel free to lurk also if you're not in the mood to talk.
C
So I'm Joe, I'm one of the maintainers of Tempo, and I've been working on it for a bit now. I'm very happy to finally open source it, and also very happy to see all the response. I see a couple of people in this chat who have already submitted some PRs, been involved in the issues and in the Slack, and their response has been really cool. I'm excited to hear how you all are using it and what's going on in the community. Ananya, do you want to go ahead?
B
I can go next. So I've been working with tracing for a while now. It was probably in 2018 that I submitted my first PR to the Jaeger project, and yeah, after that a friend of mine at Grafana, Goutham, if you've heard of him, just said, hey, if you want to work on tracing full-time, might as well just join the company. That's what I did, and I've continued to work on tracing at Grafana. I'm one of the co-maintainers with Joe, and yeah, really looking forward to how folks are using it and how we can help with all of this.
B
A
I'm Richie, also working at Grafana Labs on the community side. With my Prometheus/OpenMetrics hat on, I've wanted what is now Tempo since 2016. So I'm totally thrilled that we finally have that last missing piece. Of course, since '15 I wanted to have what is now Loki, since '16 I wanted to have what is now Tempo, and everything's great and awesome, and the future is ours.
E
Yeah, I might as well just do a quick introduction as a non-Grafana person. I joined Slack and have been kinda lurking, but also writing a bit, since the introduction of Tempo, basically because I was handed the task to introduce distributed tracing and we had already bought into the Grafana ecosystem with Loki and Cortex and so on. So it feels nice to have everything in one package. I'm mostly here to just listen in and see what the roadmap is, and how we can help as well, if needed.
F
I am the documentation writer for the Tempo product, working at Grafana Labs.
C
Cool. I think just yesterday we released 0.3.0, and I'm pretty sure Daniel wrote about 80% of the PRs for that release. So thank you very much, that was an excellent contribution.
H
I can go next. I'm Cesar, I work for Workday, and recently, pretty much like Simon, I was tasked with a distributed tracing project and ended up building a piece of it with Jaeger and whatnot. I thought Tempo was a good fit, since there's an appetite for it being integrated with the other tools for logs and the visualization of it. So I joined the project and yeah, I'm enjoying it so far. Cool.
B
C
All right — oh sorry, go ahead.
A
A
A
C
Cool, well, I'm gonna move ahead a little bit, but certainly I want this to be conversational, so feel free to jump in whenever. And as we move forward, if you want to introduce yourself, certainly that door is not closed. Anyone is welcome to do whatever.
C
This is meant to be a little bit of office hours and a little bit of just community chatter. So certainly I want people to bring questions and thoughts. Just reporting what you're doing internally with Tempo, showing internal metrics, or just discussion would be great. While you're thinking about that, I'm going to move ahead and talk about what we're planning for v1 — so, kind of the roadmap for v1, when we're hoping to release, and what we're hoping to get into it.
C
C
So I believe this is attached to the meeting, but I'm going to just go and put it here in chat also. This is the doc — the community call doc — and Richie has kind of put this together for us; he's the community guy at Grafana.
C
So, to talk a little bit about v1: we're really hoping to have v1, I'd say, towards the end of Q1 2021. That's our goal — to stamp it as, you know, production ready. We feel like, feature-wise, it's reached the point where we can say v1; we've been using it internally for months and months. So you know.
C
Of course it was rough at first; now it's very stable, and most of what we're looking to stamp out for v1 is a little cleanup: solidifying config, making sure all our metric names are what we want — the kinds of things we don't want to change after we stamp v1.
C
If you go to the repo and you look under the v1 tag, there's a number of issues there. So if you want to dig through the repo a little bit, you can; we try to keep those up to date.
C
We need to have more regular bug scrubs and such, but for the most part, if you go there, there's a v1 label and you'll see the things we're targeting for v1. I'd say compression — experimenting with compression and seeing what role it plays for Tempo — is high on the list. Right now we're just writing proto to S3 or GCS directly, and our storage costs are really only, I think, about a sixth of our total TCO.
C
So it's less about cost for us; we're more interested in seeing if it improves query performance, for two reasons. One, we'll pull less from S3 per call — it'll just have to pull less. We'll spend more CPU decompressing and recompressing, but it will pull less data, so we're wondering if that's going to give us a little edge on query latency. And then,
C
secondly, we can build bigger blocks. Right now it takes us a little over an hour to build blocks with five to six million traces in them, and that's about the largest blocks we build at Grafana. If we were able to build larger blocks through compression — because we'd be reading and writing less data, it would take less time — then we could reduce the length of the block list, which would also improve query performance and allow longer retentions with better latency.
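To make the compression idea concrete, here is a minimal sketch (not Tempo's actual code — the function and object names are illustrative, and gzip is just one candidate codec) of compressing marshalled block bytes before uploading them to an object store:

```go
package blocksketch

import (
	"bytes"
	"compress/gzip"
	"context"

	"cloud.google.com/go/storage"
)

// writeCompressedBlock gzips already-marshalled block bytes and uploads them
// to a GCS bucket. The hoped-for win described above: fewer bytes pulled back
// per query and smaller objects to move during compaction, paid for with CPU
// spent compressing and decompressing.
func writeCompressedBlock(ctx context.Context, bkt *storage.BucketHandle, objectName string, blockBytes []byte) error {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(blockBytes); err != nil {
		return err
	}
	if err := zw.Close(); err != nil {
		return err
	}

	w := bkt.Object(objectName).NewWriter(ctx)
	if _, err := w.Write(buf.Bytes()); err != nil {
		w.Close()
		return err
	}
	return w.Close()
}
```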
C
So compression is big on our list for sure. A query frontend, which Ananya is going to be poking around on probably for the rest of this year, is also up on the list, and I think you want to talk about your thoughts on the query frontend and where you want to take that.
B
For sure. So the query frontend is being worked on for a couple of reasons. We think the search algorithm in Tempo can be improved a lot. Right now the way it works is that we spin up a worker thread for each block that we store in our backend, and sometimes this puts a lot of load on our querier, because if we have
F
B
a few hundred blocks and we fire a query, each of the workers will fetch the bloom filter for all of the blocks, then check if the trace is in it, and so on. This is a lot of work, and what we really want to do is make sure that all of the workers are equally utilized, and a query
B
frontend will help us do that, because when we get a query, the idea is to shard the block space and assign just a subset of it to each querier. It'll spread the workload out equally across all of them, and it'll also improve search times, we hope. These are some of the reasons we think a query frontend will be worth it. And what else? Let's see.
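A rough sketch of the block-space sharding idea being described — the names are hypothetical, not Tempo's actual frontend code — is simply splitting the block list into sub-queries and handing each shard to a different querier:

```go
package frontendsketch

// shardBlocks splits a block list into shardCount roughly equal sub-queries,
// so a frontend can hand each shard to a different querier instead of having
// every querier touch every block's bloom filter.
func shardBlocks(blockIDs []string, shardCount int) [][]string {
	shards := make([][]string, shardCount)
	for i, id := range blockIDs {
		shards[i%shardCount] = append(shards[i%shardCount], id)
	}
	return shards
}
```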
B
Right. So we can do that right now with the querier, but the idea is that traces that cross blocks — long-running traces — can be part of multiple blocks, and right now the way Tempo works is, when we fire a query for a trace ID—
B
And so for this we kind of want exhaustive search across all of the blocks. We looked at implementing context cancellations at first. What we really wanted to do was: if we knew that a block has a trace, and we hit it and find a successful hit, then we just cancel all of the other workers — because they're spending time unnecessarily querying other blocks — and just return that subset of spans.
B
We implemented context cancellations, but it turns out there's some resource leak in the GCS client that we use. What happens is it runs really smoothly for about 10 minutes, but after that latencies shoot up and it just doesn't work. So we're probably going to end up throwing that away, because we don't want to do context cancellations anymore — we want exhaustive search across all of the blocks.
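For reference, here is a minimal sketch of the cancel-on-first-hit pattern described above (the one they tried and are moving away from); checkBlock and the other names are stand-ins, not Tempo's real functions:

```go
package searchsketch

import (
	"context"
	"sync"
)

// findTrace runs one worker per block; the first successful hit records the
// result and cancels the remaining workers via the shared context.
func findTrace(ctx context.Context, blocks []string, checkBlock func(context.Context, string) ([]byte, bool)) []byte {
	ctx, cancel := context.WithCancel(ctx)
	defer cancel()

	var (
		wg     sync.WaitGroup
		once   sync.Once
		result []byte
	)
	for _, b := range blocks {
		wg.Add(1)
		go func(block string) {
			defer wg.Done()
			if trace, ok := checkBlock(ctx, block); ok {
				once.Do(func() {
					result = trace
					cancel() // stop the remaining workers early
				})
			}
		}(b)
	}
	wg.Wait()
	return result
}
```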
B
C
So I think when we get this query frontend together, there will always be the option to run it the way it is right now — which is, all your traces are sub one second or sub a few seconds, you don't care, you expect them all to end up in the same blocks, and that's fine; you like the way it works now and you want that. And then, if you want, the query frontend is basically going to give you the option to add more resources to the query path.
C
It's going to basically shard a query across a whole lot of different queriers to search the entire block space in a reasonable amount of time, which is hard to do now. It's kind of the way Loki and Cortex have both scaled their query paths to handle extremely large queries. If you saw some of the Loki 2.0 stuff, you know the way they can do that is by breaking a query up into a bunch of tiny pieces and assigning them to different queriers.
C
B
And down the road we can also add more features to the query frontend. We can do sharding by, say, tenant ID — or, you know, Cortex has this concept of shuffle sharding, where only a subset of queriers get assigned to query for a particular tenant — and this might be really useful for Tempo as well. So some of that experimentation might happen in the query frontend area. Cool.
C
C
It was also adding query latency, so sharding allowed us to significantly reduce the amount of data we're requesting from GCS, and index paging will do something similar: we're going to request only pieces of the index at a time instead of the full thing. I think for our largest blocks the index hits one megabyte plus. Again, when you think of a one-megabyte file on your hard drive that seems tiny, but pulling all of that over the network and marshalling it into something in Go... you know.
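A minimal sketch of the range-read idea behind index paging, using the GCS client's NewRangeReader — the object layout, page size, and function name are assumptions, not Tempo's actual index format:

```go
package indexsketch

import (
	"context"
	"io"

	"cloud.google.com/go/storage"
)

// readIndexPage pulls only one page of an index object from GCS instead of
// the whole thing, so a query pays only for the bytes it actually needs.
func readIndexPage(ctx context.Context, bkt *storage.BucketHandle, object string, page, pageSize int64) ([]byte, error) {
	r, err := bkt.Object(object).NewRangeReader(ctx, page*pageSize, pageSize)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}
```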
C
And then there are some other things — like I said, feel free to check the v1 label in GitHub for the other things we want for v1. Certainly if any major bugs show up, we will also put the v1 label on those. We want to make sure v1 feels stable and good for everyone.
C
But let me open up the floor. If there are any questions, externally or internally or from whoever, let's just chat a bit. It doesn't have to be a question — if you want to share some of your experience with Tempo or what you're doing, that'd be cool too.
A
Before that, just one point of order: I realized I hadn't put the document on public editing. So now it's on public editing, and everyone can help chip in with the note taking if they so choose, and also, if they want, fill out their own names. I already filled out everyone who's here — well, I don't know where you work; you can put it in, but you don't have to, whatever, either is fine. Yes, point of order over: the document is now fully editable.
A
Yeah, no, absolutely — and you should also absolutely feel free, going forward, to just add stuff to the agenda. If there's something you want to talk about, just toss it in, also under the month or anything. This is a living, rolling document from the community, for the community, so do whatever — I mean, obviously don't abuse it, but outside of that, do whatever and feel free to shape what this call is about and how it's structured.
B
G
C
Okay, so we have GCS and S3 now. I think we want Azure Blob Storage, but — I think we have an Azure... we do have an Azure account at Grafana. I don't plan to use it much; I've barely used Azure. So, since people in the community have asked: if somebody has a lot of Azure experience and wants to implement that, that'd be awesome.
C
Let me link real fast here to the interfaces that need to be implemented in order to pull that off. Let's see, tempodb — so Ananya, you did the most recent one with S3. I did the original with GCS, and Ananya added S3 support.
C
Oh actually, I'm gonna add this to the doc. I was gonna put it in the chat, but the doc's the better place, so.
C
So that backend Go file right there is the interface — the writer, reader, and compactor interfaces are what's necessary to implement in order to add support for a backend — and our caching layers also implement those interfaces. So we have a memcached layer that implements those.
F
C
A lot of the calls are passed through, because caching them isn't going to do anything — it doesn't actually cache any of the actual proto — but it will cache the bloom filters and the indexes. So the disk cache, which is okay, and the memcached cache, which is far better, also implement those interfaces and sit in between, in like a chain. Okay, okay, thanks, yeah.
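To illustrate the shape of what's being described — these are simplified stand-ins, not Tempo's actual tempodb backend definitions — a backend implements reader/writer-style interfaces, and a caching layer implements the same interface and wraps the next layer in the chain:

```go
package backendsketch

import "context"

// Reader is a simplified stand-in for the backend read interface; the real
// Reader, Writer, and Compactor interfaces in Tempo's tempodb package differ
// in detail.
type Reader interface {
	Bloom(ctx context.Context, blockID, tenantID string) ([]byte, error)
	Object(ctx context.Context, blockID, tenantID string) ([]byte, error)
}

// cachedReader sits in front of another Reader in a chain: bloom filters are
// cached, while object (proto) reads pass straight through to the next layer,
// which might be another cache or the real GCS/S3/Azure backend.
type cachedReader struct {
	next  Reader
	cache map[string][]byte
}

func (r *cachedReader) Bloom(ctx context.Context, blockID, tenantID string) ([]byte, error) {
	key := tenantID + "/" + blockID + "/bloom"
	if b, ok := r.cache[key]; ok {
		return b, nil
	}
	b, err := r.next.Bloom(ctx, blockID, tenantID)
	if err == nil {
		r.cache[key] = b
	}
	return b, err
}

// Object is a pass-through: the actual trace bytes are never cached here.
func (r *cachedReader) Object(ctx context.Context, blockID, tenantID string) ([]byte, error) {
	return r.next.Object(ctx, blockID, tenantID)
}
```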
B
It's also probably a good idea to open an issue on GitHub asking for multiple backend support, and then people can keep commenting on it and upvoting it.
G
C
I am going to add an Azure backend ticket as soon as we're done with this call — I totally forgot about it. Goutham jumped in, he's like, "I'd like to do that." Something tells me it won't show up for the next couple of months if we wait on Goutham, so I'll make an issue in the Tempo repo so we can remember it. Either that, or it's done already and we should check with him, or he forgot about it, right.
B
C
Let's see — we could, you know, share some metrics. Those are always fun to look at. You guys want to see some wiggly lines in Grafana?
C
Let me pull up our stuff here.
C
Yeah, somebody PR'd multi-architecture builds, right? Whoa, that's cool. It kind of works, but we need Jaeger to support it to fully support it — they're working on that in the Jaeger repo now — and it broke our Drone builds, so I had to make a small change. I forgot you even told me that. Let's see, so right now at Grafana we have 1.4 billion traces over a two-week retention period — a window... doink... this window.
C
Yes — and this is a dashboard that I've thrown in the repo. It exists there; it's just sitting there as JSON.
C
It does rely a little bit on our way of doing things, so if you just dunk this directly into Grafana, it might not actually show any good metrics; you might have to make some small adjustments. For instance — let's see if I can... that's not a good one.
C
Let's just look at this guy here.
C
This job label, for instance, is something we use very commonly at Grafana that you might not have in your infrastructure, so you might have to make some adjustments here. We have a Prometheus scrape config label rewrite rule that does that. These, by the way, also come from Promtail, so if you guys use Loki, then you have Promtail metrics and you probably have this.
C
If you don't, then you won't have that either, so little things like that will be weird. But I kind of talked through this — this is the dashboard I look at all the time. I think some people say it's an anti-pattern, or bad, to look at dashboards regularly and that you should only use alerts, but I look at dashboards all the time. Every morning I look; I want to see what changed.
C
If we do a release, I watch — I want to see what changed. It's how I learn what impact things are having, how I learn what my application is doing, and really get a feel for it. I don't like to look at things only when they're breaking, because then you don't know what normal looks like. So every morning I bring this up and just review what things have looked like for
C
the past 24 hours, get a feel for where things are, and if everything looks the same, then I move on. If something weird happened, then I go dig into logs — I want to know why, and I want to understand and improve Tempo. So, our block list is currently around 400 blocks, and block list length is really going to impact query latency. Due to the shuffle — I'm sorry, due to the bloom sharding change from Ananya — latencies are actually pretty dang good: p50 at 400 milliseconds. I'm very happy with that right now, but we can do better.
C
C
So this 400-millisecond p50 — and it looks like about a two-second p99 — is looking through the blocks, and as soon as it finds any match it's gonna bring that back and give it to the user. Like I said, we would like to improve it to search everything, basically exhaustively. We're at 150,000 spans a second right now, which is a little down from our peak; I think about 170,000 sustained is the most we've seen, and we've seen
C
spikes up to two to three hundred thousand. Loki in particular makes the dumbest traces ever; sometimes it goes 300,000 to 400,000 spans per trace. We actually have a—
C
We actually have a limit of 250,000 spans max in our environment, to prevent Loki from just basically breaking Tempo. Tempo does struggle to return traces somewhere north of 100,000 spans.
C
I have some suspicion of where that is but haven't had a lot of time to dig into it. It has to do a number of translations and recombinations once it pulls the trace from disk, and I think it's just taking too long in those areas, so we need to improve that. Something else I've noticed is that the ratio of traces to spans is very different for different companies. I'd say Tempo is mainly span limited.
C
When I think of load, I often think of spans per second with Tempo, and I think that's the primary limiter in terms of scaling it. But the ratio is always very different. I think traces per second is also important, but less so. For instance — and this is actually post replication factor — we're at about, I think, 900 traces per second at 150,000 spans per second.
C
If you have smaller traces, then you can do far more traces per second, and the spans per second will matter more, frankly. Again, it's mainly Loki — it's actually mainly Loki that causes these enormous traces. It's good, right? It's good that we're testing it in an environment with very large traces and other kinds of challenges, so that we can make sure it works at the scale other people need.
C
This failed flush message is wrong; I need to fix this. There's some kind of race condition in cleaning up the block on disk before it pushes it out to S3. There's an issue already to fix this, but I've been ignoring it for a while. I don't want to ignore it, because a failed flush is dangerous.
C
It means some data did not make it to your backend for some reason. The ingester will retry repeatedly, but it also means you might be backing up data on your disk, so this needs to be fixed. If I have not attached a v1 tag to it, I need to attach a v1 tag to it, because this metric needs to be trustable, and it's not at the moment.
C
D
C
Right, the vulture. So Loki made a tool called the canary, and it makes sense — like, if you've heard of a canary in a coal mine, right, you drop the canary down to see if there are noxious gases. I think this is from forever ago; I don't think they continue to drop small birds down into mines to make sure there aren't gases that will kill humans — I hope not, at least — but they used to do this to see if the animal passed out, and they knew it was like a—
C
It was a signal that something was wrong, basically. So the Loki canary is a way that Loki uses to constantly test Loki, to see if something happens — it's like a constant end-to-end test in production. And the vulture — it kind of became this joke at Grafana, naming these consistent checking tools, these end-to-end integration tools, after birds — so for Tempo, it's the vulture.
C
I don't know why, we just are. So this tool just constantly queries Tempo internally, and then we metric "trace not found" — how often we asked for an ID and didn't actually find it — and how often we asked for an ID, searched the span tree, and found missing spans. That is, we check whether there's any reference to a span that doesn't actually exist in the trace, and that's what these percentages are.
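A minimal sketch of the vulture idea — periodically query Tempo for trace IDs it previously wrote and count misses in a Prometheus metric. The metric name, interval, and function hooks here are hypothetical, not the real tempo-vulture code:

```go
package vulturesketch

import (
	"context"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var tracesNotFound = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "vulture_traces_not_found_total",
	Help: "Trace IDs we expected to find in Tempo but could not retrieve.",
})

func init() {
	prometheus.MustRegister(tracesNotFound)
}

// run periodically asks Tempo for trace IDs that were written earlier and
// counts a miss whenever one cannot be retrieved, giving a constant
// end-to-end check in production.
func run(ctx context.Context, knownIDs func() []string, query func(ctx context.Context, id string) (found bool, err error)) {
	ticker := time.NewTicker(15 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, id := range knownIDs() {
				if found, err := query(ctx, id); err != nil || !found {
					tracesNotFound.Inc()
				}
			}
		}
	}
}
```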
C
This is actually a little elevated at the moment — four percent. We normally see less than that, and it's due to some rate limiting we've been seeing in our ops environment. Actually, somebody was just commenting on the ticket.
C
I think it was Cesar — and I don't mean to call you out — but somebody was just commenting on a rate limiting ticket, right? We're seeing that same issue. So, that rate limiting ticket you are looking at: we're seeing some of that internally, and that's why some of these have slightly elevated missing spans.
C
So these are the spans we're dropping right now, and those are, as far as I can tell, due to this spikiness problem. Basically we're rate limiting over every second, and for whatever reason traces seem to be very spiky compared to other workloads. So even though we are below the limit — internally our total spans-per-second rate limit is, I think, 250,000 spans, or maybe it's 200,000, I'd have to go check — we're well below that, but we're still throwing away spans due to it. I need to spend some time looking at that, and I think the answer is going to be something other than enforcing the rate limit by the second.
C
C
H
So, like, when I changed the configuration I was able to handle those spikes. I don't know exactly how different it is — I understand your overall explanation, but I don't know how different your approach is from mine, which is just increasing the bucket, because increasing the bucket actually absorbs the bursts, and the rate will just recycle the tokens again. So I can keep going with the rate.
C
What we were seeing — what I'm seeing internally — is that our current spans per batch is one thousand, and I'm seeing these being rejected with batch sizes of like three or four hundred, which is why my guess was that it's more about spikiness than the size per batch being sent.
C
I don't know — maybe I should increase that burst size, that batch size, and see if that alleviates our issue. I should try that internally. I do associate that burst value with the batch size, and the rate per second with, well, the spans per second. So I think that's right, but I also haven't played around with it, and you have spent some time experimenting, so I should do that internally.
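A small sketch of the rate-versus-burst distinction being discussed, using Go's golang.org/x/time/rate token bucket — the numbers and names are illustrative, not Tempo's defaults:

```go
package ratesketch

import (
	"time"

	"golang.org/x/time/rate"
)

// newSpanLimiter builds a token bucket: spansPerSecond is the steady refill
// rate, while burst is the bucket size that absorbs short spikes, such as a
// single large batch arriving all at once.
func newSpanLimiter(spansPerSecond float64, burst int) *rate.Limiter {
	return rate.NewLimiter(rate.Limit(spansPerSecond), burst)
}

// allowBatch reports whether a batch of n spans fits right now. Raising the
// burst lets bigger batches through without raising the long-term rate.
func allowBatch(l *rate.Limiter, n int) bool {
	return l.AllowN(time.Now(), n)
}
```

Raising the burst, as described above, absorbs a spiky batch without changing the sustained spans-per-second rate.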
H
C
I
J
C
C
the rate at which we compact lots of blocks into fewer blocks — and Marty has some good ideas on how to improve that — and then, secondarily, it's going to be just the total length of this block list, but I think the query frontend will solve that. Once those two things are in place — that's kind of why we want to stamp that with v1 — right now we're searching, yeah, a little bit over a billion traces and getting decent times.
C
We want to search those exhaustively, and then, with the query frontend and this compaction fix, I hope to push well over a billion. Basically, I hope it becomes a question of — and I think this will be true — how many resources you want to put at it. You'll be able to scale your queriers out as far as you want, shard these queries as much as you want, and then you can say 2 billion, 10 billion, whatever works, whatever you need.
C
I
Yeah, I'm not sure if we have, maybe, placeholders for what kind of compaction work we want to do in v1, but I think we want to do some things, right?
C
For sure, yeah, you're right, we probably haven't. A lot of Tempo was etched in my head for many, many months, so I tried to write a lot of tickets, and I think I've done an okay job, but I'll admit there are some that have not made it into issues.
C
Some of the compaction improvements have not; I've kind of just brain-dumped those on poor Marty in his first week of working at Grafana. I've seen that the community has made a number of issues, which is awesome, and I hope to start writing better and more issues instead of just having personal knowledge of some of this, which is unfair to the community and to the other people who are using and working on this tool.
C
E
C
C
C
The batch itself might contain entire traces, or it might contain a bunch of different ones; it's just impossible to tell. I mean, it's not impossible — you could write code that tries to determine what percentage of which traces failed or something — but right now it just refuses an entire batch at once, essentially.
C
Which does make for broken traces, which sucks also. And this will be, per user, something you will think about when you all are building your trace pipelines — those of you who have built this have probably already struggled with it. The larger and the more data you're pushing, the less likely you are to want to retry. At some point you just don't retry; you drop everything as soon as it fails and shed load instantly if you're having problems, because it's so much data that any amount of retrying immediately starts
C
consuming too much memory. You start OOMing if you're using something like Kubernetes — you OOM your process, your pipeline — and you start dropping spans that way anyway. So internally we don't retry anything, because we're pushing just too much volume. When we used to have retries set up, as soon as Tempo stuttered, a bunch of the agents and collectors would instantly start using too much memory, Kubernetes would kill them, they'd restart, and they dropped spans anyway. So it didn't matter.
C
So if you have a very slow pipeline and you don't have a lot coming through, then sure, a retry makes sense, and you can just buffer all that in memory until your backend comes back up. That's something we've kind of played with a little bit.
B
Internally — another thing, I guess, is that we have folks from the docs team as well; we have Jeeta. So is there any area in the docs that the community would like more work on? Are better integration guides needed, is there any particular section of the docs that felt not up to the mark, or any other feedback on the docs?
A
A
A
And then we asked some questions in the Slack channel, and I don't think it was intended to split those up. So unless there is a working example of that, maybe that could be made clear in the documentation, and then in the future how to do that, how to go in that direction. Because we have to worry a lot about egress — whenever we leave a zone or a region, we pay for egress, in GCP at least; I don't know, it's probably the same for AWS and Azure.
B
Okay, so just to clarify that — we'll work on fixing the docs, thank you for that feedback. I think, to clarify, what we would do is run agents on each of the individual clusters, and all of these components would be on one central cluster. The agent could be the Grafana Agent or the OpenTelemetry Collector, if you already have it deployed, and then all of these Tempo components would be in the same cluster.
C
I actually recall that conversation, and what you're suggesting or trying would technically work, but as long as the data was only in the ingester layer, you would not be able to retrieve that trace — I think we talked about that a bit in Slack. So, for everyone else: you could split everything up, but the queriers would not be able to contact the ingesters, so you would not have access to those traces until they got flushed to the backend.
A
Also, FYI, we now have something on community.grafana.com for Tempo, and that's more permanent than Slack. We also have a users mailing list, but I suspect that won't survive now that we have community.grafana.com slash blah blah blah tempo, because that is probably how most people these days interact with user-type questions — not email lists anymore. So.
A
A
A
Google invited us to London to talk OpenMetrics at OpenCensus, and, long story short, I tried to convince them that, yes, label sets are the best thing ever for traces. They disagreed and told me, well, searching doesn't scale — and I was like, what — because when Google tells you that searching for something doesn't scale, you'd better listen. So yeah, roughly 20 milliseconds later, OpenMetrics gained exemplar support.
A
But the important part of this is that, with, at that point, 18 years of operational experience, Google had simply given up on anything non-exemplar. So while I don't believe that everyone needs to copy the hyperscalers in how they operate — and as a matter of fact, I think people shouldn't — at least for efficient underlying design it's probably a good idea to follow their lead, and this is basically where they are leading, or what they have been using for ages now. And if that doesn't motivate you to go all in on exemplars...
C
I really want to see how they impact us. Like, people love searching for traces and logs; it's been so effective and so powerful. I would be surprised if exemplars completely take over, but it's one of those things you really have to get a feel for and start using to see how it impacts the way you triage your issues. And I don't know — in the moment you just use what works, right? If I'm in the moment clicking exemplars, then that's what I'm going to use.
C
If I'm in the moment running these Loki queries to extract trace IDs, then that would work too. So, as somebody who's played around with them some, I am also still not sure how they're going to land and how I'll triage and operate our environments, but I am excited to try them and see what happens.
A
So you jump into that one trace with a lot more context than you would if you just more or less randomly clicked around — which is not completely true, of course, but from that perspective you already come into that specific trace or span with the context of what was roughly wrong with that thing, which gives you more mental tools to understand what was actually happening and what you're looking for.
E
Definitely. So I have a topic I want to hear some thoughts about, and it's a bit tied to the architecture, because I think I heard a mention of the OpenTelemetry Collector — maybe not, but either way. We discussed sampling, and I know that you're doing 100% sampling, which seems like an easy thing to do, but you might not always want to do that, for several reasons. So I'm a bit curious about tail-based sampling and other kinds of sampling strategies.
E
C
So when I say we're doing 100% sampling, we're controlling all that through Jaeger remote sampling. I would never just put, like, constant one on my processes, because you have no control and you're gonna get overwhelmed — that's gonna be awful. So if you are getting into this, a hundred percent get your team using remote sampling. What this does is let you control, from a central point, how much each of your endpoints is sampling, and it'll
C
let you do per-endpoint control. So we sample 100% of our query path and some very low percentage of our write path, because our write path is uninteresting — it always works, basically — and our query path is where we're often having problems with latency and issues that we want to use tracing to diagnose.
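A small sketch of what wiring a service up to Jaeger remote sampling can look like with the jaeger-client-go config package — the service name and sampling server URL are placeholders, and this is just one client-side way to do it:

```go
package samplingsketch

import (
	"io"

	"github.com/opentracing/opentracing-go"
	jaegercfg "github.com/uber/jaeger-client-go/config"
)

// initTracer builds a tracer whose sampling decisions come from a central
// sampling server rather than a hard-coded constant sampler, so per-service
// and per-endpoint rates can be tuned in one place.
func initTracer(serviceName string) (opentracing.Tracer, io.Closer, error) {
	cfg := jaegercfg.Configuration{
		ServiceName: serviceName,
		Sampler: &jaegercfg.SamplerConfig{
			Type:              "remote",
			SamplingServerURL: "http://jaeger-agent:5778/sampling", // placeholder URL
		},
	}
	return cfg.NewTracer()
}
```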
C
So, remote sampling is number one. Two, for tail-based sampling we intend to jump in with OpenTelemetry. JP from Red Hat, who is also a Jaeger maintainer, has been working on a tail-based implementation in the Collector. He's looking at a way to, basically, load balance by trace ID to a set of collectors — the Collector actually already has tail-based sampling in it, but you can only run one—
C
E
Yeah, that sounds good generally. I think our biggest concern right now has been that — we're using Kubernetes and Istio, and Istio can be configured to send traces for basically everything that's happening inside your network, and that will be a lot of traces. So we need some way to distinguish those traces, and I imagine in the future also some kind of real tail-based sampling as well, because not everything is interesting, as you're saying.
C
Yep. So I think our goal internally will always be 100% of our read path, but our write path is where we're going to want to implement tail-based sampling or something like that, and then we'd only keep, maybe, the ones that errored or something. I mean, so few of our writes fail that it would be very few traces, but that's where we'd experiment with that functionality at Grafana.
C
B
That's the blog, and, well, it didn't make it upstream because the community had some concerns about the number of folks using it and whether it's battle-tested and so on. But yeah, JP from Red Hat is now working on a much better version of this, which might make it in, in another form.
D
B
C
C
C
Well, if we're done, if we have nothing left: Richie, do you have some stuff you want to wrap up with? Like, for instance, whatever the rules of your little token game are, and how we're going to distribute this thing. Did you track everyone who asked a question, Richie?
A
I did actually mentally track this, and I'm pretty certain I got it right. On the other hand, I kind of feel bad calling out the only two people who didn't really ask anything. So, does anyone object if I just run a random number, assign numbers to each non-Grafana-Labs person, and then figure out what you're actually getting shipped? Okay, so let me take a quick count: one, two, three, four, five, six, seven, eight — is this right? No, that can't be right, that's weird. Oh, I know — I didn't count the Grafana people, stupid me.
A
I have one — I literally have one single coin — and my printer is supposed to be shipped tomorrow. Of course, I had to give back the other printer, which I had on loan, but this one will be my own and I will have it forever. But if this now goes somewhere around the world, I will probably try to either get someone to print this locally or just order stickers online and send them to whoever. And just for the record, you would be getting stickers before Joe and Emily. So, number five.
A
A
A
No, that was 3D printed. Okay — and actually, anyone who has access to a 3D printer, or to a person who has a 3D printer, let me share something; I have one now. Cool. The reason I ask is I thought it was metal, because I've worked with 3D printing in metal previously, so I was just curious.
A
D
K
Sure — hey guys, thanks for putting this on. I'm not really familiar with Tempo, and I've only done some research into tracing, Jaeger and Zipkin, so this is pretty new territory for me. But I have worked with Grafana for quite a while and developed a couple of plugins; Rick actually helped out on one of the plugins several years ago — I appreciate that. First time here.
C
Sure. Thank you to everyone who showed up and has been interested in Tempo — talking in Slack, making issues, making PRs. We really appreciate it. It's been a fun ride so far. Let's all stick around, let's get v1 out, and then we'll talk about what Tempo v2 is — that'll be the next step for this community. So I'm excited to get there, for sure.