From YouTube: Grafana Tempo Community Call 2023-04-13
Description
Join our next Tempo community call: https://docs.google.com/document/d/1yGsI6ywU-PxZBjmq3p3vAXr9g5yBXSDk4NU8LGo8qeY
What was discussed:
- Red Hat's Tempo Operator
- Tempo 2.1
- Trace by ID tuning!
- TraceQL aggregations built by a community member!
A
Okay, all right, welcome to the Tempo community call, April 2023. What a year! We're going to start with some Red Hat engineers showing us a Tempo operator demo, which is cool, and then we're going to review 2.1. So, is it Andreas? Is that right? Is that how you pronounce your name?
C
Okay, so hi everyone. In the last couple of months we were working on a Kubernetes operator for the Tempo deployment, and exactly a week ago we tagged our first release on GitHub. It's currently in the os-observability GitHub organization, but we plan to transfer it very soon to the Grafana org. Today I want to give you a quick look at, and a demo of, the current state of the operator. I prepared a small minikube cluster, and I set up cert-manager.
C
cert-manager is used to generate the TLS certificate for the webhook of the Tempo operator. Then I set up a load generator, so we'd have a few traces in Tempo once it's set up, and installed the kube-prometheus stack, so we have Prometheus and Grafana running. For the object storage I'm using a MinIO instance, and here is the Tempo operator.
C
We can see the storage: in this example I'm using MinIO with S3. Currently we support configuring S3, Google Cloud Storage, and Azure for a start. This is the name of the secret where the credentials for the object storage are stored, which will later be written to the Tempo configuration file. And the observability field is about observability of the Tempo operator and the Tempo components themselves.
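For reference, such a storage secret for MinIO/S3 might look roughly like this (the key names here are illustrative; check the operator docs for the exact keys it expects):

    apiVersion: v1
    kind: Secret
    metadata:
      name: minio-test              # referenced by name from the Tempo CR
    stringData:
      endpoint: http://minio:9000   # illustrative values
      bucket: tempo
      access_key_id: tempo
      access_key_secret: supersecret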
C
In this demo I enabled the operator to create ServiceMonitors for each type of component, and I also enabled the Jaeger query. That means the Tempo operator starts an additional container with tempo-query, which provides a UI.
C
So I'll just quickly go through the CR. In the images section you can specify which container image you want for the Tempo container. Of course, if you change the default setting, it needs to be compatible with the Tempo version that is the default for that operator version, because if you, for example, just upgrade Tempo here, it probably won't match the Tempo configuration file which the operator generates. But it's nice.
C
For example, if you have some customizations which are not yet released, you can build your own container image and use it here. Then there are some Tempo settings: the limits, which will be put into the Tempo configuration file, and the standard observability settings, which, as mentioned before, are for the operator and the Tempo components.
C
Then we can set the replication factor of the deployment. Resources are the Kubernetes resources for CPU and memory, then retention and so forth. Then the storage, where the storage secret is referenced; storage size is the size of the persistent volume claim used for the ingester component. And in the template we have one section for each component; in this case I just enabled the Jaeger query, so we have the Jaeger UI frontend. In the meanwhile, the load generator has hopefully generated a few traces.
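Putting the walkthrough together, the CR shown looks roughly like this (a sketch: the API group, kind, and field names follow the tempo-operator's published examples, but may differ in your release):

    apiVersion: tempo.grafana.com/v1alpha1
    kind: TempoStack
    metadata:
      name: simplest
    spec:
      storage:
        secret:
          name: minio-test          # the object storage secret shown earlier
          type: s3
      storageSize: 1Gi              # size of the ingester's persistent volume claim
      replicationFactor: 1
      observability:
        metrics:
          createServiceMonitors: true
      template:
        queryFrontend:
          jaegerQuery:
            enabled: true           # starts the tempo-query container with the Jaeger UI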
C
Yeah, maybe it's just slow. It showed up now, but it took a while. Google Meet also always takes a lot of CPU on my machine, so maybe I have some issue with graphics as well. But finally it's here, and you can see the spans. And because I enabled the ServiceMonitors, we also have metrics in our Prometheus instance, all the metrics the Tempo components expose. For example, we can query the build info, and then we have one for each component: the ingester, the distributor, and so on.
C
I also set up Grafana for this demo, and I imported the Tempo operational dashboard from the Tempo upstream repository, which shows us cool metrics. Some graphs are not visible because they use cAdvisor, which I didn't set up locally, but the Tempo ones are showing data. And of course, you can also use Grafana to query Tempo via the search.
A
In the latest Grafana, 9.5 (is that where Grafana is these days?), it's enabled by default; there's no more feature flag, I think. So I think if you get the absolute latest Grafana, it'll just be there. Cool.
C
Yeah, so basically, because there are some breaking changes in the configuration file, we kind of always need to sync the Tempo operator with the Tempo version. So a new operator version will always be required, unless by chance the config file happens to be exactly compatible and upgrading doesn't bring any issues.
C
So yeah, that was basically it for the demo. As mentioned before, we plan to donate the repo to the Grafana org soon.
A
Is there, or I guess in the link, the link will probably have a readme that shows how to actually make use of it now?
A
I forgot: we have a Grafana Slack channel. Cool, awesome. This is awesome work; thanks, Red Hat, for putting the time in and building this. I think it's cool that you all are participating in Tempo and helping us build this new tracing database. Just put it in OpenShift and make it the default OpenShift tracing database next; what's that take?
A
All right, awesome! Thank you so much. Let's move on to Tempo 2.1. I'll review the release notes a bit and talk through some of the big things that are going on here.
A
I'm going to share my screen so we can all see this, because I'll probably pop back and forth between the doc and the release notes a little bit. I think this is the right one; I hope so.
A
Yeah, okay, so 2.1: we cut RC0 yesterday. We had some pretty big merges right before we cut it, or we would have just done 2.1 directly. In particular, we... oops, that's not what I'm about to do.
A
In particular, we updated OTel, and it's a pretty big update, so we're unsure. It's probably fine, but we thought it made sense to be a little bit patient here and not just cut 2.1 immediately.
A
So there are some big changes that went in right before 2.1 RC0; we're going to let those hang out in our clusters for a little bit before we, you know, pull the trigger and do 2.1 itself. I'm on the road a little bit next week (I'm going to be in Chicago for the ObservabilityCON thing), but I really want to release 2.1 next week, and I think there's a good chance.
A
We're going to cut it next week, depending on how slammed I am with the ObservabilityCON stuff. But, breaking changes: we've removed search for the old v2-style blocks. This was announced with 2.0. We're going to leave the v2 blocks in and continue to support them for trace by ID.
A
But we removed all the search code related to that, and that includes a couple of options here and a metric. These just don't make any sense anymore, and they've been pulled from the configs and the metrics. Hopefully the old v2 blocks have fallen out of retention since you installed 2.0 and it won't matter, because you have a bunch of Parquet blocks. And if you're continuing to run v2 because you do your lookups through exemplars or through logs or something, then you can continue, and can always continue, to use it.
A
We changed metric names: tons of our metrics started with cortex_. I've been wanting to do this for a long time; Mario noticed it and finally fixed it. We had tons and tons of metrics like that just because we had these old Cortex dependencies, which are now Mimir and dskit dependencies, but we've changed all those prefixes to tempo_, which is mainly a clarity thing, right?
A
You probably don't even look at any of these metrics, because they're really weird metrics, but if you do, heads up: if you see something break or a dashboard change, you might want to keep an eye on that.
A
And then we added this idea... we're trying to figure out how to do SLO metrics, based on our SLOs for both throughput and latency on a query, and we felt like the best way to do that was to add this feature into Tempo.
A
So you can configure an SLO in Tempo, and it will track metrics for the number of queries that come in and meet the SLO, whether that's by latency or by total bytes scanned, and you can build metrics out of that. We're going to start using this in our Grafana Cloud stuff, which is why we needed the feature. This might be kind of niche, and I'm not sure who else might use it, but heads up in case.
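The knobs look roughly like this in the query-frontend block (a sketch from the 2.1 release notes; verify the option names against the docs for your version):

    query_frontend:
      search:
        duration_slo: 5s                  # a search counts as within SLO if it finishes in 5s...
        throughput_bytes_slo: 1.073e+09   # ...or sustains roughly this many bytes scanned per second

Queries meeting either target increment a within-SLO counter alongside the total, so you can graph the ratio.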
A
Let's see, other interesting things: Go was updated; hopefully that won't break anything too terribly. Feature-wise, we do have a number of new TraceQL features. We support kind, which is a cool one. I don't know why we didn't do this with 2.0; I think we just missed it, but we have kind. It's an intrinsic on the span: every span has a kind, so it can be, say, kind server, and you can search for that.
A
You know, kind client; there are also producer and consumer, and I think some others that I'm not going to remember off the top of my head. Consumer and producer tend to refer to queue-based traces (consumers pull from queues, producers push to queues), while server and client are the more traditional HTTP-request or network-style request types of spans. So it's kind of a cool addition. You can write a query like this now: kind equals server.
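For example (the first query is the one typed in the demo; the second is analogous):

    { kind = server }
    { kind = consumer }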
A
Cool,
oh,
we
added
arbitrary
math.
So
let's
put
some
Trace
keyl
stuff
in
here.
Well,
this
is
a
fun
one
to
add
because
it
makes
for
silly
queries,
but
we
added
arbitrary
math.
So
you
can
do
this
and
I
I'm
just
going
to
make
a
giant
tree
here
math.
So
you
can
do
like
you
know,
span
dot,
bytes
process
greater
than
10
times,
so
you
can
do
this
kind
of
work,
which
is
fun.
So
you
can
write
mathematical
kind
of
statements
in
your
in
your
query
and
it
will
do
the
right
work.
A
It'll correctly evaluate it. You could also do the opposite of this, so we could go, you know, bytes processed divided by...
A
If that makes more sense to you, yeah. It might be useful to write some queries that read a little bit better than having to type that full number in there, and you might find some other uses. Technically you can do stuff like this: span.jobs divided by span.bytes, or something, greater than some threshold number like three. If you have different numeric attributes, you can do math with those as well. So that's another addition to TraceQL in this release.
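Roughly what was typed, with attribute names that are just examples from the demo cluster, not standard conventions:

    { span.bytes_processed > 10 * 1024 * 1024 }   # reads better than typing 10485760
    { span.jobs / span.bytes > 3 }                # math between two numeric attributes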
A
We added new aggregates. Previously we had average and count, I think. We've added min and max, so min of some field greater than a value, and then we've also added max. I thought there was a third one; my changelog entry said... oops, it said min, max, and average, but we had average already, right?
A
That's incorrect; let's go look at that, because average was in 2.0. Sum, my bad! I should fix that changelog entry. So then, sum: I don't know why you would ever want to sum a bunch of durations, but hey, maybe you do, sum of duration greater than one hour. So min, max, and sum are our new aggregates in TraceQL, which is a cool addition.
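Illustrative examples of the new aggregates (the attribute name is made up):

    { } | min(duration) > 1s
    { } | max(span.bytes_processed) > 1000000
    { } | sum(duration) > 1h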
A
We also fixed some bugs. Before this fix, let's see, a number comparison would return nothing, which was kind of difficult to understand. We've fixed it so that all numbers compare to all other numbers. It was due to the data type: an integer compared to a float would always return false. So we fixed all the number type comparisons to work, and any number compared to any other number will convert correctly.
A
So this will now return traces where 2.0 would not have returned traces. Oh, and then I think Jenny, who is not on this call, fixed an issue where you couldn't write duration greater than 1.5 seconds; you had to write duration greater than, you know, one second and 500 milliseconds, basically. Now we've made it so you can do float durations, which is kind of nice and a little bit more succinct.
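That is, both of these now work, the first being the new, shorter form:

    { duration > 1.5s }
    { duration > 1s500ms }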
A
It doesn't really add new abilities, but I think it's a lot more natural to write 1.5 seconds than one second 500 milliseconds, which worked but was kind of annoying. So I think those are the TraceQL fixes and features. This is a big one too... where's the... well, I can't find it.
A
But there is a really nice improvement to TraceQL performance, where we pull a lot less data before we assert the conditions. Previously we would pull far too much data, then assert the conditions and throw a bunch of it away. We made a change to target the data that we pull a lot more tightly, and it makes for much more efficient TraceQL queries.
A
We do have a ways to go to continue to improve it, but in this release you should see some really nice changes. This particular performance improvement, which is particularly about memory and bandwidth, will also let you increase the target job size bytes in the query frontend; you can really beef this up quite a bit. We run it at 200 megabytes in our cloud offerings and, I think, around 700 megabytes in our internal ops cluster.
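That setting, roughly (assuming it's the query frontend's search job size option; check the configuration reference for your version):

    query_frontend:
      search:
        target_bytes_per_job: 209715200   # ~200MB per search job, the value mentioned for their cloud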
A
Is there anything else? Oh, Alton, who I don't see (I don't think he made it to this call), added the ability for the metrics-generator to attempt to upsample, basically, using the sample ratio. The metrics-generator just counts spans...
A
...exactly as it receives them, but OpenTelemetry has added, in one specific case, some information about the sampling ratio of the spans, like how many spans that one span represents, and with this PR you can use that to inflate your metrics to better represent the real traffic, the true traffic before sampling, which is kind of cool.
A
Oh, this is a funny one. This was broken before; you couldn't do this. This is a TraceQL query that didn't work, which was weird: we couldn't write count greater than negative one. That broke. So that's been fixed, and you can do negative values in your aggregates.
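The query in question was along these lines:

    { } | count() > -1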
And I think that might be roughly it. Zach, do you know anything else in here that's worth calling out? The check-config flag... oh, Azure workload identity, so a new way to auth to Azure backends.
A
I think those are the big ones. Yeah.
A
Right, this vParquet2 is kind of a cool change. It's not the default yet, but we're trying to build a Parquet file that works with a wider array of tooling, and we found that some of the choices we made, for whatever reason, are not compatible with off-the-shelf Parquet tooling. So Adrian worked on this, and it brings us closer to that. It also adds a couple of columns that we're going to use for structural queries.
A
Those are queries like the descendant operator, the parent operator, and the sibling operators. We need to add a couple more columns for these, and they were added in vParquet2, so he's working on populating those columns, and then we'll be able to add the functionality for those descendant operators, which I think are probably among the most exciting things about TraceQL. Definitely looking forward to that. Cool.
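For context, the structural operators being referred to look like this in TraceQL (descendant and sibling shown; these don't function until those columns are populated):

    { resource.service.name = "frontend" } >> { status = error }   # descendant
    { span.http.method = "GET" } ~ { status = error }              # sibling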
A
All right, I think that's a good overview of 2.1. Any questions about 2.1, or TraceQL, or just running Tempo, or anything else?
E
Actually, yeah. I was testing... I was talking with Marty because I was having some issues querying. I was doing some querying using some tags that don't have columns in Parquet, looking at the last 24 hours. I had six queriers with two cores and eight gigabytes of memory, and they were just dying all the time, just restarting. Then I tried with the main branch, so basically now the 2.1, and yeah.
E
The problem looks like it's solved now; I can even do a seven-day view that I couldn't do before. Okay, let's say I would like to configure the columns myself. For example, I know that Tempo has http.url (I think it was URL, no, well, star URL, yeah) as a column, but http.target for me is better to aggregate on in some cases, because it doesn't contain query strings.
A
Yeah, so Adrian is working on that project as well. I think he's trying to weigh... there are two directions we've been talking about going, and it's something that's on his list. One is where you configure it, so you statically configure "these are my special columns", and you get to name the columns that get pulled out into those extra columns.
A
He is really hot on trying to just make every column dynamic, though, which would be amazing but, I think, very difficult. So it's a discussion we're having internally, and we are moving forward with it. We totally agree it needs to exist; we want it for our cloud offering, because we have tenants who have all kinds of different columns.
A
...so the information you get back from the query is immediately useful, more useful than it is now. So, write queries and learn things about your traces, and don't jump to a specific trace, but repeatedly write queries to get aggregate information. You could do a grouping, right, like average duration grouped by service name, and I want a result set that immediately communicates that to you. Maybe you then jump over to some specific traces, but I want that data right there, immediately in front of you. That's something we've been working on internally; I've got a couple of design docs out on it, and I really want to move forward. I really want the TraceQL experience to be a learning experience, where you're writing a query and learning about your traces and the structure of your data and iterating on that, and as you're iterating on it you learn, and eventually you get to the results you want. And that comes from my own experience using Prometheus and Loki: when I'm writing a Prometheus query...
A
...I'm learning the whole time. I write a very basic query, and then I aggregate by this, and then I make a change; I do a histogram on a different value; I can pop the histogram quantile around a little bit. And that learning experience teaches you about your application. I really want the same for traces, so it's something I've been trying to drive internally some, and I think that's going to be the next step for TraceQL: a better experience in the frontend.
B
The other thing I'm excited about, that I don't think we're ready to share any screenshots of, is some work going on in Grafana that is going to make viewing large traces way better. We got a little preview of that. I don't think we're ready to show anything, but it's going to be awesome, so stay tuned.
A
Right, all right. Anything else?
A
That's a good question. In our ops cluster... let me go do some digging. We get a trace back in, what do you say, what do we say, Zach, a couple of seconds? And we're talking about...
A
I don't even know if we're at over a petabyte of data, billions of traces. It does require some tuning when you get much larger; there are some touch points, some things you can hit to improve...
A
...trace queries, and I do want to make better improvements to it. It's kind of an area that we've ignored a little bit while we've done TraceQL on Parquet, and I know it slipped from v2 in terms of performance, just the trace by ID search. But in our ops cluster I'd say four to five seconds for, yeah, billions of traces and a terabyte. So it's one of those things that I want to improve, but it's not glaring, and so I have not pushed on it.
D
So in our production cluster... basically, in stage we get data back in two seconds, but in the production cluster we have been dealing with this weird issue in the last two days. One of our clients went rogue and pushed a ton of data, so now we have 40-plus terabytes of data, but when we try to query the last hour, or the last couple of hours, or an hourly time range, the trace queries take more than 60 seconds, or sometimes just time out. So we tried tuning things like the workers.
A
The things I would look at, if you're kind of in this world, are definitely the query shards. Query frontend, trace by ID, query shards: set that to 250. There's a max of 255 because of the way Annanay wrote it three years ago.
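That knob, roughly (assuming the trace-by-ID sharding option under the query frontend; verify against your version's config reference):

    query_frontend:
      trace_by_id:
        query_shards: 250   # capped at 255 today, as discussed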
A
If anybody wants to take that on so the query shards can go higher, please do; it's an issue that's marked as a good first issue, and we just have never had time. But put that at 250; that's a good initial thing to do. Okay, then I think the next question I would have is how long your blocklist is. The longer the blocklist, the more time it's going to take, and I would encourage you to increase compactors. We've seen...
D
The blocklist is at this moment at 50,000. We are trying to bring it down, but yeah, it's at 50,000 for sure.
A
Yeah, so 60 seconds is way too long; we can get it below 60 seconds. But 50,000 is kind of approaching sizes where we have seen longer times for traces, though definitely not 60 seconds. Oh, the size of the trace also matters a lot: if it gets into the tens or hundreds of megs, then Tempo does struggle to return traces at that size. Okay.
A
I would just review the size of the traces. Yeah, we had a customer try to pull a 400 megabyte trace a couple of days ago, and they filed an issue like, "we can't get this trace in"; it's 400 megs. Like, sorry, Tempo cannot return your 400 megabyte trace to your phone. So the size of the trace does have a big impact. What I would do is, after you try those things, start a discussion on the GitHub; that's a great place to have kind of a long-running conversation about tuning.
A
We just went through one on compacting and pushing blocks, and I think we have one going on where somebody's looking at TraceQL sharding. So the best thing you can do is file an issue there and put in your config, give us some good metrics, and it'll probably go back and forth for a couple of weeks.
C
Just curious, you guys were chatting about the blocklist; is that in the compactor config settings, or...?
A
The compactor pushes the blocklist down, but there's no max blocklist length; it's really your retention, right, so it'll start deleting blocks after their retention is up. But then there's a metric called tempodb_blocklist_length, or something, I'm not sure off the top of my head, but that'll tell you...
A
...how long the blocklist is. And yeah, like I said, twenty to thirty thousand has always been fine for us, and it's always been our target; 80,000-plus is when we really started to see problems and started panicking a little bit.
A
It'd be interesting to do retention based on total count, like retention based on 20,000 blocks, whatever that turns out to be, and just delete the oldest blocks when you hit that. I don't know; not sure if anybody would use that or not.
D
Just out of curiosity, what is that blocklist? Is the blocklist the count of blocks which we have in the backend, in the storage? Or is it something else?
A
And the reason that impacts trace by ID search is because we have to look in every single block. And so compaction... if we didn't have trace by ID search, compaction would be way less important, because keeping that list down is what makes search better. Oh, another thing you can do is use memcached. With memcached you can cache the bloom filters, which reduces pressure on S3 or Azure or whatever, and should speed up your queries a bit as well.
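A sketch of that caching setup in the storage block (option names vary by version; check the configuration docs):

    storage:
      trace:
        cache: memcached          # cache bloom filters (and more) in memcached
        memcached:
          host: memcached         # illustrative service address
          service: memcached-client
          timeout: 500ms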
E
Maybe, if you have one minute, I can also show what I mentioned on the last community call. I feel it's very hacky, but I built a little thing in Grafana to aggregate the Tempo traces, well, spans, and I think it's a little bit aligned with what you were saying: how we analyze the data when there is an error, how we go into it to get familiar with what is going on. Maybe I can show it. It is not pretty.
E
I will show you this environment. This... oh, I need to give permissions to this.
A
He's been talking about this for a while, and I've been excited to see it. We might make you write a blog post about it or something. Fast, though, sorry bud.
A
What is your... sorry, I can't see your... what's your ingest rate, is it...?
D
I don't know the bytes; I know the samples yesterday were around 150k, and it went down. We got it down to 70,000 spans per second. I can quickly take a look at that; I don't know.
A
That's pretty good, 150,000. Sizable cluster; that's nice. Yeah.
D
So we have been playing a lot with queriers because, again, of the traffic. We were running like 10 queriers yesterday, and that didn't help, because, again, I don't know why...
D
...but somehow, even though the query frontend said the requests were over, like timed out, the queriers kept on searching for those blocks, and that kind of increased the load on the backend, and so we were getting read timeouts back again. So what we ended up doing was bringing it down to five and seeing if that helped; that kind of helped for a bit, but we are still seeing some issues. So yeah.
A
Let's chat in a discussion; I think you're on to something important. We run like 50 to 100 queriers, so we run tons of very small queriers. Okay, so that would be my recommendation. But I have always been suspicious that a context cancel does not correctly propagate through the entire system.
A
So if you're seeing that, I don't doubt it. If you get us some graphs and metrics and logs to help us track this down, I think we could probably help diagnose it and fix it for both of us.
E
Let me try again: can you see my screen now? Okay, cool. So normally we have these, right: we just write some queries. And there is a trick you can do, for example this, and then Tempo will return these attributes back. That's quite neat, because now I can use this to build aggregations.
E
So if I take this same query here... I just built this very silly thingy, but I can now do aggregation. So now I can see, for example, in this environment, a count by status code. The way it decides which attribute to aggregate on is the one that has the wildcard, so I could put whatever with a wildcard, and then it knows.
E
I don't have autocomplete, so that's not nice, but yeah, now I know it's some service called php-something that is doing all of this, so it already narrowed down the issue for me. A lot of times when I'm looking at errors, I just try to narrow the issue down to see what the root cause is, and so far this has helped me. It's quite simple, but like this I can do nice aggregation.
A
Was that a table or a graph, on a plugin? How did you do that processing?
E
So I'm using, maybe you can share it also, have you seen a plugin called Infinity? So Infinity is a data source, and it allows me to do HTTP requests, and then I have a small backend service. It's just a small Go application that is doing this aggregation for me and sending it back to Grafana already aggregated.
A
In my table... yeah, okay, well, that's cool. I think that's really neat, and it is really in line with what I'm thinking about next, so it's cool that you're already doing it. It's so worth doing that you wrote a whole service to do it and hacked it into Grafana; I think that kind of validates some of the thoughts we've had as we work on this. We'll definitely talk about it on the community call; I want your thoughts on it as we move forward with it, because you're, you know, using something very similar here.
A
All right team, how are we doing? Anything else?