From YouTube: Grafana Tempo Community Call 2023-09-14
Description
Join our next Tempo community call: https://docs.google.com/document/d/1yGsI6ywU-PxZBjmq3p3vAXr9g5yBXSDk4NU8LGo8qeY/edit
What was discussed:
- v2.2.3
- Upcoming TraceQL changes
- Highlight of some community PRs
- Overrides module refactor
- Lots of good community questions
A: We have a somewhat short agenda, I suppose. As always, feel free to throw questions in the chat; I'll watch that a little bit. Or feel free to unmute and talk, if that's your style. No big deal either way.
A: This is not a canned presentation; you're more than welcome to ask questions and have a conversation, and at the end we'll open up the floor to anything. So if you do have some more generic questions, feel free to think about those throughout the course of the meeting. I suppose I should share. Let me share my screen here, and that way we can work through this agenda.
A: All right, so v2.2.3 was released. This has been the release with the most patches, and it's been about this S3 credentials change. It's kind of hard for us to test every single one of the thousand different S3 authentication options. Azure is kind of in a similar bucket, and GCS, I suppose, as well. But anyway, in 2.2 we made a change to the way we do S3 auth, and it broke some people in ways that I do not understand, because I do not know all the intricacies of S3. So we released 2.2.2, which broke a different set of people, and now 2.2.3 makes this configurable. I believe this is the same way Mimir does it. Is that true, Mario? Mimir has a very similar option?
C: Yes, yes, it's even the same config parameters. Same everything.
A: Okay. I think this should mimic Mimir, so if you use Mimir or Loki, hopefully you have a similar kind of experience. You can choose the auth type explicitly, and then the old chain of credentials works the way it has always worked. So there you go: v2.2.3, if you are interested. I suppose I should update the Helm chart; if anybody's super excited about updating Helm charts, PRs are welcome. If not, I will get to it when I get to it. In fact, I'm going to make a note about that. All right, so 2.2.3 is out. I wouldn't even call it really rocky; it's just this one particular issue we've kind of ping-ponged on a bit, which is kind of frustrating. It speaks to the fact that we don't have amazing integration tests on all of the different auth types on all the different backends, but I don't think that's within our realm of, you know, sanity.
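As a rough sketch of what's being described (the field names below are assumptions for illustration, not confirmed in the call; check the Tempo storage configuration docs for the exact option added in v2.2.3):

```yaml
# Hypothetical sketch of the S3 storage block with an explicit auth choice.
# Field names are illustrative; consult the Tempo configuration reference.
storage:
  trace:
    backend: s3
    s3:
      bucket: tempo-traces
      endpoint: s3.us-east-1.amazonaws.com
      # Either supply static credentials explicitly...
      access_key: ${AWS_ACCESS_KEY_ID}
      secret_key: ${AWS_SECRET_ACCESS_KEY}
      # ...or omit them and let the SDK's default credential chain
      # resolve auth the way it always has.
```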
A: That's an extension of TraceQL that's going to add support for arrays and KV maps; support for attributes on links and events; a lot of the things that are currently missing. Once we have what we believe is a solid set of patterns, we're going to write a big PR that proposes all of that as a design doc in the public repo and ask for comments. So we're going to come up with a set of patterns we like.
A: We don't want to just start working on something that everyone hates; we want everyone to have a voice here. So we're going to go a first round internally, and once we have something we can agree on internally, we will make it external. We want those comments so we can adjust, but we've already published two elements of this early, to get some feedback.
A: The first is the way we want to do arrays and KV maps. An OpenTelemetry attribute can be an array, or a list. We want to be able to index into that list and do comparisons on the data type at that index, and we also want to be able to say a kind of wildcard: for all elements in this array, if any match this condition, say greater than or equal to five, the expression would be true. So, essentially, a new way to access elements of the array. We believe this is pretty standard syntax and people will feel familiar with it. If anything is a little controversial, it's the wildcard part; we talked through different ways to do it and settled on this, but again, we want feedback; we want thoughts from the community. And the second one is the KV map. This is like a dictionary, right, or a map: you have a string, and the string references a value. So we want to be able to look up by string, of course, foo equals bar, and this is what we settled on for the wildcard here: we wanted to be able to basically write a regex against that.
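Purely as illustration of the shapes being described (the syntax is exactly what the design doc is asking for comments on, so nothing below is final, and the attribute names are invented):

```
# Index into an array attribute (proposed; syntax not final):
{ span.http.response.header.values[0] = "gzip" }

# Wildcard over all array elements: true if any element matches
# the condition (proposed; syntax not final):
{ span.retry.delays[*] >= 5 }

# KV map lookup by key, and a regex "wildcard" across keys
# (proposed; syntax not final):
{ span.headers["foo"] = "bar" }
{ span.headers[".*"] =~ "bar.*" }
```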
A: In the event that you wanted to match multiple keys, we settled on the wildcard regex to essentially mean "all". So please give us feedback on this issue; it's linked in the community call doc, and we're kind of outlining our next steps there. And, like I said, there's going to be one big PR which outlines all of this, and it's going to be another good place to have a discussion with the developers and try to come to agreement, as a group, on how we want to move TraceQL forward. Marty and I have haggled a bit here, but I appreciate comments from anyone.
A: The second one, I think, is a little bit more straightforward. OTel, for unknown reasons, allows any character in a label name, which is kind of insanity. I think it's all UTF-8, so all emojis are in there, right: periods, underscores, quotes, double quotes. Literally anything is a valid name of an attribute, which Tempo does not support; you can't have a space in an attribute right now in TraceQL.
A: So this is what we've decided to do: if you want one of these escaped characters, one of these invalid characters, you open quotes and you type your label name, with spaces or whatever weird characters Tempo doesn't otherwise support. Pluses, or any operator, are another good example: we don't support operators in attribute names right now, because we can't distinguish between an actual plus and the operator.
A: So we're going to do this: open some quotes, type in your name with all your weird characters that don't count, and then close quotes. That's the second one we've published publicly and another one that we are looking for comments on, too. This one feels more straightforward to me, but I don't know; maybe people do feel strongly about the way this would work. And then you can use pluses in names, or minuses, or spaces. Cool.
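A sketch of the quoting idea as described (attribute names invented for illustration; the exact escaping rules are what the published proposal will pin down):

```
# Quoted attribute names (proposed; exact escaping rules not final):
{ span."attribute with spaces" = "foo" }
{ span."latency+jitter" > 5ms }
```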
A: Finally, for my part, before I kick it off to other folks: community contributions. We've had some really solid community contributions in the past month or two, and I really wanted to bring those to the attention of the community call, just to acknowledge some of this work being done. I don't really know these people's names, but Lassie Hills: he rewrote, or rather moved us, from an old Azure SDK to the new one.
A: I think I'm just super impressed with the amount of time this person spent diagnosing an issue that we never see. They were seeing some kind of race condition on ingesters, and this went on for months. I did my best to help them, but we don't see it, so I couldn't diagnose it from home. They eventually dug deep into the Azure SDK and found a race condition that was causing blocks not to be flushed correctly from ingesters. This is kind of a fun read; if you're interested, this 222129 issue is a really nice deep dive into Tempo and the Azure SDK. This person spent a lot of time swapping us from the old SDK to the new, currently maintained Azure SDK, and thanks to them for putting that together for us. Secondly, Kousik Mitra.
A: This developer has given us three really solid PRs: to add status messages, to add ancestor and parent operators (new structural operators), and to add a not-regex operator as well. Really cool contributions from Kousik for Tempo; I super appreciate those. All right, with that, I will hand it over to whoever would like to discuss the overrides module; the limits, I suppose, are all wrapped up in this. Yeah.
C: Right, okay. I think we're okay just sharing this. Okay. So the next change: there has been a major refactor of the overrides module. I think the overrides module outgrew its initial scope, and it was in need of some work to clean up how we configure and define overrides.
C: This refactor contains two main changes. The easier-to-see one, the first one, is that we moved the defaults, all the default overrides that you would configure right after you declare the overrides block, into a new scope called defaults; they're indented under it. Pretty straightforward, I think. It was a bit confusing before: when you were declaring these defaults, they weren't explicit defaults, and then you would have a different file with the per-tenant overrides.
C: It wasn't really clear, and the theory now is that everything is scoped under defaults. The second change, which is more impactful, I guess, when you see it: we used to have unscoped override parameters. As you will see in the old config here in the document, we had ingestion_rate_strategy or metrics_generator_processors.
C: So there, the scope of the override was implicit in the name, and that was initially okay, but we kept adding more and more and more overrides, and we ended up with some very long, funny names, like metrics_generator_processor_service_graphs_enable_client_server_prefix; it was like 50 characters long. It was very difficult to follow when you were reading it, and it was impossible to type. So what we did is simply create scopes for these overrides. So now, for this massive parameter I was mentioning...
C: ...you first have a metrics_generator block, where all the parameters that affect the metrics generator are located; then another block for the service graphs processor; and then, within that block, any configuration that affects that specific processor. What we hope is that they're easier to read and write, and also easier to discover. Some of the overrides were pretty self-explanatory before, like ingestion_rate_strategy, which clearly affects ingestion, but they weren't clearly grouped.
C: Take the read block: there were some parameters where it was unclear that they applied to the read path, and you had to go through every single one thinking, well, does this apply to the read path or not? Now they're all grouped in one block.
C: So those are the changes. This entire refactor is backwards compatible, so we will support the old format, unscoped and without the defaults block, very likely for a very long time, because this is a very big change; so don't be worried about that. And, I guess, eventually, at some point, we will move to the new format. There is a command on the CLI tool, tempo-cli.
C: You can find it inside the cmd folder; it migrates this config. You can feed it a Tempo config and it will output the new version. The only caveat I think is worth mentioning: if you decide to migrate, you have to either migrate everything or nothing. What this means is you either have everything scoped, with the ingestion, read, and metrics_generator blocks, in both the defaults and the per-tenant overrides, or you have the old format. You cannot have scoped defaults and unscoped tenant overrides; you have to choose one or the other for both files, or both places where you can configure these. So, yeah, those are the changes. I hope it was clear enough to follow.
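As a rough before/after sketch of what was just described (the specific parameter names and values are illustrative; the Tempo configuration docs have the authoritative list):

```yaml
# Old, unscoped format: scope implicit in long flat names.
overrides:
  ingestion_rate_strategy: local
  metrics_generator_processors: [service-graphs]

# New, scoped format: an explicit defaults block plus per-area scopes.
overrides:
  defaults:
    ingestion:
      rate_strategy: local
    metrics_generator:
      processors: [service-graphs]
```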
E: Yeah, something else I wanted to mention: the biggest downside of this change is that the error messages might be more complicated, because of how the parsing will work. We will first try to parse the configuration with the new structure; if that fails, we try again with the old structure; and, if that fails, we'll return the error.
E: We're still kind of figuring out what the best way would be to show these errors. I had a PR up to try to smooth it out, but, yeah, it may be confusing at first.
A: Cool, all right. This was always something of a mess; it's nice to see it get cleaned up a bit and have some nice structure here. I do think it's going to be kind of a hard migration internally, because we have tons of deployment configuration code built around the old way, and I guess we're going to have to migrate all of that to the new way safely, which will be fun.
A: What I thought I was going to hear when I asked was when we're deprecating the old way. All right, cool. So that's the official agenda. I'd be willing to take questions; Dean would be willing to take questions; about the project, or about your installs, or, you know, whatever people are interested in: roadmap, future stuff. We did just have an interesting production incident. I could talk through that, if people are interested in hearing what went wrong and what changes we're going to make to fix it, but I'm not going to talk about it unless someone asks me to.
A: I saw your thumbs-up emoji; I've taken it, I think that counts. Thank you. All right, here we go. So I'll type in some details here, I suppose. It was on the query path: the write path was stable, but on the read path we started seeing OOMs in queriers and query frontends, pretty aggressively. And what it boils down to is that Tempo does not protect itself well on the read path, specifically from trace-by-ID queries.
A: So, this is the change we're going to make to fix this, and I think there are two components here. One: we have a max trace size, but it is not enforced on the query path in Parquet. It was enforced on the query path in the old v2 format, but it's really hard to do in Parquet, because in the old format your trace was just a bunch of bytes in one spot; it was all proto. It was really easy: if it was greater than a certain length, just don't return it. But in Parquet your trace is split across hundreds of little columns, and you can't really count the size of it; it's very difficult. So we kind of punted on it, and we just got burned on it, basically, by a customer repeatedly querying, over and over again, for these very large traces.
A
So
one
is
better
large
Trace
protection
on
V
Park,
a
H
right
now,
when
you
say
Max
size
is
10
megabytes
the
ingestion
path
tries
to
protect.
It
tries
to
prevent
that
prevent
you
from
going
over
10
megabytes,
but
it's
impossible
to
perfectly
protect
ourselves
right
like
if
you
write
10
megabytes
of
a
trace.
We
would
accept
that
and
then,
if
you
wait
an
hour
and
write
another
10
megabytes
of
a
trace,
we've
already
flushed
that
old
trace
of
the
back
end.
So
we
continue
to
accept
it.
A: There's no real way to scan your 30-day, or 15-day, or whatever-day retention for a trace ID to start refusing it; it's impossible. So you can always push more than your max trace size, if you do it right. So: better protection on the query path. The query side needs to protect itself, because you can request traces that are too large.
A: Now, the second thing that got us, and this is something I've known in the back of my head needs improvement and never really have done: the querier has a configurable number of jobs, the number of jobs it wants to accept. The problem is it does not distinguish between trace-by-ID and search; search and trace-by-ID jobs are the same to it.
A: These are both just jobs to it. So it says, I can do 100 jobs, or I can do 10 jobs, and the query frontend just starts throwing jobs at it, and it doesn't distinguish between trace-by-ID and search. And search jobs tend to be very small and quick, so our query path in prod is tuned for search; people's searches are pretty quick, and we want that, right? The problem is if you start slamming tons of trace-by-ID queries in there.
A: Looking up a trace by ID is a much heavier job than a search job. So when you are overwhelmed with trace-by-ID lookups, the queriers are trying to take on way more jobs than they can really handle, which then, in turn, causes them to OOM. So I think the two major fixes I want are, first, better large-trace protection; we pushed a band-aid to prod just to get it to stop happening, and I'll turn that into a real PR.
A: Soon, in fact; I have my fake PR up, or my band-aid PR, somewhere. We can actually pull that up, if you all are interested. And then the second one is to split the concept of the two job paths. For search, we still want the querier to bring in a lot of jobs and try to do as much as possible, because search jobs are tiny, like we said; but trace-by-ID jobs...
A: ...we want to be able to configure those independently: do a hundred search jobs, but only try five trace-by-ID jobs, or something like that, right? So those are the two big fixes. I think there are also some small improvements; for instance, we just noticed that the way we do our retries doesn't really work well in some of these scenarios. So I'm going to spend some time on this, I think, this next month.
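The split described above, separate concurrency budgets per job type instead of one shared job count, can be sketched like this (a minimal Python illustration with made-up limits, not Tempo's actual querier code):

```python
import threading

class JobLimiter:
    """Separate concurrency budgets per job type, so a flood of heavy
    trace-by-ID jobs can't consume the whole budget tuned for cheap
    search jobs. The limits are invented for illustration."""

    def __init__(self, limits: dict[str, int]):
        self._sems = {kind: threading.Semaphore(n) for kind, n in limits.items()}

    def try_acquire(self, kind: str) -> bool:
        # Non-blocking: on False the frontend would requeue or backpressure.
        return self._sems[kind].acquire(blocking=False)

    def release(self, kind: str) -> None:
        self._sems[kind].release()

# e.g. allow many small search jobs but few heavy trace-by-ID jobs:
limiter = JobLimiter({"search": 100, "trace_by_id": 5})
```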
A: All right, team: anything else? Questions about Tempo in general? What I'm going to eat for lunch? Oh, oh yeah, sure, go ahead, Andreas.
F: Yeah, I had a question about the search API, but you kind of already answered it. It was about: currently, if you run a TraceQL query, you only get the spans returned which are in the query, and I was wondering if it's feasible to optionally return the entire trace. But you just mentioned that it's a very expensive operation. Why is it, exactly? Is it because the data is kind of split into multiple columns in the same block, so you need to...
A: So, yeah: when we return a trace by ID, we have to pull it, like you said, from a lot of different columns. We actually move it through a couple of different formats, which is always a problem in Go, where you have a rich, deep object model, and then you translate it into a second rich, deep object model. That tends to use a lot of memory, because of all of the heap allocations, and that's a big part of what's happening.
A: Now, I'm really excited about Go memory arenas, because I think they will be great for Tempo. Tempo often has really enormous, rich objects; it's a trace, right? It's all tiered out: all the spans, all their attributes, all these references, which we individually put on the heap, and then we tell the garbage collector, please figure out those 10,000 references. With memory arenas, we can begin an arena, allocate all of those elements of the trace into that arena, and then the heap can treat it as a single object.
A: We actually can return more spans on the search path than we do, because on the search path we don't reconstruct the entire trace. We only look for very specific fields, like duration and name; we don't build or find all the attributes. We don't care; it's only the ones you asked for, as well as a couple of metadata attributes.
A: In fact, just today, Grafana merged a PR to make that configurable. Tempo right now has a parameter on the search path called spss, which was actually community contributed. SPSS is "spans per span set": on the search API, you know, blah blah blah, ampersand spss equals 30, and you could ask for 30 spans per returned span set, or 50, or 100, or whatever. That is in Grafana main right now; tip-of-main Grafana does this, but obviously it's not been released. So that's coming soon, where you can, in Grafana, say, I actually want more spans per span set, and that would allow you to start retrieving larger and larger percentages of your traces, which I see people wanting, for sure.
A: The other thing is structural information. When you do a query right now, you just see, like, three spans, right? You see some basic metadata. I want to return enough information for Grafana to render, like, a small tree there. So if there's a child relationship, or a descendant relationship, I want it to do its best to render that. A lot of the time it won't be able to, but sometimes it will, and I think that will give you good information; that also will increase the value of this parameter.
A: It would be awesome to be able to say, give me 100 spans, if you felt comfortable with that, or if you needed that much, and it would render a large percentage of the trace, perhaps, and give you a really cool idea of: all right, I wrote this query, and now I'm seeing a lot about what it returned in terms of the structure of the trace. Which I think is really cool.
A: Can you get my email? Yeah, man, sure. I think it's public; it's on our website, I believe. So, I have a strong idea of where I want Tempo search results to go: I want you to be able to ask for more and more of the trace.
A: I want to render the trace as a trace, more completely, to give this kind of experience where you're only jumping into the actual trace by ID more rarely, when you need to see the whole thing, and most of the time you're actually able to stay in search results and write queries that reveal the structure of a trace and the answers to the questions you have about that trace. I don't know; that's just where I want it to go, and we're slowly heading in that direction, I think.
F: Basically, yeah.
A: I mean, I think Jaeger returning entire traces is a mistake in their search, because it's just so much data. You do a search and it returns everything, every time; it's slowing it down, and it's preventing them from doing more interesting things in their return. I think that's actually something Yuri wanted to fix for a long time. I noticed it at some point; everyone who digs into Jaeger notices it eventually, and everyone wants to kind of clean it up, because there's just so much data returned.
A: If you have a hundred-thousand-span trace and your search happens to hit it, it just blows the frontend up, for no reason, when all it really needs to return is metadata, right? So I don't think we'll grow Tempo this way. If someone really wanted it, and submitted a PR to do it, and made it a feature flag, or just put it behind a parameter, I think we'd consider taking it, but I don't have interest in moving Tempo in a direction where our search renders and returns an entire trace.
A: I think it would be more palatable to come up with a list of things we consider metadata, like the count of the errors, or a count or list of the services, or something, and include that in the Parquet schema at the highest level, at the trace level, and then return that. If we needed to decorate our metadata, or make our metadata richer, that's, I think, a better way to go about doing it. So, at the highest level of the trace, we do have some metadata already; I can go find it.
A: We already store some metadata: root service name, root span name, and then this is the trace itself; this, actually, is the full thing, right, going into it; start time, end time. This text is just to make things a little bit easier. So here, I think we could discuss adding, right, a count of errors, or a list of the services that participated in the trace, essentially. I would be a little concerned about it...
A: I think maybe we would put a limit, in case somebody had some crazy trace with 10,000 services; you know, we'd maybe have a way to stop accumulating services in the list, to prevent blowing up Parquet. But I do think the idea, generally, is very good. And then we would, I don't even know, it's like this code here, or the combine code; there are a couple of places where you would have to add a little code to do the calculation. Essentially, it's not as bad as you think, really.
A: You can kind of look for the places where we set this metadata, because we do that, I think, during combine, and there are only two places: when we translate a proto trace to a Parquet trace is the first one; the second one is any time we combine two Parquet traces. I think those are the only two places where you need to worry about this metadata.
A: Right, and the combine function already iterates through both of the two traces. It already does the job of: let me check every single span in every single trace, and throw away duplicate spans, and bring in unique spans, to make sure I have a complete trace. It's doing so much work already that adding, well, let me add a counter for errors, or let me accumulate a couple of service names, will cost basically nothing compared to what it's doing already.
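As a sketch of that idea (in Python rather than Tempo's Go, with invented field names): accumulate the extra trace-level metadata inside the pass combine already makes over every span, with a cap on the service list as discussed:

```python
MAX_SERVICES = 100  # illustrative cap, to avoid blowing up the schema

def combine_metadata(spans: list[dict]) -> dict:
    """Accumulate trace-level metadata (error count, participating
    services) during the per-span pass combine already performs.
    The span fields here are invented stand-ins."""
    error_count = 0
    services: list[str] = []
    for span in spans:
        if span.get("status") == "error":
            error_count += 1
        svc = span.get("service")
        # Cap accumulation, in case of a pathological trace with
        # thousands of distinct services.
        if svc and svc not in services and len(services) < MAX_SERVICES:
            services.append(svc)
    return {"error_count": error_count, "services": services}
```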
A: I will say that schema changes require a new schema, unfortunately, so this would be a vParquet4; but we are working on vParquet4 right now. So, to what Andreas asked: if you have interest, I think we can talk about that as an extension. It's a good time to add something, basically, if we want to add another field.
A: How is progress going on v3? Weeks or months away? vParquet3 we use internally, in prod, everywhere, and it's fantastic. It has already saved us from a couple of different incidents. The speed, the ability to pull out a column and put it in a dedicated column, is amazing: it helps us on speed, it helps us on memory, it helps on everything. It's just been a blessing, frankly. Props to Mario and Adrian, who spent months designing and building and working on that; I expected it to be way rockier getting into production.
A: It is the default now in all of our production clusters; I think it's actually the default now at tip of main, frankly. And, yeah, the performance improvements are amazing: we're scanning terabytes a second, easily, when you search for span-level attributes that have been promoted to these dedicated columns. And almost better than that, better than pulling out the columns...
A
You
want
to
query
it's
taking
out
the
big
columns,
like
you,
have
a
developer
development
team
that
puts
a
bunch
of
DB
statements
in
there
and
each
statement
is,
you
know
a
k
like
1K
or
10K,
and
it's
enormous
SQL
query
pulling
that
out
and
putting
that
in
its
own
column.
It
makes
it
so
much
faster
and
so
much
more
memory
efficient
on
every
query.
So
it's
been
one
of
the
highlights.
I'd
say
it's
the
biggest
major
Improvement
to
Tempo
since
parquet
it
is
the
biggest
leap
we've
made
as
V
Park
K3.
B: One question I have is: what is the latency you see on your memcached configuration? For us, we see a latency of 250 milliseconds on a regular basis. Do you think that's great? Because we, internally, are debating that it's not great, but I just wanted to get your opinion: what are the values you see in your infrastructure?
A: Sure; I'm about to go look that up right now. I'm wondering, if you're seeing a lot of 250 milliseconds, whether that's the timeout, like if it's capped there. I don't know what the timeout to memcached is, off the top of my head.
B: We updated the timeouts on our backend to be a little bit higher, but I can double-check that as well.
A: Let me double-check as well; I'll look in ops and prod, in a couple of our clusters, and see what kind of timings I'm seeing.
A: It's just been an area of code that works, but we haven't spent a ton of time, like, aggressively understanding its behavior. You're asking what a good amount of time is, and my honest answer is I kind of don't know, because it's been so long since we really reevaluated the way we use memcached. So that's an upcoming project, this quarter, I think. So: hit rate's pretty good, 100% most of the time; some dips, not surprising, right?
A: You do a query over a range that it hasn't cached in a while, and we're going to see a dip there. Now, where do we have latency?
B: It's in the Tempo reads dashboard; again, this is, I believe, the overall memcached configuration dashboard.
D: Okay, yeah. I imagine there's just some balance there between how long it takes you to pull the data from your backend compared to how long it takes you to pull the data from memcached, and kind of evaluating whether that performance trade-off is worth it; and then also the calls to the backend: if it's cheaper to run memcached than it is to, you know, make the calls to the backend and pull that data, then, you know, that probably factors in.
A: I don't know; I think it's around 300. Let's just say that's the average. So, here you go: it looks like our p99 is around 100, occasionally spiking up to the 250 range. p50 is significantly lower here, two milliseconds or so, and it's pretty much pinned down there at the bottom.
A: Excuse me. We're seeing some obvious spikes when there are the big query spikes here: seven thousand, almost ten thousand, requests a second to memcached. And, yeah, the p50 came up to 25, it looks like, or 12, I'm sorry, and it looks like we capped at the timeout as well.
B: There you go; thank you. One follow-up question, if I may: I saw your hit rates at 100%. Our memcached hit rates are in the 70 to 80 percent range. Can you give us some pointers to get from 70 or 80 percent to 100, or maybe the 90s?
A: 10,000 blocks, or twenty thousand, or a hundred thousand blocks; maybe that's where we calculate it. Most of the time, like I said, this is, I don't want to say ignored, but it's definitely a lower priority, because memcached is awesome: it just works; it does nothing but work. I love memcached. But I think, if we see it in the 60s, 70s, 80s, we just scale the cache up and...
A: ...deal with it another day, you know? Okay. The one improvement we're really looking to make, though, and hopefully we can do this; it's for us internally, but hopefully we can do a good job of sharing it: the Loki team did this, and they saw great results, which was moving to a disk-based memcached.
A: It's a lot cheaper, because disk is cheaper, and they use just some super fast cloud disks, basically, instead of memory. They take a little hit on latency in exchange for being able to have way more cache, essentially, because disks are cheaper than memory, of course. They went through a project that reduced their TCO significantly, with almost no impact on search latencies, by switching over to some disks instead of using memcached memory. So that's part of what we're looking to do: switch to disks, go to page-level...
A: ...caching. We never made a thing called a store-gateway; we use memcached instead. And I think we'll see some ridiculous times in the future. Right now, I'm already impressed with how fast we can scan trace data, but I think we're going to see some even better times in the next couple of months.
B: Okay, thank you. Thank you for answering that. I guess another question on that; sorry, I'm asking too many things. Do you alert when your hit ratio goes down on your memcached, or is it just part of the on-call workflows, where you'd see, okay, it's going down, we scale it up?
A: We alert on SLOs.
A: Like a read SLO: if our read SLO starts tanking... I don't know if an engineer would ever look at memcached. Any engineer in here who is on Tempo on-call: would you look at memcached? Mario is saying yeah, he totally would look.
A: What I think would be the right answer to that question is: maybe the critical alert would be your actual SLOs, with your runbook saying, go check memcached, see what that looks like; and then maybe a warning alert being, what's my timeout, or am I hitting my timeouts a bunch, what's my latency on memcached.
A: Zach, we will see you all in about a month. Thanks for the questions; great questions today. We'll see you all at the monthly community call. Feel free to talk on the Slack, GitHub issues, GitHub PRs, everything; talk to us, communicate. Watch for those TraceQL PRs; we really, really want you all's feedback on where we're moving next. And we'll see you in a month. Take care, everybody. Come to Slack, yeah.