From YouTube: 2023-02-09 meeting
Description
OpenTelemetry Prometheus WG
A
Yeah, everybody here, I believe, has been here before, but welcome back, everyone. Today I'm not sure how much there is to discuss on your side, Felix, or the Elastic side, but I guess it's just a recap. Last week we went over the Collector SIG meeting, where we presented our initial thoughts and ideas about profiling and how it might work in the Collector, both to update them on our current thinking and also to get some ideas and feedback.

Part of that feedback, specifically from Tigran and the other TC members, was that there should probably be some benchmarking. One of the big parts of the discussion was on stateless versus stateful protocols: if the protocol is stateful, it adds a much higher level of complexity, and so that would need to be justified by benchmarks showing that the savings of a stateful protocol are worth the added complexity. That was probably the biggest piece of feedback.

So then last week we talked through some of the ways we could collect data that would be helpful in moving the discussion along. The biggest thing on the agenda today, I think, will be the PR that Dimitri made, which uses the benchmarking repo we had already established to get the ball rolling: at least for now, a way for people to upload a bunch of profiles and get a bunch of data from those profiles, and then eventually to show benchmarking data on whether or not it makes sense to go with the stateful or the stateless protocol.

I see a bunch of people have jumped in since I started talking, but hopefully we will catch everybody up on any of the recap they just missed throughout the meeting today. If not, then I think maybe we can just start with you, Dimitri. I'll paste the meeting notes, if people want to add to the attendees list and also see the agenda and the links. Dima, do you want to go through what you built, what it does so far, and what's left to do?
D
Yeah, so last time we talked about testing the hypothesis — well, I guess it's not really a hypothesis — that if you remove symbols from pprof, the bandwidth requirements will go down, right? The throughput will go down. So what I was trying to do is figure out: okay, if we did remove symbols, or if we replaced them with hashed symbols or something like that, how much could we really save?
D
It generates three versions: one where symbols are completely removed, one where symbols are just encoded as strings, and the last one where symbols are hashed. My idea was that that would be a simple way to estimate how much we could save from using a system similar to what Elastic uses. For all three of these versions, I encode them as protobuf blobs and I also gzip those profiles, and then I report in a table the sizes of each one, along with a percentage difference: how much do symbols take, how much do hashes take?

In addition to that, another thing I do is take all the symbols from all of these files, put them together, dedupe them, and gzip that as well, to see: okay...
D
...if we were to remove symbols completely from the profiles we upload and then upload symbols separately, what kind of savings could we get? So I ran this on a few apps and I provided examples of those runs. Here are the findings: for most apps — especially the ones that get any traffic and have high CPU utilization — profiles are typically in the 10 to 50 kilobyte range.
D
I guess I didn't mention it here, but these are all CPU profiles. I did not look into memory or any other types of profiles, and I also only looked at Golang profiles at this point. So they are usually in that kind of range, and symbols take 30 to 60 percent of the total size of the pprof.
D
It varies for a couple of reasons. I noticed that when the profiles are on the smaller side, symbols tend to take a larger portion, and the other thing I noticed is that when you use labels, symbols take proportionally less, because the labels now take a lot of space. So that's why the difference varies. As for the deduplication part: when I run it on 50 to 100 profiles...
D
...the deduplicated symbols also take roughly 10 to 50K, similar to the sizes of the profiles themselves. I feel like that's good news, and I think a good case for continuing with this stateful approach. I summarized it here: in theory, if we found a way to not send symbols every time, we would get a pretty good reduction in the size of each payload.
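[Editor's note] The measurement loop Dimitri describes — strip or hash symbols, compress each variant, and compare sizes — can be sketched roughly as follows. This is a simplified illustration, not the code from the PR: the real script operates on pprof protobufs, while the toy profile, JSON encoding, and function names here are invented for the sketch.

```python
import gzip
import hashlib
import json

# Toy stand-in for a profile: stack traces with full symbol strings.
# Real profiles repeat stacks heavily, which the *50 repetition mimics.
profile = [
    ["runtime.main", "main.main", "main.handleRequest", "encoding/json.Marshal"],
    ["runtime.main", "main.main", "main.handleRequest", "net/http.(*Server).Serve"],
] * 50

def gz_size(obj) -> int:
    """Serialized + gzipped size, mirroring the benchmark's measurement."""
    return len(gzip.compress(json.dumps(obj).encode()))

with_symbols = profile
no_symbols = [[None] * len(stack) for stack in profile]           # symbols stripped
hashed = [[hashlib.sha1(f.encode()).hexdigest()[:16] for f in stack]
          for stack in profile]                                    # per-symbol hashes

base = gz_size(with_symbols)
for name, variant in [("no symbols", no_symbols), ("hashed", hashed)]:
    size = gz_size(variant)
    print(f"{name}: {size} bytes ({100 * (size - base) / base:+.0f}% vs full)")
```

The same three-way comparison, run over many real profiles, is what produces the per-profile table discussed below.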
F
I guess I have a few questions. The first one is simple: which numbers here are before and which are after gzipping the profile? The "symbols are 30 to 60 percent of the total pprof size" — is that measured on the gzipped pprof?
D
After. I intentionally don't even measure before gzipping — I guess it would be interesting to look at, but I don't think it's important in the end.
F
I guess you haven't had a chance yet to look into what deduplicating stack traces — the program counter lists — would look like, which would be very interesting. And the second thing is, I'm a little confused by the further bullet points about using hashes for the symbols, because I don't think that would be needed for compiled languages: the program counter is essentially the unique identifier for the symbol, and you don't need a hash.
D
Yeah, that's a good point. I also don't know if those are available for all languages. But yeah, I don't know. Sean, do you want to go next?
G
Sure. I just had a quick question — I actually think Florian asked the same question, or made the same comment, on Slack — about the symbol removal. In terms of how our stateful protocol works: instead of hashing each symbol and then sending a hash-to-symbol mapping, we actually hash the entire stack and send one hash representing the whole stack, instead of, say, a hash per symbol per frame. If that makes sense.
D
Yeah, so I almost got there — that was kind of my next point. I thought that this naive approach — let's hash the symbols themselves — would be enough. But what I found in practice is that this does not improve anything, and the next step is, I think, exactly that: I think we should try hashing stack traces.

Part of the reason I did it this way is that it was also much simpler to implement: in pprof you get this string table, and it's really easy to just hash every single entry.
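[Editor's note] The two granularities being compared — hashing each symbol versus hashing the whole stack trace once, as Sean describes — can be sketched like this. The frame names, the 128-bit hash width, and the payload shapes are illustrative assumptions, not the actual wire format:

```python
import hashlib

# One sampled stack trace (frame names are invented for the sketch).
stack = ["runtime.main", "main.main", "main.work", "runtime.mallocgc"]

def h(data: str) -> bytes:
    """A 128-bit content hash, the width mentioned later in the discussion."""
    return hashlib.sha256(data.encode()).digest()[:16]

# Per-symbol hashing: still one entry per frame, so the payload for a
# repeated stack scales with stack depth.
per_symbol = [h(frame) for frame in stack]

# Whole-stack hashing: once the receiver has seen this stack, every
# repeat costs a single 16-byte identifier (plus a count).
whole_stack = h("\x00".join(stack))

print(len(per_symbol) * 16, "bytes per repeat (per-symbol)")  # 64
print(len(whole_stack), "bytes per repeat (whole stack)")     # 16
```

This is why per-symbol hashing showed no improvement in the experiment above: the hash strings are about as long as typical symbol names, while a whole-stack hash amortizes the entire frame list into one identifier.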
H
Can you hear me now? Yeah, okay — sorry, Zoom always confuses my microphone settings. I had one question for Sean. When you said you hash the whole stack, does it include all frames, including the leaf? The reason I'm asking is that when you sample, I would expect the leaf to be much more random, and — I don't know exactly how you do this — if you would hash everything except the leaf and then send the leaf separately, that might be even more efficient, at least for CPU sampling.
G
Yeah, so right now we just hash the entire thing, but we've done some experiments, and you get some good data reduction by excluding the leaf, and then you can get a little bit more by also excluding the leaf's caller. But we never bothered to implement it. It's kind of one of those things where we're like: okay, say 90 percent of the job is done by just hashing everything — and it's been on our to-do list for a while to come back and exclude those frames.
H
Okay. And another, more open-ended question: I was curious whether profile size can be reduced by also ordering things within the profile in some way. For example, if you take the string table in the pprof protobuf and you sort it, would it gzip more efficiently, so you get a size reduction that way? Or could the string table be encoded as a trie, for example? I was just looking at the 30 to 60 percent and thinking, well...
G
On that — maybe, Christos, you can speak to this — didn't we also recently start doing string interning, or deduplication of strings, in our wire protocol as well?
E
Yeah, we did this some time ago, so effectively we're doing something similar to what you see in the pprof format: they have a string table, we have a string table. I did a whole bunch of experiments when I was coming up with that — I tried a few different encoding schemes, including sorting — but there was no meaningful difference, so we went with just simple string interning. That doesn't necessarily mean that we shouldn't do additional experiments going forward.
E
There could be gains to be had, but as far as making the case to Tigran goes, I don't think we're going to get another order-of-magnitude difference. These are also the numbers, by the way, that I'm seeing in my own tests. I spent a few days last week doing experiments with the host agent, coming up with numbers for our own optimized protocol. Getting a truly representative benchmark going — handcrafting a new protocol to work statelessly —
E
— that would be a lot of work, so I took some shortcuts there, but it's still representative in a good way, I think. And the numbers I'm seeing are on the order of 3 to 4x worse if we did things statelessly. In concrete numbers, this basically translates to a single machine — a single agent — going from sending less than 300 megabytes a day to more than one gigabyte a day, and that's with a 20 Hertz sampling rate. 20 Hertz is not a lot, and for the future we have plans to drastically increase this. So that's when things get interesting: that's one machine sending one gigabyte a day at 20 Hertz, and we'll go to 200 Hertz.
E
Now that's 10 gigabytes a day per machine — that's what I'm assuming, right — and then the whole fleet amplifies that number. I'm not sure what current cloud egress pricing is; I just did a quick Google search, and it's on the order of 0.08 dollars per gigabyte, I think, for AWS. So basically, for a cluster of 100 machines sending that data, that translates to about 90 dollars a day in egress costs.
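[Editor's note] That back-of-the-envelope math can be written out as follows. All inputs are the rough figures quoted in the discussion, not measurements; with these round numbers the product lands at about $80/day, in the same ballpark as the roughly $90/day mentioned:

```python
# Rough egress-cost estimate for a stateless protocol, using the
# figures quoted above (all of them approximate).
gb_per_machine_day_20hz = 1.0      # ~1 GB/day per machine at 20 Hz, stateless
sampling_scale = 200 / 20          # planned move from 20 Hz to 200 Hz
machines = 100                     # example fleet size
egress_usd_per_gb = 0.08           # rough AWS egress price quoted

gb_per_machine_day = gb_per_machine_day_20hz * sampling_scale  # 10 GB/day
fleet_gb_per_day = gb_per_machine_day * machines               # 1000 GB/day
print(f"fleet egress: {fleet_gb_per_day:.0f} GB/day "
      f"= ${fleet_gb_per_day * egress_usd_per_gb:.0f}/day")
```

Note that Felix's point immediately below is that the scaling with sampling rate is not actually linear for aggregated formats, so this linear extrapolation is a worst-case framing.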
F
Yeah, that's super interesting. I have a few comments. On the question of sorting things: I did experiments, not on sorting the string table, but on sorting the program counters — the samples in pprofs — to also get a little bit more compression, but it didn't move the needle either in my testing.
F
On the last thing Christos said, about increasing the sampling rate causing a proportional increase in bandwidth: I don't think that's the case for pprof, because with pprof, if you aggregate into a one-minute interval, you will see a lot more stack traces being the same, and so you should actually not see a linear effect. I don't know how it will scale exactly, but it's probably logarithmic or something similar to that.
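[Editor's note] A toy simulation of that effect — more samples over a fixed interval hitting an increasingly repetitive set of stacks — might look like this. The stack population and its skew are invented purely for illustration, so only the sublinear shape of the result is meaningful, not the numbers:

```python
import random

# Hypothetical application: a fixed set of distinct stacks, where a few
# hot stacks receive most of the samples (a Zipf-like skew).
random.seed(0)
stacks = list(range(10_000))
weights = [1.0 / (i + 1) for i in stacks]

# ~20 Hz vs ~200 Hz over a 60-second aggregation interval: 10x the
# samples yields far fewer than 10x the unique (stack, count) entries.
for samples_per_minute in (1_200, 12_000):
    seen = set(random.choices(stacks, weights=weights, k=samples_per_minute))
    print(samples_per_minute, "samples ->", len(seen), "unique stacks")
```

Since an aggregated pprof payload grows with the number of *unique* stacks rather than the number of samples, this is why bandwidth should grow sublinearly with sampling rate.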
F
Yeah, and I had a question for Dimitri about where the inputs are coming from. The typical size of 30 to 50 kilobytes seems a little low from what I've seen on my end — but not crazy low — so I'm just curious where you got the data from. The string table size does strike me as a little high; I think I've seen profiles where the ratio of string table to samples is a little different. So, just curious where the data is from.
D
Yeah, so the data is from a few test users — basically production-grade data from some users. For some of those applications I know the CPU utilization, and I mentioned it there; for others, I don't know how much those apps actually work. I mentioned this in the comments as well.
D
There was definitely a split: some services, it seems, process a lot of data constantly, and those tend to have larger pprofs, and there are some services that are clearly mostly waiting for things.
D
And those tended to have pprofs sometimes even in the 5K range — that's what I've observed. And again, these are CPU only. When we look at memory profiles, those tend to be much larger, but I haven't run an analysis on whether they're large because of symbols or not.
A
Yeah, one thing I'll just jump in real quick to add there: that's also part of the actual analysis piece. Like you said, there are a bunch more optimizations that could be made — Dima and I talked about some of them as well while he was doing this. And from last week, I remember part of the discussion:
A
the Elastic folks mentioned that you were going to look into it, and that it would take some time, so I figured Dima's time might not be best spent trying to recreate what you're already more familiar with. But I'm curious — hopeful — that maybe there's some way to do both. Like you were saying, Felix, if you see something slightly different, other people can also run this script on a bunch of profiles.
A
There's no actual data in it — it's just the metrics about the profiles — so perhaps you could get a corpus of profiles to run it on, the Elastic folks could do the same, and we could even ask community people as well to run this script on their own profiles. Then we'd at least have some kind of mechanism, as we come up with this algorithm — or, I guess...
A
...however you want to say it, this protocol — where we can then run this script and get real data about real profiles from various corners of the profiling world, and hopefully have a good case to make for whatever we ultimately choose. So that if we say it's 50 kilobytes, or whatever it is, that's substantiated by a lot of data. Sorry — I went before you, Matt. Did you want to jump in, Matt?
I
Sure. I was thinking about this problem of symbols and the trade-offs to make around them, and it strikes me that we really have two fundamental streams of data, right? We've got all of the symbolic info, strings, and other things that get duplicated if multiple instances of the same binary are running across the fleet — and that duplication could be quite extreme if you're talking about hundreds of thousands of nodes or something like that, or...
I
...even before you get to very large clusters. And so then I got to thinking about how this will be used — in the processors in the Collector, for example. Those things could just focus on one of the two streams: either the strings and symbols that fly by, or just the stream of samples that later needs to be combined with that first stream. The processor could become much simpler if we actually treat it like a multi-stream protocol. A weird analogy might be TCP's out-of-band data facility. So that was one idea I had as to how to think about the namespace, or the key space, of all of the possible symbols.
I
In an unrelated work stream, I've been looking at the project Quine — q-u-i-n-e — and they have an interesting design element that I thought might be relevant here. In a nutshell, it's a giant graph database that does cool things, but in order to reduce the total size needed to store all of the nodes, they just say: we assume that all nodes exist. And this is an event-processing thing.
I
So if there's a node that doesn't exist yet — rather, a node that I haven't gotten any attribute data or metadata about, in other words a node I haven't heard about — it doesn't mean it doesn't exist. It just means it doesn't matter, so I'm not spending any storage on it at all. We could have a similar approach here, where we assume that every individual unique binary has a unique symbolic-info hash or ID — across all languages, across all things, a globally unique key — and just assume they all exist, without having to instantiate them all. That approach to protocol design makes it very, very lightweight, and then if you have very large fleets running the same thing, you're only sending samples and there's no duplication at all. So this trade-off kind of goes away a little bit. So I was kind of curious.
G
So, Matt, just on the first point — I'm not actually sure I fully got what you meant — but for us, in our stateful protocol, we do separate the sending of frames from the sending of symbol data for those frames. It's a fairly straightforward thing, though: essentially just two different types of messages. And similarly, if we have a stack trace and frames and there are no symbols for them, we still just store it, under the assumption that at some point in the future something else will send the symbol data. Based on what I think you're describing, it sounds like something similar — but maybe I've just totally missed the point. Is that possible?
I
I guess I'm asking — maybe I'll make it more concrete. Say we're using the new protocol that this work stream comes up with, or informs the creation of, and my scenario is very large fleets. Then, in the normative case, I almost don't want any symbolic info — anything that's not just samples — being sent at all, period, because I really only need access to the symbols, and IDs for them, at the point of analysis.
I
So when we're thinking about these trade-offs around compression — the trade-off between stateful and stateless, and all of the other discussions — is that already inherent in the design we're thinking of? In other words, if I have a hundred thousand containers, all running the same binary, all being continuously profiled, are 99,999 of them only sending samples, because they don't need to send the other stuff?
I
Or is it the case that every individual instance of a profiled thing is going to negotiate its own stateful protocol with the collector, where in that one case the symbols are only sent once, but they are still sent? Does that make sense?
F
I think my understanding is that the protocol stuff we've been discussing so far would allow symbols to be sent literally just once, when you create a binary — like during CI — but that only works for statically compiled code that doesn't dynamically create symbols. If you have something like Python or Ruby, where you could create a new function with a new name at runtime, that symbol will have to be sent from each instance.
F
You have no way of deterministically knowing in advance what symbols are going to come out of that thing until you run it. So I think the protocol would be designed to support both cases, and users with statically compiled binaries could choose to only send the symbols once per binary — potentially not even going through the collector, though I think there's value in still sending that stream through the collector, so it could do some filtering on those symbols if somebody decides that's something they care about. But maybe Sean has other thoughts on this.
G
No — just that, in practice, that's how ours works. For native binaries we by default never send any symbols: we assume that somebody will have a CI hook that actually pushes those, or, for symbols that come from binaries in open-source projects, we simply mirror those debug symbols in the backend for the user and then automatically inject them. But as Felix was saying, you do have the issue with languages like Python...
G
...and their ability to just generate new symbols on the fly. And then you also have the problem of, let's say, being the operator of a large cluster, with all sorts of random containers running all sorts of random software.
G
But basically, to echo what Felix said, I think the protocol that we're designing should handle these cases. And if you have a user who really is just running a homogeneous fleet everywhere, and they know there's no symbol creation going on at runtime...
G
...there's no reason the protocol, as we've described it so far, wouldn't be able to support that, I think. So it makes sense to me.
A
Yeah, so I guess one thing that I definitely want to make sure we talk about today: after the last Collector SIG meeting, Tigran had mentioned that we need some more benchmarking data, and I guess the idea here was to take a step in — hopefully — the right direction, or something near it, toward having a benchmark that we could eventually present to the TC, to the Collector SIG, to whoever, about what the actual data is. Like, Christos, you mentioned that you found certain numbers when you were doing some analysis on your side, and I guess the idea here is to try to bring some analogous thing. I mean...
A
Maybe you can't represent the entire protocol in this kind of small, sandboxed benchmarking suite, but I'm hoping we can at least do something. Again, the idea is that if we're going to go with this stateful protocol, we need something that people can run, where we can show them output metrics that justify it. And I'm wondering what you all think: whether this is a step in that direction, and if this is something we can, you know...
A
Obviously this is a first step, and there are plenty of ways to extend it — like Dima said, there are other types of profiles we can do, and we can maybe even add that as a column, so that you can eventually filter this by language, by profile type, by various things, and still look at the output metrics — have it be a more comprehensive sort of suite. But yeah, what do you all think: is this something we can potentially build on?
A
Even if it's not a perfect representation of, for example, the exact stateful protocol that you use — but of a stateful-like protocol and what it might look like if we were to implement it. Felix?
F
Yeah, I think the question comes down to: do we all agree with the numbers that have been mentioned so far, which to me seem to hover around a 50 percent decrease in bandwidth for realistic scenarios? Or does anybody here think that if we set it up a little differently we'd be closer to 90 percent? Because I think that would make a big difference — 2x versus 10x would make a compelling case. But it sounds like we're going to be somewhere around a 2x reduction. Does anybody see that differently?
A
And by that, do you mean the content of the profiles themselves, or the way that they're being calculated? Because if it's the content of the profiles themselves, then I would say we could just add more profiles to this list — if you're saying that the profiles you see are different, you can add your own profiles, and that would change it. But if you're...
F
I don't think the ones I have would be totally different, no. I'm kind of wondering whether we think the overall bandwidth requirement of sending, let's say, CPU profiles will be more than 2x different between the pprof approach — buffering things up for, say, 60 seconds and sending it — versus the stateful approach of doing the hashing and all the good stuff.
F
That's what I'm curious about, because if we are all in agreement, then I think we've sort of had two methodologies converge on similar numbers, and then it doesn't matter too much how much more detail we add here. But if we think the methodologies have led to wrong numbers, then we should continue iterating on that, obviously.
G
Cool. I just had one question about the applications that were profiled. The one case that's always in my head is Java programs, with very deep stacks and very long function names. I was just wondering — Felix, it sounds like you work with Java stuff, and Dimitri, I don't know — have you included, let's say, Java applications in your testing?
D
So on our side we do have a lot of Java traffic. The problem is that all of it is in JFR format, so I would have to first convert it to pprof — I guess I could, but that would be quite an undertaking — so I don't have those. I think the easiest thing we could do for that is to go at it from the other end: it would be for you all to try to add symbols and then calculate the difference.
G
Cool. And then, Felix, did you test with a Java application?
F
I did not, but I just had a thought when Dimitri mentioned that it's difficult to convert JFR to pprof, which might be a prerequisite for a fair comparison here. Mark Hansen's Profilerpedia has at least one path of tools that you can chain together to go from JFR to pprof, so maybe that's better than writing something from scratch.
H
Yeah, I have two questions about the benchmark. One is: I looked at the output, and you print the percentages as positive percentages — there's "no symbols", and then "symbols" is X percent larger. I would print it in the other direction, because plus 100 percent is minus fifty percent, and the minus percentage is what we care about.
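[Editor's note] The direction flip being requested is just this bit of arithmetic — a hypothetical helper for illustration, not code from the PR:

```python
def increase_to_reduction(pct_increase: float) -> float:
    """Convert "+X% larger with symbols" into "Y% saved without them"."""
    factor = 1 + pct_increase / 100
    return (1 - 1 / factor) * 100

print(increase_to_reduction(100))  # 50.0  (+100% larger == 50% saved)
print(increase_to_reduction(60))   # 37.5  (+60% larger == 37.5% saved)
```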
H
I don't want to mentally translate it every time — it's just more mental load. And another question is: how many profiles do we compare? Because if we compare just two profiles, what is the delta, and what is the increase? If you have a series of profiles, then the savings can be greater, because you ship most of the information in the first profile — but the subsequent profiles would be... well, they still wouldn't be that much smaller, right?
H
If it's a series — yeah, if the second profile is 2x smaller, then the whole series is still, I guess, about 2x smaller, right? Yeah, I was curious whether you need to look at the whole series, or whether two profiles are enough — two profiles gives you essentially a lower bound, I think. Correct me if I'm wrong; maybe I'm not thinking straight.
D
I don't know where you're getting two profiles. Each line in that output is a separate profile, and then for each one there are three numbers: one without symbols, the regular one, and one with hashed symbols. Well...
C
I think it comes down to the comment I had in the spec: if you send the profile the first time with all frames, you spend some amount of data, but the second time, if you hash the complete trace and just send 128 bits for the hash, it's a huge reduction. So it matters if you send it multiple times — and I think most of us can already say that most programs repeat something at some point.
H
I guess one practical question is: do we want to extend the benchmark to also deduplicate the full stacks, like Elastic does? Because I think that's one of the things that was mentioned as a significant difference between what we're trying to emulate and what the current implementation actually does.
A
Yeah, I guess that's the one that I'm most interested in too. From a practical standpoint: how far is the current benchmark — the PR that currently exists — from that? Whether that means modifying the code of that PR in some way, or merging the PR and coming up with a separate PR to add that functionality. I don't know how much people have had a chance to really dig into the actual code of it, but yeah...
A
I think if it's reasonably close — the action item from last time was that Elastic would look at measuring the efficiency of their protocol, which would take a few weeks — I'm wondering if this can be sort of the single place where that takes place.
I
You mean in a processor — in a collector processor, like inline — or do you mean as part of the actual protocol?
A
Yeah, I don't know — I'm curious what people think: whether this is a good place to update the benchmarking suite, to have that more sophisticated version like the one you're talking about, Florian.
D
Well, I would maybe go back to Felix's point: maybe we should agree on the theoretical number — is it a 2x maximum difference, or is it a 3x maximum difference, that kind of thing? I don't know what you guys think, but we could potentially even go with that and say: hey, best case scenario, we're going to cut this much off profile sizes if we go with a stateful protocol. What do you guys think — that kind of thing?
H
For me, a 10x is like "wow," and a 2x is like "meh" — not exactly compelling. If someone came to me and said, "hey, Alexey, let's have a stateful protocol for this system," and I asked "what's the delta?" and they said "2x," I would be like: are you sure that's something to carry for years? I would try to check my benchmarks to see — is it really 2x? Maybe it's 10x after all. But I wonder what others think.
E
To jump in quickly here and answer Felix's question: the numbers that I'm seeing with Java would be the worst case, like Sean mentioned. In our benchmark I think it's roughly three to four x, right? So it's not 2x, it's 3 to 4x — and this is, again, a preliminary benchmark, so in reality it could be even better. But that would take more than a month of development time to actually get to test.
E
Okay, I'm comparing stateless with stateful — so stateful is three to four x better in terms of less traffic on the wire, right.
E
No, I don't have a specific program. I'm taking our own Elastic protocol and I'm simply putting the information back in — so I'm sending everything and compressing.
F
So you go from the stateful data you have and you're trying to create stateless files out of it, but they're not necessarily JFR-encoded or anything like that — you just dump it? Yes, all right.
E
So I took our stateful protocol and I changed it to be stateless, so it's still optimized: it's still using the deduplication, compression and everything. All the numbers that I'm mentioning here are post-deduplication, post-compression, so it's not a naive benchmark; it's somewhat representative of real-world conditions. But no, I didn't go and convert it to pprof or JFR or anything like that, and I think that's another point I wanted to make, because the current...
E
pprof is not really the best format to use when we talk about stateful, right? Because essentially we would design our own format to best accommodate a stateful protocol; we wouldn't use pprof. And to go back to something we need to show Tigran: I think it's not going to be easy for us, for Elastic, to come up with a nice, realistic benchmark that works the same way that our host agent works.
E
It's going to be a lot of work. But in terms of the benchmark that we're looking at today, yes: the first thing that comes to my mind is we could simply compare pprof with a custom format that, like Florian said, simply sends stack traces.
E
It has the traces plus an associated count, which is kind of the gist of what the Elastic protocol does. So I wouldn't send any symbols whatsoever, right, because you would assume that symbols would be sent separately once and periodically resent, but that period is large enough that it gets amortized out over the long run and wouldn't affect the numbers.
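A minimal sketch of the comparison being described, with made-up data: the frame names, payload shapes, and hashing scheme below are all illustrative assumptions, not the actual Elastic protocol. It just contrasts a payload that inlines symbol strings with every sample against one that sends only a stack-trace hash plus a count, with symbols assumed to travel separately.

```python
import gzip
import hashlib
import json

# Hypothetical profiling data: 500 samples drawn from 10 distinct stacks.
frames = [f"com.example.Service.method{i}" for i in range(50)]
samples = [{"stack": frames[i % 10:(i % 10) + 20], "count": 1} for i in range(500)]

def wire_size(payload):
    """Rough wire cost: gzip-compressed JSON, mimicking a compressed HTTP body."""
    return len(gzip.compress(json.dumps(payload).encode()))

# Stateless-style: full symbol strings travel with every sample.
inline = wire_size(samples)

# Custom-format-style: hash each unique stack, send only (hash, count) pairs;
# the symbol strings themselves are assumed to be sent separately and rarely.
counts = {}
for s in samples:
    key = hashlib.sha1("".join(s["stack"]).encode()).hexdigest()[:16]
    counts[key] = counts.get(key, 0) + s["count"]
compact = wire_size(list(counts.items()))

print(f"inline symbols: {inline} bytes, trace-id+count: {compact} bytes")
```

With repetitive stacks like these, the trace-id-plus-count form is far smaller even after gzip, which is the kind of delta the benchmark would try to measure on real profiles.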
G
Christos, to do that, it seems like it would be possible in the context of Dimitri's framework, right? Yeah, I guess the challenge is how we would represent something like Java in that, because I guess there's no way to get a pprof profile from Java, is there? Like, I exclusively use our tool to do whatever profiling, so I don't really know.
F
Yeah, one thing I also want to add is: if we think we can report a number that's much better than 2x for potential savings, like 4x (yeah, it's not 10x, but we're getting closer to 10x), then I think it's worth getting that really nicely written up and being able to show that we can get 4x instead of 2x, or maybe even more. Because I think if we present 2x to OTel, it will be difficult to argue for the complexities of a stateful protocol.
G
I guess I'm wondering: does it make sense to do both, Christos? You write up what you have, based on what you've done just by making our existing protocol stateless, and then in parallel let's also figure out some way to make the representation in Dimitri's format as faithful as possible to what, say, a truly optimized stateful protocol would be. Does that make sense, or do you think that's overkill?
G
Because it sounds like it would be valuable to have: if we're going to use Dimitri's benchmark and whatnot to demonstrate results, we should probably also have a representation in it of what we think the best possible results are. Does that make sense? Besides Java, is there anything else important that we can't get a good pprof representation from?
A
I guess maybe other languages, right? Like, would it matter for, I guess, Python or Ruby or any other languages?
F
So if needed, I can get pprof profiles for all those languages, because we use pprof for all the languages, not just Go. Java is the only exception, where we use JFR right now.
G
I think that would be pretty interesting to see, because then we can essentially, within this framework, construct what we think would be an optimal representation of the stateless protocol, and we can have an apples-to-apples comparison.
A
Yeah, Jonathan, your hand's up.
J
Yeah, two things. One, on the data conversion: I have looked at JFR to pprof. The general case is hard and it is lossy, but if you're only interested in CPU samples, dealing with that subset is actually quite tractable; it's not that much code.
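As a rough illustration of the CPU-sample subset Jonathan mentions: the sketch below builds a pprof-shaped structure (string table, functions, samples) out of already-decoded (frames, count) pairs. It uses plain dicts rather than the real pprof protobuf, collapses pprof's separate Location/Function tables into one, and elides the JFR parsing entirely, so every name here is a simplification, not the actual converter.

```python
def cpu_samples_to_pprof(samples):
    """samples: iterable of (frame_names, count) pairs, leaf frame first.

    Returns a dict shaped loosely like a pprof Profile message.
    """
    string_table = [""]          # pprof convention: index 0 is the empty string
    str_index = {"": 0}
    functions = {}               # function name -> function id

    def intern(s):
        # Deduplicate strings the way pprof's string table does.
        if s not in str_index:
            str_index[s] = len(string_table)
            string_table.append(s)
        return str_index[s]

    profile = {"string_table": string_table, "function": [], "sample": []}
    for frames, count in samples:
        location_ids = []
        for name in frames:
            if name not in functions:
                functions[name] = len(functions) + 1  # pprof ids are 1-based
                profile["function"].append(
                    {"id": functions[name], "name": intern(name)})
            location_ids.append(functions[name])
        profile["sample"].append({"location_id": location_ids, "value": [count]})
    return profile

prof = cpu_samples_to_pprof([(["a", "b", "main"], 3), (["c", "main"], 1)])
```

The point of the sketch is that, once you restrict yourself to CPU samples, the whole conversion is interning strings and mapping stacks to id lists, which matches the "not that much code" observation.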
J
The other point was around amortizing costs and the benchmarks. A stateless protocol nevertheless does some of the same things a stateful one does, in that it's batching a number of samples along with a single set of symbols, and the interval of that batching is quite important, because at some point you've essentially seen all the symbols.
J
So the number of samples you're packing into, I don't know, a 60-second batch matters, and I think the benchmarks need to reflect that. And that goes to what load is on the server as well: if it's changing between a lot of different things over the course of that interval, that minute, that looks different than if it's running a tight loop on the same thing.
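Jonathan's amortization point can be put into toy numbers. Both byte sizes below are invented purely for illustration: a fixed per-batch symbol block gets divided across however many samples share the batch, so the per-sample wire cost falls as the batch grows.

```python
# Assumed, illustrative sizes (not measured from any real agent):
SYMBOLS_BYTES = 200_000   # symbol/metadata block sent once per batch
SAMPLE_BYTES = 40         # marginal size of one encoded sample

def bytes_per_sample(samples_per_batch):
    """Per-sample wire cost when one symbol block is shared by a whole batch."""
    return SYMBOLS_BYTES / samples_per_batch + SAMPLE_BYTES

for n in (100, 1_000, 10_000):
    print(f"{n:>6} samples/batch -> {bytes_per_sample(n):8.1f} bytes/sample")
```

This is also why workload matters: a tight loop on one code path saturates the symbol set almost immediately, while a server churning through many different code paths keeps adding symbols over the interval, so the fixed block never fully stops growing and real curves look different.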
A
Yeah, that makes sense. So we have a couple minutes left; I want us to be clear on what we think next steps are. It sounds like there are no huge objections, at least to the foundation that this pull request is sort of starting, so I'm thinking, maybe as next steps:
A
If some people from this group can check out that pull request, review it, and maybe we can even get it merged, then as we start to iterate on it we can get into a somewhat regular workflow of creating pull requests to add functionality, to add profiles for more languages, that kind of stuff.
A
What do you guys think of that as at least a hopefully somewhat easy, concrete next step? I guess I'll stop there. Do you guys think that's fair? Cool, no objections; speak now or forever hold your peace. Okay, cool.
A
And then I guess we can even make some issues for some of the other stuff that was mentioned today and hopefully make progress towards that. I definitely agree with what Sean said, that it would be nice to have, even if it's not a perfect representation, just some sort of representation of the more efficient way of, I guess, hashing (I don't know how to describe it necessarily), but have it in this repo, just so that again we have one place where we can say, you know:
A
"Look here, TC; look here, collector SIG; look here, community." I mean, I think at some point, once we get this to a point where we feel comfortable with the suite itself, it would be cool even just to involve the community in general and have them run this script on some profiles that they have, just so that more people are involved in this process and getting familiar with profiling. I feel like it has a lot of other positive effects too.
A
But yeah, that's what I would see as immediate next steps. I don't know, Felix, I know you had mentioned last week potentially creating a draft PR for something; I don't know if you wanted to... oh, I just saw your hand's up.
G
Just a quick question before we move on: Dimitri, had you planned to add the, should we say, more optimized versions of the stateful protocol, or do you need more hands on deck on that one?
A
Yeah, I mean, I think definitely more hands on deck. I guess just maybe some guidance on, again, it doesn't have to be perfect, but what the easiest way to estimate or approximate it would be, what that might look like. I do think having y'all's input on that would be good, since you're more familiar with it. So maybe we can just create an issue for it and discuss there offline, and maybe then it will be easy to come up with a way forward there.
G
That makes sense. I think, on the Elastic side, we'll sync up later and have a think about that as well, and then just contribute that way, yeah.
F
Yeah, for the other thing you mentioned, I had planned to continue with the pull request.
F
I had already shared a little bit in the last meeting, which was my attempt to take the OTLP protocol buffer stuff that the collector is using and figure out how our profile signal could look there. I did not have a chance to work on this in the last two weeks because of some other stuff, but I will do it for the next meeting and have some more to share on that, yeah.
A
All good, we've all got other jobs here too. So yeah, all right, cool. Well, I think that is everything, unless anybody else wants to add anything before we leave. Otherwise we'll see you all on GitHub and in a couple weeks. See you.