From YouTube: OpenTracing Monthly Call - 2018-07-06
C: Thanks, man. So yeah, I didn't prepare a whole lot. This is usually a pretty laid-back kind of call, and I just threw together a few slides starting about 20 minutes ago, so I apologize for the lack of polish here. Also, feel free to jump in — I don't know how many folks are on the call, but if someone wants to jump in and kind of guide the conversation to the stuff you all find interesting, that's fine by me. So I do have a couple of slides, and I also just wanted to show a couple of things live.
C: Okay, well, we'll play it by ear then — jump in anytime. So yeah, a 30-second intro. Next slide, I guess — I already kind of covered it. It's a distributed column store. For those of you who aren't database people, column stores basically organize the data you store in a column-oriented manner, so that each column is stored together, and when you want to scan just one column out of, say, 100 columns in your table, you can do so without having to waste the I/O reading all the others, which makes it very well suited for analytics.
C: One thing that we did include, though, is trying to actually make it also efficient for random access. So, while Kudu is typically used for analytics, we do have some use cases that are pretty random-access oriented, and we do run some benchmarks using the more NoSQL, random-access-style benchmarking tools like YCSB that people might be familiar with. It's also a distributed system, so we use Raft for replication.
C
Imagine
people
here
probably
know
about
rafts
is
basically
another
implementation
of
consensus
very
similar
to
multi
taxes.
So
we
do
care
about
latency,
I,
wouldn't
say
the
latency
is
our
number
one
concern
we're
not
typically
running
directly
web
facing
properties.
We
do,
but
we
do
usually
have
end
users
who
are
on
some
bi
tool
and
they
expect
queries
to
come
back
sub
second
and
oftentimes.
That
sub
second
query
actually
boils
down
to
hundreds
or
thousands
of
requests
underneath.
So
the
Taylor
agency
is
actually
pretty
important
where
one
tael
outlier
at
the
99th
percentile.
C: That whole idea is covered in a great paper that I really like, called "The Tail at Scale," from Google, maybe six or eight years ago. If you haven't read it, you definitely should if you're working in tracing. I don't want to talk too much about Kudu itself, though. I think the way we approached building Kudu was essentially to build a bunch of kind of generic systems infrastructure. People who work at companies like Google or Uber probably have a lot of that stuff already in-house, not open source.
C
Unfortunately,
we
started
from
scratch
on
a
lot
of
stuff,
so
we
built
a
lot
of
these
things
that
probably
seem
familiar
to
people
from
either
other
companies
or
other
ecosystems
that
are
pretty
generic
to
any
distributed
system
software.
That
cares
about
this
kind
of
stuff
other
things
most
of
the
stock.
You
don't
need
to
know
anything
about
what
kudi
does
just
think
of
this
as
a
platform
for
building
high
performance,
low
latency
system
software,
so
I'm
going
to
jump
right
in
to
come
some
of
the
various
things.
This
is
kind.
C: This is kind of a grab-bag talk — it's not like there's one core story to it; it's just "here are the various things we do" that tend to be useful. The first one is pretty simple: request-scoped tracing. This is probably the thing we do that's most similar to OpenTracing. By the way, Kudu is almost all in C++, so all of this talk is about our C++ back end. We've got a little trace macro — it takes a substitution string with dollar-sign placeholders — and pretty much that's it.
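To make the idea concrete, here is a minimal sketch of a substitution-string trace call along those lines. The names (RequestTrace, TRACE_TO, Substitute) are hypothetical, not Kudu's actual implementation; it only illustrates appending "$0"-style formatted lines to a per-request log.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical per-request trace buffer: an append-only log of timestamped lines.
class RequestTrace {
 public:
  void Append(const std::string& msg) {
    auto now = std::chrono::steady_clock::now().time_since_epoch();
    int64_t us = std::chrono::duration_cast<std::chrono::microseconds>(now).count();
    lines_.push_back(std::to_string(us) + " " + msg);
  }
  void Dump(std::ostream& out) const {
    for (const auto& l : lines_) out << l << "\n";
  }
 private:
  std::vector<std::string> lines_;
};

// Substitute $0, $1, ... placeholders with the stringified arguments.
std::string Substitute(std::string fmt, const std::vector<std::string>& args) {
  for (size_t i = 0; i < args.size(); ++i) {
    std::string ph = "$" + std::to_string(i);
    size_t pos;
    while ((pos = fmt.find(ph)) != std::string::npos) {
      fmt.replace(pos, ph.size(), args[i]);
    }
  }
  return fmt;
}

// The "macro" just appends a formatted line to the trace attached to the request.
#define TRACE_TO(trace, fmt, ...) \
  (trace).Append(Substitute((fmt), {__VA_ARGS__}))

int main() {
  RequestTrace trace;
  TRACE_TO(trace, "scanning tablet $0, budget $1 ms", "t-123", std::to_string(50));
  TRACE_TO(trace, "read $0 blocks", std::to_string(17));
  trace.Dump(std::cout);  // the whole log is only emitted if the RPC gets sampled
}
```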
C
This
is
not
actually
a
hierarchical
tree
if
it's
not
really
dapper
or
open
trading
style,
it's
really
just
a
log,
and
when
we
accumulate
the
vlog
associate
with
our
PC,
we
sample
it.
So
we
have
different
sampling,
buckets
for
different
latency
profiles
and
also
we
actually
have
a
timeout
to
propagate
different
clients.
So,
whenever
a
client
sends
an
RPC,
it
says:
hey,
my
timeout
is
one
second,
and
on
the
back
end,
if
we
realize
that
we
responded
to
that
RPC
after
one
second
we'll.
C
Trace
that
RPC,
so
it
gives
us
a
pretty
good
idea
of
what's
happening
on
the
our
pcs
that
are
too
long,
very,
very
simplistic.
But
it's
again
it
took
like
you
know,
two
hours
to
write,
whereas
open
tracing
is
a
much
more
complicated
thing
and
it's
super
super
lightweight.
There's
no
infrastructure,
it's
all
in
process.
We
don't
need
to
hook
up
to
any
collectors.
Anything
like
that,
so
it's
limited
in
scope
I
mean
that
both
in
the
future
science
sense
of
the
word
scope
and
also
in
the
how
much
it
accomplishes.
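A sketch of the deadline-based sampling just described: the client's timeout rides along with the call, and the server keeps the in-process trace only when the handler overruns it. Everything here (InboundCall, MaybeSampleTrace) is illustrative, not Kudu's code.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <thread>

using Clock = std::chrono::steady_clock;

// Illustrative: the client sends its timeout along with every RPC.
struct InboundCall {
  std::string method;
  std::chrono::milliseconds client_timeout;
  Clock::time_point arrival = Clock::now();
  std::string trace_log;  // accumulated in-process trace for this call
};

// After handling, compare elapsed time against the propagated timeout and keep
// (here: print) the trace only for calls that blew their deadline.
void MaybeSampleTrace(const InboundCall& call) {
  auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
      Clock::now() - call.arrival);
  if (elapsed > call.client_timeout) {
    std::cout << "SLOW RPC " << call.method << " took " << elapsed.count()
              << " ms (client timeout " << call.client_timeout.count()
              << " ms)\n" << call.trace_log << "\n";
  }
}

int main() {
  InboundCall call{"Scan", std::chrono::milliseconds(10)};
  call.trace_log = "0us handling\n12000us responding";
  // Simulate a handler that took longer than the client was willing to wait.
  std::this_thread::sleep_for(std::chrono::milliseconds(15));
  MaybeSampleTrace(call);
}
```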
C
But
it's
been
very,
very
useful
for
us
one
thing
actually
I
didn't
put
in
the
slides,
as
we
also
have
for
each
of
these
traces,
a
very
simple
map
with
counters.
So
if
you
look
at
an
RPC
trace,
we'll
have
the
log
and
we'll
all
have
a
bunch
of
counters.
Some
of
them
are
pretty
generic,
so
our
spinlock
implementation
will
count
how
many
cycles
where
it's
been
spinning,
and
it
reads
that
to
their
PC
and
then
we
also
have
a
lot
more
specific
to
the
particular
requests.
C
So
if
you're
doing
it
right,
we
have
to
write
it
right
ahead
log
and
you
might
have
time
we've
spent
waiting
to
write
to
the
right
ahead.
Log
becomes
a
counter
on
that
trace,
so
some
examples
in
just
a
minute.
Actually
we
don't
have
to
do
this
in
line
and
show
examples
while
I
talk
so
I
have
another
browser
here.
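A sketch of the per-trace counter map described above, with a couple of illustrative counters (spinlock spin cycles, write-ahead-log wait time). The names are made up for illustration; Kudu's real counters are richer.

```cpp
#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Illustrative per-request trace holding a simple name -> counter map alongside the log.
struct TraceMetrics {
  std::map<std::string, int64_t> counters;
  void Increment(const std::string& name, int64_t delta) { counters[name] += delta; }
};

// Code anywhere under the request can attribute work to the active trace,
// e.g. a spinlock recording how long it spun, or the WAL recording wait time.
thread_local TraceMetrics* active_trace = nullptr;

void RecordSpinCycles(int64_t cycles) {
  if (active_trace) active_trace->Increment("spinlock_wait_cycles", cycles);
}
void RecordWalWaitUs(int64_t us) {
  if (active_trace) active_trace->Increment("wal_wait_us", us);
}

int main() {
  TraceMetrics trace;
  active_trace = &trace;   // attached for the duration of the RPC
  RecordSpinCycles(1200);  // generic counter from shared infrastructure
  RecordWalWaitUs(850);    // request-specific counter from the write path
  active_trace = nullptr;
  for (const auto& [k, v] : trace.counters) std::cout << k << "=" << v << "\n";
}
```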
C
So
here's
a
server
I
just
started
running
on
localhost
I'm,
showing
the
RPC
v
page
I
wish
shows
the
running
our
PCs
and
sampled,
our
pcs,
but
never
made
any
other
pcs
to
the
server.
Yet
so
there's
nothing
in
there.
But
if
I
go
a
Python
shell
over
here
and
how
lift
tables
there's
no
tables
in
this
cluster
Kalitta
turn
did
it,
but
that
would
have
made
an
RPC.
So
if
I
reload,
this
page
I
can
see
the
current
RPC
connections
that
are
open
where
it's
from
the
state.
C
If
there
were
an
RPC
currently
running
at
show
up
in
this
inbound
connections
list,
and
then
we
can
see
a
sample
dirty
tea
trays,
so
the
trace
here.
Unfortunately,
my
browser
doesn't
show
the
new
lines,
but
you
can
see
the
time
that
it
arrived
how
many
microseconds
it
took
coming
on
with
a
call
to
you
coming
off
the
call
queue
handling
and
then
229
microseconds,
later
rq8,
a
success
response.
I
think
this
is
a
debug
build.
All
the
times
are
much
slower
than
you
normally
expect
in
early.
C
This
probably
would
take
you
know
a
few
microseconds,
not,
however
money
this
deck
is
100
microseconds.
This
is
very
simple.
You
know.
If
I
do
a
bunch
of
these
calls,
probably
all
of
them
are
going
to
fall
into
the
same
bucket,
we're
not
going
to
actually
see
it
as
change
as
a
weari
sample
once
a
second.
C
If
I
go
to
a
actually
one
of
our
production
servers
for
an
internal
use
case
here,
Clara
and
I
check
out
rbtv,
we
can
see
there's
a
lot
more
going
on,
there's
a
bunch
of
connections
open
from
Levites
and
hosts.
In
fact,
there's
one
called
it's
currently
in
flight,
and
you
can
see
that
the
client
sent
a
three
minutes
timeout
on
this.
This
is
a
scan
call
so
far,
astern
running
for
11
milliseconds.
C: If I go down to look at some of the more interesting things, you can see here a StartTabletCopy call, which is one of our re-replication RPCs, and the whole story of what happened. And then here are the metrics that I mentioned — every RPC has various metrics. Underneath, our I/O code will count these metrics, like fdatasync: how many we did, how many microseconds they took, how many microseconds we spent waiting on mutexes, DNS resolutions for some reason, whether we started a thread and how long it took to start the thread.
C
Every
thread
pool
that
we
use
has
few
time
and
run
time
a
little
CPU
run
time.
I,
don't
know
why
they're
not
in
alphabetical
order
here,
but
so
that
we
have
a
thread
pool,
is
called
raft
and
a
thread
pool
called
tablet,
copies
that
this
request
used.
We
can
see
this
tablet.
Copy
thread
took
quite
a
long
time.
This
is
actually
downloading
a
bunch
of
data
from
another
server,
so
it's
a
longer
request.
C
So
this
is
one
particular
sample
that
took
82
milliseconds,
but
if
we
scroll
down,
you
can
actually
see
there's
another
sample
of
the
same
RPC
that
took
longer
and
if
we're
lucky,
you
might
even
have
an
example
of
a
very
long
one.
This
is
pretty
useful
to
find
out
like
what
are
the
outliers.
What
happened
in
that
layer
that
was
different
from
other
outliers?
Maybe
it's
the
update,
I
think.
C
Maybe
it's
the
next
time
you
can
go
through
and
see
all
the
different,
our
pcs,
that
we
do
that's
sort
of
the
simple
RPC
tracing
that
we
do.
Another
thing
that
I
really
like
is
that
I
found
that
oftentimes
a
single
RPC
trace,
won't
tell
you
a
whole
lot.
It
will
tell
you
hey.
This
RPC
took
a
long
time
waiting
on
a
lock
or
took
a
long
time
waiting
at
I/o.
We
don't
really
know
what
happened
that
actually
caused
that
it's
some
across
request
interaction,
so
we
separately
have
an
infrastructure
called
process
while
tracing,
unfortunately,.
C: ...it honestly doesn't look great when I'm zoomed way in for the screen share, but you can see on the top there's a timeline of CPU usage, and then various threads down the left. You can see that one RPC worker was actually involved — I think in that request I called list tables four times; you can see one, two, three, four. If I zoom way in here, I can actually see on the timeline that this call started here...
C: ...and there's an arrow from here to here showing that the call was queued here and picked up by a different thread. And you can see it actually includes in this view the traces that we just looked at — you can see when it was picked up, when it was handled, along with the metrics. In this case it's a pretty uninteresting call with no metrics, and then it responds with success. So this is, again, not a super interesting RPC.
C: Here you can see there's a lot more going on — a lot more threads, a lot more RPCs — and there are actually some RPCs that are taking pretty long. So if I click on a scan, I can see that it took seven hundred five milliseconds, and I might be able to zoom in and see: this is continuing a scan, meaning that it started in a previous RPC; it's reading some blocks; it got a cache miss — that's probably going to be blocking I/O. It does give you a pretty good idea of what might be going on.
C: This is all very useful — you can actually see, kind of cross-request, when one thing might actually be causing an impact on another. We're also able to see pretty interesting patterns in thread pools. We used to not have LIFO-ordered thread pools, so it would round-robin across all of our workers, and we wouldn't get this kind of nice chunking where only a small handful of the RPC workers is active.
C
It
would
actually
be
round-robin
across
100
threads
and
really
hurting
the
cash
performance,
and
things
like
that,
so
this
has
been
very,
very
useful
for
us
to
find
process
wide
lockups.
We
found
some
issues
at
TC
Malik.
For
example,
we've
seen
some
issues
with
the
linux
kernel,
where
the
MSM
before
gets
held
and
all
the
other
threads
block
for
apparently
no
reason
but
they're
actually
they're
all
blocked
on
the
lock
in
the
kernel.
But
I
found
this
very
useful.
C
It's
way
more
information
than
you'd
actually
get
from
something
like
open
tracing
and
it
captures
the
cross
request.
So
I
think
things
like
open
tracing
are
useful
to
pinpoint
hey.
The
server
has
high
latency,
but
when
you
actually
want
to
dig
into
what's
going
on
on
that
server,
this
can
be
more
useful.
Another
nice
feature
of
this.
This
is
actually
the
trace
viewers.
It's
built
into
Chrome,
so
I
can
type
save
here.
C
I
can
actually
save
a
JSON
file
and
we
often
are
playing
at
a
customer
site
on
premises
and
they
can
make
these
JSON
files
and
attach
it
to
a
support
ticket
and
then
I
can
load
it
into
any
other.
Kuti
server
or
even
in
chrome
I,
think,
is
good
about
tracing
and
load
the
wherever
that
JSON
file
went
and
it'll
load
in
and
display
on
anyone's
Chrome
browser.
In
fact,
I
might
even
display
a
little
bit.
Nicer
is
probably
a
newer
version
that
we've
embedded
in
q2
itself.
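The Chrome trace viewer loads the JSON Trace Event Format, so "save" just produces a file of events like the ones this sketch writes. The event fields ("ph":"X", ts and dur in microseconds, pid, tid) follow that public format; the writer itself is a hand-rolled illustration rather than Kudu's exporter.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// One duration event in Chrome's Trace Event Format: "X" is a complete event
// with a start timestamp (ts, microseconds) and a duration (dur, microseconds).
struct TraceEvent {
  std::string name;
  int pid, tid;
  int64_t ts_us, dur_us;
};

void WriteChromeTrace(const std::string& path, const std::vector<TraceEvent>& events) {
  std::ofstream out(path);
  out << "{\"traceEvents\":[";
  for (size_t i = 0; i < events.size(); ++i) {
    const auto& e = events[i];
    if (i) out << ",";
    out << "{\"name\":\"" << e.name << "\",\"ph\":\"X\",\"pid\":" << e.pid
        << ",\"tid\":" << e.tid << ",\"ts\":" << e.ts_us
        << ",\"dur\":" << e.dur_us << "}";
  }
  out << "]}";
}

int main() {
  // A file like this can be attached to a support ticket and loaded in
  // chrome://tracing (or another server's embedded viewer) on any machine.
  WriteChromeTrace("trace.json", {
      {"ListTables", 1234, 7, 1000, 230},
      {"Scan",       1234, 9, 1500, 705000},
  });
}
```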
C
So
that's
the
process,
white
tracing
terms,
an
inner
process
racing
we
actually
haven't
had
a
big
need.
Yet
we
don't
have
a
lot
of
super
deep
RPC
call
stack
at
least
within
coop
I,
think
there's
some
cases
where
a
user
application.
If
it's
building
like
a
website
they
might
want
to
do
tracing,
in
which
case
we
want
to
support
it
for
the
consumers.
But
in
terms
of
Q
itself,
when
we
get
a
request,
our
request
is
going
to
maybe
wait
on
one
other
server
for
application,
but
that's
about
it.
So.
C
C
Macarons
I
probably
wrote
on
the
first
week
when
I
started
writing
to
do,
which
is
Coke
log,
slow
execution,
see
past
number
of
milliseconds
and
then
some
string,
and
it's
just
checks
if
this
particular
scope
that
you
put
it
in,
takes
more
than
X
number
of
milliseconds
it'll
log
out
a
statement
saying:
hey
I
took
a
long
time
to
do
X.
This
was
incredibly
useful
in
customer
environments
when
they
they
kind
of
called
up
and
say,
hey,
dude,
who's
being
a
little
bit
slow.
Oh
I,
upload
that
I
own.
C
That
I
don't
know,
here's
a
lot
of
figure
it
out
and
just
having
these
kind
of
markers
in
the
logs.
That
say,
hey,
look
right
into
the
right
ahead:
log,
a
bunch
of
threads
blog.
This
thing
saying
that
it
took
a
long
time
to
write
right
ahead.
Log
is
a
good
point,
as
maybe
you're
right
ahead.
Log
viscous
flow
or
overly
contended
by
other
applications,
and
things
like
that.
So
super
simple,
but
pretty
useful
for
the
amount
of
effort
it
took.
So
we
didn't
have
these
sprinkled
around
our
code
base
in
various
interesting
places.
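A minimal sketch of a slow-execution guard like the macro he describes: wrap a scope with a millisecond budget and a label, and log only when the scope blows its budget. The names here are invented; they are not Kudu's actual macro.

```cpp
#include <chrono>
#include <iostream>
#include <string>
#include <utility>

// RAII guard: logs a warning when the scope it lives in takes longer than the
// given threshold. Costs almost nothing on the fast path.
class SlowExecutionWarner {
 public:
  SlowExecutionWarner(int max_ms, std::string what)
      : max_ms_(max_ms), what_(std::move(what)),
        start_(std::chrono::steady_clock::now()) {}
  ~SlowExecutionWarner() {
    auto elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start_).count();
    if (elapsed_ms > max_ms_) {
      std::cerr << "WARNING: took " << elapsed_ms << " ms to " << what_
                << " (threshold " << max_ms_ << " ms)\n";
    }
  }
 private:
  int max_ms_;
  std::string what_;
  std::chrono::steady_clock::time_point start_;
};

// Usage mirrors the description: wrap a scope, give it a budget and a label.
#define LOG_SLOW_SCOPE(max_ms, what) SlowExecutionWarner _slow_warner(max_ms, what)

void AppendToWal() {
  LOG_SLOW_SCOPE(50, "write to the write-ahead log");
  // ... fsync, etc. If this scope exceeds 50 ms, a warning line is emitted.
}

int main() { AppendToWal(); }
```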
C
So,
just
by
default
we
run
with
this
diagnostic
lock,
which,
if
put
in
a
long
directory,
in
this
case
it's
a
dead
built
in
temp.
So
if
I
look
at
that
file,
it's
semi
human
readable
and
basically,
you
get
stacked
race
records
which
are
by
default
once
a
minute
with
some
jitter.
So
we
don't
actually
correlate
with
any
kind
of
schedule
once
a
minute
tack.
So
this
one's
45
seconds
apart
this
one
is
another
45.
C
This
one
is
a
little
longer
and
then,
in
order
to
make
a
little
bit
smaller,
we
do
a
little
bit
of
dictionary
encoding
of
the
symbols
seals
here
in
the
stack
trace
line.
The
stacks
just
have
hex
addresses
and
then
inner
leaves
there's
these
symbols
lines
which
map
those
hex
addresses
to
particular
particulars
and
bolts
and
function
names,
the
other
type
of
until
we
put
in
these
logs
as
metrics
dumps.
So
we
have
a
lot
of
metrics
that
are
captured
from
the
server
histogram
counters
things
like
that
I'll
talk
about.
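A sketch of the once-a-minute-with-jitter cadence of those diagnostics records; the actual stack capture and dictionary encoding are elided, so this only shows the scheduling side, with a placeholder standing in for the record contents.

```cpp
#include <atomic>
#include <chrono>
#include <fstream>
#include <random>
#include <string>
#include <thread>

std::atomic<bool> shutting_down{false};

// Background loop: write a diagnostics record roughly once a minute, with
// random jitter so the dumps don't correlate with other once-a-minute tasks.
void DiagnosticsLogLoop(const std::string& path) {
  std::ofstream log(path, std::ios::app);
  std::mt19937 rng(std::random_device{}());
  std::uniform_int_distribution<int> jitter_s(-15, 15);
  auto next = std::chrono::steady_clock::now();
  while (!shutting_down) {
    if (std::chrono::steady_clock::now() >= next) {
      // In the real log this would be a dictionary-encoded stack record or a
      // metrics snapshot; a placeholder line stands in for it here.
      log << "stacks <elided stack trace record>" << std::endl;
      next = std::chrono::steady_clock::now() +
             std::chrono::seconds(60 + jitter_s(rng));
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
  }
}

int main() {
  std::thread t(DiagnosticsLogLoop, "/tmp/diagnostics.log");
  std::this_thread::sleep_for(std::chrono::seconds(1));  // sketch: run briefly
  shutting_down = true;
  t.join();
}
```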
C: We found that even though customers may have centralized metrics collection, oftentimes those systems do a lot of downsampling or aggregation, and it's hard to get down to what happened at this exact minute, or in between these two exact minutes — what was the 99th percentile log-append latency on this particular server? I think the best companies in the world can probably answer that question; most companies can't. But if you just have this really dumb gzipped metrics log that you can get from the customer and look at...
C
We
have
various
tools
you
can
take
these
logs
and
graph
them
and
calculate
various
derived
metrics.
That's
thank
you
very
useful
again.
It's
kind
of
the
simple
thing
but
works
pretty
well
and
we've
got
a
description,
tag
notice,
parse
stacks
and
get
some
some
our
own
part
stacks
on
this
slug
it'll
print
out
a
lot
more
information,
so
the
stacks
and
it
does.
The
symbolization
shows
if
my
thread
groups
together
threads
that
are
all
having
the
same
stack.
C: ...and can point out these issues. So techniques like this have allowed us to find issues like the one in glog, for example: if you just use the Google logging library in its default mode, there's a mutex around logging, and that mutex can be held while it's actually doing the I/O, and the I/O can take a long time. So we've seen these issues where all threads end up blocked on glog; we moved to async logging and those things got a lot better. So these kinds of techniques — again, pretty simple, but they work really well.
C
The
stack
traces
are
also
viewable
on
a
slash
stack
web
page
again,
unreasonably
effective,
simple
thing.
So,
if
I
go
to
one
of
our
production
servers
go
to
slash
stacks,
pretty
quick
and
I
call
it
a
kind
of
a
poor
man's
profile.
Also,
if
I'm
curious,
what
a
workload
is
doing
is
its
can
heavy.
Is
it
doing
a
lot
of
I/o?
Is
it
a
way
to
not
have
to
be
you
on
something?
C
Usually
just
a
couple
reload
this
page.
It
gives
you
a
pretty
good
idea
of
how
busy
the
server
is
and
what
might
be
some
bottlenecks.
So
it's
interesting
to
me
to
see
a
hash
table.
Look
up
on
the
serialize
row
block
Hall.
This
is
actually
unknown
performance
issues,
at
least
and
I
think
fixed,
so
very
poor
man's
profile.
Every
loaded
again,
I
also
see
the
standard
hash
table
signs
here
on
this
conflict,
and
that
probably
shouldn't
be
in
that
call.
We
should
have
something
a
little
faster
there.
C
All
right
as
a
slash
metrics
is
pretty
simple
a
lot
of
metric
stuff.
We
built
our
own
metric
subsystem.
We
couldn't
Lee
sign
much
good
for
C++
I.
Think
now
the
maybe
the
census
project
is
trying
to
do
a
little
bit
with
this,
but
we
implemented
the
HDR
histogram
data
structure
for
high
resolution,
histograms
and
all
of
our
pcs,
as
well
as
a
bunch
of
other
things
throughout
the
code
base
track
really
fancy
histograms.
So
you
can
see
in
this
example
that
this
particular
right
RPC
has
two
significant
digits.
Precision.
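To show what keeping raw bucket counts and deriving percentiles from them means, here is a toy fixed-bucket latency histogram; Kudu's real implementation is the far more precise HdrHistogram data structure, so treat this purely as an illustration of the idea.

```cpp
#include <cstdint>
#include <iostream>
#include <vector>

// Toy histogram: power-of-two microsecond buckets. Raw counts are kept so that
// percentiles can be recomputed later (or diffed between two snapshots).
class LatencyHistogram {
 public:
  LatencyHistogram() : buckets_(32, 0) {}
  void Record(int64_t us) {
    size_t b = 0;
    while ((1LL << b) < us && b + 1 < buckets_.size()) ++b;
    ++buckets_[b];
    ++total_;
  }
  // Upper bound (in us) of the bucket containing the given percentile.
  int64_t ValueAtPercentile(double p) const {
    int64_t rank = static_cast<int64_t>(p / 100.0 * total_);
    int64_t seen = 0;
    for (size_t b = 0; b < buckets_.size(); ++b) {
      seen += buckets_[b];
      if (seen >= rank) return 1LL << b;
    }
    return 1LL << (buckets_.size() - 1);
  }
  const std::vector<int64_t>& raw_buckets() const { return buckets_; }
 private:
  std::vector<int64_t> buckets_;
  int64_t total_ = 0;
};

int main() {
  LatencyHistogram h;
  for (int i = 1; i <= 1000; ++i) h.Record(i);  // mostly fast requests
  h.Record(250000);                             // one slow outlier
  std::cout << "p50 <= " << h.ValueAtPercentile(50) << " us, "
            << "p99 <= " << h.ValueAtPercentile(99) << " us, "
            << "max <= " << h.ValueAtPercentile(100) << " us\n";
}
```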
C
We've
done
some
number
of
them,
I
mean
all
the
percentiles,
and
these
actually
keeps
the
raw
socket
counts
as
well
underneath,
so
you
can
fetch
it
from
slash
metrics
if
you
had
a
special
query
parameter
and
they
end
up
in
that
metrics
log.
So,
given
the
metrics
log
and
given
snapshots
of
the
raw
bucket
count,
you
can
actually
say
between
any
two
points
in
time.
C
We've
found
some
bugs
there,
so
in
this
case
the
right
ahead.
Log
at
this
line
of
code
was
stuck
first
200
milliseconds
and
it
takes
the
kernel
stack
as
well.
So
we
can
actually
see
that
inside
the
kernel
it
was
waiting
on
jbjb
d2,
which
is
the
file
system
journal.
We
didn't
get
right
access
to
the
file
system
journal,
so
this
is
something
that
you
know.
You
probably
would
expect
right
ahead
luck.
The
fact
you
had
to
wait
for
a
600
milliseconds
is
a
little
bit.
C: ...surprising, and it's just due to Red Hat Enterprise Linux 6 being really old and having a pretty bad implementation of a lot of this stuff. As you can also see, the user stack is in the write path, which kind of makes sense, because the kernel stack is in the writev system call. So I think that's all the slides I prepared — I didn't want to go too long; I think questions and discussion are more interesting.
D
I've
got
a
question
thanks
Todd.
This
is
pretty
interesting
and
I.
It's
fun
to
see
I
kind
of
knew
that
you
would
do
this
with
just
what
I
wanted
you
to
do
this,
but
it's
nice
to
see
a
presentation
about
performance,
analysis
and
stuff
like
that.
That's
not
just
like
100
percent
about
distributed
tracing,
because
these
other
techniques
are
really
interesting
relevant,
but
one
thing
that
comes
up
in
my
head,
I
think.
Actually,
this
is
a
fine
example.
D
It
sounds
like
in
this
case
it
was
the
issue
had
to
do
with
red
hat
kind
of
price
effects,
not
being
a
very
good
implementation,
but
a
lot
of
the
things
that
you're
probably
dealing
with,
have
to
do
with
contention
for
some
shared
resource,
whether
it's
the
disc
or
something
else.
Then
I'm
curious,
like
what
you
know.
Do
you
have
techniques
that
you're
using
to
understand
the
source
of
load
when
there's
just
no,
you
know
contention
issues
overloaded
resource
that
type
of
thing
like
what?
C
We
don't
have
any
super
generic
things
for
that
MINIX,
specifically
for
lock
contention.
Our
spin
locks
are
instrumented
with
apology
to
talk
about
that
here.
Our
spin
locks
have
some
instrumentation
where
they
collect
the
stack
trace
of
the
unlocking
node,
with
an
unlock
that
sees.
There
was
a
waiter
and
collected
spectrum,
so
it
kind
of
knows
which
folders
were
causing
attention
of
somebody
else,
and
then
we
expose
that
through
the
peopre
web
interface,
so
I'll
see
if
I
can
actually.
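A sketch of that idea: a spinlock that notices, at unlock time, whether anyone was waiting, and captures the holder's stack (via glibc's backtrace()) when someone was. The class and the aggregation here are illustrative; Kudu's real instrumentation and its pprof-style contention endpoint are more involved.

```cpp
#include <atomic>
#include <cstdio>
#include <execinfo.h>  // backtrace(), glibc/Linux only
#include <thread>

// Illustrative spinlock: on unlock, if someone was waiting, the *holder*
// captures its own stack, since the holder is the code that caused the wait.
class InstrumentedSpinLock {
 public:
  void lock() {
    waiters_.fetch_add(1, std::memory_order_relaxed);
    while (locked_.exchange(true, std::memory_order_acquire)) {}
    waiters_.fetch_sub(1, std::memory_order_relaxed);
  }
  void unlock() {
    bool contended = waiters_.load(std::memory_order_relaxed) > 0;
    locked_.store(false, std::memory_order_release);
    if (contended) RecordContentionStack();
  }
  long contended_unlocks() const { return contended_unlocks_.load(); }

 private:
  void RecordContentionStack() {
    void* frames[16];
    int depth = backtrace(frames, 16);  // stack of the thread releasing the lock
    (void)depth;
    // A real implementation would aggregate these stacks into a table keyed by
    // the frames and expose them on a pprof-style contention endpoint.
    contended_unlocks_.fetch_add(1, std::memory_order_relaxed);
  }
  std::atomic<bool> locked_{false};
  std::atomic<int> waiters_{0};
  std::atomic<long> contended_unlocks_{0};
};

InstrumentedSpinLock spinlock;
long counter = 0;

void Worker() {
  for (int i = 0; i < 50000; ++i) {
    spinlock.lock();
    ++counter;
    spinlock.unlock();
  }
}

int main() {
  std::thread a(Worker), b(Worker);
  a.join();
  b.join();
  std::printf("counter=%ld contended unlocks=%ld\n", counter,
              spinlock.contended_unlocks());
}
```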
C
Show
that
yeah,
so
if
I
go
to
a
special
URL
which
can
be
read
via
that
Gopi
prof.
tool
as
well
it'll
tell
us
over
this
one.
Second,
the
various
stack
traces
where
we
had
some
contention
and
you
can
be
symbolized
if
you
have
the
binary
as
well.
So
this
is
super
useful
for
the
generic
kind
of
spin,
lock
contention.
C
Similarly,
you
can
get
CPU
profiles
from
this
kind
of
endpoint.
Honestly
I
find
the
flash
stacks
to
be
unreasonably
useful
for
this
kind
of
thing
as
well.
So
one
interesting
example
is:
maybe
six
months
ago
we
learned
that
TC
Malik,
which
is
the
alligator
we
use,
has
sort
of
six
free
lists
for
all
sizes,
less
than
one
megabyte
allocations,
but
one
megabyte
and
above
actually
goes
to
like
a
central
span
list
which
was
actually
implemented
as
a
linked
list
until
very
recently.
C: I definitely agree. I think we have a lot of systems that are useful in the right hands but where it's hard to expose what you should be looking at, so we're trying to document things better; we're starting some runbooks for our internal support team to understand how these things might be useful. In terms of correlating — oh, I saw one outlier, but I wasn't collecting the traces at that time — yeah, we definitely have that; you kind of have to hope that you catch the thing happening that you want to see.
C
So
it's
a
little
hard.
That's
why
we
started
to
add
more
of
these
features
like
the
diagnostic
plug
was
just
always
digging
stack
and
that's
now
on
by
default.
It
took
us
a
little
bit
of
nervousness
to
be
like.
Is
it
actually
safe
to
have
this
thing
taking
factories
in
once
a
minute,
because
when
we
first
implemented
it,
we
actually
found
unlocked
in
the
dynamic
loader
if
you're
trying
to
back-trace
a
thread
while
it
didn't
the
loader,
and
we
have
awful
worker
appetiser,
read
that
so
that
I
think
there's
always
risk.
C
When
you
add
this
instrumentation,
either
performance
or
bugs
and
remember
actually
the
first
time
we
added
the
contention
profiling
I
introduced
this
awful
memory
correction
book
where
I
was
writing
outside
of
the
stack
and
that
almost
got
released
to
customers
and
it
would
be
really
bad
because
we
had
a
lot
of
crashes
and
things
like
that.
So
there's
always
risk
and
I.
Think
for
us,
it's
okay
to
have
even
like
a
5
or
10
percent
performance
reduction.
I
think
our
customers
are
not
so
performance
sensitive
and
there
are
a
lot.
C
Stuff
is
down,
and
they
don't
know
why
or
stuff
is
performing
badly
and
they
don't
know
why
and
it
takes
us.
You
know
it
takes
about
three
weeks
to
understand
what
the
performance
problem
is.
There'll,
be
a
lot
more
upset
versus,
if
you
say
well,
you've
got
a
5%
overhead,
but
we
can
pinpoint
that
problem
in
an
hour
instead
of
three
weeks.
It's
usually
a
good
trade-off
for
us,
it's
probably
not
the
case
for
every
company,
but
we
tend
to
lean
more
towards
that
side
of
the
spectrum.
A: Yeah, that sort of gets at it. Trying to figure out the right granularity often seems to be part of the trick, the other part being that it's potentially dangerous. You know, there's always some overhead that comes with this stuff, and sometimes it just seems — especially writing databases, C++ stuff — people can be very, very obsessive about maximal efficiency, and then you're saying, well, we're just gonna add 5% overhead to figure out what's wrong with it.
C
The
best
example
I
can
give
there's
like
yeah.
We
always
have
a
5%
overhead,
but
this
time
percent
overhead
has
allowed
us
to
pinpoint
performance
issues
that
have
saved
us,
40
or
50
percent.
I.
Think
we've
got
any
huge
gains
from
things
based
on
using
this
infrastructure,
so
we
never
added
a
spark
4%.
We
fully
stock
way
back
and
yeah
a
year
ago,
and
it
was
much
much
slower
to
give
to
spend
a
little
to
win
a
lot
cool,
great.
C
There's
one
last
thing
that
I
didn't
show
is
the
heat
profile,
which
is
another
thing
we've
turned
out
more
recently.
Oh,
it's
not
even
on
the
servers
we
turned
on
so
recently,
but
the
TC
Milan
keep
sampling
is
one
of
these
things.
It's
not
really
well
advertised,
it's
quite
low
overhead
and
I.
Think
probably
our
next
release
we're
going
to
turn
it
on
by
default.
C: Yeah, I think the most advanced users would probably find it useful; a lot of users probably don't want to have to care about this — they just hope it works. We certainly use it on the dev team, at least. Anyway, if anybody has any further questions, feel free to come on the Gitter — the OpenTracing Gitter — and ping me there, and I'll check in later today. — Awesome, thank you. Thank you so much for presenting.
A: Okay, so back to our regularly scheduled programming. We've got a couple of things on the agenda around OpenTracing API questions.
A: I think we should just get moving on it. I know, just with my team, we've been really focused on getting the scope and scope manager release for Python out the door, and so we haven't felt like we've personally had bandwidth to also release and manage this in other languages while that's going on.
A: So I would say that there are, like, two issues there. One: it's a breaking change for tracing implementers — well, not exactly a breaking change; it's a change that's backwards compatible. You now have to expose these on your tracer and issue a new version of your tracer, but that tracer will still conform to the older API, so it's not like you need to fork and maintain two versions, and for users of the code nothing changes.
A
The
other
issue
that
that's,
maybe
more
serious
or
harder
to
see,
is
around
naming
these
methods
should
they
be
called,
trace,
ID
and
span
ID
or
trace
identifier
and
span
identifiers,
which
is
a
big
mouthful,
but
definitely
it's
the
chances
of
a
collision
with
a
pre-existing,
pre-existing
method
that
returns.
Something
else.
We've
seen
one
example
of
that
which
is
the
Mach:
tracer
has
Trey's
ID
and
span
ID
and
it
returns
like
a
UN,
but
we've
been
asking
around
actual
implementers
and
no
one
with
a
tracer
currently
binding
to
open
tracing
has
spoken
up
and
said.
A: ...no, that won't work. So I think that's really the final bikeshed. There's been a lot of push from everyone to say trace ID and span ID are nice names, and it doesn't seem to mess up any real code, so let's do it. I am a little nervous that somebody will show up too late and say, hey, this messed with me.
A: The only case where this is potentially a breaking change would be if you literally had the methods trace ID and span ID with the same capitalization and everything else — which most tracers had, I think. That's been the question we've been asking around: who literally has this method signature returning something else — and no one has spoken up saying that they do. The other answer is just to name it something slightly different, right? I think that's the final question that has to get resolved.
A
If
you
call
it
just
a
slightly
different
name,
then
you
massively
reduced
the
chance
of
there
being
a
collision
yeah.
No
one
called
it
trace
identifier,
because
that's
really
long
to
type
it's
just!
You
now
have
this
API
we're
asking
everyone
to
use,
and
it's
and
it's
got
a
funky
method
signature
as
a
result
of
this.
So
really
maybe
it's
like.
Can
we
do
a
more
exhaustive
audit
of
existing
tracers
that
bind
to
open
tracing
and
really
get
an
active
confirmation
that
it
will
or
will
not
be
a
problem?
Well.
F
As
far
as
Jager,
every
single
librarian
Hagar
had
a
traced,
ID
and
span
ID
in
the
most
idiomatic
form
for
the
language
right,
so
in
Ingo
it
would
be
like
upper
ID,
etc.
So
definitely
gonna
have
class
and
they
would
return
like
native
types
rather
than
strings.
Did
you
say
something
man
we've
got
that
question
anywhere,
but
you
said
like
Mach
tracer
had
the
same
thing
and
I
would
assume
most
of
the
traces
well.
A: Then it's just a matter of calling it something else. I think that's the solution — maybe something that's not as long as "identifier," for people who have to type this out manually; that's hard to remember how to spell and very long. So I really think that's what we need.
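For illustration, here is a hypothetical sketch of what exposing such accessors could look like: an interface method returning strings, implemented by a tracer whose span context stores native integers (as with the MockTracer and the Jaeger clients mentioned above). The method names are placeholders — the naming is exactly what is being debated here.

```cpp
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <string>

// Hypothetical interface-level accessors (names are placeholders).
class SpanContext {
 public:
  virtual ~SpanContext() = default;
  virtual std::string ToTraceID() const = 0;
  virtual std::string ToSpanID() const = 0;
};

// A tracer that stores the IDs as native integers can still satisfy a
// string-returning API, which is why the change can be backwards compatible
// for implementers: add the accessors, keep the existing fields.
class NumericSpanContext : public SpanContext {
 public:
  NumericSpanContext(uint64_t trace_id, uint64_t span_id)
      : trace_id_(trace_id), span_id_(span_id) {}
  std::string ToTraceID() const override { return ToHex(trace_id_); }
  std::string ToSpanID() const override { return ToHex(span_id_); }

 private:
  static std::string ToHex(uint64_t v) {
    char buf[17];
    std::snprintf(buf, sizeof(buf), "%016llx",
                  static_cast<unsigned long long>(v));
    return buf;
  }
  uint64_t trace_id_, span_id_;
};

int main() {
  NumericSpanContext ctx(0xabcdef, 42);
  std::cout << "trace=" << ctx.ToTraceID() << " span=" << ctx.ToSpanID() << "\n";
}
```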
D: I mean, I think Yuri's question is, like — you know, this has been a known issue for a long, long time in some languages, and, you know, the natives are getting restless: people file issues frequently about this, and we kind of concluded that we should add something, but we haven't made progress.
D
10,
I
think
you're
accurately
saying
that,
like
there's,
basically
resourcing
issues
we're
doing
this,
it
seems
like
a
simple
change,
because
conceptually
it
is
that
like
it
does
require
a
bunch
of
roll
out
care
because
of
these
issues
that
were
bringing
up
so
I.
Think
the
question
is:
what's
the
next
step,
I
mean
and
now
I
would
rather
not
if
possibly
get
into
the
discussions
the
PR
discussion
in
this
call
or
whatever
event,
and
we
could
be
something
where
we
mean
the
opening
up.
D
The
PR
without
merging
it
in
most
languages
is
a
very
easy
thing
to
do.
I
mean
truly
easy
thing
to
do,
and
it
could
be
done.
You
know
like
without
getting
everything
through
opening
the
PRS
advertising
them
soliciting
comments
from
from.
You
know
implementers
that
sort
of
thing
could
probably
be
done
without
a
lot
of
tiny
investment,
at
least
that's
my
two
cents
and
is
the
stuff
that
could
be
taking
us
here.
D
It
would
also
allow
people
who
are
coming
in
and
filing
these
issues
to
see
that,
in
fact,
this
is
this
is
like
there's
something
in
motion:
I,
don't
know
how
you
feel
about
a
time
that
I
think
like
did
that
stuff
itself
could
be
kind
of
paralyzed,
so
to
speak.
So
it's
gonna
be
a
happy
but
remain
open
for
a
while,
anyway,
just
to
make
sure
people
see
them
and
got
a
chance
to
comment.
A: I would agree with that, yeah. And again, my apologies for being, you know, maybe too focused on Python right now, but there has been a long-running PR about this. We could essentially socialize it a bit more and just kind of put it out there, make tracking issues in every language, and kind of announce that it's coming. But yeah, I...
A: ...I do think what you just said — for me it hinges on coming up with a different name for these things, just to make sure we don't collide. So I think that's the final bikeshed, but we should move on it very quickly once we've resolved it. I think, if we go with the approach of just picking a name that has a low chance of collision with anything, there's no reason why we can't get a release candidate out in every language quickly and get people to start binding to it.
A: Let me get into presentation mode here. So this mostly comes down to having both scopes and spans. We added a sort of active-span concept to OpenTracing, so that the tracer would be responsible for managing which span was active in which context, and if you have some kind of context switching — whether it's threads or some async, userland-level thing — the tracer would be tracking that using a scope manager.
A: So each context that has a span is called a scope, and you can ask the scope manager for the currently active scope and pull the span off of it. Scopes have to be closed when they're done, and that doesn't always necessarily line up with a span being finished, because you may be moving spans from context to context — so you may make a span active in one scope, then close that scope, move the span to another scope, and so on and so forth.
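A small hypothetical sketch of the scope/span split being described: a scope activates a span for the current context, and closing the scope does not finish the span, so the same span can be re-activated elsewhere before it is finished. None of this is the actual OpenTracing API; it just models the concepts.

```cpp
#include <cassert>
#include <iostream>
#include <memory>
#include <string>

// Hypothetical stand-ins for the concepts under discussion.
struct Span {
  explicit Span(std::string n) : name(std::move(n)) {}
  void Finish() { std::cout << "finished span " << name << "\n"; }
  std::string name;
};

// The "scope manager": tracks which span is active on the current thread.
thread_local Span* active_span = nullptr;

// A scope activates a span for its lifetime and restores the previous one when
// closed. Closing a scope does NOT finish the span.
class Scope {
 public:
  explicit Scope(Span* span) : previous_(active_span) { active_span = span; }
  ~Scope() { active_span = previous_; }
 private:
  Span* previous_;
};

int main() {
  auto span = std::make_unique<Span>("handle_request");
  {
    Scope scope(span.get());            // span active in this context
    assert(active_span == span.get());
  }                                     // scope closed, span still unfinished
  assert(active_span == nullptr);
  {
    Scope scope(span.get());            // same span re-activated elsewhere
    assert(active_span == span.get());
  }
  span->Finish();                       // finishing is a separate decision
}
```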
A: This usually doesn't feel too onerous when you're writing code inside a plug-in or an interceptor, where most of the code you're writing is really focused on tracing that higher-level concept — so that doesn't feel too bad. At least to me it doesn't feel too bad. But for application developers, if your start-work and finish-work contain a lot of application code and you're doing quite a bit of this, it gets onerous pretty fast.
A
It's
also
hard
to
get
application
developers
up
to
speed
on
your
team,
because
there's
kind
of
these
like
extra
concepts,
you
know
you're
saying
build,
span,
start
active,
but
you
don't
get
a
span
back.
You
get
a
scope
and
then,
if
you
make
the
span
automatically
finish
when
you
close
the
scope,
that's
nice,
but
now
you're,
saying
scope
close
at
the
end
and
you
never
touch
the
span
there
either,
so
that
this
has
like
some
cognitive
load,
that's
sort
of
above
and
beyond
the
the
simpler
model
that
we
had
initially
envisioned.
A: So if you look at a simpler API — if you make some assumptions that you can make when you're writing application code, such as the presence of a global tracer — you can make this a lot more declarative, right? It's possible to create an API where you just say "start a span" and it's automatically made active, and then you can access it declaratively, because you have access to a global tracer. You don't even necessarily have to track the tracer or do any kind of object method chaining; you could just say, hey, tag...
A
The
current
span
log
on
it
and
then,
when
you're
done,
you
can
say
hey
justjust
finish
this
thing,
so
I'm
not
proposing
this
precise
API
I'm
just
proposing
that
it
should
be
possible
to
produce
an
API.
That's
that's
this
simple,
and
in
order
to
get
application
developers
more
comfortable,
I
think
as
a
community,
we
should
push
for
for
providing
some
more
official
ergonomic
API,
if
not
looking
like
this,
at
least
at
least
something
with
this
level
of
complexity.
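A hypothetical sketch of the kind of ergonomic, declarative surface being proposed: free functions that assume a global tracer and an implicit active span, so application code never handles scopes or span objects directly. The function names are invented for illustration, not a proposal of concrete signatures.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Minimal span record; a real tracer would do far more.
struct Span {
  std::string name;
  std::vector<std::string> tags;
};

thread_local std::vector<Span> span_stack;  // stands in for the scope manager

void StartActiveSpan(const std::string& name) { span_stack.push_back({name, {}}); }

void TagActiveSpan(const std::string& key, const std::string& value) {
  if (!span_stack.empty()) span_stack.back().tags.push_back(key + "=" + value);
}

void FinishActiveSpan() {
  if (span_stack.empty()) return;
  const Span& s = span_stack.back();
  std::cout << "finished " << s.name << " with " << s.tags.size() << " tags\n";
  span_stack.pop_back();
}

// Application code reads declaratively: no tracer handle, no scope objects.
void HandleCheckout() {
  StartActiveSpan("checkout");
  TagActiveSpan("customer.tier", "gold");
  // ... application work ...
  FinishActiveSpan();
}

int main() { HandleCheckout(); }
```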
A: So that's my pitch. I'm going to be pushing for this in the cross-language working group starting next week as well, but I was interested in whether anyone had comments on this at this time, or thoughts about how to do it, or any kind of experience reports from working with scopes and active spans in the field.
A: Yeah, I've heard from several people who couldn't be on this call that they're very interested in something like this, so, you know, we'll have a discussion online on Gitter. But there's also just the sort of general issue of, you know, do we need scopes and scope managers — that kind of thing. Pavel, I know you were asking about that. Do you have any thoughts on this?
D
You
haven't
I
think
that's
really
hinges
on
whether
you're
talking
about
instrumenting
stuff
in
library
versus
just
trying
to
get
your
work
done
as
an
application
developer,
and
these
sorts
of
higher
level
abstractions
I
think
make
a
lot
of
sense
for
the
latter,
where
we
want
an
easy
mode
type
of
experience.
But
as
Pablo
saying,
for
you
know,
very
meticulous
instrumentation
of
shared
libraries,
it
probably
makes
more
sense
to
avoid
the
Global's
and
stuff
like
that.
A: I think a side effect of making something like this more official is to make it clear that there would be two style guides for when you're writing instrumentation. There's a style guide that says, you know, don't presume a global tracer — always take in a tracer as an option and fall back to the global tracer if no one gives you one — and basically you wouldn't get to use this cleaner API there, because this API makes a bunch of assumptions.
A: Yeah — start span, finish span, maybe. But basically, the long and short of it is: can we take the scopes and scope managers and make them a concept that, as an application developer, you never have to think about — you're not necessarily even aware that they exist until you get into some tricky situation, and then you dig into the docs and discover there are actually these lower-level APIs that you can use to deal with those situations.
A: Well, let's have the discussion on Gitter. This is mainly just a sort of advertisement to people that we want to get moving on this, and really we should have it, you know, in a forum where people in time zones that can't make this call can participate. But if people have ideas about what this kind of API might look like, or, you know, if they're already working with application developers who have written something like this, it would be great to start...
A
You
know
some
contribute.
Oh,
that
are
experimenting
with
this
one.
Nice
thing
is
I'm
fairly
certain.
We
can
write
all
this
without
actually
touching
the
tracer
API.
That
I
think
would
be
one
of
the
goals,
so
there's
a
lot
of
room
to
sort
of
experiment
with
different
approaches
to
this
and
contribute
moving
on
that.
F
One
thing
I
want
to
add
is
when
I
saw
there's
an
agenda.
I
thought
that
would
be
a
different
topic.
More
about
high-level
API
is
for
specific
operations
like
HTTP
requests
or
database
requests,
so
which
kind
of
I
mean
works
in
a
similar
way.
That
people
often
ask
like
for
some
standard
way
of
doing
these
things.
A
Yes,
I
definitely
think
we
need
those
as
well
and
that
could
get
get
wrapped
up
in
this.
For
example,
if
you
see
tag
where
we
say
some
tag
key
some
tag
value,
that's
fine
for
your
own
custom
tags,
but
actually
you
know
going
and
finding
the
constants
and
kind
of
gluing
them
together.
When
you
want
to
do
something
like
say
you
know,
login
error
or
an
exception.
D
Yeah
another
thought
on
this
is
that
we,
you
know
in
this
discussion
earlier
around
traces
span
IDs.
We
would
need
to
make
a
change
like
that
in
some
kind
of
coordinated
fashion
across
languages,
I
think
that
for
some
of
the
higher
level
primitives
they
they
actually
naturally
should
deviate
from
language
to
language
like
if
you're
working
in
a
Ruby
or
rails
environment,
or
something
like
that.
D
But
types
of
primitives
that
you
might
want
for
convenience
are
actually
different
than
what
you'd
want
and
go
and
so
on
so
forth
and
and
that
can
actually
make
the
stuff
go
bit
faster.
I
think
when
we
have
to
do
cross
language
stuff,
because
I
think
we're
now
dealing
with
like
nine
languages
or
something
like
that.
It's
a
bit
daunting
to
start
one,
those
projects,
knowing
how
much
parallel
work
is
going
to
have
to
take
place
and
for
this
I
hear
you
just
mentioned
around
HTTP,
and
things
like
that.
A: Yeah, totally. I think another way of thinking about this is: there's been a lot and lot of work trying to figure out what the correct low-level API for tracers to bind to is, and that work has been slow going — it's very difficult work — but it feels to me like we're getting to the end of it and it's starting to gel. And now it's sort of time, between this kind of work and things like getting span and trace...
A: ...identifiers out there to allow people to start building middleware and other things — we're sort of moving up the stack to the application developer zone and the things they would like, and that world is definitely much more opinionated and nuanced, and there's room, even within a single language, to have more than one way to do this. I think we should probably offer some...
A
You
know
official
version
of
this
at
some
point
just
to
lower
the
cognitive
overhead,
but
I
totally
anticipate
you
know
in
Java
there's
some
people
who
may
want
to
do
this
kind
of
thing
with
annotations
some
people
who
may
want
to
do
it
using
some
other
declarative
strategy.
Like
you
said,
Ruby
there's
a
lot
of
different
metadata
magic
approaches
to
doing
things,
and,
what's
great
about
doing
these
is
higher-level.
Api
is
like
not
everyone
has
to
agree.