This is the agenda. First, I'll do a very quick introduction to OpenTelemetry and distributed tracing. Then I'll talk about Promscale, which is a free, open-source observability backend that runs on top of TimescaleDB and PostgreSQL. And finally, I'll show how you can use OpenTelemetry, Promscale, Grafana, and SQL to better understand your distributed systems, using a demo environment we've created that you can get up and running on your computer in just a few minutes.
If you want to dig deeper, I recommend you check out that blog post. Ready? Let's get started.

For those who are not familiar with it, I'm going to take a few minutes to introduce OpenTelemetry and distributed tracing. OpenTelemetry is a new standard for instrumentation that is hosted by the Cloud Native Computing Foundation.
In today's session we will focus on OpenTelemetry traces, since they hold a lot of valuable data for understanding distributed systems that metrics and logs cannot provide. But what is a trace? A trace is a connected representation of the sequence of operations that were performed across all the microservices involved in order to fulfill an individual request. For example, if you open an article from a news site in your browser, there would be multiple operations served by different microservices.
Read the article, read the comments for the article, and request ads to display with that article. Each of those operations is represented by a span, with its own subspans. A span can have zero or multiple children. All spans have just one parent, except the initial span in a trace, called the root span, which has no parent.
This is just a high-level architecture, where you see Promscale using TimescaleDB to store the data, and integrations with Prometheus, OpenTelemetry, Grafana, Jaeger, and any tool that speaks SQL. TimescaleDB is PostgreSQL with time-series superpowers. Technically it's a Postgres extension, so you also get access to all the capabilities Postgres provides.
So, as you can see, I've cloned the repo, and then I just run docker-compose, which will download all the different images, build them, and then get the environment up and running. This will take a few minutes, so we're not going to watch all of it now, but you can do it on your laptop; we've tested it with macOS, Linux, and Windows.
Okay, now let's go into Grafana and check all the different dashboards we've built that show how to use SQL to derive insights from tracing.
So here I'm already logged in, and I have a demo environment that has been running for quite a bit. By default, the demo environment comes with these six dashboards, and we will be looking at them now. One thing to keep in mind is that the first time you try to log in to Grafana, it will ask for a login and password; those are admin/admin, the defaults that are set.
If we check the architecture of the application, we'll see that there is basically one entry point, which is the generator. So this panel is basically measuring the request throughput for the generator. Throughput, that is, requests per second, is one of the golden metrics when measuring application performance.
So you'll see this is a standard time-series panel from Grafana, and what we're doing is this SQL query. You'll recognize the SELECT and FROM clauses. In the SELECT, what we're adding is the TimescaleDB time_bucket function. This creates the buckets to be displayed, and we then group by the time bucket, so this basically aggregates data into per-second buckets, and then we count everything that is happening
within each bucket, which gives us the number of requests. We're counting all the spans, since that's what we have in the FROM clause; we're querying spans, and count(*) gives us all the spans that meet this requirement, where parent_span_id is null. So these are entry requests into the system which, as I mentioned, are basically requests to the generator service, and this is the throughput that we see.
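To make this concrete, here is a minimal sketch of what such a throughput query could look like. I'm assuming Promscale's span view with columns like start_time and parent_span_id, plus Grafana's $__timeFilter macro; the exact names in the demo dashboard may differ.

    SELECT
        time_bucket('1 second', start_time) AS time,  -- TimescaleDB bucketing
        count(*) AS requests                          -- spans per bucket
    FROM ps_trace.span
    WHERE parent_span_id IS NULL                      -- root spans = entry requests
      AND $__timeFilter(start_time)                   -- Grafana time-range macro
    GROUP BY 1
    ORDER BY 1;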
It comes in sort of waves, and we see the max gets to 11 requests per second, but we also see that in some buckets there are no requests at all.
I think this one is a little bit more interesting than the other one. The other one was obviously showing the evolution of throughput over time, but this one gives us more detail. For example, let's focus on this table: this table is telling us, for each service and operation, what the error rate is.
So let's take a look at what this query looks like. It uses a subquery, which is an interesting thing: it's something that is available in SQL but not necessarily in the other query languages that observability tools offer. So in this case we have a subquery; we have an initial query that is doing a SELECT, again on the span view.
This is a view that Promscale exposes, but you can think of it as if it were a regular table; the distinction doesn't matter much for the purposes of explaining the SQL that we use. So we're querying this span view, and in the span view we have a service name, which is, again, the name of the service that emitted the span, and a span name. The span name is the name of the span, but what it typically indicates is the name of that specific operation.
And we're grouping by 1 and 2, which means we're grouping by service name and span name. So these two statistics are calculated grouped by service name and span name, and that's why we see what we see in this table. And then we're also using two variables as filters.
So if we go up: as I said, this is a subquery, so we have these results, and then the only thing the outer query is doing is taking the service name and span name and calculating the error rate. We could have actually done everything within the same query, but to make it easier to read, we just used the subquery.
And finally, we're ordering by error rate descending, so we show the operations that have a higher error rate at the top. With this information we can very quickly see that generator/generate is the one that has the highest error rate, but that is the top-level operation.
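As a rough sketch, an error-rate query along these lines, again assuming the span view and a status_code column that marks failed spans (treat the column and value names as illustrative), might look like:

    SELECT
        service_name,
        span_name,
        100.0 * errors / total AS error_rate
    FROM (
        SELECT
            service_name,
            span_name,
            count(*) FILTER (WHERE status_code = 'error') AS errors,  -- assumed status value
            count(*) AS total
        FROM ps_trace.span
        WHERE $__timeFilter(start_time)
        GROUP BY 1, 2                      -- per service and operation
    ) AS stats
    ORDER BY error_rate DESC;              -- worst offenders first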
So let's focus on the next level, and at the next level we'll see that process_upper is the other one that has a very high error rate. There are some other operations that have some errors, but the error rate for those is much lower. So probably we should go and check this method and see what's going on, why we have such a high error rate.
As we mentioned, you have these controls here at the top; if you wanted, you could actually filter down to some specific service or operation. The other thing we're doing here is looking at the evolution: this panel is similar, but it looks at the evolution over time.
So if we open this query, we'll see that it is pretty much the same; the main difference is that we're introducing a time projection in the SELECT, which is the time bucket. So we're calculating this stat, the error rate per service and operation, on a per-minute basis, and we're plotting it here over time.
Okay, let's move to the next one. The next one is latency, request durations. This is the third golden signal: as I said, there are three, so we have throughput and error rate, which we've already seen, and then latency, which we can see here.
Let's look at this chart here. This chart is showing the evolution of duration over time, but we're not looking at the average; we're actually computing percentiles. So how does this work? Well, again, let's take a look and see how the query works. Here again we're using the time_bucket function that TimescaleDB provides to group the data in buckets of one minute.
Then you see the GROUP BY clause, and what we're doing is looking at the percentiles: the 99th percentile, the 95th percentile, the 90th percentile, and the median or 50th percentile. To do that we're using the approx_percentile function provided by TimescaleDB, together with the percentile_agg function, which calculates a sketch on the duration in milliseconds. That is a data structure that then allows us to compute an approximate percentile on top of it in a way that is more performant, and we're just plotting all of those here. So again, we can use the power of SQL and TimescaleDB to compute those percentiles, and we could compute any percentile that we wanted here.
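A minimal sketch of that percentile query, assuming the TimescaleDB Toolkit functions percentile_agg and approx_percentile and a duration_ms column (the actual dashboard query may differ in details), could be:

    SELECT
        time_bucket('1 minute', start_time) AS time,
        approx_percentile(0.99, percentile_agg(duration_ms)) AS p99,
        approx_percentile(0.95, percentile_agg(duration_ms)) AS p95,
        approx_percentile(0.50, percentile_agg(duration_ms)) AS median  -- 50th percentile
    FROM ps_trace.span
    WHERE parent_span_id IS NULL          -- measure end-to-end request latency
      AND $__timeFilter(start_time)
    GROUP BY 1
    ORDER BY 1;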
Another thing that is interesting is this histogram of durations. If we look at it, it's showing us the distribution of latency for requests; again, because all requests go through the generator, this is for all generator requests. There is just one entry point into this microservices environment, and what we see is that, while the majority of the requests are processed in, let's say, two seconds or less, there are some that are extremely slow.
You even have requests that took 30 seconds. That's a lot of time; what may be going on there? Okay, so here at the bottom we have another interesting thing: we're listing individual traces. Again, a trace maps to a request and how it went through the system, so we're looking at individual traces, when they happened, and how long they took, and this query is actually showing the slowest ones.
So let's take a look at this. If we look at it, we'll see that we have a number of traces, with their start time and duration, as we saw in the panel in the dashboard. And this is what we're doing: we're displaying the trace id, and we're doing this replace on the text, which I'll explain in a moment; we also project the start time and the duration, and then the only thing we're doing is sorting. We are again using parent_span_id IS NULL, which means this is the root span and basically maps to a trace, a full trace.
So it's a very simple query: we're just searching for root spans and we're getting the top 10, the slowest ones, because we're sorting by duration descending.
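A simplified version of that query, under the same assumptions about the span view (and with the dash-stripping replace explained next), might read:

    SELECT
        replace(trace_id::text, '-', '') AS trace_id,  -- strip the UUID dashes (see below)
        start_time,
        duration_ms
    FROM ps_trace.span
    WHERE parent_span_id IS NULL       -- root spans, i.e. whole traces
      AND $__timeFilter(start_time)
    ORDER BY duration_ms DESC          -- slowest first
    LIMIT 10;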
Now, why are we doing this replace? Trace ids, when they get stored in Promscale, use a UUID format, so they have dashes in them. But you'll notice that this trace id here is underlined; that's because it is a link. We've made it a link.
If you click on any of those traces, it will open the Grafana UI to show the individual distributed trace, which is similar to Jaeger; it basically reuses the code from Jaeger. And so with this you don't need to copy and paste the trace id; you can just use this smart linking, thanks to the amazing, very flexible capabilities that Grafana provides. You can jump straight into that slow trace and try to understand
what's going on. As you see, a lot of those spans are very quick, but there are always a few of them that are slow, and if you check closely you'll see that those that are slow actually belong to the digit service, and it's actually the random digit function that is slow. You can see it very quickly here, so you could go back to the random digit method or function in your code and try to understand
why it is slow. So very quickly we've nailed down that the problem is related to this specific function, at least in this trace. We could look at other traces and maybe the problems would be different, but in this case that is the problem causing this trace to be slow.
This is actually something that is typically, or usually, impossible with the limited query languages that other observability backends provide. But because we can leverage the full capabilities of SQL provided by PostgreSQL, we can do joins, and in this case we join the span view with itself to identify parent and child spans that are related to each other; we're using "k" here for kid.
The other condition, the one at the bottom, is actually very important, because it ensures that we only look at parent-child relationships across services, that is, an operation in one service calling an operation in another service. And so we remove intra-service relationships, that is, an operation in a service calling another operation in the same service, because we don't want to show those in this map, where we're interested in cross-service dependencies.
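The shape of that self-join, as a sketch (the alias "k" for kid comes from the talk; the column names are my assumption), is something like:

    SELECT DISTINCT
        p.service_name AS source,
        k.service_name AS target,
        k.span_name
    FROM ps_trace.span p
    JOIN ps_trace.span k
      ON k.parent_span_id = p.span_id       -- k is the child ("kid") of p
     AND k.trace_id = p.trace_id
    WHERE p.service_name != k.service_name; -- keep only cross-service calls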
So this is a table panel, Grafana's table panel, and the query uses pretty much the same join. It's very similar, but we're showing a different set of stats: we're grouping by source, target, and span name, that's the grouping we're using, and then we're showing how many calls are happening from the source service to the target service and operation, and the total execution time that was spent.
So we just sum spans: we compute how much time has been spent in this specific operation across all spans within the selected time window, and then the average execution time of that span. And here very quickly we can see that most of the time is actually spent in the generator calling the lower service, the lower service calling the digit service, and the generator calling the digit service. So it seems that the problem is actually in the digit service.
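Extending the same join with those aggregates, a sketch of the table panel's query (again with assumed column names) might be:

    SELECT
        p.service_name AS source,
        k.service_name AS target,
        k.span_name,
        count(*) AS calls,                  -- how often source calls target/operation
        sum(k.duration_ms) AS total_exec_ms,
        avg(k.duration_ms) AS avg_exec_ms
    FROM ps_trace.span p
    JOIN ps_trace.span k
      ON k.parent_span_id = p.span_id AND k.trace_id = p.trace_id
    WHERE p.service_name != k.service_name
      AND $__timeFilter(k.start_time)
    GROUP BY 1, 2, 3
    ORDER BY total_exec_ms DESC;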
That's the service that is very slow, and I think we already saw that when we looked at that specific trace: we saw that a lot of time was spent in the digit service. So this is just reinforcing that, and showing that most likely this is not just an individual occurrence, but something happening consistently over time and across multiple requests.
Imagine then that one of your services is unexpectedly going through a high, increasing load. Understanding where that load is coming from in a microservices environment is not easy, because you would need to check all the different upstream services that end up calling the service under pressure.
So let's select a different service here; let's go, for example, for the digit service. If we look at the digit service and the slash operation, which is the entry point operation, we see in this tree that it is being called by the generator, through an HTTP GET request to the service, but it's also called by the lower service, and we see that
there is a digit operation in the lower service that ends up calling digit which, as we already saw in the service map, is probably wrong. But the interesting thing as well is that there is quite a bit of load going to that service through this path: close to half of the load is generated via this path, and the other half of the load is generated by this path, which is the correct one.
So we see that this digit service is probably under pressure; we're doubling the amount of work it needs to do, because there's something wrong in our code in this case. And again, we could have a lot of other hops in the tree of spans or operations until we hit this service, and we could use this visualization to quickly spot where most of the requests are coming from.
The first thing to note is that doing this kind of thing, going up the chain of calls, is something that would be very tedious if you had to do it without a powerful query language, because basically we need to recursively traverse the tree of spans upwards, across all traces that involve our problematic service. Luckily, we can leverage the power of SQL again, and in this case what we use is a recursive query; we use this construct,
a WITH RECURSIVE query. The way it works is that there is an initial query that gets executed, which is this one, where the service and operation are the ones you selected from the dropdowns in Grafana. It runs this query, which retrieves some data for basically all spans that match this specific service and operation, and then it runs on the results; so x here is the set of results from this initial query.
It runs them through this other query, and basically what this is doing is a join: it takes the results from the original query, reads the parent_span_id, and then checks against the new table that we're joining, which is again the same span view, comparing to ensure that we retrieve the parents. So basically s in this case will represent the parent of x.
So we're going up one level and projecting all these different values from the parent span, and because this is recursive, it will do the same thing again: it will take the results that we just got and run this query against them again. So it will repeat the recursive step; it will check
the values that it has, inject them into x, and again look for the parents of each of the spans that were returned, retrieving the parent spans. And we do that again and again and again until there are no results returned.
So this is how the recursion works, and once it has built that table (you see this UNION ALL is just appending all those results, the results from the first query and all the subsequent queries, that is, from navigating upstream through the spans), it runs this query on those results, which uses the service name and the span name (the operation) of a span to generate an id. So we're generating one node for each service name and span name.
Something important to notice here is that we're not excluding intra-service operations, because we're actually interested in seeing them, in case the increase in calls was coming from an internal operation within the service and not generated from something outside; it could be, for example, a new deployment that we made that caused the problem. So we're not excluding them; we're actually including intra-service operations as well. And then we add the service name as a subtitle.
I don't think that's necessarily needed, but just in case, we use this so that we remove any potential duplicates and have an accurate count. And then what we do is group those results by service name and span name, so basically grouping by node.
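Putting the pieces together, a sketch of this upstream traversal and the nodes query (with $service and $operation standing in for the Grafana variables, and md5 used for node ids as described; all names here are illustrative, not the exact dashboard SQL) could look like:

    WITH RECURSIVE x AS (
        -- initial query: spans for the selected service and operation
        SELECT trace_id, span_id, parent_span_id, service_name, span_name
        FROM ps_trace.span
        WHERE service_name = '$service'
          AND span_name = '$operation'
        UNION ALL
        -- recursive step: walk up to the parent of each span found so far
        SELECT s.trace_id, s.span_id, s.parent_span_id, s.service_name, s.span_name
        FROM x
        JOIN ps_trace.span s
          ON s.span_id = x.parent_span_id
         AND s.trace_id = x.trace_id
    )
    SELECT
        md5(service_name || span_name) AS id,  -- one node per service/operation
        span_name AS title,
        service_name AS subtitle,
        count(*) AS calls
    FROM x
    GROUP BY service_name, span_name;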
So this is the query for the nodes; the edges use a very similar query. Again you see this join here, which is traversing up: from the current set of results, get the parents and project them. But it also adds a bunch of additional information, because here we're interested in the edges, so we're projecting the id for the relationship, which goes from the service name and span name of the source to the service name and span name of the child.
That is, essentially, the relationship between two nodes in the graph that we're displaying. And then we also compute the target and the source: we're doing an MD5 on the service name and span name, again to compute ids for those, and then we just project the id, the target, and the source, so the node graph panel can actually connect the dots between the services.
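For the edges, reusing the same recursive traversal, the projection might be sketched as follows (again, the exact dashboard SQL may differ):

    WITH RECURSIVE x AS (
        SELECT trace_id, span_id, parent_span_id, service_name, span_name
        FROM ps_trace.span
        WHERE service_name = '$service'
          AND span_name = '$operation'
        UNION ALL
        SELECT s.trace_id, s.span_id, s.parent_span_id, s.service_name, s.span_name
        FROM x
        JOIN ps_trace.span s
          ON s.span_id = x.parent_span_id
         AND s.trace_id = x.trace_id
    )
    -- edge projection: one row per parent/child pair of nodes
    SELECT DISTINCT
        md5(s.service_name || s.span_name || x.service_name || x.span_name) AS id,
        md5(s.service_name || s.span_name) AS source,  -- parent node id
        md5(x.service_name || x.span_name) AS target   -- child node id
    FROM x
    JOIN ps_trace.span s
      ON s.span_id = x.parent_span_id
     AND s.trace_id = x.trace_id;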
Okay, so we've seen how we can troubleshoot scenarios where we have a service that is having some issues: we can navigate up through the sequence of spans, across all the different traces, to understand how this service is being called and what the impact is of things happening upstream on the service we're looking at.
We can do something similar, but in this case using downstream spans. So let me make this bigger. Here I have selected generator and HTTP GET; let's actually select generator and the generate operation, because that is the entry point, and so this is showing an entire map of all the requests.
It's pretty much the same as the upstream dependencies dashboard; the only difference is that in this case the join is the other way around. Before, we had x.parent_span_id equals s.span_id; here it's the other way around, we're looking for x.span_id being the same as the parent span id, so we're just going downstream. Then we project the children, and then again we do the same operation here.
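In other words, only the recursive join flips; a sketch of the downstream version, under the same assumptions as before:

    WITH RECURSIVE x AS (
        SELECT trace_id, span_id, service_name, span_name
        FROM ps_trace.span
        WHERE service_name = '$service'
          AND span_name = '$operation'
        UNION ALL
        SELECT s.trace_id, s.span_id, s.service_name, s.span_name
        FROM x
        JOIN ps_trace.span s
          ON s.parent_span_id = x.span_id  -- flipped: s is the child, so we walk down
         AND s.trace_id = x.trace_id
    )
    SELECT * FROM x;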
So it's a very, very similar thing, and I will not explain it in detail, but just to show you: you can navigate upstream, but you can also navigate downstream, and this gives you a very interesting map of all the different calls that happen in the service, in this case over the last 50 minutes, for all the requests to the generator service.
Another interesting thing that I'll explain here in this dashboard is this panel, which is looking at the total execution time per operation. But it's not doing this by just blindly adding up the duration of all spans for that specific operation; it's actually looking at the time actually spent in the code of that operation, that is, it subtracts the time spent in child spans.
It's actually looking at how much time is spent within the code of that span itself, so that you can identify where the bottleneck is, because otherwise the operation at the top of the hierarchy would always show up as the slowest. But here that's not the case.
If you look at this query, the slowest one, where most of the time is spent (and we've already seen this over the course of this presentation), is the digit service's random digit method or function: 88 percent of the time is spent there. So definitely this is the first place we should go to optimize the performance of our service.
If we didn't subtract the time from child spans, the one at the top would have been the generate password operation from the generator service, because that's the top-level one, and all the time adds up into the duration of that span. So how do we do that? This is actually really important: it's this idea of looking at where specifically, in which code, the time is spent. By taking the parent span duration and subtracting the time spent in the children,
I get the actual execution time within that specific code; it's really helpful for understanding bottlenecks. And the way we do it is, again, the same thing: we use a recursive query to traverse all spans and assign the actual time spent to the different spans. And what we do is this thing that you see here. This is the key part, and what it's doing is subtracting from the parent span duration the sum of the durations of all the children.
So it's looking for the spans where this span here, the span id, is the parent, and it's subtracting all that time. And what coalesce is doing is: if this returns null, so no data, it just says it's zero, so there is no number and we don't need to subtract any time from the duration. That only happens for leaf spans, which don't have any children.
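The talk implements this with a recursive query over all spans; as a simpler illustration of the same self-time idea, a correlated-subquery sketch (assumed column names as before, not the dashboard's exact SQL) would be:

    SELECT
        p.service_name,
        p.span_name,
        sum(
            p.duration_ms
            - coalesce(                      -- 0 for leaf spans with no children
                (SELECT sum(c.duration_ms)
                 FROM ps_trace.span c
                 WHERE c.parent_span_id = p.span_id
                   AND c.trace_id = p.trace_id), 0)
        ) AS self_time_ms                    -- time spent in the span's own code
    FROM ps_trace.span p
    GROUP BY 1, 2
    ORDER BY self_time_ms DESC;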
I hope that you enjoyed it. We showed that with OpenTelemetry, Promscale, and Grafana, you can get insights you probably didn't think were possible, thanks to the power of full SQL.
I encourage all of you to download the OpenTelemetry demo today. All the software we've shown here is available on GitHub and is free to use, and if you have questions about Promscale or the demo environment, we're available in the Promscale channel in our Slack community, which you see here. I just wanted to take the time to thank you for watching this webinar, and I hope to see you in our Slack community soon.