From YouTube: OpenShift Commons Briefing #82: Distributed Tracing with Jaeger & Prometheus on Kubernetes
Description
Jaeger was inspired by Dapper and OpenZipkin and is a distributed tracing system released as open source by Uber Technologies.
It can be used for monitoring microservice-based architectures:
* Distributed context propagation
* Distributed transaction monitoring
* Root cause analysis
* Service dependency analysis
* Performance / latency optimization
In this briefing, Uber’s Yuri Shkuro and Red Hat’s Gary Brown, both core contributors to the Jaeger project, will give an introduction to using Jaeger with Prometheus on Kubernetes.
Find out more here: https://github.com/uber/jaeger/blob/master/README.md
A: During their presentation, you can ask questions in the chat, and we'll have an open, live Q&A at the end. All of this is being recorded, so don't try to scribble notes fast; there will be links at the end to the references for all of the stuff we're talking about. And with that, I'm going to let Yuri take it away and introduce himself. I'm looking forward to the discussion afterwards.
B: I'm also a member of the specification council for OpenTracing. Today what I'm going to talk about is really to demonstrate why OpenTracing, and tracing in general, is a big deal in the microservices world. I will do that with an intro into what distributed tracing is, because some people might not know exactly what it is; I will also show you a demo of an example application to really demonstrate why it's useful, and that's pretty much it. So basically, the way I tend to think about what distributed tracing is, is this.
B: It is a new way of monitoring for microservices. So we can ask: why do we need a new way? Why don't the old ways work? To answer that question, I want to show you a rendering, an artist's view of microservices versus a monolithic application. With microservices, the biggest difference, obviously, is that the pieces of the previously big application are now individual pieces that work independently of each other. So when we were monitoring a monolithic application, we would put some probe on it.
B: We took that synchronous picture and split it across many different process boundaries, and the old approach is broken. So what we really want to see here, when we monitor that system, is to be able to track a single request as it goes not only between multiple threads but across multiple process boundaries, and that's what distributed tracing really provides. It is the ability to trace a single transaction throughout your architecture, across process boundaries, threads, continuations, asynchronous calls, and all of these things.
And conceptually, the way it works is fairly straightforward. There is a concept of context propagation, where we say: if we have a microservices architecture with, say, five microservices, and the first service receives a request, we create a unique ID for that request and we stick it in a so-called context, which is like a virtual container associated with that request. That context is then propagated, by whatever means, through every single call downstream as part of processing that request.
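The context-propagation idea described above can be sketched in a few lines. This is only an illustration; real tracers such as Jaeger define their own context format and carry it in RPC headers, and the header name below is made up:

```python
# Sketch of context propagation: the first service mints a unique ID,
# stores it in a context that travels with the request, and every
# downstream call receives the same context.

HEADER = "x-request-id"  # hypothetical header name, not Jaeger's real one

def frontend(call_downstream):
    """First service: create the unique ID and propagate it downstream."""
    context = {HEADER: "req-1234"}   # the 'virtual container' for the request
    return call_downstream(context)  # propagated on every downstream call

def backend(context):
    """Downstream service: sees the same ID with no app-level plumbing."""
    return "backend saw " + context[HEADER]

print(frontend(backend))  # backend saw req-1234
```

In a real system the context crosses process boundaries via RPC metadata (for example, HTTP headers), but the shape of the flow is the same.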
B
As
far
as
pressing
that
request,
and
when
we
do
that,
it
allows
us
to
stitch
together
all
those
independent
pieces
of
execution
across
the
call
graph
and
build
a
timeline
of
that
same
request
where
we
can
see
well.
The
whole
request
took
that
much
time
and
it
sure
is
a
the
answer.
Is
a
culture
is
B
B
equals
C
and
D,
etc,
and
so
that
that
U
is
the
typical
view
that
tracing
systems
provide
based
on
the
tracking
of
the
requests
that
they
have.
B: So, but why should we care? Why is it a good idea to actually do these things? Now I'm going to jump into the demo. I will base the demo on Jaeger as an open source tracing system, and inside that repository there's an application called HotROD, which is a sample microservices application that I will be using here. So first I want to start the Jaeger backend; here I am in the Jaeger GitHub repo, and this is our main repository.
B
So
I
can
start
the
kind
of
Jaeger
back
and
a
single
with
a
single
command,
though,
and
I'll
just
give
it
a
second.
So
one
thing
it
shows
here
is
that
it
started
Jaeger
query
service
at
this
port
right,
so
that's
the
Jaeger
UI
that
we'll
be
using
later
and
then
I'm
gonna
again.
This
is
a
Sima
same
repository,
but
a
subdirectory
examples.
B
Hot
rod
So I can start this application as well, and I want to pay attention to the logs here, because it's starting a whole bunch of services: a route service, the frontend, a customer service, and a driver service. Just by looking at these logs, we can get a sense that this is apparently a microservices-based application, because it's starting a whole bunch of things; and the frontend is obviously the entry point.
B: So let's go to the UI of that application; I can make it a bit bigger, like this. Just as a quick intro, this sample application is like a mock rides-on-demand thing, where you have these customers, and you click a button, and the backend finds the car which is closest to that customer and says: OK, the car will arrive in two minutes, and it gives you the license plate number.
B: These are, likely, New York license plate numbers. It also gives a few things that will be useful later in the demo. One thing is that when I load this application, there is this client request ID, which is just a stable session ID on my page; if I reload the page, I'll get a new ID. There is also an ID for every request made from this application.
B: We can actually see what happens within the frontend service, which called three downstream services, and two of those apparently called some storage backends, like Redis and MySQL. We also see the counts of how many calls were made: just for that single web request there were apparently some 25 or 27 RPC calls within this microservices-based application. So that kind of gives us an architectural overview of the application.
B: But it doesn't tell us what the actual workflow and the data flow were, like which service was called first and how long it took. For that, we can go back to the main page of the Jaeger UI, and because the services emitted tracing data to the backend, we already have this information; for example, all the services are present and known to the UI. So if we search for a trace, we see that this is the one trace that was emitted by the system, and it says that there are some 50 spans in it.
B: Jaeger measured the duration from the backend's point of view; the UI saw some network delay between the UI and the backend, which is responsible for the difference. So when I go to that trace, I now see the picture that I showed on a slide before. It's a timeline view of the trace, which means that this axis is time, and every horizontal bar represents a unit of work performed by a certain service.
B: In particular, we can see from the top that the very first request was to the frontend service, to an endpoint called dispatch. Then, if you go down the parent-child relationship, we can see that this frontend service called the customer service at a customer endpoint, and the customer service did some MySQL operation; then the frontend called the driver service, and the driver service made a whole bunch of other calls, apparently to Redis.
B: First FindDriverIDs, and then a whole bunch of GetDriver requests, apparently to retrieve driver information. Some of them, we can see, failed; they're marked by the exclamation point, so they took longer, but most of them succeeded. And finally, in the end, after the driver calls, the frontend did a whole bunch of requests to the route service. So again, we don't really know the business logic yet,
B: but at least we see the data flow of this application. Then, once all these route requests were executed, in the end the frontend produced the result, which the UI later displayed. So this is a very simple walkthrough of the workflow of the application: just by looking at a single trace, it gives us a lot of context about what happened among the handful of microservices in this application. Now, a bit more detail about this trace.
B: Distributed tracing allows you not only to see that information, but also to drill down into the individual pieces, into every span; and again, a span is just a unit of work within the application which is instrumented with a particular kind of annotation. So we can, for example, expand that MySQL span, and you can see what's in there.
B: We can see the actual SQL statement that was executed; we also see the request ID from the UI (remember that request ID, this guy); and we also see some logs associated with that span. This information helps you if there's an error. In particular, let's look at the error case, where we see a Redis call failed.
B: If we drill down into that, we can see right in the logs that apparently it was a timeout on Redis, which is what caused that request to fail; and then the backend, sorry, the driver service, retried it with another request. So this is, again, just a quick walkthrough of the capabilities of the tracing system; this is very common functionality.
B: However, we still don't quite know the actual business logic within this application. For example, why did the frontend call the customer service? To understand that, we can actually turn again to logging and try to understand the behavior of the application based on the logs. But before we do that in a trace, let's take a look at the logs here, in the terminal, so I'll scroll; and this is what one single request looks like.
B: I find this very difficult to actually follow, with the exception traces and everything; and remember that we only did one single request so far. If this was a real production service and it was serving many requests per second, these logs would be a complete mess: everything would be interleaved, and there would be no way to tell what is actually happening, what the logic of the application is. So instead of looking at the logs in the terminal, we can look at the logs in the tracing system.
B: Specifically, if we look at the frontend service, the very top span, we can see that it has 17 logs, and if we expand that, we see all those requests that we saw in the standard output, kind of the same logs, but they are now very contextualized: I only see the logs from this particular span. Other spans, like the MySQL one, had their own logs; the Redis calls had their own logs. In the log output,
B
They
would
be
all
mixed
up
here,
I'm
only
seeing
what's
relevant
to
the
Spence.
That's
what
we
call
contextualized
Logan
that
tracing
provides
its
kind
of
allows
you
to
narrow
down
the
behavior
of
a
particular
execution
very
closely
and
by
looking
at
the
lot,
we
can
now
actually
understand
the
actual
actual
business
logic
that
the
application
is
doing
so
once
it
received
the
request,
it
says:
I'm
gonna
load
the
customer
information
by
customer
ID,
which
was
sent
by
the
UI.
Then
I'm
gonna
find
the
nearest
drivers
to
that
custom.
B: So again, the main point here is that the logs are contextualized to every individual span, and they're not mixed up with anything else. Also notice that the UI shows both logs and tags; this is a standard feature in OpenTracing. Tags are really the things that you want to assign to the whole span, kind of a description of the span. For example, a tag says I'm calling the MySQL service, and this span's kind is that I'm a client of the MySQL server side; whereas logs are really things with a timestamp.
B: So if you emit something at a point in time, then it's a log; otherwise, if it's a descriptor of the whole span, an attribute of the span, then it's a tag. This is the standard terminology in OpenTracing.
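The tag/log distinction can be sketched with a toy span object. This is not the real OpenTracing API, just an illustration of the semantics (the tag names `span.kind` and `peer.service` are standard OpenTracing tags):

```python
import time

# Toy span illustrating the OpenTracing distinction: tags describe the
# whole span; logs are timestamped events that happen within it.
class Span:
    def __init__(self, operation):
        self.operation = operation
        self.tags = {}    # attributes of the span as a whole
        self.logs = []    # (timestamp, fields) point-in-time events

    def set_tag(self, key, value):
        self.tags[key] = value

    def log(self, **fields):
        self.logs.append((time.time(), fields))

span = Span("SQL SELECT")
span.set_tag("span.kind", "client")      # true for the span's whole duration
span.set_tag("peer.service", "mysql")
span.log(event="query sent")             # happened at one instant
span.log(event="rows received", rows=42)
print(len(span.logs), span.tags["span.kind"])  # 2 client
```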
And finally, the last, though not the least important, thing about tracing is that we can see the overall latency of this request and what was on the critical path; this request took basically 750 milliseconds.
B: You can see that the MySQL query took over 300 milliseconds, so that's something for you to look into. The next thing you can see is that the loading of the drivers took another 200 milliseconds, and by looking at this staircase pattern in the trace, we can understand that all these drivers were requested from Redis sequentially.
So there's another potential optimization for this application: maybe we could just call them all in parallel and reduce that to just a few milliseconds instead of 200.
B: And finally, the requests to the route service; this is interesting, because we see that they are actually concurrent. We see a whole bunch of concurrent requests, but they're not all concurrent: in fact, there are at most three concurrent requests going to the route service, and as soon as one of them stopped, another one started.
B
This
stopped
another
one
started,
so
it
looks
like
there
is
some
executor
pool,
which
is
bounded
by
the
three
threads
and
that's
like
so.
The
parallelism
of
this
of
this
whole
segment
of
the
trace
is
limited
by
three
and
so
again,
potentially
another
optimization
point
to
improve
the
application
latency.
B: So now let's see how this application actually performs if we start doing a lot more requests, if I start clicking many times here. We can see that the latency is starting to climb: essentially, the more requests, the longer it takes. And notice that the request ID keeps incrementing, as I mentioned before.
So how can we use tracing to investigate this? I'm going to pick this driver ID, the license plate ID, and then try to search for a trace with this ID; tracing allows you to do that. I think the tag is driver ID.
B: Let me check the syntax; OK, looking at this span, it says driver equals the license plate, so I can search by the tag driver with that value. So now I get this trace, and we see it's the one that was actually very long, almost 2 seconds; this one is saying 1.82, close enough. When we look at this trace, immediately we see: oh, MySQL is taking an enormous amount of time here, 1.4 seconds, so clearly there is something wrong with this application.
B: There is some bottleneck. Let's actually use the logging feature of the tracing: if we jump into the logs, we can see, oh, this request was actually blocked behind four other transactions, and it was waiting almost a second until it acquired the lock and was allowed to proceed to query MySQL. What that means in practice: this is obviously a mock application, but what it simulates is a real environment where you only have one connection to the database instead of using a connection pool.
B: You could go and look for those other requests and see which one was actually the longest and caused all this queueing; the UI allows you to do that. But what's interesting is that if we look at the customer service, there is this HTTP request that was executed, and it says nothing about a request ID; it only says, give me the customer information. So the request ID came all the way from the frontend, from the JavaScript UI, but it wasn't passed as a request parameter to this service.
B: So how did this guy know about all these transaction IDs? The answer is that it's another feature of the OpenTracing API called baggage. Remember we talked about context propagation: tracing uses context propagation to pass around the trace ID, but context propagation itself is a more general concept.
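Baggage can be sketched as arbitrary key/value pairs riding on the same propagated context. This is an illustration of the idea, not the real OpenTracing API:

```python
# Sketch of baggage: the same context that carries the trace ID can
# carry arbitrary key/value pairs to every downstream service, without
# changing any request signatures along the way.
def frontend(call_downstream):
    context = {"trace_id": "abc123", "baggage": {"session": "sess-42"}}
    return call_downstream(context)   # session is never an explicit parameter

def customer_service(context):
    # The HTTP request itself says nothing about the session ID, yet
    # the service can read it from the propagated context.
    return context["baggage"]["session"]

print(frontend(customer_service))  # sess-42
```

The key property is that only the top of the call graph sets the value; every service below reads it for free.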
B: So the work is spread out and not blocked on this one transaction; and the actual transaction to the database is simulated by this sleep statement, which has a certain delay. Just for demonstration purposes, I want to go and reduce that delay, to make it a bit shorter, and see how this small change really affects the behavior of the application.
B: OK, so we started it again and reloaded the page; note my session ID changed now. And so again I do a whole bunch of requests. What we see now is that the latency is still climbing above the first one, but it's not as dramatic as it used to be; it doesn't go to two seconds. So if I take the longest trace again and try to search for it...
B: The call to the driver service is still the same 200 milliseconds, because I really haven't optimized that part, but notice how this segment changed: remember, we used to see three at a time, but instead we now sometimes see one, sometimes even less than one request being executed. So my whole request is being blocked, as we can see in the minimap.
B: I swear my laptop is usually faster; it's the video that's slowing it down. OK, so we got this application restarted and reloaded again, and now, because I optimized a whole bunch of stuff in this pool, I have to click really, really fast to actually get any sort of latency. You see, I request a ton, but they return immediately, and they're all way shorter than before. So let me take the longest one, just to see what is actually happening.
B: OK, notice that there are a lot more errors; I don't know why, that is interesting. The errors are actually random, so it's kind of surprising why there are more of them. But we can actually see the impact of that last change: we have ten drivers being requested, or ten routes being requested from the route service, and now we can see that they all execute in parallel, because we have essentially removed the contention on the resource pool, on the thread pool.
B: So again, I hope this is a demonstration of the tracing functionality, and of how tracing can help you quickly narrow down what the problems are in individual components of your architecture, in individual services, and how you can try to optimize them by looking at the relationships between the calls and the critical path.
B: It's an exercise, if you want to do it yourself. OK, so the final thing that I want to show here: I mentioned baggage, and I want to show another use case for it. This application actually emits a whole bunch of metrics, so if I go to this other port that the application exposes, we can see a whole bunch of metrics emitted. Some of them are, I think, before I search for... oh.
B: I actually don't have metrics from the tracer itself, so it's probably not configured; normally the tracer itself emits metrics about how many spans it starts or stops. Instead, what's configured here is the RPC metrics. We can see that all the services and all their endpoints are actually being measured by Jaeger and emitted as metrics; tracing in general does heavy sampling of the requests.
B: But what I really wanted to show here is this part. Notice that this is a metric which says how much time the route service calculation, the route service, spent, in seconds, on behalf of an individual customer, or on behalf of an individual web session; and remember that my web session ID is this one. So, well, it's kind of nice.
B: The route service really cares only about where we start and where we drop off; it just takes two coordinates, that's all it needs. And yet it is able to produce these metrics broken down by customer and by session ID, which are identifiers that are only available at the very top of the application. Essentially, the frontend service knows them, but it doesn't pass them explicitly to the route service.
B: And finally, one other thing that I want to go over in this presentation: I hope that you like this functionality and you think tracing is great. So how difficult is it actually to instrument an application to get all this data? The answer is, it's actually not that hard, and in fact, if we look at the source code for this application, there is surprisingly very little explicit instrumentation for tracing.
B: The reason for that is that the OpenTracing API is an open source API that any framework can use to instrument itself, in particular any RPC framework. And as a result, if we look at the source code for any of the services, say the frontend service,
we see that there is just one mention of OpenTracing for instrumentation, which really just creates a wrapper around the server; and once that's done, all the requests through it are automatically traced, and you don't need to do anything special. Similarly, there is another service here, I forget which one; I think it's the route service. Actually, no, maybe it's the driver service, so, the driver server.
B: Yes, the driver server is not based on HTTP; it uses TChannel, which is another open source RPC framework, and that framework is itself instrumented with OpenTracing. So what we can see in the code is that when I'm creating this new channel, the only thing I'm passing it is the tracer, and that's it; there's no more instrumentation anywhere in this service to actually enable tracing.
B: In fact, if we look at the handler, the function which is being called by the server, there is no mention of OpenTracing anywhere; it just gets a context object, which is the common way for tracing to propagate data inside the application, and the tracing happens behind the scenes automatically. Again, because OpenTracing is an open API that anyone can use, if you are writing your own RPC framework, or you're writing, I don't know, a Redis driver in a particular language,
you can write OpenTracing instrumentation either into your driver directly, or provide a wrapper, which is what happens with HTTP: there are, in the opentracing-contrib space, wrappers for the standard libraries which allow you to wrap HTTP clients and servers and not really worry about tracing. However, if you do want to trace explicitly, obviously OpenTracing allows you to do that, and there are examples in this application, like Redis, for example. This is not a real Redis.
B: This is a simulation of Redis, and so, to simulate that we're making some sort of RPC request, there is explicit OpenTracing instrumentation: we say, OK, start a new span here representing a call to Redis, and we're saying that this is an RPC-client kind of span, the tag that we've seen in the tracing example. And this is really the only place in this code where OpenTracing instrumentation is done explicitly, simply because there is no real Redis server.
B: The other thing to look at is the logging. Zap is a logging framework which allows structured logging: rather than formatting a string with a formatter, you provide key-value pairs explicitly, and it's a lot more efficient in Go; there are no memory allocations and so on. However, the really important difference here from normal logging is this part.
B: So, instead of just calling logger.Info: if we did that, then we wouldn't be able to associate logs with the actual context, because they would just go to standard out. This is just a little trick in this application, where the logger isn't really the normal logger; it's a wrapper around the logger which lets you do either: you can get a background logger, which doesn't require a context and can log your standard application lifecycle messages, or you can have something that is request-specific, as in this case.
B: It's obviously scoped to this particular request, find nearest car, so we get a different type of logger for that context. And as soon as we do that, there is some magic, which you can look at in the source code to see how it actually works, where the same log is written both to standard out and into your tracing span, and that's why I was able to show it in the UI.
B: When a log is associated with a span, you get contextualized logging, versus just a standard cloud of messages. So let me check... oh yes, that's the end; just the very final point: OpenTracing doesn't bind you to any particular tracing implementation. Here we used Jaeger, but if we look at how tracing is actually initialized, this is the only single place in this whole application which is specific to Jaeger: it imports and configures the tracer from the Jaeger client, this one, I guess, yeah.
B: We can see the Jaeger client; that's the only place where anything is actually specific to Jaeger. We instantiate the Jaeger tracer, and from that point on, the rest of the application is not aware that it has anything to do with Jaeger. If you want to swap it for Zipkin, or for LightStep, or for any other OpenTracing-compliant tracer, this is the place to do it, and it will work just as well.
B: Your UI will be different, obviously, but the actual instrumentation doesn't need to change. So that, I think, is the end of my demo; let me see... yes. So, as a recap, what we've done: I've shown that the instrumentation itself is pretty much off the shelf, and I didn't have to change a lot of stuff in my application; I can swap in another tracer, so there is vendor neutrality to the whole OpenTracing API; and tracing allows you to monitor transactions across multiple microservices, process boundaries, and different threads as well.
B: We can do things like latency measurement and latency optimization, finding critical paths, and analyzing the root cause of errors or delays in the execution. We can get very highly contextualized logging with tracing. We talked about baggage propagation and how it's a very powerful technique; in fact, at Uber we have a number of projects which are built strictly on top of baggage propagation.
B: They really don't even have anything to do with tracing, but they rely on the Jaeger instrumentation because they need baggage propagation. And I showed the RPC metrics quickly, but that's something that Gary will talk about more in the next session. Just a few words about Jaeger: Jaeger is a distributed tracing system; we open sourced it in April this year. It's OpenTracing inside, so it was built with OpenTracing from the beginning, and it can be used as a drop-in replacement for Zipkin.
C: I'll try to get through this demo really quickly, and thanks for the demo, Yuri. What I'm going to do is just show how we can use an open tracing system like Jaeger, but also capture application metrics, integrate with something like Prometheus, and have that all running on OpenShift. This example also runs on Kubernetes, and there's a GitHub repository located here where you can find the example and the instructions for running on both.
C: There's also a blog on the Red Hat developer program that explains how to run this on Kubernetes. There's a GitHub organization called jaegertracing where you can find the templates for deploying Jaeger onto Kubernetes and OpenShift, and, as I mentioned, the Java metrics component that decorates the tracer can be found in this organization here, in this repo.
OK. So what I've done is I've already deployed the example: there's the account manager and the order manager, and I'm using the Prometheus Operator, which is an extension project for Prometheus.
C: It is able to identify services that have been deployed, and, if there are multiple instances of the services, to update the Prometheus configuration to scrape the metrics from those services; and, of course, we've got Jaeger deployed as well. Before viewing the demo, I'll just quickly go through the application. As I said, this is a Spring Boot application; the main application itself, as you can see, has no tracing-specific code added here, for the account manager, and the same goes for the controller, the REST endpoint itself.
C: The metrics are being reported using a servlet which is exposed at this endpoint here, and the tracing configuration is basically using a component of the OpenTracing project called the TracerResolver. In this case, what we're doing is obtaining the tracer based on configuration information; so, in the same way that Yuri pointed out,
you just change the code in one place; and if you're using the TracerResolver, and the tracing implementation supports it, then it can be done without any code change at all. But in this case, we're decorating the tracer before it gets returned, using this component here, with a Prometheus metrics reporter. With Prometheus, the metrics are reported with a set of labels.
C: So, in the standard way, what we're doing is using labels to represent things like the service name, the operation, and various other fields that can then be used to categorize the metrics; but through this mechanism you can also customize and add your own labels as well. In this case, what I'm doing is adding a label from a baggage item.
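For illustration, metrics of the kind being described would appear in Prometheus's exposition format roughly like this; the metric name, label names, and values here are illustrative (matching the labels mentioned in the demo), not the component's exact output:

```
span_count{service="order-manager",operation="sell",span_kind="server",transaction="sale"} 7
span_count{service="account-manager",operation="getAccount",span_kind="client",transaction="sale"} 7
```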
C: So this is using the mechanism that we talked about, where application-specific information can be propagated with the tracing context through the chain of services being invoked. What this one is doing is adding a transaction label; this could be a business transaction, and the second parameter is just a default value.
C: So in this case, if a baggage item with that name hasn't been provided, we just use this value. The other thing, in terms of the tracer configuration, is that we need to tell it to ignore the REST endpoint /metrics, which is used to scrape the Prometheus metrics. As for the order manager, this one is slightly different: the application itself again has no tracing-specific code, but the controller does inject the tracer, and this is purely to be able to set the baggage item.
C: OK, so this is the UI for this particular application. You can see there are some transactions: we've got the order manager, which has the buy and sell endpoints, and that's invoking the account manager; so that's a simple invocation. And let's see, this one's showing an example of an error.
So if I look at the account manager, and if I have a look at the logs, you can see that a failed-to-find-account error has been reported. But because Yuri has done an in-depth demo of Jaeger, what I'm going to do is focus more on Prometheus.
C: This is using the Prometheus user interface, and I've set up some queries already. This first one is focusing on a metric called span count; that's just the number of spans that have been created at a particular point in the business process. If we have a look down here, you can see that there's a metric that's created for the operation sell, in the service order-manager, with span kind server.
C
So that's the server endpoint for that operation and that service. There are a number of labels that we're ignoring to simplify the information, so that, for example, you could view information based on pod, instance, job, namespace, the transaction label that we added ourselves, and also errors. But at the moment we're just aggregating, ignoring those particular fields.
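A query of the kind being described, assuming a metric named `span_count` with `service`, `operation`, `span_kind`, `transaction`, and `pod` labels as in the demo (the exact names are an assumption; check the metric names your tracer decorator actually emits), might look roughly like this in the Prometheus expression browser:

```promql
# Spans per second for the "sell" operation of the order manager, server side,
# summing away the labels we want to ignore (pod, instance, job, ...):
sum(rate(span_count{service="ordermanager", operation="sell",
                    span_kind="server"}[1m]))

# The same metric cut by the business-transaction baggage label instead,
# which slices across all services participating in that transaction:
sum(rate(span_count{transaction="sell"}[1m])) by (service, operation)
```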
C
So say we were interested in the "sell" transaction: what this does is cut through all of the services and focus only on the metrics being reported for that transaction type. So, for example, if you wanted to find out what the bottlenecks were within a particular business transaction, this would be a good way to focus in on that. And similarly, if you're interested in what's executing in a particular part of your infrastructure,
C
you can focus on the pods. Because the pod label also comes with a service name, that's quite useful, as you can see what services are running on that particular pod. But again, it helps you to locate whether there are particular problems in your infrastructure. And then, finally, I've got a graph that's basically looking at the error ratio for the different services.
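An error-ratio graph like the one described could be expressed along these lines, again assuming a `span_count` metric with `service` and `error` labels (hypothetical names, matching the demo's conventions):

```promql
# Fraction of spans marked as errors, per service, over the last five minutes:
sum(rate(span_count{error="true"}[5m])) by (service)
  / sum(rate(span_count[5m])) by (service)
```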
C
Okay, so that's just a quick demo. Just to recap: this demo is primarily to demonstrate the integration of the OpenTracing technologies with something like Prometheus for capturing application metrics, but within the context of a Kubernetes or OpenShift environment, where you can also implicitly capture information about where those services are running.
A
And do show these last slides; we have the resources slides up there. And then I just want to say: really, thank you for this. This has been wonderful, to see the interplay of all these different open-source projects and how they all interrelate, and there are a lot of them in here. This has been a very good way to showcase lots of different things.
B
We probably lost people, I tend to go pretty fast, so I just want to mention a few links here. For the OpenTracing project itself, it's opentracing.io, and then there is a Gitter chat room: if you have questions or want to discuss things, this is the link. And then this is the link for Jaeger, for the main repository.
B
We also have a chat room for questions and so on, and the demos that we've given actually have blog posts that essentially describe what's happening. In particular, HotROD has a very detailed walkthrough blog post that talks about much the same thing that I talked about, but with more examples and at a slower pace, obviously. And Gary's blog post that he showed is also here. So if people want to check them out later, and actually go to the repositories and look at the code, these are the links. All right.
A
B
Yeah, I can answer that; definitely an expert asking that. So sampling is trace-based: once the trace is sampled, it is essentially sampled throughout the whole architecture. And it is head-based, so the sampling decision is made at the very beginning, when the trace ID is generated the first time. That's the only way for us to actually ensure consistent sampling across all microservices. Having said that, we actually have various works in progress that are trying to add other ways of sampling things.
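The head-based, trace-scoped decision just described can be sketched as follows. This is an illustrative model, not Jaeger's actual implementation: the decision is made exactly once, when the trace ID is first generated, and the resulting flag is propagated so every downstream service agrees with it.

```python
import random

# Hypothetical sketch of head-based sampling: decide once at the root,
# propagate the flag, never re-decide downstream.

def start_trace(probability):
    trace_id = random.getrandbits(64)
    sampled = random.random() < probability  # decided exactly once, at the head
    return {"trace_id": trace_id, "sampled": sampled}

def join_trace(incoming):
    # Downstream services honour the propagated flag rather than rolling
    # their own dice, which is what keeps sampling consistent per trace.
    return {"trace_id": incoming["trace_id"], "sampled": incoming["sampled"]}

root = start_trace(probability=0.001)
child = join_trace(root)
assert child["sampled"] == root["sampled"]  # consistent across the call graph
```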
A
B
So this is a very interesting and very detailed question, if we really want to go into it. The short answer is yes and no, because the actual performance impact cannot be measured in isolation, just based on the tracing itself. It really has to be measured within a particular service, with a particular traffic pattern, because it's highly dependent on those. At Uber, at least, we usually run tracing with a fairly low sampling rate, because we have very high-volume, very high traffic, and so, because of the very low sampling rate,
B
our performance impact from tracing is completely negligible; there's nothing to speak of. But if you crank the sampling rate up much higher, then you will definitely start seeing some performance impact. However, the reason that question is actually very difficult to answer is that the performance impact is itself very hard to measure, because it's not just how much CPU time or CPU load you add to the service; there are all kinds of other implications, like how much memory pressure you create.
B
How much throughput is affected: span collection happens in the critical path of the application, of the requests themselves, but trace reporting happens in the background, and that background work is somewhat expensive if you sample a lot of data. So that starts affecting your application throughput and latency, and that's why you really have to try it out. I mean, with a low sampling rate, you're not going to have any performance impact.
B
A
B
So there are two parts here. Adaptive sampling, first of all, solves the problem of having very low-throughput endpoints, which would be affected if you have very low sampling rates in a tracer: some of your endpoints may be sampled, and some others may never be sampled, because they're just low QPS. So adaptive sampling takes care of that and guarantees a certain throughput of traces for every endpoint. And the second feature of adaptive sampling is... yeah.
B
One way is you can do that programmatically: OpenTracing has a standard tag called sampling.priority. If you set it on a span with a non-zero value, then it will be interpreted as a signal that you want to turn that trace into a debug trace, and it's going to be guaranteed sampled across the stack, and it also bypasses any down-sampling that may be happening at the collection layer. So that's one way. Or, if you don't want to do it programmatically, the Jaeger clients also support it.
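The sampling.priority escape hatch can be sketched like this. The tag name is the standard OpenTracing one mentioned above, but the span object here is a simplified stand-in, not the real client API:

```python
# Hypothetical sketch of the sampling.priority override.

class Span:
    def __init__(self, sampled=False):
        self.sampled = sampled
        self.tags = {}

    def set_tag(self, key, value):
        self.tags[key] = value
        # A non-zero sampling.priority forces the trace to be kept (a "debug"
        # trace), overriding the probabilistic decision made at the head.
        if key == "sampling.priority" and value > 0:
            self.sampled = True

span = Span(sampled=False)            # the head-based sampler said "drop"
span.set_tag("sampling.priority", 1)  # ...but we insist this trace is kept
print(span.sampled)                   # -> True
```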
B
I'm not sure what that means, but yeah. Basically, adaptive sampling works at the central collection tier, and it measures all the traffic that's coming from a particular endpoint of a particular service, and it has a target. If we say we want a hundred traces per second started by this endpoint, and we see a thousand, then we're going to reduce the sampling probability by ten times. So that's how it works. So I guess it's kind of like circuit breaking.
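The feedback loop just described can be sketched as a simple rate-proportional adjustment. This is an illustration of the idea, not Jaeger's actual adaptive-sampling code; the clamping bounds are an assumption:

```python
# Hypothetical sketch: scale the sampling probability by the ratio of the
# target trace rate to the observed trace rate, clamped to a sane range.
# With the talk's numbers (target 100/s, observed 1000/s) the probability
# is divided by ten.

def adapt_probability(current_prob, observed_rate, target_rate,
                      min_prob=1e-6, max_prob=1.0):
    if observed_rate == 0:
        return max_prob  # nothing coming through: open the sampler fully
    scaled = current_prob * (target_rate / observed_rate)
    return max(min_prob, min(max_prob, scaled))

print(round(adapt_probability(0.1, observed_rate=1000, target_rate=100), 6))
# -> 0.01
```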
A
C
I think there's potential for the OpenTracing model to support it, because it can handle multiple parent references, whereas the Dapper model is a single-parent approach. But I think more work may be required in the standard, maybe to define additional references with reference types. Yes.
B
Yeah, exactly. I mean, the references mechanism in OpenTracing does allow you to have multiple parents, but there hasn't been a lot of work put into that specifically; there's no reference type defined for that use case currently. But there are open issues, so if you want to provide an opinion, there's definitely an issue about that. And in another, similar situation, one of the related issues, when you want to link two different traces, you can also use the reference mechanism to link them.
A
So that is really all we have time for, and I really appreciate Yuri and Gary taking the time out today to do this. If you guys are interested in Jaeger or OpenTracing, as Yuri just mentioned, there are a lot of issues on the uber/jaeger GitHub repo that you can weigh in on and give feedback on.