From YouTube: CNCF Webinar Series - Introducing Jaeger 1.0
Description
This webinar will demonstrate how Jaeger can be used to solve a variety of observability problems, including distributed transaction monitoring, root cause analysis, performance optimization, service dependency analysis, and distributed context propagation. We will discuss the features released in Jaeger 1.0, its architecture, deployment options, integrations with other CNCF projects, and the roadmap.
Join us for KubeCon + CloudNativeCon in Barcelona May 20 - 23, Shanghai June 24 - 26, and San Diego November 18 - 21! Learn more at https://kubecon.io. The conference features presentations from developers and end users of Kubernetes, Prometheus, Envoy and all of the other CNCF-hosted projects.
Right, thank you. So we will be introducing Jaeger, the distributed tracing system and a member of CNCF. Here is the agenda: I will briefly talk about what distributed tracing is; I will show you a demo of Jaeger with a sample application; we'll look under the hood in terms of architecture and some of the technical details of the system; I will talk about what is in the 1.0 release, what features, etc.; I'll briefly talk about the future work that we plan, and about the project governance and setup; and we'll also have time for Q&A. If your questions are not answered during the presentation, we'll have plenty of time at the end to discuss, so don't worry about it.
Just a brief intro about myself: I am an engineer at Uber and, as Mike mentioned, I am working on the observability team.
So it's very typical for us to get a trace which contains maybe, I don't know, 200 service calls within a single transaction, and that happens billions of times a day, because every second your app is telling something to the backend — where the car is going, what the road conditions are, etc. So, as the engineers, if we want to keep this complex system operational and monitor it, how do we do that, right?
However, these tools were designed in the days when distributed systems, and especially systems built on microservices, weren't that widespread, and so these systems have a problem with actually dealing with the complexity introduced by microservices. This is an example that I recently ran into: I was running some Go program and it crashed with this one line — there's no stack trace or anything, which is actually very unusual for Go, to crash in this way.
These tools tell us something about one individual instance in this whole graph, but they don't tell us the context of the event that we observe, right. And so, rather than monitoring every instance of every service separately, which is what logs and metrics tooling does today — that's like debugging your whole application without a single stack trace anywhere — what we really want to monitor are the distributed transactions that transpire within that system and involve multiple services in one request. And that's what distributed tracing systems do.
So yeah, basically, as the request traverses our architecture and multiple services, we keep passing the context around, which contains the unique ID, or trace ID, and we keep building the time sequence diagram on the right side. That diagram also captures things like the causality of individual requests — specifically, that the call to the E service came from A and not from something else, and that it came after the B, C, and D service requests had already finished. So we can build this understanding of what happens within the request.
So let me make this full screen. Can you see the monitor with the terminal? Yes. So I'm here in the Jaeger repository, and I'm going to run two applications. One is the standalone version of Jaeger, which includes all of the backend components in one binary, just so it's easy to run; it even also includes the UI. So when I run it, we will see that it is starting a number of services. For example, it says it's starting jaeger-collector; it's also starting jaeger-query; there's also an agent somewhere. We will talk about all of these when I get to the architecture.
Anyway, this is running. So if I go back to the website, the Jaeger UI — I can load this UI from that process. So that's accomplished. And the other application, which is also included in the Jaeger repository under examples/hotrod, is a demo application that illustrates the features of OpenTracing and of Jaeger as well. So when I start that one, one thing I want to point out is that it also contains multiple services.
Even though it's one binary, you could actually start these services independently if you wanted — I'm passing "all" here, but they could be started one at a time — and so there are a number of services within this app. They all talk to each other using remote calls over the network. The application also has a front end, and it looks like this; so now, just a quick intro to this front end.
So what we have here is basically a ride-on-demand mock application: you click a button, and the backend says "I'm dispatching a car"; it finds the closest car, with the license plate; it says when it's arriving; and it gives us some debugging information, like the unique request ID and how long this request took on the backend. I will come back to this web session ID later — there are interesting uses for that.
So the first thing I want to ask myself when I'm looking at the application is: what is the architecture of that application, right? And since we're talking about tracing as a monitoring tool, I want to use the tools, rather than going in and talking to whoever created that application, to understand it. And so, by executing one single request...
This standalone Jaeger keeps the data in in-memory storage rather than any persistent store. That's why, when it restarted, the data was lost — and that's what I wanted, actually. So, again, executing one single request and then going to this diagram — and now we get this; it was just laid out incorrectly the previous time. So, by observing the behavior of that application through its traces, we already got an idea of its architecture.
You can see that there is a front-end service, there's the route service, customer, driver, and the customer service talks to the MySQL database, right. So we haven't needed to go into the details of the application to understand any of this. However, it still doesn't tell us how the application itself works, what the logic within it is. We could look at the logs of this application — there are lots of logs, as usual — and this is the typical problem with logs: they seem like a good idea...
So here I have a few traces already captured by the system. Some of them are just from the front-end, I think; if we look at this one — oh, this is just loading the front page of the service. But here is the interesting one: this one is actually the request to the application itself that produced the ride for us, and that's what we're interested in. And so, when we look at that, first of all we see this.
This is the sequence diagram that I was showing in the slide before, and it shows us what's going on within this microservice-based application. We can see, for example, that the front-end made a call to customer, which made a call to MySQL; then the front-end made a call to the driver service; then it made a bunch of calls to the route service.
So already we are getting some idea of the actual business logic. But in this application we can also drill down into the logs, which are captured within the trace. These logs are slightly different from what we see in the standard output, because they are attached to individual spans within the trace, right. So this top-level span is really the request that came to the top application, to the dispatch endpoint, whereas the other ones represent individual operations done by the services — and so, for example, this one doesn't have any logs.
If I go to the MySQL query, then I do have logs — like acquiring some lock, with a transaction. I'm also seeing, in the tags of the span — I can navigate and say: okay, this is the SQL query that was executed by the operation, captured automatically by the tracing instrumentation. So what does this give us, and why is it different from logs? And there is one big difference.
The difference is that, even though this tool can also show you logs, these logs are contextualized to the individual operations in the service. So you are no longer looking at a bunch of aggregated log statements across all your services in no particular order; you are looking at the very specific sequence of events, which is much easier to understand. It tells us exactly what happened within this step of the transaction processing, and if we follow it, we will see the exact same thing that I described.
What I described was: getting a customer, finding the nearest drivers, and then finding the shortest route to the nearest driver. So that's the kind of understanding of what the application is doing that a distributed trace gives us. But really, as a monitoring tool, we also want to see: well, what are the problems within my application? Let me do some other requests.
The main timeline here shows each service, represented by a span for an operation in that service, and it gives the latency of that individual operation; and the hierarchy on the left provides the causality — which operation caused which other operations. So we can see, for example, that the call from the front-end to the driver service was a find-nearest operation, but that operation itself consisted of more calls.
First it finds all driver IDs within a radius, apparently, and then makes a bunch of calls to get each driver from some implementation on Redis. And so we can also see the performance profile of this request as it happened in the architecture, right. We can see, for example, that the SQL query took almost thirty, forty percent of the request. So if you were to optimize this thing — let's say our user-visible latency suddenly went up.
This is something that is very easy to see, even in the trace: well, this is definitely a problem; we need to dive into it and understand why this SQL statement takes so long. Another problem we can see — again, without really diving into a lot of details, just looking at the time sequence diagram — is that the call to find the nearest drivers first finds all the driver IDs, and then, for each driver, we go and query the individual driver ID, apparently getting some information with the location of the driver.
We see some spans marked in red: those are basically Redis timeouts, and we can see that information in the logs. So again, they also contribute to the latency — disproportionately, actually, compared to a successful request. And the last, final part of the Timeline view that I want to point out: let's look at the last segment of the trace, where the front-end calls the route service. Remember, the business logic here is that we got all the drivers and their locations.
Then, for every location, we ask: from that location to my customer, what's the shortest route? And once we get all ten routes, we pick the shortest one and say: okay, this is the driver we want to send to the customer. But the behavior we can see here is that there are these three parallel requests going on, apparently, to the route service — which is the good news, I mean, there is some parallelism.
What I was trying to show is that, with these individual requests — if you have many concurrent requests to the system, then you're not even going to get three at a time executed; sometimes nothing is executed at all, because whatever executor pool the system is using is limited to three, and that's what's throttling this whole thing. But actually, since we have time, I can show you a way to hack into this application and fix some of the performance issues.
Take the database, right: the database implementation just mocks the MySQL statement, and this is what we see here. We can see that there is actually a lock taken, just to simulate a misconfigured connection pool, right. So if we take out this lock, we will unblock the parallelism in this particular step. So I go back, I restart the application, and I execute a bunch of requests concurrently — start again, so let's do that.
So it still takes a while in some cases — let's look at a longer one — but now we see that the MySQL statement no longer blocks the overall request, even though the overall latency still increases the more requests are going on. And in fact, the reason this latency increases is, again, this part, plus this part: as I mentioned, the sequential execution of the Redis calls really needs to be fixed by using some sort of a thread pool, or a bounded number of concurrent requests.
There is another thing I can fix: the MySQL delay is actually simulated, so I can go and change that delay from 300 milliseconds, if we want to make it even faster — we pretend that we just fixed the performance issue with the MySQL storage in some way, right. And so, if we again go and execute a bunch of requests...
One trace is kind of the same as before, but this is what I was mentioning earlier: remember, we used to have a concurrency here of three at a time; now we don't even get one at a time, because the multiple requests all contend for the same thread pool, which is depleted. And so, again, the point of this exercise is that I'm not doing any deep analysis of the application — no profiling, no looking at the function calls — I'm looking at just the timeline.
Yuri, is it a good time for a question? Yeah, it could be the time. So Alex AG asks — he wants you to break the app, basically. I asked him to be more specific, and he said he'd like to see if Jaeger can catch exceptions: for example, if the database server is not reachable, he'd like to see that in the UI. Is that possible?
So, I mean, I guess this application is not really written like that — there's no separate database service here — but I think this is an example of a request which tries to go to the (pretend) database and fails because it's not available at the time. And so what happens really — I mean, it also depends on how the application is written.
This application is written in such a way that it's tolerant of Redis timeouts, right — it just retries the same operation — but it does log to the span, and it says there was an error. So if that's what you're investigating, you can even search for these errors: it marks the span, and it will show me the traces which contain an error, right. So that's how, I would say, you would find something being unavailable.
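The error marking described here follows the OpenTracing convention of an `error=true` span tag plus a log event, which is what lets the UI search for failed traces. Sketched below with a toy span type rather than the real OpenTracing API; the type and helper names are made up for the sketch.

```go
package main

// span is a toy stand-in for a tracing span, holding tags and log events.
type span struct {
	tags map[string]interface{}
	logs []string
}

func newSpan() *span { return &span{tags: map[string]interface{}{}} }

// recordError follows the OpenTracing convention: set the error=true tag
// on the span and attach the message as a span log event.
func recordError(s *span, msg string) {
	s.tags["error"] = true
	s.logs = append(s.logs, msg)
}

// hasError is the kind of predicate a query service can apply to let you
// filter for traces that contain errors.
func hasError(s *span) bool {
	v, ok := s.tags["error"].(bool)
	return ok && v
}
```

Because the tag lives on the span, the failure stays attached to the exact operation that timed out, instead of being a loose line in a shared log stream.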
Another thing: if there's a stack trace — currently Jaeger doesn't really capture stack traces in full fidelity. With the instrumentation you can log a stack trace into the trace, and it will show up in the logs, but the formatting is not going to be super great. This is something that we can work on in the future; we want to take some lessons from others there.
Good question. So it is possible to instrument Node.js servers; the browser JavaScript version is currently being worked on. We originally released only the Node.js version of the Jaeger client, so that one works fine for servers. For JavaScript on the front end, we just need to do some work on that Node client so that it can be compatible with browser JavaScript.
All right, so, last thing — oh, maybe not the last thing, actually. So one thing I just did: I reverted my changes to reintroduce the SQL bottleneck, and what I want to do is send multiple requests to this again. We see the latency climbing. So if I go and search for these traces again — I want to pick one of the long ones; this is probably the longest, right — and I want to look at the log of this thing.
So this is an interesting log. Obviously — I remember that in the source code there was this log statement, and it's not a normal Go log: it's an instrumented logger which takes a context, and the reason it takes a context is because the context is where the trace information is stored and propagated through the application. So it's a slight modification of the logger, but what it does is it's able to tell us not only the message itself.
However, what's interesting about this is — so it happens within the customer service, right; the customer service is what calls MySQL. If you look at the URL of the request to that customer service, it looks like this: it says, give me the customer with customer ID one-two-three. That's fine, but there's no information here about the transaction ID. So how does this service — which is like a database level below the customer service, and well below the front-end — suddenly know this transaction ID, which was available only in the front end up here?
So we have these session IDs, generated every time: when I reload the page I get a random, unique session ID, which is sticky for the HTML page, and then each request is made unique with that number — and that's what I'm seeing in the log, seeing these IDs here, right. So, first of all, why is this important? It's important because I can actually investigate with it.
Okay, I was waiting on the lock, on some contention — really, on the resource. So maybe I'm doing some large work, but most likely I'm just waiting; most likely it's the other guys who are doing their work, and that's why I'm waiting in the queue, right. And so I can go and find those transactions and investigate them.
What happened to them — why did they take so long? So that's one reason why it's important to have this. But the important part is that this whole transaction ID is really not even available in the API of either of these services: the customer service doesn't get it in the URL, and the MySQL service doesn't get it anywhere, because nothing passes that information here — but it still knows about it. And that's a feature not just of Jaeger; it's a feature of OpenTracing in general. It's called baggage.
So, if you remember the slide with the services, I said we propagate the context throughout the call graph, right. Well, I said we pass a unique ID within that context, but that's not the only thing we can pass. We can essentially pass any arbitrary key-value pairs and make them available throughout the whole call graph, just by using this transparent context propagation mechanism — and one of those key-value pairs, in my example, is this transaction ID, which is created by the front-end application.
It sticks it in the request, in the context, and that context becomes available to every single node within the application, right. This is a super powerful feature, because not only does it allow you to do things like debugging this lock contention, it also enables a lot of other things — maybe I'll speak to that a bit later. And just to illustrate the same point with another example...
By the way, it's called baggage — I haven't mentioned that yet — so this key-value pair within the context is called baggage, because you really carry it with your request as an extra payload. But take the route service. The route service, again, conceptually, is just a function — well, it's a service — which says: given two locations, find the shortest route and give me back some information about it. So again, it doesn't know anything about the customer, and it doesn't know anything about this transaction ID that was somewhere at the top, because it doesn't care.
However, if we look at the metrics emitted by this particular service — I have this expvar page, which is a functionality that allows me to have a web page showing the metrics. This route service emits these two metrics, where it says: I spent this many seconds on behalf of this customer, and this many seconds on behalf of this session ID. So, magic again: how does the route service know anything about the customer or the transaction ID, if it didn't have that as part of its API, right?
The answer is that it got them through the baggage, and it's able to calculate essentially the resource consumption — how much of this service's time is attributed to a session, or to a piece of data which is only available at the very top level, where the request entered the system, right. So at Uber we're using this for various things; I know that Google, for example, has also used this for quite a while.
One other thing I can show you is to dive a bit more into the metrics — and this is not necessarily a feature of the Jaeger backend; it's more a feature of the Jaeger client libraries. So again, looking at this output, this debug web page, I can see that there are a bunch of metrics which look like metrics from my service, right.
So there's the service name, the metric name — HTTP requests — so how many requests were received, at which endpoint (like the /customer endpoint), and for each status code, right. So this is a counter, and you can get the same exact information in the form of Prometheus metrics if you want to — it's just a configuration switch in the application.
It also counts errors and successes, and it does that for every service. So it's not so unusual to have these metrics; you can get them in many other ways. What is unusual is that the application itself is actually not emitting these metrics: none of the services, the way they are written, are emitting the metrics, and neither are the RPC frameworks that they use. What's emitting these metrics is actually the Jaeger client — because, if you think about what the OpenTracing API is, it's a way of describing your transactions within an application, right?
It happens to be called tracing, and it's used primarily for tracing, but it doesn't have to be. You could implement an OpenTracing tracer simply by emitting metrics from the trace and doing nothing else, right — essentially a wrapper. What the Jaeger tracer contains is an extra feature which says: oh, and by the way, if you ask, it can emit metrics for you automatically — because, really, instrumentation with OpenTracing is a superset of normal metrics instrumentation. You already count how long each request took, and in terms of labels you see how many errors there were and how many requests in total, so it's very easy to emit these metrics from trace instrumentation. So if you use a Jaeger client, you're going to get them for free if you enable this.
And since we're on this topic of metrics — this page was from port 8083; this is the HotRod application itself. Like I said, it's not currently configured to send data in the Prometheus format, but if you look at its help, it says there is a metrics option: you can say expvar or prometheus. So far we were looking at the expvar output, and I can switch it to Prometheus. But the Jaeger backend itself — this one that I was running — is configured by default with Prometheus metrics, and this is what we get; this is the port where the UI is running, and the whole backend.
So it gives me similar things — the same kind of RPC metrics — but it also gives me a whole bunch of metrics about Jaeger itself, about the individual Jaeger components: the agent, the jaeger-query service, the downstream collector, etc. So you don't have to do anything: if your Prometheus is running, you just point it at this URL, and you can build charts and alerts, and that's probably how you would monitor Jaeger. I think this is all I have in terms of the demo; I can pause here for another set of questions. Great.
So I don't have an opinion about Dynatrace versus AppDynamics — I haven't used the products myself. Well, these specific ones — I don't recall them coming out with support for OpenTracing, but a lot of other vendors did come out with support for the OpenTracing standard. What that means is that if, let's say, you have this HotRod application, right — it's currently sending traces to Jaeger.
But if we look at the source code of this application, there's only one single place in it that actually binds it to Jaeger. If you want to bind it to any other vendor — vendors like Instana, Datadog, New Relic all came out with support for OpenTracing — then if you want to send traces to those vendors instead, you can easily do that, usually by changing one single line within this application, and get similar behavior within their UIs.
As for the other one mentioned — I have not looked into that one. I know that it's open source, but it's also a vendor; they also support OpenTracing, so again, yeah, you can use that as well — Jaeger is just yet another option. I don't know how Jaeger actually compares with it. Thanks.
Not through agents as such — well, there is actually a generic OpenTracing Java agent, which allows instrumenting various things, so you could use that; again, it can work with any tracer. PHP: Jaeger does have a PHP client library currently under development; it's not official, it's still a community contribution at this point.
Okay, so the way distributed tracing works in general is that it can work over any protocol, as long as that protocol allows you to pass some sort of metadata, usually as key-value pairs, right. In fact, this particular application is using a custom protocol, TChannel, that Uber developed a long time ago.
We are kind of moving away from it, but it is custom — it's a binary format — yet it does support key-value pairs as part of the request, and so tracing just works. And so any other custom protocol which allows some sort of metadata to be attached to the request — those protocols can be traced. For example, Cassandra has a proprietary protocol, but it can still be traced with OpenTracing. Thanks.
So the Istio service mesh currently can work with Jaeger — you can even find example talks about doing this. So, yes, Istio is really independent. The only thing I should say about service meshes is that they are not magic bullets for tracing — you can maybe find my talk at CloudNativeCon in December where I talked about this.
The difficult part of distributed tracing is passing the context within the application. Passing the context between applications is actually the easy part — it just sticks it in some HTTP headers or something like that — but within the application you sometimes need to write your code in a careful way so that the context is not lost, right. And therefore, if you use a service mesh, the mesh can take care of all the things like: oh, I'm going to create spans on the server and on the client, do the causality, alter the headers, etc.
But if your application does not actually propagate those headers, then you don't get any tracing with service meshes, right — that's the gotcha that people may not realize. And so there is also another talk at CloudNativeCon where we show how you can get tracing with Istio alone, and with Istio plus OpenTracing instrumentation inside the application, and the second example shows how much richer the traces become. If you just put a service within the service mesh, the mesh can only give you very basic tracing.
So Jaeger is not meant to be a metrics system — I think Prometheus is really already doing all of those things; Jaeger is really about collecting the performance of transactions, rather than of individual pieces of the application. I'm just rereading this question — right, no. So, well, I guess one other thing that we do have plans for is to have some sort of custom metrics within the span, where you can...
Well, I mean, I see — so I guess I'm not sure what's meant by session storage. I mean, traces are definitely stored: in Cassandra you can have any retention period; we also store them in HDFS internally. So if you want to keep them for, roughly, a year, and then go and blame people — yeah, Jaeger supports persistent trace storage.
So I will have to hurry up now, because we spent a lot of time on questions, which is good. Let me go back to my presentation; I will try to speak fast and cover a bunch of stuff here. So what we've seen so far in the demo is that we can do distributed transaction monitoring, right — you can see how a transaction progressed through the architecture.
So, just a few words about Jaeger itself. We started it at Uber about two, two and a half years ago; it was open-sourced in April; it's been an official CNCF project since last year; and it has full OpenTracing support, including the client libraries and the backend. On the community side, we have about 10 full-time people working on it, both at Uber and Red Hat, and we have plenty of contributors beyond that. It is already used in production by companies. On the technology side, all backend components are implemented in Go.
As I mentioned, we have persistent storage backends: we officially support Cassandra and Elasticsearch. The example I ran here uses in-memory storage, so it's gone when you restart it. The web front-end is implemented in React and JavaScript, and the OpenTracing instrumentation libraries are available in the five languages shown here; plus, PHP and Ruby are in sort of a community development phase right now — the OpenTracing API for those is not finalized either. Yeah. Now, this might be interesting: the architecture.
You don't have to bring a lot of dependencies into the application, because the client is available in every language and we can keep the clients very lightweight. The Jaeger agent is the one that actually knows how to find the Jaeger collectors in the backend and communicate with them, and there's also a feedback loop that's used for adaptive sampling, which I'll talk about, on the collector side.
Collectors are fully stateless, so you can scale them any way you want, and the persistence really happens in big-data stores like Cassandra and Elasticsearch. People are experimenting also with InfluxDB and ScyllaDB; there's Amazon DynamoDB, I think, that someone asked for. We do not officially support those, but we're working on a plugin system where people can basically just contribute those as plugins and run them side by side with the main Jaeger binary. And then jaeger-query is another component.
It reads the traces from the database and formats them for the UI. And the data pipeline is not something that's in open source yet, but we're working on that; this is where all the aggregations happen, like building the dependency diagrams. I'll skip the data model — well, actually, I want to say a few words about sampling.
Sampling is a very important topic to understand. The amount of data that traces capture from transactions can sometimes actually exceed the business traffic itself, because it depends on how heavily instrumented your applications are and what you log and write into traces. So most tracing systems do not persist all the data in storage — that's just too expensive — and instead we sample it. However, sampling doesn't mean you can just randomly flip some spans into the storage and throw some away, because you want to sample consistently.
B
So if you sample one span of a trace, you want all the other spans within the trace to be sampled as well; otherwise you're just going to get garbage data as a trace. There are two techniques: head-based sampling, where you make a sampling decision right at the beginning of the trace and then have it respected by all the services, or you collect all the data first in some temporary storage, maybe in memory, and then you make the sampling decision at the end. Jaeger supports the first model; that's the classic Dapper model.
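The head-based model described above can be sketched in a few lines of Go. This is an illustration, not Jaeger's actual sampler API: the root service makes the decision exactly once (here by comparing the trace ID against a probability), and every downstream span inherits the propagated flag instead of re-deciding, which is what keeps a trace consistently kept or consistently dropped.

```go
package main

import "fmt"

// SpanContext carries the sampling decision along with the trace ID,
// so every service in the call chain respects the root's choice.
type SpanContext struct {
	TraceID uint64
	Sampled bool
}

// StartTrace makes the head-based sampling decision exactly once, at
// the root of the trace. Comparing the trace ID against a fraction of
// the ID space keeps the decision deterministic per trace.
func StartTrace(traceID uint64, rate float64) SpanContext {
	const max = ^uint64(0)
	sampled := float64(traceID) < rate*float64(max)
	return SpanContext{TraceID: traceID, Sampled: sampled}
}

// ChildSpan never re-decides: it inherits the propagated flag, which is
// what guarantees a trace is either fully sampled or fully dropped.
func ChildSpan(parent SpanContext) SpanContext {
	return SpanContext{TraceID: parent.TraceID, Sampled: parent.Sampled}
}

func main() {
	root := StartTrace(42, 1.0) // rate 1.0: always sampled
	child := ChildSpan(root)
	fmt.Println(root.Sampled, child.Sampled)
}
```

In a real system the `Sampled` flag travels in the RPC headers alongside the trace ID, so services never need to coordinate beyond propagating the context.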
B
There are vendors which do tail-based sampling (actually only one that I know of), but we are considering it for Jaeger as well. For our traffic at Uber it was a bit challenging to do full hundred-percent collection, even into temporary storage; that's why we haven't done it yet. So, moving on to what we actually released in 1.0: we officially released support for Cassandra and Elasticsearch, we made a bunch of improvements to the UI, and we enabled metrics.
B
We have the Kubernetes deployment templates, and there's also a Helm chart that people developed in open source, so it's actually pretty easy. There are the instrumentation libraries I mentioned, and then there is a backwards-compatibility layer with Zipkin: if you're already invested in Zipkin instrumentation, which is generally not OpenTracing compliant, you cannot really just go and switch the tracers to Jaeger, but you can still use those Zipkin libraries and just configure them to send data to the Jaeger backend, and we can accept Zipkin-formatted spans and represent them in the Jaeger data model.
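As a rough sketch of what that compatibility layer accepts, here is a minimal Zipkin-style JSON span payload built in Go. The field set is trimmed for illustration and the values are made up; the point is only that existing Zipkin clients emit a JSON shape like this, and the Jaeger backend can ingest it as-is.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ZipkinSpan mirrors a few fields of the Zipkin JSON span format.
// A payload like this, produced by existing Zipkin instrumentation,
// can be sent to Jaeger's Zipkin-compatible ingestion endpoint
// without changing the application. (Field set trimmed.)
type ZipkinSpan struct {
	TraceID   string `json:"traceId"`
	ID        string `json:"id"`
	Name      string `json:"name"`
	Timestamp int64  `json:"timestamp"` // microseconds since epoch
	Duration  int64  `json:"duration"`  // microseconds
}

// EncodeSpans marshals spans into the JSON array the Zipkin-style
// ingestion endpoint expects.
func EncodeSpans(spans []ZipkinSpan) (string, error) {
	b, err := json.Marshal(spans)
	return string(b), err
}

func main() {
	payload, _ := EncodeSpans([]ZipkinSpan{{
		TraceID: "463ac35c9f6413ad", ID: "463ac35c9f6413ad",
		Name: "get /api", Timestamp: 1510000000000000, Duration: 1500,
	}})
	fmt.Println(payload)
}
```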
B
So that is, as I mentioned, the storage format and the UI. One of the notable things is that we've spent a lot of time on performance, so that you can load very large traces; we've tried up to 80,000 spans. Rendering a DOM of that size in the browser is a bit challenging, but we made it work with some tricks.
B
I mentioned Zipkin. As for metrics, like I said, all the Jaeger components come up with Prometheus metrics by default, but you can switch to other backends; internally we actually have support for even more metrics backends (I think InfluxDB is currently already compiled in), but we're not focusing too much on that right now. Once we have the plugin system, we might support those better. Now, the roadmap is, I think, an interesting topic. Adaptive sampling is something that we already have running in production at Uber, but it's not open source
B
yet. The point here is that, as I mentioned, we do upfront sampling, and one of the challenges with that is that the sampling was usually configured as one probability per service. But if your service has multiple endpoints with different QPS, then that one probability is good for one endpoint and not so good for a low-QPS endpoint, and vice versa. So adaptive sampling actually breaks it apart to be per endpoint. And the second reason why it is called
B
adaptive is that the backend actually keeps track of how much data it receives from every service and every endpoint, and feeds that information back to the client, saying: you should adjust the probability, because you're not sending enough data, or you're sending too much. So this allows us to control how much data is getting into the Jaeger backend from all Uber services, for example. The data pipeline is our biggest focus for this year.
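The feedback loop just described can be sketched as a toy per-endpoint adjustment function. This is an illustration of the idea, not Uber's actual algorithm: the backend compares the rate of sampled traces it observed for one endpoint against a target rate, and scales that endpoint's probability accordingly, probing upward when a quiet endpoint sends nothing.

```go
package main

import "fmt"

// AdjustProbability is a toy version of the adaptive-sampling feedback
// loop: given the current sampling probability and the rate of sampled
// traces the backend actually observed for one endpoint, it returns a
// new probability steering the endpoint toward the target rate.
// (Illustrative only; not Jaeger's real algorithm.)
func AdjustProbability(current, observedRate, targetRate float64) float64 {
	if observedRate <= 0 {
		// No data seen: probe upward so low-QPS endpoints still get traces.
		return clamp(current * 2)
	}
	// Scale proportionally: observing twice the target rate halves p.
	return clamp(current * targetRate / observedRate)
}

// clamp keeps the probability in a sane range.
func clamp(p float64) float64 {
	if p > 1 {
		return 1
	}
	if p < 0.0001 {
		return 0.0001
	}
	return p
}

func main() {
	// A hot endpoint sampling 10x the target gets its probability cut;
	// a quiet endpoint that sent nothing gets probed at a higher rate.
	fmt.Println(AdjustProbability(0.1, 100, 10)) // shrinks
	fmt.Println(AdjustProbability(0.1, 0, 10))   // grows
}
```

The key design point from the talk is that this decision lives in the backend, which sees aggregate traffic, and is pushed down to the clients via the agent's feedback channel.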
B
It really is. So far I've been showing you examples in the demo where I look at one trace at a time, and that's useful provided that you can find that trace; but at Uber we're getting several billion traces a day, and there's no way anyone can actually look at all of them. We would, by and large, be in a state of not being able to find what we want to find. So aggregations come into play here, and we can actually do data mining and say: okay,
B
we see this kind of problem, maybe a long latency tail on this particular service, and these are the sample traces, so go look at them. That's a much more viable approach to investigating performance than trying to find those individual traces using just the search that you have in the Jaeger UI. And some of the examples of this:
B
So this is another rendering, pretty, but you could actually get this picture with just network sniffing, because all it really does is measure how services talk to each other; these are pairwise connections. What it doesn't show you is the deeper connections, and that's what we have here. So in this example, let's pick these three services (these are made-up service names): the dingo service makes a call to shrink, so it depends on that service, and then shrink makes a call to dog.
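The distinction between pairwise edges and the deeper question that follows ("does dingo ultimately depend on dog?") is just transitive reachability over the recorded call edges. A minimal sketch, using the talk's made-up service names:

```go
package main

import "fmt"

// DependsOn reports whether src transitively depends on dst, given the
// pairwise call edges a dependency diagram records. It is a plain BFS:
// pairwise edges answer the one-hop question, while a path search like
// this answers the full-depth one.
func DependsOn(edges map[string][]string, src, dst string) bool {
	seen := map[string]bool{src: true}
	queue := []string{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			return true
		}
		for _, next := range edges[cur] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	return false
}

func main() {
	// Made-up services from the talk: dingo calls shrink, shrink calls dog.
	edges := map[string][]string{
		"dingo":  {"shrink"},
		"shrink": {"dog"},
	}
	fmt.Println(DependsOn(edges, "dingo", "dog")) // true: dingo -> shrink -> dog
	fmt.Println(DependsOn(edges, "dog", "dingo")) // false: no reverse path
}
```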
B
The question is whether the dingo service actually depends on dog or not. From the previous diagram you can't really tell that (I mean, even from this one you can), but with the tool we have been working on internally, you can actually just type a search string, dingo, and it will hide the other services, and you will see whether there is actually a path from dingo to dog or not. And that allows you to do, again, even deeper dependency
B
analysis of the services: saying, what's my SLA for this service, what's my latency SLA? For that, maybe I need to look at all my downstream services to figure it out, and you don't know what your downstreams really are from the previous diagram; you only know one level. But with the more path-based diagrams, we can actually show the complete, full-depth links. And (I actually have one minute left) another way is:
B
we can, again, find problematic traces by looking at latency histograms rather than searching for traces by tags. With what we've built here, it draws you a latency histogram for an endpoint (I don't show which one here), so I filtered this for one of our services and it shows me the diagram, and it's interesting because the distribution is multimodal.
B
Normally you would hope to have one hump, but here we have a very low-range distribution of latency, then there's another hump at slightly longer latencies, and then there is a much worse hump over here, where there is a comparatively really long tail. What this diagram allows you to do is drill down into that one and say: these are the actual traces that represent this long tail, and you can go and investigate them; and it also shows which upstream callers are responsible for most of those traces.
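That drill-down can be sketched as two small steps: bucket trace latencies into a histogram (enough to see the multimodal shape), then pull out the trace IDs beyond a tail threshold to investigate. A toy version, not the real UI's logic:

```go
package main

import "fmt"

// Trace is a minimal record: a trace ID and its observed latency in ms.
type Trace struct {
	ID        string
	LatencyMS float64
}

// Histogram buckets latencies into fixed-width bins; the resulting
// counts are enough to reveal a multimodal distribution.
func Histogram(traces []Trace, binWidthMS float64, bins int) []int {
	counts := make([]int, bins)
	for _, t := range traces {
		b := int(t.LatencyMS / binWidthMS)
		if b >= bins {
			b = bins - 1 // clamp the long tail into the last bin
		}
		counts[b]++
	}
	return counts
}

// TailTraces returns the traces slower than the threshold: the ones
// you would drill into when investigating the long tail.
func TailTraces(traces []Trace, thresholdMS float64) []Trace {
	var tail []Trace
	for _, t := range traces {
		if t.LatencyMS > thresholdMS {
			tail = append(tail, t)
		}
	}
	return tail
}

func main() {
	traces := []Trace{
		{"t1", 12}, {"t2", 15}, {"t3", 18}, // fast hump
		{"t4", 95}, {"t5", 110}, // slower hump
		{"t6", 900}, // the long tail
	}
	fmt.Println(Histogram(traces, 100, 5))
	for _, t := range TailTraces(traces, 500) {
		fmt.Println("investigate:", t.ID)
	}
}
```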
B
So again, this information can be rather useful. If you're interested in contributing to Jaeger, we have plenty of open issues that are very easy for you to attack, labeled as beginner-friendly or asking for documentation. We don't have any CLA; you just agree to the Developer Certificate of Origin, which is the Linux approach: you just need to sign your commits with the -s flag, which puts your name and email address in the commit, and then your work can go in.
B
And finally, this is just a reference page. If you want to get involved, or if you have any more questions, you can come to the chat room on Gitter, jaeger-tracing; there are always people hanging around there. So if you have more questions after the webinar, feel free to ask them there or on the mailing list. We also have a blog on Medium with a bunch of posts about Jaeger.
B
Is it possible to trace apps that use Kafka? Yes, especially with the latest Kafka: the protocol now supports generic metadata as part of the record, so it's possible to instrument that and trace all the message passing; there's really nothing preventing it. There might be some funny issues with the time scale, because most of the traces that happen in the RPC world are very short, whereas with Kafka you can consume a message a day later, so the UI might look a bit funny; we don't focus on that.
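The record metadata just mentioned is where trace context can travel between producer and consumer. A toy inject/extract pair over a plain header map sketches the idea; the header keys here are illustrative (Jaeger's own clients propagate context under a header named uber-trace-id), and real instrumentation would go through the tracer's Inject/Extract API rather than touching headers directly.

```go
package main

import "fmt"

// Message stands in for a Kafka record: Headers is the generic
// metadata slot where trace context can travel.
type Message struct {
	Headers map[string]string
	Value   []byte
}

// Inject writes the trace context into the message headers before the
// record is produced.
func Inject(msg *Message, traceID, spanID string) {
	if msg.Headers == nil {
		msg.Headers = map[string]string{}
	}
	msg.Headers["trace-id"] = traceID // illustrative key names
	msg.Headers["span-id"] = spanID
}

// Extract reads the context back on the consumer side, so the consumer
// span can join the producer's trace.
func Extract(msg *Message) (traceID, spanID string, ok bool) {
	traceID, ok1 := msg.Headers["trace-id"]
	spanID, ok2 := msg.Headers["span-id"]
	return traceID, spanID, ok1 && ok2
}

func main() {
	msg := Message{Value: []byte("order created")}
	Inject(&msg, "463ac35c9f6413ad", "a2fb4a1d1a96d312")
	if tid, sid, ok := Extract(&msg); ok {
		fmt.Println("consumer joins trace", tid, "with parent span", sid)
	}
}
```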
B
Okay, what is the price? Overhead is a question that is very difficult to answer, because it's completely dependent on your application, the workload, and the CPU capacity that you have allocated to the application. So you can essentially get any number, and so any number, I feel, is meaningless. But I can say that in production, the more traffic we get, the lower the sampling probability we use overall, and so with that there is almost no overhead, especially in the languages which are sort of efficient, like Java and Go.
B
Whereas, for example, in Python we use the Tornado framework; it's an event-loop-based framework, and to propagate the context we're using the Tornado-native StackContext. That actually adds a lot of overhead, but it's not really the Jaeger overhead; it's just the overhead of the framework propagating the context around.
A
That's all the time we have; we're going to stop now. Thank you very much, Yuri, for that fantastic webinar, and thank you to everyone in the chat for being very active with questions; that was good. If you want to keep an eye on upcoming events, I've just put the CNCF events link in there. That's all we've got time for now. Thank you very much, and goodbye.