From YouTube: Improve Deployment Velocity by Using Distributed Tracing and Lightstep to Understand...- Andrew Chee
Description
Sponsored Lightning Talk: Improve Deployment Velocity by Using Distributed Tracing and Lightstep to Understand and Debug Deployment Problems Quickly - Andrew Chee, Lightstep
Speakers: Andrew Chee
Modern software systems are becoming more complex. When problems happen during a deployment, it is very difficult to identify the actual root cause. See how distributed tracing and Lightstep's analytical capabilities help you quickly identify those problems so you can remediate deployment issues.
For more Continuous Delivery Foundation content, check out our blog: https://cd.foundation/blog/
As you see here, all of the services that report data to Lightstep are listed in the left-hand nav, and for each service we are able to monitor, using these distributed traces, certain key operations that occur inside the system. These tend to be the important transactions that occur in each service.
We are able to automatically detect when code gets deployed, and we can visually show you those particular deployments, as well as whether you have several versions of code running in production because of canary builds and the like, and we can compare the performance of those versions for you.
So as an example here today, you will see that I am looking at this inventory service within Lightstep. This is an e-commerce app that we use for demo purposes, and in this app one of the operations is to update the inventory.
So, for example, say a store is updating the inventory in its web store. As you can see here, a deployment marker tells us that a deployment happened, and soon after that, the response-time latency of that particular transaction on that particular service has gone up. So it's pretty easy to infer that the deployment may have introduced a problem here. But what is that problem?
And how do I go about fixing it? That is where Lightstep fits in. As we see here, the response time before the deployment was about 158 milliseconds at the p99, but after the deployment it's almost 1.2 seconds. Understanding the cause of this, which could be this service itself, upstream services, or downstream services, is very simple with Lightstep.
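The before-and-after p99 comparison described here can be sketched in a few lines of plain Python. This assumes nothing about Lightstep's internals; the latency samples are invented, chosen so that the post-deployment data shows a slow second mode like the one in the demo:

```python
# Toy sketch of a p99 latency comparison around a deployment marker.
# All samples are synthetic; the 120 ms / 1200 ms modes are invented
# to loosely mirror the 158 ms -> 1.2 s regression shown in the demo.
import random
import statistics

random.seed(7)

# Baseline: latencies (ms) observed before the deployment.
baseline = [random.gauss(120, 15) for _ in range(1000)]
# Regression: after the deployment, a slow second mode appears.
regression = baseline[:800] + [random.gauss(1200, 100) for _ in range(200)]

def p99(samples):
    """99th-percentile latency via stdlib quantiles (Python 3.8+)."""
    return statistics.quantiles(samples, n=100)[-1]

print(f"p99 before deploy: {p99(baseline):7.1f} ms")
print(f"p99 after deploy:  {p99(regression):7.1f} ms")
```

Grouping the same samples by a deployment or version tag instead of a time cutoff gives the canary-comparison view mentioned earlier.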
We simply click on that particular regression, and we can compare its performance to what happened before; in this case I'll choose an hour before. Because as problems occur, there are really two things we want to find out: why the problem occurred, and where it occurred.
We can see here that the yellow, which is the regression, has a second mode in its latency distribution, so something is going on here that the baseline did not have. But what is more important is that we capture every single request in your system, and we can actually follow those requests upstream and downstream.
Now, just from this very quick glance, we know what operation is being affected. And with Lightstep, because we're capturing these complete traces across all of your services, we can zoom in and ask which upstream services are affected by this problem. We see that this inventory service's write-cache operation is causing the latency.
We can then keep going up and see that this particular operation on this service is being called by the API gateway, which is in turn being called by the web and mobile front ends: Android, iOS, web, and so on. So, very quickly, in this distributed service architecture, where you have multiple services interacting together to serve requests to your customers, we can tell you not only where along these chains of services the problem is emanating from, but also which downstream and upstream dependencies may be affected by it.
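The upstream walk described here can be sketched as a small exercise over a toy trace: given spans that record their parent span, compute each span's self time (its duration minus the time spent in its direct children) and pick the biggest contributor. The span names loosely mirror the demo (API gateway, inventory service, write-cache), but every id and duration below is invented:

```python
# Toy trace: a chain of spans, each recording its parent span id.
# Ids, services, and durations are invented for illustration.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]   # None for the root span
    service: str
    operation: str
    duration_ms: float

trace = [
    Span("a", None, "web-frontend", "checkout", 1250.0),
    Span("b", "a", "api-gateway", "/inventory/update", 1220.0),
    Span("c", "b", "inventory-service", "update-inventory", 1200.0),
    Span("d", "c", "inventory-service", "write-cache", 1150.0),
]

def self_time(span, spans):
    """Span duration minus time spent in its direct children."""
    children = [s for s in spans if s.parent_id == span.span_id]
    return span.duration_ms - sum(s.duration_ms for s in children)

# The span with the largest self time is where latency emanates from.
culprit = max(trace, key=lambda s: self_time(s, trace))
print(f"latency emanates from: {culprit.service} / {culprit.operation}")
```

Walking child-to-parent links the other way yields the affected upstream callers (API gateway, then the front ends).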
All of this is done with just two clicks in Lightstep. So very quickly, we can tell that the problem we have in this update-inventory operation is actually coming from this write-cache operation in the system. The next question we want to ask is why.
With Lightstep, we are able to show which attributes may have changed from the baseline to the regression, which may help explain why this is happening. So, for example, as we can see here, some attributes have changed, and one very obvious one is that version 455 did not exist in the baseline but does exist in the regression data set. So now we can pretty clearly see that this particular regression was introduced as part of version 455.
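The baseline-versus-regression attribute comparison can be sketched in plain Python: collect the set of values each tag takes on either side, then flag values that appear only in the regression. The tag names and values here are invented stand-ins, not real Lightstep data:

```python
# Toy attribute diff between baseline and regression span sets.
# Tag names and values are invented for illustration.
baseline_spans = [
    {"service.version": "454", "host": "node-1"},
    {"service.version": "454", "host": "node-2"},
]
regression_spans = [
    {"service.version": "454", "host": "node-1"},
    {"service.version": "455", "host": "node-2"},
]

def values_by_key(spans):
    """Map each tag name to the set of values it takes across spans."""
    out = {}
    for tags in spans:
        for key, value in tags.items():
            out.setdefault(key, set()).add(value)
    return out

base = values_by_key(baseline_spans)
regr = values_by_key(regression_spans)
for key in regr:
    new_values = regr[key] - base.get(key, set())
    if new_values:
        print(f"{key}: {sorted(new_values)} appear only in the regression")
# prints: service.version: ['455'] appear only in the regression
```

The same set-difference idea applies to the log-statement comparison mentioned next: diff the log messages attached to baseline traces against those attached to regression traces.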
We do very similar things with log statements: if there are log statements attached to your traces, we can also tell you whether they represent any changes between the baseline and the regression. So what we've looked at so far is that this inventory service's problem originates from its write-cache operation.
So hopefully we've been able to show you how Lightstep can help you not only monitor deployments, but also act when problems occur as part of a deployment. You can obviously roll back the deployment, but in order to keep up speedy delivery of your software, you also need to understand what the cause of that problem is, so you can quickly fix it and re-release your code.