From YouTube: Discuss CI/CD Tracing and OpenTelemetry issue
Description
James and Michael spent some time talking through an issue to add trace collection to GitLab CI/CD pipelines, the tech behind it, what's happening in the industry, and a possible future vision for this area of functionality.
https://gitlab.com/gitlab-org/gitlab/-/issues/338943
A
So this is our discussion around CI/CD tracing with OpenTelemetry. This started, I think, as a Slack message and turned into an issue that we've been discussing, and Michael, let me say I am super impressed with the issue that you have opened here. I'm gonna go ahead and share my screen so that we can talk through the issue, since you've authored it, and then we can take the conversation from there. Does that work for you, or do you want to share your screen?
B
Yes, I think I can go ahead and share my screen. To be honest, I think the issue has gotten a little long, but I think it really needs some context on which problem we are trying to solve, how we can do it, and what technical background you really need to know about OpenTelemetry and all the tooling involved, because it's a huge ecosystem, and I think there are many things you probably don't need to know when you think about tracing, but in the end it will help. So one of the things around CI/CD tracing was:
B
Why would I need that? And one of the cases is that we're not just having the single pipeline with five jobs where everything is fine. Oftentimes we design complex architectures of CI/CD power plants: we have asynchronous dependencies, we have side effects, we have durations which oftentimes exceed one minute, five minutes, and then the pipeline is being run and there is no fail-fast pattern involved or anything like that. So it consumes resources, and those resources might consume, or do consume, cloud resources, compute power and so on.
B
So it wastes time, essentially. And then there is the question: how would I approach this? Where should I be looking to get more efficient with the pipelines? We do have our pipeline efficiency talks, which started out from a CI monitoring workshop I did one year ago or so, and I was like, okay, we might have metrics, which we can solve with the Prometheus exporter from our community, but we also need to have a deeper look into our pipelines, and there is another issue which is being linked.
B
I think at the bottom somewhere, where we talked about presenting this in the UI or in the UX. So you have your pipelines. Let me see if I can find that quickly. Is it linked? Is it not linked? I think it's not linked yet, but the idea was to have the GitLab pipeline inside... no, I probably should have searched for it. Pipeline analytics? No, it's not this one, but I can probably find it later.
A
Is it an issue about showing the insights of the pipelines within the UI?
B
Yes, it has UX elements inside, and there is an animated GIF or a video inside which shows how you can navigate into the new pipeline view. You can zoom in, you can view the duration, to really see, like a weather map or heat map, which job, for example, is exceeding a certain limit.
B
Okay, the thing is, the duration is only one value, one metric you want to be looking at, and oftentimes it's not just this one job taking five minutes or taking quite a while; it can be a number of things, or a timeline of things, influencing the overall behavior of a slow pipeline, or of a pipeline which doesn't scale that well or consumes too many resources. And the idea around tracing is that you basically define a span which has a start window, a start timestamp and an end timestamp, and you, for example, measure the duration inside, of course.
B
But you can also enrich that with more metadata, similar to logging in a specific sense, but put into these span blocks, and there is not only one span in a trace; it can be a multitude of linked spans. I don't know, did I put a... I think I didn't put a picture in there, but yeah, we have more insight with tracing into that, and if I change that picture into saying:
B
Okay, I'm treating, for example, the job start and the job end as the bigger span I want to look into, but I could also be looking into (and this needs to get the runner inside) when the job starts, when the docker pull starts, when the preparation for the git submodules starts and ends, and just provide a more fine-granular insight into what's happening on the runner side, but also what's happening on the server side, for example uploading the artifacts or the caches, or something like that, so as to see the background infrastructure. But I also have the possibility to instrument our own jobs.
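As a rough illustration of the nested spans described above, a minimal sketch with the OpenTelemetry Ruby SDK could look like the following. The span and attribute names are made up for the example and are not anything GitLab or the runner emits today:

```ruby
require 'opentelemetry/sdk'

# Minimal sketch: configure the SDK; exporter and endpoint come from OTEL_* env vars.
OpenTelemetry::SDK.configure do |c|
  c.service_name = 'ci-job-tracing-demo' # hypothetical service name
end

tracer = OpenTelemetry.tracer_provider.tracer('ci-tracing-demo')

# One outer span for the job, with finer-grained child spans for each phase.
tracer.in_span('ci.job') do |job|
  job.set_attribute('ci.job.name', 'build')           # made-up attribute
  tracer.in_span('runner.docker_pull')      { sleep 0.1 } # stand-ins for real work
  tracer.in_span('runner.git_submodules')   { sleep 0.1 }
  tracer.in_span('job.script')              { sleep 0.1 }
  tracer.in_span('server.upload_artifacts') { sleep 0.1 }
end
```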
B
Maybe, and this is something which is linked with Honeycomb's buildevents over here (this was not intended): we have buildevents, which is basically a CLI which gets called, and then you start your span and end your span, and it communicates with Honeycomb. But embedding that into CI/CD scripts is a little cumbersome in that regard, because you need to have many environment variables and so on.
B
So from a user perspective, it's a little complicated to get started. Still, the tool itself is, I would say, straightforward to use once you dive into it, but again, it doesn't feel out of the box, and it's probably easier, at least for me as a consumer, as a user, to just say: hey, I want to enable CI/CD tracing, or configure that from my CI/CD template, and then I get an insight and, depending on how I configured it, the tool basically sends the traces to a backend, and then there's the frontend where I can view and analyze this, in the best possible scenario.
B
It should focus on OpenTelemetry as a framework, as a tool which allows us to define where to send the traces, like using a defined backend, which is shown in the picture. So OpenTelemetry acts as a collector service, where you send the traces to, and it stores them, or rather it allows you to store them in a backend which is not OpenTelemetry itself: it can be Jaeger tracing, it can be Grafana Tempo, it can be something else for tracing, a vendor providing this.
B
I don't know, I think Splunk has something in that regard. So there are many different vendors also involved in the OpenTelemetry project, and the idea is to provide a generic interface, merged from OpenTracing and OpenCensus into OpenTelemetry, and store all these traces there. OpenTelemetry has another specification coming up for metrics and also for logs, so it's not just OpenTelemetry for tracing, but that is out of scope here; it's just additional information that OpenTelemetry covers more than just traces.
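To make the collector-and-backend wiring concrete, a minimal sketch with the Ruby SDK and the OTLP exporter could look roughly like this. The collector endpoint and service name are made up, and the collector is assumed to forward to whatever backend (Jaeger, Tempo, a vendor) it is configured for:

```ruby
require 'opentelemetry/sdk'
require 'opentelemetry/exporter/otlp'

# Minimal sketch: export spans via OTLP to a collector, which forwards them to the
# configured tracing backend (Jaeger, Grafana Tempo, a vendor, ...).
exporter = OpenTelemetry::Exporter::OTLP::Exporter.new(
  endpoint: 'http://otel-collector.example.com:4318/v1/traces' # made-up endpoint
)

OpenTelemetry::SDK.configure do |c|
  c.service_name = 'gitlab-ci-tracing-demo' # hypothetical
  c.add_span_processor(
    OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.new(exporter)
  )
end
```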
A
So you're thinking... I mean, I think that the problem is well founded. We see, based on what we've heard from customers about pipelines: when pipelines get too long or they unexpectedly change, they want to understand why, and we don't have great... we don't provide great visibility into that today out of the box.
A
So I think the problem is a problem space that's ripe for improvement. It sounds like the MVC, then, would be: you can enable spans or tracing within your GitLab CI/CD pipeline and configure a place to go collect it. All we're going to do is say, for every step within the jobs, we're going to start and end a trace as part of the pipeline run for you, so that you can then go look at those metrics. You could then leverage GitLab to do that within our Jaeger implementation or Prometheus,
A
if you wanted. And then a further iteration on that might be a non-free feature set that allows you to visualize that alongside, like, the pipeline editor, or somewhere in that experience you can see: hey, this is taking a long time, either at the last run, or you could start to dig in and say, hey, historically this job is taking longer, this pipeline is taking longer, whatever it might be. Is that what you're thinking?
B
Yeah, that describes it very well. I tried to break it up into as many parts as possible, because I know that everyone really wants to have, like, the fancy web interface, but we can really, or we should really, focus on using what's already there, and oftentimes it's also the case that, for example, someone already has Jaeger tracing in their environment and everything else. So we want to do the integration, and when I saw the Datadog CI Visibility feature, I looked at what exactly happens in the background, so which events are there and so on, and from there I was like: hey, if we do that for Datadog, we can also do that, for example, using an OpenTelemetry integration. Or something which I found today, actually (I need to scroll down a little): I've seen at KubeCon that the CNCF started CloudEvents, which is something around a common way of describing event data, also for CI/CD.
B
So this could be something outside of the issue which I created for OpenTelemetry, but keeping it in mind: for every workflow in our CI/CD pipelines being executed, either in Ruby or in Go, there could be a tracing window or tracing information, but it could also be metrics or events or something like that, and this is where it touches base with everything. But to really scope it down:
B
I would want us to add the following, and this is the proposal. First off, get a feel for how the OpenTelemetry Ruby client works, so this is the SDK; also have a look into the Go SDK for the runner components; and then follow what is already there,
B
what has been added, for example, with Datadog and the specific implementations, and based on that use a demo environment around Jaeger tracing as the backend and OpenTelemetry as the collector. I've found that there is a Kubernetes operator, for example, which can be used. And the minimal change for configuring this would be a config setting which allows us to specify host, port, authorization, and I think you need to specify the backend as well, but I didn't really dive into it yet; it's more of a high-level idea.
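Just to give that high-level idea a shape, the kind of setting being imagined might look something like the following. This is purely a hypothetical sketch; none of these keys exist in GitLab today, and the token variable is made up:

```ruby
# Purely hypothetical shape of such a config setting, not an existing GitLab option.
ci_tracing = {
  enabled: true,
  collector_host: 'otel-collector.example.com', # where to send the traces (assumed OTLP collector)
  collector_port: 4318,
  authorization: ENV['CI_TRACING_TOKEN'],       # made-up token variable
  backend: 'jaeger'                             # e.g. jaeger, tempo, or a vendor
}
```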
B
This is how I would propose the steps, and when this is working and we can see success, by pulling it into the UI for CI/CD tracing, we can then, for example, compare it with the Datadog integration, compare it with other CI/CD tools on the market and how they solve the problem, and from there we can see whether we build a UI integration that allows adding, for example, specific monitoring and observability in regards to tracing analytics, something which would then be in the Enterprise tiers, for example.
B
But this is bound to: hey, let's make this work now. And since it is groundwork, and also combining existing resources, and also usable for a single developer, I was proposing it for the free tier, for the Core version, and based on that, and hopefully increased adoption of the feature and gathering feedback, we can build out more features for customers: dashboards, CD dashboards and so on.
A
Yeah, I like leveraging this for free, so you can turn it on and specify a collector, or by default even, we could make our Jaeger tracing implementation be the collector if we go the tracing route. I'm gonna do the product manager thing and say: what other ways can we solve this problem outside of tracing? If we stepped back from OpenTelemetry and said, I don't have insights into my pipelines?
B
Or, like: when I prepared the CI pipeline monitoring webcast last year, actually, I was like, yeah, we totally have all the tools and all the metrics and everything is there, and then I learned we don't have that much. We have certain metrics in our PostgreSQL backend, but these are rather expensive to store, and accessing these metrics is expensive.
B
So when I found the GitLab CI pipeline exporter from our community, which is also linked in the pipeline efficiency docs, it was like, okay, we have something around that. So, how to say it: we want to have self-contained monitoring inside GitLab, and it should be enabled by default.
B
I think currently it's not enabled by default. And on the other side, we also want to provide an interface for users who have existing monitoring and observability solutions, so exposing the pipeline metrics from /metrics on GitLab.com somehow would also be nice, and also for self-managed installations. Right now this isn't possible; what is possible is to query the REST API, which then does SQL queries in the backend, and to have a daemon running in front which calculates the metrics in the Prometheus format.
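That daemon-in-front-of-the-REST-API pattern is roughly the following. A minimal sketch, assuming a project ID and token in hypothetical environment variables, and printing a single made-up duration metric instead of serving a real /metrics endpoint:

```ruby
require 'net/http'
require 'json'
require 'uri'

# Minimal sketch of the "daemon in front of the REST API" pattern:
# poll the pipelines API and emit Prometheus-formatted metrics.
# GITLAB_PROJECT_ID and GITLAB_TOKEN are hypothetical environment variables.
project = ENV.fetch('GITLAB_PROJECT_ID')
token   = ENV.fetch('GITLAB_TOKEN')

def api_get(uri, token)
  req = Net::HTTP::Get.new(uri)
  req['PRIVATE-TOKEN'] = token
  res = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) { |http| http.request(req) }
  JSON.parse(res.body)
end

pipelines = api_get(URI("https://gitlab.com/api/v4/projects/#{project}/pipelines?per_page=5"), token)

pipelines.each do |p|
  # The per-pipeline endpoint includes the duration in seconds.
  detail = api_get(URI("https://gitlab.com/api/v4/projects/#{project}/pipelines/#{p['id']}"), token)
  # Made-up metric name; a real exporter would serve this on /metrics instead of printing it.
  puts %(gitlab_ci_pipeline_duration_seconds{ref="#{p['ref']}",status="#{p['status']}"} #{detail['duration'] || 0})
end
```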
B
So it's a really nice project, and I don't want to, like, shut it down and just make it available in GitLab itself, but getting inspired by it, providing more cached metrics and faster access to the performance values, could give us the possibility to have something like this, in an abstracted way, in the default pipeline overview, so that we can, for example, see, maybe with smaller bullets or something: hey, there is a pattern of failed jobs, or hey, there's a pattern of flaky unit tests.
B
I think we have some sort of detection for this already, but more in a way that, when I navigate into a project (let me see if I can quickly find one; I did a pipeline efficiency workshop a while ago and I started playing), and when I, for example, just navigate into the pipeline overview, is there just something which is usable? Yeah, maybe let's use a different one.
B
We do have... I did a workshop on Saturday around CI/CD, and we had this pipeline. Yeah, this looks a little more interesting. So when I have this overview, it would be interesting to not only hover over that and see the dependencies, but also get a pop-up, for example: this job took one minute and this job took 20 seconds, and just because there is a default threshold of 40 seconds, the 60-second one turns red automatically, and I can change the threshold, for example, which is...
B
Like, it's a little do-it-yourself monitoring, but it allows you to see, or to filter visually, the duration of jobs. And the duration is an indicator of how long the job took; it's not really a root cause analysis, because it could be network dependencies or latencies, it could be that the docker pull is taking too long because the image is 10 gigabytes, and much, much more. But really getting a high-level indication of: yeah,
B
this actually looks good, but the duration is a little off here, or, in the past this job failed three out of ten times; having that detail here and not having to navigate into Analytics and CI/CD, because it's interesting over there as well, don't get me wrong on that, but it's a different scope.
A
Yeah, being able to see from your branch, like: here's all of the pipeline executions within this branch, here's how it compares to your target branch if you have an MR open. Being able to see the history, being able to set that threshold, being able to dig in and say, here's how this job has changed, maybe those are contributing factors, and, you know, dig in from there to the tests, if it's a job in the test stage, and back into the CI/CD analytics.

A
I think all of that is super interesting as future iterations. That makes sense.
B
If I move into, like, the merge request view: if there is, for example, a deployment into a staging environment and I can measure the metrics... so I know we have the possibility to collect metric reports ourselves, that's there, but the possibility to have this in a more automated way. So it's not just the possibility to add something, but there is automated collection, maybe of, like, the job duration, or detecting something and having this presented
B
somehow. I don't know how exactly it fits the UI and the UX, because the merge request has a lot of information already, yeah, but, for example, if the pipeline were failing, it would be super helpful to see: hey, the pipeline failed, there were six jobs, and the duration was XYZ. So, to have:
B
where should you be looking? Should you open the pipeline editor? Should you open the pipeline view? And, like, improving the navigation; currently it's a long way to really get into the job and see why it failed, or see how long it took, for example.
B
It's actually like the overview, similar to Grafana: seeing jobs or patterns which have a long execution time, and if tracing exists, you click on it and you can immediately have a pane opened which shows you, on the right-hand side: hey, this is the trace of this job. So you can see, okay, the duration is my service level objective, let's call it the SLO:
B
it should be not more than one minute, but if it exceeds one minute, we mark it as red, as failed, and then you can start investigating, and not just read the job logs and try to figure out how long it took and manually calculate the steps from docker pull to actually executing a script, but have this in a visual way. So this would be my dream of analyzing this, yeah.
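The check itself is simple once the span data is there; a rough sketch, with a made-up one-minute SLO and hypothetical span timestamps as they might come back from a tracing backend:

```ruby
# Rough sketch: flag a job span that exceeds a made-up one-minute SLO.
SLO_SECONDS = 60

# Hypothetical span timestamps as they might come back from a tracing backend.
span = { name: 'ci.job:build', start_time: Time.now - 95, end_time: Time.now }

duration = span[:end_time] - span[:start_time]
status   = duration > SLO_SECONDS ? 'red (SLO exceeded)' : 'green'
puts "#{span[:name]}: #{duration.round}s -> #{status}"
```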
A
That would be, like you said, more of an operations view: if you're the DevOps team, or you're like our engineering productivity team, monitoring a pipeline over time, seeing when the pipeline starts taking much longer than it had been historically, quickly being able to diagnose within that what some of the top contributing factors are. Is it that, oh, we just have docker slowness right now, or these jobs changed, and after they changed, their duration took a long time.
B
Yeah, exactly. And I also don't want to, like, separate the teams and make them silos again, which we often see: like, hey, this CI/CD pipeline doesn't work, so you open a ticket for the infrastructure team to have them fix it, and they come back and say, yeah, but actually it's the pipeline configuration which makes it slow, it's up to you, and then you play ping-pong between the teams, which is not efficient.
B
A product manager, a designer, everyone who is working with the pipeline should be able to diagnose the problem and, with documented instructions or runbooks or something else, be able to fix it. Because when the pipeline is failing and you really want to test something, or use the review apps, for example, for testing UX, you cannot use that, and it blocks your workflows.
B
It's frustrating, and you always have to ask someone else for help. But what if no one is available? Then it's like, hey, I need to start reverse engineering the source code or the pipeline code. It works, but it's not much fun. And given that, there could be, you know, some sort of AI involved which tells you: we detected that the pipeline duration is going up, always on a Monday, from nine to five.
A
Some of those insights... from within the issue, there's an interesting article from the Slack engineering team talking about how they were instrumenting some of their CI/CD pipelines and how they made them a little bit better, or a lot better in some cases.
B
And the thing is, and I'm trying to find my own talks here, I do have a talk around efficient DevSecOps pipelines.
B
The first thing is: even though it looks nice to have that, or to have your own, like, GitLab in-house monitoring system or something like that, it's extra work. If you're a single engineering group, or if you're, like, the DevOps engineer on your team and you also need to do the monitoring, but just for GitLab, or you want to use what's already there, I think it's better to have it integrated. And oftentimes,
B
at least for me, I don't want to understand yet another tool which has a different configuration syntax. I probably want to have it, like, out of the box, or as a cloud service or something else. So I changed my opinion from "I'm hosting everything myself" to "let's pay someone to do it for me", because I don't want to learn it all, yeah.
B
It's super interesting if you're familiar with it, and if you want to do it: PromQL and Prometheus and Grafana are amazing tools, but still, it needs maintenance, it needs knowledge, and especially when that side breaks (not the CI/CD power plant breaking, but the monitoring breaking), there is this kind of additional maintenance, versus just having it inside Omnibus or inside the Helm charts where everything works out of the box, and even then it's not just an empty dashboard, you have predefined examples.
B
The thing is, when you do it on your own, like do-it-yourself monitoring, you don't know which metrics are important. So there's the best practice, the opinion from our engineers and from our product teams, and also using, or dogfooding, how our infrastructure team and SREs use that on a daily basis, because CI/CD on GitLab.com can also be, like, slow, and you need the insights.
B
I think it's better to have it, like, in the product, and not written down somewhere that you need to select this metric and, by the way, that one too; we can leverage the knowledge we have in production and also make the product better, yeah.
A
Yeah, if you're that person who notices the pipeline has slowed down, seeing those trace spans or the metrics or whatever it might be within the same tool, versus: now I need to go log into another vendor. What happens if that vendor is down? That just adds complexity to trying to solve your problem, because now you have to wait for them to be back up, et cetera, et cetera. So, okay, that makes sense.
A
Yeah, I like that we can predefine some dashboards and be opinionated about what a good pipeline looks like, or what things you should pay attention to within your pipeline, what might be contributing factors to it slowing down. I think that there's appetite for that, especially for those folks who are single engineering teams, or there's the one DevOps person on the team or within engineering, or, even worse, that DevOps person left. I've talked to a few customers like that.
B
And the other thing I wanted to mention, and now I found the slide and the issues, or the epics, which I can then link to (just let me quickly share the presentation with you in Slack): the idea is, like, the animated GIF is something I stole from the board, from the epic Vitica created, so that you can really see the timeline of the traces, or the spans, or the jobs, something which also gives you a health indication, a health indicator:
B
if this continues like this, the pipeline will be broken all the time. Something which adds the visual aspect, also with the DAG diagram over here, which is something where I think we saw it with GitHub Actions: making the pipeline view the one you really want to be looking at when debugging a problem, and not having to navigate into the job details all the time, because the job log is interesting and it's nicely formatted,
B
but it's too much, really. We really want to condense down the level of detail. And is there anything else I wanted to show on this slide? Potentially not. The other thing I'm, like, thinking about is: when we add CI/CD tracing and CI/CD insights, how can we build something like quality gates, with Keptn or something else, into CI/CD?
B
So, like, using a mechanism of saying: it's not only CI/CD as the framework, as the tooling, which we want to monitor or observe, but we also leverage that into: hey, the deployment doesn't perform that well. We provide an out-of-the-box possibility to monitor the deployment, and when the SLO is not right, and so on. So I'm not talking about duplicating Keptn's work here, but just having in mind that we also need to look beyond our own ecosystems and insights.
B
But what else comes to mind: when something around metrics is failing, maybe we do a rollback of the deployment, something like that. But I think this goes a little beyond the scope of pipeline insights, yeah. It's still an interesting idea, though.
A
And I've talked to Kevin about that: how does it compare to production, or compared to test data or previous production data; if traces are now different in a meaningful way, doing an auto-rollback based on that, things like that.
B
That's, again, me thinking about many topics: you know, you have traces, you have logs, you have metrics, and maybe you want to do profiling, or continuous profiling, things we have seen with Polar Signals now, or I think it's called Parca. parca.dev is the domain, yeah. So they open-sourced the tooling, or the tool stack, around continuous profiling. I didn't try it out yet, but having the possibility of what Polar Signals provides...
B
This is very much based on the application itself, so you would want to preload something, or see which syscalls are being performed by the application, or which internal function it calls very often, and if the function takes one second it's fine, but if it takes five seconds it's bad. But this is also an interesting idea: you deploy not only to production and profile your production environment,
B
but what if I deploy my merge request into a staging environment and everything gets monitored and profiled inside, and then I can compare a merge request, which is essentially like a git commit, and I can diff those things and compare everything which might have had an influence. And one of the examples I experienced myself in the past was (let me see if I can quickly find another talk of mine, which I will be giving at All Day DevOps
B
this week, too): we were building a daemon written in C++, and for some reason the API frontend was not fast enough, so we thought about moving from threading to coroutines, lightweight stacks, basically lightweight threads.
B
Basically, the problem was that the memory usage increased by using those, and we also had some crashes which we couldn't really detect over time, and, like, having something inside CI/CD while developing the feature and deploying it, and detecting those regressions beforehand, would have been really, really nice, but we didn't have that back then. So we debugged for half a year or something, bisecting
B
between the release and the next major release, which were like a thousand commits apart, and every single commit was deployed to production, ran for three days in a production environment, and then you continued searching for the problem, and after a while you figured out: hey, this is the commit that breaks it. Yeah, but if you apply the fix to it, it doesn't work.
B
Detecting those regressions or performance problems earlier in the process, before actually releasing that to customers who call you and say "please fix that, now, and yesterday", while you're saying "hey, I have no idea what to look at": this is something which plays into that as well.
B
And again, the problem of this issue, or this feature request, is really scoped to tracing, but it's like building Lego bricks, one ties into another. We have different tools available, we need to, like, choose what is already there, and then also collect feedback from our wider community and customers, so that we also know their use cases, because I can source from my 10 or 20 years of development experience,
B
but there are so many challenges out there, and different things in the programming languages, and you know from unit testing that it's challenging to find, like, the defined format. For CI/CD, I think, for our own pipeline and runners we can go our own way, because we collect that and we have that available. When it comes to SLO collection, quality gates, profiling and other stuff,
B
we can look into ways of making it easier for our community to use it, yeah: provide examples, use cases, demo videos and so on. But this is, like, me thinking out loud, or this is the three-year plan, or, I don't know, it's an arbitrary number of years it will take to really achieve that everything is lovable.
A
Yeah, I like the starting point, though: your pipeline, and you want more insights into your pipeline. You could go instrument it yourself and get the data out; there's a project out there for it, it works for GitLab, but you want more insights than what it provides, you want them within the same tool, and you don't want to stand up and host this thing yourself.
A
So if we just start with: we're providing data that you could point at a collector, and if we can then integrate with, or point that at, our own Jaeger collector or a Prometheus collector so you can see it within GitLab itself, still all within the free tier, I think it's a great start, and we can go from there and see what is super interesting, or what is interesting.
A
Does that sound reasonable as an MVC? Yes, I like the way that you've broken it down. I think the progression of steps makes sense. We'd want to get some engineering folks looking at it and poking at it a little bit, but I feel like the problem is a good problem to try to go solve and to tackle.
B
Yeah, I think we should be looking into tracing, because everyone is looking at it right now, so the technology is evolving, and there are possibilities to provide our feedback to the OpenTelemetry project and to the cloud native community. So, with metrics... I've linked the epic now, which proposes the views and the things which we already have. This is a separate problem to solve: we have metrics in our backend,
B
we want to present that in the UX, and the pipeline duration, but this can be tackled, I think, in a separate way. The only difference, or the only thing which it has in common, is measuring the overall job duration, for example, because the job duration can also be a tracing span.
B
That is an overlap, so it might make sense to have the same engineering teams work on this, or at least, like, sync on this. But from, like, the MVC standpoint, you can totally start adding OpenTelemetry into the Ruby code now; then, as a second step, look into the runner code, for example, and see how to communicate, how to, like, sync that even, but that's a separate step. First things first would be just to...
B
I did that a while ago with Jaeger tracing and C++, and, I know, I'm bad at estimating hours, I don't do that, but I would say it's an interesting technology and we should be looking into it, and if I can help, like giving thoughts or giving directions, I'm happy to jump in, yeah.
A
I think that's what I would describe as next steps, yeah. We might do just a little bit more validation around the problem space, to make sure that where we start is the right place. Whether we're just getting the job duration, and doing it from spans, might be interesting, as well as getting the spans and being able to collect them somewhere so you can visualize them yourself. I think that would be a good step, like a first good outcome rather than a first step, with the outcome we would want for users. Yep.
B
Okay: measuring or tracing what we already have in our code, and at a later point we can say we want to make it like a user-facing function in CI/CD, so you have, like, an exclamation-mark trace as the YAML function, or, I don't know, no idea about the implementation yet, and then think about how to solve it on the runner side, to trace a script execution, for example.
B
But this is beyond the scope of the first success, for now.
A
Yeah, cool. Well, anything else you want to cover before we wrap up? I think we're getting close to time.
B
I want us to focus on getting started and having the first MVC hopefully in the near future, and also see what things we can reuse. Maybe it makes sense for you to have a talk with Andrew Newdigate from our infrastructure team; he has knowledge around OpenTelemetry as well. So, yeah, just collecting more ideas for the future, but also finding a way to get started now, or in the coming release planning. Yep.
A
Yeah, we'll evaluate this against the existing priorities for the upcoming milestones and see where it fits in, but I think that this is something worthwhile to dig into a little bit more, for sure.
B
And as a last thought: we can create content and blog posts around it, sharing our insights and our learnings, and engage more with the cloud native community in that regard. Yep, agreed.