Description
Slides: https://docs.google.com/presentation/d/15CzbqO3leXOnH3Pwz94zYRzeOT8g92YQK7wC-Ii8HzU/edit
Meetup: https://www.meetup.com/everyonecancontribute-cafe/events/282736146/
0:00 Introduction
1:40 Presentation start
3:16 3 Pillars of Observability: Metrics, logs, traces
10:47 Profiling
11:37 Overlap of Observability
15:47 Known and Unknown
17:14 Observability example: Docker Hub Rate Limits
18:55 OpenTelemetry & Tracing History
23:08 Use case: CI/CD Observability https://gitlab.com/gitlab-org/gitlab/-/issues/338943
25:31 Use case: Quality Gates
28:41 From DIY Monitoring to Observability
30:40 o11y.love as learning collection
31:30 Group discussion
A: I think it's the 47th edition of the everyone can contribute cafe. After last year's session, where we talked about Raycast, a workflow application on macOS, shared by Michael Eichner, we sneaked the Opstrace acquisition into the meetup, with Seb and Matt joining live. We thought about, well, what's going on with observability, how does it help with CI/CD pipelines and so on, and this somehow sparked the idea for today's meetup. To get the conversation going, I've prepared a short slide deck, just to bring everyone up to speed on what we are talking about, so if you're new to monitoring or observability you can get a feeling, get an idea. I don't want to do any frontal screen sharing where I'm speaking, you're listening, and we're not joining the conversation. So whenever you have a question, want to discuss a specific topic, or something is unclear, I would encourage you to just jump in and unmute yourself. That being said, Niklas helped me prepare the slides for today, and I think we should just get going.

The thing is, where to start with observability? Something I always think about when talking about observability and monitoring... let me quickly make the slide bigger.
A: We had state changes over time, metric data points and everything else, and in the past years we have been moving on to defining service level objectives (SLOs) together with an agreement (SLA). So I'm agreeing with my customer to have 99.5 percent availability; we have objectives which are normally higher than the agreed level, because we basically want 100 percent, but not really; and there are certain indicators. This moved into defining the four golden signals from the Google SRE books: latency, traffic, errors, saturation. But at a certain point you want to see more, to get more insights, to monitor things, to observe things.
A: There has been quite some discussion about this in the past years. So let's get things started with the first pillar, which could be metrics. We have Prometheus; simply said, Prometheus is a daemon which collects metrics from endpoints, does some auto-discovery and much, much more. In the end it's a simplified way of getting insights into your application and your services by collecting metrics.

It has its own query language, called PromQL. There are certain functions which allow you to aggregate and calculate the metrics to present them in a different view. This is helpful for writing your own queries, generating dashboards, and later on also working with alerting. The thing is, as a developer it's often hard to start with that. Where do I actually add my code? Where do I start with metrics for my monitoring, for my SLOs?

A: You can define your own metrics with app instrumentation, but the thing is also: you have infrastructure monitoring, like memory, CPU, I/O on the node, maybe in your Kubernetes cluster, on the pods, on the cluster nodes and much, much more. For your services there are Prometheus exporters, for example for Docker and other specific environments, and there are client libraries available to make your life much easier.

Learning this can be playful: take the Python example shown on the screenshot, build something, and for example deploy it into Kubernetes. In Kubernetes, install the Prometheus Operator, use the custom resource definitions for ServiceMonitor, and inspect the metrics. That can be a fun way to learn and move on: okay, I've implemented metrics now. But the thing is, metrics are not the only thing we want to look at.
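The Python example from the slides isn't reproduced in this transcript, but the idea can be sketched. In practice you would use the official `prometheus_client` library; this stdlib-only sketch just shows the text exposition format that Prometheus scrapes from a `/metrics` endpoint, and the metric name `app_requests_total` is invented for illustration:

```python
# Counters live in process memory; a real client library manages this for you.
REQUESTS = {"/": 0, "/api": 0}

def observe(path):
    """Increment the request counter for one handled request."""
    REQUESTS[path] = REQUESTS.get(path, 0) + 1

def render_exposition():
    """Render the counters in the Prometheus text exposition format,
    i.e. what a scrape of /metrics returns."""
    lines = [
        "# HELP app_requests_total Total requests served.",
        "# TYPE app_requests_total counter",
    ]
    for path, count in sorted(REQUESTS.items()):
        lines.append(f'app_requests_total{{path="{path}"}} {count}')
    return "\n".join(lines) + "\n"
```

Serving this string over HTTP at `/metrics` is all a scrape target needs; PromQL then aggregates over the `path` label.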
A: So we are thinking about signals, something which describes a certain state or a certain thing which happens, and we think about metrics, logs, events, traces and profiles (or profiling). We also need to break up monoliths into microservices, and there are so many things to unpack that it's a great way to focus on app instrumentation the first time. A similar thing holds for logs.

A: There are so many decisions to be made, so many tools involved and stacks to be evaluated, that it makes sense to really focus on evaluating the options, and as a developer to focus on: how do I log things? What is structured logging? How can I improve the performance of my logs? And so on. This also evolved in parallel to metrics over the past years.

It's still a long story with many, many decisions to be made, which leads us to traces, distributed traces in this specific regard. Nicholas, please correct me if I'm saying something wrong. The thing is, traces and spans work in a different way to logs: a span has a start and an end time and some context, so you are telling the user where a specific thing happened. You also need code additions if there is no automated tracing, and you need to learn about specific implementations, backends and collectors. Jaeger is a tool which can do it, Grafana Tempo is a tool which can do it, and there is a collector with OpenTelemetry as a specification. So again, a long learning curve, many things to unpack, and tracing became the third pillar of observability, at least probably.
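The "a span has a start, an end, and some context" idea can be made concrete with a deliberately simplified sketch. This is not the OpenTelemetry API, just the underlying data shape; every name and field here is illustrative:

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """A toy span: an operation name, start/end timestamps, a trace id
    shared by all spans of one request, and a link to the parent span."""
    name: str
    trace_id: str
    parent_id: Optional[str] = None
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None
    attributes: dict = field(default_factory=dict)

    def finish(self):
        """Close the span and return its duration in seconds."""
        self.end = time.monotonic()
        return self.end - self.start

# One incoming request becomes a root span...
root = Span(name="GET /checkout", trace_id=uuid.uuid4().hex)
# ...and each downstream call becomes a child span in the same trace.
db = Span(name="SELECT orders", trace_id=root.trace_id, parent_id=root.span_id)
db.attributes["db.system"] = "postgresql"
db.finish()
root.finish()
```

A tracing backend like Jaeger or Tempo reassembles all spans sharing one `trace_id` into the waterfall diagram discussed below.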
B: People also knew traces before. You can see traces, well, not every day, but if you're working with a browser and need to debug your frontend locally: when you click on Inspect and open the Network tab, everything in there is also a trace. In the end, every request that is made is a trace.

So why is tracing nothing entirely new? Because traces were already established. You go to the Network tab, refresh the site, and then you see, at the top, the diagram, this trace in the end. Since you're doing multiple requests while loading something, you see what is happening; whether it zooms after that depends on how the website is configured.

B: Google's presentation app, for example, doesn't load things fully in parallel; it goes step after step. The interesting point is that you see the total time, which is the longest span in the end, and then you can also see parallel requests. Where parallel requests often happen is in distributed or service-oriented systems.

B: So, for example, think about a simple microservice architecture with three services: a booking service, an order service and a transfer service, and you want to get insight into this architecture: which service called whom, with which data, and how long does it take? All of that is what distributed tracing helps with in the end. The big benefit is that you can decide which data you want to put in to get value out of it. In short, it's about important choices.
A: This was a quick live tryout of the browser developer tools, thanks for the reminder. That was also the thing I was immediately reminded of when I saw Jaeger for the first time, I think in 2017 or 2018, and thought: oh, I can totally debug slow websites with this. There's certainly more behind it, and many, many new things came about.

I'm not planning to talk all day, but here's the thing: profiling also came around and became popular with open source in the past year, providing application performance insights, for example in C. When a function is called too often or takes too much time, you can see that in a graph and analyze it, similar to how you work with traces. It's an additional data source, so we could theoretically speak of the four pillars of observability now, but there's a discussion going on whether it's the pillars or something else with observability for now.

A: We also thought about discussing the overlap with observability. We have been used to collecting logs and metrics over the past decade; distributed tracing came a little later, because open source tools were being developed with OpenTracing and OpenCensus, and with open source the adoption just got wider and much more powerful, also in community building.

A: Where everything comes together, we actually need to put in profiling as a data source, as a way to observe things, to correlate, to analyze things. So we need all this data to be available, but there is a certain overlap in what is collected. For example, I could write a trace and a span similar to how I write logs in my application, defining timing points and durations and saying:

A: okay, the request from the client to the website depends on my microservices architecture in the background, and the query goes to the HTTP server, to the database cluster in the backend, to the Redis cache, and to the frontend again. What is blocking the service, for example?
B: I think it's also interesting where this comes from and how I would use it myself. Probably some people are already using it. At least the two intersections, metrics and logging, are mostly the standard in production apps, I would say, because you use them on a daily basis: you get metrics to get the data you want to see.

B: So this is a step that probably a lot of people do, and it's mostly the basic system that everyone has. It's also interesting what you see in this data regarding low volume versus high volume, meaning how much information needs to be stored. Metrics are mostly only numbers, so it's not that much data that needs to be stored. Logs can be really big; you need a big system for doing all the log handling.
A: I think data volume is a good point; you need storage for that. If you're planning to evaluate something or plan for the long term across metrics, traces and logging, I think the logs will consume the most space if they are not aggregated. And it needs backends, it needs tools which might need availability, high availability, distributed systems and so on. It's a challenge, and you might be seeing the three pillars of observability being discussed on the internet.

You might also see something which mentions the knowns and the unknowns, like the unknown unknowns we are not aware of and do not understand.

A: This is something mentioned, for example, in the honeycomb.io blog posts around what data they collect, which events and signals. I think they're using a column-based storage engine in their cloud, and they collect basically everything with the agent and the Beelines, so they have a lot of data available, and you might be drawing a conclusion from something. I think this is on the next slide.

A: Like: did you know that maybe DNS resolution latency increases your cloud costs? This is something I would probably not be thinking about; I know that DNS is always the problem, but in the end it's not really a known fact. We might come to understand it if we're just collecting all the data we have available. On the other side there are the known things, where monitoring means monitoring the state of your application: either ping works or ping does not work.

A: Let me see what else is there. One example which happened last year, and we discussed it in the everyone can contribute cafe, were the Docker Hub rate limits, where we didn't really know what would be happening. We knew that there would be limits when you're doing a docker pull, and after a while, I think 100 pulls in six hours or something, it didn't work anymore.

A: So we thought about what could be affected: our CI/CD pipelines, because we're using Docker, cloud native deployments, Kubernetes clusters and so on; organizations behind NAT, which is still a thing in modern infrastructure; and also cloud providers which act behind certain IP ranges. So we had a known state, we could simulate something; we wrote a Prometheus exporter back then and could monitor something.
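Docker Hub reports the pull limits in `ratelimit-limit` and `ratelimit-remaining` response headers, with values like `100;w=21600` (100 pulls per 21600-second window). A small sketch of the parsing an exporter like the one mentioned has to do; the header dict here is hard-coded sample data rather than a live registry request, and the gauge names are invented:

```python
def parse_ratelimit(value):
    """Parse a Docker Hub rate-limit header value such as '100;w=21600'
    into (count, window_seconds)."""
    count_part, _, window_part = value.partition(";")
    window = int(window_part.split("=", 1)[1]) if window_part else 0
    return int(count_part), window

def remaining_pulls(headers):
    """Extract the numbers an exporter would expose as Prometheus gauges."""
    limit, window = parse_ratelimit(headers["ratelimit-limit"])
    remaining, _ = parse_ratelimit(headers["ratelimit-remaining"])
    return {
        "dockerhub_limit_total": limit,
        "dockerhub_limit_remaining": remaining,
        "dockerhub_limit_window_seconds": window,
    }

# Sample headers, shaped like those returned by a HEAD request to the registry.
sample = {"ratelimit-limit": "100;w=21600", "ratelimit-remaining": "76;w=21600"}
```

Alerting on `dockerhub_limit_remaining` dropping toward zero turns the "unknown" rate-limit failure into a known, monitorable state.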
A: But the thing is, if you cannot detect that, or you would like to detect it: an unknown state could be that you're deploying something in your cluster, the CI/CD pipeline kind of works, but you have log lines with "too many requests". The problem is you cannot reliably detect that, and your customers see different prices on your website: they think they bought something for a hundred dollars, but actually it costs 200, because you increased the price and the update didn't reach them. This could be a problem for many businesses, and we thought about how to really understand all these things.

Now, coming from tracing, metrics and logs, and a little bit of profiling, to the idea of unifying all that: OpenTelemetry was founded. Just to circle back a little bit in time: in 2016 the open source projects OpenCensus, which I think was driven by Google, and OpenTracing were formed for distributed tracing.

A: OpenTracing was a specification plus client library implementations which allowed you to instrument your code and send traces to tools like Zipkin, Jaeger, Datadog, Lightstep and so on. This turned into dual development and overlaps, so OpenTelemetry was founded, aiming to merge OpenTracing and OpenCensus, and it became a CNCF project in the observability space.

A: The project was created, and in 2021 it also added metrics and logs to its agenda and became an incubating project, so hopefully soon it will be GA, ready to use, and graduating. Most recently, last week, OpenTracing has been deprecated, or there is the announced plan to deprecate it, which is linked. So OpenTelemetry is here to stay.

A: Now, what is it? How can you use or combine it with your existing tool stack, and how can you get started? One thing you need to understand is that there is a collector, or a sidecar, which consumes the traces and the metrics, but you still need to provide your own backends: for example, Jaeger for traces, Prometheus for metrics.

A: If you want to instrument your application, there are client libraries and SDKs in development which allow you manual instrumentation, with C++, Go and so on, which is linked in the getting-started documentation. Certain languages also allow you to do auto-instrumentation, something which I think places a proxy somewhere in the code and then does automated tracing or automated instrumentation.

A: For a visual overview, think of having the agent or the collector service running in your Kubernetes cluster, being fed from pods, from virtual machines and so on; this is a picture I copied from the docs. Other examples of adoption range from Kubernetes system components, where tracing has been added in 1.22 as alpha, I think, so the adoption is going further, to more ideas on using OpenTelemetry, for example with Opstrace tracing.

A: There has been an early demo shared last week or this week. The other idea is to link metrics with traces; this is called exemplars, if you run into that term. And one other thing which came about is CI/CD observability: tracing your pipelines to find out job durations and so on, which can also be achieved with OpenTelemetry.

A: And just keeping in there: we talked about tracing, what a trace and a span look like, the backends and so on; this is a copy from the earlier slide. The idea is that you can see, for example, that your cloud resource costs are very high because you have many long-lasting CI/CD jobs which are failing all the time, and you actually don't need them.

A: You have slow caches for the CI/CD in your infrastructure, and you have a certain network latency when containers are being pulled. This is something which is, I would say, hard to detect unless you're scraping the logs and trying to build that understanding by yourself. Now the idea is, and I have been working on this in the past two weeks, to define the jobs to be done.
A
This
is
described
in
this
gitlab
issue,
which
is,
I
think,
15
pages
long.
Meanwhile,
but
yeah.
The
idea
is
to
really
start
the
implementation
and
make
ci
cd
observability
a
breeze
in
the
future.
A
Okay,
the
other
thing
which
is
like
interesting
in
observability-
and
this
is
something
for
left,
shifting,
slos
quality
gates,
which
is
something
we
discussed
in
2020
so
like
one
year
and
some
months
ago,
around
captain
using
the
knowledge
of
metrics
and
alerts.
A
A
This
is
one
of
the
more
advanced
things
you
can
do
with
metrics
and
left
shift
dslos,
so
captain
basically
plays
the
quality
gate
and
you
can
use
either
the
graphic
graphical
interface
or
also
define
your
own
yammer.
If
you
want
to
no
it's
it's
very
easy
to
try
out,
and
there
are
many
tutorials
available
to
instrument
and
not
insurance,
to
measure
the
service
level
objectives
for
your
application.
A
For
me
personally,
this
would
have
helped
me,
for
example,
detect
certain
cic
c
plus
plus
co-routine
crashes,
on
the
stack,
but
only
with
with
1000
api
clients,
which
you
don't
have
in
a
development
environment
and
having
having
that,
for
example,
in
a
test
environment
would
certainly
have
helped
with
quality
gates,
not
merging
it
and
not
releasing
it
to
customer
environments.
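The quality-gate idea boils down to: measure service level indicators in a test environment, compare them against objectives, and fail the pipeline on a miss. A toy sketch of that evaluation, not Keptn's actual SLO format; the indicator names and thresholds are invented:

```python
# Hypothetical SLOs: each maps an indicator to a "must not exceed" threshold.
OBJECTIVES = {
    "error_rate_percent": 0.5,   # at most 0.5 % failed requests
    "p90_latency_ms": 250.0,     # 90th percentile latency budget
}

def evaluate_quality_gate(measured):
    """Return (passed, violations) for measured SLI values vs. the objectives.
    A missing measurement counts as a violation (treated as infinite)."""
    violations = [
        f"{name}={measured.get(name)} exceeds objective {limit}"
        for name, limit in OBJECTIVES.items()
        if measured.get(name, float("inf")) > limit
    ]
    return (not violations, violations)

# In a pipeline, a failing gate would block promotion to the next stage.
ok, problems = evaluate_quality_gate({"error_rate_percent": 0.2,
                                      "p90_latency_ms": 410.0})
```

A tool like Keptn adds the surrounding machinery: querying Prometheus for the SLI values, scoring partial passes, and reporting the result back to the delivery pipeline.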
A
This
is
a
definition
how
it
works,
and
this
is
a
playground
demo,
but
yeah.
The
thing
is,
captain
should
be
acting
as
a
can
can
act
as
a
quality
gate.
Promises
for
slos
and
simulating
a
production
environment
or
incidents
is
hard,
so
we
could
be
adding
chaos
engineering
to
that.
A
A
In
the
observability
space
in
the
monitoring
space
here
in
chaos,
engineering
similar
thing-
you
want
to
kill
your
parts
asynchronously
randomly
you
maybe
want
to
chaos
into
chaos,
engineering
with
network
connections-
maybe
I'm
I'm
creating
something
around
dns,
so
bgp
routing
and
still
verify
that
everything
is
operational
and
the
service
level
objectives
are
still
matching
see.
What
else
do
we
have
here?
A
And
use
the
benefits
of
cloud
environments.
The
other
thing
is
like
to
see
the
value
in
logs
metrics
and
traces
to
get
started
quite
easy
and
see
how
far
you
can
get
in
your
observability
story.
A
A
The
other
thing
which
is
interesting
or
which
which
hopefully
comes
around
is
like
some
machine
learning
which
allows
us
to
correlate
metrics
and
traces
in
the
future
and
to
easy
easily
make
it
more
easy
to
identify
any
bottlenecks
or
ci
cd
pipelines
being
blocked
or
external
resources
being
the
root
cause
of
it,
and
the
other
thing
I
want
to
highlight
is
cd
events,
which
is
a
newly
formed
specification
or
2b
form
specification
for
continuous
delivery
events,
which
sounds
very
interesting
to
join
and
spark
the
conversation
not
only
for
continuous
delivery,
but
also
for
cloud
native
environments.
A
A
A
A
C: I was just gonna ask, maybe just in general... I've been working on building an open source continuous profiler for about a year now, and one of the things I'm kind of interested in is: obviously there are tons of tools out there, tons of signals you can add to your workflows.
B
Probably
I
can
give
a
little
bit
view
on
my
view
on
this
so
because
we
started
we're
using
mostly
only
open
source
observability
to
it,
so
we
don't
use
any
vendor
right
now
and
what
we
did
so
in
my
past
company.
So
my
past
experience
we
used
for
standards,
but
in
my
current
company
we're
doing
a
full
other
way.
B
So
then
you
have
only
a
big
big
boiler
of
mud
of
information,
and
then
you
don't
know
where
to
start,
and
then
you
probably
ignored
here,
because
it's
not
important,
then
you
get
a
configuration
drift
probably
and
the
main
benefit
is
then
really
starting
with
less
information,
and
you
will
get
up
and
up
and
up
because
not
everyone
started
with
one
hundred
percent
and
not
everyone
is
the
google.
B
B
This
needs
to
be
different
components
and
mostly
all
newcom
components
that
coming
up
are
mostly
modular
monoliths
in
the
end,
so
that
you
can
turn
off
features
on
and
off,
but
you
don't
need
to
have
multiple
deployments
need
to
check
how
it's
working,
of
course,
the
blast
releases
a
bit
a
little
bit
higher,
but
practically
mostly
most
people
don't
hit
that
so
and
for
the
specialized
should
go
into
the
specialized
world.
That's
also
totally
fine
so,
but,
as
I
said,
I
think
not.
Everyone
has
the
same
problems
and
that's
also
why
not.
B
C
B
A
lot
of
insights
from
the
community
when
you
talk
to
people
how
they
handle
this
problem,
and
you
probably
oh,
this
was
a
little
bit
complex.
What
we
currently
doing,
probably
we
should
reshape
it,
and
I
think
the
main
role
is
overall
to
reduce
the
complexity
that
human
can
take
it
and
the
machine
can
take
that
complexity.
B
B
And
I
think
also
what
a
difference
can
be
if
you're
an
early
adopter
or
not.
So
if
you
are
starting
really
early
with
the
tools,
then
they
are
not
so
complete
and
you
grow
with
them
when
you,
when
you're
on
top
on
that,
then
it's
also
probably
not
so
problematic
to
get
this,
but,
for
example,
in
prometheus
yeah
we
didn't
know
we
had
like
we
implemented
for
us,
also
the
remote
right
internally
for
some
tools
that
we
are
using,
and
this
was
like
a
new
feature.
B
I
know
I
use
now
prometheus
for
five
years,
mostly
in
different
setups,
but
it's
also
hard
to
keep
up
on
all
the
information,
but
the
community
is
also
changing
all
the
parts
so
because
everyone
has
different
requirements.
It's
also
fine.
A: Speaking of keeping track of all the changes: Prometheus is building an agent mode, behind a feature flag at the moment, which is built on remote write, making it easier to have something like what the Pushgateway was in the past. Instead of just actively scraping metrics and connecting to the services, you may want to push something to Prometheus via remote write. This is going on, and I think it's a feature flag that's currently being tested.

The other thing: when I'm looking at CI/CD observability in GitLab, and this is the feature request issue I created, I'm worried about what happens when I'm adding this line of app instrumentation for OpenTelemetry.

A: For example, does it impact the application performance? Not just on a small installation where some CI/CD pipelines run, but on a large-scale system. I'm not sure if there are benchmarks or metrics available from early adopters allowing you to say: okay, this is a good thing to turn on by default, or: okay, this is something I don't want turned on by default.

A: The problem then is you're not collecting data while it's off, and in the case when you track a problem, turning on tracing and wishing for data to have been generated in the past is not possible. It's super hard. It's similar to: hey, we didn't collect logs from the server, and now we cannot SSH into it because it's broken. It's super hard to measure and define for a specific environment and say: this works that way.

A: One thing to really look into for this: have some sort of playground environment, staging environment, something like a dev environment, and really try to measure that for your use case, for your application. It adds more workload for yourself, but I do see the benefit of learning and understanding how it's being done, maybe onboarding new team members, documenting it for yourself, or even contributing to the wider community and helping provide feedback.
C: How do you kind of balance those kinds of competing aspects?
A
I
think
you
need
to
make
a
mistake
and
get
flawed,
for
example
by
logs,
so
you're
learning
by
mistakes.
An
incident
happens
and
you
have
100
gigabytes
of
log
files.
You
need
to
search
in,
and
your
elastic
search
cluster
is
not
happy,
for
example.
So
this
helps,
if
you
want
to
be
proactive
about
it,
I
think
taking
my
developer
head
on
I'm
thinking
of
I
needed
to
learn,
for
example,
structured
logging.
A
Previously
we
just
in
c
plus
plus
we
logged
everything
we
did
throw
our
sec
traces
and
the
logs
were
not
just
one
line,
it
was
multi-line,
it
was
not.
Users
couldn't
read
it,
but
developers
were
happy.
Developers
were
not
happy
because
users
created
back
reports
because
of
the
statutories,
but
in
the
end
I
think
having
a
common
sense
of
this
could
be
something
interesting
to
log,
for
example,
you're
starting
an
http
request,
you're
ending
it
finding
some
timing
points
which
are
helpful
for
logs
and
traces.
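Structured logging in this spirit can be as simple as emitting one JSON object per event instead of free-form multi-line text. A stdlib-only Python sketch; field names like `event` and `duration_ms` are just conventions chosen here, not a fixed schema:

```python
import json
import sys
import time

def log_event(event, **fields):
    """Emit one machine-parseable JSON log line per event."""
    record = {"ts": time.time(), "event": event, **fields}
    print(json.dumps(record, sort_keys=True), file=sys.stderr)
    return record  # returned so callers (or tests) can inspect it

# Timing points around an HTTP request, as discussed above.
start = time.monotonic()
log_event("http_request_start", method="GET", path="/checkout")
# ... handle the request ...
log_event("http_request_end", method="GET", path="/checkout",
          duration_ms=round((time.monotonic() - start) * 1000, 2))
```

One-line JSON records are cheap to parse and index, which is exactly what makes the "search 100 gigabytes during an incident" situation survivable.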
A
I
think
it's
a
it's
a
good
way
to
start
and
say
I
really
want
to
see
when
the
client
is
doing
an
http
request.
When
does
it
start?
When
does
it
end,
and
then
I
have
the
black
box
in
between,
I
could
go
the
route
of
saying,
I'm
just
adding
logs
in
the
different
thread
or
a
different
application,
and
then
I
have
the
tools
to
to
search
and
cover
that,
or
maybe
I'm
thinking
of
hey.
A
This
could
be
a
trace
we're
starting
here
and
we
are
forwarding
basically
the
trace
id
to
the
other
application,
and
then
I
get
to
see
a
timeline
with
specific
spans
and
I
also
have
the
possibility
to
add
more
context
to
it,
because
the
log
line
is
just
it's
text
or
it's
json
or
it's
something
else,
but
sometimes
you
really
want
to
add.
This
has
been
executed
in
a
docker
environment
with
version.
A
Whatever
specific
other
text
you
can,
you
can
add
or
enrich
to
the
trace
context,
and
this
helps
you
to
see.
Oh,
the
bottleneck
is
because
we're
using
a
two
old
version
of
docker
in
the
in
that
environment
and
for
some
reason
the
customer
requests
always
go
that
route.
Maybe
we
need
to
fix
our
aha
proxy
or
something
else,
so
I
think
thinking
of
use
cases
and
incidents
into
your
environment,
which
you
hopefully
have
from
the
past,
can
be
really
helpful
to
say
this
is
a
starting
point.
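Forwarding the trace ID between services is standardized by the W3C Trace Context `traceparent` HTTP header, which OpenTelemetry uses for propagation by default. A small sketch of building and parsing that header; the IDs here are randomly generated for illustration:

```python
import re
import secrets

def make_traceparent(trace_id=None, span_id=None, sampled=True):
    """Build a W3C traceparent header: version-traceid-spanid-flags."""
    trace_id = trace_id or secrets.token_hex(16)  # 32 hex chars
    span_id = span_id or secrets.token_hex(8)     # 16 hex chars
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"

def parse_traceparent(header):
    """Return (trace_id, parent_span_id), or None if malformed."""
    m = re.fullmatch(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header)
    return (m.group(1), m.group(2)) if m else None

# Service A starts a trace and sends the header downstream...
outgoing = make_traceparent()
# ...service B parses it and creates child spans in the same trace.
trace_id, parent_span = parse_traceparent(outgoing)
```

In practice the SDK injects and extracts this header for you; the point is only that the "trace ID forwarded to the other application" is a plain HTTP header.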
B: I can also recommend the book "A Philosophy of Software Design" by John Ousterhout. When we talk about complexity, we mostly think only about systems being complex to understand, but what I mostly saw at my last companies was the other problem, which defines it a little bit differently, or adds to it:

B: it's also hard to make changes in a system when the system is really complex. If it's not easy to make a change, you have a big, complex system, and then you probably don't make the change; that's why people are afraid of making these changes. Changes need to be simple and have low impact. That's the reason we have all these technologies. Kubernetes is complex, of course, but it also delivers value, of course: we can run workloads in parallel.

B: We can probably also use one cluster for doing development and production. Of course, if the root cause takes the whole thing down, that's another problem, but we can easily test in the same system, and we have all these tools. But I think the other problem coming with that is a really steep learning curve.

B: This is probably not something you can use out of the box, because you need to understand the system. You can't just take it from companies that are using it, fitted to their use case, apply it to yourself and expect the same results. They built it for a reason: because they want to save money. Saving money doesn't have to mean saving infrastructure cost; it can mean saving people time, because then they can work on other topics to bring the product further. So there's a lot of stuff ongoing.

B: That's also the reason why you should probably not jump on every hype train that's coming up, and sometimes stay off Twitter and all the interesting stuff that is happening out there. A lot of people are doing interesting stuff, but we have enough information out there. The bigger problem is finding the right content for you, working on a simple problem, or working on one problem with focus instead of spreading out into different spaces and digging a new hole every time.

B: Other questions? Or we can talk about other topics. We can talk about blockchain, we can talk about Rust, I'm in for that.
A: There was a new SDK for Rust with OpenTelemetry, which would also be interesting, and I'm planning to work on adding telemetry to CI/CD, or at least try it out and see how far I get. I also want to create more learning resources on how to get started, not just the five-minute success where you add something to the code and something says "hello world", but where you really find a use case in an application: this is the HTTP request, this is where it starts,

A: this is where it ends; finding a real use case, or maybe breaking something. I will be giving a talk next week at Chaos Carnival; I'm thinking of using chaos engineering to break something which then gets alerted, so you get to see something you probably cannot simulate in a staging environment. I also thought about...

A: I don't know if you're familiar with that, but there is a tool called Kube DOOM which actually kills pods, which one could use for chaos engineering in Kubernetes, and then also see how distributed tracing behaves. For example, it would be interesting to see the traces when something is randomly being killed.

A: That also provides insights and ideas for real-world incidents, because most often an incident is just there, it's an S1, high priority, and you need to react with the things you already have.

A: To be honest, for me it was fun to learn more about scanning with Grype and Trivy and to see how these tools work. No, it's container scanning; dependency scanning is something else again. You get to understand the potential problem of pinning the Docker container in your pipeline or in your deployments to a specific version and never updating it again, while still shipping CVEs and vulnerable software, applications, dependencies and so on.
B
The
signing
press
container
images
also
with
crosic
from
the
chained
art,
dress.
A
This
would
be
interesting
for
maybe
april
well,
let's
see
about
it
first,
I
would
love
to
collaborate
on
blockchain,
rust,
observability
and
other
things.
We
can
use
the
gitlab.com
group
namespace.
Maybe
we
will
create
a
new
group
or
we
just
use
observability
or
something.
B
Provide a slide and all this stuff. But I need to check which blockchain we will use. There are multiple options already: there's not one blockchain, of course, so they're still different systems. It's like when you're talking about a distributed system, this can mean that or could mean that. So there's not one solution for the problem, and they also have different trade-offs, because they're also working with different consensus mechanisms and so on.
B
But I will make a short overview of that, and probably bring you a little bit from the point of what Bitcoin is, what Ethereum is, what Solana is, as these are the most common chains that everyone knows. And then probably also how to program on that, because, in simple words, blockchain is nothing different from an immutable database in the end, and you can also do something like stored procedures in database language, and this is like a smart contract, a program, or how.
A
I think, to be honest (when did I start, March 2020?), it took me one and a half years to really fully understand what I can do with OpenTelemetry, and that it's a specification and a collector, but that I do need to bring my own backend.
A
And also seeing the adoption of the clients and SDKs, which helped me understand: okay, this is how I can instrument it with tracing.
A
Yeah, similar thing with understanding Kubernetes: I need to generate the YAML and I need to understand all the components, which on its own is not enough, because you need to find a use case and deploy an application, a service. And once that pipe in your head gets going, you kind of get addicted to doing more.
A
Okay, anything else you want to chat about? Otherwise I would just stop the recording and.
D
One last question: are there any good tracing libraries for non-server use cases, so for desktop clients, mobile clients?
B
Because it pushes the data. If you have the problem that it needs to be connected to something: tracing mostly works in a way that you push the data, so it will not be pulled, okay? That was the problem with getting Prometheus up in a local client; there was a talk at PromCon Europe 2016 about that, from which there is also a blog post. Because the client is pushing the data into the system, it's quite easy to do that, or at least technically possible; otherwise it would be the other way around.
B
It would also be possible, but of course a lot of security teams have problems implementing this, because you're probably mostly NATed and all that stuff. But you can do this with OpenTelemetry.
A
You can, I think you can use the JavaScript library from OpenTelemetry and build it with npm or with Node.js.
A
I think you are, we are thinking about a Visual Studio Code extension, maybe. Correct me if I'm wrong with that assumption.
D
Not directly. I'm just searching for a way, because most stuff I see nowadays is server-centric, so distributed tracing and whatnot, and I mostly have the problem that I need to support classical end-user clients, and pushing the data. Generally, OpenTelemetry is a good foundation to have now a standard way of doing it, but building it into things like C++ applications is a little bit harder, because there is no unified way for networking.
A
So the technical background for OpenTelemetry: it uses gRPC in the background for sending or emitting traces to an OpenTelemetry collector.
A
This is basically a daemon which needs to be running in the network where the application is in, so there needs to be a direct connection. I'm not sure if there is some proxying or forwarding already in place, but I do think that when you implement the OpenTelemetry C++ client in your application's code, so your, the thing is, let me see if I can quickly find it in the OpenTelemetry.
A
Let me see. When you add that, the things you need to do, it's experimental; tracing is stable. Okay, do we have examples?
A
We have a lot of examples. Okay, this is not something I was looking for.
A
The headers, of course, and I think it's an Apache license, so it might not be compatible with GPL. I had that problem in the past, but I created that fork anyway. So I tried playing around with it, with a simple timing example, three years ago. I think it's broken now, because the API changed. But the thing is, you need to kind of initialize the tracer, which is relatively straightforward.
A
I would say, and let me see if I can get a tracer. Okay, we're getting something out of it, and then we're running something, and inside we are creating a scope in a thread, and okay, we are starting a span, and then the thread is joined and then everything is gone. Okay, might not be the best example, but from what I've seen it's, it's.
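The lifecycle walked through here (get a tracer, start a span inside a scope, do the work in a thread, join it, and the finished span ends up at the exporter) can be sketched conceptually. This is a toy stand-in in Python, not the OpenTelemetry C++ API; all the class and method names are made up for illustration:

```python
import threading
import time

class Span:
    """Toy span: a name plus start/end timestamps, used as a scope."""
    def __init__(self, name, tracer):
        self.name = name
        self.tracer = tracer
        self.start = self.end = None

    def __enter__(self):
        self.start = time.monotonic()
        return self

    def __exit__(self, *exc):
        self.end = time.monotonic()
        # Hand the finished span to the "exporter" (here: a plain list).
        self.tracer.finished.append(self)
        return False

class Tracer:
    """Toy tracer: collects finished spans instead of sending them via gRPC."""
    def __init__(self):
        self.finished = []

    def start_span(self, name):
        return Span(name, self)

tracer = Tracer()

def work():
    # Inside the thread we open a scope (span) around the unit of work.
    with tracer.start_span("work-in-thread"):
        time.sleep(0.01)

t = threading.Thread(target=work)
t.start()
t.join()  # once the thread is joined, its span has been finished

for span in tracer.finished:
    print(span.name, round(span.end - span.start, 3))
```

In the real SDKs the list of finished spans would instead be handed to a span processor and exporter that ships them to a collector.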
A
It has gotten a lot easier to add to your code than it was in the beginning, where you had to include one page of things and then define something. I think the most important part is that you instantiate the tracer, or the object which then sends something over there.
A
What was I looking for? We do have this one: the configuration you need for OpenTelemetry from the client side. It's really straightforward.
A
You define a server, you're defining some authorization, maybe as an HTTP header, and you're defining the traces exporter, which can be, for example, Jaeger, or just OpenTelemetry as the collector. And this is something you need on the client, which can be manipulated by environment variables. So you need to ensure that this doesn't get overwritten, which I tried describing over here, and that's basically, well, that should be about it.
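The client-side knobs described here are, in the OpenTelemetry SDKs, plain environment variables, which is also why they can be overwritten from the outside. A minimal sketch; the endpoint and token are placeholders, while the variable names are the standard SDK ones:

```shell
# Where the client ships traces to (an OTLP-capable collector).
export OTEL_EXPORTER_OTLP_ENDPOINT="https://otel-collector.example.com:4317"
# Authorization, passed along as a header.
export OTEL_EXPORTER_OTLP_HEADERS="authorization=Bearer <token>"
# Which exporter the SDK uses, e.g. "otlp" towards a collector, or "jaeger".
export OTEL_TRACES_EXPORTER="otlp"
# Name the traces are reported under.
export OTEL_SERVICE_NAME="demo-app"
```

Anything with permission to set environment variables on the client can change these, which is the "doesn't get overwritten" concern above.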
A
In this specific example, it's about adding something with Ruby and Go, which I look forward to trying out. The thing is, telemetry can get quite overwhelming if you're looking into this picture, for example, and my recommendation is to really start as simple as possible: instrument a demo application, use Jaeger tracing as a tracing backend, because Jaeger also provides its own UI. So you can really start simple and use the OpenTelemetry collector in the middle and build your own. I think there are some demo environments around with Docker Compose and probably Kubernetes.
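A minimal Docker Compose sketch of that "start simple" setup, with Jaeger all-in-one acting as both the trace ingest endpoint and the UI (ports per the Jaeger docs; on older releases OTLP ingest may need to be enabled explicitly, as assumed here):

```yaml
services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    environment:
      - COLLECTOR_OTLP_ENABLED=true   # needed on older Jaeger releases
    ports:
      - "16686:16686"  # Jaeger UI
      - "4317:4317"    # OTLP gRPC ingest from instrumented apps
```

An OpenTelemetry collector service could later be added in front of Jaeger once you want processing or multiple backends.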
A
Meanwhile, build your own or use an existing demo environment, and really do short iterations on adding traces and spans to your code, then open up the UI and evaluate what's going on. Or maybe you can use Grafana, you might be using Opstrace in the future: something which is already there, so you don't need to worry about installing 10 different tools and whatnot; just use a simple installation. And when it comes to adding more than that, really instrumenting the application, you already have the git history, the things.
A
You learned, you hopefully documented it, and there are certain examples which allow you to follow along. For this specific thing I have found, you see, this is not an issue anymore; it should be an epic soon.
A
This should be the pull request for the API server, and we can just see the changes. Probably it doesn't make sense to view the changes in the web interface, but there are certain examples out there which implement that, and this is something my users can actually use in production already.
A
So I'm really a fan of learning from someone else, or learning from others how they did it, and especially reading the diffs on what happened, and maybe the mistakes made on the way, or the performance problems which were discovered.
B
Yeah, actually, obviously it could be hard, but I think blockchain is quite hard, and also showing a little bit what you can do on that, and Rust: that's at least an hour, or a lot, at least probably two hours. But I will speak a little bit faster, and we can watch the video in slow motion instead of watching videos faster, yeah. But we can do that. So we can talk about blockchains and Web3. So.