From YouTube: CNCF Live Webinar: Intro to open source observability with Grafana, Prometheus, Loki, and Tempo
A: Okay, today we are talking with Richard Hartmann of Grafana Labs: an intro to open source observability with Grafana, Prometheus, Loki, and Tempo. Everyone, please remember that during the webinar you're not able to speak; as an attendee you use the chat, and I see some messages already, so say hello. For Richard, we'll get you as many of those questions as we can at the end, and we can even stay on a little bit longer, since we're running late, to take care of that.
A: Please also note that the recording and slides will be posted later today to the CNCF Online Programs page. They are also available via the registration link, which will take you to our Online Programs YouTube playlist. With that, I'm going to hand it over to Richard so we can kick things off.
B: Thank you, Libby. Thank you, everyone. A word of warning: I keep getting pop-ups from the platform that my internet connection is unstable, which I don't believe is the case, but something is broken-ish, so if I drop or anything, I'll try to rejoin. So let's get started: intro to open source observability. A little bit of validation first: most of my life I've worked in the engineering, architecture, and operations worlds, so I have strong opinions about the right tools, and about not-perfect or not-good-enough tools.
B
Oftentimes.
You
have
this.
This
parrot
thing
there,
where
you
have
breaks
between
different
media,
where
you
have
breaks
between
different
trains
of
thought
like
how
to
how
to
index
your
data,
how
how
the
mental
modeling
works.
Maybe
one
thing
has
the
one
color.
The
other
thing
is
the
other
color
or
one
of
the
things
left
to
right.
The
other
is
like
right
to
left
doesn't
matter.
You
have
breaks
between
your
different
systems
too
often,
which
in
turn
means
that
way.
B
Too
often
you
you
end
up
paying
extra
cost
in
mental
overhead
or
in
in
automation,
overhead.
It's
not
seamless!
You
need
to
switch
mental
modes
when
you
go
from
your
logs
to
your
traces
or
what
have
you,
which
is,
is
not
nice
and
it
just.
It
adds
friction,
and
you
don't
really
need
that
at
like
five
in
the
morning
on
a
sunday
when
you've
got
when
you've
gotten
a
pager.
B
So
let's
try
and
rethink
what
what
we
actually
want
to
do
here
and
I'm
going
to
to
go
through
a
little
bit
of
of
like
the
philosophy
of
observability
and
a
few
buzzwords
as
a
as
a
foundation.
Let's
say
for
what
we
are
then
talking
about.
B
There
is
a
thing
where
the
cloud
native
scale
is
basically
what
internet
scale
was
two
decades
ago
and
that's
kind
of
important
to
keep
in
mind,
because
a
lot
of
of
issues
which
we
see
in
the
cloud
native
world
have
already
been
solved
in
different
contexts
before
us,
and
it's
always
a
good
idea
to
to
look
at
what
engineers
before
us
did
to
to
solve
problems
like
not
the
specific
implementations,
because
usually
they
don't
fit
their
age
if
they're,
too
old
or
the
new
age.
B
But
the
underlying
concepts
like,
for
example,
computer
networks.
The
internet
also
power
networks,
a
lot
of
those
tend
to
run
on
metrics,
because
this
is
already
a
predestination
of
of
what
you
care
about
as
as
a
domain
subject
expert.
B: As always in tech, we have buzzwords. Buzzwords usually have a kernel of truth, but by the time they are buzzwords they have lost most of that meaning, which is a pity, but it also explains why they were so successful.
B
It
comes
from
from
indigenous
people
who
observed
soldiers,
building,
building,
runways
for
planes
and
small
control
towers
and
such
and
then
the
gods
send
send
stuff,
from
heavens,
which
was
basically
just
logistics
of
the
army.
But
the
perception
was
that,
just
by
building
runways
and
such
you
could
get
gifts
from
the
gods
and
to
this
day,
those
those
things
still
echo
in
in
a
few
religions,
so
that
is
observed
behavior
it
becomes
part
of
culture,
but
it
it's
not
actually
doing
anything.
B
It's
not
actually
pursuing
the
the
goals
or
or
the
the
underlying
rationale
and
that's
something
which
you
always
need
to
be
worried
about.
It's
not
about
just
changing
the
name
for
a
thing,
and
anyone
who
was
assistment
yesterday
is
sae
today
and
you're
done
it's
about
actually
changing
the
behavior
and
actually
understanding
why
something
is
successful,
not
just
observing
that
it
is
successful
monitoring.
B
While
I
personally
use
monitoring
and
observability
more
or
less
interchangeably-
and
that
is
buzzwordy
definition-
monitoring
has
taken
a
little
bit
of
a
meaning
of
collecting
data,
not
using
it.
You
have
two
extremes
in
this.
One
takes
one
thing
where
you
have
the
full
text:
indexing
where
you
just
in
in
a
vain
attempt,
go
after
everything
which
you
can
find
or
data
lake,
which
outside
of
batch
analysis,
is
often
a
euphemism,
for
no
one
is
ever
going
to
look
at.
B
The
thing
observability
is
is
trying
to
reframe
that
a
little
bit
about
being
able
to
ask
new
questions,
just
observe
what
inputs,
what
outputs
a
system
has
and
being
able
to
deduce
the
internal
state
of
that
system
from
those
inputs
and
outputs,
as
in
ask
questions
which
you
didn't
know
you
wanted
to
to
ask
before,
and
that
enables
humans
to
understand
complex
system.
But
it
also
allows
you
to
automate
a
lot
of
this.
So
it's
not
just
about
determining
that
something
is
in
a
certain
state.
B: Another super important concept is complexity. You have what I call fake complexity, a.k.a. bad design, which you can reduce and, in my opinion, should reduce, unless you have other engineering constraints: money, go-to-market, maybe compliance reasons, what have you. Outside of actual reasons for having complexity, you should always strive to get rid of it. But you have real, system-inherent complexity as well, and that can be moved, but it cannot be made to go away; state is always someone else's problem.
B
You
have
all
your
micro
services
they're
stateless,
but
someone
has
to
maintain
the
database
so
that
that
complexity
has
to
live
somewhere.
So
yeah
you
can
move
it
back
and
forth.
You
can
comparison
mentalize
and,
in
my
opinion,
my
strong
opinion.
You
should
comparison
mentalize
it
and
you
should
distill
it
meaningfully
and
we
have
two
different
definitions
of
of
distilling.
This,
a
the
apis
towards
whatever
the
consumer,
slash
user
of
the
thing
is
and
b
already
start
thinking
about
what
you
need
to
emit
towards
the
observers
towards
your
operational
teams.
B
So
they
can
look
at
the
thing
that
is
basically
slis,
sli
slo
sla,
often
times
people
are
confused.
What
they
mean.
It's
really
really
simple.
Sli
are
several
service
level
indicators.
What
you
measure
objectives
are
what
you
need
to
hit
and
agreement
since,
when
you
need
to
start
paying
course,
someone
broke
a
contract.
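The relationship between the three terms can be sketched in a few lines of Python; all numbers here are hypothetical, just to make the distinction concrete:

```python
# Hypothetical counters for one service over a measurement window.
good_requests = 999_543      # requests served successfully (measured)
total_requests = 1_000_000   # all requests (measured)

sli = good_requests / total_requests   # SLI: the thing you measure
slo = 0.999                            # SLO: the target you need to hit
# SLA: the contract that says what you pay if the SLO is missed.

print(f"SLI = {sli:.4%}, SLO met: {sli >= slo}")
```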
B
You
have
operational
people
who
are
paid
for
for
stuff,
not
breaking
so
you
have
diametry
diametrically
opposed
incentives
where
the
one
group
wants
to
move
super
quickly
and
the
other
group
wants
to
move
rather
slowly
and
carefully,
and
so
they
always
they
always
fight.
They
always
have
strive.
Course,
that's
built
into
literally
into
their
compensation
structure
and
into
their
complete
organizational
structure.
B
There
is
one
of
the
main
things
of
sre
to
me:
is
the
concept
of
error
budgets,
where
everyone
shares
a
budget
for
how
many
errors
a
thing
can
have,
and
if
you
hit
those
budgets,
it's
fine,
but
it
doesn't
matter
if
this
is
due
to
new
features
or
a
b
testing
or
a
new
deployment
where
the
pm
needed
something
really
really
urgently
or
things
always
breaking.
If
things
break
too
often
in
operations,
the
devs
don't
have
error
budget
for
their
testing
and
deployment
velocity
anymore.
So
you
align
those
incentives.
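The shared budget itself is just arithmetic; a small, hypothetical calculation for a 99.9% SLO over a 30-day window:

```python
# Hypothetical: a 99.9% availability SLO over a rolling 30-day window.
# Whatever consumes the budget (deploys, A/B tests, outages) is irrelevant;
# only the total matters.
slo = 0.999
window_minutes = 30 * 24 * 60            # 43,200 minutes in 30 days

error_budget_minutes = (1 - slo) * window_minutes
print(f"Shared error budget: {error_budget_minutes:.1f} minutes of downtime")
```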
B
Another
nice
thing
is,
if
you're
able
to
build
a
shared
understanding,
not
just
align
incentives
between
people
and
that's
where
dashboards
are
coming
in,
where
all
those
dashboards
ideally
are
shared
between
all
those
different
teams,
because
then
you
have
an
incentive
to
invest
in
shared
tooling,
and
everyone
improves
a
little
bit
and
everyone
else
benefits
from
the
thing
you
pull
all
your
institution
knowledge
around
a
thing
from
a
lot
of
different
angles,
and
everyone
works
together
in
making
this
better.
It
also
means
you're,
building
the
same
language
and
you
built
the
same
understanding.
B
Everyone
has
the
same
dashboard.
The
pm
doesn't
need
to
fight
the
engineers
about
what
that
one
metric
is
course.
They
literally
look
at
the
same
data.
They
don't
use
different
words
for
different
aspects.
Of
course,
all
of
them
use
the
same
dashboards,
the
same
alerts,
the
same
reports,
which
in
turn
means
they
use
the
same
language
services
are
not
a
super
important
concept;
they
could
basically
comparison
mentalize
complexity
and,
if
you
remember
just
now,
I
said
one
of
those
two
abstraction
layers
would
be
an
interface
towards
the
user.
B: A lot of those implicit misunderstandings just go away, because once it's written down and agreed, and that's the basis for what you actually do and how you operate, a lot of people will take a second and third look and actually start negotiating details, instead of everyone being like "yeah, whatever, it'll work", and then it breaks, and everyone is fighting about why it broke, and then you realize you had a lot of misunderstandings. It doesn't matter if the customers or consumers are internal or external: treat them as if they were external.
B
Of
course,
they
are
depending
on
your
thing.
Anyone
coming
from
networking
like
myself,
layers
or
another
way
of
thinking
about
this.
The
internet
wouldn't
exist
without
proper
layering,
because
I
can
literally
rip
out
layer,
one
and
layer
two
and
I
have
instead
of
ethernet,
I
have
wi-fi
or
what
have
you
and
that
wouldn't
be
possible
without
those
clean
and
long-term
stable
interfaces
between
the
different
layers.
B
Other
things
like
cpus,
hardness,
compute
nodes,
your
lunch.
Even
if
you
cook
from
scratch,
you
will
not
grow
every
last
cucumber
yourself.
You
have
certain
interfaces
where
you
buy
other
services
and
just
consume.
Those
alerting
also
super
important
customers,
don't
care.
If
I
don't
know,
you
have
20
database
notes,
they
don't
care
if
if
15
of
them
are
down
or
five
of
them
are
down
or
all
of
them
are
healthy,
they
care
about
that
service
which
they
are
consuming
being
healthy
and
responsive.
B
And
what
have
you
so
that's
the
perspective
to
mainly
take
define
your
slas,
your
sli's,
your
slos
from
that
perspective
of,
is
it
user
interfacing,
or
is
it
user
visible?
The
nice
thing?
If
you
do
this
in
depth,
what
is
your
provider's
sla
and
sli
is
perfect
for
you
to
debug,
of
course,
if
the
database
is
down,
you
don't
need
to
debug
where
your
webshop
is
not
working,
you
kind
of
know,
so
you
you
structure
again,
you
use
the
same
language
across
the
complete
stack
of
what
you're
doing
important
to
avoid
burnout.
B
Anything
or
anything
which
is
currently
or
imminent.
Customers
must
be
alerted
upon
and
nothing
else
raise
a
ticket.
Do
it
during
business
hours,
if
it's
not
customer,
impacting
just
don't,
of
course,
you'll
burn
out.
So
that's
the
intro
part
now
gets
to
the
tech
part
prometheus
prometheus,
if
you
don't
know,
is
inspired
by
google's
pokemon.
It's
a
time
series
database
internally.
It
uses
64-bit
values
for
pretty
much
everything
which
is
relevant,
there's
thousands
or
tens
of
thousands
of
insta
thousands
of
instrumentations
and
exporters
that
are
public.
There's
millions
of
installations
of
prometheus.
B
Built-In
services
cover
that
is,
will
notice
they're
next,
like
not
impossible,
it's
very
uncommon
to
run
kubernetes
without
a
prometheus
of
some
sort,
because
they
are
literally
designed
from
each
other.
Even
back
from
the
google
work
in
pork
mondays
and
more
or
less
by
a
happy
little
accident
with
kubernetes
and
prometheus
within
cncf
low
and
behold.
Those
are
the
two
founding
projects
of
cncf
they
go
together.
You
haven't,
you
have
no
hierarchical
data
models,
so
you
don't
have
your.
B
I
don't
know
your
region,
your
your
city,
your
customer,
and
then
you
need
to
select
by
customer
and
all
of
a
sudden,
you
need
to
walk
up
your
hierarchical
area.
While
you
need
to
walk
down
blah
blah
blah
now
you
have
an
n-dimensional
label
set,
which
you
slice
and
dice
as
you
need
it.
So
you
select
by
label.
Customer
equals
x
and
you're
done.
Prom
kl
is
a
function
label
a
functional
language
which
allows
you
to
do
vector
math
on
on
on
your
data,
which
is
super
efficient,
like
highly
efficient
in
particular.
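A minimal sketch of that idea in plain Python (hypothetical series and values, standing in for real PromQL label selectors):

```python
# Each series is identified by an n-dimensional label set; selection is
# matching on any subset of labels, with no hierarchy to walk.
series = [
    ({"region": "eu", "city": "berlin", "customer": "x"}, 42.0),
    ({"region": "us", "city": "nyc",    "customer": "x"}, 17.0),
    ({"region": "eu", "city": "berlin", "customer": "y"}, 99.0),
]

def select(series, **matchers):
    """Return all series whose labels match every given matcher."""
    return [(labels, v) for labels, v in series
            if all(labels.get(k) == v for k, v in matchers.items())]

# Equivalent in spirit to the PromQL selector {customer="x"}:
print(select(series, customer="x"))   # both customer-x series, any region
```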
B
Of
course,
the
label
matches
matching
usually
does
more
or
less
by
magic.
What
you
want-
and
this
is
used
for
everything-
processing,
graphing,
alerting
exporting
data
every
every
way
you
work
on
the
data
is
always
through
promptly.
So
it's
a
language
you
have
to
learn,
but
it's
the
one
language
and
then
everything
works.
Simply
operation,
don't
need
to
convince
you
probably
highly
efficient,
it's
pull-based
for
good
reason.
Of
course,
this
makes
a
lot
of
things
easier
to
reason
about
about
correctness
and
up-to-date
correctness
of
the
state
of
of
the
wider
system.
B
Push
versus
pull
is
a
borderline
religious
debate,
but
in
particular,
coming
from
the
networking
space.
There
are
some
properties
of
pull
which
are
next
to
impossible
or
super
hard
to
to
emulate
in
push-based
system.
Unless
the
push-based
system
has
complete
information
of
of
everything
which
should
be
sending
data
at
which
point
pulling
is
more
efficient.
Anyway,
white
box,
black
box
monitoring,
one
looks
at
the
thing
from
the
outside
without
further
information,
whereas
white
box
monitoring
looks
at
all
the
inerts,
you
instrument
your
code,
you
emit
data
from
internal.
B
Every
service
should
have
its
own
metrics
and
endpoint.
With
things
like
the
prometheus
agent,
which
we
announced
today,
with
my
promises
team
head
on
look
at
the
block
of
promises,
io
slash
blog,
we
we
can
also
accumulate
this
data
for
you
and
then
even
push
it
to
other
backends
yeah
and
super
hard
api
commits
stronger
than
anything
I've
ever
seen
in
my
life,
maybe
except
for
the
linux
kernel
time,
series
yeah
most
certainly
except
for
the
linux
kernel,
these
defined
as
user
interfaces,
which
are
not
deprecated
anyway.
B
What
are
time
series
recorded
values
which
change
over
time,
for
example
the
temperature
in
your
room?
That's
a
time
series
you
usually
merge
those
individual
events
of.
I
don't
know
tens
of
thousands
of
people
accessing
that
thing
into
counters
and
their
histograms
typical
examples
would
be
requests
to
web
server
temperatures
service
latency
this
kind
of
thing.
It's
super
easy
to
omit
the
parse
and
read:
that's
literally
how
it
looks
on
the
wire.
So
it's
like.
B
I
know
people
who
print
f
in
their
c
code
and
then
just
dump
that
file
onto
web
server
and
that's
how
they
instrument
their
code
and
it
works
like
there
are
easier
ways,
but
for
them
that
works
and
it's
totally
fine
scaling
kubernetes
is
a
spork.
Prometheus
is
a
sport.
One,
so
yeah
scale
is,
is
kind
of
built
in
prometheus
and
kubernetes
are
designed
and
written
with
each
other
in
mind,
borg
and
borgmon
again
yeah
just
looking
at
prometheus.
B
I
have
a
typo
there's
a
two
missing
in
that
in
that
sentence.
So
roughly
1
million
samples
per
second
is
not
a
problem
on
current
hardware.
B
2200K
samples
per
second
and
core
is
is
roughly
where
we're
at
and
but
that's
already
slightly
old,
and
the
single
largest
prometheus
instance,
which
we
saw
in
production,
was
125
million
active
times
years
like
we
as
in
prometheus
team.
B
I
know
of
someone
who
ran
it
at
700
million,
so
yeah
it's
kind
of
scalable,
but
it's
also
painful.
At
that
point,
you
probably
would
cortex
or
thanos
or
something
speaking
of
there's,
two
two
projects
which
have
high
overlap
with
with
prometheus
team
members,
thanos
and
cortex.
Historically,
thanos
is
easier
to
run
and
scales.
Storage
horizontally.
B
Cortex
is
a
lot
easier
to
run
these
days
and
it
started
with
scaling
storage
in
gestures
and
querying
horizontally.
It
took
the
code
of
of
thanos
to
also
scale
storage
horizontally,
guess
what
thanos
was
working
on
with.
B: One customer is running at 3 billion, but that's kind of more than pushing it; still, it did not completely die in a fire. Loki: it is basically like Prometheus, but for logs. So it follows all the same design principles; it's the same label-based system, it has the same indexing type, and it takes tons of code from Cortex, for obvious reasons. The nice thing is you don't need a full-text index.
B
Usually,
if
you
work
on
logs,
you
don't
need
every
last
bit
and
piece
of
your
thing,
indexed
most
often
you're
able
to
to
extract
a
few
relevant
bits
and
pieces
of
information.
You
index
that
you
search
on
that
and
the
rest
is
just
an
opaque
string
which
is
which
is
stored
without
indexing,
which
means
you
have
a
lot
less
overhead
and
cost
in
storage
and
in
particular,
indexing
in
lookups.
B
Sorry
and
one
of
the
nice
properties
which
are
initially
non-obvious
to
a
lot
of
users,
is,
as
you
use
literally
the
same
label
based
system
as
prometheus.
It's
trivial
to
to
turn
your
logs
into
metrics
to
extract
metrics
from
your
logs
for
alerting,
graphing
blah
blah
blah
blah
blah,
basically
pre-processing
or
processing
logs
into
metrics,
again
remember
same
like
internet
scale.
Two
decades
ago.
That's
kind
of
the
same
trick,
which
is
literally
the
same
thing
where
a
lot
of
singular
ones
were
turned
into
metrics
and
then
just
the
metrics
exposed
in
loki.
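That logs-to-metrics distillation can be sketched in plain Python (hypothetical log lines; in practice Loki's query language does this for you):

```python
# Hypothetical raw log lines: only the level is extracted and counted;
# the rest of each line stays an opaque, unindexed string.
logs = [
    'level=info  msg="request served"',
    'level=error msg="db timeout"',
    'level=error msg="db timeout"',
]

counts = {}
for line in logs:
    level = line.split()[0].split("=", 1)[1]   # extract the one indexed label
    counts[level] = counts.get(level, 0) + 1

print(counts)   # a per-level counter, ready for alerting or graphing
```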
B
You
have
that
mechanism
built
in
which
is
super
nice
and
except
for
google's
m
tail,
which
kind
of
was
that,
even
when
it
was
released,
something
which
which
we
haven't
seen
in
in
the
open
source
or
in
the
open
world
like
certain
search
engines
and
such
have
this
internally,
but
not
not
others
prior
to
loki
at
least,
and
you
can
pump
basically
all
type
of
of
of
text-based
information
into
into
loki.
B
One
of
the
elite
deaths
at
welsh
even
puts
his
his
car
telemetry
and
pictures
from
his
dash
cam
into
low
key
course.
He
can
and
he
likes
to
because
again
the
content
back
here
is
unindexed,
which
means
you
can
just
put
whatever,
and
it's
just
an
opaque
string
or
blob.
To
be
precise,
you
might
remember
the
prometheus
exposition
format
we
saw
earlier
or
the
open,
metrics
format
which
we
saw
earlier.
That's
actually
the
same
with
the
labels.
Here
you
just
need
a
timestamp,
of
course.
B
Obviously
an
event
is
is
always
at
a
specific
point
in
time.
So
you
need
to
emit
that
specific
point
in
time,
whereas
the
metrics
are
handled
differently.
On
a
conceptual
level,
you
can
emit
precise
timestamps,
but
usually
for
mathematical
reasons,
which
we
are
not
going
into
here.
It's
it's
better
to
to
have
prometheus
or
cortex
or
thanos
handle
handle
the
timestamping
versus
with
low
key.
It's
better
to
have
the
emitter
handle
timestamping
some
numbers.
Our
queries
at
grafana
labs
regularly
see
40
gigs
per
second
gigabytes
per
second.
B: We regularly query terabytes of data in under a minute, and ideally you then emit this back into metrics, so you don't have to run those expensive, or relatively expensive, queries regularly. What you really care about you already emit into metrics, and then again you reduce the total amount of information, and also the computational complexity, by orders of magnitude. Tempo, the last of the bunch, with OpenMetrics.
B
One
of
the
things
which
stuck
with
me
is
when,
when
the
googlers
mentioned,
how
how
searching
for
for
traces
didn't
scale
and
when
google
tells
you
that
searching
doesn't
scale
searching
for
something
you
better,
listen
which,
which
I
did
so
x,
employers
are
just
an
id.
B
You
already
know
that
this
is
a
relevant
thing
course.
It
came
from
that
high
latency
bucket,
where
I
know
your
p99
was
two
seconds.
What
have
you
doesn't
matter,
but
you
know
you
have
a
high
latency
there.
You
know
you
had
that
one
error,
you
know
you
had
that
one
security
exception.
What
have
you-
and
you
know
that
this
one
trace
is
relevant
to
the
thing
which
you're
currently
working
on,
which
you
saw
in
your
logs
or
your
metrics,
so
you
don't
need
to
search.
B
They
are
built
into
pretty
much
everything
which
we're
talking
about
of
course,
kind
of
obvious
they're
nice,
but
tempo
also
also
allows
you
to
search,
of
course,
some
users
and
some
use
cases
just
require
searching
of
more
or
less
raw
traces
and
spends
my
own
personal
opinion.
At
some
point.
B
It
would
be
nice
to
optimize
this
out
if,
if
you
need
to
do
search
as
of
today,
but
if
you
need
to
rely
on
search
going
forward,
that's
also
completely
doable
better
would
be
if,
if
you
go
through
ex-employers,
because
it's
just
so
much
more
efficient,
only
works
on
object,
storage,
you
don't
need
cassandra
elastic
anything
expensive
in
the
background,
given
an
object,
store
and
you're
done,
it's
compatible
with
all
the
things
open,
telemetry
tracing,
zipkin
jager
by
default.
We
are
not
sampling,
you
can
sample
if
you
want
to,
but
we
don't
sample.
B
I
also
need
to
update
that
slide.
I
see
because
as
of
four
months
ago,
which
is
eons
in
in
this
production
velocity,
we
had
over
2
million
samples
per
second
at
350
megabytes
per
second,
and
we
have
14
day
retention,
three
copy
stored
at
a
cost
of
240
cpu,
400,
gigs,
450,
gigs
of
ram
and
132
terabytes
of
object,
storage
and
the
p99
of
2.5,
it's
better
already,
but
like
tempo
scales
and
it
scales
insanely
high.
B
Bringing
all
of
this
together
this
this
more
holistic
thing
allows
you
to
jump
from
logs
to
traces,
from
metrics
to
traces,
from
traces
to
logs
and
all
the
all
the
other
different
ways,
of
course,
like
it's
literally
designed
for
each
other
and
while
they're
all
distinct
projects
and
you're,
not
forced
to
use
all
of
them
to
to
reap
benefits.
B
If
you
so
choose
you
get,
you
get
the
most
bang
for
your
non
buck.
B
Of
course,
a
those
things
have
been
designed
for
each
other
and
personally
speaking
since
at
least
2015,
I
have
been
working
towards
having
those
three
things
for
metrics
logs
and
traces
as
a
holistic
thing.
So
there
is
a
long-running
underlying
design
as
to
the
bank
for
the
buck.
All
of
this
is
open
source.
You
can
run
it
yourself.
B
I
like
food
and
shelter
so
you're,
also
more
than
welcome
to
go
to
grafana
cloud
or
or
buy
enterprise,
or
what
have
you
and
there's
some
more
features:
rough
sniff
test.
If
the
user,
the
intended
user,
has
more
money
than
time,
it
tends
to
be
a
paid
feature.
If
they
have
more
time
than
money,
it
tends
to
be
open
source,
like
that's
roughly
the
the
sniff
test
for
our
monetization
strategy.
B
Again
most
or
anything,
we
talked
about
right
now,
it's
completely
open
source.
You
can
run
it
yourself,
a
few
screenshots.
Most
of
you
know
how
how
grafana
looks,
but
still
those
blue
lines
are
relatively
new
and
super
nice.
You
can
have
events
you
can
have
you
can
have
your
alerts.
You
can
have
things
like
this,
which,
which
give
you
a
lot
more
context.
You
can
also
have
examples
visualized
and
things
like
this
and
tons
of
other
visualizations.
B
As
just
last
week
we
had
observabilitycon
2021
online.
Obviously
a
lot
of
what
we
just
talked
about.
You
can
find
in
more
depth
without
that
rush
to
to
cover
as
many
questions
as
possible
at
this
location,
grafanacon.
B
Anyone
that's
also
part
of
the
slice.
It's
even
a
click.
B
Thank
you
very
much.
You
can
post
talks
on
github
like
all
of
them
for
last
decade
or
so.
Email
twitter
are
there
for
your
per
user
and,
let's
see
what
we
have
as
questions.
B
Do
we
get
created
questions
and
they're
read
out
or
how
does
it
work?
I
honestly
don't
know
sorry.
I
didn't.
B: Orchestrate: can you expand on what you mean with orchestrate? Because I think you're mixing, on the one hand, your own orchestration of applications versus how to emit data towards Grafana Cloud. I can try to give a partial reply to the second part of that question, as I understand it. The easiest way, for most things, is the Grafana Agent, which is what the Prometheus Agent, released today, is based upon.
B
Of
course,
this
allows
you
to
to
channel
all
your
your
signals
towards
grafana
cloud.
If
you
have
any
of
the
other
interfaces
like
the
common
ones,
they're
all
supported
like,
ideally,
you
you
put
things
somehow
into
into
a
prometheus
remote
right
to
to
emit
towards
graphite
cloud.
If
it's
metrics
for
traces,
open,
telemetry
tracing
is,
is
the
gold
standard?
So
you
should
absolutely
do
this.
B
If
you
have
non-prometheus
things
and
there's
an
exporter
for
pretty
much
or
for
probably
everything
on
the
market
to
get
data
into
prometheus
format,
and
then
you
can
use
the
agent
or
other
mechanisms
to
to
push
towards
grafana
cloud.
If
you
want
to
the
open,
telemetry
collector
also
supports
prometheus
remote
right,
so
you
can
also
use
this
yeah
pretty
much.
Everything
which,
which
is
on
the
market,
is
supported,
prom
tail
and
such
for
loki
and
everything
is
built
into
into
the
grafana
agent.
B
If
you
just
want
the
bare
bones
open
metrics
to
to
promise
this
remote
right
pipeline,
the
prometheus
agent
is
better.
If
you
want
built-in
exporters,
if
you
want
prom
tail,
if
you
want
to
have
open
telemetry
tracing
all
those
things
built
into
a
single
binary,
the
grafana
agent
is
better
depends
on
your
trade-off.
Some
deployment
models
like
to
have
a
single,
huge
binary,
which
does
pretty
much
everything
other
deployment
models,
mandate
that
you
have
tons
of
smaller
services,
both
as
valid
both
as
covered.
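As a rough sketch only (the endpoint, job name, and credentials are placeholders, and the exact schema should be checked against the current Grafana Agent documentation), a minimal metrics-plus-remote-write agent configuration looks roughly like this:

```yaml
# Hypothetical sketch of a Grafana-Agent-style metrics config:
# scrape one local app and forward everything via Prometheus remote write.
metrics:
  wal_directory: /tmp/agent-wal
  configs:
    - name: default
      scrape_configs:
        - job_name: myapp            # placeholder job name
          static_configs:
            - targets: ['localhost:8080']
      remote_write:
        - url: https://example.com/api/prom/push   # placeholder endpoint
          basic_auth:
            username: "<user>"
            password: "<api key>"
```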
B: Do you have an off-the-shelf Helm chart for getting this whole setup?
B: I think we do. There's tons of work going on in our integrations crew (and we're hiring like crazy for the integrations crew), where all of this is made more seamless. Internally we use Tanka, which is Jsonnet, which is then compiled into Helm and others, and it's also able to ingest Helm charts, which means you don't have the common problem of those super static, slash brittle, Helm charts which are hard to change and hard to track, in particular if you have both upstream changes and your own local changes, where you functionally need to fork pretty much everything and carry your own forks. If you need to do anything more than really baseline changes, I suggest you look at Tanka and Jsonnet.
B: How to integrate apps to send metrics or emit data to Grafana Cloud? It depends on the type of... well, okay, no, they said metrics, not signal, sorry. Okay, let's go with metrics and then with data. For metrics, Prometheus client libraries are the gold standard for emitting metrics as of today. For data, defined as traces, OpenTelemetry tracing is the gold standard. For logs it doesn't really matter, because logs are just, historically, kind of a mess, as most of you will probably agree.
B: So Promtail can ingest pretty much everything and just hammer it into shape for Loki to consume. Again, all of this is built into the Grafana Agent, but for your own applications, when you need to emit the actual raw data from your own code, and you need to instrument your own code: for metrics, Prometheus client libraries; for traces, OpenTelemetry tracing; and for logs it doesn't really matter, because Promtail eats it all.
B
How does correlation happen between Loki logs and Tempo traces? Going from your logs to your traces, the ideal case is you have an exemplar on your logs: there you know the ID for that trace, or that span, or both. Exemplars support free-form text, so, as per the W3C tracing standard, we support both span and trace IDs.
B: Should Kubernetes application services be designed in any particular way to use these tools? What is a good starting point to integrate these tools with custom Kubernetes services running in a cluster? Great question, and it's not basic, not at all. For Prometheus, slash the others, it's super simple.
B: Prometheus has a thing called service discovery, which is an interface through which Prometheus understands how other services run their things; first and foremost Kubernetes, but there are also things like text files, where you just write YAML and populate your service discovery yourself. For anyone more on the networking side: zone transfers are possible, so your BIND, or whatever, or Unbound DNS server allows zone transfers by Prometheus, and it just ingests the complete zone and starts monitoring, or scraping, everything which is defined in that zone. And again, that is also the case for Kubernetes.
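A minimal sketch of such a Prometheus scrape configuration using Kubernetes service discovery (the opt-in annotation shown is a common community convention, not a requirement):

```yaml
# Sketch: discover all pods in the cluster, but only scrape those that
# opt in via the conventional prometheus.io/scrape annotation.
scrape_configs:
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```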
B: So you literally just point your Prometheus at your Kubernetes, and you tell your Kubernetes that, yes, this thing may get the data, and automatically Prometheus gets all the data from that Kubernetes cluster, or from the pods. For services' internals, blah blah, it might be different depending on your precise setup; maybe you need a sidecar, blah blah blah, the usual. But for the pods themselves and such, all of that is automatically emitted, which is super nice, because it's literally one thing to set up and automatically you have all that data in your local Prometheus.
B
If
you
don't
want
to
have
local
storage
or
you
have
issues
with
state,
which
was
the
reason
why
we
created
the
prometheus
operator
ages
ago
to
handle
state
within
within
kubernetes,
you
can
also
just
run
the
grafana
or
the
prometheus
agent
and
just
shove
all
that
data
into
eg
grafana
cloud
or
one
of
the
other
prometheus
compatible
offerings.
Speaking
of
hermes
compatibility,
also
on
the
prometheus
block,
again
promises
io,
slash
blog.
B
We
did
start
a
prometheus
with
my
prometheus
head
on.
We
did
start
a
prometheus
compliance
thing
there
or
prometheus
conformance
thing
where,
if
you
are
compliant
to
the
relevant
apis
and
service
interfaces,
you
get
certified
as
prometheus
compatible,
which
means
for
the
users
that
you
actually
know
that
a
thing
is
promises
compatible
and-
and
you
can
just
use
it
without
fear
of
of
something
breaking
prometheus
cortex
grafana
cloud
are
prometheus
compatible.
B: So, if you have normal scale... like, if you're working at a huge company, or you run a team and they have I-don't-know-how-many users, blah blah blah, that is not as applicable; but if you have normal-sized amounts of data, it's pretty easy, because you just start a Prometheus, or a Cortex, or a Thanos. Cortex and Prometheus have single-binary modes, so you just start the binary and you're done. In this case I would recommend Prometheus myself, if you're getting started.
B: DigitalOcean also has quite a few super nice Prometheus tutorials, which are, I think, four years old, but they are super nicely written. Also, we are extending the tutorial section on prometheus.io.
B
Does
prometheus
integrate
with
tools
like
istio?
I
think
I
know
the
answer,
but
I
don't
want
to
give
a
wrong
answer,
so
I
can
follow
up
and
shoot
me
an
email
or
something
I'll
I'll
get
you
the
authoritative
answer
from
robot
or
from
joe
sorry,
not
from
robert
and
and
before
I
say
something
wrong.
A
Do
you
want
to
include
a
slack
channel
or
something
in
the
chat,
richard
or
julie,
just
to
for
any
follow-up
questions?
Anything
like
that.
B
Yeah
we
have
the
I
mean,
for
we
have
to
split
this
for
cortex
and
prometheus.
You
have
you,
have
the
cncf
slack.
A
I
do,
let
me
put,
let
me
put
ours
in
and
online
programs
and
then,
if
anybody
has
any
other
questions,
you
can
hit
each
other
up
here.
A: Okay, well, if there are no other questions, I want to thank you, Richard, and thank you, everyone, for hanging in there with us as we got things started; a little bit of a rough start, but I think this was a great one, and we got tons of great questions. Let's keep those conversations rolling. Thank you again, and the recordings will be up in a little bit this afternoon.