Description
With visual examples, in this video we discuss how to use Prometheus to monitor your runners. We go from activating the runner's Prometheus metrics endpoint, to creating queries with PromQL, to how we use Jsonnet at GitLab to build the Grafana dashboards for operating the GitLab SaaS Runner Fleet.
A
Hey everyone, Danielson here, principal product manager for Runner. I'm joined by Tomasz Maczukin, senior engineer for Runner, who's been with Runner for at least six years now — six years and two months, or maybe seven.

Tomasz is one of the original architects of all of the great things in Runner, and today we are talking about metrics, specifically the metrics endpoint in Runner. It's a question I get all the time from customers. They typically say: hey, David, I know you're building a bunch of new things in the GitLab UI related to observability and runners, but what's available today? And typically, when customers ask that, I say: well, if you look at our docs, we've got this Prometheus metrics endpoint that we expose on the runner, and you can do all these great things with it in terms of monitoring — let me just share my screen. That's typically the extent of my answer, because I haven't actually done a whole lot of this work myself. So, very recently, Tomasz added some new metrics.

Hopefully the screen is showing up. He added what he's calling the queue duration histogram metric, and when he shared it in Slack the graphs looked really great, so I thought it would be a great opportunity for the two of us to chat: talk about what this new thing is, what the benefits are, and maybe have Tomasz walk us through, for those of us less familiar with it, how you go from understanding the metrics to using them.
B
Okay, thank you. You're opening up a huge topic here — what can be done and how it can be done. I'll try, because there are things here that are really, really easy and things that are complex, especially if you are new to Prometheus monitoring. Let's maybe start from the basics. Let me share my screen.
I remember... I think this is the one I want to use. Yes. So, can you see the console? I think you can see the console, yeah.
Yeah, so the simplest thing is to enable the metrics endpoint in the runner. When you look at the config.toml file, this is the only thing you need: add a global setting, listen_address, with the address to which you want to bind. So basically this means 0.0.0.0:9402 — port 9402 on any possible interface.
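A minimal sketch of what that global section might look like (the port is up to you — 9402 is what is used in this walkthrough — and the rest of the file keeps its usual runner sections):

```toml
# config.toml — global settings (illustrative values)
concurrent = 4
listen_address = "0.0.0.0:9402"   # expose the Prometheus metrics endpoint on all interfaces

# ... the usual [[runners]] sections follow unchanged ...
```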
This will be on my laptop. The endpoint is, of course, accessible over HTTP, so we can close the configuration now. If I start a runner with that setting, you can see here the information that the metrics server is listening. We can forget about the rest of that output; there will be nothing interesting in it for us for now.
What we are interested in is how to read it. Fortunately, with the way Prometheus originally decided to handle metrics, it's a simple HTTP GET request to a web server that is now started on that listen address we defined. So when I make that request, I get a list of metrics with the values current for the moment I made the call. By the way, I think that output format is now a format of its own, OpenMetrics — it's still used by Prometheus, but I think it has evolved into a CNCF-maintained format for metrics.
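A quick way to look at that output by hand (a sketch, assuming the listen address configured above and the standard /metrics path):

```shell
# Fetch the runner's metrics endpoint and show the first few lines
curl -s http://localhost:9402/metrics | head -n 20
```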
This is what we have in the runner, and when you scroll through it there is a lot of information. Some of the metrics are very general, because we use a few of what are called, in the Prometheus world, collectors.
A collector is something inside the process that collects data which can then be exported through the metrics. You can see here some metrics named process_something_something. This is one of the collectors available in the Prometheus SDK for Go that you can just hook up to get some basic information about the process — for example, how many file descriptors the runner process is using right now. There is another general collector for Go-specific metrics; this again comes from the Prometheus SDK for Go.
It's able to hook into the Go runtime and give you some information about what's happening with your process, like the number of goroutines, things like that. To be honest, I don't remember the last time I had to worry about any of these Go-specific metrics. From these — let's call them default metrics — the one we've been watching a lot is the number of open file descriptors, because this is usually a limited resource on a server, and by default it usually has very small limits.
And with how Runner works, think about how many files it creates: any network connection in Unix is a file descriptor, but also, when Runner handles jobs, it stores a temporary file with a buffer of the log output. So by handling hundreds or thousands of network connections and thousands of jobs at once, we create a huge number of file descriptors. On the SaaS runners, it was a very long time ago that we discovered the default limit usually set for the process is way too small...
...and we had to increase it. So from these many default, general metrics, I think process_open_fds was the one I looked at the most, because we had a few cases when something went wrong and exceeding the open file descriptors limit caused incidents. So yes, that metric was definitely watched by us. The most interesting ones, though — and I think the ones that customers and users are mostly interested in — are the metrics...
...we created ourselves, the ones that describe the internals of how Runner works. These are kind of the business metrics — the business logic — that you, as an owner of the runner, want to look at to understand what's happening with your runner and whether your settings are what you would like them to be. And here at the very top...
...the queue... maybe let's grab it. I think the version I have compiled locally is older and doesn't have this new metric, so I may need to recompile the runner. Anyway, we have a list of metrics, and for now we don't have documentation of exactly which metrics we export, because that changes over time; initially we had it, but that documentation went out of date very quickly.
So what you can do, for example, if you are learning this, if you are new to it, is look at just the HELP and TYPE information. For example, you can see that there is a metric named gitlab_runner_concurrent, which is a gauge, and you can see the description: the current value of the concurrent setting. So in the config.toml file you have the setting named concurrent; it's one of the default settings and one of the few that are required.
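A quick way to browse that built-in self-documentation (a sketch, again assuming the endpoint configured earlier):

```shell
# List the descriptions and types of the runner-specific metrics
curl -s http://localhost:9402/metrics | grep -E '^# (HELP|TYPE) gitlab_runner'
```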
Then there is the limit setting: the concurrent one is for the whole process and is required, but you can have several workers in that config.toml file, and on a worker you may set a limit, so that metric is then provided for every worker. And we have gitlab_runner_jobs, which is also a gauge, and which shows you the current number of running jobs — of running builds.
So one of the simplest and probably most informative things you can do — let me open the Prometheus query explorer — is to find out what your saturation is. You have a runner, you run some jobs, you set some concurrent value.
The thing you would definitely like to know is whether it's enough or not: am I using all of that capacity, or maybe the concurrent value is too high and I can save some money and lower it a little? What you can do is take a simple proportion: we take the gitlab_runner_jobs metric and divide it by gitlab_runner_concurrent.
So: how many jobs are running, over how many there can be at maximum. What we usually also want to do here is aggregate it a little.
What does that mean in layman's terms? If we look at this gitlab_runner_jobs metric alone, let's see what it outputs. You can see here the HELP information, you can see the information about the TYPE, and then you can see there is the name of a metric, some strange things between braces, and a number at the very end. This is the name of the metric; this is the value of the metric.
My runner was started, but it isn't executing any jobs right now, so the value of the jobs metric is zero. But here we have what in the Prometheus world are called labels. Every metric can be labeled with one or more labels, every label can have one or more values, and this is what creates the dimensions. If I now executed, say, hundreds of jobs on that runner, this line here would be repeated multiple times.
It would have multiple entries with different values for the metric. Let's say I have only one worker there, so all of them would have the same runner and system_id labels, because in this specific case those would be static; but then state, stage, and executor_stage — these are labels showing us at what moment of the execution the job is, because a job on the runner can be preparing, can be executing, can be tearing down.
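Put together, a single sample in the exposition format looks roughly like this (a sketch — the HELP text and label values here are illustrative, while the label names are the ones discussed above):

```text
# HELP gitlab_runner_jobs The current number of running builds.
# TYPE gitlab_runner_jobs gauge
gitlab_runner_jobs{runner="example1",system_id="s_example",state="running",stage="build",executor_stage="docker_run"} 0
```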
A
Sounds like the call is getting worse. Sorry.
B
Why don't I have... okay, maybe let's go to Prometheus directly; maybe there is something odd happening with Thanos. So, gitlab_runner_jobs — let's go through that quickly. Yeah, we have a lot of data, so we can work with that. Let's go back to our previous queries. So this is... ah, and now I remember why: it's because of the multiple dimensions and the labels that we need to include, because the metric is at the global level — it's not labeled with a label named runner, because it's not per-runner.
When Prometheus reads that metric, you can see that it already has a few labels, and our Prometheus is reading the metric from, for example, a remote machine named runners-manager-private-blue-1-something-something, port 9402. So it creates an instance label equal to that value, just adds it here at the end, and then stores everything locally in the very magical way Prometheus stores data so that it can be queried.
Usually you don't do this by hand, because when working with this output you more or less know what you want to get. If we are working with Grafana — which will be another topic in a moment — Grafana is able to take these raw outputs; you tell Grafana what type of data it is, and it then adds the specific unit handling by itself, which is brilliant.
So, with this simple query, we can see how many jobs are executing on every runner divided by how many jobs can be executed on that runner. We get the concurrency — the saturation value — in the scope of the concurrent setting: how much of the full possible capacity of that runner is already used.
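A sketch of such a saturation query, aggregating away the per-job labels so both sides match on the instance label (the aggregation by instance is an assumption that fits the single-Prometheus setup described here):

```promql
# Fraction of the configured capacity currently in use, per runner process
sum by (instance) (gitlab_runner_jobs)
  /
sum by (instance) (gitlab_runner_concurrent)
```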
And this allows us to do a lot of magic. Now, what exactly can we do? It all depends on what metrics you are looking at, and you need to know what the metrics are about. This is something that is hard to explain, because many metrics are very low-level things. For me it's obvious what they are about and how to understand them, because I know at the core how Runner works; I understand the internal logic and the mechanisms that we have, so I know exactly what a given metric means. For people who don't do the engineering, who don't contribute to the runner and just use it for their work...
...some things may be a little harder. So this is an opportunity for us to maybe start explaining more about what can be done with the metrics, or to start sharing some Grafana dashboards where you don't need to understand the metrics, but rather have a dashboard that explains what you can read from it. Something to consider for the future. Anyway, we have multiple metrics.
If you go through the Prometheus documentation, you will see that there are three main types. There is a gauge: a value that represents the current value of something, and it goes up and down all the time. There is a counter: a counter is a value that only increases over time. Counters may be reset to zero, for example when the runner process is restarted, or when the maximum number allowed by the type is reached — all metrics in the Prometheus SDK for Go use float64.
So whatever the maximum value for float64 is will also be the maximum value for a counter before it resets to zero, and Prometheus, when dealing with such metrics, is able to handle the resets. A counter you can therefore understand as a metric that always goes up. And what's the difference? A gauge is a good way to show us things when we want to know...
...what the current number is at this moment — for example, this value of jobs: how many jobs we have right now. Now we have, say, five of them; in five minutes maybe it will be 50; in another five minutes it will be one. It will go up and down all the time, and we want to be able to see, at any point in time, how much it is. But we also have a metric named...
...for the jobs counter, which starts counting from the first job that is taken: each time the runner starts a job, it increments a counter of jobs that were ever started. So we can see how many jobs we have had at any moment, but we can also see a rate — how quickly we are adding jobs over time. Counters are very good for tracking events that happen over time.
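For example, a rate query over that counter might look like this (a sketch — the transcript describes a counter of started jobs; in Runner this is exposed as gitlab_runner_jobs_total, but treat the exact name as something to confirm against your own /metrics output):

```promql
# Jobs started per second, averaged over the last five minutes, per runner process
sum by (instance) (rate(gitlab_runner_jobs_total[5m]))
```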
Gauges are good for tracking the current state of something. And there is also one last type, which is a histogram. A histogram allows us to define buckets of numbers and put readings into those buckets, so that we can then, using Prometheus and/or Grafana magic, analyze the distribution.
So, as it seems I compiled Runner locally before that change, let's maybe go quickly to the code and the description of that metric.
Exactly. Okay, so we have a metric named gitlab_runner_job_queue_duration_seconds. This is a new addition, but it provides very important information to the runner user. When you create a job in GitLab CI, it goes through several states. The first state is created: when you create a pipeline, all jobs in that pipeline are created instantaneously, and then, depending on how the pipeline is processed, jobs are gradually transitioned from the created state to a pending state.
The pending state means that this job, given all the circumstances that control it, is now ready to be started. So GitLab transitions it to the pending state, and then that job waits for a runner to pick it up. As I always repeat, it's important to remember that it's the runner that asks for jobs: the runner calls GitLab and says, hey, do you have any job for me? GitLab then does a lot of magic, which we will not cover in this call, and finally comes back with one of the three most common responses.
Anyway, depending on how you configure the job, how your project is configured, and how many runners of different types and configurations you have available, the specific pending job you're looking at may end up being executed by only one of the runners available, or it may be executed by any of them. I can only say that it depends on the very specific configuration of your case. But let's simplify things: let's say I have my own GitLab instance, and on that whole GitLab instance I have one runner, nothing more.
If a job went to pending two minutes ago and it's still in the pending state, what does that mean? It means that the single runner I have didn't ask for that job. And from our experience, and from talking with many customers and community members, the speed of picking jobs up from the pending queue is probably one of the most important factors when it comes to runner user experience.
On gitlab.com, on our SaaS runners, people expect that when a job targets one of our SaaS runners it will be handled and taken from the pending queue in a matter of seconds, maybe a few minutes. When that goes beyond 10 minutes, 20 minutes, an hour, for most users this is something they can't accept. And now, from my point of view, as a person who manages these runners...
...this means that tracking that timing is important for me. Until now I was able to track multiple different metrics: I was able to know how the autoscaling of the runner is behaving, what the usage of the runner host's resources is — the file descriptors we talked about, all of that. I was able to observe, fine-tune, and notice that we are maybe reaching some capacity limits...
...and that maybe we need to reconfigure something. But knowing whether the runners pick up jobs in an acceptable time was hard, because we didn't have that information on the runner side. We have had that information for a few years on the GitLab side, but on the GitLab side it has two problems. First, we don't expose these metrics to our customers on gitlab.com, and if you are a self-hosted GitLab administrator, you may be providing that instance to users to whom you don't necessarily want to give the system's metrics either.
So access to that metric is already limited — if, for example, someone self-hosts runners for gitlab.com projects, there is no way for them to find out whether the runner picks up jobs as fast as they would like. The second problem is that on the GitLab side we label the metric only with whether the duration was counted for an instance runner or not — nothing else. We can't track it per project, and we can't track it per group.
Prometheus uses a lot of compute magic for the aggregations that let you access these metrics quickly. And this is one of the things we've often been asked, and unfortunately always have to decline: hey, there is this metric on GitLab, could you label it with the job ID, because I would like to track something per job? No, we can't, because per job that would be too big a cardinality.
We would quickly get hundreds, thousands, maybe even millions of jobs — on gitlab.com we handle a few million jobs every week — and each of those jobs would force Prometheus to create a separate time series in its storage, and then every query would need to gather all of these millions of time series to compute the data together.
If you work with such a big cardinality of label values — and we learned this the hard way — it kills Prometheus very quickly. This is why in GitLab we in fact have policies and checks about which metrics we add, what labels they have, and how many labels they have. Basically, I think at this moment we decline to add a metric that would have more than 10 labels by itself, because there are also a few more that Prometheus will add, especially if that metric is going to be tracked by gitlab.com's monitoring.
So this is a problem: on the gitlab.com GitLab instance we have a very limited understanding of where the jobs are. We only know the histogram of queue duration for instance runners versus non-instance runners, we can't give that information to the users, and even for us this was a big limitation. Now, since GitLab 16.4 — this is a feature released in GitLab and GitLab Runner 16.4 — that metric is also given to the runner.
So when the job is scheduled to the runner — when the runner asks for a job, GitLab finds a job, assigns it to this runner, and sends back the job payload — one of the new pieces of information in the job payload is the value of the queueing duration. So we send the runner all the details of how the job should be executed, plus: hey, your job was in the pending queue for five seconds, or 10 seconds, or 12 hours.
Yes, that happens. And now, speaking as a runner owner for the SaaS runners, I'm able to track that queuing time for every single runner that we have.
Let's get back — I'll copy this information — let's get back to our Prometheus, and for now let's not do any magic here. sum by (shard): shard is the way we label things on our infrastructure; it is not a label that is added automatically, so for anyone else this query will probably not work. You could use instance here — let's use instance for a moment. And why doesn't it work?
Oh yeah, yeah — because this is the metric name, but a histogram creates more than that: when you define a histogram metric like here, it's named like this, but because it's a histogram it will actually create three different series.
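For reference, a Prometheus histogram is exposed as three families of series; for the queue duration metric discussed here, that looks roughly like this (the bucket boundaries and numbers are illustrative):

```text
gitlab_runner_job_queue_duration_seconds_bucket{le="1"}     42
gitlab_runner_job_queue_duration_seconds_bucket{le="60"}    57
gitlab_runner_job_queue_duration_seconds_bucket{le="+Inf"}  58
gitlab_runner_job_queue_duration_seconds_sum                392.5
gitlab_runner_job_queue_duration_seconds_count              58
```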
A
The bucket one and the others, right. And Tomasz, before you jump off that screen — going back to it for a quick second for folks, some quick context; I don't think we mentioned this, or maybe Tomasz did: the screen Tomasz is looking at right now is, I believe, the Runner code base, and he's looking at the actual metric definition. Yes?
B
Yes, it's in the GitLab Runner code, in commands/builds_helper.go, if anyone would like to look at it. So if we execute here, it will give us some information.
Okay, yeah. So now what I can do here is see information about every separate shard. Previously, taking the metrics from GitLab, I could only know how well all the instance runners that we provide behave as a whole. And for the shards that are generally available, anything going wrong gets alerted on very quickly; on the other hand, Windows and macOS are still in beta — we are still experimenting with them, learning how to tune them to best fit...
...what we want. And while we are interested in measuring them too, and in providing as good performance as we can, we expect that something may go wrong there, so we would like, for example, to have lower alerting thresholds — or higher ones, depending on how you look at it. Previously I couldn't do that: it was all hidden in one bucket, which was just instance or non-instance runners.
A
Hey, a quick note — let me jump in here for customers that are watching this video. It's a quick note, and I'm sure it's pretty obvious, but I'm just going to call it out: on gitlab.com, these are all instance-level, or shared, runners. I know some of my customers have a mixed environment: you offer some instance runners, you offer runners at a group level, and in some cases you allow the group owners to bring their own runners.
So again, if you have a large runner fleet like GitLab's and you're doing tens of thousands of jobs per month across a mixed fleet, and you're thinking about how to do something like this, you have to think: okay, maybe start with my instance runners first, and think methodically about how you might want to implement monitoring, especially if your fleet is configured disparately. Thanks, Tomasz.
B
This is one funny PromQL query. So what do we do? We take this bucket metric for every single entry — and an entry here is this metric with every permutation of label-value pairs. We calculate a rate over the past five minutes, because this bucket metric is a counter.
Then we sum that by shard, because I want to know about shards, and we also sum by le, the bucket boundary. And then there is a Prometheus function named histogram_quantile, and this now tells me that on our private runners, ninety percent of jobs are handled from the pending queue in less than 7.8 or 8 seconds. On our most popular SaaS shard, Linux small — where is Linux small... oh, saas-linux-small was not yet updated to that version, so we will not see it here.
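The query being described has roughly this shape (a sketch; the shard label is specific to GitLab's own infrastructure, as noted above — substitute instance or your own grouping label):

```promql
# 90th percentile of job queue duration over the last 5 minutes, per shard
histogram_quantile(
  0.9,
  sum by (shard, le) (
    rate(gitlab_runner_job_queue_duration_seconds_bucket[5m])
  )
)
```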
Because it's the 90th percentile: 90 percent of jobs were taken from the pending queue in under 0.98 seconds.
So in less than a second. And now we are able to analyze that per shard, per instance, per whatever other grouping definition we define for ourselves, and we can say: hey, this is a virtual group that we want to have a specific SLI and SLO. For example, the small runners that we have here — our current...
...our current objective is set to also account for the fair scheduling algorithm — again, not something I would like to go deep into here, I agree.
But for the specific classes of jobs that finally land on the small runners, we expect them to be queued for no more than a minute. If these specific types of jobs are queued for longer than a minute, we are reducing our apdex score, and when the apdex score drops below some threshold, the EOC — the engineer on call — is alerted.
An incident is declared and we are forced to find out what's happening and how to fix the problem, so that our customers don't even feel that the problem is there. And that metric is based on how long their jobs sit in the queue, which now allows us to adjust this alerting for every single shard, deciding where we accept bigger or smaller delays.
A
A quick recap of the terms — sorry, Tomasz. A quick recap: Tomasz mentioned SLI which, just for clarity, is service level indicator, and I believe he mentioned SLO, which is service level objective. So Tomasz's team has service level objectives to meet for how quickly those jobs get picked up. And apdex — we'll hold that discussion for another day — is basically how Tomasz's team does alerting based on whether or not we're picking up those jobs within the SLO targets, the service level objective targets.
B
Okay. So we talked a little about what Prometheus metrics are, how we can learn something about them from the output, and how you can query them from Prometheus. Now there are two more things. The first is that all of this magic works only once you get the metrics into Prometheus, and for that there is no other way: you need to go through the documentation, learn how to set up Prometheus, and think about how to set it up in your case, because some people are using Kubernetes, some people are using bare metal...
...some people are using cloud instances with infrastructure automation, some cloud instances with everything done manually. I can't tell you what's best, because you probably know yourself how you would like to have it. Prometheus is by now a mature project, part of the CNCF, widely used across the cloud computing industry.
So at this moment it shouldn't be hard to find tutorials and good documentation on how to set it up for whatever specific case you are in. This is a part I really can't help much with here, beyond showing how we set it up for the gitlab.com infrastructure.
The point is that you need to get the metrics into your Prometheus server, and in that Prometheus server you need to configure scraping of the endpoint that we configured here. So: first, enable metrics on the runner and start the process. This is important — if you update the config.toml file while the runner process is already running, the metrics server will not be started; you need to restart the runner process. So start the runner process with the listen address in place, and this will create the listener.
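On the Prometheus side, a minimal scrape configuration for that endpoint could look like this (a sketch — the job name and target address are placeholders for your own runner host):

```yaml
# prometheus.yml — scrape the runner's metrics endpoint
scrape_configs:
  - job_name: gitlab-runner
    static_configs:
      - targets: ["runner-host.example.com:9402"]
```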
This one is maybe not as interesting, but let's look, for example, at the Runner manager details dashboard. All of the graphs we have here come from Prometheus metrics. Most of them are taken from the runner, but we also have some metrics from GitLab and some from GitLab Workhorse. There are a few parts of the GitLab application stack that are more or less related to GitLab CI/CD and that more or less indicate what's happening with the runners, especially in the case of problems and incidents.
We also track GitLab and Workhorse and a few other places to know whether an issue is caused by Runner, or whether Runner is just suffering because, say, we have a database incident right now — because all of the job queuing and assignment to runners happens through SQL queries to the database. If the database is struggling, it affects all of GitLab, including the runners. So we may, for example, be alerted here that the apdex dropped, but then, going through the incident...
...we may find out that, okay, it dropped because we have a database incident right now, and that's not something I can fix, because I'm not a database specialist. We need to get the SREs and our database team involved so that they can fix the database, and that will fix many things, including the runners.
Yeah, this is a live view of the metric we've been playing with here. This takes the metric and uses a Grafana heatmap panel type to show the timings. Below you can see the legend: the darker the color, the smaller the number of events in a bucket; the lighter the color, the bigger the number. And looking at it, you can see that the majority of jobs land here, below one second.
There were six jobs queued for more than an hour, and occasionally you see one to 15 minutes, but the majority of jobs are scheduled in below 10 seconds — that is what I can read from that graph. And given that our objective is that the specific classes of jobs should be scheduled in below one minute, and I can see here that most of the jobs are scheduled in below one minute, I know that everything is going fine, as I wanted.
A
And then, Tomasz, last question on this one before you move off of it. Everything is going fine — I don't disagree — but for those sections of the histogram where there's a little bit of banding, say up to the five-minute or maybe the 15-minute mark: as you look at this graph, should you be concerned? Should I usually be concerned about where those bands sit, at five or 15 minutes, or, because the frequency or the count is so low, is it not something to worry about?
B
Yeah, this all depends on what your objectives are, and this is the class of question where I can't give you a straight answer, because it depends: it depends on what your goal is, what your objective is, what your configuration is. For example, we want to bring the best user experience. We learned that for the users, one of the most — I won't say the most important, but...
...one of the most important indicators of their happiness with our platform is that their jobs don't sit long in the pending queue. The jobs themselves may take a longer or shorter time — it all depends on what you are doing there; for some people a job executing for more than five minutes is too long and they need to look for optimizations.
Basically, we sometimes need to throw money at it, over-provisioning our runner fleet to give it a lot of headroom to pick jobs up quickly. But you may be a user who is fine with these timings being much longer, waiting even an hour for a job to be taken from the pending state, because you produce maybe 10 jobs a week, you have one runner configured to handle only one job at a time, and you don't want to invest more money in that CI capacity.
You just don't require that much capacity and speed. And then, seeing in that histogram the jumps over an hour, towards infinity — for your specific case that may be totally okay, and you may not even care about that graph at all; you may be interested in other metrics about the runners to decide whether the runner is working as you want or not. So it's not something where I can give you a magical answer that fits all cases, because most cases will be different.
But from our point of view, as the SaaS runners' maintainers, I know that our users are interested in the quickest possible scheduling. So what I'm looking for here is having the lowest readings, and these occasional spikes are not bad — they're definitely below the one-minute mark, within the service level objective that we have. And like I said, it's not for all jobs.
It's for specific jobs, given how the fair scheduling algorithm works, and we would need to go a little deeper to understand why there are these few occasional readings of up to even five minutes. It's not a problem for me. But if we jump to — I think it was Thursday last week when we had an incident, Thursday or Wednesday, I think it was Wednesday.
Or not — anyway, we had an incident when... oh, I can see it, because it was on the small runners: we had an incident on the small Linux runners shard.
So we have Prometheus alerts that ping us, and we have Grafana to get an overview of what happened. And the big thing — the big problem for customers and users — is how to get to this kind of Grafana dashboard. And, yeah, my Firefox has been failing recently, so that's not a problem with Grafana.
B
This
is,
this
is
again
a
problematic
thing,
because
we
have
our
dashboards
created
programmatically
from
from
code
we
have,
the
project
is
public,
so
I
can
I
can
share
it
here:
dashboard,
EI
runners,
so
looking
at
looking
on
at
this
dashboard
that
we've
been
looking
here,
CA
Runners
incident
support
Runner
manager,
so
the
dashboard
that
you
can
see
here
with
all
of
the
panels
and
all
of
this
nice
color
photographs
is
defined
in
this
file.
To know what's happening here, you need to go through all of these includes, and here we go very, very deep, because if you would like to look at this project of ours, analyze it, and build on top of it, you first need to learn Jsonnet, and you need to learn the Jsonnet library for Grafana. So, for example, the queue duration metric that we've been looking at is defined in the job queue duration histogram graph, and here you can see where we had it...
Is it this one? Yeah, it's this one, I think — or maybe in the job histogram graphs. Anyway, as you can see, there is a lot of strange-looking code here. You would need to learn Jsonnet, you would need to learn how Prometheus works, the different panels of Grafana, etc., etc. If someone wants to do this at scale, going through that hard lesson will definitely pay off, as it did for us.
It was a huge change when we switched away from handcrafted dashboards in Grafana. In Grafana you can go through the UI: you choose the visualization — the heatmap; let's go to our job queue histogram — and you add the query. Of course, you first need to set up Grafana and point it at your Prometheus; there is documentation for that, and I definitely won't explain it here.
B
We
would
need
to
throw
away
this
because
this
is
part
of
jsonnet
templating
and
something
will
probably
not
work
here,
yeah,
because
we
use
variables
that
are
not
defined,
so
maybe
let's
get
rid
of
the
variables,
and
because
this
is
a
heat
map,
you
need
to
know
that
the
Heat
match
requires
that
you
will
use
a
heat
map
format
of
the
queue
and
you
have
more
or
less.
What
we
had.
We
can
then
go
to
cell
display
unit
is
a
time
represented
in
seconds.
B
And
probably
have
to
refresh-
or
maybe
it's
not
in
the
cell
display-
maybe
it's
in
the.
Why
yeah
it's
in
the
y-axis
so
time
seconds,
and
here
you
have-
you
have
basically
the
same
output
that
we've
got
in
the
dashboard
we've
been
looking
on.
All
of
that
clicking
can
be
eventually
turned
into
a
Json
file
if
we
go
back
to
our
dashboard,
I'm,
not
sure
if
I
have
here
an
option
to
look
to
the
code
of
the
full
dashboard
yeah
Json
model,
so
that
dashboard
that
you
can
see
here.
B
Which
takes
a
while
we'll
see
if
it
will
load
first,
two
rows
of
that
dashboard
are
exactly
the
same,
yeah
included.
So
here
you
have
exactly
the
same,
exactly
the
same
panels
that
were
on
the
previous
dashboard
because
for
the
runner
we
have
what
we
can
see
here
incident
support
dashboards.
We
recognize
that
there
are
some
repeating
patterns
of
running
instance
and
they
require
looking
on
specific
metrics.
So
we
group
that
in
in
few
in
few
groups,
but
for
each
of
them,
we
want
to
see
saturation.
B
We
want
to
see
the
updex
value
a
general
view
of
how
many
jobs
we
are
executing.
What's
the
queue
timing,
what's
the
queue
size?
All
of
that
is
repeated
on
all
of
these
five
dashboards
here,
but
below
we
have
things
specific
for
in
this
case,
is
the
database
incident
support
using
jsonet
using
this
Pro.
This
project
allows
us
to
define
a
reusable
component,
like
this
service
objects
panel,
that
we
can
then
pull
in
dozens
of
dashboards
and
don't
need
to
repeat
the
same
Json
definition
over
and
over
again.
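Stripped of the actual grafonnet library calls (which have their own API to learn), the idea is plain Jsonnet composition. A hypothetical sketch — queuePanel is an illustrative helper, not a function from GitLab's project:

```jsonnet
// A reusable panel definition: one function, many dashboards.
local queuePanel(title, selector) = {
  title: title,
  type: 'heatmap',
  targets: [{
    expr: 'sum by (le) (rate(gitlab_runner_job_queue_duration_seconds_bucket{' + selector + '}[5m]))',
    format: 'heatmap',
  }],
};

// Each dashboard pulls the component in instead of copy-pasting JSON.
{
  panels: [
    queuePanel('Queue duration: private shard', 'shard="private"'),
    queuePanel('Queue duration: saas-linux-small', 'shard="saas-linux-small"'),
  ],
}
```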
So if someone wants to work at a bigger scale, going through the hard time of learning Jsonnet and grafonnet, learning how to put all of that together and feed Grafana with these dashboard definitions, will definitely pay off over time.
Dashboard exports, yeah — it's on gitlab.com, the gitlab-org grafana-dashboards project, and that project, from what I remember, has a daily or hourly... yeah, I think it's a daily dump of the dashboards.
One other thing is that there are some annotations. An annotation is something you need to configure additionally in Grafana — that's a totally different story. I don't remember where on this list it is; I think it would be in the inputs or something like that, because you may have multiple metric sources for Grafana.
Basically, if you want to use our dashboards as they are, you need to feed them with the metrics they require, or in some cases you will just have empty panels. But if it's just the runner part you care about — if you don't care about many of the things here, but you want this graph, and this graph, and, for example, these graphs, because that's all you're interested in — you could copy that JSON file.
Here I can't go into edit because, like I said, these dashboards are created from code and they are marked in Grafana as not editable. But if you create the dashboard by hand, even from an import, you can go to each panel, edit it, and see exactly how it's created, what query it uses, what settings it uses. So you can learn, that way, which metrics we present and in what way, to give useful information to someone who manages the runner and wants to know what's happening with it.
A
Wow, Tomasz, this was eye-opening. I feel like I've learned a whole year's worth of stuff in the last hour. For folks watching this, please give a shout out to Tomasz Maczukin for us. I'm going to upload the video to GitLab Unfiltered — it'll be public — so customers, if you're watching this video and you find it helpful and you'd like us to cover additional topics in the future, please comment on the video.
A
Let
us
know,
I
might
also
maybe
create
an
issue
that
we're
linked
to
as
well,
but
we
definitely
want
to
get
your
feedback,
because
the
the
goal
for
this
is
to
give
you
the
information
that
you
need
to
manage
your
free
Tomas.
This
was
brilliant
I,
but
this
was
beyond
my
expectations.
Thank.
B
I'm happy I could give you useful information. Like I said, all of this is hard when you look at it for the first time, especially when you need to learn Prometheus. Prometheus has a specific mindset, let's call it, behind how it handles metrics, and it's not easy for everyone to understand at the beginning. When you start working with it and get used to it, it becomes very, very clear; at least for me it was very easy to work with...
...after a few weeks, once I had learned how to use it. The same goes for Grafana: when you first start using it, everything — the panels, the boxes, the settings — is very cryptic. There is documentation for most of it, but, as always with documentation, you read it and it's like: yeah, okay, I read that, and now what do I do? So the hard part is to start. I went through all of these stages.
B
I
couldn't
understand
why
why
we
are
doing
that
in
in
this
way,
because
I
was
used
to
the
monitoring
systems
like
zabix,
like
check
MK,
where
I'm,
sorry,
where
you
have
a
different
way
of
of
tracking
metrics,
or
there
were
some
systems
where
you
were
like
pushing
specific
metric
to
to
the
to
the
monitoring
system
and
feeding
it
with
with
some
specific
values.
And
here
we
have
this,
this
metric,
exporting
and
collection,
and
why
this
is
happening,
sometimes
asynchronously
and
then
the
specific
way
of
naming
metrics
I,
remember.
...I struggled for at least two years with what to put into a metric name and what to put into a label — what makes sense as part of the metric name, what makes sense as a label name and value. But once you work with it more and more, it becomes natural. Then, once we had these metrics, we had to start looking at them. We started by defining alerting, because Prometheus is not only a monitoring system, it's also a large alerting mechanism.
B
There
is
a
way
to
define,
alerting
rules
and
all
everything,
methods
and
routing
between
them
and
it's
a
huge,
huge
thing
by
itself,
but
we've
been
defining
that
by
hand.
Then
we've
got
grafana
how
to
connect
rafana
with
Prometheus
how
to
shape
the
dashboard
I.
Remember
the
CI
dashboard
that
we
used
before
we
migrated
to
this
Json
net
produced
One
had
about
100
different
panels.
We kept the rows collapsed because one day, when I tried to load that dashboard, I wasn't even able to enter the edit page — it kept failing to load, because there were so many things it was trying to pull from the Grafana API. We wanted to decompose it, but decomposing it into multiple dashboards meant I would need to copy and paste panel definitions, and once I update...
B
Something
I
need
to
remember
where
to
update
that,
and
this
was
like
stopping
us
from
decomposing
that
huge
dashboard
and
month
within
month
it
was
less
and
less
useful.
Until
someone
showed
me
that
hey
we
are
using
jsonnet
from
a
few
months.
Maybe
you
want
to
like
experiment
with
that
and
migrate.
The
runner
monitoring
to
that
so
again,
starting
from
scratch.
B
Learning
how
jsonet
Works
learning
works
in
the
graphonet
library,
so
jsonnet
library
for
grafana
learning
was
in
a
wrapper
to
that
library,
because
we
already
created
some
wrappers
for
the
most
most
commonly
used
panels
that
we
have.
How
is
how
how
how
we
Define
dashboards
so
that
they
look
like
they
look
and-
and
they
are
not
editable
and
have
some
marking
and
linking
to
code,
etc,
etc.
So in the first days it was like black magic; today — when we got this new metric, it took me maybe 20 or 30 minutes to update the definitions and start using the new metric in the way that we want. So it's very hard when you start, especially if you don't have experience with Prometheus and Grafana.
B
If
you
know
how
Prometheus
and
your
funnel
works,
then
you
know
everything
you
need,
because
the
runner
part
is
just
knowing
what
metrics
there
are
and
sometimes
to
understand
what
the
metric
represents.
You
just
need
to
go
into
the
code
to
sell
to
see
how
it's
gather,
how
it's
collected,
what
information
it
presents,
because
sometimes
it's
hard
to
like
put
into
the
description.
B
What
exactly
it
is,
and
you
like
need
to
feel
feel
what
part
of
the
code
it's
in
and
what
exactly
it
shows,
but
things
like
concurrent
limit
jobs
version
info,
which
is
a
static,
constant
value,
just
showing
what
version
information
about
the
runner
is.
These
are
these
are
things
that
you
just
need
to
to
understand
how
they
work
and
and
fit
into
your
parameters
and
and
gravana.
A
Awesome,
hello,
Thomas,
Thanks,
again
I'll
be
I.
Guess
we'll
be
seeing
you
next
time
in
one
of
these
hour
long
sessions
talk
to
you
soon,
bye-bye,
okay,
bye.