Description
Prometheus co-founder Julius Volz presents some usability challenges with Prometheus's query language PromQL, as well as approaches to make usage easier. PromQL is great for doing calculations on time series data. However, the language also has plenty of sharp edges and can be challenging to learn and work with for both beginners and more advanced users. The talk starts with a language overview, goes into examples of usability issues, and then presents recent efforts like PromLens (https://promlens.com/ by Julius' company PromLabs) and the new PromQL text editor that will be part of both PromLens and Prometheus soon.
So, first of all, I'm Julius, the co-founder of Prometheus and founder of PromLabs, a company that does services, consulting, and integration work around Prometheus, but is also building its first product right now with PromLens, which I'll also touch on today. So that's my background, and today I want to talk about the query language that we have in Prometheus.
Not really to give an introduction to it, but more to talk about some of the challenges that users have with it, and then how a new text editor and the product PromLens try to ease some of those pains. So, just taking a step back here: this is the overall Prometheus architecture.
Another use case is ad hoc diagnostics: just asking a question against your infrastructure at a given moment. It doesn't need to be part of a dashboard in this case. For example, I was curious which image deployed in my one-node Kubernetes cluster has the most CPU usage, basically sorting all the different images by their CPU usage.
The third use for PromQL is alerting. Alerting rules you configure as a YAML file in Prometheus with an alert name, some human-readable metadata, routing labels, and so on, but the heart of every alerting rule is actually the PromQL expression that I highlighted here. In this case, this is an example alert from the kube-prometheus project to alert when the file system is about to run full.
So let's just look at an example of PromQL, building a query from very simple to more complex, and talk about the overall nature of the query without diving too deeply into each individual query concept. We might start out just selecting all time series that have a given metric name, in this case the counter of all HTTP requests, and this might give us maybe 10,000 different time series with different labels on them.
So we look at the last five minutes of each of these returned counters (maybe there are only one thousand now within this job), and then we want to calculate the per-second rate averaged over these five minutes. And then, as we're constructing our query further, we don't actually want to see 1,000 individual rates. We want to sum them up by the different paths that these HTTP requests happen on. So we add a sum by path around this, preserving the path dimension.
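To make the shape of that query concrete, here is a minimal Python sketch of the two steps on made-up data: a per-second rate over the window for each counter series, then a sum that keeps only the path label. The sample values, label names, and helper functions are illustrative assumptions and not taken from the talk; real PromQL `rate()` also handles counter resets and extrapolation, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical counter samples: label set -> [(timestamp_seconds, value), ...]
series = {
    (("path", "/api"), ("status", "200")): [(0, 100.0), (300, 400.0)],
    (("path", "/api"), ("status", "500")): [(0, 10.0), (300, 40.0)],
    (("path", "/home"), ("status", "200")): [(0, 0.0), (300, 600.0)],
}

def simple_rate(samples):
    """Per-second increase between first and last sample (no reset handling)."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

def sum_by(vector, label):
    """Sum all values that share the same value for `label`, dropping other labels."""
    out = defaultdict(float)
    for labels, value in vector.items():
        out[dict(labels)[label]] += value
    return dict(out)

# One rate per input series, then one sum per distinct path.
rates = {labels: simple_rate(samples) for labels, samples in series.items()}
per_path = sum_by(rates, "path")
print(per_path)
```

Three input series collapse into two results here, one per path, which mirrors how the aggregation reduces the number of series while preserving the chosen dimension.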
Maybe now we're only getting back 10 different paths as a result, and now we might want to know: what is the ratio of bad 500 requests in comparison to the total requests for every path? And now we're introducing a binary operator here, already making the query even more complex. Binary operations in PromQL are really awesome, but they're also kind of a sharp edge because they do automatic joins between the left side and the right side.
They try to find series on the left and the right side that have identical sets of labels and then, in this case, divide those matching elements by each other and propagate that into the result. That only works if you get the labels exactly right and if the underlying data contains exactly the right labels, and there are also modifiers you can apply to these binary operators that you have to get right.
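The default matching rule described here can be sketched in a few lines of Python. This is only an illustration under assumed data: the label sets and values are invented, and the function name is hypothetical. Series are divided only when their full label sets are identical on both sides; everything else silently drops out of the result.

```python
def divide_one_to_one(left, right):
    """Divide series whose full label sets are identical; drop unmatched ones."""
    return {
        labels: value / right[labels]
        for labels, value in left.items()
        if labels in right
    }

# Hypothetical inputs, keyed by the complete label set of each series.
errors = {
    frozenset({("path", "/api")}): 0.1,
    frozenset({("path", "/login")}): 0.5,   # no partner on the right side
}
total = {
    frozenset({("path", "/api")}): 1.1,
    frozenset({("path", "/home")}): 2.0,    # no partner on the left side
}

ratio = divide_one_to_one(errors, total)
print(ratio)
```

Only the `/api` series survives, because it is the only label set present on both sides. Nothing signals that the other two series were discarded, which is one reason mismatched labels are easy to miss.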
So, in this case, for example, let's go back one second here. We were just selecting requests with status code 500, and now we maybe want to see the ratio for any bad request whose status code starts with a five but has any two other digits after it. So we're changing this condition here, this label matcher on the left-hand side, to a regex matcher: five, something, something. We then do have to preserve the status dimension if we want to have the result keyed by every status as well.
But then the binary operator doesn't quite work anymore. We have to tell it: hey, only match on the path label, because on the right-hand side we don't have the status, so we can't match on all labels anymore. But now we have more label cardinality on the left side than on the right side, so we also need a group_left modifier for this binary operator to tell PromQL about that.
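A rough Python sketch of what matching on the path label with a many-to-one modifier does, under assumed data: the label names, values, and function names are hypothetical, and this is a simplified model rather than the Prometheus implementation. The left side keeps its extra status label, while each left series is paired with the single right series that shares its path.

```python
def match_key(labels, on):
    """Project a label set onto only the labels named in `on`."""
    lookup = dict(labels)
    return tuple((name, lookup.get(name)) for name in on)

def divide_group_left(left, right, on):
    """Many-to-one division: match on `on` labels only, keep the left labels."""
    right_by_key = {}
    for labels, value in right.items():
        key = match_key(labels, on)
        if key in right_by_key:
            raise ValueError("right side must have one series per match group")
        right_by_key[key] = value
    return {
        labels: value / right_by_key[match_key(labels, on)]
        for labels, value in left.items()
        if match_key(labels, on) in right_by_key
    }

# Hypothetical inputs: the left side has path and status, the right only path.
errors_by_path_status = {
    (("path", "/api"), ("status", "500")): 0.1,
    (("path", "/api"), ("status", "503")): 0.2,
}
total_by_path = {(("path", "/api"),): 1.0}

# Matching on all labels finds nothing: the right side has no status label.
naive = {
    labels: value / total_by_path[labels]
    for labels, value in errors_by_path_status.items()
    if labels in total_by_path
}

ratios = divide_group_left(errors_by_path_status, total_by_path, on=["path"])
print(naive, ratios)
```

The naive full-label match comes back empty, while the path-only match returns one ratio per status code, each divided by the same per-path total.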
Then we might want to transform these ratios to a percentage, multiplying by 100, and then maybe filter it down to just those path and status code combinations that have a larger than five percent error rate. And this expression is exactly the same expression as this one. It is just indented a bit more nicely so that you can actually start to see the nested tree structure that PromQL consists of, and we could indent it even further to reflect the evaluation order.
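The percentage and threshold steps mentioned above behave like a map followed by a filter. As a small sketch on invented numbers (the keys and values are assumptions for illustration), a comparison against a scalar keeps only the series for which the condition holds and drops the rest:

```python
# Hypothetical error ratios keyed by (path, status).
ratios = {
    ("/api", "500"): 0.02,
    ("/api", "503"): 0.08,
    ("/home", "500"): 0.10,
}

# Multiply every value by 100 to turn ratios into percentages.
percentages = {labels: value * 100 for labels, value in ratios.items()}

# Keep only the series whose percentage is above the threshold of 5.
above_threshold = {
    labels: value for labels, value in percentages.items() if value > 5
}
print(above_threshold)
```

Note that the filter removes series rather than returning true or false for each one, so a result with no rows simply means nothing crossed the threshold.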
So this operator would actually be the root node of the entire operation, and then it goes down to this, which multiplies it with this, and we could draw this as an evaluation tree. So, in the end, PromQL is a language that has arbitrarily deeply nested expressions. These expressions can evaluate to an instant vector, a range vector, a string, or a scalar numeric value, and the nodes themselves can be of different types, such as aggregations.
So this is easy, right? Well, I already knew that it's not always easy, but I did ask on Twitter at the end of last year what people's biggest frustrations with Prometheus were, both for beginners and for advanced users. There were many clusters of answers in the replies: about long-term storage, about scalability, and about some other features.
But the biggest cluster of those answers by far was about PromQL. Here you can see some of the examples. This ranges from having trouble with this binary operator vector matching (we had a couple of those, this one and this one and a couple of others), to surfacing the actual help metadata for metrics in the UI somewhere, to trying to understand some language concepts and just working with the data, and so on. It just turns out to be a challenge to make this a bit clearer.
So, for example, you might just be selecting a metric name that doesn't exist, in which case the rate would produce an empty result, in which case the sum would produce an empty result, in which case the binary operator wouldn't find any matches. It wouldn't even find series on the left- or right-hand side, so in total you get an empty result. You might also get the labels wrong. You might get your regex wrong. You might not actually have time series exported with the status 500 or 5xx label values.
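The way an empty result propagates upward can be shown with a small expression tree in Python. This is a hedged sketch: the `Node` class, the stand-in functions, and the node names are all hypothetical, and the "rate" here is just a pass-through. Each node is evaluated after its children, and the number of series it returns is recorded, so it becomes visible where the data runs out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Vector = Dict[Tuple, float]  # label set -> value

@dataclass
class Node:
    name: str
    apply: Callable[..., Vector]
    children: List["Node"] = field(default_factory=list)

def evaluate(node: Node, counts: List[Tuple[str, int]]) -> Vector:
    """Evaluate children first, then this node; record each result size."""
    inputs = [evaluate(child, counts) for child in node.children]
    result = node.apply(*inputs)
    counts.append((node.name, len(result)))
    return result

def sum_by_path(vec: Vector) -> Vector:
    """Sum values per path label, dropping all other labels."""
    out: Vector = {}
    for labels, value in vec.items():
        key = tuple(kv for kv in labels if kv[0] == "path")
        out[key] = out.get(key, 0.0) + value
    return out

# A selector for a metric name that matches nothing returns an empty vector.
selector = Node("selector", lambda: {})
rate = Node("rate", lambda vec: dict(vec), [selector])  # stand-in, keeps series as-is
total = Node("sum by(path)", sum_by_path, [rate])

counts: List[Tuple[str, int]] = []
result = evaluate(total, counts)
for name, size in counts:
    print(f"{name}: {size} series")
```

Every node reports zero series, and no error is raised anywhere, so the only way to find the cause is to look at the per-node counts from the innermost expression outward.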
You might also get the actual matching modifiers wrong here. If you just omitted these modifiers, you would also get an empty result, because you have the extra status label on the left-hand side and you don't have it on the right-hand side.
Same thing for errors: if you input a PromQL expression that is more than completely simple, you might get some kind of weird, funky error, and you don't know exactly where in the expression it is happening. In this case I can tell you: I think it's because we have group_right instead of group_left. But you're wondering which one of these nodes or subexpressions is actually producing the error.
So before talking about the new text editor and PromLens, I want to give a quick shout-out to Grafana's new Explore mode. Actually, it's not that new anymore; I think it's about a year old or so. It's a mode in Grafana that already gives you a bit more facilities for constructing queries and exploring the data in Prometheus and PromQL, and it has nice autocomplete features.
The downside is that this Microsoft-defined Language Server Protocol is great for local editing and not so well suited for a web-based editor, but we do also want to have nice editing functionality like this built into the Prometheus server at some point. So this is not quite 100% the solution for everything yet. Today I want to talk about two new projects to improve this whole situation. The first is a PromQL text editor with contextual autocompletion that also gives you linting and snippets.
This part is actually open source, partially. It's a collaboration between me at PromLabs and Augustin from Amadeus, and you can actually look at the source code in these two repos that are linked here. Contributions are super duper welcome. So that's already great, but it's not perfect yet. Then PromLens is a commercial product by PromLabs, and this is a query builder, visualizer, and analyzer tool for PromQL.
Then you do get an offline linter that works directly in the editor. Since it's offline, it doesn't have access to the full Prometheus parser, but it has a lightweight offline parser that already detects a lot of common errors. So in this example, it would tell you that you're trying to pass the wrong type into a function.
It explains certain things to you in an Explain tab, so you can select any node of any type and it will do its best to explain what is happening there. In this case, it is visualizing the actual matching between the left-hand side and the right-hand side of the binary operation and also explaining what is happening here. It can show you the help and type of a metric later on.
It includes a full form-based editor for any PromQL node type, so you can actually go into this tree, select any of the nodes, go into this form-based editor mode, and just go wild. So this is great if you have a rough understanding of how PromQL works but you're maybe a data analyst or just not super familiar with every little syntactic detail of PromQL.
So all of those details are actually mapped into this form-based editor. But if you're more of a power user, you can also just switch any of the inline tree nodes into a PromQL editor with all the features we saw earlier, change exactly the things you need, and then switch out of that inline editing mode again to get back to your modified tree view.
Okay, these were just some features of PromLens. I'm also going to quickly demo it, just to give you a feeling of what it actually looks like to use it. Okay, I will probably need to zoom in a bit here for people to be able to see things. This might distort some things, but that's to be expected.
If it's really hard to read because of the zoom level, please let me know and I'll zoom in further, but otherwise I'll continue like this. So the first thing I want to do is just write the same histogram quantile query three different times. For example, I want to calculate the 90th percentile from a given histogram. I could just start out with a snippet and start typing it, right? This is one way of doing it.
A different way of doing it is to go directly into the form-based editor, go over to snippets, and say: hey, I want to calculate the quantile from a histogram. Now I get this tree view here with the placeholders, and I can just directly jump into inline PromQL editing.
If I would like to, I can navigate with the keyboard in here and select the histogram that I want to look at, and then it already detects that, hey, you probably want to add a rate, because you don't want to look at the histogram over all of its time. So we're also adding a rate. This is the second way to get to the same result.
A different way you might start is just to type the histogram you want, and then it already detects: okay, underscore bucket, this is likely going to be a histogram; do you want to add this very common structure around it? And you get the same thing. You could then adjust the labels you see. Say I actually want to preserve the status label here as well, so I'm adding that, and then I'm getting the status label as well. So this is one thing.
I can demonstrate a bit of the drag-and-drop features and stuff like that. If I now, for example, edit this node inline and just remove the group_left, I will get an error here. In this case, it's not visualized properly yet, but it will be soon.
But if I have, for example, group_right where I was supposed to have a group_left, then I also get some helpful hints for what a possible fix could be, and in the future there will also be more action buttons for that here. Okay, this is kind of synthetic data. Let's have a look at one realistic-ish example. I have a one-node Kubernetes cluster set up with the Prometheus Operator, deployed using the kube-prometheus jsonnet files, and these include really crazy alerting rules.
So these are quite complex already, and those are the kind of use case that PromLens is targeting. So, for example (I guess you can't read those), if you're looking at the file systems on given nodes and whether they're full or not, there are alerts for that.
So we could just copy one of those, for example, and look at it. If you just look for this in Prometheus and you click on this alert, it's not going to be very intelligible. If we take this entire thing and paste it into PromLens, it's also not going to be immediately intelligible, but at least now we can work with it.
And now I can start looking at this and see: okay, this actually does return results, but then this filter doesn't. And the predict_linear works, but then the filter also doesn't; it filters away stuff.
So this is good: this makes it very obvious very quickly which parts of the tree actually don't produce data. I actually had a screen-share session with one of my customers where we just went through their alerting rules and basically copied them all into PromLens, and, I don't want to say half, but almost half of them were broken in some way. So basically my answer for most of them was: yeah, sorry, this is never going to alert.
Yeah, I already answered one of them in chat. And the preview: there will always be a free preview. I think the difference is that the free preview version will be licensed in such a way that you're only supposed to (it's still in flux) use it with not millions of time series, and maybe only for personal reasons. And then, if you want to actually use it commercially, there's going to be a license.
Does it work with extended queries, rates of rates? Oh, you mean subqueries. Yes, it does. It should, at least. Is there anything I missed?
How much of PromLens's functionality will be available in Prometheus? So the editor is the main part, which I really think we have to have in Prometheus. The rest I cannot currently justify. PromLabs is just myself, and I can't currently justify completely open-sourcing it, having spent that much time building it.