Description
In this talk, Lili will walk through what prometheus-operator does and the resources it configures and manages. To make the story complete, we will have a look at the kube-prometheus project, which is a set of manifests that allows you to easily set up full Kubernetes monitoring in your clusters and get complete insight into the cluster workloads. We will conclude with a guided walkthrough of some examples of how to monitor your applications using prometheus-operator custom resources.
Okay, so, as Lexi said, my name is Lili, and today we'll be talking about monitoring Kubernetes with the Prometheus Operator.
So, as Lexi already mentioned, I am currently a principal software engineer at Red Hat, but I used to work at Kinvolk as well. I'm an engineer on the OpenShift in-cluster monitoring team, and we work on OpenShift, Red Hat's Kubernetes product. I'm also a maintainer of and contributor to prometheus-operator, kube-prometheus and kube-state-metrics, to name a few, just to mention why I'm, sort of, qualified to do this talk.
So let's get started then. We'll briefly talk about Prometheus, because I'm sure most of you have come across Prometheus by now, and there are many great talks and resources out there that explain the inner workings of Prometheus very well. So we'll just do a brief summary here.
I borrowed this graph from the Prometheus website, so let's just do a very quick summary. Essentially, Prometheus takes care of target discovery, it being a monitoring project, with a target usually being your application, for example, or your infrastructure workload. It also pulls the metrics at every scrape interval and stores those metrics as time series in a custom time series database; we call it the TSDB. Another important thing that it does: it also evaluates alerting rules and pushes those alerts to Alertmanager.
Alertmanager then takes care of proxying these alerts to the correct receivers that you configure, but Prometheus does the actual alert evaluation, and that's something a lot of people don't necessarily realize, because the Alertmanager name is so misleading. The alerting rules are actually stored in Prometheus itself, and this will become relevant at a later point; basically, Prometheus evaluates them against the data that is stored in the time series database, so essentially the metrics.
So that's briefly about Prometheus, but let's move on to the topic of today, which is prometheus-operator. prometheus-operator is part of the prometheus-operator org, actually, and it consists of two projects right now: prometheus-operator and kube-prometheus. We have over 5k stars, which is of course the most important metric out there, both projects have a huge adoption rate, and we recently even added an ADOPTERS file.
The operator actually was one of the first Kubernetes operators in general. It was created by CoreOS, which also coined the term "operator" as we know it today. Essentially, what prometheus-operator does is it manages, configures and operates monitoring components within your Kubernetes clusters.
It also provides very powerful multi-tenancy knobs and features and gives you the ability to, essentially, self-service your monitoring. We also have a really cool logo that was donated to us by Bianca, and yeah, we're now a very fancy project. For the operator to be able to actually manage the monitoring components, you, as a user, need to create custom resources, and the prometheus-operator custom resources are the following ones; we'll look at each one of those and what they do in the next few slides.
So let's start with the most important one, which is the Prometheus custom resource. What the Prometheus custom resource does is it configures and manages Prometheus via a StatefulSet. Basically, the operator, based on your configuration, creates a StatefulSet, depending on how many replicas you choose, and it deploys those and manages them for you. Some interesting fields in the Prometheus spec that you should be configuring are the selectors, which basically select which objects it should pick up; you can match on labels there.
Another one is the alerting field, which tells Prometheus which Alertmanager endpoints it should push the alerts to, which is the thing we mentioned earlier. Another one which is very, very useful is resources, because, as we know, Prometheus is really just a database and you should treat it as such: figure out how much data you'll be working with and size it accordingly. And finally, as we mentioned, the replicas.
With no sharding enabled, you should choose roughly two replicas to have a highly available setup, especially if you have more than a one-node cluster. For a full list of all the APIs and all the fields we have, you can have a look at our prometheus-operator API doc; there's a link to it at the very end. These are just some of the things that you can configure.
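A minimal Prometheus custom resource touching the fields just mentioned might look like this sketch (the names, namespaces and label values are illustrative, not from the talk):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 2                   # two replicas for a highly available setup
  serviceMonitorSelector:       # pick up ServiceMonitors carrying this label
    matchLabels:
      team: frontend
  alerting:                     # where Prometheus pushes the evaluated alerts
    alertmanagers:
      - namespace: monitoring
        name: alertmanager-example
        port: web
  resources:                    # size Prometheus like the database it is
    requests:
      memory: 2Gi
```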
The next one to look at is the Alertmanager resource. Essentially, as with the Prometheus one, Alertmanager is configured, deployed and managed via a StatefulSet. What it also does is configure the Alertmanager instances to talk to each other.
Alertmanagers have a gossip protocol to synchronize the instances of an Alertmanager cluster, and the operator sets that up because you don't want duplicated notifications sent out. So essentially the instances gossip amongst each other to prevent that, as you don't want to be paged three times for the same thing, and prometheus-operator handles that configuration.
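An Alertmanager custom resource can be as small as this sketch; the operator wires up the gossip mesh between the replicas for you (names are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 3   # the operator configures the three instances to gossip with
                # each other so notifications are deduplicated
```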
So once you have your Prometheus up and your Alertmanager up via the custom resources, next you're interested in actually monitoring your things. For that, the ServiceMonitor and PodMonitor custom resources are needed.
What they actually do is let you configure the targets to monitor without needing to learn the Prometheus-specific configuration. Essentially, it's a few very simple fields which let you monitor things very easily. We often get asked the question: what's the difference between the ServiceMonitor and the PodMonitor? Really, it's just that the PodMonitor is relatively new and we historically had only the ServiceMonitor, so people defaulted, of course, to the ServiceMonitor, but we now frequently get those questions. What a ServiceMonitor does is it selects, via label matchers, and we'll see that at the very end, all the services that match those labels and, in turn, scrapes each of the pods that back those services, while the PodMonitor directly selects pods. So that is the difference between them; which one to use depends on your setup. A couple of interesting things you should be configuring, especially if you control all the ServiceMonitors or PodMonitors, are the sample and target limits. Basically, what they do is put an upper bound on what a scrape job can ingest. It's super useful, like I said, in a multi-tenant environment, or when you basically don't know what your users are up to, to prevent some kind of high, unbounded cardinality series, and they're a fairly recent addition.
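A hedged sketch of a ServiceMonitor using those limits (the label values and the port name are made up for illustration):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app     # scrape every Service carrying this label
  endpoints:
    - port: web            # named port on the Service to scrape
      interval: 30s
  sampleLimit: 10000       # fail the scrape if a target exposes more samples
  targetLimit: 50          # fail if more targets than this are discovered
```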
So how does it actually work? How do the ServiceMonitor and the PodMonitor actually work after the resource is created by the user? Let's say you specify a ServiceMonitor or PodMonitor for your application. That gets picked up by prometheus-operator, which in turn creates a Secret, a Kubernetes Secret, with the content of the target discovery; as we mentioned, it translates everything to the Prometheus configuration format. Then the config-reloader sidecar, which runs alongside Prometheus, watches that target discovery Secret and reloads Prometheus if there were any changes to it. So there is no real magic to it; it just basically boils down to actual Kubernetes objects, and we use a Secret because your targets can contain sensitive information.
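Under the hood that Secret just holds ordinary Prometheus configuration. For a ServiceMonitor the generated scrape config looks roughly like this sketch (heavily simplified, with illustrative names; the real output contains many more relabeling steps):

```yaml
scrape_configs:
  - job_name: serviceMonitor/monitoring/example-app/0
    kubernetes_sd_configs:      # discover targets through the Kubernetes API
      - role: endpoints
        namespaces:
          names:
            - monitoring
    relabel_configs:            # keep only endpoints backing matching Services
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: example-app
        action: keep
```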
The next resource is the PrometheusRule. With recording rules you don't recompute expensive queries every time, so you basically save compute. And how do those work? When you create a PrometheusRule, whether it has alerting or recording rules, prometheus-operator, depending on which namespaces it watches and if you've created one in such a namespace, picks up that custom resource and then bin-packs all the rules that you've specified into a ConfigMap, or multiple ConfigMaps depending on the size of the rules, and it essentially mounts those ConfigMaps.
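A sketch of a PrometheusRule carrying one recording rule and one alerting rule (the names, labels and expressions are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: monitoring
  labels:
    role: alert-rules        # must match the ruleSelector of your Prometheus
spec:
  groups:
    - name: example.rules
      rules:
        - record: job:http_requests:rate5m     # recording rule: precomputed and stored
          expr: sum(rate(http_requests_total[5m])) by (job)
        - alert: HighErrorRate                 # alerting rule: evaluated by Prometheus,
          expr: job:http_requests:rate5m > 100 # the firing alert goes to Alertmanager
          for: 10m
          labels:
            severity: warning
```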
So again, this is really important information for whenever you need to debug something that goes wrong, in case something doesn't get picked up, like a PrometheusRule or a ServiceMonitor. One of the newer custom resources that we added recently is the Probe custom resource. Essentially, as the name suggests, it lets you configure how groups of static targets or Ingresses should be monitored.
You do need to deploy something like the blackbox exporter, for example, for it to work. It's one of our newest resources, we have been using it ourselves as well, and it's really powerful.
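A hedged Probe sketch pointing at a blackbox exporter deployment (the prober address, module name and target URLs are made up):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: example-websites
  namespace: monitoring
spec:
  prober:
    url: blackbox-exporter.monitoring.svc:19115  # the exporter doing the probing
  module: http_2xx                               # blackbox exporter module to use
  targets:
    staticConfig:                                # a group of static targets to probe
      static:
        - https://example.com
        - https://example.org
```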
The latest one, which is not v1 yet (technically all the ones that I've mentioned so far are stable resources), is the AlertmanagerConfig, which is also a custom resource. It is really great for multi-tenant environments, and we plan on using it in OpenShift as well.
It also allows you to configure inhibition rules. Inhibition rules in Alertmanager are things that mute all specified alerts that match them whenever another group of alerts is firing. So, for example, when a node-down alert is firing, you don't want 10 other alerts firing after that, so inhibition rules are really powerful for that.
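A sketch of an AlertmanagerConfig carrying such an inhibition rule (the receiver, alert and label names are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example-inhibition
  namespace: team-a
spec:
  route:
    receiver: team-a-pager
  receivers:
    - name: team-a-pager
  inhibitRules:
    - sourceMatch:           # while a NodeDown alert is firing...
        - name: alertname
          value: NodeDown
      targetMatch:           # ...mute all warning-severity alerts...
        - name: severity
          value: warning
      equal: ['node']        # ...that share the same node label
```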
And finally, the last custom resource that we have right now is the ThanosRuler. Thanos is a project that's part of the CNCF, and we use the Thanos sidecar, which I'll mention later, and the ThanosRuler in our prometheus-operator, but you can deploy Thanos on its own as well to make, like, the full story. Essentially, the ThanosRuler is really powerful when you connect it to multiple Prometheus instances, for example. What it does is it evaluates Prometheus rules, so the recording and the alerting rules, and you can connect it to a Thanos Query API, so you can connect it to any of the Prometheus instances you have. And again, as with all the other components, it's really useful in a multi-tenant setup, where multiple instances of Prometheus are deployed, or where you essentially have one very powerful, huge cluster and you turn it into a very specific multi-tenant environment.
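A minimal ThanosRuler sketch wired to a Thanos Query endpoint (the endpoint address and labels are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ThanosRuler
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 2
  queryEndpoints:                  # the Thanos Query API used to evaluate PromQL
    - dnssrv+_http._tcp.thanos-query.monitoring.svc
  ruleSelector:                    # pick up PrometheusRule objects with this label
    matchLabels:
      role: thanos-rules
```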
Often people end up using just some custom resources out of the box, but one of the things that is really cool is automated sharding. Sharding already exists in Prometheus, but what we do is essentially distribute the load automatically across the number of shards specified, and that's really, really nice. We do get some users using it, but it definitely needs a bit more love.
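Sharding is just another field on the Prometheus spec; a sketch (two shards with two replicas each, so four pods in total, names illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 2   # replicas per shard, for high availability
  shards: 2     # the operator splits the scraped targets across the shards
```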
The next one is the enforced namespace label. This is part of the Prometheus spec, and essentially what it does is enforce adding a namespace label to each of the alerting rules and metrics that the user creates. This is really great for enforcing tenancy per namespace, so a user can never alert on something that is outside of their namespace, essentially.
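In the spec that is the enforcedNamespaceLabel field; a sketch:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: tenant-prometheus
  namespace: monitoring
spec:
  enforcedNamespaceLabel: namespace  # every user-created rule and scrape gets a
                                     # namespace label set to the namespace of the
                                     # originating custom resource
```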
Another one to note, as I mentioned earlier, is the Thanos sidecar. You essentially add it to Prometheus, you enable object storage, and you can connect it all into a really powerful configuration; this is something we heavily use in OpenShift as well.
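Enabling the sidecar is again part of the Prometheus spec; a hedged sketch (the Secret holding the object storage configuration is assumed to already exist):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
  namespace: monitoring
spec:
  replicas: 2
  thanos:                            # run the Thanos sidecar next to Prometheus
    objectStorageConfig:             # upload TSDB blocks to object storage
      name: thanos-objstore-config   # name of the pre-existing Secret
      key: thanos.yaml               # key inside that Secret
```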
So now that we've learned about prometheus-operator, let's have a look at the other project in the same organization, which is kube-prometheus. kube-prometheus, as the name implies, is essentially a group of manifests that lets you easily monitor your Kubernetes workloads out of the box, so things like etcd, the API server, the kubelet, the monitoring components themselves and many more things. What it also does is provide all of these manifests in the form of jsonnet.
For those of you who don't know, jsonnet is a data templating language that extends JSON, and by doing this it really allows you to customize your experience. For example, in OpenShift we bring in kube-prometheus, but we do very specific OpenShift customizations, and we do have links on how this can be done for your environment, so you can essentially customize your workload monitoring.
So, as we mentioned, it brings in a lot of things; let's see exactly what those things are. As mentioned before, these are the Kubernetes own workloads, but it really all starts with deploying prometheus-operator. So the kube-prometheus stack first deploys the prometheus-operator Deployment with all the custom resource definitions it registers, and out of the box it creates an HA setup of Prometheus and an HA setup of Alertmanager: two replicas of Prometheus and three replicas of Alertmanager.
We also install kube-state-metrics, which helps provide insight into your Kubernetes cluster by exporting metrics about all the Kubernetes resources, and the node exporter, which provides the OS and hardware metrics. And we monitor all the Kubernetes components, so essentially the kubelet, etcd, all of those, and the monitoring components themselves, because you should always be monitoring your monitoring system as well; otherwise you can't know it's reliable.
We also provide a bunch of Grafana dashboards and alerting and recording rules out of the box for all the Kubernetes things as well, so to visualize this a bit better.
So now that we've seen how the Kubernetes own workloads are monitored, let's see how you would monitor your own applications. We monitor the monitoring system, we monitor Kubernetes, but how do you monitor your applications? It's actually fairly simple.
Really, as we mentioned, we have our ServiceMonitor on the very right side here, which has the matching labels highlighted, and what that matches is our Service, which is in the middle. The Service in the middle has the selector that matches all the pods which, in turn, are deployed by the example Deployment. We also expose the metrics port in the Service, and that's it, that's all the magic.
Everything else is taken care of by prometheus-operator, and your application only needs to expose the /metrics endpoint, and then it's picked up by Prometheus itself.
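The chain described on that slide can be sketched as two manifests whose labels line up (all names, namespaces and labels are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: example-app
  namespace: default
  labels:
    app: example-app       # <- the label the ServiceMonitor matches on
spec:
  selector:
    app: example-app       # <- matches the pods of the example Deployment
  ports:
    - name: web            # <- the named port the ServiceMonitor scrapes
      port: 8080
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: default
spec:
  selector:
    matchLabels:
      app: example-app     # select Services carrying this label
  endpoints:
    - port: web            # scrape /metrics on this port of each backing pod
```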
So, often things don't work out of the box, if you, for example, misconfigured some labels, and the best way to go about troubleshooting is to go to the Prometheus UI, to the /targets page. Here you will see all the targets that Prometheus discovered or, in turn, couldn't discover. In the case of the screenshot here, for example, Prometheus just couldn't scrape the Thanos sidecar.
You can also use this handy command to see what is actually in the Secret itself that is created by prometheus-operator; you can do that by grepping for the ServiceMonitor, PodMonitor or other resource name. We also have a linter tool, which is essentially a way to validate your custom resources; you can add it as part of your CI or just run it locally.
So, as a conclusion, I think we all learned a bit about the basics of Kubernetes monitoring and how to monitor your workloads with prometheus-operator. Let's also have a look at some helpful docs and where you can ask for help. Essentially, we have a new website coming soon; I think Matthias is working on it now, so you can bookmark it and we'll let you know when it's ready.
It should contain things like guides and blog posts, any talks we do, and just, in general, be a good place for prometheus-operator, but also Prometheus itself. We also have a section in our GitHub where we have troubleshooting docs, so feel free to contribute as well if you found any good troubleshooting tips. We have a Slack channel on the Kubernetes Slack, prometheus-operator, and also prometheus-operator-dev in case you're contributing something, and you can always open an issue on GitHub on either of the repos.
So I think I'll share my slides afterwards and you can click around. We also recently started creating a wiki of runbooks for alerts; it's located in the kube-prometheus project, and anyone can edit it and add their own runbooks, so it's essentially community-driven.