Description
A whistle-stop tour of the GitLab monitoring capabilities, specifically around Kubernetes environments and metrics (Prometheus).
Incubation: APM SEG - https://about.gitlab.com/handbook/engineering/incubation/monitor-apm/
Kubernetes Observability Stack project - https://gitlab.com/gitlab-org/incubation-engineering/apm/k8s-o11y-demo
This leads on to the sort of work we'll be doing over the next few months to create a SaaS-first, agent-based solution to this problem.
So, as you can see here, I've set up a Kubernetes cluster with minikube, and I've called it "minikube". This is linked to a test server, as you can see here.
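For anyone following along, a multi-node cluster like this can be created with a single minikube command; the profile name "minikube" is the default, and the node count matches what's shown here:

```shell
# Start a three-node cluster under the default "minikube" profile
minikube start --nodes 3 -p minikube

# Confirm all three nodes are up
kubectl get nodes
```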
If you look at my k9s instance, this is the little minikube cluster I've got on my machine: three nodes, ready to go. After a bit of configuration, which I'll show you, it's connected to this local GitLab instance that I'm running here.
So you can see some information about the cluster. I've managed to get it to connect to a local address, changing some of the background settings to do that, and the CA certificate and things like that. It's all local to my machine.
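For reference, the connection details GitLab asks for (API URL and CA certificate) can be pulled out of the cluster roughly like this; the secret name below is illustrative, so list yours with `kubectl get secrets` first:

```shell
# API server URL for the cluster
kubectl cluster-info

# Decode the CA certificate from a service-account token secret
# (replace the secret name with one that exists in your cluster)
kubectl get secret default-token-abcde \
  -o jsonpath='{.data.ca\.crt}' | base64 --decode
```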
So there's no risk in me showing you this, right? So there's the minikube instance. What I've done is use an observability repository that I've set up locally, and I will add links to that repository in the YouTube video description so you can see how it works. I've deployed it into my minikube instance.
As well as that, I've installed Fluent Bit and Elasticsearch to perform the logging operations there, and Jaeger as well, sat on top of Elasticsearch, to do trace monitoring. This is all running in the gitlab-managed-apps namespace, which is a requirement to get this working with GitLab itself, and there are specific requirements around service names and things like that.
So I'll switch back to GitLab here, and we can click on the Health component here, and it loads. It's making a request to my cluster there. There was an error, "getting dashboard validation warnings"; I've no idea what that means, not a clue. But we can see some basic information about the health of the cluster: overall CPU usage and memory usage. We can narrow that down a little bit and look at that, and that's just a top-level bit of information.
If you look at the integrations here, I've just turned on Prometheus for the moment; the Elastic Stack and logging I'll cover in another video. You can see more information about the way you can actually get Prometheus integrated there. Like I say, you have to have it in a particular namespace, with a particular service name, running on port 80, which isn't in the docs. I must create an issue for that.
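To make that namespace/service-name/port requirement concrete, this is a sketch of the kind of Service GitLab appears to expect; the exact service name and labels here are assumptions based on what worked for me, not documented behaviour:

```yaml
apiVersion: v1
kind: Service
metadata:
  # Service name GitLab appears to look for (assumption)
  name: prometheus-prometheus-server
  namespace: gitlab-managed-apps   # required namespace
  labels:
    app: prometheus
spec:
  selector:
    app: prometheus
  ports:
    - port: 80          # GitLab queries on port 80
      targetPort: 9090  # default Prometheus container port
```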
What I can do now is just show you the monitoring space that's now set up.
Now, on this page, I've noticed that there are obviously a lot of integrations here that are irrelevant to my cluster, but they're looking for specific things in the Prometheus instance. This GitLab instance is connecting through via the API server of this minikube cluster that I've got.
A
I
can't
seem
to
get
system
metrics
to
display
here.
I'm
not
sure
why.
But
if
I
change
this
to
k,
it's
pod
health
at
the
top.
A
When
it
gets
to
it-
and
I
can
pick
a
pod
out
here-
so
it
will
actually
query
into
the
cluster
and
get
me
the
pods.
So
if
we
have
a
look
at
say,
the
elastic
search
master
that
I've
got
running
in
that
cluster,
give
it
a
moment
to
catch
up,
and
there
we
go.
We've
got
some
cpu
usage.
There
see
we've
had
a
little
spike
there.
This
is
all
sort
of
caused
by
millicourse,
but
container
memory
metrics.
A
As
expected,
fairly
high
memory
usage
for
elastic
there,
800
meg
and
you've
got
some
network
spikes
there.
You
can
see,
allegedly
not
using
any
disk,
which
is
highly
unlikely.
So
you
can
see
there.
We
can
get
some
basic
metrics
here
and
we
can
get
anything
from
the
you
know
get
these
from
the
pods
there
and
it
is
possible
to
customize
this.
So
you
can
create
new
dashboards
and
those
use.
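Panels like these come down to PromQL queries over the cAdvisor metrics that Prometheus scrapes from the kubelets; a container memory panel for that pod could be built from something along these lines (the pod name pattern is just illustrative):

```
sum by (container) (
  container_memory_working_set_bytes{
    namespace="gitlab-managed-apps",
    pod=~"elasticsearch-master.*"
  }
)
```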
So there we go; you can explore around and see what's going on there. So it's fairly limited. I have seen that you should also be able to view the logs here.
I know there is some implication that you need Elasticsearch, but I think it should be able to go in there and pull the logs out, and it doesn't seem to be able to at the moment. I'm not sure why that is. It doesn't seem to be able to pull the pod information, even though we saw that working in the previous step, so I'm not quite sure what's going on there.
The other aspects of Monitor in the sidebar, you can see, are tracing, and all that allows you to do is essentially add a Jaeger URL here to link to, and other areas like error tracking, alerts and incidents, which I'm not going to go into, but they are areas this would look to feed into in the future to complete that DevOps cycle.
A
If
we
look
at
the
monitor
settings
generally
here,
you
can
see
that
you
can
actually
configure
dashboards,
so
you
can
link
to
an
external
dashboard.
Likewise
tracing
error,
tracking
setting
up
alerts
a
grafana
instance
so
that
you
can
embed
who
could
find
the
dashboards
in
that
metrics
area
there.
So
so
there
is
a
certain
amount
of
flexibility
built
in
there.
You
don't
have
to
use
this
built-in
solution,
but
it
does
fit
in
with
the
environments
quite
well.
What I will do now is look at the cluster that we've set up and what we've got in it. As I said, this minikube cluster has this chart, and again, I'll put a link to this repository, this project in GitLab, under the video. This sets up the Prometheus Operator, Elasticsearch, Kibana, Fluent Bit to feed into Elasticsearch, and Jaeger for the tracing, and I'll just give you a quick demonstration of that.
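In case it helps, installing those components by hand would look roughly like this; the chart repositories below are the common upstream ones, and the release names are my assumptions rather than what the demo repository's setup necessarily uses:

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add elastic https://helm.elastic.co
helm repo add fluent https://fluent.github.io/helm-charts
helm repo add jaegertracing https://jaegertracing.github.io/helm-charts
helm repo update

# Everything goes into the namespace GitLab expects
helm install prometheus prometheus-community/kube-prometheus-stack \
  -n gitlab-managed-apps --create-namespace
helm install elasticsearch elastic/elasticsearch -n gitlab-managed-apps
helm install kibana elastic/kibana -n gitlab-managed-apps
helm install fluent-bit fluent/fluent-bit -n gitlab-managed-apps
helm install jaeger jaegertracing/jaeger -n gitlab-managed-apps
```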
A
So
this
is
just
setting
up
all
these
sort
of
common
observability
tools
for
kubernetes,
specifically,
although
you
can
use
these
for
any
environment
really
and
I've
set
these
up
in
the
cluster
under
node
ports
here,
so
we
can
easily
look
at
them,
so
we
have
a
grafana
instance.
Here
we
have
a
jaeger
instance
and
we
have
a
cabana
instance
that
we
can
look
at.
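With everything exposed as NodePorts, minikube can list the URLs for you (assuming the services live in the gitlab-managed-apps namespace as described):

```shell
# Show the NodePort URLs assigned to each service
minikube service list -n gitlab-managed-apps
```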
So if we quickly look at the Grafana instance, you can see that by default with the Prometheus Operator, the kube-prometheus-stack I think it's called, you get a lot of dashboards and functionality just out of the box, straight away, and really some quite decent stuff in here.
Of course, you can customise all this and write your own queries, but you can see things like the USE method (utilisation, saturation, errors) performance-monitoring dashboards for this cluster here, looking at memory saturation, CPU usage and things like that.
A
You
can
look
at,
for
example,
slos
for
the
actual
api
server
for
the
cluster
itself.
You
can
see
this
sort
of
basic
error,
budget
implementation
up.
There
read
availability,
right,
availability,
that's
very
useful
stuff
and
of
course
you
can
it
being
funny.
You
can
dig
down
into
any
of
these,
it's
very
responsive
and,
of
course
we
can
pick
up
any
part
here.
So
we
can
look
in
that
get
lab
manager,
that's
name
space
that
we
set
up
there.
A
Let's
again,
elastic
is
always
a
good
example
of
these.
You
can
see
a
bit
of
cpu
usage
there
you
can
see
the
memory
usage
is
relatively
high.
You
know
sat
between
a
sort
of
min
and
max
that
we've
set
up
there.
You
can
see
the
the
limits,
requests
and
limits
for
those
pods
that
have
been
set
up,
which
is
really
useful.
So we can look at, you know, that tracing interface. The demo doesn't give you particularly interesting traces, but you can see there we hit that data-source reverse proxy, and we can look at the tags and information there.
So it's quite a powerful observability stack that you can set up really easily from this repository, essentially just by doing a helm install, and there's a Makefile in there that you can look at to easily set that up out of the box with minimal effort. Of course, it's not set up to scale.
A
It's
not
going
to
handle
a
large
number
of
requests.
It's
just
set
up
for
experimentation
anyway.
I
thought
that
might
be
interesting
and
that
will
do
for
the
first
video
and
I'll
see
you
next.