Description
Weekly demo issue link - https://gitlab.com/gitlab-org/incubation-engineering/apm/apm/-/issues/9
Hello, I'm Joe Shaw. I'm a full stack engineer in the Incubation Engineering department at GitLab, and my focus is on application performance monitoring: how we take that idea and build our own solution for it. So what I'm going to show you, as you can see now, is an issue for my weekly demo.
It's a fairly quick one this week. I've been out a couple of days, and I've been doing a lot of useful Postgres training as well, which has actually been very handy for this work, since it involves doing a bit of Postgres work with TimescaleDB.
So I'll just go through to the issue here, where we've got the evaluation. There are a few links in there for anyone that's interested. What we've started looking at is benchmarking ClickHouse against a few different databases that are prominent in this area, particularly ones that are supported by the Time Series Benchmark Suite, which I think I discussed previously. It's a project created, I believe, by InfluxDB and now maintained by TimescaleDB. Here's a breakdown that you can see of all the supported databases that you can run the benchmarking tool against, and the tool supports a variety of workloads.
The one we're interested in is the DevOps workload. It will simulate metrics for CPU, disk, memory, kernel metrics, all sorts of stuff like that, and give you quite a few tables. The benchmark they have will set up a database, create tables, simulate data, and set up workers to insert that data. It also creates a bunch of queries of varying complexity, from basic selects through to quite complex group-by operations as well.
I will document all those queries in here as well, once I've done the analysis. So, in this sort of matrix I've set up here, what we're looking for is a database that is multi-modal, not just time series, because we want to use this data store for other observability data as well. We want it to be able to handle things like traces and complex data structures, and we really need replication, and probably partitioning as well as things scale up. We also want at least medium to high query flexibility.
What I'm talking about there is, you know, how complicated the queries you can write are: whether you can do complex groupings and joins, things like that. There's also a level of maturity, and I've really based that on how long the project has been running and the amount of traction: likes on the repository code, discussions on Stack Overflow, and things like that as well.
So you can see from that grid a few stand out, ClickHouse being one of them that we're interested in, CrateDB another, MongoDB, and of course TimescaleDB. Timescale itself is metrics-focused in terms of time series, but it is multi-modal because it's built on top of Postgres, so you can put anything you can put in a Postgres database in there. MongoDB is a JSON store database, very flexible, so we'll see how that performs.
CrateDB is one I'm less familiar with, but it does claim to be multi-modal, so I'm just going to keep it in there for the time being. The other ones I've excluded for that reason, and for levels of maturity and things like that. So I'm just working through a checklist now. We're using the Time Series Benchmark Suite here, which we forked from the original one from Timescale. In terms of things we've changed in there, we've got our own evaluation script, which just utilizes all the built-in scripts.
I've had to make some tweaks, fixes, and slight modifications to get it working the way I want. The script orchestrates, for each of the databases, the load, the setup, the number of workers, things like that, and there's a Compose file there that sets everything up. So yeah, we're running it in Docker, I think with Compose behind it. It just keeps it simple, but also it's containerized, which is how we would be running it in production anyway.
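To make the orchestration concrete, here's a minimal sketch of the kind of loop our evaluation script runs. It assumes TSBS's standard per-database binary names (`tsbs_generate_data`, `tsbs_run_queries_<db>`) and flag names as I recall them from the suite's docs; treat the exact flags as an assumption, not a transcript of our actual script.

```python
# Hypothetical sketch of the evaluation loop: build the TSBS command lines
# for each database under test. Flag names are assumptions based on the
# upstream TSBS documentation.

def generate_data_cmd(db_format: str, scale: int, seed: int = 123) -> list[str]:
    """Build the argv for simulating devops data for one target database."""
    return [
        "tsbs_generate_data",
        "--use-case=devops",
        f"--format={db_format}",
        f"--scale={scale}",
        f"--seed={seed}",  # fixed seed so every database sees the same data
    ]

def run_queries_cmd(db: str, workers: int) -> list[str]:
    """Build the argv for replaying generated queries with N workers."""
    return [f"tsbs_run_queries_{db}", f"--workers={workers}"]

if __name__ == "__main__":
    # One database at a time, as in the real runner.
    for db in ["clickhouse", "timescaledb", "cratedb"]:
        print(" ".join(generate_data_cmd(db, scale=100)))
        print(" ".join(run_queries_cmd(db, workers=8)))
```

In practice each argv would be handed to `subprocess.run`, with the Compose file bringing the target database up first.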
So there are a few databases I'm setting up there, and it will bring these up one at a time. I have Prometheus and cAdvisor running to collect general background and container metrics. If you're not familiar with cAdvisor, it's what is used in the kubelet in Kubernetes to collect pod metrics. The runner itself is containerized as well and orchestrates the rest, so that just makes it simple to run.
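For a rough picture of what that setup looks like, here's a hypothetical Compose sketch, one database service plus Prometheus, cAdvisor, and the containerized runner. Image names and mounts are illustrative, not copied from our actual repository.

```yaml
# Illustrative sketch only; service names, images, and volumes are assumptions.
services:
  clickhouse:
    image: clickhouse/clickhouse-server
  prometheus:
    image: prom/prometheus
    ports:
      - "9090:9090"
  cadvisor:
    image: gcr.io/cadvisor/cadvisor
    volumes:
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
  runner:
    build: .
    depends_on:
      - clickhouse
```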
We can override these environment variables and control that. In terms of the data that we have there, just quickly showing you, it spits out a lot of these data files here for different runs and different queries.
If we just have a look in one of these, this is ClickHouse. It gives you a brief, very simple description of the double-groupby-all query. I haven't looked at that query yet, so I'm not sure what it's doing. There's some information about the run and its duration, and you've got percentiles in there, which is very useful, so we'll probably look at the 95th percentile for comparing, which is fairly typical, and there's a query rate throughput there as well.
So what I want to do with this is just write a quick script, maybe a Python notebook, something like that, that will gather all these and make some graphs, so we can do a quick comparison across them all. And then, after that, what we want to do, back on here, let me find the issue, is run that comparison, initially just between TimescaleDB and ClickHouse.
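The comparison step I have in mind might reduce to something like the following: average the p95 samples for each database and query type and print them side by side. The numbers and query names here are made up purely for illustration.

```python
from statistics import mean

# Illustrative input: p95 latencies (ms) per database and query type,
# as would be collected from the result files. Values are invented.
results = {
    "clickhouse":  {"double-groupby-all": [12.3, 11.8], "single-select": [1.2, 1.1]},
    "timescaledb": {"double-groupby-all": [15.1, 14.9], "single-select": [0.9, 1.0]},
}

def summarize(results: dict[str, dict[str, list[float]]]) -> dict[str, dict[str, float]]:
    """Average the p95 samples for each (database, query) pair."""
    return {db: {q: mean(v) for q, v in qs.items()} for db, qs in results.items()}

# Print a simple side-by-side comparison table.
for db, qs in summarize(results).items():
    for query, p95 in qs.items():
        print(f"{db:12s} {query:20s} {p95:6.2f} ms")
```

The notebook version would feed the same summary into a bar chart per query type rather than a printed table.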
TimescaleDB is the only other sort of obvious candidate, and then we'll add CrateDB in there as well, to make sure we've, you know, covered some obvious choices and can back up any decision we make. We'll also make sure we analyze CPU, memory, and disk usage for each one, and present a comparison in this issue. Then hopefully I can make a sensible choice based on that, and we can start building the first iteration.