Description
Weekly demo issue - https://gitlab.com/gitlab-org/incubation-engineering/apm/apm/-/issues/14
Hello, Joe Shaw here, full stack engineer in the Incubation Engineering department. My focus is application performance management, monitoring and observability, as a single-engineer group, so that we can bring an observability platform into GitLab. This is my weekly update video; here's the issue I've created to cover it, and I'll link to the recording soon. You can subscribe to the weekly issue links from there.
So if we have a look at that: previously we evaluated ClickHouse to see how it would work for metrics and time series data, against things like TimescaleDB. It performed very well and I was happy with that evaluation, so we're going ahead with using ClickHouse, and I've been refining a schema design for it. In the existing ClickHouse benchmarks, the schema used for the metrics is very restrictive.
There'd be a table like this for every single measurement, so it might be disk IO, it might be some kernel-specific resource, it might be pod memory utilization, for example, and then it has a tags table that is very fixed, with very specific tags for this benchmark. But in reality you can't have a fixed tags table, because you know your users are going to have arbitrary tags.
Again, this performs very well on the queries, as we'll see, but it is not flexible enough for our use case.
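To make that concrete, here is a minimal sketch (in Python, using the clickhouse-driver client) of what that restrictive layout looks like: one table per measurement plus a fixed tags table. The table and column names are my assumptions for illustration, not the benchmark's exact DDL.

```python
from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")
client.execute("CREATE DATABASE IF NOT EXISTS benchmark")

# One dedicated table per measurement (cpu shown here; diskio, mem, ... would
# each need their own table with their own fixed set of value columns).
client.execute("""
    CREATE TABLE IF NOT EXISTS benchmark.cpu (
        created_at   DateTime,
        tags_id      UInt32,
        usage_user   Float64,
        usage_system Float64
    )
    ENGINE = MergeTree
    ORDER BY (tags_id, created_at)
""")

# A fixed tags table: one column per known tag, so arbitrary user-supplied
# tags cannot be represented without a schema change.
client.execute("""
    CREATE TABLE IF NOT EXISTS benchmark.tags (
        id         UInt32,
        hostname   String,
        region     String,
        datacenter String
    )
    ENGINE = MergeTree
    ORDER BY id
""")
```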
So, moving on from that, we start with a naive schema where we basically just denormalize this and flatten it down, so it has the timestamp, a host, a measurement, a field, a value and an array of tags. Every one of those fields then becomes an individual record. It's a naive approach, so we get a baseline of how badly we can do without putting any effort in, and then we refine that to create what we think is a better solution.
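A minimal sketch of that naive, fully denormalized single table, assuming the column set just described (names and types are illustrative):

```python
from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")

# One row per (measurement, field) sample; tags carried as a plain array.
client.execute("""
    CREATE TABLE IF NOT EXISTS metrics_naive (
        timestamp   DateTime64(9),
        host        String,
        measurement String,
        field       String,
        value       Float64,
        tags        Array(String)
    )
    ENGINE = MergeTree
    ORDER BY (measurement, host, timestamp)
""")
```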
I benchmarked that as well, and then, looking at the actual refined table: the process we went through was to start looking at codecs we could use for better compression. The timestamp is better with a DoubleDelta codec, which is really good for sequences like timestamps, ideally ones with a fixed gap between them, which might not always be the case, but it compresses these very well. We use LowCardinality strings for a lot of the fields.
LowCardinality uses a different data structure in ClickHouse for strings, or any compatible type that doesn't have a large set of distinct values, so it's quite ideal for things like the measurement name and maybe certain tags. In reality it probably wouldn't work for hosts, because you'd have lots and lots of differently named hosts in a real data set.
For example, if you've got clusters in Google Cloud, they're going to have loads of unique hosts, so it might not be appropriate, but for this benchmark we're leaving host as LowCardinality. Then for the actual value, we're changing that to the Gorilla codec, which was designed as part of a research paper from Facebook, where they were building a time series database; the Gorilla codec XORs adjacent binary values.
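Roughly, those codec choices could be expressed like this (a sketch under the assumptions above; the column names are mine, not the final schema):

```python
from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")
client.execute("""
    CREATE TABLE IF NOT EXISTS metrics_codecs (
        timestamp   DateTime64(9) CODEC(DoubleDelta, LZ4),  -- good for near-regular time sequences
        host        LowCardinality(String),                 -- assumes a small set of distinct hosts
        measurement LowCardinality(String),
        field       LowCardinality(String),
        value       Float64 CODEC(Gorilla, LZ4)             -- XOR of adjacent binary values
    )
    ENGINE = MergeTree
    ORDER BY (measurement, host, timestamp)
""")
```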
Then we put in partitioning, partitioning by day, so the actual chunks being written to disk get partitioned, and then, when ClickHouse is looking at queries, it can load and search only the indexes relevant to the particular partitions. So that's quite useful.
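Adding the day-based partitioning to the previous sketch would look something like this (again an assumption of the exact expression used):

```python
from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")

# Same columns as before, now partitioned by day so ClickHouse only needs to
# load and search the index parts of partitions a query actually touches.
client.execute("""
    CREATE TABLE IF NOT EXISTS metrics_partitioned (
        timestamp   DateTime64(9) CODEC(DoubleDelta, LZ4),
        host        LowCardinality(String),
        measurement LowCardinality(String),
        field       LowCardinality(String),
        value       Float64 CODEC(Gorilla, LZ4)
    )
    ENGINE = MergeTree
    PARTITION BY toDate(timestamp)
    ORDER BY (measurement, host, timestamp)
""")

# The per-day parts can be inspected afterwards via the system tables:
parts = client.execute(
    "SELECT partition, sum(rows) FROM system.parts "
    "WHERE table = 'metrics_partitioned' AND active GROUP BY partition"
)
print(parts)
```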
We benchmarked each one of these changes and got gradual improvements as we went along, and then we actually moved the tags and fields into nested structures. So rather than the completely denormalized approach of the naive schema, we bring the fields back into the same table as the measurement, but with the flexibility that those fields are just key-value pairs. You can see the table structure we end up with there, and this is what we're benchmarking against. And, you'll be glad to hear if you've watched my previous videos over the last few weeks, this is the last benchmark I intend to do for a while.
So that's good. One note: we couldn't use the nested structure for the fields here, because I couldn't find a way of getting the codec to apply properly in that format.
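Putting the pieces together, the refined single-table layout reads roughly like the sketch below: tags as a nested key/value structure, with the field kept as plain columns so the value codec can still apply. This is my reading of the description, with assumed names, not the exact schema from the issue.

```python
from datetime import datetime

from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")
client.execute("""
    CREATE TABLE IF NOT EXISTS metrics_refined (
        timestamp   DateTime64(9) CODEC(DoubleDelta, LZ4),
        host        LowCardinality(String),
        measurement LowCardinality(String),
        field       LowCardinality(String),
        value       Float64 CODEC(Gorilla, LZ4),
        tags        Nested(key String, value String)  -- arbitrary per-row key/value tags
    )
    ENGINE = MergeTree
    PARTITION BY toDate(timestamp)
    ORDER BY (measurement, host, timestamp)
""")

# Inserting a sample row: nested columns are written as parallel arrays.
client.execute(
    "INSERT INTO metrics_refined "
    "(timestamp, host, measurement, field, value, tags.key, tags.value) VALUES",
    [(datetime(2021, 10, 1, 12, 0, 0), "host-1", "mem", "used_bytes", 123456.0,
      ["namespace", "pod"], ["default", "web-1"])],
)
```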
So let's have a look at benchmarking this against TimescaleDB. We wanted to make sure we reran that as well, just to verify the previous results and compare them here; I didn't want to reuse the old benchmark numbers, just to make sure we got fresh runs. So we have the default ClickHouse design, which is the sort of optimal design I showed you that wouldn't really work in the real world.
Then there's our single-table naive solution, which we expect to perform quite badly, and the single-table refined solution, which is the one up here. I'm using the DevOps data set, so we're not only getting CPU data in there; I think there are about ten different data sources. So when it comes to having a single table, that table is where all of those data sources go, as opposed to them going into separate tables, which does put a burden on that table in terms of performance.
So here we go. On the metrics loading rate you can see that, while the refined one doesn't perform as well as ClickHouse itself, it performs slightly better than TimescaleDB, so that's good. Volume sizes are pretty small for the refined solution, even smaller than the original ClickHouse one in some cases, so that's great.
The single-table naive solution even beats Timescale in certain cases as well, as you can see here and here. There are a few outliers, but it never performs particularly badly, and this is the 95th percentile, so 95% of all the results in terms of latency are doing much better than this; this is almost the worst-case scenario. I'll quickly scroll through here.
There are only a few queries where the single-table refined one does relatively poorly in comparison, and in this one the TimescaleDB run didn't even return properly. As we get further down here, these are queries that are quite unusual, very high-stress queries, and not at all the sort of thing we would actually be using the data set for; they're just there to put it through its paces. So it's still doing quite well.
In this section again, for example, some of the GROUP BY queries do very well, and as you come back down to some of the final ones it again performs very well, so we can be happy with that. As it gets put into real use, we can then start analyzing the queries that are actually being run and see if we need to do things like materialized columns or views on top of the data set, but I think this is fine for the time being, as far as I can tell.
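If that query analysis later shows we need pre-aggregation, a materialized view over the refined table is one option; purely as an illustration (all names here are assumed):

```python
from clickhouse_driver import Client  # assumed client library for the sketch

client = Client("localhost")

# Roll the raw samples up into per-minute averages; readers would query this
# with avgMerge(avg_value) instead of scanning the raw table.
client.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS metrics_by_minute
    ENGINE = AggregatingMergeTree
    PARTITION BY toDate(minute)
    ORDER BY (measurement, host, minute)
    AS SELECT
        toStartOfMinute(timestamp) AS minute,
        measurement,
        host,
        avgState(value) AS avg_value
    FROM metrics_refined
    GROUP BY minute, measurement, host
""")
```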
So, moving on to the actual CPU and memory use: you can see that it has a similar profile to ClickHouse itself with the normal table setup, and as the data set gets larger the CPU usage is pretty good. Its memory usage over the runs is generally better than in all of the other cases as well, especially for the much larger run down here: it has a very stable memory profile, whereas TimescaleDB is really stretching up to well over 60 gig.
ClickHouse itself is also using quite a lot of memory there, so that's a really good result for the refined schema. I'm happy with that, so I'm going to go ahead and use the schema that we've got all the way up here.
The only other thing added to the issue here was a few links I found while I was designing this: there's an article about how to design time series schemas efficiently, some information about the codecs I'm using, and some nice feature slides. There was also an article about how ClickHouse recently incorporated and has become its own company, which is great to see; hopefully that means a lot more releases and new features from them, which is fantastic.