From YouTube: Scalability Team Demo - 2021-09-02
A
Okay, so I have the first item, and that is that I want to share a little bit about the rollout of one of the new RPCs we built for the Git fetch efficiency project. And yeah, let me just share my screen. So this is a CPU utilization graph from the process exporter, which shows data summed across all Gitaly servers in production, and it shows two process groups. One is gitaly-hooks, here at the bottom, and one is Gitaly. And gitaly-hooks was created to make very small, relatively boring RPC calls.
A
But when we introduced the cache, it became a conduit through which all pack-file data has to pass. So then it started using a lot more CPU. And the first new RPC we deployed and turned on changes what gitaly-hooks uses, and you can see here the moment when we turned it on, so that is a nice drop in CPU.
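A minimal sketch of the kind of query behind a per-process-group CPU graph like this, assuming the standard process-exporter metric and label names (namedprocess_namegroup_cpu_seconds_total, groupname) and a hypothetical Prometheus endpoint and type label; the real dashboard query may differ:

```python
import requests

# Hypothetical Prometheus/Thanos endpoint; not a real internal URL.
PROMETHEUS_URL = "http://prometheus.example.com/api/v1/query"

# CPU seconds per second, summed across all Gitaly servers, split by process
# group (e.g. "gitaly" vs "gitaly-hooks"). Metric and label names assume the
# standard process-exporter conventions; the type="gitaly" selector is assumed.
QUERY = (
    'sum by (groupname) ('
    'rate(namedprocess_namegroup_cpu_seconds_total{type="gitaly"}[5m])'
    ')'
)

resp = requests.get(PROMETHEUS_URL, params={"query": QUERY}, timeout=30)
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    print(result["metric"].get("groupname"), result["value"][1])
```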
A
B
It's definitely on an interval. It's not constant, yeah.
A
Yeah, so we're missing short-lived processes, and gitaly-hooks is exactly a short-lived process, so there will be more seconds spent in CPU time for gitaly-hooks than this graph will show. Now, the other nice thing is that if you look at this graph of Gitaly CPU utilization, then you also see a drop, and in absolute terms it's actually a similar drop, because here we're in the, say, mid-90s and here we're in the mid-hundreds, between 100 and 110. So it's a drop of about 10 CPU seconds per second, and here...
A
Well, this is a slightly bigger drop here. It's something like 18, and now the peaks are four. So that's 14 CPU seconds per second, but it's the same order of magnitude drop. And it could be that we also deployed something else to Gitaly at the same time as when I changed the feature flag, but I think it makes sense that it was this.
B
A
Yeah, I thought so too, thanks, but yeah. We can't really be certain, because we just don't have enough insight into, well, the Gitaly process. There's a whole lot of different things, and some of it is spent on these hooks. But we can't go back in time and say, well, at that time the Gitaly process spent x percent of CPU seconds on hook traffic.
A
We can a little bit by looking at the Google Cloud profiler, but it's hard to get useful data out of that, I found. So that's one thing, and then another thing I wanted to show is gRPC message rates, and I have to filter this if I want to show a graph, because there are too many servers and too many RPCs, and we don't have a recording rule for these. So if I try to draw a graph across all servers, it just won't render.
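As a rough illustration of the filtering being described, a per-RPC message-rate query narrowed to a single server might look like the sketch below; the metric name follows the usual go-grpc-prometheus convention (grpc_server_handled_total), and the job and fqdn values are placeholders:

```python
# Without a recording rule, the raw series set (all servers x all RPCs) is too
# large to graph, so the query is narrowed to a single node first.
# The job and fqdn label values below are hypothetical.
QUERY = (
    'sum by (grpc_method) ('
    'rate(grpc_server_handled_total{job="gitaly", fqdn="file-01.example.com"}[5m])'
    ')'
)
print(QUERY)
```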
A
So this is the primary of the Praefect cluster where the handbook lives, and what you can see here is that you have these little bumps, which are PackObjectsHook, which is the RPC we replaced, and here these bumps are gone. And it's interesting that these peaks are up, I don't know what's up with that, but the good thing is that these are PostUploadPack messages, and we're going to replace that RPC too.
B
A
Network transmit bytes, node... this one, yeah. Oh, and then I can select...
A
I don't understand what this thing is doing. FQDN is file-praefect.
A
I did try to get some global numbers, but for that I could only do a table instead of a graph, and I did a one-day rate and a one-day rate offset by one week.
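The week-over-week comparison described here is roughly the following pair of instant queries, rendered as a table rather than a graph; the metric name is assumed:

```python
# Current one-day message rate per RPC, and the same rate one week earlier.
# Evaluated as instant queries, these render as a table rather than a graph.
RATE_NOW = 'sum by (grpc_method) (rate(grpc_server_handled_total[1d]))'
RATE_LAST_WEEK = 'sum by (grpc_method) (rate(grpc_server_handled_total[1d] offset 1w))'

print(RATE_NOW)
print(RATE_LAST_WEEK)
```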
A
So that gives us something to go by, and you see that one week ago we have 300k for PostUploadPack, 200k for SSHUploadPack and 100k for PackObjectsHook, and now we have 300k for PostUploadPack, 200k for SSHUploadPack, and then it drops to 10k for InfoRefsUploadPack, which is an order of magnitude less. So out of the top three it's sort of three, two, one, and we've dropped the one that was one. I don't know if this is a good measure for the improvement, but if this means that we can expect a drop in CPU that is three times bigger once we patch this up, that would be very nice. But I think that's wishful thinking, and we won't know until we get there, but anyway.
A
B
Can you, just for our entertainment purposes only, can you drop the Gitaly side of it?
C
A
Yes, yes, it does, but it's only one small piece of what's going on, and yeah. No, it does look really good, and it's interesting because we have had an ongoing support escalation with a customer where they're concerned about Git fetch performance, and it looks like the overhead of the hook executable is part of their problem, although a lack of concurrency limits is also part of their problem.
B
Do we ship any concurrency limits in Omnibus?
A
No, I think they're all off by default. We should probably put them in, yeah, because, well, right now, the way I see it from where I'm sitting, it looks like there's a difficult conversation going on convincing them to turn those limits on, and if it was the default, we wouldn't even be having the conversation. But yeah, a lot of people, including Matt from our team, are on top of that, and I don't want to get too mixed up in it.
B
A
Yeah, unbounded is never good, but for some reason it's a hard sell in that case. But to connect this to that support situation: it's nice that we can show that the thing that is using a lot of CPU in their case is dropping by 75 percent in the latest release.
B
A
Yeah, but part of the story here is also that when we built the cache, the plumbing of the cache, the fact that we make this extra RPC call back into Gitaly and that the cache is implemented there, that adds overhead. And for people who are not using the cache, they are getting the overhead and they're not getting the benefits of the cache.
A
And when we worked on this on gitlab.com, it appeared that the overhead didn't really make a noticeable impact, and it's more uniform if you just always have the same data path, and you can already log cache keys and stuff like that. So at the time we decided, let's just always go through the hook, but it turns out that in their situation the overhead does matter.
A
But yeah, on top of that, unbounded concurrency is also a problem. But what they're actually going to do now is, I think, backport Gitaly patches to not use the hook when the cache is off, which will help that particular installation.
B
A
Perhaps. One of the concerns we had at the time is that, especially on the handbook repo, we write every byte that gets served into the cache. So if somebody's hosting GitLab on a Raspberry Pi with an SD card as the repository storage, then all their traffic becomes SD card writes. That's a very exaggerated and contrived example, but doing a lot of writes can have a negative impact. A lot of disk writes can have a negative impact on the whole server, so we were concerned about just dropping those disk writes on everybody.
B
D
Well, I hope we see Andrew back in a minute. Yaakov, quick question for you: is this the MR, 3812, that I linked in the demo?
A
No, that's a different one, but this one I think made this bump a little lower. Is there something special about...
D
That MR? No, I'm trying to find the MRs you're talking about, because this type of performance save, even if they're not reliable across every single self-managed customer, would be nice to highlight in the release blog post. And I know you're not necessarily interested to highlight those things, but I think we should, given that, you know, there is a lot of work that goes into this.
A
Yeah, I do want to highlight these things, and I'll add a link to the rollout issue to the agenda. Excellent.
A
In
the
upcoming,
this
changes
feature
flags
and
in
the
upcoming
release,
I
asked
with
the
kisly
team
and
they
want
the
feature
flag
to
be
off
in
the
upcoming
release,
because
we're
calling
a
new
rpc
so
otherwise,
during
a
deploy,
it
looks
we'll
start
calling
an
rpc
that
doesn't
exist,
so
it
would
be
next
release,
plus
one
where
we
can
tell
people
that
it's
on.
D
Okay, great. I think it's more important than ever now to actually show that we are having some actual orchestrated work helping with some resource consumption and performance improvements, and so on. So keep that in mind. So I already see that you are, so...
A
That's good, yeah, yeah. And in the case of this, I don't know when you joined or how much you caught of this story about this support escalation with a self-managed customer.
A
I
I
can
tell
from
the
reactions
that
people
have
been
communicating
this
work
towards
the
customer,
so
it's
we're
already
using
it
there
to
show.
Of
course,
that's
just
one
one
audience
member.
We
want
to
reach
with
this
message,
but
it
it
is
being
communicated
already
is
awesome.
A
Yeah
but
yeah,
so
the
really
big
hope
I
have
is
that
if
we
look
here-
and
we
see
that
before
we
made
this
change,
backobjects
was
a
hundred
thousands
and
that's
the
nature
of
brokeback
is
200
000.
Then
postal
brokeback
is
300
thousands
and
I'm
getting
very.
A
Now,
if
we
have
300
000
messages
per
second
less,
does
that
mean
we
get
a
drop
three
times
bigger
that'd
be
really
nice,
but
I
don't
know,
but
we
we,
we
may
have
a
nice
graph
to
show
when
that
happens,
when
we
get
there.
A
Because
then,
we're
talking
about
because
this
is
a
drop
of
about
10
and
the
whole
graph
is
100
to
110,
so
a
drop
of
of
30
is
near
30.
So
that's
that's
a
lot,
but
the
the
real,
the
real
thing
I'm
hoping
for
is
that
what
I'm
hoping
to
see
is
that
these
bumps
on
these
abdex
graphs
of
gitly
servers
that
we
have
let
fewer
of
these.
That
is
really
the
thing
where
we're
going
for.
E
A
Now, well, I don't think... are we filtering calls that we shouldn't be filtering?
A
Yeah, because some of the ones we're filtering, we should be filtering, because we say that you cannot expect a one-gigabyte clone to happen in one second. So if it goes, one gigabyte...
B
It's
only
unary
calls
and
it's
a
it's
a
it's
a
subset.
It's
a
basket
to
indicate,
and
it's
anything
that
is
like
that
is
generally
taken
out,
like
I
think
the
biggest
unary
one
is
maybe
archive
or
something
like
that,
there's
a
whole
bunch,
but
yeah.
No,
I
mean
that
most
of
the
slow
things
are
out
of
there.
It's
yeah
it's
basically
looking
for
like
get
commits
that
are
slow.
Well,.
A
Yeah, one thing I remember changing is that OperationService, which is part of Gitaly, which makes merge commits and things like that... so I think that got excluded. That's...
B
That's gone now. It became its own service with vastly broader thresholds, and now it's gone, because it was just noise, yeah. We couldn't get it to kind of play nicely. So, you know, we kept adding things to the exclusion list, and then the problem was that the RPS on the service got so low, because we'd excluded so much of it, that it wasn't...
B
It wasn't very good, and then it just became noisy, and then we just said: let's kill the thing. And so OperationService is now totally excluded from all...
B
D
B
F
Yeah, because right now the durations of stuff that are, like, one, two... I don't remember, like, "satisfied" is one second, but, like, a GetCommit that takes one second, or a FindCommit, or whatever they are.
A
F
But that's the reason, what Andrew said: if we pull that apart, then we can say FindCommit, which we sometimes do hundreds of times within a request, needs to be faster than, I don't know, something lower than a second. But then that thing that you just mentioned, to create a merge commit or whatever, can be different.
A
Yeah,
I
think,
what's
also
happening
here-
is
that
with
kittley
we've
been
defining
these
alerts
for
way
longer
than
in
general,
with
error
budgets,
so
they're
more
refined
because
of
that
but
yeah
long
term
it
should
be
easier.
Oh
that's
what
we're
working
working
towards
right!
That's
teams
can
own
these.
F
B
Yeah, I would say, like, Jakob, that they are probably the ones that received the most deltas and changes, because of the amount of alerting, because we have the per-node-level alerting, which is kind of unique.
B
While
it
is
unique
in
our
system
and
we
get
so
many
more
alerts
because
we're
effectively
dividing
the
slis,
you
know
60
ways
and
we
have
60
different
buckets
that
we
putting
those
in
and
therefore
we
get
more
much
higher
volume
of
alerts
because
of
that
people
are
changing
them
much
more
frequently
than
almost
any
other
one.
B
And
so
I
would
say
that
there's
probably
like
a,
we
probably
need
to
go
through
them
at
some
stage
and
kind
of
get
everything
in
order,
because
it's
probably
been
a
thousand
small
changes
that
people
have
done
as
they
are
fed
up
with
getting
an
alert
at.
You
know
three
o'clock
in
the
morning
on
a
saturday
morning
or
whatever,
and
you
know
we
it
it
probably
needs.
Some
consolidation
is
what
I'm
trying
to
say
because
does
that
that
check
files
probably
changed
more
than
any
other.
A
Yeah,
but
I
don't
think
we,
the
kind
of
filtering
we
have,
there
would
be
like
a
stage
group
saying
this
route
should
be
excluded
and
this
route
should
not
be
excluded,
and
this
route
is
this
and
like
we
have
different
threshold
categories
in
the
application
and
and
that
level
of
detail
no
stage
group
can
currently
say
that
about
their
requests.
A
That's
what
I
meant
by
refined.
That's
the
the
way,
the
run
the
the
metrics
catalog
is
organized.
We
can
point
out
individual
individual
rpcs
and
ignore
them
or
not.
F
So wait a second, let me share my screen.
F
So right now, this is running on my local, and it's producing graphs that are not as impressive as the ones that Jakob was showing, because they're fake. But these are the SLI kind of metrics that we want: this is going to be the metric that we will allow stage groups to set thresholds for, and it's going to have two counters, the total counter and the success counter.
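A minimal sketch of the two-counter idea using the Python prometheus_client library; the Rails application uses its own metrics layer, and the metric and label names here are invented for illustration:

```python
from prometheus_client import Counter

# One "total" and one "success" counter per endpoint; the SLI is then
# success / total over some window. Names and labels are hypothetical.
REQUESTS_TOTAL = Counter(
    "application_sli_requests_total",
    "Total requests counted towards the SLI",
    ["endpoint_id", "feature_category"],
)
REQUESTS_SUCCESS = Counter(
    "application_sli_requests_success_total",
    "Requests that met the SLI's success criteria",
    ["endpoint_id", "feature_category"],
)

def record_request(endpoint_id: str, feature_category: str, success: bool) -> None:
    REQUESTS_TOTAL.labels(endpoint_id, feature_category).inc()
    if success:
        REQUESTS_SUCCESS.labels(endpoint_id, feature_category).inc()
```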
F
What I wanted to point out here is this nice period where everything's zero. These are the series that we will have recorded when the Rails application has just started on a new pod, for example. Where before that would be missing, now that will be zero, which makes it easier to calculate with, and it will avoid missing metrics, like we see now for the error budget, when suddenly there's a huge spike or a huge drop when the metric starts to record, when it's coming from nothing to something instead of going past zero.
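A sketch of that zero-initialization idea, continuing the hypothetical counters from the previous sketch: touching every known label combination once at boot makes each series exist at 0, so a freshly started pod reports zeros instead of gaps:

```python
# Assumes REQUESTS_TOTAL / REQUESTS_SUCCESS from the previous sketch.
# Hypothetical list of endpoints known at boot time.
KNOWN_ENDPOINTS = [
    ("ProjectsController#show", "projects"),
    ("API::MergeRequests GET /merge_requests", "code_review"),
]

def initialize_sli_counters() -> None:
    # Calling .labels() creates each child series at 0 without incrementing it,
    # so Prometheus scrapes an explicit zero right after the process starts.
    for endpoint_id, feature_category in KNOWN_ENDPOINTS:
        REQUESTS_TOTAL.labels(endpoint_id, feature_category)
        REQUESTS_SUCCESS.labels(endpoint_id, feature_category)

initialize_sli_counters()
```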
F
This also means that we will not have, like, one side of the graph being there and the other not. This shows the number of metrics: so this is the number of endpoints that we have in total, and we can see that total and success is the same, like, yeah.
F
One idea that I had while working on this merge request was limiting what endpoints we initialize in the beginning, based on the fleet that we'll be emitting them from. So, for example, the API fleet doesn't need to initialize all the controllers or the GraphQL controller, but it does need to initialize all the Grape endpoints.
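That per-fleet idea could look something like this, again continuing the invented names from the sketches above; the point is only that each fleet pre-creates just the series it can actually emit:

```python
# Assumes KNOWN_ENDPOINTS and the counters from the previous sketches.
# Hypothetical mapping of fleet name -> predicate for the endpoints it serves.
FLEET_FILTERS = {
    "api": lambda endpoint_id, _category: endpoint_id.startswith("API::"),
    "web": lambda endpoint_id, _category: not endpoint_id.startswith("API::"),
}

def initialize_sli_counters_for_fleet(fleet: str) -> None:
    keep = FLEET_FILTERS.get(fleet, lambda *_: True)
    for endpoint_id, feature_category in KNOWN_ENDPOINTS:
        if keep(endpoint_id, feature_category):
            REQUESTS_TOTAL.labels(endpoint_id, feature_category)
            REQUESTS_SUCCESS.labels(endpoint_id, feature_category)
```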
F
B
So it's funny, because I came into this call having spent quite a bit of time with Jegos, because he recently added, or someone in one of the teams that he works with added, some metrics, and they were basic... they couldn't even run it in table mode.
B
You know, with the one-minute rate, and it was just crashing, and kind of people are saying, well, now that everything is in pods, it's just the cardinality of everything, and everyone's kind of starting to complain about this; it's becoming a big thing. The very interesting part was that we then went straight to the Prometheus server and we're getting, you know... so we skipped Thanos, and we started getting much better results, instant results.
B
A
B
D
F
A
B
F
B
So what I actually just put at the bottom of the agenda, but what I always think, because, you know, we think in Apdexes and error budgets and that, but a lot of the engineering teams are still looking at, you know, histogram_quantile. You know, they want to know what the p95 is, even though it's terribly inaccurate.
B
That's what they're looking for, and they're always trying to do that on the raw data, and that's basically just failing. And I was wondering whether we could generate recording rules like that, that are useful for those quantile things, for all the things that we've got SLIs for, automatically, and then we can give people the option to run those. But you know what I mean, and then that's what they would use rather than the raw metrics. And then we just have to do some education and tell people about that. I'm...
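For context, the kind of ad-hoc quantile query people run against the raw data looks roughly like this; the histogram name is assumed, and the inner rate() over every pod's bucket series is the part that tends to fall over:

```python
# p95 straight from the raw histogram buckets: every pod contributes its own
# bucket series, so the inner rate() can involve a huge number of series.
RAW_P95 = (
    'histogram_quantile(0.95, '
    'sum by (le) (rate(http_request_duration_seconds_bucket{job="gitlab-rails"}[5m]))'
    ')'
)
print(RAW_P95)
```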
F
And this is the part that I'm not sure of, because in the past I've showed you, like on one of the talks, this weird trick that we did to get the feature category onto the HTTP requests total metric, and that we're only initializing part of it, because not everything is emitted from everywhere. And that was with Ben at the time, and Ben was worried about Prometheus, not Thanos, not querying...
A
Well, I suppose it depends on how many pods we have per Prometheus server.
F
B
I mean, I looked at it for something else the other day, and those Prometheus were, like... they're sitting kind of pretty at the moment. I mean, we should go look through it. You can go look at, like, what the sample rates on each of them are, and you can obviously just query that at the Thanos level, but they were pretty good, and yeah. You know, one of the things we should ask Mikkel to start thinking about is how do we add, like, a second Prometheus?
B
F
I like that; that's good. I want everything initialized, because now, like for the error budget, I get people creating an issue because suddenly something has moved to Kubernetes and then the metric hasn't followed yet, and then, yeah, it does these weird things, because the metrics weren't initialized, and that I want to get rid of. It's just easy if you don't need to think about it. But some of the things, like on the Git fleet, we're never going to have the web IDE render...
F
A
F
A
F
A
Well, one thing we could do is query each of these Prometheus servers individually, take a metric that exists across all pods, and do some sort of estimate of what the cardinality of that is, and see... it has a number of how many metrics it can even track, and then we can say, number of pods times 5,000, does it...
F
B
I think the metric that you can look at for the number in each is called prometheus_tsdb_head_series. Sorry, I was urgently scrambling to try to find that; I'll just stick it in here. I think that's what it is, if I remember correctly; I'll just stick it in there. This isn't really a demo, but it's just a little heads up.
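A rough sketch of the estimate discussed above, asking each Prometheus server directly for its head-series count and comparing it with a per-pod budget; the endpoints, pod count, and 5,000-series figure are placeholders from the conversation:

```python
import requests

# Hypothetical per-shard Prometheus endpoints; not real internal URLs.
PROMETHEUS_SERVERS = [
    "http://prometheus-app-01.example.com:9090",
    "http://prometheus-app-02.example.com:9090",
]
SERIES_PER_POD = 5_000  # placeholder figure from the discussion
POD_COUNT = 400         # hypothetical number of pods

def head_series(base_url: str) -> float:
    resp = requests.get(
        f"{base_url}/api/v1/query",
        params={"query": "prometheus_tsdb_head_series"},
        timeout=30,
    )
    resp.raise_for_status()
    return float(resp.json()["data"]["result"][0]["value"][1])

for server in PROMETHEUS_SERVERS:
    current = head_series(server)
    projected = POD_COUNT * SERIES_PER_POD
    print(f"{server}: {current:.0f} head series now, ~{projected} more if every pod adds {SERIES_PER_POD}")
```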
B
But I think there was, like, some concern around, you know, this whole big thing that's quite complicated, and also the fact that there are a lot of other use cases. It's not just Project Horse; it's actually, you know, lots of self-managed customers who could use this. So it was a bit of backwards and forwards. I'll just share my screen quick, sorry, just take that off there. And after a whole bunch of discussion, we ended up deciding that we're going to stick this into the runbooks project.
B
For
now,
and
maybe
in
future,
we
will
take
this,
but
also
the
other
metrics
catalog
and
move
them
out
of
run
books,
but
it
didn't
make
sense
to
kind
of
have
it
half
in
and
half
out
like.
We
should
definitely
keep
this
alongside
the
metrics
catalog
rather
than
a
part,
and
it's
too
hard
to
take
the
run
books,
metrics
catalog
out
of
the
run
books.
You
know,
with
the
with
the
time
frames
that
we
have
now
and
move
that
out.
B
But
what
I'm
kind
of
imagining
is
that
we'll
have
like
several
different,
like
topologies
like
a
get
hybrid
topology
and
that
will
have
its
own
metrics
catalog.
Now
a
lot
of
this
is
going
to
be.
B
Gets
is
the
gitlab
environment
toolkit,
which
is
effective.
I
mean
this
might
actually
end
up
being
called
the
reference
architecture.
The
hybrid
reference
architecture,
topology,
is
probably
actually
a
better
name
for
it.
Because
that's
what
gets
generates
you
know
you
could
you
could
stand
up
a
similar.
D
E
B
The... yeah, Terraform and Ansible, but yeah. So, you know, with Horse we've actually got this going, and it's deploying, and there's only one service, but it was really just to kind of prove that it worked. And so, if you look at GET, and actually if you look at the Helm charts, there's kind of the single service called webservice, which is a funny name, but anyway. And so, you know, just as an experiment...
B
I
created
a
metrics,
a
metrics
catalog
with
this
web
service,
and
you
know
it's
at
the
moment.
It's
only
got
a
puma
sli
in
it,
which
is
like
simple,
but
it's
kind
of
just
to
prove
it
out,
and
what
was
really
interesting
is
the
same
customer
that
I
suspect
the
same
customer
that
jacob
was
talking
about
earlier.
Somebody
who
was
working
on
that
was
saying
it's
really
awful
working
with
the
omnibus
dashboards
and
we
really
need
to
to
put
the
slo
monitoring
in
there
like.
B
How
can
we
do
this,
and
I
think
that
this
is
like
a
really
you
know
we
can
have
if
we
set
up
a
reference
architecture,
metrics
catalog,
we
can
generate
this
on
the
you
know,
for
that
customer
we
can
either
give
them
the
yaml
and
the
json
or
we
can
you
know
if
they
say.
Oh,
you
know,
we've
got
these
other
labels
or
whatever
we
can
just
put
them.
B
We
can
say
well,
you
know,
go
and
edit
your
your
jsonnet
file,
wherever
the
metrics
config
is
and
add
add
the
labels
that
you
need
or
customize
it.
However,
you
need
for
your
environment
and
then
run
it
again,
but
we
can
also
ship
it
with
like
the
the
recording
rules
and
the
and
the
dashboards
like
in
the
same
way,
we
have
the
gitlab
dashboards
project
at
the
moment
with
a
bunch
of
json
in
it,
and
we
could
do
that
and
one
one
of
the
reasons
why
I
like
this.
B
Is
it
kind
of
takes
there's
a
lot
of
things
that
need
to
happen
for
horse
and
like
just
keeping
this
out
is
like
one
less
kind
of
overhead
for
that,
and
it
also
gives
other
people
like
a
big
advantage
for
for
being
able
to
use
these
dashboards.
You
know
they're
not
specific
to
to
horse.
They
are
specific
to
multi-node
gitlab.
F
B
The thing that I want is, like: I want a way that people can, like, look at dashboards and understand quite quickly, like, the health of a GitLab instance. So it's still very much specific to GitLab instances, and I want to kind of be doing the same thing we do on gitlab.com, and having it kind of tied in with the same sort of rate of iteration that we have on those, you know. When I've looked at the Omnibus charts...
B
We could start off with just the CI runners in there, or a few of the CI services, maybe the Git service and the Gitaly service, and actually ask some of the people that are working with that client to give that a try and see what they're... you know, see, because they've already asked for this. So we could get that going quite quickly and give them the YAML and say, here's some rules, apply these rules, and then let's take a look at the dashboards.
B
D
F
B
It would have to live in a different place, not in... not in, yeah. So, I mean, I'm happy to discuss it more, but my main thing is: I really want to kind of get on with building up some dashboards, and, you know, we can move it out.
A
If
you're
trying
to
it's,
I
would
probably
find
it
easier
to
work
with
to
say
in
one
repo,
yeah
and
and
to
discover
what
the
structure
is
and
where
the.
Where
the
the
lines
are,
where
you're
going
to
cut
out
bits.
When
they're
just
directories
in
in
one
repo,
rather
than.
D
A
I went through a project of putting Workhorse into the main repository. There's a... if you want to move fast and try things, or if you...
A
B
So, I mean, my biggest worry was that people would check changes into the runbooks project that would break the downstream project, and then you, like, find out, you know, the next time you try to run it, and it all becomes... And just having it all together sort of solves that, even though there's a bit of extra complexity.
B
Yeah, yeah, so I think that's a reasonable thing. But also, yeah, I'm looking forward to, like... there have been some people that have reached out to me and said, like, oh, we want these dashboards, and maybe, you know, pinging them back and saying: hey, here's, like, something alpha, if you want to try it, you know, give it a try.
B
I
suspect
that
lots
of
people
use
different
job
names
on
their
on
the
names
for
jobs
in
their
gitlab
instance,
and
that's
going
to
be
kind
of
you
know,
because
I
don't
there's
no
standard
on
on
what
you
call
the
giddily
exporter.
So
we'll
do
things
where
we'll
select
job
equals
gideon,
but
someone
else
like
in
omnibus
has
got
a
different
name.
It's
called
like
giddly
prom,
or
something
like
that,
and
you
know
so.
They're
all
they're
all
different
and
that's
going
to
be
a
bit
of
a
challenge.
A
D
B
That's the work that was done, like, last week. We took as much of that conflict, all the conflict that I've seen so far, and we've put it into a single file, and there's one for the GET instance and then there's another one for gitlab.com, and it's got stuff like this: environment has stages, yes or no, so that you don't have, like, a stage label; and there's, like, you know, an environment label, for example, and that's another one.
B
I
think
type
label
is
going
to
be
there
forever,
because
it's
just
like
kind
of
fundamental
to
the
way
we
do
things
but
the
other
labels.
It's
you
can
configure
those
and
it's
got
a
whole
bunch
like
basically
all
the
differences
I'm
trying
to
put
into
one
file
and
then
and
then
you
know
we
can
we
can
do
it
that
way
and
actually,
interestingly,
if
you
go
look
at
a
lot
of
the
kubernetes
charts,
they're
all
doing
this
as
well.
B
So
you
know
all
the
the
the
kubernetes
monitoring
it's
all
presented
in
json,
primarily
they
they
normally
have
like
the
raw
kind
of
default
version
in
yaml.
But
all
of
the
like,
and
for
lots
of
different
projects,
I'm
seeing
them
presenting
it
as
jsonnet,
and
then
people
saying
you
know
they
say
if
you
want
to
change
this,
you
know
put
this
config
in
here
and
change
this
value
and
then
run
js
on
it
and
you'll
get
a
new
and
you
file
that
c2
environment.
B
So that was kind of tied in with the last conversation, but that was just... I did not mention it. Maybe that's when I ran away from the snake. We have... we haven't stopped, but we don't have, for every single SLI, we don't have a rate. So what I was thinking is, for every SLI that's based on a histogram...
B
...we generate, effectively, a sum by (le, significant labels) of the underlying histogram, and then it's very easy for people to do p90, p50, and...
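Roughly, the idea is a recording rule that pre-aggregates the histogram buckets by le plus the significant labels, so that quantiles become cheap follow-up queries; the metric and rule names below are hypothetical:

```python
# Expression a recording rule could evaluate, keeping only `le` and the SLI's
# significant labels (hypothetical names).
RECORDED_EXPR = (
    'sum by (le, environment, type) ('
    'rate(http_request_duration_seconds_bucket[5m])'
    ')'
)
RECORDED_METRIC = "sli_aggregations:http_request_duration_seconds_bucket:rate5m"  # hypothetical rule name

# With the aggregation recorded, p95/p50 become cheap follow-up queries.
P95 = f"histogram_quantile(0.95, {RECORDED_METRIC})"
P50 = f"histogram_quantile(0.50, {RECORDED_METRIC})"
print(P95)
```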
F
B
This is alongside that, yeah. And it comes from, you know, this call that I had, all this discussion I had with Jegosh, where he's trying to do this and it's just failing. And... but should we be facilitating this at all? Because, like...
B
I tend to agree. Like, I think that's probably better in general, but I don't know, in this case, if it was... I got the impression that these steps, there's multiple... there's, like, up to 20 steps in a single request, and that you don't want to be logging every single step, right? You don't want to have, like, 20 log lines per request.
B
But
this
is
a
variable.
This
is
like
a
ci.
These
are
sort
of
ci
processors,
and
so
they
have.
I
don't
you
know
it's
slightly
different,
so
you
couldn't
have
like
20
different
labels
for
20
variable
steps
in
a
in
a
ci
pipeline,
but
I
I
might
be
wrong
on
that,
but
also
the
other
thing
is
clearly
there's
something
wrong
with
thanos
and
it's
because
if
you
go
to
the
underlying
prometheus
to
promethei,
it's
it's
working,
much
better.
A
I was half joking, but I was getting confused, because I was running queries with lots of results, so basically all Gitaly methods, and all the graphs looked like they were at the bottom, and then there was all this white space above, and I thought, why is it sizing the y-axis to have all this white...
D
A
Okay, yeah, well, in this case, switching to classic health... I don't know if it does any good for you.