From YouTube: Scalability Demo Call - 2021-05-06
B
So I'm not going to share my screen, you can all read the issue, but I am, I don't know, I was with Craig. I was discussing database transaction time, like real database transactions.
So yeah, I was discussing like time and SQL query timings with Craig and the merchant question. Then I noticed: hey, these things have a feature category label, and I was wondering if and how we should add those to the error budget. So we have thresholds set for queries, how fast they need to perform, and if an endpoint is performing slow queries, we could ding them for it.
D
So if we're using histogram buckets, we don't actually know, like, we know if they're satisfied or not satisfied by the timings, right, like you know if the timings are good or bad. But I was thinking another way we could do that is to have like a counter of database seconds consumed by this whatever, and say that you have a budget of, like the field in the logs, yeah. So you have a budget of x seconds per thing that you're doing, and we can just say if you're exceeding that budget or not.
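A minimal sketch of what that counter could look like, assuming Go with the Prometheus client library; the metric name, label, and budget mechanism here are hypothetical, not the real GitLab ones:

```go
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical counter of database seconds consumed, keyed by the
// feature category label mentioned above.
var dbSecondsConsumed = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "db_seconds_consumed_total", // assumed name
		Help: "Wall-clock seconds spent executing SQL queries.",
	},
	[]string{"feature_category"},
)

func init() {
	prometheus.MustRegister(dbSecondsConsumed)
}

// observeQuery adds one query's duration to its category's total.
// Whether a category is exceeding its budget of x seconds would then
// be a rule over the rate of this counter, outside the application.
func observeQuery(category string, start time.Time) {
	dbSecondsConsumed.WithLabelValues(category).Add(time.Since(start).Seconds())
}
```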
C
It sounds almost like this is orthogonal to error budgets, yes, like, if people have a feature that's fast, then we don't care what it does as long as it's fast. And if we want to say that the database is some sort of resource, and we want to give people a quota of how much they can use the database, that's a nice idea, but that doesn't sound like it's error budgets; that's more about stopping people from using the database too much.
E
Also, surely what you can do is you can have like a measurement of, I think, maybe this is what you were saying, so excuse me if I'm paraphrasing you, but it's basically a histogram of the sum total for the request, right, and then it's just a normal Apdex after that, so you get, that's what's going on, and.
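A sketch of that idea, assuming Go and Prometheus; the metric name and bucket boundaries are made up, with the buckets sitting at the thresholds a normal Apdex calculation would use:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical histogram of the summed database time per request.
// With buckets at an assumed "satisfied" (0.1s) and "tolerable" (0.4s)
// threshold, the usual Apdex formula applies directly:
//   apdex = (satisfied + tolerable/2) / total
var dbTimePerRequest = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "request_db_duration_seconds", // assumed name
	Help:    "Total time spent in the database while serving one request.",
	Buckets: []float64{0.1, 0.4},
})

func init() {
	prometheus.MustRegister(dbTimePerRequest)
}

// finishRequest observes the database time accumulated for one request.
func finishRequest(dbSeconds float64) {
	dbTimePerRequest.Observe(dbSeconds)
}
```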
E
Suggesting, yeah, yeah. And then one is kind of, the one where we measure each request, we actually attribute that to Patroni and not to a particular team, because it's a way of kind of judging the database. But this one would be not Patroni, it would be the actual, yeah, then it would be each category, yeah.
B
It is not so important, it's how we, I like the idea of using a separate thing. And Jakob, to reply to what you said, it's a way of like seeing if people are nice to resources, and I think of it, like I think Andrew mentioned that before too, it's a way of having mechanical sympathy things included in those budgets. Like, then mechanical sympathy becomes how nice you are, like if you're doing a thousand queries in your request, I think that would currently show. I don't know, do we have mechanical sympathy for the database?
C
I was just going to say that if something is slow, like, the users don't care how much something uses the database. So if something uses the database in a bad way, it will show up from the user perspective.
The system, by having, so we want to have the teams make good decisions about what to focus on, and we can have a simple, simpler model where we just say: this thing is slow too often of the time, so something's wrong. Or we can have a complex model where we say: you're using the database a lot, so maybe you're doing something wrong, and you should look at that. But the.
E
Yeah, I see where you, like, basically, you know, the whole thing with the service level monitoring is you're measuring the service quality from the user's point of view, and that's where error budgets come from, and maybe we shouldn't be confusing this with that. Is that another way of putting what you're saying?
C
Yeah, that's one way, and another way to look at it, and I don't have a good view of the bigger project, but that's the way I understand it: we're trying to make a big step in adoption of error budgets across the organization, and if we can get a good result from people, if we can have a simple system that gives useful signals and that changes.
A
The error budgets that we're putting in front of people at the moment, by and large a lot of the stage groups are seeing red bars. So whether or not we add the database information now, it would probably just make the red bars even redder, like it would make it worse. So I think that we've already given them quite a lot to focus on and quite a lot of things that they can improve, and it may be that in future, once things are more green, we can then decide, right.
B
I think it would be easier to track down, because the whole thing all comes down to what was good and what was bad, and you get points to score and points to lose, and if you've lost points on database time and on request time, then you know what to look for, and in other places, like: where am I executing too many queries? But we can start with what we have, for sure.
E
Just one other thing: in addition to that, which I think is a really, really important point, there will also be two things contributing to that error budget, right. Because if you imagine a request coming in, and there's one request that's really, really slow because it goes to an external thing, or it just spins, it's just "for i equals one to a million".
"For i equals one to a million." But on the one, if you look at it overall, they will get like 50, and on the other one they will get zero percent, right. And that's not really fair, because the user doesn't, in both cases it was a really slow request, but we're judging the one, because it was a database problem, by two things rather than one. Does that make sense?
C
You're diluting the strong signal that the user had a bad experience by having this conflicting signal that the database was fast, which the user doesn't care about.
E
Yeah, because if it was like external HTTP requests, and we don't have that covered by this, then you know, we ignore that, and so they only lose one point for that, because it wasn't the database, which is the thing that we're watching on this. I think it's better, as you said, to keep it simple.
B
But we do have SLIs that are not user-faced, like, they are user-facing, but they're not, like, Gitaly is one of them, like the RPC speed that currently gets attributed to the Gitaly feature category or whatever.
C
If Gitaly time, time spent in Gitaly, is part of the error budget right now, I think I would say it shouldn't be, for the same reasons.
F
With the topic, just as a brief tangent, I think it's maybe worth mentioning that if we do have consumers of database connections that really do go through a churn of, just making the numbers up, a thousand connection leases per incoming request.
That is a system-wide degradation class of events, and I don't think the chargeback should necessarily be, we're not really talking about chargebacks, and this doesn't strictly fall in the category of error budgets, but it does fall in the category of overall Apdex jeopardy for the system as a whole, not necessarily for the particular component that's consuming that high churn of lease events. Does that make sense?
Problem in general, yeah, exactly, it would exactly have a knock-on effect for all of the consumers that compete for that connection pool. Do we have a reason to suspect that that's happening now?
E
Certainly on Gitaly, with the Gitaly N+1s, we used to have like five, six thousand gRPC requests, and luckily, over time, with lots of work, we've got that down.
I was just looking through the mechanical sympathy alerts, and I'm not sure whether we're below the threshold or whether we don't have an alert on that, but I'm going to check after this, because then we could look in that channel and see what the worst-case N+1s are. Because I don't see it here, but I know that there's some that are like pretty bad. Gotcha, okay, yeah, cool, thanks.
C
So I put something on the agenda to look a little bit, to talk a little bit about the possible future project I'm working on, which is this thing with the gRPC bottleneck, and I'm not really sure what to talk about, because I didn't really know what audience to expect.
So I can talk a little bit about what we discovered, or give a chance to ask questions about what we discovered, and I can also talk a little bit about some of my ideas for how we can actually do something about it.
Okay, so, what we can do about it. Okay, I think I'm gonna briefly talk about the problem, because I see it's been mentioned in the infrastructure group call and it's getting more attention, and I'm still trying to figure out how to present the problem to people, and what's there, because there are different angles to it.
I think I made that less prominent in the way I presented the findings, but if you're coming from a technical angle, that might be more interesting: just how much memory we're wasting, or how many, which memory allocations we're wasting. And so I want to briefly talk about that, I think, just so that we're on the same page on a technical level.
So, like, the real data: if you want to have in-depth data on the findings, there's just one thread on issue 1041 where I collected data for the different scenarios that I summarized in the table, and the one I want to talk about, the interesting one, is the memory allocations. So I have.
This is a capture of memory allocations when, which one did I just click, that was the first one, so that is Gitaly and a regular GitLab when there are no cache hits, and then we're stalling on the CPU. Basically we're hitting the gRPC bottleneck here, and this is a 30-second profile, and you see here 100 gigabytes of allocations, and this is absolutely horrible, and I didn't really.
If you want to transfer data from one thing to the other in Unix, you need a buffer, and you say, at minimum you need a buffer, and you say to the kernel: please read some data into this buffer. Then you want to write it somewhere else, so you say to the kernel: please write the data in this buffer to this other thing, but you can reuse that buffer. You can do that 10,000 times in a loop and reuse one buffer, never allocate a new one. That's how it's supposed to be.
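In Go, that reuse-one-buffer loop is what io.CopyBuffer does; a minimal sketch of the pattern being described, with the 32 KB size matching the number mentioned later:

```go
package main

import (
	"io"
	"os"
)

// copyWithOneBuffer moves all data from src to dst while allocating a
// single 32 KB buffer, no matter how many gigabytes pass through it.
func copyWithOneBuffer(dst io.Writer, src io.Reader) (int64, error) {
	buf := make([]byte, 32*1024) // allocated once, reused every iteration
	return io.CopyBuffer(dst, src, buf)
}

func main() {
	// Example: stream stdin to stdout. The kernel fills buf, we write
	// buf back out, and the same buffer is reused until EOF.
	copyWithOneBuffer(os.Stdout, os.Stdin)
}
```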
C
But what happens with gRPC is that we read into a buffer, and then we want to send it out as a gRPC message, and then gRPC allocates memory two or three times for the data in that buffer, and then sends that out and throws the memory away again, in a loop. So ten thousand times you are allocating memory to hold a copy of the buffer and throwing it away, and from.
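For contrast, the allocation pattern being described looks schematically like this; a sketch of the shape of the problem, not the actual gRPC-Go internals:

```go
package toy

import "io"

// streamWastefully is the schematic version of the wasteful loop:
// every iteration copies the read buffer into freshly allocated memory
// for the outgoing message, which the garbage collector must reclaim.
func streamWastefully(dst io.Writer, src io.Reader) error {
	buf := make([]byte, 32*1024)
	for {
		n, err := src.Read(buf)
		if n > 0 {
			msg := make([]byte, n) // a new allocation per message
			copy(msg, buf[:n])     // a copy of data we already had
			if _, werr := dst.Write(msg); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}
```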
C
This is three gigabytes per second that you're allocating, and you only need to allocate 32 kilobytes once in the request, and then you can transfer as many gigabytes as you want. And you can see this if I scroll down the threads to, let's see. So the alternative, like the toy implementation, if I look at the memory profile there, it says seven megabytes, 7.5 megabytes, and a lot of this is done by the profiling system, because the profiler is running.
I just wanted to, I don't know if I highlighted this enough in the original presentation, just how crazy this is, and yeah. So that's one thing, and then the other thing I can talk about a little bit is sort of a walkthrough of how the toy version works.
If that's interesting. Okay, I see some nodding. Let's see, I wanted to do this, let's just do it in here, I'm already sharing this. So the easiest way to start, I think, is to look at the client.
So this is the program that emulates Workhorse, and this is the function where we handle PostUploadPack, which is the thing that transfers the bulk of the data. And what happens is that it calls this magical transport Call function, and it says it wants to do PostUploadPack. Now, in this toy we cannot say what repo we want to clone, it always clones the same repo, so in real life this would not just say I want to do.
This is some stuff in case the client compressed the inputs, but really all that happens here is that we take the HTTP request body and we copy it into the connection, so this allocates 32 kilobytes to do the copy, no matter how much data you're copying. Then we call CloseWrite on the connection, which signals to the server that no more data is coming, and then we copy the data from the connection back into the response body.
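A sketch of what that client handler could look like in Go; the function name and the way the connection arrives are assumptions about the toy's shape, not the real code:

```go
package toy

import (
	"io"
	"net"
	"net/http"
)

// handlePostUploadPack emulates what Workhorse does with the toy
// transport: stream the request body in, half-close, stream the reply
// out. conn is assumed to come from the transport's Call function
// (sketched below) with the "OK" handshake already done.
func handlePostUploadPack(w http.ResponseWriter, r *http.Request, conn *net.TCPConn) {
	defer conn.Close()

	// io.Copy allocates one 32 KB buffer internally, however much
	// data flows through it.
	if _, err := io.Copy(conn, r.Body); err != nil {
		return
	}

	// Half-close the write side: tells the server no more data is
	// coming, while we can still read the response.
	conn.CloseWrite()

	// Copy the server's response back into the HTTP response body.
	io.Copy(w, conn)
}
```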
C
This is how I wanted to talk you through it. So transport has a function, Call, and it takes as arguments a function that can create a connection, and the request, which is just some bytes.
So that connection, that thing, is passed in. So first, here, we get a connection, and then it calls this send frame thing and it sends the request, and it needs to obey a deadline when it does that. And then it reads one frame back on the connection, and it compares that to a magic string response, okay, which is literally the letters "OK", and if it sees that, then it gives the connection back to the caller.
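Put together, Call could look roughly like this; a sketch under the assumptions above, with sendFrame and recvFrame following further down:

```go
package toy

import (
	"bytes"
	"fmt"
	"net"
	"time"
)

// okFrame is the magic response: literally the letters "OK".
var okFrame = []byte("OK")

// Call dials a connection, sends the request as one frame, and waits
// for the server to accept. On "OK" it hands the raw connection back
// to the caller; from then on there are no layers in between.
func Call(dial func() (net.Conn, error), request []byte) (net.Conn, error) {
	conn, err := dial()
	if err != nil {
		return nil, err
	}

	deadline := time.Now().Add(10 * time.Second) // assumed deadline
	if err := sendFrame(conn, request, deadline); err != nil {
		conn.Close()
		return nil, err
	}

	resp, err := recvFrame(conn, deadline)
	if err != nil {
		conn.Close()
		return nil, err
	}
	if !bytes.Equal(resp, okFrame) {
		conn.Close()
		return nil, fmt.Errorf("server rejected request: %q", resp)
	}
	return conn, nil
}
```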
C
Still with me? So let's look for a moment at what send frame and receive frame are, so what does that do. I decided, like, let's say a frame is, I needed some sort of chunk of data, and I thought one megabyte is enough to fit like the request metadata for a gRPC call, like we have Gitaly feature flags and authentication metadata, correlation IDs and whatnot; hopefully that all fits in one megabyte.
So first we check it's not more than one megabyte. We set a deadline here, so we don't stall. Then we write the length of the frame, so say the frame is 12 bytes, in this case we'll first write the number 12 in binary on the connection, and then we write those bytes on the connection, and then we remove the deadline again.
So that's all that does, and the opposite reads the length header from the connection into a four-byte buffer, and then it allocates a new buffer for the frame we're receiving, and it reads that many bytes into the frame and returns it to the caller. So it's just exchanging a blob of bytes with a length prefix.
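A sketch of those two functions as described: a four-byte length prefix, a one-megabyte cap, and deadlines around the I/O. The big-endian encoding is an assumption:

```go
package toy

import (
	"encoding/binary"
	"fmt"
	"io"
	"net"
	"time"
)

const maxFrameSize = 1 << 20 // one megabyte per frame

// sendFrame writes len(data) as a 4-byte prefix, then the bytes,
// obeying the deadline and removing it again afterwards.
func sendFrame(conn net.Conn, data []byte, deadline time.Time) error {
	if len(data) > maxFrameSize {
		return fmt.Errorf("frame too large: %d bytes", len(data))
	}
	if err := conn.SetDeadline(deadline); err != nil {
		return err
	}
	defer conn.SetDeadline(time.Time{}) // remove the deadline again

	var header [4]byte
	binary.BigEndian.PutUint32(header[:], uint32(len(data)))
	if _, err := conn.Write(header[:]); err != nil {
		return err
	}
	_, err := conn.Write(data)
	return err
}

// recvFrame reads the 4-byte length header into a small buffer,
// allocates a new buffer of that size, fills it, and returns it.
func recvFrame(conn net.Conn, deadline time.Time) ([]byte, error) {
	if err := conn.SetDeadline(deadline); err != nil {
		return nil, err
	}
	defer conn.SetDeadline(time.Time{})

	var header [4]byte
	if _, err := io.ReadFull(conn, header[:]); err != nil {
		return nil, err
	}
	n := binary.BigEndian.Uint32(header[:])
	if n > maxFrameSize {
		return nil, fmt.Errorf("frame too large: %d bytes", n)
	}
	frame := make([]byte, n)
	if _, err := io.ReadFull(conn, frame); err != nil {
		return nil, err
	}
	return frame, nil
}
```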
C
So the server, it takes a listening address, and I put that in a global variable. So what does the RPC server do? It creates a TCP listener, just with the Go standard library, and it creates a transport server with a handle function, and it tells it to serve on that listener.
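That setup might look like this; the names are guesses, and Session and Serve are sketched right after:

```go
package toy

import "net"

// listenAddr stands in for the global listening address mentioned above.
var listenAddr = "localhost:8080"

// Server couples a listener with the handler that decides what to do
// with each incoming request frame.
type Server struct {
	Handler func(s *Session, request []byte) error
}

// runRPCServer creates a TCP listener with the standard library and
// serves the transport protocol on it.
func runRPCServer(handler func(*Session, []byte) error) error {
	l, err := net.Listen("tcp", listenAddr)
	if err != nil {
		return err
	}
	srv := &Server{Handler: handler}
	return srv.Serve(l)
}
```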
C
We create a git command, we say: standard input is the connection, standard output is the connection, run the command, and then the bytes get copied. Still with me?
So we have a connection, we haven't read anything yet, so the first thing we do is read a frame, and those are the request bytes. And then we create a server session object, which holds the connection and the deadline, and then we call the handler with that session object and the request, and if the handler returns any sort of error, we call reject on the session.
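The per-connection logic just described might look like this sketch, with recvFrame being the frame reader from above:

```go
package toy

import (
	"net"
	"time"
)

// Serve accepts connections and handles each one in its own goroutine.
func (srv *Server) Serve(l net.Listener) error {
	for {
		conn, err := l.Accept()
		if err != nil {
			return err
		}
		go srv.handleConn(conn)
	}
}

// handleConn reads the request frame, wraps the connection in a
// session, and lets the handler decide to accept or reject it.
func (srv *Server) handleConn(conn net.Conn) {
	deadline := time.Now().Add(10 * time.Second) // assumed

	// Nothing has been read yet: the first frame is the request bytes.
	request, err := recvFrame(conn, deadline)
	if err != nil {
		conn.Close()
		return
	}

	session := &Session{conn: conn, deadline: deadline}
	if err := srv.Handler(session, request); err != nil {
		// Any sort of error from the handler rejects the session.
		session.Reject()
	}
}
```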
C
So what is the session? The session holds the accepted connection, it remembers if it's been accepted, and it remembers the deadline. And so the server can choose to accept the connection, and when it does that, that is when it sends back the okay frame; like, the magic bytes "OK" get sent back when the handler code says: okay, I want to do this connection. So the handler also has a choice to say: I look at this request, I don't know what I'm supposed to do with this.
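And a sketch of that session type; the Accept and Reject methods are hypothetical names, and what Reject does beyond closing the connection is a guess:

```go
package toy

import (
	"net"
	"time"
)

// Session holds the accepted connection, whether it has been accepted,
// and the deadline, as described above.
type Session struct {
	conn     net.Conn
	deadline time.Time
	accepted bool
}

// Accept sends the magic "OK" frame back and hands the raw connection
// to the handler; after this there is nothing between handler and TCP.
func (s *Session) Accept() (net.Conn, error) {
	if err := sendFrame(s.conn, okFrame, s.deadline); err != nil {
		s.conn.Close()
		return nil, err
	}
	s.accepted = true
	return s.conn, nil
}

// Reject is called when the handler does not understand or does not
// want the request; here it simply closes the unaccepted connection.
func (s *Session) Reject() {
	if !s.accepted {
		s.conn.Close()
	}
}
```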
B
So the interesting bit is how you'd fit that, because the interesting bit is now where you had the case like: if I get this, I'm going to do this Go thing, and you want to fit that, you want to find a way to do that in Gitaly. Is that?
C
Yeah, well, there are two pieces to the plan, one that I didn't just show you, because I think it's not polite to everybody to take that much more time. But one is that this basic thing I showed you, you can enrich that so that it carries all the metadata of a gRPC call. So you could say: here's a method, here's a protobuf binary encoding of a request message, and here are the headers that include authentication, correlation ID and whatnot. So you can build this.
But the point is that in the middle of the gRPC call we then pull out the TCP connection and we use that instead of gRPC, because otherwise we're no better off than we were before. But the reason I want to wrap it in a sort of gRPC-ish layer is that we can use middlewares: we can get all our logging and Prometheus counters and authentication, that would all work exactly the same, correlation IDs, tracing.
The funny thing is that Gitaly already does something creative when Praefect connects to Gitaly. So Praefect can start a connection that is not gRPC, and Gitaly has a way to hook into the gRPC library and detect that the connection is not gRPC and treat it differently, and right now that is used only in Gitaly for something, it's called backchannel, and it's used specifically for Praefect stuff.
So we can generalize that and say: well, okay, right now you know one type of non-gRPC connection, which is a backchannel connection, but we now have a different type of non-gRPC connection for this new thing that I don't have a name for, and then we do that. So that way we could have it be part of Gitaly.
That layer is super inefficient, but it's yet another layer, and one thing I try to achieve with this toy thing, and I try to work into the design, is to end up in a situation where first there's an exchange between the client and the server about what we are going to do, and then all that stuff gets out of the way and you just get a connection. Like, I want zero layers in between, so that we have the maximum opportunity to do whatever, yeah.
So we can go as fast as possible. So I'm trying to avoid layers.
Well, thanks for letting me talk about code at you for 15 minutes.
A
Can I ask you a quick question before moving off the topic? Yeah, so, reading through your thoughts on how you're going to get there, and that whole issue where you've been writing up all of your notes: at what point are you going to, like, just write up the conclusion? Like, I'm trying to figure out how much further you're going to take it before you say, "but this is the conclusion", and will you get there before you go on leave?
C
I think I should be able to. I find it difficult to get this out of my head, because I have it all worked out in my head, but that's no good, because nobody can look in my head. So for me it's a challenge in somehow getting it out of my head, but the way I've been approaching it is just pulling different threads out of my head and starting an issue thread and writing that. But I think I'm running out of threads in my head.
A
Great, thank you, because then we'll be able to take that, the proposal and the idea, and figure out how we'll slot that into the workload that we have, and when it would be right to pick it up.
E
Okay, just before I do that: I thought I would write that mechanical sympathy alert for the SQL N+1s, because we don't have one. And it uses, it's not like single values, it uses like a p95 of how many requests a call makes to the database, and we've got some rather bad things: the worst seems to be 475 p95 per request, which is pretty terrible, and I'm looking forward to putting that in.
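A sketch of the instrumentation such a p95 could be derived from, assuming Go and Prometheus; the metric name and buckets are made up:

```go
package metrics

import "github.com/prometheus/client_golang/prometheus"

// Hypothetical histogram of SQL queries issued per request, from which
// a p95 like the 475 mentioned above could be computed, e.g. with
// histogram_quantile(0.95, rate(sql_queries_per_request_bucket[5m])).
var sqlQueriesPerRequest = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "sql_queries_per_request", // assumed name
	Help:    "Number of SQL queries issued while serving one request.",
	Buckets: prometheus.ExponentialBuckets(1, 2, 10), // 1, 2, 4, ... 512
})

func init() {
	prometheus.MustRegister(sqlQueriesPerRequest)
}

// recordQueryCount is called once per request with its query count.
func recordQueryCount(n int) {
	sqlQueriesPerRequest.Observe(float64(n))
}
```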
E
But it's something that's been bugging me as something that we need for a while, and what it is, is that we set the SLOs on our SLIs, and we aren't very good at reviewing how those SLIs are performing according to those SLOs, and that creates unhappiness, because some things always alert, because they're basically under. And then the other thing that happens is that some things are just so far off their normal value that when things go wrong, we don't trigger, because the SLOs were set.
Please remember to review the SLOs, and it can give you basically the last 28 days, because I think that's our standard now, and it says: over the last 28 days this SLI has, that, you know, this component has had this availability, and this is where the SLO is. And then you can see how far up or down it is, and so I just created a little dashboard for that.
So let's just, yeah, that doesn't sound like a great idea, Sean. Let me just try sh.
So, can anyone hear me? Yep? Yes, okay, good, okay, okay. So basically what it is, is it presents it as, like, how under or over the SLO the thing is, and so what we've got here is, there's no big surprise there, it's basically nine percent under the SLO, the shared runner queues, because of all the problems that we've had in the last few weeks.
The problem with that is, it's actually very difficult to make it any lower, because basically we're getting outside of the realm of where SLO mathematics works any longer if we make it any lower. So that's pretty much going to have to be a silence, unfortunately, until things get better. But then, with some of these other ones, we tend to sort of be sitting very, very close to the threshold, and maybe we should push them up, but I just thought it would be kind of interesting to build this little dashboard.
This Thanos compactor one I think I'm going to remove, because it almost seems to never run and we get very, very sporadic data, so I don't think it's worth even having in here; it's also an outlier. But kind of what I was imagining is we do a review of this and we basically adjust the SLOs according to where the data is, so that, you know, we're alerting on unusual behavior, we're not alerting on, like, services that are just running poorly and, you know, infrastructure doesn't have anything to do with it.
So what I want to do is take a look at this and then update the SLOs on a monthly basis, or maybe every two months, or however often, but there's two things that I think we have to do before.
We can do that, and the first is that I think we need to give each SLI its own SLO, because at the moment we set the SLO at the service level and the SLIs all have to be the same, and what we find is that they're actually quite different, and there's no real reason why they should all have the same SLO. And so that's one change that I think we should make before we do this.
B
Because I was looking into separating out API, like, GraphQL from the API, because they just look different. But my initial thought was to put the thresholds differently inside the objects for both SLIs, rather than setting a different SLO for.
E
For them. So sometimes you can do that, but it depends on the histograms, right, because, like, some histograms are like, you know, 10 seconds and 30 seconds, and you can't do fine-grained adjustments on those without changing the application. So sometimes you don't really have great things on that, but you know, that's. So if we take a look, this, by the way, clicks through.
So if you go take a look at, like, the Sidekiq one, it's a pretty good example, like the Sidekiq one, once it loads. So that's where it is at the moment, 99.5, and then, you know, we were 1.3 below that actually over the month. However, there's another part to this, and that is that we're only evaluating on the one-hour and the six-hour thresholds, and in order to make this fair, we should also be evaluating on, like, the three-day 10% threshold, to say that we, this, yeah, this one as well is also super sporadic, so I might come up with a way of filtering that out.