From YouTube: Scalability Team Demo - 2020-12-17
Description
No description was provided for this meeting.
A
Hello, so I have the first item on the agenda. I already brought it up to Rachel in our one-on-one doc, and she mentioned there that we should talk about it, but perhaps it's better to talk about it with everybody. We have this long-standing issue where Gitaly Canary is alerting on us, and that's because our repositories are there and they're very busy. Every time we look into it, we find a different mole to whack.
B
I don't know from the isolation perspective, but it seems very hard to disentangle what behavior on Gitaly Canary is because it's running new code and what behavior is because it's using these high-traffic repositories, especially when we've got a bunch of alert silences.
C
How often do we actually find an actual Gitaly problem, and not an "our repo is too busy" problem?
D
Was it a canary installation? Was the code different on there when it was first separated?
C
I'm not sure, it could also be that, so I'm not really sure why we had a canary. Another issue here is the order of deploys and the tight coupling between the Gitaly version and the Rails version. I think at one point we had a rule where we said you should always deploy Gitaly before you deploy Rails, I think.
C
This is why it happened, now that I think back: when doing a deploy, upgrade Gitaly before you upgrade Rails, because in Rails we never account for the fact that maybe we're talking to an older version of Gitaly. Rails always assumes it's talking to the latest version of Gitaly or something newer.
C
So then when we wanted to have a canary Rails, we had a problem because that would talk to an old Gitaly. So we created a canary Gitaly that we can deploy ahead of canary Rails, just from a correctness and compatibility viewpoint. I think jarv would also know, I think he was involved in this. I think that's why it happened.
D
So when we say it was its own problem, do you mean that it was just causing alerts and noise and things to be looked at that were distracting from what the actual problems were, because the usage of it is so different?
C
Yeah, we constantly had people getting paged on the server where gitlab-org/gitlab was, and I know there was a discussion, and there was something slightly iffy about moving it, but it was some sort of pragmatic decision: we're probably better off putting it there and not making it everybody else's problem.
C
I think they shouldn't be, because, and I think we've had this discussion before somewhere, on a big Gitaly server with lots of different repositories you're sort of seeing an average across a lot of different things, but on Gitaly Canary, where it's just gitlab-org/gitlab, it reflects the performance of that one repo a lot, and that repo is used very heavily.
C
A little bit. I mean, we also have marquee customer Gitaly servers. In a way, Gitaly Canary is the marquee customer: it's the marquee Gitaly server for ourselves.
D
But on the marquee Gitaly servers, do they have the same SLIs?
D
Well, that's an interesting approach that we can take, because I feel like we're playing whack-a-mole forever. Well, not forever, but there's always going to be something else that comes up, and that's why I thought it was better to talk about this from a more fundamental perspective. Because if it's fundamentally different, then maybe we need to be treating it differently. Maybe these SLIs do need to be different, and I think finding out what the marquee customer situation is is good.
C
Because that's your question: are we unique? No, we're not unique. Well, what's unique about us is that it's a busy repo and we care if performance is bad, because for the most part we ignore it if other people's performance is bad. We're blind to it, or deaf to it; we don't hear it if their Git performance is bad, so we just ignore it. But if gitlab-org/gitlab CI is stuck, then people on Slack start clamoring, and then we do something about it.
C
Okay, that's a very crude way of putting it, but I think it is more or less true that we have a hard time observing what the user experience is at the individual level, and we look more at the system: how is this server doing, and is the server fine most of the time? And if there are 10,000 users on that server, that doesn't mean that all 10,000 of them are happy, and that's also what marquee customers are about.
C
Those are people where, if they're not happy, we actually have to do something about it, because there's a big contract behind it.
A
I mean, what about the fact that looking into this once in a while has made us improve things for everyone?
C
Yeah, no, we should make it better. I'm not saying we shouldn't, but we should probably treat this the same way we treat marquee performance.
C
No, I think that's an accident, because, well, also, if there's nothing on the canary server, if it doesn't get any traffic, then you get nothing from it. So: let's just give it one of the biggest chunks of traffic we have.
C
Yeah, but then, if it turns out that that works really well, we probably want to do that for our marquee customers too, so it could be sort of a staging or advance test area for what the marquee server should be like. But I mean, the question is again: why aren't alerts going off on marquee servers all the time, and why are things better there?
D
I think that's worthwhile investigating, because asking the question about the difference between this and a marquee server might give some interesting answers, because the idea behind it is the same: you want to have a certain type of usage isolated.
D
Well, for context, a marquee server is somewhere where specific customers have been identified to be moved onto specific sets of hardware. So, instead of having everyone all together, we've isolated certain customers, on request, onto specific hardware to try and isolate noisy neighbor problems.
D
So that's why we're talking about having our own repositories on a separate server, because it seems to follow the same pattern of isolating ourselves as a customer onto specific hardware. We just added the whole complexity of "let's also make this a canary installation", which is like a pre-release environment as well, which, yeah, probably muddies the waters a bit here.
B
I think an actually useful Gitaly canary would probably just be an arbitrary, regular Gitaly server, and, like I've said, this one should be like a marquee server, and then we have two sorts of dimensions to think about there. But I think if you're talking about pre-release code, then that probably makes more sense.
B
I don't know if this is even an important thing to the Gitaly team, but in terms of new deployments, I think, because they kind of have to be coupled to repos, because that's how Gitaly is deployed, isn't it, then it would make more sense for it to just be... because I think we're trying to move towards canary just being "five percent of traffic goes to canary" or whatever, right? The closest we could get is to deliberately pick an unremarkable one.
C
I think maybe we need to talk to people on the delivery team about this, because this whole aspect of what gets deployed when is kind of complicated, and that's really their focus. I'm not even sure why we ended up with this strict relation of deploying Gitaly before Rails; I don't know if that's still true. But if we had an arbitrary Gitaly server act as the canary server, then that would usually get non-canary Rails traffic.
C
And another possibility is that we're stuck with this being the canary Gitaly server for practical reasons, but we can still say we run it as if it's a marquee server: we apply the same standards, SLIs, alerting, or whatever is in place for marquee. We treat it the same, and maybe we already do, but right now we don't know.
D
Yeah, and I think finding out is important. I think finding those things out is one of the first steps to deciding what to do next, because, yeah, I don't want to just spend more time bashing the biggest problems as they seem to come up every week or two, or whatever the next different thing is.
B
And the other thing is, I think you saw there was FindCommit again.
C
Yeah, yeah, probably. It used to be that every request got its own fresh cat-file process and that was slow. And then we came up with this thing, but yeah, it's still not great.
C
Yeah, Gitaly and Git caching is an interesting problem, and that's one of the reasons why I think it's on our agenda to work on that.
C
I don't think it's a problem we ever really solved well. When we started the Gitaly project, I pushed back on having that in scope because I think just getting Gitaly working was hard enough, but the result is that it hasn't been explored much, and it's one of those things that spans across multiple components across the whole application, and we're maybe just not doing it right, or not in the right place.
B
Yeah, I think a while ago I created an MR that would enable this ref caching everywhere. The problem was, I wanted to use the MR to see what specs broke, to get an idea of whether this is feasible, but most of the spec failures were really just spec failures: in a real situation this wouldn't be a problem, but in the context of the spec, the spec is doing one thing, then immediately running a request and expecting to get a different result. In the specs, the request context is the whole spec, for simplicity. So basically I didn't pursue it any further.
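For illustration, a minimal Python sketch (hypothetical names, not GitLab's actual code) of why widening the cache scope breaks tests like this: when the memoization context spans a whole spec instead of a single request, a write followed by an immediate re-read sees the stale value.

```python
# Minimal sketch (hypothetical, not GitLab's implementation) of spec-scoped
# ref caching: the cache's lifetime is the whole example, so the spec's own
# mutation is not visible to the next read and the expectation fails.

class RequestStore:
    """Per-context memo store; in specs the context spans the whole example."""
    def __init__(self):
        self.data = {}

def cached_default_branch(store, repo, read_from_gitaly):
    # Memoize the first answer for this repo within the current context.
    return store.data.setdefault(("default_branch", repo), read_from_gitaly(repo))

def spec_like_scenario():
    branches = {"project": "master"}
    store = RequestStore()                        # shared across the whole spec
    read = lambda repo: branches[repo]

    assert cached_default_branch(store, "project", read) == "master"
    branches["project"] = "main"                  # the spec changes the repo...
    # ...and immediately re-reads in the same context: stale value, false failure.
    assert cached_default_branch(store, "project", read) == "master"
```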
D
I think I agree that there's a lot more we could potentially do to make the access faster, and I think it's a case of deciding if that's the right thing to do. I think starting with the path of finding out the difference between this and the marquee nodes, and determining the best use of canary, and then it might be that keeping the arrangement as it is, is actually the right thing to do, because then it does enable us to look at how we can improve the usage across the board.
D
We just have to decide if that's the best use of the time, given that it's mainly us that seem to do this, and I know that making improvements for us makes an improvement for everyone, because it's all the same thing underneath.
B
Yeah, I think zj mentioned that before as well: if you think about Postgres, obviously Postgres does some other stuff with buffers and so on to keep frequently used rows in memory. But basically, if we want our database query to be quick, then we cache the result on the Rails side, because the Rails side knows how it's using that and knows when it can invalidate it, and I think the same applies to Gitaly.
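A small Python sketch of the point being made (illustrative names only, not the GitLab Rails API): the caller caches the result because only the caller knows when the underlying data changed and the entry can be dropped.

```python
# Hypothetical sketch of caller-side caching with caller-owned invalidation,
# analogous to Rails caching a Gitaly answer: the caller, not the storage
# service, knows when the data it cached has been made stale.

class CommitCache:
    def __init__(self, gitaly_client):
        self.gitaly_client = gitaly_client
        self._cache = {}

    def find_commit(self, repo, ref):
        key = (repo, ref)
        if key not in self._cache:
            self._cache[key] = self.gitaly_client.find_commit(repo, ref)
        return self._cache[key]

    def push_received(self, repo):
        # The caller sees the push that changed the repo, so it can invalidate
        # precisely; the storage side would have to guess or serve stale data.
        self._cache = {k: v for k, v in self._cache.items() if k[0] != repo}
```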
C
Thanks for bringing that up, because I was talking about me pushing back on caching in Gitaly, but that's also one of the reasons for pushing back: it keeps whatever Gitaly says authoritative, and that's what's actually in the repo. Knowing when you can ignore what's in the repo and serve something stale instead is complex.
C
So it's a real problem, but yeah, I don't know. In the past I would have maybe said Create or Source Code owns this, because they own the Rails code that touches Git repositories, but I'm not really sure if that's practically true, because lots of other parts of the application care about stuff that is in repositories. So it might be a little bit like one of those Sidekiq inefficiencies things.
B
I think it is more clearly owned in the Rails application, because only a few teams... there's a Source Code team that essentially deals with how Rails accesses Git repositories. I know that other teams do that as well, but I think they would be the natural owners. They've got a lot on, of course, okay, yeah.
D
So I suppose, after we've done the investigations here about the SLIs and the marquee nodes, when it comes to talking about how to move forward, it's about getting Source Code involved with this and working with them to try and see what can be done next. But I think it also depends very much on what we find with the marquee nodes and the comparisons there.
C
Yeah, and there is another interesting angle here, which is not about jumping in and trying to fix the problem, and that's observability. It's very observable what Gitaly does, because you have these RPC calls and we log them, and we know how long they take. But what happens, and how often, when something in the application tries to look up a commit?
C
It could become an RPC, it could be an RPC call like that. There's stuff that the application tries to get out of the repository, and then there are the things that actually get sent and asked of Gitaly, but what's happening in the Rails app as we try to access a repository is maybe not as observable.
C
About the things that want commits, and what the patterns are there, and things that want the same commits: do they hit a cache in Rails, or does it become a Gitaly call?
C
What does the behavior look like? What are the patterns? What are the opportunities for optimizing, if we see that something gets repeated all the time? I suppose we can make the argument that it's not efficient enough by just pointing to the number of Gitaly calls and saying that needs to go down.
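As a rough illustration of the kind of Rails-side visibility being described (hypothetical names, not GitLab's instrumentation), a Python sketch that counts whether a commit lookup is answered from a caller-side cache or turns into a Gitaly RPC:

```python
# Hypothetical sketch: per-request counters for "served from cache in Rails"
# versus "became a Gitaly RPC", the distinction being discussed above.
from collections import Counter

class InstrumentedCommitLookup:
    def __init__(self, gitaly_client):
        self.gitaly_client = gitaly_client
        self.cache = {}
        self.stats = Counter()

    def find_commit(self, repo, ref):
        key = (repo, ref)
        if key in self.cache:
            self.stats["cache_hit"] += 1      # answered without leaving Rails
        else:
            self.stats["gitaly_rpc"] += 1     # turned into a Gitaly RPC
            self.cache[key] = self.gitaly_client.find_commit(repo, ref)
        return self.cache[key]
```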
C
Not intentionally, but now that you mention it, yes, that is maybe what I'm...
C
We used to have observability inside the Ruby code where Ruby method calls would get tracked, and we would dump crazy amounts of data into InfluxDB to see what the application is doing, and we had to stop that because we were generating too much data. That's why Bob is suggesting maybe we shouldn't go there again.
F
Can I ask one question? What is the current status of the tracing option? I saw that we use labkit-ruby for tracing and stuff, so is it deployed into production yet, and can we get the data from that? Or is that just in my local environment?
B
He's working on that, and I think that's a good point. It feels like we're punting a few things to that, but it would be nice to revisit some of these things once we have tracing enabled in production, and see if that lets us find out the answers to these things without going and building our own thing.
C
So tracing is in a strange state, because the work started one or two years ago, but deploying it into production stalled for a very long time. It is moving now, but it's still not there. That's why in local development you can see tracing, and it's like, hey, we have tracing, so why isn't it in production?
F
Yeah, actually, in the local environment I already realized that it is not very helpful for seeing how the code is behaving. So I think that if we enhance the observability in the tracing, we can have a really clear vision about how our application is doing in production. Maybe we can enable some sampling, about 1 or 0.5 percent, or even just for the long-running calls, and we can get the data from that. So maybe it's really helpful long term.
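For illustration, a minimal Python sketch of the sampling idea being suggested (rates and names are illustrative, not the actual labkit-ruby configuration): keep a small random share of traces, plus any request that turns out to be slow.

```python
# Illustrative sketch of probabilistic trace sampling: ~0.5% of all requests
# are kept, and long-running requests are always kept. Not real LabKit config.
import random

SAMPLE_RATE = 0.005          # roughly 0.5% of requests
SLOW_THRESHOLD_SECONDS = 10  # always keep long-running requests

def should_record_trace(duration_seconds=None):
    if duration_seconds is not None and duration_seconds >= SLOW_THRESHOLD_SECONDS:
        return True                       # keep slow outliers regardless of rate
    return random.random() < SAMPLE_RATE  # otherwise sample a small random share
```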
C
I agree, these arguments are much easier to make: if we want to argue that something can be improved and we have tracing data, that would be easier. Yeah.
G
So it's actually very close. There's a readiness review that's currently underway, and I'll... where should I paste it? Sorry, I don't have the doc open.
G
So if you look at the staging traces, there's a lot of stuff in there that I think we should clean up, and I've got some open MRs to clean up some of it. For example, most of it is health checks at the moment, because we have so many health check requests and they're all getting traced, and we should turn that off. And obviously we want to go to remote sampling and all these features, but as a sort of MVC...
G
We should have it in really soon. Depending on when that is, I think, hopefully, it'll make it before the production change lock at the end of the year; otherwise it'll be early January that we'll have tracing in production, and it will make a massive difference.
D
Where we got to was: we started talking about the fact that we still have alerts silenced on Gitaly Canary, and we had a conversation about what we could do that's slightly different there, and some investigations.
D
That's how distributed tracing came up, and then we brought up: well, we have distributed tracing locally, is this not in production yet? Because that might give us some more guidance.
G
Yeah, so, I mean, I had a call with jarv and marin yesterday about this, because the other problem is that we have affinity for certain projects for canary.
G
So if you go to gitlab-org/gitlab by default, if you don't have any other setting, it will opt you into canary, whereas if you go to, say, F-Droid or another open source project, it will by default opt out of canary. And so because we're driving canary web traffic to web canary, which then goes to the Gitaly Canary...
G
It has a knock-on effect, so we have silences on web alerts for canary, which are also very unhealthy. So there was a discussion, maybe on Monday, between marin and jarv and myself, and we spoke about either breaking up that opt-out scheme that we have on the web... but marin was very against that.
G
Basically, what I'm against is having the silences, and one of the things that we spoke about was, you know, basically the repository is slightly pathological, the main repository on canary, and it's always going to be slow, and because we don't get this kind of normal mixture of traffic...
We.
G
It's
very
problematic,
so
one
of
the
things
that
we're
talking
about
is
until
we
get
the
clone,
the
clone
sharing
or
the
the
pac
file
sharing
or
whatever
we're
calling
it.
One
of
the
things
that
we
should
consider
is
moving
gitlab,
org,
gitlab
to
prefect
and
and
replacing
it
on
on
the
gilly
canary
with
a
bunch
of
other
gitlab
traffic,
because
we're
we're
the
best
testers
of
gitlab.
G
However, it only makes sense on Praefect if we also have distributed reads, and that is not on at the moment. Distributed reads in Gitaly Cluster share the reads between multiple Praefect nodes, and so we might be able to reduce the load that we see on those nodes by sharing it between three nodes. But that's also a beta feature at the moment.
C
What we were wondering, and what you maybe know: how is gitlab-org/gitlab, or the Gitaly Canary, different from the marquee Gitaly servers? Because I presume those are also high-traffic repos with special needs. And do they have different SLIs? Why don't we get alerts from them all the time?
G
Actually, the marquee servers are very... the historical reason why we even have those marquee servers was because there was horrible Git performance back then, and it was getting worse and worse and worse, and our most valued customers were getting the angriest. So we were moving them onto very low utilization machines. So the marquee machines are actually very low utilization, and we never really completed that and moved more people onto them, because it was really an emergency. They're all overprovisioned.
G
Let me just find it for you. It's...
C
So my thinking is, the reason we don't get alerts about the marquee servers all the time is because they're overprovisioned, so yeah, they can just handle whatever we throw at them.
G
So yeah, that was the first option that I listed, and it's kind of an either/or between that and Praefect, or Gitaly Cluster, I think, are we calling it Gitaly Cluster now? The naming seems to be sort of changing.
C
I think Praefect is just a component; it's the routing component. Okay.
G
So here I wrote down all of those. It sounds like we had the same conversation, yeah. Thanks. But there's a note where we just kind of describe all the options, and that machine has got 30 cores at the moment, and if you go look through the metrics, it's totally CPU saturated, so it can take up to one and a half times the amount of time it's able to do work just to schedule the work on a CPU, which is bad.
G
And so we've already switched over once to compute-optimized, whatever the...
G
So I think that, on the back of that, and my time is short because I'm going on holiday on Friday, so if you want to drive that, or at least get Alberto to drive it, or drive it with Alberto, and get that machine scaled up, it's going to give everyone a much more peaceful Christmas or end-of-year time, right?
G
And so I think we should just do that, because it was kind of an either/or, and we've discovered that the "or" option is not going to happen in the next few weeks and will probably get blocked by the change lock.
G
Because, yeah, I mean, we could put 60 cores on it. I know it's not great, but the thing it's throttling on, as far as I can tell, is the CPU, so throw more CPU at it until we have a better solution, which is coming. It's not like we don't have answers here.
D
Yeah, you also would have missed the beginning of the call, and I've said it a couple of times, so I apologize to everyone who's listening, but it just feels like continuing to do one small change at a time isn't effective. We just keep doing it: every time the triage rotation comes around, someone finds a little something that we can do that makes a difference, makes a little dent, but we still have silenced alerts. That's the reason we were discussing it on this call, because it just hasn't...
D
On Gitaly Canary, there just has to be something different that we can do, so I'm happy to carry on trying to drive this. But I see that the issue that you've linked here, 12142, is specifically about moving it to Gitaly Cluster.
G
So if you look at the last issue that I linked, with the hash ref to the note, 12118, that's got all the options that we discussed in the call, and there's even an agenda link in there, I think, as well. Okay.
D
I'll follow the conversation on there and see what I can push forward. It just feels like there's something different; there must be a different way to try and get through this, because what we're doing at the moment is just repeated work, and yes, we're making the improvements, but it just doesn't feel like it's...
G
With marin and jarv, because we spoke about this a lot on Monday in a very animated call, but I think that, you know, jarv did actually make a change on Monday as well, which has made... are you taking into account the change that jarv made on Monday? Because it did actually make a bit of a difference.
D
Okay, so I suppose for next steps here, I think we should still write up on the issue, as Bob described above, the conversation that we've had here about the differences between the servers and the discussion about the marquee customers. I think it's good for everything that we've said here to be recorded on the issue, because it shouldn't just exist in this format. I'll also continue the conversation on the issue.
D
That Andrew has linked here, about what we can do with this in the shorter term. But I'm glad that we're just talking about this in a different way and seeing what else we could do here, while you're here.
G
No, it's the same, but marquee... I don't feel like marquee is particularly tied into this, because those machines have got like 20% utilization of disks, right, and they're just idling, and so they're very, very different from gitlab-org/gitlab if you look at them by any measure. We've got a dashboard called the Gitaly rebalancing dashboard, which is kind of a proxy attempt that I made to take all the different factors that we consider to be utilization and match them up, and on those, by every measure...
G
Yeah, and they're really bad by that labeling. The thing is, I think the way we have to frame it is that the ultimate fix for this is going to be pack files, that pack-file sharing or pack-file caching or whatever, because this will be the repo with the biggest impact from that fix.
C
It looks like it's a big chunk, but at this point, I don't know. On the one hand, I'm wildly excited about it; on the other hand, I'm a bit scared that I'm too excited. But I looked at...
G
I looked at kind of concurrent requests for clones as a kind of bad proxy for how effective it would be, so I was looking for how many concurrent requests were going through for different repositories, and gitlab-org/gitlab was like so high compared to everything.
C
Yeah, well, if we know that the incidents are about things where the pre-clone script is failing, or it is not offering enough relief, then yeah, that is exactly the problem this solves, and that's also why we came up with the idea, because of all the incidents about that. It's just that, in the larger scheme of things, there are all sorts of bad things.
B
This really wouldn't make sense in a couple of minutes. It was literally just if there was nothing else on the call.
G
I've got something that is totally unprepared, but maybe I'll just run it by people. Stop me if you think it's not the right thing.
G
You know all of the metrics that you see here. This dashboard is, at the moment, in the wrong order, but all of this is being aggregated by component and service type, and we have a set of aggregations.
G
Then we have something called node-level monitoring, and we've only turned that on for Gitaly, and what that does is it gives us a different aggregation set, right? So here we're saying, because each Gitaly node is like a single point of failure, we want to record our SLIs and what our error rate is on a per-node basis, so we have a different set of aggregations, and that gives us all these dashboards here. So we can look at...
C
We sort of manually have this already for Redis, right? Because we have three Redis services. And this is saying: Gitaly is maybe 50 services, but we don't want to present it as that. So yeah, yes.
G
Exactly, so with Redis the workloads are slightly different and we have different monitoring on them, but yeah. And then at the top level... let me just choose a different one, because this is definitely broken... we also aggregate the service level indicators for a service up to this top level, so we take all of the SLIs and we mix them in together and we get a top-level aggregation.
G
So there we've got three different aggregations, and then for each of those we have a different function in jsonnet. It's like: give us the service-level aggregation error charts, give us the component aggregation request chart, and so there's this matrix of these things. And because we're moving to Kubernetes now, and we have four different clusters that are almost running independently, and we're going to be deploying to them at different stages, we're actually not going to roll them out...
G
...at the same time; we can have different deployments in each of them. jarv very much wants us to also be monitoring things on a per-cluster basis. So he wants to be able to say how the web service is doing in the us-east1-c cluster as opposed to us-east1-b, and if that particular cluster is not meeting its SLOs, we'll actually get an alert that says us-east1-c is failing.
G
We also, you know, roll things up by canary, and so I just started thinking about that and all the new panels and dashboards and everything that we've got to create, and I started getting very sad, because there's lots and lots of stuff. So what I did was I took all of that stuff and abstracted it out into what I call an aggregation set. This is still a work in progress, but it's nearly finished, and it's reduced a lot of the code; there's so much stuff...
G
...that's going to be deleted from this. But effectively what it says is, each set of aggregations has got its own set of metrics behind it, and then, instead of saying "give us the charts for the service-level aggregation", you say "give us the request series and use this aggregation set", and then, when it needs to look up the five-minute burn rate or the one-hour burn rate, whichever one of the metrics it needs...
G
...we just plug that in, and it's kind of pluggable, and then we can go from having like 20 different charts that are all very similar down to four charts, you know, one for errors, one for requests, one for... and we just plug the appropriate aggregation set into that, and it will...
G
You know, we get rid of that, and when we want to add the new aggregation for cluster, or we want to start aggregating by feature category, we don't need to add another 20 charts, and, more than that, we avoid the overhead and the cognitive complexity of doing that.
G
So what we'll have to do then is just add another one of these definitions, which gives all the metrics that this aggregation set will use and how it's defined, and then we can start adding new ones, like the per-cluster one, without having this explosion of code that we've got at the moment. So, I don't know, it's nearly finished, and one of the things that's promising is most of it is deleting code and simplifying code. So I'm very happy about that.
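To illustrate the shape of the idea (the real implementation lives in the dashboard tooling, and all names below are illustrative), a Python-style sketch of an "aggregation set": one generic chart definition parameterized by the aggregation, instead of one near-identical function per aggregation.

```python
# Rough illustration of an aggregation set: the labels and recording-rule
# names per aggregation live in one place, and a single chart builder is
# reused for service-level, node-level, per-cluster, etc. Hypothetical names.
from dataclasses import dataclass

@dataclass
class AggregationSet:
    name: str
    labels: list          # labels the recording rules aggregate by
    burn_rate_metrics: dict  # window -> recording rule name

SERVICE_LEVEL = AggregationSet(
    name="service",
    labels=["env", "type"],
    burn_rate_metrics={"5m": "sli:request:rate_5m", "1h": "sli:request:rate_1h"},
)

NODE_LEVEL = AggregationSet(
    name="node",
    labels=["env", "type", "fqdn"],
    burn_rate_metrics={"5m": "sli_node:request:rate_5m", "1h": "sli_node:request:rate_1h"},
)

def request_rate_chart(aggregation_set, window="5m"):
    # One generic chart definition; plug in whichever aggregation set is needed.
    return {
        "title": f"Request rate ({aggregation_set.name})",
        "query": f'{aggregation_set.burn_rate_metrics[window]}{{type="gitaly"}}',
        "legend": ", ".join(aggregation_set.labels),
    }

# Adding a per-cluster aggregation later means defining one new AggregationSet,
# not duplicating twenty chart functions.
```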
G
No, so we still have the same number of dashboards, like one per service, but at the moment, right, we have a function effectively for that, and then a different function for that, and then a different one, and obviously...
G
...this is something I see as a contributor, but also, one of the things that it allows us to do is much more quickly say: well, we want to slice our service-level monitoring in a new way, and it's not like, okay, that's going to take me three weeks to update hundreds of dashboards. It's just: plug in, define the new aggregation set, what all the metrics are, and that's...
G
And then just plug it in, and you'll get all the graphs on the fly, and you define what the labels are for the aggregation sets in one place. So this is another thing: Ben really wants to get rid of the tier label, and instead of going through 100 places in the code, we can just remove the tier there, and then everywhere that we generate a recording rule using the tier label, we can get rid of it.
G
So yeah, it's not very exciting stuff and it's very low level, but I think it'll help us improve our monitoring going forward. So that's why I thought I'd bring it up.
D
Well, thanks for taking us through that, and thanks everyone for being on the call. I will upload this to YouTube Unfiltered later on. Andrew, with the shares that were on there, is there anything that shouldn't be made public?
G
No, there was nothing private, unless I can see something on my screen... no.
D
I think it's all good. Great, well, I'll share that there. Thanks to everyone for joining, have a good rest of your day.