From YouTube: Scalability Team Demo 2021-02-25
B
Yeah, totally unprepared, but I thought I should give an update on the pack-file cache project, because that's the main thing I've been working on, together with Sean, who also gets other things done. And it set us back that I was out for a week, so it may be less clear to everybody where we are now. So I thought I'd give an update.
B
So the idea is to make git clone and git fetch faster by caching the large chunk of data that needs to be computed on the server every time somebody clones something. And we now have an RPC in Gitaly that allows us to observe this work happening.
B
So just by looking at those logs, we can see how much data we'd be storing and what the hit rate would be. We can think about what a good retention time is, or what the trade-off is between retention time and storage cost.
B
And where we are now with that is that we have it turned on in production and it's creating log data, and we can then fish that out of BigQuery, download the log data, and do offline analyses, because, I think, in four hours we create about one and a half million records.
B
Well, the request rate is about 100 per second, and we thought that a 24-hour period... like, 24 hours on a weekday is probably going to give us a good insight into what goes on, because weekdays are busier than weekends and there's definitely a 24-hour periodic nature in the traffic. So if you have a 24-hour window, then you should see roughly everything that happens, so we're now busy capturing a full 24-hour period.
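For scale, the figures quoted here are consistent with each other; a quick back-of-envelope check using only numbers mentioned in this meeting:

$$
100\ \tfrac{\text{requests}}{\text{s}} \times 86{,}400\ \tfrac{\text{s}}{\text{day}} \approx 8.6\times10^{6}\ \tfrac{\text{records}}{\text{day}},
\qquad
\frac{1.5\times10^{6}\ \text{records}}{4 \times 3600\ \text{s}} \approx 104\ \tfrac{\text{records}}{\text{s}},
$$

which also lines up with the "about 10 million records" estimate given later for the full 24-hour capture.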
B
Period-wise, our plan is to make it literally Thursday in UTC, so it would end at the end of today, but strictly speaking it started somewhere yesterday. The thing is that...
B
It's just that, yeah, both Sean and I are not very comfortable working with BigQuery, and the best way we understand it right now is that you have to tell BigQuery to create a table per day. I think that might be because we store logs per day or something, so everything is much easier if we constrain ourselves to one calendar day. It's not that bad.
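A minimal sketch of what pulling one day's worth of records out of a day-sharded table could look like, for anybody reproducing this; the project, dataset, table, and column names are invented for illustration, since the real schema isn't shown in the meeting:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"cloud.google.com/go/bigquery"
	"google.golang.org/api/iterator"
)

func main() {
	ctx := context.Background()

	// "example-project" is a placeholder, not the real project name.
	client, err := bigquery.NewClient(ctx, "example-project")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Logs land in one table per day, so constraining the analysis to a
	// single calendar day means reading a single day-sharded table.
	q := client.Query("SELECT payload FROM `example-project.logs.pack_cache_20210225`")

	it, err := q.Read(ctx)
	if err != nil {
		log.Fatal(err)
	}
	for {
		var row []bigquery.Value
		err := it.Next(&row)
		if err == iterator.Done {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(row) // e.g. dump as NDJSON for the offline analysis script
	}
}
```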
B
Makes sense, yeah. So, and it's fun that we get to use BigQuery now, and, well, Sean gets to use it and I get to ask Sean for help.
B
"How do I do that?", and then Sean figures it out. But yeah, one of the things we also want to have as a result of this: we're doing this analysis on the log data, and we're trying to do it in a way where it's reproducible with a script, so that, assuming this all works out and this becomes a feature, we can put that script somewhere in the documentation, in case somebody else wants to roll out this feature on a self-managed instance.
B
And so that's also why it's good that we're not working in Kibana, because not everybody has Kibana. And the other reason is that the type of analyses we want to do are easy to express if you just have a script with a loop that looks at JSON objects, but they're hard to express as Kibana queries. So it's just easier to do it this way. And, as I was going to say, it was 100 requests per second.
B
So we think that that's going to be about 10 million records or something, and a laptop can handle that. Like, it takes a moment, but it's doable; you don't need Hadoop or MapReduce or whatever in the cloud to analyze that. Yeah, so that's actually...
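A minimal sketch of that kind of script: a single loop over newline-delimited JSON records. The field names (`cache_key`, `hit`, `bytes`) are hypothetical, since the real log schema isn't specified in the meeting:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// record mirrors a hypothetical log entry; the real field names may differ.
type record struct {
	CacheKey string `json:"cache_key"`
	Hit      bool   `json:"hit"`
	Bytes    int64  `json:"bytes"`
}

func main() {
	// One pass over newline-delimited JSON on stdin: ~10 million records
	// is well within what a laptop handles, no Hadoop or MapReduce needed.
	scanner := bufio.NewScanner(os.Stdin)
	scanner.Buffer(make([]byte, 0, 1<<20), 1<<20) // tolerate long lines

	var total, hits int
	var bytesServed int64
	for scanner.Scan() {
		var r record
		if err := json.Unmarshal(scanner.Bytes(), &r); err != nil {
			continue // skip malformed lines
		}
		total++
		bytesServed += r.Bytes
		if r.Hit {
			hits++
		}
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
	if total == 0 {
		log.Fatal("no records")
	}
	fmt.Printf("records=%d hit_rate=%.1f%% bytes=%d\n",
		total, 100*float64(hits)/float64(total), bytesServed)
}
```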
B
That's exciting, and I hope today to submit for review sort of the core of the cache mechanism that I've written as library code for Gitaly. So right now it doesn't get called yet, but it has tests that exercise, yeah, the interesting behavior. Web scale?
B
Yes, we try to keep it below web scale. And so, yeah, we're trying to get that into review already, although there's a slight chance that the log analysis teaches us that we need to tweak things a little bit, or that we need to make some optimizations, but I'm hoping we won't have to. We'll see.
B
Those are allocations that happen during the profile, but that doesn't count when they get freed again. And it looks like every byte that Gitaly serves also gets allocated, which is not a great way of serving data, because usually you would have a buffer, copy something into it, copy it into the network socket, and reuse the buffer, so you only allocate that buffer once. And because of the way gRPC works, and the way we structured this, it looks like every time we have a chunk of data...
B
...that we want to send from Gitaly back to Workhorse or GitLab Shell, we allocate a buffer, put it in there, send it out, and garbage collect the buffer. So every byte has to be allocated and garbage collected, and you can clearly see this in the profiler, where the RPCs that do git clone create a ton of allocations. So that's a big chunk of work, and we get a full copy of that chunk of work once this RPC is on, because it's doing the same: it has to send the same amount of data.
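A minimal sketch of the buffer-reuse pattern being described, using a sync.Pool so that streamed chunks share one buffer instead of each allocating their own. The sender interface, chunk size, and read function are stand-ins for illustration, not Gitaly's actual streaming code:

```go
package main

import (
	"bytes"
	"io"
	"log"
	"sync"
)

// chunkSize is a hypothetical response chunk size.
const chunkSize = 32 * 1024

// bufPool hands out reusable byte slices, so each streamed chunk does not
// allocate (and later get garbage-collected as) a fresh buffer.
var bufPool = sync.Pool{
	New: func() interface{} { return make([]byte, chunkSize) },
}

// sender stands in for a generated gRPC server-stream type; real code would
// wrap the bytes in a response message.
type sender interface {
	Send(p []byte) error
}

// copyChunks streams data to s while reusing one pooled buffer, instead of
// allocating every byte that gets served. This is only safe if Send is done
// with the slice by the time it returns.
func copyChunks(s sender, read func([]byte) (int, error)) error {
	buf := bufPool.Get().([]byte)
	defer bufPool.Put(buf)

	for {
		n, err := read(buf)
		if n > 0 {
			if sendErr := s.Send(buf[:n]); sendErr != nil {
				return sendErr
			}
		}
		if err != nil {
			return err // io.EOF marks a clean end of stream
		}
	}
}

type discard struct{}

func (discard) Send(p []byte) error { return nil }

func main() {
	r := bytes.NewReader([]byte("pack data to stream"))
	if err := copyChunks(discard{}, r.Read); err != nil && err != io.EOF {
		log.Fatal(err)
	}
}
```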
B
Actually, there's a picture. Let me just quickly show the picture for that, if I can find it. That's what you get for not preparing this.
B
Now, these graphs don't show the memory increase that clearly yet, because they were taken early. Yeah, this is the picture I was looking for. So these are all the allocations during 30 seconds, and this is an average: it's allocating 1.5 gigabytes in 30 seconds in this particular window, or over these profiles, and the worst one was allocating 14 gigabytes in 30 seconds.
B
It doesn't mean that it was using 14 gigabytes of memory at one time, but it was allocating that much worth of memory. And this part is sort of a given: that's the workload created by Git HTTP and Git SSH. So, on average across all these things, Git HTTP is bigger than Git SSH, but it will vary per server and per when you're looking.
B
But so this is a big chunk of work, and there's an identically sized chunk of work that is all the new RPC. And there's one server... we still don't really understand why, but in the dashboards, in the saturation graphs, there's one graph that goes up for the memory, and, no, I'm not going to... sorry, I need to stop sharing.
B
I don't want to keep showing things and slowing down more and more, so I just need to say it in words: there's an aggregate memory graph that goes up in the saturation, and if you try to see why that is... there are about 60 servers, and it all seems to be because of one, and that's... it's file 29, and I don't really understand what's special about that one.
B
But when you turn the RPC on, memory usage on that one sort of doubles. But we have plenty of headroom on the memory usage, so we can get away with it. And then, looking at the profiler, where there's this one outlier of 14 gigabytes or 15 gigabytes or whatever: you can't see in the profiler which server a profile is from, but I strongly suspect it's that one, because the memory usage also went up.
B
There. But so it's noticeable when you turn it on, but it doesn't seem to be a problem. And the big question that we can't answer yet is how much of a benefit we'll see once the cache is in place, because the RPC that just measures things does not offer any benefit besides measuring things.
B
Yeah, we know the hit rates. What we don't know yet, or what I find hard to predict...
B
We know that creating the pack-file responses uses a lot of CPU, and especially on canary one, where we have lots of CI, CPU usage on the Gitaly server is dominated by the CPU calculating the same thing over and over again. So it's obvious that that chunk of CPU should shrink, but reading from the cache causes I/O, and that means CPU time will be spent on I/O, and...
B
What is the net effect of moving from CPU to I/O? I don't know what the net effect is, for one. And the other question is that I have this ambitious goal of turning off the CI pre-clone cache on GitLab.com. What happens now is that we download, like, a stale clone from object storage, so that doesn't hit the Gitaly server.
B
So once the cache is in place, those small fetches will collapse onto just one or two, and that's already good. But then, if we stop downloading that data from object storage, we are generating it on the Gitaly server, and what I'm hoping is that we can just get away with that and we don't need this cache anymore, because it is fragile and it is not a GitLab feature: it's hard for other people to use this cache. But again, what is the net effect? How does it add up in the end?
B
So that's, yeah, that's where we are with that: we're learning from the analysis, we're gathering data, and we're submitting the next chunk of implementation work for review.
A
What will definitely be worthwhile, once the data collection is finished and once the analysis is done, is publicizing what we think the impact is going to be, because there was a lot of interest from both the Datastores team, and we were working with Gitaly on it as well. And I think once we can tell them what we think the impact will be, we should let them know, because that might help our MRs get through a bit faster as well.
B
Yes, yeah. One thing that I'm particularly curious about... I did some preliminary work, like, we got a couple small chunks of data and I did some analysis on that, but I didn't take it very far yet. One thing I'm very curious about is the difference between Gitaly servers, because if you pull the data out of BigQuery it's just everything, and you can do a global analysis, which is going to give you a ballpark of what to expect.
B
But there's going to be a lot of variation from server to server, and I expect that the cache hit ratio will be much higher on file canary one, because of all the CI, and on other servers it will be kind of meh. And just to see in a picture, like, what that distribution looks like will be interesting.
A
Thanks for showing us that and for the update there. I'm really looking forward to seeing the next step; I can't wait till the data is in.
A
So, is there anything else that anyone wants to go through or wants to show today?
C
I could perhaps show that we now have a stage group and stage mapping in Prometheus. I don't know how interesting that is.
C
I played around with that yesterday to get a number of how many endpoints a group would own. Let me look that up.
C
Muted... so yeah, sorry, I was muted. So before, we didn't have this information inside Prometheus; we only had feature categories. The feature categories are the things that we've defined as features that are owned by groups that are in stages, and we use feature categories to tag things inside GitLab Rails. They don't change that often. So we tag workers and endpoints with that, and this shows how many endpoints a group owns.
C
The most used ones... Package has a lot, we think because they support all the different kinds of package managers and so on, so that's all different kinds of endpoints. And the fun part about this is that this is all coming from Prometheus now. So we don't need to know the list of feature categories that are owned by a group to be able to look for this; we just need to know the name of the stage group, or we can even do this by stage, I think.
B
So the metrics have stage groups on them, so whatever possible values for stage groups exist in the metrics are whatever stage groups there are?
C
So... the metrics themselves, the interesting metrics, have feature categories on them, but then we export this thing. This is called gitlab feature, this one here. If I copy this...
C
And that's just a series... it's a series for every feature category, with the stage and the stage group on it, so we can join that to other things, and its value is just one all the time. That's something where...
C
It comes indirectly from the stages YAML file in the GitLab website repository.
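A sketch of the kind of join this mapping series enables, run through the Prometheus Go client. The metric and label names here (`gitlab_feature`, `feature_category`, `stage_group`, `http_requests_total`) are illustrative guesses, not the production names:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	client, err := api.NewClient(api.Config{Address: "http://localhost:9090"})
	if err != nil {
		log.Fatal(err)
	}
	promAPI := v1.NewAPI(client)

	// The mapping series always has the value 1 and carries stage and
	// stage_group labels, so multiplying by it on feature_category attaches
	// a stage_group label to any metric that has a feature_category label.
	const query = `
	  sum by (stage_group) (
	    rate(http_requests_total[5m])
	    * on (feature_category) group_left (stage, stage_group)
	    gitlab_feature
	  )`

	result, warnings, err := promAPI.Query(context.Background(), query, time.Now())
	if err != nil {
		log.Fatal(err)
	}
	if len(warnings) > 0 {
		log.Println("warnings:", warnings)
	}
	fmt.Println(result)
}
```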
C
Yes, so it ends up in Thanos, which is where we want it. And there's something... because here we can see which Prometheus is exporting it, and, as a side effect of that, it has a lot of labels on it that we're not interested in and that we need to remove in every query. So I might look into that too.
C
Well, those... like, yeah, maybe. I don't know either how exactly.
C
It gets updated when you run a script, and something is going to shout at us by creating an issue if it becomes out of date.
B
Yeah, but so this is the first step, but once you know how to do this... if we're pulling that YAML file from the handbook, we can probably also see what Slack channels belong to these stage groups.
C
There's a discussion that I think Dylan from the search group raised, in the global search group: that suddenly their, like, Elasticsearch wasn't processing any new things anymore, and nobody knew, because we don't have alerts for those queues, because they're throttled; like, we expect there to be a build-up of work to be done, but we don't want to overload the Elasticsearch. But in this case nothing was being processed, and because it's using our general alerting rules, we didn't notice, yeah.
C
So that's probably the first thing where we're going to have, like, a routed alert first, so the global search team can keep an eye on this without us intervening, because they play around with Elasticsearch itself and so on. Yeah.
A
It's been nice to see the engagement from everyone, between the dashboards and this alerting piece. Everyone seems to be interested in what we're doing with this, and it's just so nice to see what we're doing being used. And I was appreciative of Hwang Min putting in that link to an OKR; Source Code now has an OKR linking to how to regularly review the dashboard.
B
And they don't know how to make them, and so then the question is who's best placed to help them. Can we help them?
C
Well, I think the dashboards were a good place to start, and I think that's what triggered it, because now they put lines on a dashboard, and they put red lines on the dashboard, and they want to get an alert when the blue line goes below a red line, for example.
B
I suppose what we're doing here is that we're pulling them into committing to the runbooks repo, and that's also where the alerts are. They're now in the directory where they're defining their own dashboard, and we need to somehow make it easier... or the goal is that they can also do things in the other directories, where the alerts are. And I don't want to make it sound like they should know how to do that, because I don't know how to do that either.
B
I find the runbooks repo scary, and sort of a jungle, myself, but...
C
One thing that I was thinking about, like, yesterday on the bicycle, was... I'm going to run this by Andrew once he gets back, because he has, like, a goal for this and just not enough hands to type it.
C
But then what I was wondering about was: maybe we could start building, like, a group catalog, like we have a service catalog and a metrics catalog, and then have a way for groups to define group-level indicators, like we have service-level indicators. So that would be, like, an automated way for that.
B
Yeah, that makes a lot of sense, because... I'm still not quite used to this, but I think this is what Andrew has been building with all this Jsonnet code: this idea of declaratively saying what should be out there and having as much as possible be auto-generated.
B
So if we can have a declarative syntax where a group could say "I'm interested in this metric doing that" and then the whole thing just rolls out, that would be one way to help them get the information they need.
B
Yeah, but it might be very interesting to work with Source Code, if they're the ones interested in having custom metrics, because working with them would teach us what sorts of things are useful, or what those metrics should be. One thing I'm wondering is: what sort of alerts do people want? Because they probably don't want their developers on PagerDuty. So do they want Slack pings? Do they want issues?
C
I only briefly looked at the code; that's all going through, what is it, the pager service thing? We have ways of routing alerts differently.
A
Slack makes sense, because it's a more immediate form of communication than an issue. Also, if we get into a stage where we're just creating tens and tens of issues for a stage group, we're just, like, throwing it in a hole, whereas I think if you're constantly sending things at Slack, at least you aren't destroying their backlog while we automatically create issues over the weekend.
B
You don't ruin their issues, but what I've seen with these alert channels is that they just got flooded with alerts and I lose track.
A
So we just have to be careful about what we choose to alert on, and also advise the groups themselves that they don't need to see noise, because they might come to us and say "I want to know every single time this line drops below that line." It's like: you don't actually want to know that; you want to know when it's been down for this length of time, or when these other things match up. Like, we need to discourage them from wanting to see noise, but that's...
C
That's the interesting but hard part about that, because some of those metrics... like the thing that Kagosh was working on with the CI trace conflicts, that discovered the, yeah, the problem that I won't mention on the public thing: that was, as soon as this was above a very low number, a number that we in infrastructure wouldn't really look at as something to worry about.
B
But that is not the same thing as having a noisy alert. It could be that, if you're a stage group and you know, like, this feature is being used in this way, then this number should never drop below x for more than y time.
E
When I talk to some stage groups, especially in Code Review and Support, they mention that they don't care about the real-time metrics... not that they actually don't care, but they care more about the metrics over a really long period, like one month or two months, because typically, when they perform some kind of optimization or ship some feature, it takes about one month for the optimization to prove effective. So they really want to compare, like, the month after a deployment against the two or three months before, to compare the performance of the optimization.
E
So every group is a little bit different, and even though we can post a lot of alerts to them, I really highly doubt there's much they can do to resolve the situation, because most of the time they can't do anything on the spot; they have to take some time to investigate, sometimes to implement a long-term solution, and then some really long time to make things go into production.
B
Yeah, the thing is: what if you have good dashboards, and you can fake having an alert by just looking at the dashboard regularly? And maybe for these long-term...
B
...is the right thing to do, because it's hard to define an alerting rule that would spot something that, as a human, you can see when you look at this dashboard. So maybe we shouldn't get too distracted by the alerting part... well, we shouldn't get so distracted by the alerting part that we forget that everybody needs to have a dashboard.
A
Time... like, I think we're gonna have to be flexible with treating different teams in different stages, because that's where they're at. But that's hard, because it's not, like, straightforward, saying "right, we're in this phase of the project, now we're in that phase of the project." I think it's going to be a little bit more messy than that.
B
But overall it's exciting, because, maybe I'm misremembering, but I feel like this has been in our team mission statement for a long time now: that we want to help stage groups understand how their stuff operates in production, and we're now at the point where we're actually engaging and becoming that bridge, or that link, between...
A
This is super... like, this is really fantastic to see, and I think this is a really effective way of almost scaling the infrastructure team, in a sense, because we're outsourcing looking at some of the problems back to the teams who own the code. And, if I might take this on a bit of a diversion: I've been involved in this conversation about infradev issues, and as part of being involved in the conversation, I've been looking at actual data. Like, so: are there more...
A
Are there really more infradev issues than there used to be? Are they being closed at the same rate or not, and why? And what I can clearly see in the data is that, since that massive hiring spurt in 2019, the MR rate has shot up and, at the same time, we've increased...
A
...how fast we get things to production. So naturally there are going to be more incidents that come out of that, because there's more change and we get the change there faster, but we're not doing enough on the other side, dealing with that; like, the infrastructure team is only a slight bit bigger than it used to be. And the most effective way, I think, to encourage people to be aware of what they're doing is to give them the visibility for that.
A
We've added these things to your dashboards, and this is how they work. So the short answer to the question is: it's coming, relatively soon. I need to double-check with Marin for when we need to start pushing this out, because the issue...
A
So that issue came about because we had a conversation about what needs to happen next to get to error budgets, and the statement was "oh, we're actually not that far, we just need to do x." So I said no, we'll write it down, because I'm concerned that it's not just x. And writing it down has produced this list of: this is what needs to happen in order to get to the next step.
C
Yeah, the time-consuming thing... time-consuming as in wall time, not people time... is that the metrics need to be adjusted, and then... I'm muted again. So the thing that's annoying with a lot of that work is that metrics need to change, and those metrics are used for alerts and graphs and all of those things. So we need to do that carefully, and so that takes time.
C
For just recording, basically, yeah: the first thing is recording, but to be able to do that in, like, a proper way, we need to tweak metrics to include feature categories and not explode cardinality. Like, now we have some histograms that have 15 buckets or I don't know what, and we can't add a feature-category label to those, because then... well, Ben's not going to shout at me because he's gone, but yeah.
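To see why a feature-category label on a 15-bucket histogram explodes cardinality, a rough illustration; the category and host counts are assumed for the example, not figures from the meeting:

$$
15\ \text{buckets} \times 100\ \text{feature categories} \times 50\ \text{hosts} = 75{,}000\ \text{series from a single histogram}.
$$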
A
Great, well, thanks so much for the time. Hoping you all have a great Friends and Family day tomorrow, and we'll catch up with you again next week.