From YouTube: Scalability Team Demo Call - 2021-08-06
A: I guess I'll go first. So this is the thing I was looking at the other day. I don't have an answer yet; I have a few things that I know it's not, so I'm just going to share my screen. I guess so. This came from me looking at the error budgets for a stage group, and I noticed... so this is the distribution. This is the distribution of HTTP requests in our logs into one-second buckets by duration.

A: The time they spend, or that our logs say they spend, in the shared state Redis. I've excluded ones that take less than a second from here, because the vast majority of requests should take less than a second in any Redis; it's Redis, this shouldn't be a thing. But what I did notice as well was that there's this big spike at five, which is suspicious, and there's also, you can see, basically nothing at 11, 12, 13, 14, but there is something at 10 and 15.

A: And possibly at 20 as well, though it's hard to see. So I've just taken the filter off here and you'll see that under one second dominates this. But this is odd, right, because a Redis command's not going to take five seconds, certainly not an appreciable number of Redis commands, and that's, you know, 44,000 in a day, because if they did, our slow log would be full of that and...

A: Normal, though, for the shared state Redis, for Sidekiq...

A: Oh, no, no, sorry, yeah, so it happens. So this, the top one, was from the Rails logs. Oh yeah, so here's the Rails logs if you include under one second, and this, you know, obviously that's unreadable. Okay, just give me a second.

B: So this makes me wonder if maybe we are running blocking commands, not necessarily a BRPOP, but any blocking command on that, that would have an implicit timeout and a reason to... yes, sorry, oh, you heard that, I...

B: I was wondering if there was, like, a block... maybe in persistent Redis we are still running blocking commands that would have a timeout, not necessarily BRPOP, but, you know, any blocking command, yeah.
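A minimal sketch of the kind of blocking call being discussed, assuming the redis-rb client (the list key here is made up): a BRPOP against an empty list simply blocks client-side until its timeout expires, so it would show up in instrumentation as a multi-second Redis call even though the server is idle.

```ruby
# Sketch only: a blocking command with a 5s timeout looks like a 5s "Redis call".
require "redis"
require "benchmark"

redis = Redis.new

elapsed = Benchmark.realtime do
  # "demo:empty:list" is a hypothetical key; nothing is ever pushed to it,
  # so this blocks for the full timeout and then returns nil.
  redis.brpop("demo:empty:list", timeout: 5)
end

puts format("brpop returned after %.1fs", elapsed) # ~5.0s, with Redis itself idle
```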
C: Could we explicitly remove those commands from that Redis duration, then? The blocking ones.

A: Are you talking about in the instrumentation, where we... yeah, commands? Let me check. That's the wrong one... this one. Because they are excluded from Apdex, but not from... are they excluded from logs? I can't even tell with how this works. Let me see.

A: I think this is what adds them to metrics and to logs, so these will be excluded from both Apdex and logs, but I can double-check on that. I'm just going to rule out one more thing: I was speaking to Jakob about this yesterday; he was off today.

A: He was wondering if it was at a certain point in the life cycle of the process, like either the first or the last request, especially the first, if it's some kind of timeout-related thing; maybe it's a client timeout. But it's not: so here's when this happened for two different Kubernetes pods, and you can see that there are requests before, ticking along as normal, then there's a spike, and then it goes back to normal.

A: It also doesn't happen for any other Redis instance, which points towards maybe a command, like I think Matt was just saying, because it doesn't happen on the cache instance, the queues instance or the trace chunks instance.

B: Can we get... can we get a frequency by time of day?

B: I'm mostly asking because I'd like to catch some of these in the act, and if I can get an idea of how often they occur and at what density, then I'll know how long to run the instrumentation for.

A: Yeah, so one idea I had for this was the performance bar. We have a stack trace for requests to Redis, so I was like, if I can grab one of these... because these happen for, like, me and for you, probably, but they mostly happen on async requests, so you don't notice them. If I can grab one of these for somebody who's got the performance bar enabled, I can grab that request ID and put it in the performance bar to get it.

A: So I got the performance bar data, but unfortunately the performance bar, I think because it happens in the Rails controller, won't catch things that happen outside of the controller life cycle. So if we see here... where are we? First of all, we can see the duration is five seconds.

A: So it's unlikely to be a measurement error, which is another thing I was wondering, because if it was a measurement error the total duration wouldn't be affected, and also the Workhorse duration says this took over five seconds. So it seems like this was actually a very slow request. But we can see that total Redis calls were 19, of which seven are on the shared state, and here I've got a total of six, of which zero are on the shared state.

A: So I'm clearly missing a bunch of stuff in the performance bar, which is a separate issue. Another idea I had to get a stack trace... just, sorry, I'll set up this time-of-day chart. What do we want? Let's just say this specifically, so let's say this is between 5 and 5.1. Oh, this chart is the wrong way around really, isn't it?

A: Yes, right, so buckets by date histogram on json.time, y-axis can be count, for the last day... or actually we could probably... yeah, let's just do that. Oh wow. But I guess that does help you, Matt, in that you can do it at literally any time.

B: Yes, that's great. Zoom into any one of those, I guess. Let's look at the last couple of hours and... yes, that's it. I wanted to see if it's bursting on a shorter time scale. Yeah, change the granularity. What do we... what's it set to now? I know it's set to auto by default. Yeah, 30 seconds.

A: Yeah, wait, what? Oh, that's the start.

B: I'll... which...

A: Good question, Andrew. So I discovered this because it happens a lot on internal/allowed, but this is basically a chart of which endpoints this hits the most often.

A: Wow. I'll just pop that link there for now, Matt.

C: The other thing we can try and do... which one was it, the shared state? Yeah, shared state, and it doesn't happen on the others.

C: The one thing you could look at is what calls are being made on shared state that you don't get on the others. Yes, I mean, I suspect there's quite a lot, because it's so much more generic than, you know, the other ones.

A: But yeah, basically, if we just say: if this duration is over five in our logging block, track it but don't raise an exception, and put it in Sentry, then we'll get the backtrace via Sentry, and maybe we can see where in our stack this is coming from. Because, yeah, the other thing with five seconds is what we are measuring.
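A rough sketch of that idea, assuming the sentry-ruby gem is already initialised; the instrument_call wrapper and threshold constant below are hypothetical stand-ins for whatever wraps Redis calls in the logging code, not the actual implementation.

```ruby
require "sentry-ruby"

SLOW_REDIS_THRESHOLD_S = 5.0

# Used purely as a reporting vehicle; it is never raised.
SlowRedisCall = Class.new(StandardError)

def instrument_call(command)
  started  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result   = yield
  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

  if duration >= SLOW_REDIS_THRESHOLD_S
    error = SlowRedisCall.new("#{command} took #{duration.round(2)}s")
    error.set_backtrace(caller)     # attach the application backtrace by hand
    Sentry.capture_exception(error) # report to Sentry without ever raising
  end

  result
end
```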
A: I don't think it's this, but the measurement includes this: you can pass a block of Ruby code to be run with some commands, so it will include the time it takes to run that block.

A: But these are the commands that can take a block, so not every command can take a block, and to me it sounds more like a timeout than Ruby code taking exactly five seconds every time.

B: With a little post-processing I'll probably also be able to find out which hosts are sending those requests, but not necessarily which application code path, at least with what I'm thinking of doing right now.

C: ZSCAN is at the bottom of that list of things that take a block; all the scans, in fact. Do those blocks get called multiple times? I'm just trying to remember how they work in the case of those scans. Oh, they get called for every single item, don't they, so those could take five seconds if it's measuring the entire scan, right?

A: Possibly. I just feel like it's unlikely that it would take pretty much exactly five seconds every time. Yeah, I don't know. I guess if we get the commands, that will narrow it down a lot already, because if it's not one of these, then it's clearly not a Ruby-side issue.

A: Well, I mean, you can literally do, like, a sleep in the block and then it'll take five seconds; that was how I tested it, yeah. So yeah, it's not super helpful to conflate the two things like that. So yeah, I just wanted to share this as something I've been looking at, but I don't really...
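That conflation can be reproduced with something like the following, assuming the redis-rb client and a made-up key: scan_each yields once per matching key, so time spent in the Ruby block (here a deliberate sleep, as in the test just described) is counted inside the measured call even though Redis itself answered quickly.

```ruby
require "redis"
require "benchmark"

redis = Redis.new
redis.set("demo:key", "value") # hypothetical key so the scan yields at least once

elapsed = Benchmark.realtime do
  redis.scan_each(match: "demo:*") do |_key|
    sleep 5 # Ruby-side work inside the block, not Redis being slow
  end
end

puts format("scan_each measured at %.1fs", elapsed) # ~5s even though Redis was fast
```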
B: This is why I want to do the instrumentation, to catch a few...

A: ...a few examples of these occurring, yeah.

A: So, sorry, what I was going to say was that the reason I found this in the first place was because of the error budgets work. So I was looking at the error budgets for Source Code, who have a lot of these ones at the top, because Source Code is one of the groups that owns most of the requests that we make, because they own Git access; it's about three percent, right, there we go. And I was like, you know, by proportion these endpoints are a lot of their failing requests.

A: This actually does make quite a big difference to their error budget, and I was going to create an issue for them. But then I realised it happens on everybody's endpoints, and it doesn't seem to be related to what any particular stage group is doing. So this did come out of the work on error budgets, and I think this probably would make a dent in our overall Apdex score.

B: Some of the alerts I got last week ended up resolving down to internal/allowed running for multiple seconds. Okay, so yeah, definitely, this would help.

A: It should always be one of the ones... it should really, yeah, it should always be approximately the same speed, yeah. And the thing is, I've noticed when I'm looking at these that so many requests to the web and API servers are either API requests from CI - like, you know, that's what a lot of the API internal/allowed calls are, they're from CI - or, oh, AJAX requests like the one I showed earlier. That was my request; it was the request for the merge request widget, you know, polling.

A: You probably won't notice it that much, because it'll just be in the background somewhere, but it's still happening. So yeah, I want to know... basically I want to know what's going on here.

A: So yeah, Matt, I just pinged you on that issue. I was wanting to chat to you later anyway about the Sidekiq stuff you were talking about yesterday, so maybe we should just chat about both later, if that works for you. Or... yeah, we can see, but yeah, it would be good to get some help and try and unravel this.
B: Yeah, definitely. So yeah, because we've got this periodicity, I think what I'm going to try to do as a first cut is... sorry, Andrew, I said earlier that I had two ideas, but I neglected to articulate what they were.

B: One is packet capture during one of these periods, and because we have a dense period... I should screen share just to show this. This is Sean's query... oops, stop that. I just pasted a screenshot in the issue, but this shows even more clearly the periodicity. All I did was switch this to an area display, from a line chart to a bar chart.

B: So I'm going to time it. So the first idea is packet capture, then do post-processing analysis to pull out the calls that have this duration, and that will identify the command, its arguments and the client IP, which, you know, in many cases I think will be a Kubernetes pod, so that won't be especially useful, but identifying the nature of the command is probably going to be more useful for analytical purposes.

B: Since we do have these dense periods, I think I could probably get away with doing, like, a 30-second capture if I time it right. So I think that's probably the better approach, because it will give more data and it doesn't burn Redis's CPU; it will burn a separate CPU. The second...

B: Exactly, yeah, that's exactly it: the absence of data on the Redis server side would tell us that, yeah. So I would come back here to confirm that the selected time span did have client-side measurements that met that duration, and we'll see how they match up. The other idea I had, which I don't think I'll do, because I think we'll get more information and less impact from the pcap instrumentation, is... because we, thank goodness, have debug symbols on the Redis server.

B: It's possible to instrument shared code paths, like processCommand, for example. So we could measure the distribution of durations for completing processCommand, and that would tell us something about whether or not this was, for example... that would be a reasonably cheap and quick way to identify if this was a blocking command or not. I'd need to instrument additional...

B: ...additional function calls to determine which specific command, and if it is a blocking command that goes down a separate code path to instrument. So that would be a little bit more hunting, and it would still not give us information about, for example, the arguments to the command coming in. So that's... and it would consume some CPU time within the Redis main thread, which is undesirable because that's its bounding capacity.
C: It's timing out from a BRPOP timeout... sorry, I meant the client. I meant a block on the client, you know, the yield blocks, rather than a blocking command.

C: What I was going to say is, yeah, I mean, looking at the spread, right, it's probably something in the access checks. It's got to be like a permissions check kind of thing, I imagine. Yeah, and I mean it could be something else, but, like, just...

C: That's also a good option, but neither of those, I would imagine, has any blocking calls. I mean, they might do, I could be surprised about anything, but I wouldn't imagine... oh.

A: Yeah, and I don't know what the periodicity on that is, because it's happening once in the day, so yeah, yeah. I think it's worth looking into this, because... yeah, I mean, actually, the other thing I should have mentioned was: I said I was looking into this because of the Source Code group's error budgets. The Source Code group will also look into the error budgets, which is good; that's what we want, right, Rachel, teams looking at their budget spend. And they asked me, like...

A: ...ourselves, anything...
D: What I was going to ask was whether there is anything more on that topic, or can I ask another question about error budgets? Cool. I'm looking for a reminder about "not owned" in the error budgets, and I was wondering, apart from GraphQL, what other things need to happen to attribute more of that not-owned group?

A: So yeah, we talked about this in the last demo, Andrew. So I've got an MR to make the reactive caching take the feature category from its caller, which it will inherit as well; so sometimes you have a reactive caching worker called by a reactive caching worker, and it will take that from the first caller. So basically, I think the logic is getting...

A: I don't want to make the logic too much more complicated than it is now, because this already caused some confusion for Matt yesterday. It's actually related to what you were asking me about yesterday, Matt. With our context propagation, particularly from a web request to a background job, a Sidekiq job, we want to inherit every single field, like user, root namespace, project, except in some cases in a background job.

A: We want to clear all of that, because we're operating across multiple contexts. And for the feature category, we normally want the feature category of the job, not the feature category of the caller, because the feature category of the job is generally more specific. So even if, say, a merge job is called from a CI page...

A: ...the merge job is still owned by the team that owns the merge job; it's not owned by the CI team just because they make a page that triggers that worker. The exception being not-owned workers, because for those, "not owned" is essentially... we can think of that as a null value. So for those we do want to get it from the caller, because that at least helps us, yeah.
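A sketch of the rule being described here; the method and category names are illustrative rather than the actual GitLab implementation. A job normally keeps its own feature category, because it is more specific than the caller's, unless that category is not_owned, which behaves like a null value and falls back to the caller's category.

```ruby
# Illustrative only; not the real GitLab code.
NOT_OWNED = :not_owned

def effective_feature_category(worker_category:, caller_category:)
  # A not-owned (or missing) category acts like null: inherit from the caller.
  return caller_category if worker_category.nil? || worker_category == NOT_OWNED

  # Otherwise the job's own category wins, since it is more specific.
  worker_category
end

# A merge-related worker triggered from a CI page stays with its own team:
effective_feature_category(worker_category: :code_review,
                           caller_category: :continuous_integration)
# => :code_review

# A not-owned worker (e.g. a generic reactive caching worker) inherits instead:
effective_feature_category(worker_category: NOT_OWNED,
                           caller_category: :continuous_integration)
# => :continuous_integration
```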
A: Yeah, so I don't want to make it... this is the most complicated I want to make it. So I've kind of paused that MR for now, because it's also going to cause more confusion as we do the rest of the catch-all rollout; something related already gave Matt a fair question yesterday. But I think inheriting if it's not owned... if we had null instead of "not owned" there, I think that would make perfect sense.

C: So if we did more work to fix the attribution on the Rails controllers, or on the Grape and Rails controllers, right, and GraphQL, there's... we have an approach for that, then we wouldn't need that kind of conditional; we could just say it comes from the... oh, wait.

A: Sorry, no, we'd still need the conditional, but we're saying the opposite: we basically never need to care about not-owned Sidekiq workers, because if they are not owned, they'll get their feature category from whatever calls them. So we don't have to go... we only have to go through the not-owned controllers and API endpoints, not the not-owned jobs. But I agree it is a little bit confusing to explain, so I'm not saying that's the most elegant option.

C: Yeah, I mean, the one thing I've really noticed is, now that this is starting to bite... the error budget, the GraphQL... no, no, just in general: error budgets, and people pushing back on fixing technical debt. People are becoming a lot more interested in the accounting of it, and, you know, often it's in big meetings with lots of very busy people, and I'm realising the importance of keeping it simple.

C: But this might be the only way. But I am also seeing, like, people are like, "no, but this number doesn't make sense, explain it to me", and they sort of give you ten seconds to explain it. And so it is kind of like... and those are the decision makers, right, so you've got to have them bought in. So just something I've been noticing recently, especially.

A: We kind of talked about this a bit before, but the other option is, with these not-owned workers, we could go to the other extreme and just make, like, a reactive caching worker for each feature category that uses reactive caching, and have each one use the correct one. I mean, that's a lot of code duplication.

A: Yeah, we'll see how it goes. Like I said, I've paused it for now, because I think it could cause confusion with the next stages of the catch-all Sidekiq rollout, because we're doing that by feature category. So if we create a metrics query that uses feature category with "not owned" in the mix, we don't want those to be conflated just right now; like, in a week's time it'll probably be fine. But yeah, so, Rachel, that answers a small part of your question, but I don't think it answers the rest of it.

D: So what I'm doing at the moment is trying to find all the pieces that people may be concerned about regarding error budgets and say, well, either this is what we're doing, or this is what we're going to do, or no, we hadn't thought about that yet; but to list out those things so that people have some idea of where error budgets are going to go next. So this is helpful for me, because I can just put that in the list.

B: Yes, but I need just a moment before I'm ready to talk. I want to go back to our first topic.

B: I don't know why I'm having an awful time putting a comment on the issue. I think this is probably a client-side thing. Okay, I'll just push this as is for now and I'll put more details in later. Okay, screenshot.

B: This is what we were looking at earlier. Is this sharing properly, can you see? Yeah, perfect, okay, great. So we've seen this together, we just did this: these are the 5.0 to 5.1 duration periodic spikes.

B: Yeah, which is, you know, not super great to see, but this is... yeah. So this is... it's easier to see on this graph that's showing per-CPU-core usage. You can see that these spikes are attributable to exactly one core, which means exactly one process, which means this is very likely to be the RDB backup process.

B: RDB backups effectively run by forking the main Redis server process, so that it begins life with an identical copy of the virtual memory of the main Redis process as of a point in time, and Linux will implement copy-on-write for all of those memory pages. So as the main process mutates those pages, new pages get allocated, because the forked child process that's actually trying to write the backup file out still needs the originals, and that is why this graph shows spikes in page usage.
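A toy illustration of the fork behaviour being described (plain Ruby, not Redis itself): the forked child keeps the point-in-time snapshot it started with, while the parent's later mutations trigger copy-on-write and stay invisible to the child writing out its "backup".

```ruby
data = { "counter" => 0 }

child = fork do
  sleep 1 # give the parent time to mutate its own copy
  # The child still sees the snapshot taken at fork time:
  File.write("/tmp/backup-demo.txt", data.inspect)
end

data["counter"] += 1 # parent mutates; the kernel copies only the touched pages
Process.wait(child)

puts "parent sees: #{data.inspect}"                      # {"counter"=>1}
puts "child wrote: #{File.read('/tmp/backup-demo.txt')}" # {"counter"=>0}
```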
B: So if I suppress this, we get to see that the growth in memory usage is primarily coming from anonymous pages, which is because of the copy-on-write for Redis's actual data store, and also in cache, because the whole point of this forked process is to write out files to disk, which, of course, go through the page cache.

B: We definitely don't do them on Redis cache. For main, yes. I would have expected us to do those on Redis Sidekiq too, but I'll go verify that; it may be that... I don't know. I mean, right.
C: Yeah, Matt, can you... there's a graph, if you go to the Redis dashboard, for the amount of memory used during the copy-on-writes. If you go to Redis main and then go... or however you want to get there, there's the amount of memory that was used during the copy-on-write, from the fact that it was mutating in the main Redis thread.

C: Yeah, if you... yeah, down... it'll be down, not in the indicator detail, somewhere down here, I think. I mean, it's definitely a metric, but I don't know if we plot it; it doesn't look like it. And there, those spikes are... again, was that... is that... no.

C: Yeah, there's a, like, a metric which shows you how much memory is basically used by, you know, the copy part of the copy-on-write, and that, I've found in the past, is quite an interesting metric to tell if...

B: Yeah, no, yeah, it's a good question. I think the answer is just that we configure the primary and secondaries the same, because any node could take up that role.

C: No, I think it's a very interesting one. One of the things that's kind of weird about it, though, is that it doesn't seem to affect all the Redis queries that are... you know, it seems to affect a handful, because obviously there might be 20,000 going through, yeah, and we don't see it in the Redis slow log.

C: Which... yeah, it's... this is going to be very interesting, and I'm really looking forward to figuring out, someone figuring out, what it actually was, yeah, because that is very...

C: These shirts... this is pretty old, but they're just good swag shirts. Nice, yeah, yeah. That's... yeah, Sean, what I think, because, like, what I've seen in the past, I think when we were still on Azure a long time ago, there was something with the hypervisor and forking Redis processes. I forget, it's, like, lost in the mists of time, but basically when we forked and did the copy-on-write it performed really, really badly and things basically came to a halt.
C: But if it was something like that, you would see far more requests that would have slowed down. But it's just, like, a handful, and that's what's really strange, yeah.

B: So I think I'll proceed as we talked about before, with the pcap instrumentation, and I'll try to get that done today, so we've got something to look at tomorrow.

B: Having to modify the page tables, I would expect the process doing the mutating to incur that overhead, so that would be the Redis main thread, not the process doing the RDB backup, would be my guess, and I would expect that to be on...

B: ...you know, perhaps a microsecond time scale on a per-event basis. But those events would happen very often, and so it could cumulatively add up to a lot, especially early in the process, before most of the pages have mutated. I don't see how that would... I cannot imagine a way that that would be biased.
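A back-of-the-envelope version of that argument; every number below is an assumption for illustration, not a measurement from this incident.

```ruby
page_size_bytes = 4 * 1024     # typical Linux page size
resident_bytes  = 10 * 1024**3 # assume ~10 GiB of Redis data
fault_cost_s    = 2e-6         # assume ~2 microseconds per copy-on-write fault

pages      = resident_bytes / page_size_bytes
total_cost = pages * fault_cost_s # only pages that actually mutate pay this

puts "#{pages} pages, ~#{total_cost.round(1)}s of cumulative fault overhead if all mutate"
# => 2621440 pages, ~5.2s of cumulative fault overhead if all mutate
```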
A: No, so the other thing that's weird is that, I don't know, I was assuming that they would show up in the Redis slow log. But if it's an issue with actually processing the command on the Redis side, like before it gets to the processCommand stage, you know, before it decides which command it is, then maybe that wouldn't be counted in the slow log, yeah.

A: Weird. Also, to be honest, the spike above one second is still a bit concerning, you know, like in that first chart I showed, this, like, one...

D: I don't mind if the rest of the time is spent looking at this, because it's interesting to see what's in here and why this is happening, and I think any time that we spend researching interesting things like this gives us more understanding of how else we can tweak Redis in our favour, to make it more and more performant. Because, I mean, as we've seen from the latest 10-minute reports, we're also going to have a problem with the cache one as well, so yeah.
B: Sounds good. Sean, I saw you wrote a comment; I rolled out of bed to join this meeting, so I haven't read it yet, but I'll read it right after this.

A: Yeah, my comment, yeah. No, I asked you on Slack, actually, like, when do you want to pair up? So just let me know on Slack. But if you want to roll back into bed for a bit, that's totally fine, because it's, like, before 7 where you are. So yeah, cool. Anything...

D: ...else? Thanks so much for joining the call, looking forward to seeing what we find. This is interesting.