Description
As part of our investigation into a WAL archiving saturation incident (https://gitlab.com/gitlab-com/gl-infra/production/-/issues/6581) we got into an ad-hoc profiling session and a general introduction to CPU profiling.
Participants:
- Matt Smiley
- Igor Wiedler
- Alexander Sosna
- Biren Shah
B
To the void, thank you. Thank you, yeah, okay. So we've got some context here. We are currently running a sampling CPU profile for 120 seconds on a subset of the postgres processes. Specifically, there's one long-lived postgres process that handles archiving WAL files; it's the parent process that spawns the extremely short-lived WAL-G processes, which are what we really care about profiling.

By default, perf is going to recursively inherit any child processes spawned by the PID we're specifying here, which is how we get away with capturing a profile of the WAL-G processes that would otherwise be too short-lived to grab by process ID. Just for context, another, simpler way to do this would be to capture a profile of everything running on CPU on this host and then filter it as a post-processing step to include only the process IDs we care about.
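A minimal sketch of that capture step (the PID and sampling frequency here are illustrative, not the exact command from the session):

```
# Attach to the long-lived postgres archiver process; children spawned after
# attach (the short-lived wal-g invocations) are inherited by default.
perf record -g -F 99 -p <archiver_pid> -- sleep 120

# Alternative: profile the whole host and filter by PID in post-processing.
perf record -a -g -F 99 -- sleep 60
```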
B
But this is a more direct measurement of what we want, so that's why I'm starting with it. This is the first step of capturing the profile data. Everything else I'm going to talk about for the next couple of minutes is generic: by default, perf record writes out to a perf.data file, which is a raw binary capture file, and to get something useful out of it we run perf script. By the way, I should have started with this just for reference: in /usr/local/bin we've got a set of helper scripts, and generally you're going to want to run this one. It takes no arguments; you just run it (tab-complete, it's the "all CPUs" one), press enter, and it'll grab 60 seconds' worth of profile of all processes running on CPU, and it'll do all of the post-processing steps that I'm about to do manually now, just for reference.

So, talking in very general terms now: perf script is going to extract the contents of that perf.data file and give us effectively a textual output of each event that was captured as part of that profiling run. I usually name these files something that indicates the context, so in this case I'm going to name it for the postgres archiver.
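That step looks roughly like this (the output file name is just an example of the naming convention described; the --header flag includes the capture metadata mentioned below):

```
perf script --header > postgres-archiver-walg.perf-script.txt
```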
B
This will often emit warnings; usually you can afford to ignore them. The warnings are generally about how certain libraries on disk don't match the libraries actually linked into the running process, because the library has been upgraded since the last time the process started. So the process is using an older version of the library, and consequently we can't use the symbols available for the on-disk version of the library, because they don't match what's actually being used by the process. All of that is really just an FYI about why some of the symbolic names for functions may be missing from the profile.

Just so you can see what the profile looks like (this is not sensitive information, by the way): these are the headers, which just give context for the capture; this is a single profiling event; this is another single profiling event. You can see that it... oops, I accidentally scrolled my mouse, let me start over. So: the name of the process, the process ID, a timestamp (seconds since boot, with high precision), the event, cpu-clock, which is just the default event we were profiling with, and then a stack trace for this particular process at that moment in time.

WAL-G is unfortunately compiled without any useful debug symbols, which means we don't get symbolic names. All we get are the raw virtual addresses for the frames in the stack trace, which makes it hard to interpret, so we won't get a useful flame graph. But I'm still going to generate a flame graph so you can see what it looks like.
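Generating the flame graph from that output typically looks like the following (assuming Brendan Gregg's FlameGraph scripts are available; the helper script in /usr/local/bin presumably wraps the same steps):

```
stackcollapse-perf.pl postgres-archiver-walg.perf-script.txt > out.folded
flamegraph.pl out.folded > postgres-archiver-walg.svg
```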
B
SVG files are Scalable Vector Graphics files. They're graphic images, but unlike most image files they're actual text files, XML files, and they include some helpful JavaScript that lets you treat them as interactive graphics. So we can do mouse-overs; look at the bottom of the screen here. You can mouse over each frame and it'll tell you things like the sample count and the percentage of samples, and you can also do things like search for all frames containing a particular string, which is not super useful here since we don't have any useful function names, but in general it's going to be more helpful.

So the takeaway here is: we were profiling the postgres archiver process and any child processes it creates. This frame over here, representing 1.3 percent of the samples, was the postgres process, and 98.7 percent of the samples came from the WAL-G child processes that it spawned. So we know that most of the CPU time we're observing here came from WAL-G, not from the postgres parent process that was spawning them. Because that's the case, I'm gonna...
C
And this is CPU time it's sampling on.
B
Super important, yeah: this is just CPU time, not wall-clock time. What we really care about in this case is wall-clock time, and we're sampling CPU time, so bear that in mind as we look through this; it's not going to represent time spent on disk I/O or network I/O.

Okay. So what I wanted to use the same profiling data for is to load it into another tool, called FlameScope, which will give us a timeline of when these samples occurred. It's just a convenient way to visualize when we had on-CPU time.
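FlameScope is Netflix's open-source subsecond-offset heatmap viewer for perf profiles; loading the data is roughly the following (defaults from the tool's README, not specific to this session):

```
git clone https://github.com/Netflix/flamescope
cd flamescope
pip install -r requirements.txt
cp /path/to/postgres-archiver-walg.perf-script.txt examples/
python run.py    # then browse to http://localhost:5000 and pick the profile
```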
C
A flame graph doesn't tell you anything about the time dimension, and what we're about to see, what Matt is about to show in FlameScope, is actually showing that time dimension. I'll hand it over to you for further commentary, Matt.
B
Sure, yeah. So this is what Igor was just talking about: you can see we've got bursty behavior here, which is what we would expect from what we know about the processing model. By the way, when I was talking earlier about how little of the time was spent in the postgres process, the main reason I mentioned it is that we could potentially, as an in-between post-processing step, pull out only the samples that came from the WAL-G PIDs and throw away the samples that came from postgres. But because they represent such a small percentage of the samples anyway, I'm going to skip that and assume, rightly I think, that the large majority of what we're looking at here came from WAL-G, not from the postgres parent process that was spawning them. That's the whole reason I wanted to show the flame graph.

So what we see here is an oscillating pattern where we spend some significant time on CPU and then a lot of time not on CPU, and the time spent not on CPU could go to any of a few areas: one is disk I/O, reading the WAL files; another is network I/O, interacting with the API for uploading to the object storage bucket; and another is just not doing anything, waiting. So this is a little bit harder to explain; I feel like a whiteboard would do better for this. The operating model for WAL-G's background upload behavior... Igor already knows about this, because we talked about it yesterday during the code review, but I wanted to revisit it. Sorry, I'm making gestures, but you can't see them, and the gestures don't really tell you anything anyway. I guess the important piece here is... oh, maybe I could just use a text file for this, yeah.
B
Okay, so the sequence of events for any individual invocation of WAL-G... actually, let me take a step back. Postgres will run its configured archive command once per WAL file that needs to be archived. We know we've got a backlog of thousands of WAL files, so this is going to run thousands of times eventually, and we just saw that we generate about five WAL files per second, so that should be approximately the call rate for this archive command. Our archive command is a thin shell script wrapping an invocation of wal-g wal-push, which means we invoke a new WAL-G process about five times a second.
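For context, the shape of that configuration is roughly the following; the wrapper script name and paths are illustrative, not the actual production files:

```
# postgresql.conf (illustrative values)
archive_mode    = on
archive_command = '/usr/local/bin/wal-archive.sh %p'

# /usr/local/bin/wal-archive.sh : thin wrapper, invoked once per WAL segment
#!/bin/bash
exec wal-g wal-push "$1"
```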
B
If we're keeping up... am I remembering the numbers right? It was about five files per second that we were generating? Five, yeah, okay, great.

So going with that: to be able to keep up, we'd expect the average duration of each WAL-G invocation to need to be about a fifth of a second, about 200 milliseconds. So these should be very short-lived processes. But WAL-G has this interesting, kind of implicit, batching behavior (I'm going to make up the file names here) where it will implicitly say: okay, on my main thread I'm going to start archiving the specified WAL file. And as a background activity, I'm also going to look for the next several WAL files. That count is configurable, and we just increased it from 10 to 15 yesterday; that was the change issue Alexander made for us. I know I'm covering ground that you'll already have some pieces of; I just want to make sure we're all on the same page about the line diagram I'm about to draw.

If a helper takes slightly less time to complete its file, that thread will pick up another file before the main thread finishes its own, and that file can potentially go on for about the same amount of time, because all these files are roughly the same size and should be approximately the same amount of work to upload, right? So statistically we'd expect about half of these threads to finish before the main thread and half of them to finish after it, because they all get to start their work at about the same time and they all have about the same amount of work to do. When a thread finishes early, like in this example, it gets to grab a second file to work on; and whenever the main thread's file finishes, it says: okay, all my helper threads, you're not allowed to take any more work, but you are allowed to finish what you're already doing. Because they all tend to have about the same amount of work to start with, I'd expect each of these helper threads to do either one file or two files, never more, probably never more or fewer.

There may be some other influences that can affect that. I'm not as confident about the "fewer", but I think it would be really surprising if any of the threads got to do more. Sorry, when I say threads I mean goroutines, but "threads" is somehow more natural to talk about. Anyway, I digress.

The whole reason I'm mentioning this is that in this case (I'll skip drawing the rest of these), because we've now got 10 or 15 of these helper threads, it's very likely that at least one of them is going to get to start a second file, which means that our average duration for completing this command is going to be roughly twice as long as it takes to upload just the one file, because we'd expect at least one of those threads to have only just barely started its second file by the time the first file finishes. Does that reasoning hold water for all of you?
D
And when the archive command fires again in between, will it...?
B
Oh, that's a great question. The postgres archiver will only run one archive command at a time, so these invocations are serialized, and that's the other interesting part of the story. There's kind of an oscillation here: WAL-G has a lot of work to do when it uploads the one file it was asked to upload, plus an indeterminate number of additional files thanks to these helper threads. Once those helper threads have finished (let's say this one is 02), then later on, the next time the postgres archiver says "hey, I want you to archive this file", that WAL-G process is going to say: "oh, you know what, I've already uploaded this file", and it exits almost instantly. Does that make sense?
B
This is probably worth showing real quick, because it's a lot easier to see. On these nodes we have our BPF tools, so we can do a quick execsnoop on this. I want timestamps for reference, and I want the process name to be wal-g. What this is going to do (I'll press enter in a second) is attach this BPF tool to the execve syscall and, I think, one of its variants.
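The invocation is roughly the bcc execsnoop tool with a timestamp column and a name filter (the exact binary name varies by packaging, e.g. execsnoop-bpfcc on Debian/Ubuntu):

```
# Trace newly exec'd processes whose command name matches wal-g, with timestamps
execsnoop -t -n wal-g
```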
B
The point is that whenever a new process gets created, this BPF program captures the event, checks whether the new process's name matches what we've specified here, and prints out basically a log line so we can see it. I'm going to press enter now. It takes a couple of seconds to compile and attach, and now we're running, so I'll press enter again as soon as this batch finishes, just to give a visual break.

So, what we just saw: most of these invocations of WAL-G are very, very quick. You can see it's getting the sequential WAL file names here, and when I say quick, I mean they're finishing in a few milliseconds each, a few tens of milliseconds each. What's going on there is WAL-G saying: "oh, you're asking me to upload this file, but I can see from my scratch notes that I've already uploaded it, so I'm just going to exit immediately rather than doing any real work." So that's effectively a no-op run.

Eventually it gets to the end of the list of files it had already uploaded on a previous occasion, and it says: "oh, I haven't uploaded this file, okay, I'll actually do some work, and by the way, since I can see that I have a backlog, I'm going to launch my internal helper threads to proactively upload another big batch of files." That's why that invocation takes several seconds to run. In this case it took just under three seconds, and then the next N invocations again say "I see this file's already been uploaded, I'll just exit immediately." So this is the first layer of interesting oscillating patterns you'll see: most runs are very quick because they're no-ops, and one out of every N runs is rather slow because it's actually doing the work.
C
Right, and basically for each one of these groups or batches, the first item is the slow run. Actually, I guess it would be the next one, right? These are start times... yes, okay, yep. So B in this case is going to be the slow run, and all of the fast ones following it are the optimistically, concurrently uploaded WAL files that were uploaded by that first one, and are now sort of getting...
B
Yeah, so this one definitely uploaded B, but it also silently uploaded BF and C0 and C1, and so on, all the way up to D1, and the number of files in the batch that this one kind of secretly did for us is what we were trying to increase by raising the WAL-G upload concurrency setting.
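That knob is WAL-G's WALG_UPLOAD_CONCURRENCY setting; as a sketch (how it is actually wired into the archive command's environment in production is not shown here):

```
# Raised from 10 to 15 in the change discussed above
export WALG_UPLOAD_CONCURRENCY=15
```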
B
So if we just run this in a loop, refreshing every 10 seconds, you can see that these files are very short-lived. The execsnoop that we did was implicitly showing us the typical batch size. I should have counted them, but it was about 15, yeah. So effectively all 15 of those will have corresponding zero-byte files added to this directory, and then the subsequent runs of WAL-G will look in this directory before they do any work, see that the file name already exists, and know that that file was already successfully uploaded to the object storage bucket.
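The loop being described is just a periodic listing of that marker directory, something like the following (the directory path is a placeholder, not the real location):

```
watch -n 10 'ls -l <walg-marker-directory>'
```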
B
That's the mechanism by which it figures out that it can treat a run as a no-op, and just by watching files being created and removed in this directory we can see the pattern of that happening. So that was the last bit of show-and-tell I wanted to give. This is all non-obvious behavior; I don't even know if it's documented. I came across it a few months ago during our previous round of WAL-G throughput improvements.

So, tying that back to the FlameScope data: with that context, we can see these periods. By the way, the scale here is that each vertical line represents one second of data, and we captured 120 seconds' worth of data. So this time span is probably about half a second of CPU-intensive activity, and we don't know what it was actually doing on CPU during this time, because we don't have symbolic names for any of the functions.
B
This is part of what I wanted to see: when we're having these microbursts of CPU activity, how many CPUs are we using? The answer appears to be... if we mouse over, we can see how many samples there were in... sorry, this is also kind of non-obvious: for each point in time when we captured a profile, how many stacks did we grab? We only grab a stack if a thread is on CPU at that time, so this is effectively saying how many CPU cores WAL-G was using at that moment, and we can see that the scale here goes from 0 to 16.

We have 96 CPU cores available, and generally we're using a little more than half of them during the workday peak. I think it generally rises to about 50 to 60 percent usage, if I remember right, yeah.
B
So that means we can afford to burn 16 CPUs for short periods of time doing WAL archiving. Taking a step back, there were two things I wanted to see with this profile. One of them was to see how long these bursts were; I knew the bursts had to be happening, but I didn't know how long they lasted. We saw from the execsnoop that, ignoring the no-op runs we've got up here, the runs that actually did work were taking, I think, a little bit less than three seconds in the example we saw before we made our tuning change yesterday, and now, at least in this case, it took about three seconds, right?

Yeah, thank you, wall-clock time. So that means (and this view is representing CPU time) that if we take, for example, three columns here, since we've got one column per second, three columns would represent the total wall-clock time for a single WAL-G execution, and just at a glance it looks like much less than three seconds of that was spent on CPU, which means the rest of those three seconds was spent off CPU, probably doing either disk I/O or network I/O.
C
Yeah, and on a high level that also matches what we see when we look at the overall CPU utilization in top or pidstat; you were showing it earlier. So it's...
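A quick way to get that per-command view is pidstat's CPU report with a command-name filter, roughly (a sketch, not necessarily the exact invocation used in the session):

```
pidstat -u -C wal-g 1
```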
C
When I ran this yesterday, before we did our tuning, it was peaking, I think, up to four or five hundred percent, yeah. So that's... yes, we can see now it's bursting up to... I mean, it is very bursty, but it's bursting up to like 600 percent, so six cores, and we gave it 15 goroutines, right? Yes. So it's using roughly half of those goroutines on CPU, and that matches what we were just looking at, and sort of this idea that...
B
So all of this is tying into the overall question of: is it safe to further increase this parameter? The answer appears to be yes, we can moderately increase it again; that's my takeaway. I think this is the same way we're all framing it, but just to be explicit, in terms of machine resource usage we're looking at CPU usage and memory.
B
Memory usage is less of a concern, but memory thrashing could be a concern; I checked yesterday and it seems to be perfectly fine in that respect, so I'm going to gloss over it for now without rechecking. Disk and network I/O are the other two machine resources we could be concerned about. I am super not worried about disk I/O as a concern for this host; we're nowhere near capacity, we're using about half of our spec'd capacity on disk I/O, and there's no way WAL-G can burn through that. So that leaves network I/O, and as I recall it's harder to calculate our actual network usage, because disk I/O counts against it as well, but last I checked we were also nowhere near capacity on that, and I kind of don't see WAL-G being a significant risk there either. So those are the categories we could focus on, and so far we're fine.
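For reference, the kind of quick checks behind those capacity statements (standard sysstat tools; sketches, not the exact commands from the session):

```
# Per-device disk utilization and throughput
iostat -xz 1
# Per-interface network throughput
sar -n DEV 1
```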
A
Yeah, I'm not always right with my assumptions, but even if we were to take out all throttles and let WAL-G run at full power with as many threads as we want, I would now assume (not propose, assume) that disk I/O might even go down, because the fresh WAL files are always in memory, so you don't have to read them from disk anyway, because...
B
So I guess, okay, in terms of tuning, I think it's viable to bump that knob a little further if we want to. And we have a couple of other things we can do, not necessarily immediately.
B
We know that we have some kinds of events that can trigger a significant increase in WAL generation rates, for example when we do bulk data changes like migrating primary keys from four-byte integers to eight-byte integers. Anything that does large bulk data updates, and in particular anything that ends up rewriting historical blocks, is going to generate more WAL records than we usually get, and we're not at the end of needing to do that kind of maintenance. So I'm a little bit worried that we're going to have ongoing and future background migrations that nudge us over the edge, and in an even more mundane sense, folks doing feature development work are not thinking about WAL generation as a design requirement.

So, like Alexander said to start with, I think we have pretty strong evidence at this point that we are kind of continuously right at the edge of saturation for being able to archive these WAL files. Turning this knob further gives us, I think, effectively maybe ten percent; it's a small additional margin, and I'm kind of worried that that's not going to be enough to last us very long, maybe weeks, maybe months.
B
So I wanted to talk about some of the other possibilities in addition to what we're already talking about doing. At the start of this conversation we talked briefly about moving the CI tables to their own database cluster, and I think that's very much a game changer in terms of separating two large drivers of read and write activity. Once that's completed we'll probably be in a very different space, but I kind of don't feel like it's reasonable to just assume that it will take care of our problems. I don't know what the timeline is for that, and, well, I feel like I'm using too strong language here, but I think it's a significant risk to assume that that project will complete. We shouldn't.
C
By the way, since I think we've covered the interesting demo part, I'm going to stop the recording here, if that's all right. Yeah.