From YouTube: Scalability Team Demo 2021-04-28 - Call 2
A: There we go. So this is the second demo of today. I threw something on the agenda, and I'm the only one who threw something on the agenda, so I'll just go ahead and present this thing to you, my audience. Yeah — it's actually something we've talked about, or something you're familiar with already, I think: it's about a bottleneck I discovered in gRPC and the way we use it in Gitaly.
A: I got thinking about this because of the pack-objects cache. When we tried to turn it on — which is a change you made at some point; no wait, that did not get reverted — but then I tried to bypass the pre-clone script for gitlab-org/gitlab, and that got reverted because the apdex of one of the Gitaly nodes started going down. Then I tried to understand what was wrong there, and at the end of that investigation I suspected that we were hitting a bottleneck in gRPC, but I couldn't really prove it, so I just had to close it and we wrapped up the epic. But I kept thinking —
A: I would really like to know if this is true or not. So then what I ended up doing — yeah, but one reason why: the graphs just showed Gitaly doing a lot of stuff. I think that's one of them. Okay, maybe I should go through the reasons — or I should try to stick with my story; I have a bit of a story. Let me share my screen: there's an issue where I try to explain this. So one thing is that we never had —
A: This is the Git process activity graph, and we never had Gitaly in here; we only added it at the end of the pack-objects cache project. Then I started noticing that Gitaly itself is the top thing most of the time. We were always tracking Gitaly CPU utilization, but it was in a different metric, and we never saw the two things on the same graph — so it didn't jump out at you as much until now.
A: The other thing I noticed while working on the project was that Gitaly allocates a lot of memory when it's serving traffic, and it all happens in the functions that are responsible for transferring Git HTTP data. In this profile the top line is hard to see, but it says 40 gigabytes, and if you click these traces, it's gRPC stuff. So it looks like gRPC is allocating memory proportional to the number of bytes we're transferring — and if you're just copying data —
A: — that's not good, because that should be a constant allocation per request, not O(n) in the number of bytes you're proxying. I found that suspicious. Just because of the allocations, I thought something fishy was going on here, and I wondered: if we made this constant in allocations, and not linear in the request or response size, what would that give us? Yeah — and then the other thing, you know —
B: Go ahead — yeah, I've got some interesting, possibly related data to share, but I don't want to interrupt your story. So, all right.
A: Yeah, so the other thing besides the allocations — which in themselves are fishy, and in retrospect they're super fishy, but we'll get to that — I also noticed, looking at CPU utilization, that if you look at the write activity: these are the threads — this is the CPU time spent doing writes, and these are actual syscalls.
A: But all of this is stuff where we're putting messages on the queue inside the gRPC runtime, to deliver them to the threads that are doing the writes. So we're actually spending more time telling gRPC to send something across the network for us than it takes to send it across the network — and that also points at an —
A: This is a CPU profile, not — well, yeah. So I thought: how can I test this? And then I realized I can just try it. I only care about Git HTTP, so I could build a toy version of Gitaly and a toy version of Workhorse —
A: — that only know how to serve a Git HTTP fetch request, but built on top of TCP sockets and nothing else, and benchmark that. And it turns out that that is insanely faster. That's sort of the story, and I have a table here at the bottom where I set this up on a virtual machine with the same machine type.
A: If you add the pack-objects cache you get more throughput, but you get the same system load — you get slightly lower CPU utilization, but you're basically still CPU-saturated. Now, rpc-test is this thing I built that just uses a TCP socket, and there you get twice as much throughput and almost 100% CPU utilization, so it's still being saturated.
A: So I was kind of blown away when I realized that was possible. If you have no immediate questions, I can show you some flame graphs that make it a bit clearer what's going on.
B: Yeah, I was going to say — this looks like spectacular results, and of course, as a scientist, that always kind of evokes, you know, questioning the results. I would love to see some profiles — some CPU profiles in particular.
A: Yep. So this is a profile — let me just... this is slightly annoying: I need to download it and open it so we can click around.
A: This is with the cache off, so we do have the overhead of the cache plumbing, but at a zero hit rate. And it looks like — this is Gitaly... yeah.
B: It's down here? Yes, right, right. And this is just for clarity: this looks like a perf profile, and I'm guessing your sampling rate was, like, 99 hertz?
A: Yeah.
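[Note: the exact invocation isn't shown on screen; a typical capture of this kind, assuming stock perf and Brendan Gregg's FlameGraph scripts, would look like:

    perf record -F 99 -a -g -- sleep 30                      # sample all CPUs at 99 Hz for 30s
    perf script | stackcollapse-perf.pl | flamegraph.pl > cpu.svg
]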
A: So if you look at gitaly-hooks, one thing that's interesting here is that you see lots of garbage collection. This is a garbage collection function — well, without knowing in detail what the Go runtime does... but this is garbage collection, obviously; yes, this is garbage collection.
A: Well, over here you see it's actually reading from a network connection.
A: So that is getting data from Gitaly — that's this little tower. But, for instance, this is just communicating with the flow control mechanism in grpc-go.
A: Yeah, so this is grpc-go's scheduler, right — it has its own scheduling for I/O, even though this thing has only one connection to work on, because it's a single process. More GC... and then also over here you have kernel stuff.
A: You see network writes, socket writes — yes, great — but only this narrow tower is socket writes; it's dominated by other stuff. So that's what things look like with a zero cache hit rate. Next I should show you what it looks like with a 100% cache hit rate, but still using Gitaly.
A: Yeah, let's get that one. So we're still not at 100% CPU, and that's also what I saw — it was something like 77 percent. Git is much smaller here on the left, but gitaly-hooks and Gitaly are still pretty big.
A: Fine, yeah — okay. So next I can show you rpc-test without the cache. This one was at load 189, and it was 100% CPU utilization, and that is also borne out in the profile picture, because the swapper is all the way over here on the right and it's tiny.
C: Okay — this is Git processes.
B: We're driving more Git processes because we've got higher throughput, yeah.
A: Exactly, which is why Git gets more CPU. Otherwise Git is just stalling, because it cannot feed out its data quickly enough — Gitaly is not accepting, not consuming, its data fast enough.
B: If you don't mind, let me pause you for a second, just for some basic background. I've spent a decent amount of time with Gitaly and its internal code, but I haven't spent any real time with gitaly-hooks. Is that a separate process that gets forked for a short amount of time, and it effectively acts as a pipe between child processes and Gitaly, or — yeah?
A: So normally, every fetch request corresponds to a git upload-pack process, right? And git upload-pack can do several things, but the most expensive one is sending back file data, and that is farmed out to another subprocess, which is called pack-objects — git pack-objects.
B: Okay, great. And we spawn this hooks process for every call, right? Was this in place before — will every upload-pack always invoke the hook, regardless of configuration?
A: Yes — that is a design choice we had to make when working on the cache, and it appeared to me (though I didn't understand this well enough at the time) that it did not add noticeable overhead. In the case where the cache is active, it reduces overhead, because you have fewer Git processes — pack-objects processes. But yes, it effectively acts as a Gitaly client, because it makes a Gitaly call and it's copying data from —
B: Yes — yeah, I've seen that PostUploadPack and SSHUploadPack, nowadays, now that we've got the caching enabled, have a similar number of calls to another gRPC call whose name I'm forgetting.
A: It was created to do other jobs — it already existed, and I just added another subcommand to it. But that —
A: Do we use — well, as you can guess, maybe: Git uses hook executables as a plug-in mechanism. It's part of git push; during a git push there are hooks before and after the references get updated.
A: Exactly — that's what it's supposed to be; that's its job. And here's the Gitaly replacement, and again it's all CopyBuffer.
A: — out. So that's without the cache — effectively a zero percent cache hit rate. Okay. Would you zoom —
B: — in on that tower one more time, the gRPC one? Yeah, perfect — I just wanted to take another peek. So we've got maybe 30 percent in pipe reads. What's the middle tower?
A: The syscall here is epoll.
B: Okay, that makes sense — yeah.
A: That's how Go makes I/O efficient: all the goroutines that want to do I/O register with the polling subsystem, and then one thing is doing the polling and waking up goroutines when they have something ready for them.
A: Okay, thanks. Yeah, so this was where we hit — like I said, the swapper is almost nowhere, so it's 100% CPU: we're actually completely CPU-saturating the Gitaly server. And that's a reasonable bottleneck. The repo we're using here is the handbook, which is a very large repo. So, okay — a lot of the time... because I did two sets of tests.
A: I also tested it with gitlab-org/gitlab, which is about 1.5 gigs — the handbook is like five gigs — and you see the numbers shift a little bit, because we get more requests through in a 30-second window with gitlab-org/gitlab than we do with this one. But yeah, so that's the story, and then the really cool one was, of course, where we have the cache hits. So let me get that one.
A: I'm in the wrong flame graph now, sorry. This one — this is the right one: rpc-test with the cache.
A: That is unbelievably beautiful. Yeah — there's one little detail I want to show you that is really cool. The Go runtime in some cases is smart enough to use sendfile. The sendfile system call allows you to copy data from a file into a network —
A: — connection. Even — I mean, in the current situation we have a socket and we have a file, but because there are these layers of abstraction in between, the runtime cannot see that that's what's there, and it cannot use the syscall. When we peel these layers away, this optimization can kick in, which is just a nice gift. And there's another nice gift that I didn't even use here — but if you think of upload-pack, right, it has a standard-out pipe.
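[Note: a minimal sketch of the fast path A is describing — serveFile and the shape of the server are illustrative, not Gitaly code. io.Copy checks whether the destination implements io.ReaderFrom, and *net.TCPConn's ReadFrom can issue sendfile(2) on Linux when the source is an *os.File, so the file bytes never pass through user space; wrap either side in another abstraction, as gRPC does, and the fast path is lost:

    package sendfiledemo

    import (
    	"io"
    	"net"
    	"os"
    )

    // serveFile writes the contents of path to conn. Because nothing sits
    // between the *os.File and the *net.TCPConn, io.Copy delegates to
    // conn.ReadFrom(f), which can use sendfile(2) directly.
    func serveFile(conn *net.TCPConn, path string) error {
    	f, err := os.Open(path)
    	if err != nil {
    		return err
    	}
    	defer f.Close()
    	_, err = io.Copy(conn, f) // sendfile fast path
    	return err
    }
]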
A: But I cannot keep my mouth shut about it, because it's just there.
B: No, that's awesome that it's there — that's fantastic.
A: Yeah, thanks. So yeah, this is mostly CopyBuffer, a little bit of sendfile; again, the hook executable is still CopyBuffer, like it was before. Okay — and at this point we're obviously not saturating the CPUs, but the load factor is also way below one. I don't know what I said — it was 0.6 or something.
A: And yeah, I had to look it up, but apparently the network limit on the VM is 32 gigabits per second, so if you divide that by eight, that's four gigabytes per second.
A: Actually, if I do that thing where I dup the file descriptor — sorry, dup the socket — and give it to upload-pack, then you really hit the 4,000. If I go back, I have graphs where it hits that. Okay, I'm not gonna — I —
A: Yeah, it just didn't in this particular configuration. The other thing I wanted to highlight here is that in either of the Gitaly situations, if you look at a memory profile, you see that in 30 seconds it allocates 115 gigabytes of memory, and this all has to do with protobuf marshalling and stuff that happens in this async I/O queue that is part of gRPC.
A: So that's 100 gigabytes; and then if we look at something like rpc-test with the cache — sorry, wrong one, scroll down — it's five megabytes. So I was saying this should be constant in the number of bytes, not O(n) in the number of bytes — and it is. To go from 100 gigabytes to five megabytes of allocations is a little —
B: Yes, no, that makes sense. I guess I can rationalize why we'd have really high churn on the heap when we're doing multiple layers of short-lived I/O.
A: Yeah — I don't understand very deeply how gRPC works, but I think one of the main things they try to do is connection multiplexing.
A: So all the things that go across the connection are messages, and these get put in a queue, and then there is this central thing that owns the connection: it looks at which things want to go on that one connection, it feeds off their queues, and it implements a quota and things like that. That's sort of what they built. And the way it works is that you have the submitting side of the queue, and there they — well, protobuf itself is not helping either: they serialize byte slices into protobuf messages.
A: So I think it's a fundamental thing about this multiplexing async-I/O design that the library forces on you. Plus, in some cases, things like sendfile are never going to happen this way, right — if you have all that stuff in between.
B: Yeah, that makes sense.
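[Note: a sketch contrasting the two shapes being discussed — illustrative only. The sender interface below stands in for a grpc-go generated stream; nothing here is Gitaly's actual code. The raw copy reuses one buffer, so allocations are constant per request; the message-per-chunk path allocates in proportion to the bytes transferred:

    package proxysketch

    import "io"

    // sender stands in for a gRPC stream: each Send frames one message
    // and hands it to the connection's shared write queue.
    type sender interface {
    	Send(data []byte) error
    }

    // proxyRaw is the bare-TCP shape: one reusable 32 KiB buffer,
    // O(1) allocations regardless of how many bytes flow through.
    func proxyRaw(dst io.Writer, src io.Reader) error {
    	buf := make([]byte, 32*1024)
    	_, err := io.CopyBuffer(dst, src, buf)
    	return err
    }

    // proxyChunked mimics the gRPC shape: every read becomes a fresh
    // message payload, so allocations grow as O(n) in the response size.
    func proxyChunked(dst sender, src io.Reader) error {
    	buf := make([]byte, 32*1024)
    	for {
    		n, err := src.Read(buf)
    		if n > 0 {
    			msg := append([]byte(nil), buf[:n]...) // per-chunk allocation
    			if sendErr := dst.Send(msg); sendErr != nil {
    				return sendErr
    			}
    		}
    		if err == io.EOF {
    			return nil
    		}
    		if err != nil {
    			return err
    		}
    	}
    }
]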
B: Yeah — so, let me screen-share for just a second; Slack crashed while you were chatting. Yes, so —
B: I spent more time than I expected digging into this. Several self-managed customers reported that they have large — you've already got all of my context, so I'm just kind of summarizing for anyone —
B: — else watching the recording. In brief: we have multiple large customers running non-trivially sized Gitaly nodes who have observed abrupt spikes in memory usage by Gitaly that do not go away after the surge in workload. And from the bits of diagnostic data those customers were able to send us, we're pretty confident that what's going on here is — it doesn't matter what the event is, but there's an event that causes —
A: Yeah, because that one is doing allocations in a loop, and it's not constrained by having to send things back to the client. So if that runs out of control fast enough, I can imagine that the runtime responds by just making very large allocations.
B: What we're kind of analyzing here is the interaction between two discrete memory management frameworks, one of them in the Go runtime and the other in the Linux kernel. The Go runtime allocates a large chunk of virtual memory — which we're going to call an mmap — and most of that is not actually backed by physical pages most of the time, but those pages can be allocated on demand, via a page fault. And what we think is going on is: the Gitaly memory — sorry, the Go —
B: — runtime's memory management framework has two options for — I'm going to wrap this up, because I know you already know this.
A: You talked about this last week too, but I wasn't in that call. So, okay — don't wrap it up too fast.
B: Okay — well, we know more now than we did last week. I'll zoom in on this flame graph in just a second. Oh, by the way, if anyone's interested: this is issue 3567 in the Gitaly repo, and there's a short status summary at the top of the issue that's up to date as of yesterday. But the short version is: the Go —
B: — the Go runtime's memory management has two options for how it notifies the kernel: "hey, kernel, this chunk of memory — it's semantically safe for you to reclaim it if you want to." That notification mechanism is madvise, and there are two bits of advice — the two mechanisms are —
B: — madvise MADV_FREE and madvise MADV_DONTNEED. MADV_DONTNEED is what the Go runtime historically used, until a couple of years ago it switched to MADV_FREE because of some perceived efficiency improvements. In the latest release, Go 1.16, they're switching back to MADV_DONTNEED, and that's probably going to be better behavior for us. There's a caveat, though.
B: According to the documentation for the madvise syscall, the behavior of MADV_FREE is very different on systems that have swap enabled versus no swap — and all of our Gitaly hosts have no swap, and I think that's why we're not observing the behavior these customers are reporting.
B: I have not empirically tested this, but I think the kernel documentation is reasonably trustworthy with regard to how it handles these syscalls, so I'm taking it on faith right now that it's correct in saying that, when madvise is issued with the MADV_FREE advice —
B: — if the host is swapless, that memory gets immediately reaped, which is pretty similar — but not exactly the same — as what happens with MADV_DONTNEED. If, in contrast, you're on a system that does have swap enabled — which we think the customers experiencing this problem are — then MADV_FREE is very, very lazy —
B: — in reclaiming memory. The kernel has a pretty abstract mechanism for computing a numerical value for memory pressure, and that has to exceed a certain threshold for the kernel to even bother reclaiming that memory. And I suspect that — because we know Gitaly benefits enormously from the filesystem cache, and the page cache is kind of, you know, the first victim for relieving memory pressure —
B: — it's pretty likely, in my opinion, that these customers are experiencing a performance regression because Gitaly spikes its memory usage, which steals memory from the page cache —
B: — and it never gets reclaimed by the kernel, because of this mechanism. And if we tell the Go runtime to switch to this —
A: But yeah, if you have something that is just eating into the space for the page cache, then you end up doing actual block I/O, which is not what you want.
B: Exactly. Okay, so on to the other bits that I wanted to share. I wanted to try to quantify — so the proposal that we've kind of reached here is, like I said, that via GODEBUG —
B: Yes, exactly. So we haven't applied this to our hosts yet, but I think we probably should, and I think a lot of folks are in agreement that we should do that.
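[Note: the knob being referred to is real — since Go 1.12 the runtime honors GODEBUG=madvdontneed=1, which makes it use MADV_DONTNEED instead of MADV_FREE. A sketch of applying it; the binary and config paths are illustrative, not our actual deployment config:

    # set in the service's environment before starting the process
    GODEBUG=madvdontneed=1 /usr/local/bin/gitaly /etc/gitaly/config.toml
]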
B: Exactly, yeah. So in our current state I tried a few tactics for — what's the best way to summarize this... Sorry, let me take another couple of minutes for one more piece of background. Okay — so this is —
B: This is one last bit about the interaction between the Go memory management and the kernel memory management. When the Go runtime issues that madvise syscall to say "this range of pages, you can reclaim if you want to," that is effectively saying: I know you've allocated physical pages for this portion of my mmap.
B: If you want to reclaim them, you can; I will probably use them again in the future, but feel free to reap them at your discretion. And the interesting behavior here is what happens if the process — in this context, the Go runtime — touches them again. This is the semantics of the madvise call, so this is Linux behavior.
B: Yes, exactly. And so the rationale for using this kind of asynchronous reaping behavior is that the Go runtime can be fairly aggressive in marking pages that it thinks it won't need again, and if it ends up wanting to use those pages again, it doesn't have to worry about whether they're there or not — the kernel will allocate —
A: — or also shrink the mmap or something like that? But it doesn't shrink the mmap; it just says "you don't need most of it." But yes, the kernel could just barge in and claim an insane chunk of memory.
B: Yes, exactly. And so the rationale for making this a lazy reclaim is that if a page that got marked as reclaimable ends up getting used before it gets reaped, then that saves the kernel the bother of —
B: Exactly, exactly. And so I figured that — because the mechanism for provisioning a page that has been reaped is to take a page fault, specifically a user-space page fault — we could trace page fault events and try to identify when those page fault events correlate —
B: Yeah. So this note here — I'll link it in the meeting doc — kind of walks through that experiment, and I thought it was interesting enough to share. The short version is that I found a tracepoint that lets me exclude most of the syscalls — like read and write and accept and epoll and pipe reads, things like that that we don't care about. The only page faults that we want are the ones that involve private anonymous memory.
B: Thanks. So, looking at — page faults happen really, really often, so I also needed to find a trace mechanism that would support sub-sampling. I got this one to give me a one percent sample.
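[Note: a plausible shape for this capture, assuming stock perf — the exact flags from the demo aren't shown. "page-faults" is perf's software event, and -c 100 sets the sample period so one stack is kept per hundred events:

    perf record -e page-faults -c 100 -g -a -- sleep 60
    perf script > pagefaults.stacks
]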
B: This is the first of the two flame graphs I wanted to show. This is effectively saying: for one out of every hundred page faults initiated from user space, go ahead and capture a stack trace, user space and kernel. And so you can see what a lot of Git processes are doing, which —
B: Yes — I'm going to zoom in on a couple of these so we can see more detail. So this, for example, looks like git diff, and it's reading a packfile entry — and that sure makes a lot of sense. It's doing a memcpy of a page, presumably from the page cache, into a private memory allocation. And the thing is that —
A: That's right, because I was thinking: Git also has to allocate memory itself for decompressing things — it's not enough to just load the data from disk, it's also compressed, so it needs at least that much memory on top. But this could just be the disk I/O that we're seeing here.
B: — where it would put its temporary structures, yeah, exactly. So, just to scroll — for folks who don't spend a lot of time with flame graphs, which I know Jacob does: within each layer the frames are sorted alphabetically from left to right. So we're looking for the Gitaly process, and — you can see it's not obvious — I'm going to move my mouse very slowly right here, and you can see there's "gitaly-" and it goes — yeah.
B: Here's gitaly-hooks, scrolling, scrolling... and over here, that's Gitaly — that's the whole thing. 149 samples, so this is 149 samples out of about 20,000. So it's a vanishingly small amount of the time that Gitaly itself is taking page faults of this variety. And the second —
B: Where did I capture this — did I note that here? I don't think I did. This was on one of the production Gitaly nodes, right —
C: Just — is this production traffic?
B: It's production traffic, and it's 60 seconds. I try to name the files with clues about this: so this is the software type of perf event, called page-faults; it's capturing one in a hundred of those events; the duration of the capture was 60 seconds; and it was on one of the production Gitaly nodes — I think it was file-42.
A: The Go memory allocator — well, both: our workloads are probably very constant, and the Go memory allocator is doing a good job of keeping things around that it needs, or of not releasing too many things. I —
B: — think so, yes. And just for reference, the second flame graph I attached here is just those 149 samples we zoomed into, extracted so they're easier to see. But I think the first, most important message is that there were very, very few of these events, which suggests the Go runtime is doing a pretty good job.
A: This is SmartHTTP PostUploadPack — so that's Git HTTP proto marshalling — and the one on the left is PackObjectsHook. The bulk of the data that flows through PostUploadPack flows through PackObjectsHook first, then through that child process, gitaly-hooks, and then back in and out the door again. So yeah, that makes sense.
B: Yep. So yeah, this is more of the gRPC marshalling you were talking about. I guess the other interesting bit — one of the things I was looking for here is an efficient way to quantify —
B: — how often we end up taking page faults when trying to allocate memory within the Go runtime. We would expect it to be very frequent to do allocations within the Go runtime that don't cause page faults, and I figured that one of the useful tricks for quantifying the impact of either the new Go version or toggling this madvise mode would be to count the ratio of allocation events versus the page faults that derive from those allocation events.
B: This doesn't get all the way there, but it gets kind of close. Have you looked at the Go runtime pprof?
A: I haven't spent a lot of time with it, because it emits numbers — well, some numbers — so it could be that, yes, there are metrics for the thing you want.
B: Yeah, I can't speak well to that, because I haven't used it enough to be confident in my interpretations, but — we're talking about heap profiling here? — my impression is that Go's internal heap profiler —
B: Yeah, sure — it's very quick, let's do that. Okay. In the spirit of finding ways to instrument this, I wanted to highlight that about 44 percent of these were — sorry, these are page faults. There are two common call paths near the tops of these stacks that actually did result in page faults.
B: One of them is when the runtime is effectively trying to add a new span to the mcache, and to do so for whatever size class is needed, that needs a refill.
B: It will call mcentral grow to allocate a new set of spans, and it's perfectly reasonable to expect that to sometimes take a page fault when it does that, by touching a portion of the mmap that has not yet been physically allocated.
A: Yeah — it's always going to do that, yeah.
B: You know, it always does the zeroing, and that's —
B: Sure, yeah. I guess my point is that if the page had already been allocated from the kernel, we wouldn't have seen that as —
B: I think the size classes end at 32 kilobytes, so any allocation larger than 32K gets allocated directly with this largeAlloc call. And exactly like you just said, that also has to return zeroed pages, which is why it traverses this — and it's natural to expect it to also see allocation-induced page faults when it's touching a portion of the mmap that doesn't currently have a physical page allocated.
B: I haven't written this yet, but I think I could probably instrument these two functions in the Go runtime, plus either a kprobe or a tracepoint for page faults, and only track the page faults that happened —
A: — under those functions, under —
B: — that call path, exactly. So that would effectively be: set a bit here for this particular task ID, and then, if there's —
B: Yeah, I don't know if that's called from other places too, but maybe — yeah, maybe so.
B: Well, this is a minor point, but — yeah, that's fantastic. This is a minor point, but the spelling of this symbol turns out to be really obnoxious, because it's got a parenthesis and an asterisk in it, and —
B: — escaping that is hard, yeah. I worked out a way to do it, but it's obnoxious. So —
A: — in favor of memclrNoHeapPointers? Yeah, exactly, exactly. I'm going to make a note of that right now. So, what I wanted to show you — let me share back — is that... oh, come on, why are we not typing in the right box? So, we were talking about —
A: — MemStats. Yeah, this whole namespace. I think this has things like HeapSys bytes. I wonder if that is —
B: — where they come from? Yeah, I guess so. I can look at the Prometheus exporter to find where they're getting that from in the runtime; they're —
A: They're from here. So you maybe want to look at this; maybe this helps. But I don't know — the Go runtime should maybe be oblivious to page faults, so in that sense it probably doesn't help, but I don't know. I'll throw this in the document.
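[Note: the namespace A is pointing at is Go's runtime.MemStats, which the standard Prometheus Go collector exports as go_memstats_* metrics. A minimal sketch of reading the fields relevant to the madvise discussion — HeapReleased is the portion already handed back to the OS, so HeapSys minus HeapReleased approximates what the process is actually holding; as A notes, these are the runtime's own counters and say nothing about page faults:

    package main

    import (
    	"fmt"
    	"runtime"
    )

    func main() {
    	var m runtime.MemStats
    	runtime.ReadMemStats(&m) // stops the world briefly
    	fmt.Printf("HeapSys:      %d\n", m.HeapSys)      // bytes obtained from the OS for the heap
    	fmt.Printf("HeapIdle:     %d\n", m.HeapIdle)     // spans with no live objects
    	fmt.Printf("HeapReleased: %d\n", m.HeapReleased) // idle bytes already returned via madvise
    	fmt.Printf("Retained:     %d\n", m.HeapSys-m.HeapReleased)
    }
]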
B: Sure, yeah — that's great. I think that was the most interesting stuff I thought was worth sharing from my weekend.
A: Yeah, that's fun! So — do you expect... this was motivated by problems for self-managed. Do you also see this relating back to a problem we're having in production, or —
B: Something like that — I don't think so. One of the questions in my mind when I got involved with that support task was: why are we not seeing this in our production? And I think I'm satisfied with the answer that the fact that we're running swapless systems makes the MADV_FREE behavior fairly similar to the MADV_DONTNEED behavior.
B: Yeah — especially since... I'm sure these customers don't have as much RAM as we do, but we did see their Gitaly process — one of the customers was able to give us the... you probably know this already: there's a command-line utility called pmap that has like three different levels of verbosity, and if you run pmap dash capital X capital X, that's the maximum verbosity, which is —
B: Well, you'll remember from the /proc filesystem that, for a given process ID, there's a maps file and there's an smaps file, and the smaps file is much, much richer information.
B: I actually did some profiling work to show how expensive it is to read the smaps file for a process, because I think a lot of people were thinking it was really cheap — and it's only cheap under certain circumstances, so it's easy to trick yourself into thinking it's always cheap. It's not: in fact, the more memory that's allocated, the more expensive the walk is. But I digress.
B: Yeah — pmap, you'll love it, just try it sometime. pmap is reading the smaps file and presenting it in a more human-readable fashion; it's literally just window dressing on smaps. But this customer gave us some pmap output, and it showed that the virtual size of the Gitaly process was 19.5 gigabytes, and of that, about 18-point-something gigabytes was private —
B: — anonymous memory, which is, you know, reasonable. But almost 90 percent of that private anonymous memory was in the "free" state, not the "dirty" state — which I think means it had been released to the OS for potential reclaim but hadn't actually been reaped, because those pages were still allocated. And that's what led to the conclusion that the lazy reclaim is at fault.
A
Do
we
know,
do
you
know
what
the
effect
of
m
advice,
free
and
msi
don't
need
this
on
this
s-map's
output.
B
That's
a
great
question:
one
of
the
things
that
I
wanted
to
do,
but
haven't
had
time
to
do,
is,
is
to
write
a
write,
a
a
toy
c
program.
To
specifically
do
you
know,
create
a
map,
allocate
the
the
memory
and
then
and
then
directly
yeah.
A: Writing little C programs to try things out is good.
B: Yeah, totally with you. But I think that's all I had for that topic.
B: This is a total tangent. You mentioned at one point recently — I forget if it was in Slack or an issue... oh, actually, it was probably this thing you just demoed — that you started off trying to get some profiles from Gitaly and were being thwarted by the fact that when we do package installs, the on-disk binary gets replaced. That is so obnoxious.
A: So, from a slightly different perspective: I know Omnibus only from way back, but way back I knew Omnibus very deeply, and I think — actually, I'm not even sure this makes sense, but I think Omnibus has some things... well, it's just Chef code, right, and it does things to try not to restart processes unnecessarily.
A: So it could be that there's something in the Omnibus cookbooks that manage Gitaly internally to Omnibus that decides not to restart the process after the upgrade, because it thinks it's the same — but it's —
A: — not the same, and we ought to restart it. There is of course a good reason for not restarting processes all the time, because people notice this, and —
A: — live reload. So, if we just change Omnibus so that it always restarts Gitaly — I don't know what the criterion is right now, but it might be too lax — and if we make it more strict, so it does more restarts, it should not impact users, and then the problem goes away, because the running binary is the deployed binary. Hopefully that's possible; I just haven't sat down and looked into whether that's actually what's going on.
A: — you don't care that the binary doesn't match, because these Go binaries are fairly big: Go puts enough information in there that they can generate a self-contained profile.
A: So once you download a profile that way, you don't need to know what's running or anything. But it's so much nicer when you can see in perf what's going on — that was really nice. And because I did my experiments on a test VM that doesn't get deployed all the time, I knew —
B: Yeah. perf has kind of two modes of operation with regard to profiling, and most of the time I use the timer-based frequency sampling, which I think is what most folks use it for. But you can also say: I want to instrument a particular function and capture every call to that function. So that's —
B: — more expensive, and on the occasions when you want to instrument a function that gets called often enough that you're worried about the overhead, there's support for sampling one out of every N occurrences.
A
What
you
did
on
that?
Well,
that
wasn't
the
function
call,
but
it
was
a
that
painful
thing.
You
also
did.
B
Exactly
exactly
so,
I
I
was
actually
in
in
the
act
of
trying
to
do
that
that
I
I
discovered
kind
of
a
gotcha
that
I
I
was.
I
was
really
surprised
at
the
the
interface
for
specifying
that
you
want
one
at
every
end.
B
Events
just
silently
didn't
work
for
the
first
two
attempts
I
made,
for
there
are
several
proof:
events
that
you
can
use
for,
trying
to
catch
paid,
page
faults
and
the
ones
I
started
with
were
k-probe
events
and
trace
points
and
for
both
of
those
it
silently
ignores
any
specification
about
only
get
one
out
of
every
end
events.
So.
B: Yes — and it was like, you know, a few hundred megabytes' worth of perf data, and no doubt it slowed the system down for the 30 seconds I was doing the profile. Which isn't a huge deal, but it's not what I wanted. So, because the tool didn't work the way I wanted, I did some digging into it. It turns out that only certain kinds — there are a few types of perf —
A: I tried reading about this stuff in the Brendan Gregg book about eBPF, and I just get lost in all the different kinds of probes.
A: He invented them — or he sort of championed them, anyway. So I also don't want to make the mistakes you just described; that's why I just stay away and stick with the CPU sampling — the frequency sampling, yeah.
B
Yep
yeah
and
that's
that's
sufficient
for
most
cases
so,
but
this
is,
I
think
this
is
the
first
time
that
I
needed
to
did
I
that
I
felt
like
I
needed
to
actually
instrument
a
high
frequency
function,
call
and-
and
I
did
work
out
a
way
to
do
it.
I
found
out
that
the
the
software
type
of
of
perf
events
does
support
this
and
the
k,
probe
and
trace
point
events
do
not,
at
least
as
of
our
current
version
of
perf
yeah.
There
was.
B
Yeah
and
that's
and
that's
that's
going
to
increase
over
time,
yeah.
B: Yeah — and talking about BPF: I feel like we're kind of in an era of just massively improved observability for systems, and I'm so happy to get to explore this. Like what we were just talking about a few minutes ago — I think probably the right approach there —
B
I
think
a
reasonable
approach
there
is
is
to
write
a
bpa
program
that,
in
that
that
instruments
entry
into
into
one
into
into
the
the
frame
before
doing
the
page
folds,
as
well
as
the
page
fault
events
and
and
only
emit,
and
only
emit
events,
whether
it's
a
counter
or
a
stack
trace
where,
where
the,
where
the
the
flag's
been
set
by
the
by
the
earlier
frame,
and
that
should
be
a
lot
cheaper
than
saying
for
every
one
of
the
high
frequency
events.
B
The
page
faults
examine
the
stack
right.
B: And BPF is a perfect tool for doing that — it's kind of made to do exactly that. There are kind of two interfaces for working with BPF: there's the BCC suite, the BPF compiler collection —
B
And
I
I
got
about
a
year
and
a
half
ago.
I
got
that
installed
on
all
of
our
all
of
our
servers,
but
the
the
other
interface
is
is
bpf.
A: — basically recreating DTrace, but on BPF, yeah.
B
Yeah
yeah
exactly
and
it's
it's
unfortunately
difficult
to
get
that
to
get
that
working
on
the
version
of
ubuntu
that
we
have
installed
on
most
of
our
fleet.
So
once
we
get
upgraded
past
a
certain
point
it'll
become
trivial
to
install
that.
But
I
really
really
really
wanted
that
to
be
on
all
of
our
boxes
too.
But
I
just
didn't
have
the
time
to
because
we
have
to
compile
it
from
source
and
it's.
A
Funny
we're
just
waiting
for
linux
to
catch
up
with
solaris
enough
to
get
some
of
their
goodies.
B
Many
many
years
ago
I
introduced
solaris
boxes
at
the
company.
I
was
working
for
at
the
time
for,
for
two
reasons,
one
one
was
d-trace.
B: Yes, yeah. Prior to that, the best thing we had going on the Linux side was SystemTap, and... I mean, it really tried, but it had some critical flaws. One of the things that BPF programs tend to do really well is avoid locking across CPUs, and —
B: — it was really easy to write a SystemTap instrumentation that had a shared global data structure, and that means the more CPUs you've got, the more cross-CPU locking you got — and it ends up introducing an enormous bottleneck in the very system you're trying to profile. BPF —
A: Yeah. So, just wait for the next Ubuntu.
B: Yep, sounds like it. I feel like more people will be comfortable using BPF to explore a system once we get bpftrace available, and I think that's really important for, like, career development, and just learning more about the systems that we operate. But for now, what we've got is the BPF BCC suite and perf.
A
So
yeah
something
I've
been
wondering
is
if
there's
a
same
way
to
make
these
just
these
very
basic
frequency,
cpu
flame,
graphs,
more
accessible,
because
in
a
way
I
asked
for
production
access
just
to
get
those
things.
And
yes,
I'm
getting
a
lot
of
mileage
out
of
them,
but
I'm
sure
other
people
would
too
and
we
can't
give
all
of
them.
Production
access.
B: You know — uploads of perf script output to object storage for the Gitaly nodes: we can totally do that. We just need to — I mean, yeah.
A: I think there's also just more of a social obstacle: the people who need this maybe don't know that it exists. So we —
B: Yeah, I agree.
B
Why
I
try
to
share
flame
graphs,
like
you
know
the
svg
files,
as
well
as
as
well
as,
like
you
know,
static
images
with
like
circled,
you
know
and
annotated
text
on
it
just
to
kind
of
make
it
more
accessible
and
kind
of
show
people
how
to
how
to
interpret
it.
I
think
it's
it's
so
empowering
and
I
really
really
wanted
it.
It
really.
A
Has
been
for
me
because
this
this
whole
black
object,
cache
project
started
because
there
was
a
flame
graph
of
an
incident
in
in
december,
and
I
got
to
download
that
I'd
be
like
wait.
What
what's
going
on
here
and
like
finally
yeah?
I
mean
they're
weird,
because
they
don't
always
tell
you
the
whole
story
or
it's
not
always
clear
what
conclusions
you
can
draw
from
a
flame
graph
yeah.
But
you
can
see
so
much
more
than
you
can
with
a
lot
of
more
basic
tools.
B
Totally
agree
and
there's
like
I
mean
most
of
the
time,
we're
talking
about
cpu
profiling,
but
of
course
it's
you
know,
flame
graph
is
just
a
format.
You
know
a
format
for
representing
yeah.
You
know
abstract
data
so
like
there
are
other
kinds
of
events
that
are
sometimes
applicable
to,
like
other
people,.
B
Yeah
yeah,
I
feel
like
I
feel,
like
I'm
gonna,
go
with
yes
on
that
I've
seen
one
or
two
places
where
people
used
it
for
things
other
than
revealing
stack
dominance,
but
I
feel
like
that
was
really
abusing
the
format,
although
it
was
also
hilarious
and
useful.
It's
just
not
what
you
expected
to
see.
So
I'm
not
I'm
not
casting
blame.
I
just
I.
I
thought
I
thought
it
was
wow
that
is
so
weird
and
creative,
and
I
don't
want
to
do
that
again.
B
But
no
yeah
yeah.
I
totally
agree
it's
it's!
It's!
It's
really
really
empowering
to
have
access
to
that
stuff,
and
I
want
more
folks
to
have
it.
I've
had
for
folks
that
do
have
production
access.
B
I
think
a
lot
of
those
folks
are
not
really
comfortable
with
it,
because
the
indications
can
be
kind
of
arcane
and
you
don't
really
know
you
know,
what's
what's
what's
safe
and
what's
not
in
terms
of
adding
overhead
so
kind
of
to
relax
the
that
that
sense
of
anxiety,
I've
got
a
handful
of,
I
think,
three,
three
or
so
shell
scripts
that
take
like
no
arguments
or
one
arguments.
A: Right — I sort of get that from the hostname as well, but does it end up in the flame graph if you do that?
A: I know — I noticed the metadata in the header, but I'm not sure when you... So, okay: the only way I consume these things is by making a flame graph, which I do offline, because it doesn't make sense to me to copy stackcollapse-perf.pl onto the server.
B
Or
we
we
actually
got
those
installed
in
in
the
default
path,
so
you
can
just
call
stack
collapse,
dash
proof,
dot,
pl
and
flame
graph
right
on
the.
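[Note: the standard FlameGraph pipeline being referred to, whether run on the server or offline on a downloaded perf script dump:

    perf script -i perf.data > out.stacks
    stackcollapse-perf.pl out.stacks | flamegraph.pl > flame.svg
]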
A: This thing — I mean, it has this option to filter out swapper, but I usually want to see it.
A: — then that's the output. So with those two things I can capture my profiles, and because they're shell scripts, I can't do anything wrong. It's kind of —
B
Yeah
exactly
yeah
exactly
yeah
yep,
and
that's
that's
exactly
the
kind
of
thing
that
I
I
figured
would
make
it
a
little
bit
easier
for
folks
that
do
have
production
access
to
to
you
know,
be
comfortable,
get
capturing
a
profile,
and
so
I
like
make
opinionated
decisions
about
about
the
capturing
frequency
and
the
direction.
B: — and I like having the header, because it doesn't impede the generation of the flame graph, and it gives you the exact invocation plus some bits of metadata — like what exact kernel was running, and the topology of memory if you ask for that.
C: The same Chef recipe that installs perf.
A
Right
but
part
of
the
problem
is
getting
the
perf
profile
of
the
server,
because
then
you
need
to
run
scp
or
something
to
get
the
file
off,
and
what
I
like
about
my
script
is
that
it
just
uses
well,
not
literally
scp,
but
the
data
ends
up
on
my
computer,
which
is
where
I
want
it,
and
not
in
my
home
directory
on
what
server
was
on
five
minutes
ago
like
so
for
for
me,
an
ideal
solution
would
be
something
that
people
run
locally
or
it
would
be
a
web
interface
where
you
just
click
and
it
downloads
to
your
download,
folder
sure
yeah.
A: But I haven't found — I haven't thought about it too much, but if there's some sort of way to get this right with a reasonable amount of effort, it would be very nice. But yeah.
B: Yeah — just like I've got a set of tutorials that I've got half-written, for kind of, you know, teaching people how to read flame graphs and how to use perf and BPF tools. It just takes time; all this stuff takes time, and there's so many —
A: Yeah — a lot of people need to know they're there. But that takes care of installing the scripts, because it's just a git pull.
A
Well,
okay,
it
we've
been
talking
for
more
than
an
hour.
A
Should
wrap
it
up
thanks
for
the
well
demo,
thank
you
for
the
demo.