Hi there. I wanted to share an interesting profiling and optimization exercise that I went through, mostly just for fun, but also to try and see whether it's possible to get any useful information out of this Google Cloud Profiler service that we recently connected.
So this is how the default page looks. This is data from production for KAS, and it shows us the CPU profile for all zones and for all versions. Let's pick the most CPU-hungry profiles — all of them are here. We had 17,000 profiles over the last seven days; that's a lot. So what do we see here? The width is what's interesting.
This is the 100 percent, the root of it, and then — it's called a flame graph, you probably all know that — we can see what takes time, CPU time in this case, and we can quickly filter through this by ignoring most of this stuff, because it's just very small and insignificant.
So NewScanner allocates half of that, and the method above it accounts for about half of that again. So we have three methods that look suspicious, and something here: SplitN, something in strings, genSplit. Okay, this is a bit interesting — let's dig in. But I also found that there's another view that makes it even easier to spot the suspicious things, and they're here as well — the same stuff, just in different stack traces, where the same method is called through a different chain, of course.
So I've done the work already, so I will just quickly show you what I've done here. Let's check out this commit. Basically, we check out this revision and look at this package, and this is copied from Gitaly. This code just parses the reference discovery API response, which Gitaly streams over gRPC.
We use it to learn whether a repository has changed or not on a particular branch. So we know that FetchRefs calls ParseReferenceDiscovery, so the expensive methods are FetchRefs itself and ParseReferenceDiscovery, which it calls. FetchRefs lives here: it makes a call to Gitaly — this gRPC method — and then it just consumes the responses, accumulating the data into a byte slice, and then it passes that as a reader to the parse method, which parses it, processing the data line by line.
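Roughly, the shape of that code is something like this — a minimal sketch, where the stream type, function names, and line format are all illustrative stand-ins, not the actual KAS/Gitaly API:

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// chunkStream mimics (hypothetically) a gRPC stream that yields the
// reference discovery response in chunks.
type chunkStream struct {
	chunks [][]byte
}

func (s *chunkStream) Recv() ([]byte, error) {
	if len(s.chunks) == 0 {
		return nil, io.EOF
	}
	c := s.chunks[0]
	s.chunks = s.chunks[1:]
	return c, nil
}

// fetchRefs accumulates the streamed chunks into one byte slice and
// hands it to the parser as a reader, as described above.
func fetchRefs(s *chunkStream) ([]string, error) {
	var buf []byte
	for {
		c, err := s.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		buf = append(buf, c...)
	}
	return parseRefs(bytes.NewReader(buf))
}

// parseRefs processes the accumulated data line by line.
func parseRefs(r io.Reader) ([]string, error) {
	var refs []string
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		refs = append(refs, sc.Text())
	}
	return refs, sc.Err()
}

func main() {
	refs, _ := fetchRefs(&chunkStream{chunks: [][]byte{[]byte("a\nb"), []byte("c\n")}})
	fmt.Println(len(refs)) // 2 lines: "a" and "bc"
}
```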
This is how you test and measure things in Go: a benchmark is a test that measures allocations and times the code. So here we just create an artificial input and create a reader from it. We tell the benchmarking harness that we are interested in recording allocations, not only in timing the code. Then we reset the timer so as not to measure the setup code above, and then this is the standard pattern where you loop N times over the code.
The output shows how long each invocation takes, and this is how many times it allocates memory per invocation — it's just standard Go stuff that lets you measure.
Then, if we add this, we tell the harness — so IntelliJ GoLand in this case builds the binary of the benchmark — and we tell it to call this benchmark.
A
Then
we
pass
this
parameter
to
that
program.
That
golan
builds
with
this
benchmark
and
we
tell
it
to
record
the
memory
profile
into
a
file,
and
then
you
can
see
the
full
thing.
What
golem
does
here?
It
calls
go
to
with
test
and
compile
output,
maybe
not
compile
this
output,
so
it
builds
this
binary.
A
Yes,
this
is
the
binary
name
like
this
long
thing
saying
that
it
should
be
verbose
and
some
other
parameters
so
to
run
this
benchmark
with
this
rejects
and
then
just
basically
excludes
all
tests
and
then
that's
what
we
added
profile
and
record
the
profile
and
blah
blah
blah
and
that's
what
it
prints
so
to
open
the
profile
we
we
do.
These are the most allocating methods, and this matches pretty much what Google Cloud Profiler tells us — well, of course, this is the only thing that we ran. If we list the parse function to see the annotated code, we see this function — ParseReferenceDiscovery, the function that we call in the benchmark — and we see where the memory is allocated. And that's a lot, because it does a lot of iterations — how many?
257,000 iterations, to properly measure it and exclude anything else that happened to be running on your computer — and yeah, that's a lot. Then this allocates, and that allocates, and Split allocates — which we also saw here in the profiler. SplitN, again, allocates, of course, because it creates new slices — new backing arrays for the slices — and this also allocates strings, again and again, the same thing. And here it's the same, but in the outer scope, and this is the same outer scope. Okay, so this is how it is.
Yeah, we just changed it: instead of converting the bytes to a string and calling strings.SplitN, we call bytes.SplitN, and everything else stays the same. Okay, let's run this benchmark again.
Let's first check that. The next thing I decided to check: I don't need any of that — I only need the references. I don't want to collect all the refs and allocate that slice; I just want to iterate over the parsed data and call a callback. So I changed the signature of the function to take just the reader and a callback that consumes the data. We no longer collect anything here — no longer collect references or capabilities. We don't need that.
This code is from Gitaly, as I said — Gitaly needs that, but we don't. So I just removed it and simplified the code, and also replaced SplitN with the Cut method, which was added in Go 1.18. But first, let's look at bytes.Split.
bytes.Split uses genSplit — oh, by the way, we saw genSplit here, but that was for strings; with byte slices it's basically the same — and it allocates a slice of byte slices. It's like a two-dimensional array — basically an array of arrays — and this is memory that we don't need to allocate if we use Cut, because Cut just slices the input into two pieces.
Cut looks for a separator and returns what was before the separator, what is after the separator, and whether it was found or not. So you don't need to allocate any memory — you just slice the input where the separator is. That removes a single allocation, but the method was used in multiple places. So it's a new function in Go 1.18, which is quite useful and uses less memory.
I also found that moving this into a package-level variable doesn't change anything, actually — this allocation is inlined by the compiler somehow. Magic. So we do two things here: a callback, plus Cut instead of SplitN, and we remove all the stuff that we don't need — the capabilities, the collecting of references — and I think that's it.
Right — is that right? Yeah, I think so. Yes, okay, let's run the benchmark again. And yeah, this time I used the reader here correctly — the first time, in the first commit, I forgot to change it. And this is iteration three. Okay.
Memory is allocated — sorry, allocated — here and here. So NewScanner, and the scanner can allocate memory, and maybe a few other things, but mainly these two. Okay, the next thing we do is go to the scanner.
Okay, I don't know where exactly, but first we can just see that.
What we've done here is stop wasting memory by pooling the buffers, and that is done like this, using sync.Pool. We reuse — it's a free list, basically: a list of free buffers, and if no buffer is available, this function is called and it creates a new one. And we have helpers — we already use this for the 32-kilobyte buffers that we use for all I/O, and this one we now use for parsing — and then we just use that pool here, and yeah.
So what we've done here — this diff — is a little bit of unsafe magic.
So let's look at the code, not the diff. A byte slice is actually this thing in memory: a pointer to the backing array, the length of the slice, and the capacity of the slice. The pointer is eight bytes on a 64-bit machine, and the other two are also machine-sized words, so 24 bytes in total — three times eight. And we need a string, and a string is the same thing, except it doesn't have a capacity; it just has a length — so a pointer and a length.
So why wouldn't we reinterpret the memory? We can pretend that what's in memory is actually a string and not a slice. In C and C++ that would be a reinterpret cast, basically. We can do that in Go by going through unsafe.Pointer: we take the pointer to the slice variable — which points to those three words, and which the compiler thinks is a pointer to a byte slice — and via unsafe.Pointer cast it to a pointer to the header.
Then we construct a string header — which is what a string basically is — with a pointer to the same backing array, and we use the length as the length, obviously. We don't need the capacity: the length of a string can't change, it's immutable, so a capacity doesn't make sense there — it's not needed. Then this is the string, basically, and we now need to turn it into a string value so that we can pass it to a method. We do that by taking a pointer to what is now a string — to that header — via unsafe.
So, okay — the profile doesn't have this method, because it doesn't allocate any memory. How do we know what allocates here, then? In this map? That, probably, is the closure.
So it may be — wait, no, we know, right?
Anyway, you can see more if you want to look at the code — there is a merge request for it linked. Thank you.