Description
Node.js Community Benchmarking Efforts - Gareth Ellis, IBM
Benchmarks and the information they provide are important to ensure that changes going into Node.js don’t regress key attributes like startup speed, memory footprint and throughput. Come and hear about some of the fundamentals of benchmarking, how to go about narrowing down the cause of a regression between versions of node along with the efforts underway in the community benchmarking workgroup (https://github.com/nodejs/benchmarking) to run/capture/report and act on benchmark information.
Hello, my name is Gareth Ellis, and I'm going to be talking to you about the Node community benchmarking efforts. First, a little bit of information about me: I've been working as a runtime performance analyst at IBM since 2012. Originally I was looking at the performance of the version of Java that we produce for our stack products, but for the past 18 months I have been looking at the performance of Node.js. I'm also a member of the benchmarking workgroup.
Okay, can everyone still hear me? So, an introduction to benchmarking. The first thing to mention is that benchmarking, or performance testing, is quite different to, say, functional or system testing. In functional testing there are typically only two answers: either it worked or it didn't.
In performance testing, the number of results you could get is pretty much infinite: if you're measuring startup time, it could be anything from zero milliseconds up to hundreds or thousands of milliseconds. So one of the important things to try to do is to change one thing, and one thing only, between runs, and the thing that you change is typically going to be whatever it is you want to test.
So if you want to measure the performance of the latest version of your application code, you'd try to keep everything else involved the same: you'd run both on the same version of Node.js, ideally on the same machines, using the same versions of whatever npm modules the project requires. That means that once you've done your tests and you've got a result, you can be fairly confident that the improvement or regression has come from your application code, whatever it is you've changed.
Sometimes it can be quite tempting, when you issue a new version of your application code, to also pull in the latest versions of your modules, upgrade the version of Node that you're running on, and maybe also put it on the latest hardware that you've got available. But if you do that, it's going to be very difficult to work out where an improvement or a regression has come from.
So, some of the key challenges that you may face in performance testing. The first one is that there is going to be variability in the scores that you get: if you run the same test, say, ten times, there's a very good chance that you'll end up with ten different answers. One of the things that you need to get used to is that you can't simply do a single run and expect to be able to say how good or bad a change was.
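As a rough sketch of coping with that variability, you can summarize repeated runs with a mean and standard deviation instead of trusting any single score. The run times below are made-up numbers for illustration only:

```javascript
// Summarize repeated benchmark runs rather than trusting one score.
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Sample standard deviation: a feel for how spread out the runs are.
function stddev(xs) {
  const m = mean(xs);
  return Math.sqrt(xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1));
}

// Ten illustrative startup times in milliseconds (made-up numbers).
const runs = [102, 98, 105, 99, 101, 97, 103, 100, 104, 96];
console.log(`mean=${mean(runs).toFixed(1)}ms stddev=${stddev(runs).toFixed(1)}ms`);
```

A change smaller than the measured spread is not evidence of an improvement or a regression.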
Another thing to look out for is false positives. Say you've just checked in a version of your code and you think it's probably going to give you about a ten percent improvement. If you do one run and it comes out about ten percent better, you could be tempted to just stop there and deploy your code, even though a single run proves very little.
Another key challenge is being able to keep a consistent environment. One of the things that we do to make sure that we're always running with the machine in the same state is to reboot the machine before we start a round of benchmark testing. Whilst this can sometimes add a little bit of variability to the scores, it means that if we need to recreate the environment, it's very easy for us to do so, just by rebooting the machine. Compare that with a machine that's been running for months and months with thousands of hours of testing done on it: it's going to be very difficult to get it back into that same state if, for example, a security update comes along that means you need to reboot the machine, or you have a power cut, something along those lines.
Another part of keeping a consistent environment is being able to isolate the machine. This includes making sure that people aren't going to be logging in and using the machine whilst you're doing your testing, because otherwise you could actually be measuring the fact that they've been using some of the CPU, memory or other resources. Also, if your test is going to be using a network, having a private or dedicated network available for your test is a good approach.
Otherwise, in that sort of situation, a private, dedicated network where you've effectively got infinite bandwidth may not be the most representative way to go. Something else that you can do to try to reduce variability, and to make your runs a little bit more consistent, is interleaving the measurements that you take. Say you've decided that, because of the range of scores you get, you're going to need to run your test ten times.
What we may do is run our baseline, our known-good version, once, then run the version of the code that we want to test, then the good version again, then the one under test, and so on, switching back and forth between them. This means that if there is any sort of natural variation in the performance of the machine that you're running on, it should hopefully be accounted for by the fact that you keep alternating one with the other.
The third key challenge to get over is jumping to conclusions. It's very easy to see the first run of your baseline and then of your build, see an apparent improvement, and decide: actually, no, I don't need to bother doing loads of reruns, I'm happy with that. You need to make sure that you've continued with the number of iterations that you've decided is necessary. Sometimes the data can be very misleading, and it's always important to try to get a good set of consistent, reliable data.
So, there are two different approaches that you could take to performance testing. The first would be running micro benchmarks. These are quite useful for measuring a specific function or API; for example, you could be measuring how long creating a new buffer takes. They're quite good for comparing key characteristics of either your application or the runtime. There are some disadvantages as well, however, one of them being that micro benchmarks may not always represent a real-world improvement.
So if you take a micro benchmark and check in some changes to try to improve it, that may not actually translate into a real-world improvement in the final product. The second disadvantage of micro benchmarks is that you sometimes risk not actually measuring what you think you're measuring. The way quite a lot of micro benchmarks work is by repeating the same action many times, and if the optimizing JIT in your runtime is able to spot that the repeated work has no observable effect, it can optimize the work away, leaving you measuring nothing at all.
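A sketch of that pitfall: in the first loop below the result is thrown away, so an optimizing JIT is free to eliminate the work entirely; accumulating into a visible sink, as the second loop does, keeps the measurement honest. The loop body is an arbitrary example, not from the talk:

```javascript
// Work whose result is discarded: eligible for dead-code elimination,
// so a timer around this loop may end up timing nothing at all.
function deadLoop() {
  for (let i = 0; i < 1e6; i += 1) {
    Math.sqrt(i); // result unused
  }
}

// Accumulating into a visible "sink" makes the work observable,
// so the JIT cannot simply remove it.
let sink = 0;
function liveLoop() {
  for (let i = 0; i < 1e6; i += 1) {
    sink += Math.sqrt(i);
  }
}

deadLoop();
liveLoop();
console.log(sink); // consume the sink so it cannot be optimized away either
```

Whether a given engine actually eliminates the dead loop is version-dependent, which is exactly why the risk is hard to spot.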
The second approach would be a whole-system benchmark. This is where you try to test something that represents a solution that might be deployed in production. One of the benchmarks that we use in the benchmarking workgroup is Acme Air, which simulates a fictional airline company; we measure the throughput, in requests per second, of users logging in, booking flights, checking in, logging out, things like that.
One of the disadvantages of this type of benchmark, however, is that typically the more things your system is doing, the more room there is to introduce variance into the scores that you get out. So whilst you may get more reliable data from a micro benchmark, you could argue that you get more useful, real-world data from a whole-system benchmark.
So, you've followed all of this and you think you've found a regression. What should you do now? The first thing is to check that the environment and the data that you've collected are correct. Otherwise you could find yourself investing many hours trying to track down the problem when actually it was just variable data that you'd collected. Make sure you've had a look at the data and that the difference is outside the expected variance that you've measured.
If we're running on Node.js, there are a number of different possible sources that a regression may come from. It could be from some of the native or JavaScript libraries. It could be that the new version of Node.js you're testing has picked up a V8 upgrade, so the issue could potentially be a result of that V8 upgrade.
It could be that there's been a security fix in OpenSSL, and that's where the regression has come in. It could be a libuv update. It could even be that you've downloaded the latest versions of your modules and they've caused the regression; or, if you've updated your version of V8, your npm modules may have had to be recompiled, and some of the issues may have come in that way.
So, a quick example. This is a micro benchmark that we've been running at IBM, and it simply measures how long it takes to create a new instance of a buffer from an array of numbers. We have a function which repeats this 300,000 times, and we push that through a test harness, which runs it until either we've reached the maximum number of attempts or we've got some reliable data.
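A minimal sketch of that sort of micro benchmark, without the harness, shown with today's Buffer.from API: it times many Buffer creations from an array of numbers and reports a rate. The iteration count and source array here are assumptions for illustration, not the workgroup's actual test:

```javascript
// Time repeated Buffer creation from an array of numbers.
const ITERATIONS = 300000;
const source = [1, 2, 3, 4, 5, 6, 7, 8];

function run() {
  const start = process.hrtime.bigint();
  for (let i = 0; i < ITERATIONS; i += 1) {
    Buffer.from(source); // create a new Buffer instance from the array
  }
  const elapsedSec = Number(process.hrtime.bigint() - start) / 1e9;
  return ITERATIONS / elapsedSec; // individual creations per second
}

console.log(`${run().toFixed(0)} Buffer.from() calls/sec`);
```

A real harness would repeat this whole function and apply the variability rules from earlier: many iterations, interleaved with the baseline, summarized rather than taken from one run.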
When we were running this a few months ago, between Node 4.3.2 and Node 4.4, we noticed quite a sizable regression between the two versions. In Node 4.3.2 we were getting about ten operations a second, 10.6 operations a second; when we went up to Node 4.4, we were only getting just over six operations a second. So it's a fairly sizable regression.
So, as I mentioned before, appmetrics could be one option. It's something that you can install through npm, and you can programmatically get information out about your CPU usage, GC, memory, V8 profiling and loads of other things. You can either write some code to output that to a file, or you can connect it over the network to IBM Health Center to get live monitoring.
Then there's the V8 profiler. This comes as part of the node binary in more recent versions, and you can turn it on just by adding --prof to your command line. This will generate a file called isolate-<hex number>-v8.log, which you can then post-process by running node --prof-process on the isolate file that's been created. That will give you a breakdown of where the time is spent in the various JavaScript methods and the other things that V8 is doing.
There are also some helper modules that you can use to automate this if you prefer. When I ran this on our two versions of node, I collected my post-processed data and then diffed the top few lines. In Node 4.3.2 we can see 23% spent in the lazy compile of fromObject in buffer, and in Node 4.4 that had gone up to forty-seven percent. So that's perhaps somewhere to look.
For completeness, I then also ran it through perf, which is a system profiler that you can get on Linux-based systems. Again, there's a wide range of different options you can pass into perf; the ones I've got on the screen here I found worked quite well. You can then run perf report to do the post-processing, and when we compare the two versions we can again see an increase in the time spent, from 23% up to forty-six percent.
If we weren't able to spot anything by doing that, the next thing we could have a go at would be a binary chop. We could either do it manually, getting a known good and a known bad result and the list of change sets between the good and the bad version, and then going through and manually rebuilding them; or we can use git bisect and put together a small script that runs our benchmark and then decides whether it passed or failed.
A
If
you
wish
in
that
pull
request
5819
so
now,
I'm
going
to
briefly
talk
about
the
work
that
we've
been
doing
in
the
community
benchmark
work
group,
some
of
the
things
that
we're
up
to
and
how
you
could
also
get
involved.
The workgroup has a mandate to track and evangelize performance gains between different node releases. Some of our goals are to define use cases, identify benchmarks that we can run, and then run them and capture the results.
We've got 12 members at the moment, and we have meetings every month or so, with the next meeting likely to happen probably next week. You can get some more information by looking on GitHub at nodejs/benchmarking, and you can also look at the graphs and charts from the results of our runs at benchmarking.nodejs.org. This is a list of the people that we've currently got involved.
Michael Dawson from IBM is the facilitator, along with some people from various other communities and also freelancers. Some of the benchmarks that we're currently running: we've got some that measure startup time, which is looking at how long it takes for node to actually start and get going, and we've got some that measure the footprint, or how much memory node actually uses.
There's also one that measures the amount of time it takes to require a module; with most projects using a large number of npm modules, both direct requires and their child requires, that's a very important metric. Then there's Acme Air, as I mentioned before, where we track the throughput, or the number of operations a second; we track the response time; and we also take footprint measurements at different stages of the run, to see how node performs not only when it's idle but also when it's doing a lot of work.
We also have a Dockerfile available for comparing two different versions of node. If you want to have a look at how two different versions of node compare, you can run them through the Dockerfile and it will produce some output at the end telling you whether the new version is better or worse. We're also in the process of putting together the facility to test a particular pull request against some of the benchmarks that come as part of the node source; that work is taking place in 8157.
On benchmarking.nodejs.org we have a number of charts that look a bit like this. This is the one where we're tracking throughput on Acme Air. You can hopefully see there's a bit of a jump around April time, which was when we took a new version of V8, and you'll also notice that more recently, mid-August, we've got a blue line at the bottom.
The second one would be service-oriented architectures: this is typically where APIs are provided that may call into a large number of other APIs to produce one result. Then there are micro-service-based applications: these are typically very nimble, low-resource, quick-to-start applications, where there may also be a number of different micro services running on the same system, so a small footprint there would be better. And there's generating and serving dynamic web page content.
Some of the key attributes: these are the different metrics that we're looking to track, and we've obviously still got some gaps here that we need to fill in. We've got two memory footprint measures, one for how much node uses once it's started, before it's done any work, and one taken after load. There's also node's CPU usage at idle.
Ideally, node shouldn't really be using any CPU at all when idle. There's throughput, which we're tracking through Acme Air in operations per second, and then how large the node package is when you download it, and, once you've installed it, how much space it's going to use on disk. We also want to start collecting some metrics on GC: not only the CPU usage impact on the node process, but also how quickly it's able to allocate memory, tracking the max pause times when under load.
And finally, if you're wanting to get involved, you can take a look at the GitHub repo, nodejs/benchmarking. Have a look at what's there, have a look around at the charts, and if you think something's missing, or if you think something's wrong, then by all means open an issue. We've got an issue open at the moment organizing the next meeting, so if you wish, you can either join the meeting and contribute, or even just listen via the YouTube On Air recording. And that is it. Thank you very much.