Description
Gareth Ellis is part of the Node.js Benchmarking Working Group: https://github.com/nodejs/benchmarking.
In this video, he provides an introduction to benchmarking: how to get started (depending on what you are looking to test), key challenges, approaches to benchmarking, benchmarking Node.js itself, and use cases.
Gareth's GitHub page is here: https://github.com/gareth-ellis.
Thank you to Opbeat for sponsoring the videos for Node.js Live Paris, and to IBM for sponsoring the Node.js Live Paris event.
So, an introduction to benchmarking. One of the most important things when you're benchmarking or performance testing your application, or your runtime, or whatever it is, is that you should change one thing, and one thing only. This might seem quite obvious, but the thing you should change, obviously, is whatever it is you're wanting to work out the performance of. So if you've gone and checked in a new copy of your application code and you're wanting to know whether it performs as well as, or better than, it was doing previously, you'll be keeping everything else
the same: you'll be running on the same version of Node, on the same machine, using the same set of npm modules, and the same versions of those modules as well, and just changing your application code. It can sometimes be quite tempting to go and change everything at once. Oh, you've just released version 2 of the application, and you think, actually, we'll also go and update the version
of Node that we're running on, we'll update to Node version 6, and we're going to put it on this brand new machine we've got, and we may as well go and pull in all the latest versions of all the modules. That makes it quite difficult to decide whether there is actually a performance regression or not: you could just be masking a performance regression with an increase in, for example, the speed of the machine that you're running on.
Something else that's worth mentioning is that performance testing and benchmarking is quite different to functional testing. In functional testing, typically you'll do a run and it will either pass or it will fail, and if it fails you can maybe rerun it and it might work the next time. With performance testing, it's not only whether the run completes or not; you're also going to be interested in whereabouts
the data sits in your range of acceptable scores. This brings us on to some of the key challenges when doing performance runs and benchmarking. The first key challenge is that there is fundamental run-to-run variance. If you go and run the same benchmark, on the same version, on the same machine, one run after the other, there's a very good chance that you'll get two different scores, and depending on your environment and a lot of other things, they could be quite far apart.
So the first thing you're going to need to do when you're benchmarking is to run your benchmark a good number of times, so you can get a good feel for the data, for the range of scores that you see. This is also something that can hold you back later on, if you go and run your benchmark and find out that there's a ten percent difference between the best score and the worst score: that spread is what you'll have to judge any regression against later.
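As a rough illustration of that point, here is a small sketch, not from the talk, that summarises a set of repeated scores so you can see the spread before you start comparing builds; the scores are invented examples.

```js
// Summarise repeated benchmark scores: mean, standard deviation, and the
// spread between best and worst as a percentage of the mean.
'use strict';

function summarize(scores) {
  const n = scores.length;
  const mean = scores.reduce((a, b) => a + b, 0) / n;
  const variance = scores.reduce((a, b) => a + Math.pow(b - mean, 2), 0) / n;
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  return { mean, stddev: Math.sqrt(variance), spreadPct: ((max - min) / mean) * 100 };
}

// Ten ops/sec scores from repeated runs of the same build (made-up numbers).
const scores = [10.1, 10.4, 9.8, 10.2, 10.0, 10.3, 9.9, 10.2, 10.1, 10.0];
const s = summarize(scores);
console.log('mean ' + s.mean.toFixed(2) +
            ', stddev ' + s.stddev.toFixed(2) +
            ', spread ' + s.spreadPct.toFixed(1) + '% of mean');
```

If the spread is already several percent of the mean, a regression smaller than that will be hard to call from a handful of runs.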
So, the machine that you're doing your testing on: you're going to want to try to make sure that it's in the same state each time you run your tests. In our team at IBM, one of the things that we do to try and get around this is we reboot our machine before we run each set of tests. This means that each time we ran our tests, the machine should hopefully be in the same state, so there won't be any leftover processes or anything else like that
affecting our scores. It also means it's very easy for us to get back into that same state: if we want to rerun a set of tests, we can just reboot the machine and run them again, and hopefully the machine should be in the same state. Something else that helps with getting a consistent environment is making sure your machine is isolated from outside interference. That could be making sure other people aren't going to be logging into your machine and running something at the same time, which could skew your scores.
Something else you may want to do would be to interleave your good build and whatever it is that you're wanting to test. By that I mean you could run one copy of your good build, followed by a copy of the one you're testing, and so on, and that should hopefully mean that any interference happening on the machine is alleviated. It's much better than comparing against the scores you got from your good build when you ran it two months ago: things on a machine will change, even though you think they may not.
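Here is a minimal sketch of that interleaving, assuming two Node builds at made-up paths and a benchmark.js that prints a single ops/sec number; none of these names come from the talk.

```js
// Interleave runs of a known-good build and the build under test, so that
// slow environmental drift on the machine affects both sets of scores.
'use strict';
const { execFileSync } = require('child_process');

const builds = {
  good: '/opt/node-good/bin/node', // known-good binary (assumed path)
  test: '/opt/node-test/bin/node', // build under test (assumed path)
};
const results = { good: [], test: [] };

for (let i = 0; i < 10; i++) {
  // good, test, good, test, ... rather than all of one build first.
  for (const name of ['good', 'test']) {
    const out = execFileSync(builds[name], ['benchmark.js'], { encoding: 'utf8' });
    results[name].push(parseFloat(out)); // benchmark.js prints ops/sec (assumed)
  }
}

for (const name of ['good', 'test']) {
  const mean = results[name].reduce((a, b) => a + b, 0) / results[name].length;
  console.log(name + ': mean ' + mean.toFixed(2) + ' ops/sec over ' +
              results[name].length + ' runs');
}
```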
The final key challenge is jumping to conclusions. It's very easy to do a single run of your good build and a single run of the build you're testing and say: they're the same, brilliant, job done, we don't need to bother doing all those reruns, because everything looks fine. You should still go back and make sure you collect a good number of measures, so you have confidence in the data. So, some different approaches that you may take towards benchmarking.
The first approach could be to do some micro-benchmarks. This could be useful if you're implementing a new API or a new function, or you're making some changes to an API or function, and you want to see whether your changes have changed how it performs. These are quite useful for comparing key characteristics.
Some things to be aware of, though: even if you go and improve, for example, the Buffer API so that it now runs 300 times faster, that may or may not actually translate into any real-world improvement, because creating a buffer might be a very, very small percentage of the time spent in a real-world application.
Another approach to benchmarking would be whole-system benchmarking. This could be where you pull various metrics out of your full application. An example of that might be the Acme Air benchmark, which is one of the benchmarks that we run in the community Benchmarking Working Group. This is a fictional airline: a user can create themselves an account, log in, book themselves on flights, check in, all sorts of stuff like that, and we
use JMeter to drive load against it, to measure how many requests a second Node.js can serve, along with some other metrics. This is good because it represents a more realistic, real-world approach. The disadvantage is that the more things you are doing within your benchmark, the more room you introduce for variance. So there's a good chance that you might get really consistent micro-benchmark results, but when you go and exercise lots of those same operations in a whole real-world system, you may find that the variance between your different runs increases quite a bit.
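The working group drives Acme Air with JMeter; purely as a toy illustration of the requests-per-second idea, here is a sequential Node sketch against an assumed local endpoint. A real whole-system run would use a proper load driver with concurrency.

```js
// Toy throughput check: fire sequential HTTP requests for a fixed period
// and report requests per second. Not a substitute for JMeter.
'use strict';
const http = require('http');

const url = 'http://localhost:3000/'; // assumed endpoint
const durationMs = 5000;
const start = Date.now();
let completed = 0;

(function fire() {
  http.get(url, (res) => {
    res.resume(); // drain the body so 'end' fires
    res.on('end', () => {
      completed++;
      if (Date.now() - start < durationMs) {
        fire();
      } else {
        console.log((completed / (durationMs / 1000)).toFixed(1) + ' req/sec');
      }
    });
  }).on('error', (err) => { throw err; });
})();
```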
So you've gone and collected a load of data; what are you going to do now? You think you've found a regression. The first thing to do is to make sure that you actually have found a regression: make sure that you've got a good range of data for your good build and for whatever it is that you're testing, and have a look at the variance compared to the percentage regression
you think you've found. As I mentioned before, if you've got a ten percent variance, a ten percent range of scores, for your benchmark, and you think you're seeing a two percent regression, that's going to make things quite difficult later on when we try to narrow it down, because we're going to need to make sure that this regression we think we've found is easy to reproduce, to give us the best chance of finding whatever it is that's caused it.
So, if we're sure that the regression exists, we then need to have a look at what it is that we actually changed. If it's our application code, one way in would be to have a look at what's changed between our good build of the application code and the one that we're testing. It could be that you've gone and upgraded your copy of Node.js to a later version, in which case we need to have a look at what's changed within Node.js.
It could be that you've just gone and moved your whole stack onto a new server, one that should be running a bit faster, but maybe it's not quite running faster, in which case we need to start looking at what V8 or other platform-specific things are doing. Perhaps it's doing something that we're not expecting.
We then need to be able to compare between the good and the bad cases. So, the tools that we can use: well, in fact, there are hundreds and hundreds of different tools we could use, and I'm going to talk about a few of them. Another option is we could just binary-chop our change sets. So if we're looking at Node.js, and we can see that on one version of Node we were good, and then on another we were bad, we can have a look at what it is
that's changed between those two different versions, to try and help us work out what might have caused the regression. Within Node.js, as I'm sure you're all aware, there are lots of different things that come into the Node.js project to provide us with our Node.js binary. There's the native JavaScript in the lib folder, so buffer, cluster, lots and lots of things in there. There's V8: if you upgraded your version of Node, it may well have pulled in a newer version of V8, and that could potentially have caused a regression.
There may have been a recent security fix that could also affect performance. It could be libuv; it could be a new version of npm that you've pulled in, or an npm module that you've pulled in. It could be lots of different things. So, some of the tools that we could use: we could look at some JavaScript profilers, and there we've got the V8 profiler, and we've also got something such as appmetrics, but there are lots of others available as well. We may also want to use a native system profiler.
What we do is we go and create quite a large array with 60 numbers in it. We've then got our for loop at the bottom there, which is going to go and create lots of new buffers from this array. We then go and run this through a test harness, which will keep repeating until we either get a good consistency of data, or until we've reached our maximum number of runs. When we ran this on Node 4.3.2, we were seeing around ten-point-something operations a second.
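The slide with the code isn't captured in the transcript, so this is a reconstruction of the shape of that micro-benchmark: a 60-element array, a loop creating buffers from it, and repeated timed runs. The harness details are assumptions.

```js
// Micro-benchmark sketch: repeatedly create buffers from a 60-number array
// and report operations per second for each timed run.
'use strict';

const values = [];
for (let i = 0; i < 60; i++) values.push(i % 256);

function timedRun(iterations) {
  const start = process.hrtime();
  for (let i = 0; i < iterations; i++) {
    Buffer.from(values); // on the Node 4.x of the talk this was `new Buffer(values)`
  }
  const [sec, nsec] = process.hrtime(start);
  return iterations / (sec + nsec / 1e9); // ops/sec
}

// The talk's harness repeats until the scores are consistent or a maximum
// number of runs is reached; here we simply do a fixed ten runs.
const scores = [];
for (let r = 0; r < 10; r++) scores.push(timedRun(1e5));
console.log(scores.map((s) => s.toFixed(0)).join(' '));
```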
So these are some of the tools that we could use to try and narrow down what it is that's caused this regression. The first one is the V8 profiler. This comes as part of V8, and you can expose it through Node by adding --prof to your command line, for example: node --prof app.js. When you run that, it will produce a file just as we've got there, named isolate, then a hex number, then v8.log, so you can run your benchmark or your application normally, and then go and post-process
this later. On more recent versions of Node 4 and Node 5, you can use --prof-process, and that will go and post-process this log and give you something in a format that's a little bit more readable. There are also some helper modules, for example v8-profiler, which will let you programmatically enable and disable the V8 profiler.
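As a sketch of that programmatic use, assuming the v8-profiler npm module's documented start, stop, and export calls (the talk doesn't show this code):

```js
// Start and stop the V8 CPU profiler around the code under investigation,
// then write a .cpuprofile file that Chrome DevTools can load.
'use strict';
const fs = require('fs');
const profiler = require('v8-profiler'); // npm module mentioned in the talk

function doWorkUnderTest() {
  // Placeholder for whatever you actually want profiled.
  for (let i = 0; i < 1e6; i++) Buffer.from([1, 2, 3]);
}

profiler.startProfiling('buffer-test');
doWorkUnderTest();
const profile = profiler.stopProfiling('buffer-test');

profile.export((err, result) => {
  if (err) throw err;
  fs.writeFileSync('buffer-test.cpuprofile', result);
  profile.delete(); // free the profile held by V8
});
```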
So when I went and ran this on our good build, Node 4.3.2, and on the one that we were testing, Node 4.4.0, we saw quite an increase in the compilation of the from function. Originally we'd seen just over twenty-three percent of the time spent in that, and then when we ran 4.4.0 this went up to forty-seven percent, so it's a fair increase. You can go and get a similar set of data out of perf, which is a whole-system
profiler, and the output of this would also show you time spent in native system modules and things like that. But again, the big difference between the good and the bad profile was the time spent in this lazy compile, and that suggests it could be something that V8 is doing differently. So we can then go and turn on some extra trace options in V8.
Again, we can pass these straight into our node command line: you can use --trace-opt and --trace-deopt, which are going to look at optimizations and deoptimizations. In our good case, we saw that V8 noticed that the from function was quite hot, so it went and compiled it and optimized it. Then in our bad case we saw that, just after doing all that work of compiling and optimizing, it went and dropped the optimized code and deoptimized it. So that's a bit funny.
Another way that we could try and find the difference between these, as I said before, is to just binary-chop the change sets, which is all right in this case, because we were looking at two adjacent versions of Node. But if you were upgrading from, say, Node 0.10 to Node 6, you'd have a few more change sets to look through, and it might take a bit more time.
You can go and get all sorts of CPU information, GC, memory and profiling information out of V8. You can either ask for the information you want programmatically, like this, turning on CPU information and printing it out to the console, or you can connect IBM Health Center to it over the network and connect directly, and that also allows you to enable and disable the method profiling.
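The code on that slide isn't in the transcript either; here is a minimal sketch of the programmatic style using the appmetrics module mentioned earlier. Event and field names follow appmetrics' documented API, but treat the details as an assumption.

```js
// Subscribe to CPU and GC events from appmetrics and print them.
'use strict';
const appmetrics = require('appmetrics');
const monitoring = appmetrics.monitor();

monitoring.on('cpu', (cpu) => {
  // process/system are fractions of total CPU in use.
  console.log('cpu: process=' + cpu.process + ' system=' + cpu.system);
});

monitoring.on('gc', (gc) => {
  console.log('gc: type=' + gc.type + ' duration=' + gc.duration + 'ms' +
              ' used=' + gc.used);
});
```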
The fix was just a change to the scope of a variable, and when we made that change, we saw the performance go back up to where it was in the previous version of Node. So I'm just going to talk now, very briefly, about the Node.js Benchmarking Working Group.
The workgroup has been going for quite a while. We've got a mandate to track and evangelize performance gains between Node releases, and we've got key goals of defining use cases for Node, identifying benchmarks that represent these use cases, and then running them, capturing results and reporting them to the community.
We've currently got about 13 members and we have meetings every month or so; we had a meeting on Tuesday, and our next one's likely to be in early May. You can have a look at what's going on on the GitHub page, nodejs/benchmarking, and you can also see the various graphs for the benchmarks we're running there, at benchmarking.nodejs.org.
So these are a few of the use cases that we've discussed in the community and then approved at the meeting on Tuesday. These aren't necessarily set in stone,
so if you're looking at this list and saying, actually, my use case isn't there, then come and let us know, and we can see where we can go from there. Some of the use cases we've got are things such as back-end API services: REST and REST-like APIs, typically running over the internet or over public networks, where we're going to want to make sure that Node can perform, and doesn't regress, in those sorts of situations. Then service-oriented architectures, which may typically be private
networks, and may be cases where different types of networking protocols are used, possibly things such as UDP, so we're going to want to make sure that Node is as successful as possible at transmitting this information. Generating and serving dynamic web page content: we've got modules such as Express, hapi, koa, React and so on. All of these modules are very popular within the Node ecosystem, and we want to make sure that things we change within Node don't go and regress
the use of modules such as these. Single-page applications, communicating back to the backend over WebSockets and HTTP/2. Agents and data collectors that may be distributed through networks, where we're going to want to be able to update those automatically rather than having to go and redeploy Node. Node is also used quite a lot in small scripts, being able to script stuff to run quickly, so in those sorts of cases we'd want it to be using very low amounts of CPU, low amounts of memory,
quick startup, things such as that. So across all of these different use cases, we're going to be looking at this list of metrics: we want consistent low latency in our communication and the ability to support high concurrency; we're going to be wanting to look at throughput; we want a fast startup, a fast shutdown, and therefore also a fast restart; and also low resource use, so memory and CPU.
If you want to have a look at these use cases in a bit more detail, the information is, again, on the Node.js benchmarking workgroup repo. So, the benchmarks that we've been running so far: at the moment, we've been looking at some quite basic startup tests, where we have a look at how quickly Node can start up when it's not going to be doing very much, and we've been looking at footprint, or how much memory, the resident set size, Node uses.
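A rough sketch of that kind of startup test, assuming nothing beyond Node's own child_process module: spawn node with an empty script and time how long the process takes to come and go.

```js
// Time `node -e ''` repeatedly to get a feel for bare startup cost.
'use strict';
const { spawnSync } = require('child_process');

const times = [];
for (let i = 0; i < 20; i++) {
  const start = process.hrtime();
  spawnSync(process.execPath, ['-e', '']); // start node, run nothing, exit
  const [sec, nsec] = process.hrtime(start);
  times.push(sec * 1000 + nsec / 1e6); // milliseconds
}
times.sort((a, b) => a - b);
console.log('startup: min ' + times[0].toFixed(1) + 'ms, median ' +
            times[times.length >> 1].toFixed(1) + 'ms');
```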
We've been running this since February, once a day: we go and run the latest checkout from GitHub of 0.12, 4 and master, and then once we've cut Node 6, we'll be adding that to the chart as well. Obviously, at the moment it's being tracked as master, and it's quite a good sign: we can see there at the top we've got the master branch performing the best, followed by Node 4, followed by Node 0.12. So it's the right sort of pattern.
So how can you get involved? Go and have a look at our GitHub repo, nodejs/benchmarking. Have a look at what it is we're running, have a look at our use cases, and have a think about how you're using Node. If you're saying, well, actually, how I use Node, or the sort of things that I run, aren't there, then open an issue and let us know. We want to try and get as many benchmarks running as possible that cover all the uses of Node.js.