From YouTube: Rust Linz, February 2021 - Will Hawkins - Comparing performance of range- and counter-based loops
Description
In this short session, I will share the story of investigating the difference in performance between a `for v in expression` loop and a `while` loop. The talk will cover topics in the Rust language itself, compiler optimization levels and how to benchmark performance.
Code: https://github.com/hawkinsw/speed_limits_of_loops_in_rust
Will on Twitter: https://twitter.com/hawkinsw
Rust Linz on Twitter: https://twitter.com/rustlinz
Submit your talk: https://sessionize.com/rust-linz
On the web: https://rust-linz.at
My name is Will Hawkins, and at any point feel free to reach out to me over email (hawkinsw at gmail.com) if you have any questions, or you can find me on Twitter (@hawkinsw).
They look very different from the ones in the United States, so I hope I got it right (oh good, good, I'm glad it is). So we'll talk a little bit about speed limits and how fast we can go with looping.
Then we'll talk a little bit about writing canonical loops in Rust, or what the Rust book would recommend that you write. Then we'll talk about writing fast loops in Rust, which suggests that there might be a problem with canonical loops. We'll draw some tentative conclusions; we'll talk about release mode, which is something you always have to be aware of; we'll address any persistent speed differences; and we'll draw some final conclusions. I hope we'll be able to address any questions as we go.

So here we go: the motivation for this talk.
I started by trying to benchmark a simple function call. I was trying to answer the question of whether it is faster or slower to call a function with move semantics than with sharing semantics: passing a reference, versus passing something movable and just giving up control of it to the function.
I don't want to presume that everyone understands the difference, and I'm happy to explain it, but it's incidental to what we're talking about here, so you don't have to understand it. The point is that my motivation was to time a certain sequence of operations.
So we're going to do the repetition here, x times, in order to get an average, because if you do it only once or twice, you never know what system-level artifacts might appear and skew the measurement. I hope that makes sense to everybody.
So, just in pseudocode, what that might look like is: we have a function main, which is where we're going to start our program; we're going to name a variable `before`; and we're going to get the system time using `now`.
All right, great. So, just for those of you following along at home: oh, I like Not-Matthias's answer, that's really, really good, I like that, but let's go with Rainer's and Abik's answers (I hope I'm pronouncing that correctly), and we'll talk about iterators. With an iterator, what I would do is write a canonical loop with a `for`: `for` something `in` some iterable range.
So let's look at our first code example and turn our pseudocode into some real code. I hope that's big enough for everyone to see; if it's not, prior to the presentation I sent out a link to the GitHub repository so that you can follow along, and this is the benchmark-iter.rs file.
What you can see on line seven is that we've got this function main, just like in our pseudocode. We're going to define x here to be some relatively large number of iterations, and we're going to take the system time before the operation. Lines 12 through 14 are where we get down to business, and we can see the recommendation that Rainer and Abik made: this is the kind of loop, the canonical kind of loop, that Rust recommends.
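The transcript doesn't reproduce benchmark-iter.rs itself, so here is a sketch of the shape being described (the function name, the iteration count, and the empty body are illustrative, not the repository's exact code):

```rust
use std::time::{Duration, SystemTime};

// Sketch of the iterator-based benchmark: take the time, run the
// canonical `for _ in 0..x` loop, and measure how long it took.
fn time_iter_loop(x: u64) -> Duration {
    let before = SystemTime::now();
    for _ in 0..x {
        // the operation being benchmarked would go here
    }
    before.elapsed().expect("system clock went backwards")
}

fn main() {
    // "some relatively large number of iterations"
    let x: u64 = 10_000_000;
    println!("iterator loop took {:?}", time_iter_loop(x));
}
```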
We'll get into that in a little bit, but the question again (Rainer repeated it, but I'll repeat it once more): does blanking the iterator variable, writing `_` instead of a name, help speed up the loop, or does it only help with linting? That's not a question I addressed directly, but we'll actually see what's going on as we go through, and my guess is that it really only helps with linting.
So, a very, very good question. Oh, I see a comment here from Not-Matthias that talks about how LLVM can optimize the loop, and we've also got some good comments coming in from StableMinor. I don't want them to ruin the surprise, so if you're interested in knowing the answer before we get there, you can read the comments in the Discord channel; otherwise, follow along with me.
So that seems like an awfully long time to take for just doing nothing over and over again. That was slower than it needed to be, right? Probably way slower.
So let's take a look at what's actually going on under the hood. One of the ways you can do that is with objdump. objdump takes the binary that the Rust compiler, rustc, generates and lets you look at the machine code that actually gets executed on your system. I find this very, very interesting, and I love looking at object code. It's not something that everyone likes to do, and it's also relatively intimidating for people, but it doesn't need to be; it's usually very straightforward. So what I'm going to do is use the object code files, generated by the compiler for these examples, that I've annotated, just to give us a sense of what's going on. So let's take a look at those.
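If you want to reproduce that inspection yourself, the disassembly can be produced like this (the binary name is assumed here; substitute whatever cargo actually built for you):

```shell
# Disassemble the debug build: -d dumps the machine code of each
# function, and -S interleaves source lines when debug info is present.
objdump -d -S target/debug/benchmark-iter | less
```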
So what we're looking at here (let me know if this is big enough) is the machine code that the compiler generated for this loop, for this main function.
I've highlighted some really interesting parts of the code. The first thing we're going to do is create some space for the local variables; we're going to initialize that x value; we're going to call SystemTime::now and store the result in `before`; and then we're going to make the `0..x` iterator, the big iterator.
Now, what's very interesting about this: you can all go back and look at it on your own at any time, with the comments, and feel free to do so.
So what I'm going to do is rewrite this benchmarking function with a simple while loop, which is not the recommended way to write loops in Rust, but it's definitely a way to do it. I've already got that file written here; it's called benchmark-while.rs. What you'll see is that the setup and the operations before the loop are just the same as they were before: we're doing the same number of iterations, and we're taking the before time. The only difference is that we have to take care of making sure that we execute exactly the right number of times, so we create our own counter variable. It's mutable, so we can change it, and it starts at zero.

We execute the body of this loop while that value is less than our upper limit, we add one every time, and we call the time function again after the loop. Everything after the loop is the same as it was before, no difference at all. Interesting.
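Again as a sketch, not the repository's exact benchmark-while.rs, the while version being described looks roughly like this (names and iteration count are illustrative):

```rust
use std::time::{Duration, SystemTime};

// Sketch of the while-based benchmark: the setup is identical to the
// iterator version; only the loop construct changes, and we now manage
// the counter ourselves.
fn time_while_loop(x: u64) -> Duration {
    let before = SystemTime::now();
    let mut i: u64 = 0; // our own counter: mutable, starts at zero
    while i < x {
        // the operation being benchmarked would go here
        i += 1; // add one every time
    }
    before.elapsed().expect("system clock went backwards")
}

fn main() {
    let x: u64 = 10_000_000;
    println!("while loop took {:?}", time_while_loop(x));
}
```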
Okay. So now that we understand that, let's go back and see how long this actually takes to run. Again, I'm going to make sure that cargo knows I'm executing the right file here.
Wow, that is significantly faster than the one before. This time it only took two seconds to do all those iterations, instead of the 22 seconds it took before. That is a significant difference. So what comes to mind is that there's something not right here, something not right. So again, let's look at the code that the compiler is generating and see what's going on. Again, I've annotated that code and placed it in this repository that you can all see.
What you'll notice here is that at the top of the loop there are no more function calls; there is no function call here to get the next value of the loop. That's because everything is taken care of without calling a function: the incremented iterator, the next value, is calculated with straight machine operations.
All right, good, so this is not a fact. Tell me... there you go. All right, now we're getting some good answers. I like this: Peter and StableMinor and Felix, I believe, have already sort of thought this through. What am I missing here? What am I missing about what I've just shown you that might explain this behavior?
Exactly. All right, so: a simple, simple optimization here. With the cargo build operations we were doing before, what you'll see, basically, is the compiler telling us that this is an unoptimized build. It's unoptimized, and it also includes debugging information, which is helpful for finding problems in Rust code, but it's not useful for trying to get the most speed out of the compiler.
Now, if I execute cargo build again, I'm going to get the debug version one more time. What I need to do to get cargo to build an optimized version is to build in release mode. In order to do that, I just pass the --release flag and away I go. What you'll see is that the "Finished" line here indicates that we built a release version, and it is indeed optimized. What's really cool is that it didn't take any longer to build than the unoptimized version.
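The two builds being compared look like this on the command line (the exact "Finished" wording varies a little between cargo versions):

```shell
cargo build            # debug: "Finished dev [unoptimized + debuginfo]"
cargo build --release  # optimized: "Finished release [optimized]"
```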
However, if we try to debug an optimized version of the code, it's going to be a little bit more difficult. Now, we can't just do cargo run to execute the optimized version; we have to actually call the binary, the program that was generated, directly. So we'll go into target/release and run it.
So this lets us execute the optimized version of the code that we compiled. Oh boy, zero time between; that's really fast. So that was our big problem: we weren't optimizing the loop before, and once the compiler had its way with our loop, the time between went to zero. So now let's also compare that to how the optimized version of the while loop runs. We'll do cargo build... cargo clean, I'm sorry, to get rid of the old version.
I can't read quickly enough in the chat, but someone gave me a really cool answer to a question I've wanted answered for quite some time, and I'm going to tell you in just a second what it is. We'll do cargo clean again just to make sure, then cargo build --release. And now, what I never knew was this: you can do cargo run --release, and you get the exact same thing as running the binary, running the program, directly. Very cool. So there you go.
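In other words (binary name assumed for illustration), these two invocations are equivalent:

```shell
cargo run --release            # builds if needed, then runs the optimized binary
./target/release/benchmark-while   # same thing, invoked directly
```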
The answer is that the while loop now takes just as long as the iterator loop, which is great. So let's go back and draw some more conclusions. That was a close call, people; that was a really close call. We almost discovered a situation where the Rust book recommended doing something in a canonical, very readable fashion that was also slower.
What we notice here, upon closer investigation, is that the optimizer has done far, far, far too good a job. What happened is that the loop, in its entirety, was taken away by the compiler. The compiler realized that that loop meant absolutely nothing to the outcome of our program, and therefore it was entirely optimized away.
So we have to go back here and label this a tentative fact. We're not really sure that for-in loops and while loops take the same amount of time, but we're going to investigate whether this is actually the case.
I like a couple of the answers I see in the Discord chat. "Benchmarking is hard": yes. Oh, it's almost like Abik has read my mind about what's going on. I like the comment "no code is actually the safest code"; that's exactly right. I work in safety and dependability, and my boss used to say that the safest airplane is a rock, because it never goes off the ground. So: no code is the safest code.
Yes, Rum (I'm sorry, I can't pronounce your screen name), I like that comment a lot. Abik, I like your comment too, and that's exactly what we're going to do. So the takeaway from release mode, and from what we just saw, is that the optimizer does too good of a job optimizing our code and doesn't give us an accurate way to benchmark.
So what we need to do instead is use something called volatile variables. When we label a variable volatile, it tells the compiler: don't optimize away the operations that are done on this variable, because they matter. The name volatile comes from the C language, where you can just write `volatile` and then the name of a variable, and what programmers use volatile for is to make sure that the operations actually go to memory rather than being optimized away.
Thanks, Rainer, I appreciate that. I won't take it personally, but I don't have the qualifications to pronounce that user's screen name correctly; I would have to be Austrian, which I'm definitely not, so I apologize for that. But back to the task at hand: Abik was exactly right that what we can do is use the volatile crate in Rust in order to accomplish the same thing that we could have done in C.
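The talk uses the volatile crate; as a sketch of the same idea using only the standard library's volatile primitives (`std::ptr::write_volatile` and `read_volatile`, which the crate wraps), the benchmark body can be kept alive like this. Names and counts are illustrative:

```rust
use std::ptr;
use std::time::{Duration, SystemTime};

// Volatile accesses may not be elided by the optimizer, so the loop
// body can no longer be deleted wholesale in a release build.
fn time_volatile_loop(x: u64) -> Duration {
    let mut sink: u64 = 0;
    let before = SystemTime::now();
    let mut i: u64 = 0;
    while i < x {
        // A volatile store: the compiler must assume this write matters.
        unsafe { ptr::write_volatile(&mut sink, i) };
        i += 1;
    }
    let elapsed = before.elapsed().expect("system clock went backwards");
    // A volatile read, so the stores are observably used.
    let _last = unsafe { ptr::read_volatile(&sink) };
    elapsed
}

fn main() {
    println!("volatile loop took {:?}", time_volatile_loop(10_000_000));
}
```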
So let's see what that looks like. I've got this version here that I've rewritten, and I'll let you browse it on your own later, but what you'll see is that nothing major has changed. We're doing the two loops back to back instead of having them in two separate files, but we're going to do the operation the same number of times.
All right, so now, in microseconds, it looks like they both take roughly the same amount of time. Perfect. Now let's push on this a little bit more; we're not really satisfied that one answer is enough to prove that we are correct. So let's do another couple of iterations, and in order to do that, I've set up a little script that will run this a few times and make a CSV file that we can use to actually plot the results.
So let's use LibreOffice here to open this file and see what we get. I'm going to quickly grab this, make a little graph here, make it a line chart, put some lines on it, and we'll see what we get. All right, pretty impressive stuff there. What we see is that most of the time spent in the two loops is roughly equivalent. We've done 15 iterations, and we see that they track each other pretty well.
One is not necessarily always faster than the other. Interesting. So I hope that is enough evidence for you to believe that these two types of loops take roughly the same amount of time.
This might be something you're interested in exploring more if you've never heard of loop unrolling before. I'm not going to go into it right now, because I don't want to bore anyone, and in the questions afterwards I'm happy to explain it. But here's the big surprise when I go look at the loop body of the second loop.
I just wanted to say thank you to the Creative Commons providers for the "fact" logo that I got off of the Noun Project; I want to give them credit, as appropriate, under the Creative Commons Attribution license. And now I think Rainer is going to pop in and direct some questions to me. I hope that you all enjoyed the presentation, and I hope that you will give feedback if you didn't enjoy it; if you think I can improve, I would always love to do better.