From YouTube: Faster Rust Builds
Description
1:30- Stuart Pernsteiner, "Faster Rust Builds" MV
Hi, I'm Stuart, and I have been working this summer with the Rust team on getting faster build times for Rust code. So today I'm going to talk a little bit about some of the reasons why Rust has had trouble getting fast build times, and some of the approaches that we're using to speed things up.

So one of the big reasons why Rust has trouble with slow build times is that the Rust compilation model doesn't exactly make things easy.
Here is what the compilation model looks like for C and C++. In this compilation model you have many independent source files, and on each source file you make an independent invocation of the compiler to produce an object file. Then, once all of your compiler invocations are done, you take all of your object files and link them together to produce the final executable. I'm just going to ignore headers and libraries, to keep things simple.
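As a concrete sketch of that model (the file names here are made up for illustration):

```sh
# Each source file gets its own independent compiler invocation...
gcc -c main.c -o main.o
gcc -c parser.c -o parser.o
gcc -c eval.c -o eval.o
# ...and then the object files are linked into the final executable.
gcc main.o parser.o eval.o -o interp
```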
So with a compilation model like this, there are a few pretty straightforward things that you can do to speed up the build times. One option is to use parallel builds: all of these GCC invocations are independent, and you can run them all in parallel. So if you have a quad-core machine, you run make -j4 or something like that and get a roughly four times faster build. Another option is to do an incremental build.

If you've already built the project once and you've only changed one of the files, then you can skip running GCC on the unchanged files and just reuse the old object code when doing the linking. This can also give you a pretty significant speedup on your build.
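A minimal Makefile sketch of both tricks, assuming the hypothetical files from above: make's timestamp checks give the incremental behavior, and make -j4 runs the independent compiler invocations in parallel.

```make
# Hypothetical project: `make -j4` compiles these in parallel, and a
# rebuild after touching only parser.c reruns just that one gcc call
# before relinking.
OBJS = main.o parser.o eval.o

interp: $(OBJS)
	gcc $(OBJS) -o interp

%.o: %.c
	gcc -c $< -o $@
```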
Now, the Rust compilation model looks like this, and the main difference here is that there's only a single rustc invocation for this entire project, which means we can't run anything in parallel; there aren't any parallel rustc invocations that we could run at the same time. And on top of that, we can't really do incremental builds, because, basically, if one of these files changes, the single .o file for the entire project will also change, and we have to rerun rustc to rebuild that .o file. There are no redundant rustc invocations that we could skip.
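In build-command terms, the whole crate is one unit (again with a hypothetical layout):

```sh
# One rustc invocation covers every source file in the crate; there is
# nothing for the build system to parallelize or skip.
rustc src/main.rs -o interp
```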
And another problem here is that those GCC invocations were each running on a fairly small input, a single source file, which means maybe a few thousand lines of code plus a few thousand lines of headers, which are mostly declarations, not actual code. Whereas rustc here is running on every source file in the entire project, and that can be hundreds of thousands of lines of code.
So, since the standard approaches to speeding up build times aren't going to work with this compilation model, if we want to make things faster we have to make changes to rustc itself, and get that single rustc invocation to not take as long.
So let's take a look at what is actually happening inside rustc. When rustc runs, it takes all of the source files for the project, parses them all, runs them through type checking, and then translates all of the Rust code into LLVM intermediate representation, which it passes off to the LLVM compiler back end to run optimization and code generation. At the end of the LLVM passes, you get out the object file.

Now, it turns out that if you time how long these different steps take, the type checking and translation phases only account for about a quarter of the build time.
The time spent in LLVM accounts for three quarters of the build time, which means the LLVM invocations are a good target for us to try to optimize. And so my work this summer has been mostly focused on being able to parallelize our calls to LLVM.
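(As an aside, you can get this kind of per-phase breakdown yourself with rustc's unstable -Z time-passes flag; a sketch, and the exact output format varies by compiler version:)

```sh
# Print how long each compiler phase takes; the LLVM passes dominate
# at higher optimization levels.
rustc -O -Z time-passes src/main.rs
```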
So, basically, instead of having a single LLVM IR module and a single invocation of the LLVM code generator, we'd actually generate multiple LLVM modules and run multiple optimization and code generation steps in parallel.
The interesting aspects of this approach wound up being the top and bottom parts of this diagram: the part where you take the output of translation and try to split it into multiple LLVM modules, and then, at the end, when you have the output of multiple codegen passes and are trying to get it into a single object file. For the splitting into multiple LLVM modules, I described it sort of like this: you take the output of translation and split it. But it's actually not quite that simple.
These LLVM modules, if you try to duplicate or split them, will actually end up sharing some internal data structures between the two halves, and the result is that if you try to run these on separate threads, they will make unsynchronized accesses to those data structures; you will get race conditions and, usually, segfaults.
And then the final step of this process comes after code generation has run. Each code generation pass, so each thread, will produce a separate object file, and we would like to combine those together into a single object file, since that's what the rest of the build process expects as output. It turns out that to do this we can actually just use a linker feature called incremental linking, which combines multiple object files into another object file that you can feed into later linking steps. So that's actually pretty straightforward.
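(On most Unix toolchains this is the linker's relocatable-output mode; a sketch with made-up file names:)

```sh
# ld -r performs a partial ("incremental") link, merging several object
# files into one object file that later link steps consume as usual.
ld -r unit0.o unit1.o unit2.o -o combined.o
```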
So the parallel codegen stuff actually provides a pretty significant speedup to Rust build times. The blue line is total build time, and you can see that it decreases as we add more threads, up to four threads, which is the number of cores that I had on the machine that I ran these tests on.
You can also see the red line, which is the time that the Rust compiler spent not in LLVM. That time increases slightly, because there is some overhead from splitting up the LLVM module into multiple pieces, but it only increases a little bit, so it's totally outweighed by the benefits of running multiple LLVM steps in parallel.
And so, one downside to this parallel-build approach is that, where previously we had a single compilation unit and LLVM could see basically all of the code for the entire project at once, and therefore could do inlining on basically any call, now that things are separated into multiple pieces, there are boundaries between them where LLVM cannot see the target of a call, and so it might be prevented from doing inlining in some cases; there are also some other optimizations that this hinders. But it turns out, if you look at the runtime of the generated code as you increase the number of threads, it goes up somewhat, to about fifteen percent overhead, which is, I guess, fairly significant, but definitely tolerable considering the benefits that you get in build time.
So if you're doing a development build, and what you mainly care about is being able to edit the code, recompile, and test it very quickly, then parallel codegen is a good choice to let you do that. For release builds, obviously, if you want the maximum performance out of the compiled code, then you should stick with one thread, and hopefully you can deal with the longer compilation times. This parallel codegen feature is already available now in Rust master.
You set this flag to however many threads you want to use, and it will separate out the code generation into that many pieces.
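(For reference, in rustc this is the -C codegen-units flag; a sketch, assuming a default crate layout:)

```sh
# Ask rustc to split code generation into four units and run their
# LLVM optimization and codegen passes on four threads.
rustc -C codegen-units=4 src/lib.rs
```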
Another optimization approach, which I don't think I have time to talk about today, is incremental codegen, in which we only run translation on parts of the project, or rather, we only run translation on functions that have actually changed.
There is also another effect, which is that there is some code that has to be duplicated into every compilation unit, into every LLVM module. So even if you had, you know, a thousand-core machine, if you tried to divide your project into a thousand LLVM modules, you would not get a thousand-times speedup, because there would be fixed overhead in every module.
How big is the fixed overhead? It depends on the project. I think currently anything that the programmer has marked as inline will get duplicated; so code in the current project that is marked inline will be duplicated into every LLVM module. Also, if there are calls from two different modules to external code that is marked inline, the external code will be copied into both of those modules. So it doesn't necessarily get copied everywhere, but it can be copied multiple times.
Hello, can you hear me? Yes? Yes, you can hear me, okay. So first of all, this is awesome work, thanks for doing this. I just had a question about the performance numbers that you showed. You showed a chart with decreasing build times as you added additional threads, but I was wondering...
So I don't think there's any significant overhead introduced by the parallel codegen infrastructure. If you have parallel codegen turned off, you should get the same performance as before, because it's essentially generating exactly the same IR as it would have before, and doing exactly the same optimizations as it would have before.
So the question is, I guess: how much time is spent in optimization versus how much on code generation? That depends on how high you set the optimization level. At -O0 it spends basically no time in optimization; at optimization level 2 it spends about two-thirds of the time on optimization and the remaining one-third on code generation. That's for librustc, which is, you know, the main component of the Rust compiler and our biggest library, on the machine I was testing on.
The question is: is this on by default? It is currently not turned on by default anywhere. The plan currently is to have it turned on by default for low optimization levels. So if the user does not request -O2 or higher, then we would turn this on by default; otherwise, if they do request -O2 or higher, we sort of expect that they really care about performance and don't want to take the ten-to-fifteen-percent performance hit, and so we will leave it as a single-threaded compilation.
The question is, for turning this on by default, do we have a way to find out how many cores the machine actually has? I'm not sure whether rustc has such a mechanism. Okay, so we could use that, although the current plan was to only enable basically two-threaded optimization by default, because enabling higher numbers of threads can actually cause worse build performance on very small projects.
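(If you do want to match the unit count to the machine yourself, here is a sketch using GNU coreutils' nproc; the default discussed above would only use two threads:)

```sh
# Use one codegen unit per available core (nproc reports the count).
rustc -C codegen-units=$(nproc) src/lib.rs
```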