From YouTube: "A Work-stealing Runtime for Rust" - Aaron Todd
A
Hello,
my
name
is
Aaron
Todd
and
I
was
a
research
engineering
intern
at
Mozilla
this
summer,
I'm
going
to
be
talking
about
my
work
on
the
rust
languages,
new
runtime
system
and
in
particular
the
schedule
scheduler
for
lightweight
tasks,
which
uses
a
work-stealing
strategy
now.
So
there
are
a
lot
of
questions
you
might
have
about
runtimes
or
rust
I'm
going
to
try
and
answer
some
of
them,
but
I
really
want
to
get
covered
in
this
10
or
15.
So first, a little bit about Rust. Many of you are already pretty familiar with Rust, having seen other talks on the subject, but the core concept we need is that it's just a good language for systems programming. It tries to be both type safe, through a pretty conventional static type system, and memory safe, through Rust's pretty novel region system, which gives you static guarantees on your memory usage. And while giving you all this safety, it also gives you some helpful abstractions, like traits, and the general good features of a modern language.
One example application is a browser engine like Servo, which Tim talked about recently. So, the runtime is just a bunch of stuff that goes on to help your program run; it's sort of nebulously defined, and what it actually does can vary depending on the language. The JVM is an example runtime. Pretty much every language but C or C++ has a runtime of some sort, and even C and C++ applications frequently have some sort of runtime written in C or C++.
Rust doesn't have a JIT yet; maybe it will someday, but that's a pretty common thing for runtimes like the JVM to do. Task scheduling is going to be the focus here. Now, the Rust runtime, in its current, or rather now old, state: it was written in C++, it had a very simple task scheduler, and it didn't really do much for I/O management. It was just not a great runtime. It was a great first attempt to get the language going, and it was great to build off of, but there's a lot we can do differently to make it better.
So the goal was to change almost everything about the runtime, and given that we were changing so much, it made sense to just rewrite it from scratch and get the benefits of clean code. Rewriting from scratch, we've got this language, Rust, that's good for low-level systems programming, like writing a runtime, so Rust is now one of the very few languages out there with a self-hosting runtime: the runtime is written almost entirely in Rust. Now, another important definition: what is a task in Rust?
It's one of the things Rust enforces through its memory safety: you communicate information between tasks by passing messages over channels. While this can sound kind of inefficient, Rust's region system allows you to just pass an owned pointer to a thing to another task. So if you have a giant data structure on the global heap, you can just send a pointer to it over a channel, and the region system will statically guarantee that the other task takes full ownership of it and that your own access to it is removed, so you get great, high-performance, memory-safe message sending. One drawback of tasks, though, is that they're cooperatively scheduled: since you don't have magical kernel powers, you can't just preempt a for loop while it's executing, so with these tasks you have to not screw up. In that sense: no infinite for loops.
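The task API from this era of Rust is long gone, but today's standard library expresses the same ownership-transfer idea with `std::thread` and `std::sync::mpsc`; a minimal sketch (the function name and buffer size are illustrative):

```rust
use std::sync::mpsc;
use std::thread;

/// Allocate a large buffer on the heap, then hand it to another thread
/// by sending the owning pointer over a channel. Only the pointer is
/// copied; ownership of the allocation moves with it.
fn send_owned(len: usize) -> usize {
    let big: Box<Vec<u64>> = Box::new(vec![0u64; len]);

    let (tx, rx) = mpsc::channel();
    let receiver = thread::spawn(move || {
        // This thread now has sole ownership of the allocation.
        let data: Box<Vec<u64>> = rx.recv().unwrap();
        data.len()
    });

    tx.send(big).unwrap();
    // `big` has been moved into the channel; touching it here would be
    // a compile-time error -- the static guarantee described above.

    receiver.join().unwrap()
}
```

Trying to read `big` after the `send` is rejected by the compiler, which is exactly the "your access is removed" guarantee the region system provides.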
We can't save you from that. Some runtimes do do some preemption, because any time you pass from a task into the runtime, like when you allocate, do garbage collection, or grow your stack, the runtime is free to preempt you at that point. But you can write a Rust program that does none of those things.
So the most basic type of scheduling you can do is this: you take a giant global queue of tasks to work on, you make a bunch of workers, probably one for each CPU thread you have, and then you just have those workers take tasks off the queue, run the tasks, and put new tasks back on the queue. Very straightforward. The problem is contention: this doesn't usually work out very well.
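A sketch of that naive design, assuming tasks are just boxed closures and the global queue is a mutex-protected `VecDeque` (every pop takes the one global lock, which is exactly where the contention comes from as worker counts grow):

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// A "task" here is just a boxed closure to run once.
type Task = Box<dyn FnOnce() + Send>;

/// Run every task in `tasks` on `n_workers` threads that all share a
/// single global queue.
fn run_global_queue(tasks: Vec<Task>, n_workers: usize) {
    let queue = Arc::new(Mutex::new(VecDeque::from(tasks)));
    let handles: Vec<_> = (0..n_workers)
        .map(|_| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || loop {
                // Lock the one shared queue just to pop a single task;
                // every worker serializes on this lock.
                let task = queue.lock().unwrap().pop_front();
                match task {
                    Some(t) => t(),
                    None => break, // queue drained: this worker is done
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```

Note the lock guard is dropped before the task actually runs, so workers only contend on the pop itself, not on each other's work.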
So one place to go from there is to have separate work queues for each worker thread, and this is what the old Rust runtime, in C++, was doing. The idea here is that when you spawn a task, you put it into the thread-local queue, and when you dequeue a task, you grab it out of your local queue. And this is great.
There's no contention, and as long as all your workers have stuff to do, everything is great. The problem is that you frequently end up in the case where your workers don't all have stuff to do: one task ends up doing a bunch of work, spawning a bunch of new tasks locally, while the other threads all finish and go to sleep. And this is kind of bad.
You don't want this, so what we need to do is move work between the worker threads, such that all the worker threads are doing stuff, and there are a couple of approaches to this. In the very broad sense, you can do something like static scheduling, where you know exactly what you're doing: you know details about the workload, you know details about your hardware, and you can think hard and just figure out ahead of time what the optimal schedule is for allocating the work. The problem with this is that you don't actually have all that detail, and you don't have time to think about it, in the general case, which is why we can't do static scheduling like this. So for the general case you need some sort of dynamic scheduling, where you just do something that's really cheap to do, and you hope it works well enough most of the time; it's going to be suboptimal compared to static scheduling, when you can pull that off. The particular type of dynamic scheduling we do in Rust is called work stealing.
This is a very standard approach, used all the time because it's so effective. The basic idea is that when your local work queue doesn't have any work in it and you want to do work, you just steal some work from another worker thread's queue. This gives you the benefit of using your local queue as much as you can, while you still have the option of getting work from someplace else, and the amount of effort it takes to get work from someplace else is very low, because you're stealing only when you need to. The problem with a strategy where you instead fan work out across the worker threads is that it's just much higher overhead than this. Now, back to contention, which is a big deal again, because now we're sharing queues between threads. One thing we can do to make things a little more efficient is to let the worker only push and pop on its own local queue.
So the owner only uses the left end, and then we make the threads that steal use only the right end, separating the responsibilities a bit: the owner pushes and pops from the left, and other threads steal from the right. And there's actually a very, very optimized data structure for exactly this use case, used in almost every work-stealing runtime out there: it's called the Chase-Lev deque, and it's backed by an array.
So it's pretty fast and easy to manipulate, and the other notable feature is that it's lock-free. You hear lots of people talking about lock-free code like it's this fancy new thing, and it is pretty great; it's not a magic bullet, but it is pretty great in a lot of situations, like this one. The basic idea behind it is that, instead of using locks to manage your concurrent memory accesses, you use atomic memory operations, which are a little bit quicker.
The basic pattern for many lock-free data structures, like the Chase-Lev deque, is to use a compare-and-swap operation. What you do is read a word; in this case it would be the index of the top element of the deque, because you're trying to steal something. Then you do some stuff, say, take the element at that top index and grab a pointer to it. And then you try to modify the original top index, decrementing it because you removed an element from the deque.
But if someone else is doing this at the same time, you can have multiple thieves dequeue the same element and decrement the index at the same time, and bad things happen. That's what the compare-and-swap prevents: it says, only decrement that index if no one else has done it first. So by being the first worker to succeed at this, you guarantee that you are the only one who dequeues that task. This compare-and-swap style of coding makes many concurrent data structures very quick, and a lot more complicated in the code.
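A drastically simplified, steal-only sketch of that compare-and-swap step (the `StealQueue` type is made up for illustration; a real Chase-Lev deque also manages the owner's end, buffer growth, and weaker memory orderings, and its thieves move the index in the opposite direction from this sketch):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// A steal-only toy queue: a fixed array of task ids plus an atomic
/// index marking the next element a thief may take.
struct StealQueue {
    tasks: Vec<usize>,
    top: AtomicUsize,
}

impl StealQueue {
    /// Try to steal one task. Returns None once everything is taken.
    fn steal(&self) -> Option<usize> {
        loop {
            // 1. Read the current index.
            let t = self.top.load(Ordering::SeqCst);
            if t >= self.tasks.len() {
                return None; // nothing left to steal
            }
            // 2. Read the element at that index.
            let task = self.tasks[t];
            // 3. Publish the steal only if no other thief raced us:
            //    atomically move `top` from t to t + 1.
            if self
                .top
                .compare_exchange(t, t + 1, Ordering::SeqCst, Ordering::SeqCst)
                .is_ok()
            {
                return Some(task); // we alone dequeued this task
            }
            // Lost the race; reread the index and retry.
        }
    }
}
```

The `compare_exchange` is the "only decrement it if no one else has done it first" step: exactly one thief's update succeeds, so exactly one thief owns the task.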
So now we've got all the bits; let's assemble them together. We've got N worker threads, one per CPU core, and we have a Chase-Lev deque for every single worker thread. Whenever you make new work, you push it onto your local Chase-Lev deque; whenever you need to do work, you dequeue it locally; and if there's no work to do locally, you steal from the other side of a random other scheduler's Chase-Lev deque. So: results. That's pretty much it.
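That whole policy, push and pop locally at one end, steal from the other end when empty, can be simulated single-threaded in a few lines (illustrative only: real workers run concurrently on a Chase-Lev deque and pick steal victims at random, while here we take the first non-empty victim so the result is deterministic):

```rust
use std::collections::VecDeque;

/// Single-threaded simulation of the scheduling policy: worker `me`
/// pops local work from the front of its own deque, and when that is
/// empty it steals from the back of another worker's deque. Returns
/// the tasks in the order worker `me` would run them.
fn schedule(mut deques: Vec<VecDeque<u32>>, me: usize) -> Vec<u32> {
    let mut ran = Vec::new();
    loop {
        // 1. Prefer local work: pop from our own end of our own deque.
        if let Some(task) = deques[me].pop_front() {
            ran.push(task);
            continue;
        }
        // 2. Out of local work: steal from the other end of a victim.
        match deques.iter().position(|d| !d.is_empty()) {
            Some(victim) => {
                let task = deques[victim].pop_back().unwrap();
                ran.push(task);
            }
            None => return ran, // no work anywhere: we're done
        }
    }
}
```

Because the thief takes from the opposite end to the owner, the two only touch the same slot when a deque is nearly empty, which is what keeps contention low.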
So, the Rust implementation. The first fact about it is that it's not actually done yet. The work-stealing part of it is there, and the rest of the scheduler is pretty much all working in Rust; we turned it on by default yesterday, which was pretty exciting. But the Chase-Lev deque is currently not in use, so we're missing that big part; it should be a big speed-up when we implement it, and there are a lot of other really simple optimizations we can do to make this go a lot faster.
But surprisingly, despite all these low-hanging optimizations still to do, we are pretty quick. So here are some benchmark graphs comparing the old runtime, which was written in C++, and the new runtime, which is written in Rust and has work stealing enabled. On one side we've got the message-passing benchmark, where the two are pretty equivalent.
The new runtime is a little slower than the old runtime in many cases, just because we haven't optimized it that much, but where the new runtime does pretty well is in spawning new tasks. The spawn benchmark spawns a million tasks from a single thread; it's sort of a worst-case scenario for straining the spawn system, and the new runtime handles it pretty well, while the old runtime handles it very poorly.
Not having shared mutable state is actually really useful, even though you do need shared mutable state in the implementation of a runtime, because it forces you to really think about when you're doing it and to make it very explicit. We're hoping the new runtime will catch up on message passing pretty soon and be pretty fast. And of course, I'll thank my mentor, Brian Anderson, the rest of the runtime interns, and the rest of the Rust and Servo teams at Mozilla. Any questions?
[Audience question]
To repeat that for everyone else: one of the things we do is that every pass through the scheduler, every iteration, actually runs by processing an event in an event loop, and right now we just need to make sure the scheduler processes an event whenever there's work to do, so we send it events all the time, saying "go do this work, go do this work." And every time we ping it with a new event, we're creating a new callback object, which is another allocation and a bunch of work to make sure it handles that event. We're doing this a few times per message pass, when in reality we should be doing only one allocation and then sending many fewer of these callbacks, so that'll be a big speed difference.

[Audience question]
Yeah, I've got two more weeks; I should be able to finish this pretty easily. As an example of an easy optimization: we fixed the debugging print macro so it doesn't format the strings when it isn't going to print them, and we saw a 50% performance increase on the benchmarks for the new runtime. That was a one-line pull request.
[Audience question]
Yes. As an example of how hard it is to introduce these races: when I was preparing this presentation, dealing with races wasn't actually something I thought of, because I didn't have to deal with them very often; there were just that few of them. There are other segments of the runtime that do have races, but the scheduling code had very, very, very few, if any, and that's all thanks to Rust.